Distributed Systems

The Durable Function Tree - Part 2

In part 1 we covered how durable function trees work mechanically and the importance of function suspension. Now let's zoom out and consider where they fit in the broader system architecture, and ask what value durable execution actually provides.

Function Trees and Responsibility Boundaries

Durable function trees are great, but they aren’t the only kid in town. In fact, they’re like the new kid on the block, trying to prove themselves against other more established kids.

Earlier this year I wrote Coordinated Progress, a conceptual model exploring how event-driven architecture, stream processing, microservices and durable execution fit into system architecture in the context of multi-step business processes, aka workflows. I also wrote about responsibility boundaries, exploring how multi-step work is made reliable inside and across those boundaries. I’ll revisit that now, with this function tree model in mind.

The Durable Function Tree - Part 1

In my last post I wrote about why and where determinism is needed in durable execution (DE). In this post I'm going to explore how workflows can be formed from trees of durable function calls based on durable promises and continuations. 

Here's how I'll approach this:

  • Part 1

    • Building blocks: Start with promises and continuations and how they work in traditional programming.

    • Making them durable: How promises and continuations are made durable.

    • The durable function tree: How these pieces combine to create hierarchical workflows with nested fault boundaries.

    • Function trees in practice: A look at Temporal, Restate, Resonate and DBOS.

  • Part 2

    • Responsibility boundaries: How function trees fit into my Coordinated Progress model and its responsibility boundaries.

    • Value-add: What value does durable execution actually provide?

    • Architecture discussion: Where function trees sit alongside event-driven choreography, and when to use each.
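To make the tree idea concrete before getting into the building blocks, here's a minimal sketch of the shape of a durable function tree. It is not the API of any real framework: the Context, its journal, and ctx.call are hypothetical stand-ins for the durable promises and continuations discussed in part 1. Each call journals its result, so after a crash the completed sub-calls resolve from the journal and the parent resumes where it left off.

```python
# Hypothetical sketch of a durable function tree (not any framework's real API).
# Each ctx.call(...) returns the result of a durable promise: it is journaled, so
# on crash and replay the completed sub-calls resolve from the journal instead of
# re-executing, and the parent function continues from where it left off.

import asyncio


class Context:
    def __init__(self, journal):
        self.journal = journal  # durable record of resolved promise results
        self.step = 0

    async def call(self, fn, *args):
        key = f"{self.step}:{fn.__name__}"
        self.step += 1
        if key in self.journal:              # promise already resolved on a prior run
            return self.journal[key]
        result = await fn(self, *args)       # child durable function (a sub-tree)
        self.journal[key] = result           # persist the resolved promise
        return result


async def charge_payment(ctx, order):       # leaf durable function
    return {"charged": order["total"]}


async def reserve_stock(ctx, order):        # leaf durable function
    return {"reserved": order["items"]}


async def process_order(ctx, order):        # root of the tree
    payment = await ctx.call(charge_payment, order)
    stock = await ctx.call(reserve_stock, order)
    return {"payment": payment, "stock": stock}


journal = {}                                 # would live in durable storage in reality
order = {"total": 42, "items": ["book"]}
print(asyncio.run(process_order(Context(journal), order)))
```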

Demystifying Determinism in Durable Execution

Determinism is a key concept to understand when writing code with durable execution frameworks such as Temporal, Restate, DBOS, and Resonate. If you read the docs, you'll see that some parts of your code must be deterministic while other parts do not have to be. This can be confusing to a developer new to these frameworks.

This post explains why determinism is important, where it is needed, and where it is not. Hopefully you’ll come away with a better mental model that makes things less confusing.
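As a rough illustration of that split, here's a hypothetical sketch (not any framework's real API). The workflow body can be re-executed after a failure, so it must stay deterministic; anything non-deterministic is wrapped in a step whose result is recorded the first time and reused on replay.

```python
# Hypothetical sketch of the determinism split (not a real framework's API).
# The workflow body may be re-executed (replayed) after a failure, so it must be
# deterministic: no direct clock reads, randomness, or network calls. Anything
# non-deterministic goes into a step whose result is recorded the first time and
# returned from the record on replay, so the workflow sees the same values.

import random
import time

recorded_results = {}   # stands in for the framework's durable event history


def step(name, fn):
    """Run a non-deterministic side effect once; reuse its recorded result on replay."""
    if name not in recorded_results:
        recorded_results[name] = fn()
    return recorded_results[name]


def workflow(order_id):
    # Deterministic part: pure control flow over recorded step results.
    quote = step("fetch_quote", lambda: {"price": random.uniform(10, 20)})  # non-deterministic, recorded
    ts = step("charge_time", lambda: time.time())                           # non-deterministic, recorded
    if quote["price"] > 15:                                                  # deterministic decision on recorded data
        return {"order": order_id, "status": "rejected", "at": ts}
    return {"order": order_id, "status": "accepted", "at": ts}


first = workflow("o-1")
replay = workflow("o-1")      # simulated replay: same recorded results, same decision
assert first == replay
print(first)
```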

A Fork in the Road: Deciding Kafka’s Diskless Future

“The Kafka community is currently seeing an unprecedented situation with three KIPs (KIP-1150, KIP-1176, KIP-1183) simultaneously addressing the same challenge of high replication costs when running Kafka across multiple cloud availability zones.” — Luke Chen, The Path Forward for Saving Cross-AZ Replication Costs KIPs

At the time of writing, the Kafka project finds itself at a fork in the road, where choosing the right path forward for implementing S3 topics has implications for the long-term success of the project. Not just the next couple of years, but the next decade. Open-source projects live and die by these big decisions, and as a community we need to make sure we take the right one.

This post explains the competing KIPs, but goes further and asks bigger questions about the future direction of Kafka.

Understanding Apache Fluss

This is a data system internals blog post. So if you enjoyed my table format internals blog posts, or my writing on Apache Kafka internals or Apache BookKeeper internals, you might enjoy this one. But beware, it’s long and detailed. Also note that I work for Confluent, which also runs Apache Flink but does not run nor contribute to Apache Fluss. However, this post aims to be a faithful and objective description of Fluss.

Apache Fluss is a table storage engine for Flink being developed by Alibaba in collaboration with Ververica. To write this blog post, I reverse-engineered a high-level architecture by reading the Fluss code from the main branch (and running tests) in August 2025. This follows the same approach as my writing about Kafka, Pulsar, BookKeeper, and the table formats (Iceberg, Delta, Hudi and Paimon), as the code is always the true source of information. Unlike the rest, I have not had time to formally verify Fluss in TLA+ or Fizzbee, though I did not notice any obvious issues that are not already logged in a GitHub issue.

Let’s get started. We’ll begin with some high-level discussion in the Fluss Overview section, then get into the internals in the Fluss Cluster Core Architecture and Fluss Lakehouse Architecture sections.

A Conceptual Model for Storage Unification

Object storage is taking over more of the data stack, but low-latency systems still need separate hot-data storage. Storage unification is about presenting these heterogeneous storage systems and formats as one coherent resource. Not one storage system and storage format to rule them all, but virtualizing them into a single logical view. 

The primary use case for this unification is stitching real-time and historical data together under one abstraction. We see such unification in various data systems:

  • Tiered storage in event streaming systems such as Apache Kafka and Pulsar

  • HTAP databases such as SingleStore and TiDB

  • Real-time analytics databases such as Apache Pinot, Druid and ClickHouse

The next frontier in this unification is the lakehouse, where real-time data is combined with historical lakehouse data. Over time we will see greater and greater lakehouse integration with lower-latency data systems.

In this post, I create a high-level conceptual framework for understanding the different building blocks that data systems can use for storage unification, and the trade-offs involved. I’ll cover seven key considerations when evaluating design approaches. I’m doing this because I want to talk in the future about how different real-world systems do storage unification, and I want to use a common set of terms, which I will define in this post.
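To ground the idea, here's a small, hypothetical sketch of one such building block: a unified read path that presents a low-latency hot store and an object-store cold tier as a single logical view, splitting a time-range scan at the tier boundary. The class names and tier split are illustrative, not taken from any particular system.

```python
# Hypothetical sketch of a unified read path over a hot, low-latency store and a
# cold object-store tier. A single logical view splits a time-range scan at the
# tier boundary and merges the results, so callers see one storage resource.

from dataclasses import dataclass


@dataclass
class Record:
    timestamp: int
    value: str


class HotStore:                      # e.g. broker-local or in-memory recent data
    def __init__(self, records):
        self.records = records

    def scan(self, start, end):
        return [r for r in self.records if start <= r.timestamp < end]


class ColdStore:                     # e.g. columnar files on object storage
    def __init__(self, records):
        self.records = records

    def scan(self, start, end):
        return [r for r in self.records if start <= r.timestamp < end]


class UnifiedView:
    """Presents both tiers as one logical, time-ordered stream."""

    def __init__(self, hot, cold, boundary_ts):
        self.hot = hot
        self.cold = cold
        self.boundary_ts = boundary_ts   # everything older than this lives in the cold tier

    def scan(self, start, end):
        cold_part = self.cold.scan(start, min(end, self.boundary_ts))
        hot_part = self.hot.scan(max(start, self.boundary_ts), end)
        return sorted(cold_part + hot_part, key=lambda r: r.timestamp)


view = UnifiedView(
    hot=HotStore([Record(105, "e"), Record(110, "f")]),
    cold=ColdStore([Record(90, "a"), Record(100, "b")]),
    boundary_ts=105,
)
print(view.scan(95, 111))   # one query spanning both tiers
```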

Responsibility Boundaries in the Coordinated Progress model

Building on my previous work on the Coordinated Progress model, this post examines how reliable triggers not only initiate work but also establish responsibility boundaries. Where a reliable trigger exists, a new boundary is created in which that trigger becomes responsible for ensuring the eventual execution of the sub-graph of work downstream of it. These boundaries can even layer and nest, especially in orchestrated systems that overlay finer-grained boundaries.

Coordinated Progress – Part 4 – A Loose Decision Framework

Microservices, functions, stream processors and AI agents represent nodes in our graph. An incoming edge represents a trigger of work in the node, and the node must do the work reliably. I have been using the term reliable progress, but I might have called it durable execution if that term hadn’t already been taken by a specific type of tool.

Coordinated Progress – Part 3 – Coupling, Synchrony and Complexity

In part 2, we built a mental framework using a graph of nodes and edges to represent distributed work. Workflows are subgraphs coordinated via choreography or orchestration. Reliability, in this model, means reliable progress: the result of reliable triggers and progressable work.

In part 3 we refine this graph model in terms of different types of coupling between nodes, and how edges can be synchronous or asynchronous. Let’s set the scene with an example, then dissect that example with the concepts of coupling and communication styles.

Coordinated Progress – Part 2 – Making Progress Reliable

In part 1, we described distributed computation as a graph and constrained the graph for this analysis to microservices, functions, stream processing jobs and AI Agents as nodes, and RPC, queues, and topics as the edges. 

Within our definition of The Graph, a node might be a function (FaaS or microservice), a stream processing job, an AI Agent, or some kind of third-party service. An edge might be an RPC channel, a queue or a topic.

For a workflow to be reliable, it must be able to make progress despite failures and other adverse conditions. Progress typically depends on durability at the node and edge levels.
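To make the vocabulary concrete, here's a small, hypothetical encoding of the graph model in code. The type names are shorthand for the concepts in this series, not an API from any of the posts: nodes are units of work, edges are the channels that trigger them, and only durable edges act as reliable triggers.

```python
# Hypothetical encoding of the Coordinated Progress graph model: nodes are units
# of work (microservice, function, stream job, AI agent), edges are the channels
# that trigger them (RPC, queue, topic). A reliable trigger means the edge is
# durable enough that downstream work will eventually run despite failures.

from dataclasses import dataclass, field
from enum import Enum


class NodeKind(Enum):
    MICROSERVICE = "microservice"
    FUNCTION = "function"
    STREAM_JOB = "stream_job"
    AI_AGENT = "ai_agent"


class EdgeKind(Enum):
    RPC = "rpc"        # typically not durable on its own
    QUEUE = "queue"    # durable: acts as a reliable trigger
    TOPIC = "topic"    # durable: acts as a reliable trigger


@dataclass
class Node:
    name: str
    kind: NodeKind


@dataclass
class Edge:
    source: Node
    target: Node
    kind: EdgeKind

    @property
    def reliable_trigger(self):
        return self.kind in (EdgeKind.QUEUE, EdgeKind.TOPIC)


@dataclass
class Workflow:
    edges: list = field(default_factory=list)

    def unreliable_hops(self):
        """Edges where a failure could silently drop downstream work."""
        return [e for e in self.edges if not e.reliable_trigger]


orders = Node("orders-service", NodeKind.MICROSERVICE)
billing = Node("billing-fn", NodeKind.FUNCTION)
analytics = Node("analytics-job", NodeKind.STREAM_JOB)

wf = Workflow(edges=[
    Edge(orders, billing, EdgeKind.RPC),
    Edge(billing, analytics, EdgeKind.TOPIC),
])
print([f"{e.source.name}->{e.target.name}" for e in wf.unreliable_hops()])
```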