Distributed Systems

Can We Agree on a Storage/Workload Architecture Taxonomy?

The lines between transactional systems, analytical systems, hybrid systems, and shared storage architectures are getting blurry. This post proposes a small taxonomy for describing the different ways systems, workloads, storage tiers, visibility, and durable copies relate to each other.

OLTP, OLAP, HTAP, and now LTAP.

We can think of the first two as two types of workload which have specialized query engines and storage systems to support them. OLTP such as the RDBMS like Postgres and MySQL use row-based storage engines. OLAP, such as Clickhouse, cloud data warehouse and the lakehouse use column-based storage.

HTAP is a hybrid workload system: one system -> both transactional and analytical workloads. The HTAP system therefore has specialized storage and specialized query engine to stitch together the row-based and columnar data.

So far, we’re dealing with a single system. A Postgres (OLTP), a Clickhouse (OLAP), a SingleStore or TiDB (HTAP).

So what is LTAP?

The Three Durable Function Forms

Durable execution engines (DEEs) talk about “workflows”, “activities”, “virtual objects”, “handlers”, and “functions”, but they’re often describing the same underlying execution patterns. This post proposes a model that extends the generic durable function into three forms: stateless functions, sessions, and actors. This complements my previous posts (on determinism and durable function trees) in this series I dub “The Theory of Durable Execution”.

I’ll cover this in three parts:

  1. The behavior-state continuum

  2. The three durable function forms and associated properties

  3. Mapping the DE frameworks to these forms

The Durable Function Tree - Part 2

In part 1 we covered how durable function trees work mechanically and the importance of function suspension. Now let's zoom out and consider where they fit in broader system architecture, and ask what durable execution actually provides us.

Function Trees and Responsibility Boundaries

Durable function trees are great, but they aren’t the only kid in town. In fact, they’re like the new kid on the block, trying to prove themselves against other more established kids.

Earlier this year I wrote Coordinated Progress, a conceptual model exploring how event-driven architecture, stream processing, microservices and durable execution fit into architecture, within the context of multi-step business processes, aka, workflows. I also wrote about responsibility boundaries, exploring how multi-step work is made reliable inside and across boundaries. I’ll revisit that now, with this function tree model in mind.

The Durable Function Tree - Part 1

In my last post I wrote about why and where determinism is needed in durable execution (DE). In this post I'm going to explore how workflows can be formed from trees of durable function calls based on durable promises and continuations. 

Here's how I'll approach this:

  • Part 1

    • Building blocks: Start with promises and continuations and how they work in traditional programming.

    • Making them durable: How promises and continuations are made durable.

    • The durable function tree: How these pieces combine to create hierarchical workflows with nested fault boundaries.

    • Function trees in practice: A look at Temporal, Restate, Resonate and DBOS.

  • Part 2

    • Responsibility boundaries: How function trees fit into my Coordinated Progress model and responsibility boundaries

    • Value-add: What value does durable execution actually provide?

    • Architecture discussion: Where function trees sit alongside event-driven choreography, and when to use each.

Demystifying Determinism in Durable Execution

Determinism is a key concept to understand when writing code using durable execution frameworks such as Temporal, Restate, DBOS, and Resonate. If you read the docs you see that some parts of your code must be deterministic while other parts do not have to be.  This can be confusing to a developer new to these frameworks. 

This post explains why determinism is important and where it is needed and where it is not. Hopefully, you’ll have a better mental model that makes things less confusing.

A Fork in the Road: Deciding Kafka’s Diskless Future

The Kafka community is currently seeing an unprecedented situation with three KIPs (KIP-1150, KIP-1176, KIP-1183) simultaneously addressing the same challenge of high replication costs when running Kafka across multiple cloud availability zones.” — Luke Chen, The Path Forward for Saving Cross-AZ Replication Costs KIPs

At the time of writing the Kafka project finds itself at a fork in the road where choosing the right path forward for implementing S3 topics has implications for the long-term success of the project. Not just the next couple of years, but the next decade. Open-source projects live and die by these big decisions and as a community, we need to make sure we take the right one.

This post explains the competing KIPs, but goes further and asks bigger questions about the future direction of Kafka.

Understanding Apache Fluss

This is a data system internals blog post. So if you enjoyed my table formats internals blog posts, or writing on Apache Kafka internals or Apache BookKeeper internals, you might enjoy this one. But beware, it’s long and detailed. Also note that I work for Confluent, which also runs Apache Flink but does not run nor contributes to Apache Fluss. However, this post aims to be a faithful and objective description of Fluss.

Apache Fluss is a table storage engine for Flink being developed by Alibaba in collaboration with Ververica. To write this blog post, I reverse engineered a high level architecture by reading the Fluss code from the main branch (and running tests), in August 2025. This follows my same approach to my writing about Kafka, Pulsar, BookKeeper, and the table formats (Iceberg, Delta, Hudi and Paimon) as the code is always the true source of information. Unlike the rest, I have not had time to formally verify Fluss in TLA+ or Fizzbee, though I did not notice any obvious issues that are not already logged in a GitHub issue.

Let’s get started. We’ll start with some high level discussion in the Fluss Overview section, then get into the internals in the Fluss Cluster Core Architecture and Fluss Lakehouse Architecture sections.

A Conceptual Model for Storage Unification

Object storage is taking over more of the data stack, but low-latency systems still need separate hot-data storage. Storage unification is about presenting these heterogeneous storage systems and formats as one coherent resource. Not one storage system and storage format to rule them all, but virtualizing them into a single logical view. 

The primary use case for this unification is stitching real-time and historical data together under one abstraction. We see such unification in various data systems:

  • Tiered storage in event streaming systems such as Apache Kafka and Pulsar

  • HTAP databases such as SingleStore and TiDB

  • Real-time analytics databases such as Apache Pinot, Druid and Clickhouse

The next frontier in this unification are lakehouses, where real-time data is combined with historical lakehouse data. Over time we will see greater and greater lakehouse integration with lower latency data systems.

In this post, I create a high-level conceptual framework for understanding the different building blocks that data systems can use for storage unification, and what kinds of trade-offs are involved. I’ll cover seven key considerations when evaluating design approaches. I’m doing this because I want to talk in the future about how different real-world systems do storage unification and I want to use a common set of terms that I will define in this post.

Responsibility Boundaries in the Coordinated Progress model

Building on my previous work on the Coordinated Progress model, this post examines how reliable triggers not only initiate work but also establish responsibility boundaries. Where a reliable trigger exists, a new boundary is created where that trigger becomes responsible for ensuring the eventual execution of the sub-graph of work downstream of it. The boundaries can even layer and nest, especially in orchestrated systems that overlay finer-grained boundaries.

Coordinated Progress – Part 4 – A Loose Decision Framework

Microservices, functions, stream processors and AI agents represent nodes in our graph. An incoming edge represents a trigger of work in the node, and the node must do the work reliably. I have been using the term reliable progress but I might have used durable execution if it hadn’t already been used to define a specific type of tool.