December 10, 2025

The Three Durable Function Forms

Durable execution engines (DEEs) talk about “workflows”, “activities”, “virtual objects”, “handlers”, and “functions”, but they’re often describing the same underlying execution patterns. This post proposes a model that extends the generic durable function into three forms: stateless functions, sessions, and actors. This complements my previous posts (on determinism and durable function trees) in this series I dub “The Theory of Durable Execution”.

I’ll cover this in three parts:

- The behavior-state continuum
- The three durable function forms and associated properties
- Mapping the DE frameworks to these forms

Jack Vanlightly

December 4, 2025

Distributed Systems

The Durable Function Tree - Part 2

In part 1 we covered how durable function trees work mechanically and the importance of function suspension. Now let's zoom out and consider where they fit in broader system architecture, and ask what durable execution actually provides us.

Function Trees and Responsibility Boundaries

Durable function trees are great, but they aren’t the only kid in town. In fact, they’re like the new kid on the block, trying to prove themselves against other more established kids.

Earlier this year I wrote Coordinated Progress, a conceptual model exploring how event-driven architecture, stream processing, microservices and durable execution fit into architecture, within the context of multi-step business processes, aka workflows. I also wrote about responsibility boundaries, exploring how multi-step work is made reliable inside and across boundaries. I’ll revisit that now, with this function tree model in mind.

Jack Vanlightly

December 4, 2025

Distributed Systems

The Durable Function Tree - Part 1

In my last post I wrote about why and where determinism is needed in durable execution (DE). In this post I'm going to explore how workflows can be formed from trees of durable function calls based on durable promises and continuations.

Here's how I'll approach this:

Part 1
- Building blocks: Start with promises and continuations and how they work in traditional programming.
- Making them durable: How promises and continuations are made durable.
- The durable function tree: How these pieces combine to create hierarchical workflows with nested fault boundaries.
- Function trees in practice: A look at Temporal, Restate, Resonate and DBOS.
Part 2
- Responsibility boundaries: How function trees fit into my Coordinated Progress model and responsibility boundaries
- Value-add: What value does durable execution actually provide?
- Architecture discussion: Where function trees sit alongside event-driven choreography, and when to use each.

Jack Vanlightly

November 24, 2025

Distributed Systems

Demystifying Determinism in Durable Execution

Determinism is a key concept to understand when writing code using durable execution frameworks such as Temporal, Restate, DBOS, and Resonate. If you read the docs you see that some parts of your code must be deterministic while other parts do not have to be. This can be confusing to a developer new to these frameworks.

This post explains why determinism is important and where it is needed and where it is not. Hopefully, you’ll have a better mental model that makes things less confusing.

Jack Vanlightly

November 19, 2025

Data

Have your Iceberg Cubed, Not Sorted: Meet Qbeast, the OTree Spatial Index

In today’s post I want to walk through a fascinating indexing technique for data lakehouses which flips the role of the index in open table formats like Apache Iceberg and Delta Lake.

We are going to turn the tables on two key points:

Indexes are primarily for reads. Indexes are usually framed as read optimizations paid for by write overhead: they make read queries fast, but inserts and updates slower. That isn’t the full story, as indexes also support writes, such as through faster uniqueness enforcement and reduced lock contention (for example, by avoiding range locks during table scans), but the dominant mental model is that indexing serves reads while writes pay the bill.
OTFs don’t use tree-based indexes. Open table format indexes are data-skipping indexes scoped to data files or even blocks within data files. They are a loose collection of column statistics and Bloom filters.

Qbeast, a start-up with a presence here in Barcelona where I live, is reimagining indexes for open table formats, showing that neither assumption has to be true.

Jack Vanlightly

November 5, 2025

Data

How Would You Like Your Iceberg Sir? Stream or Batch Ordered?

Today I want to talk about stream analytics, batch analytics and Apache Iceberg. Stream and batch analytics work differently. Both can be built on top of Iceberg, but their differences can create a tug-of-war over the Iceberg table itself. In this post I am going to use two real-world systems, Apache Fluss (streaming tabular storage) and Confluent Tableflow (Kafka-to-Iceberg), as a case study for these tensions between stream and batch analytics.

Apache Fluss uses zero-copy tiering to Iceberg. Recent data is stored on Fluss servers (using the Kafka replication protocol for high availability and durability) but is then moved to Iceberg for long-term storage. This results in one copy of the data.
Confluent Kora and Tableflow use internal topic tiering and Iceberg materialization, copying Kafka topic data to Iceberg, such that we have two copies (one in Kora, one in Iceberg).

This post will explain why the two have chosen different approaches and why both are totally sane, defensible decisions.

Jack Vanlightly

October 22, 2025

Distributed Systems

A Fork in the Road: Deciding Kafka’s Diskless Future

“The Kafka community is currently seeing an unprecedented situation with three KIPs (KIP-1150, KIP-1176, KIP-1183) simultaneously addressing the same challenge of high replication costs when running Kafka across multiple cloud availability zones.” — Luke Chen, The Path Forward for Saving Cross-AZ Replication Costs KIPs

At the time of writing, the Kafka project finds itself at a fork in the road, where choosing the right path forward for implementing S3 topics has implications for the long-term success of the project. Not just the next couple of years, but the next decade. Open-source projects live and die by these big decisions and, as a community, we need to make sure we make the right one.

This post explains the competing KIPs, but goes further and asks bigger questions about the future direction of Kafka.

Jack Vanlightly

October 15, 2025

Data

Why I’m not a fan of zero-copy Apache Kafka-Apache Iceberg

Over the past few months, I’ve seen a growing number of posts on social media promoting the idea of a “zero-copy” integration between Apache Kafka and Apache Iceberg. The idea is that Kafka topics could live directly as Iceberg tables. On the surface it sounds efficient: one copy of the data, unified access for both streaming and analytics. But from a systems point of view, I think this is the wrong direction for the Apache Kafka project. In this post, I’ll explain why.

Jack Vanlightly

October 8, 2025

Data

Beyond Indexes: How Open Table Formats Optimize Query Performance

My career in data started as a SQL Server performance specialist, which meant I was deep into the nuances of indexes, locking and blocking, execution plan analysis and query design. These days I’m more in the world of open table formats such as Apache Iceberg. Having learned the internals of both transactional and analytical database systems, I find the use of the word “index” interesting, as it means very different things in different systems.

I see the term “index” used loosely when discussing open table format performance, both in their current designs and in speculation about future features that might make it into their specs. But what actually counts as an index in this world?

Some formats, like Apache Hudi, do maintain record-level indexes, such as primary-key-to-filegroup maps that enable upserts and deletes to be directed efficiently to the right filegroup in order to support primary key tables. But they don’t help accelerate read performance across arbitrary predicates like the secondary indexes we rely on in OLTP databases.

Traditional secondary indexes (like the B-trees used in relational databases) don’t exist in Iceberg, Delta Lake, or even Hudi. But why? Can't we solve some performance issues if we just added secondary indexes to the Iceberg spec?

The short answer is: “no and it's complicated”. There are real and practical reasons why the answer isn’t just "we haven't gotten around to it yet."

Jack Vanlightly

September 2, 2025

Distributed Systems

Understanding Apache Fluss

This is a data system internals blog post. So if you enjoyed my table formats internals blog posts, or writing on Apache Kafka internals or Apache BookKeeper internals, you might enjoy this one. But beware, it’s long and detailed. Also note that I work for Confluent, which also runs Apache Flink but neither runs nor contributes to Apache Fluss. However, this post aims to be a faithful and objective description of Fluss.

Apache Fluss is a table storage engine for Flink being developed by Alibaba in collaboration with Ververica. To write this blog post, I reverse-engineered a high-level architecture by reading the Fluss code from the main branch (and running tests), in August 2025. This follows the same approach as my writing about Kafka, Pulsar, BookKeeper, and the table formats (Iceberg, Delta, Hudi and Paimon), as the code is always the true source of information. Unlike the rest, I have not had time to formally verify Fluss in TLA+ or Fizzbee, though I did not notice any obvious issues that are not already logged in a GitHub issue.

Let’s get started. We’ll start with some high-level discussion in the Fluss Overview section, then get into the internals in the Fluss Cluster Core Architecture and Fluss Lakehouse Architecture sections.