Understanding Apache Hudi's Consistency Model Part 3

In part 1 we built a logical model for how copy-on-write tables work in Apache Hudi, and posed a number of questions about consistency under different types of concurrency control, timestamp monotonicity and more. In part 2 we studied timestamp collisions, their probabilities, and how to avoid them (while remaining conformant to the Hudi spec). In part 3, we'll focus on the results of model checking the TLA+ specification, and answer those questions.

Understanding Apache Hudi's Consistency Model Part 2

In part 1 we built up an understanding of the mechanics of Hudi copy-on-write tables, with special attention to multi-writer scenarios, using a simplified logical model. In this part we'll look at:

  • Understanding why the Hudi spec instructs the use of monotonic timestamps, by looking at the impact of timestamp collisions.

  • The probability of collisions in multi-writer scenarios where writers use their local clocks as timestamp sources (a rough simulation sketch follows this list).

  • Various options for avoiding collisions.
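
To make the collision question concrete, here is a rough Monte Carlo sketch (my own illustration, not code from the post) that estimates how often at least two of N writers pick the same millisecond-resolution commit timestamp when each reads its local clock at a uniformly random point inside a short window; the writer count and window length are illustrative assumptions.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class TimestampCollisionSim {
    public static void main(String[] args) {
        int writers = 10;      // assumed number of concurrent writers
        int windowMs = 1000;   // assumed window in which their commits land
        int trials = 100_000;
        Random rnd = new Random(42);

        int collisions = 0;
        for (int t = 0; t < trials; t++) {
            Set<Integer> timestamps = new HashSet<>();
            boolean collided = false;
            for (int w = 0; w < writers; w++) {
                // Each writer stamps its commit with a millisecond read from its local clock.
                if (!timestamps.add(rnd.nextInt(windowMs))) {
                    collided = true;
                }
            }
            if (collided) collisions++;
        }
        System.out.printf("P(collision) ~ %.4f for %d writers in a %d ms window%n",
                (double) collisions / trials, writers, windowMs);
    }
}
```

With these illustrative numbers the estimate comes out at a few percent per window, which is why the post looks at collision-avoidance options rather than relying on raw local clocks.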

Understanding Apache Hudi's Consistency Model Part 1

Apache Hudi is one of the three leading table formats (Apache Iceberg and Delta Lake being the other two). Whereas Apache Iceberg's internals are relatively easy to understand, I found Apache Hudi more complex and harder to reason about. As a distributed systems engineer I wanted to understand it, and I was especially interested in its consistency model with regard to multiple concurrent writers. Ultimately, I wrote a TLA+ specification to help me nail down the design and understand its consistency model.

Scaling models and multi-tenant data systems - ASDS Chapter 6

What is scaling in large-scale multi-tenant data systems, and how does that compare to single-tenant data systems? How does per-tenant scaling relate to system-wide scaling? How do scale-to-zero and cold starts come into play? Answering these questions is chapter 6 of The Architecture of Serverless Data Systems.

Rather than looking at a new system, this chapter focuses on some patterns that emerge after analyzing the first five systems. So far, I’ve included storage-oriented systems (Amazon DynamoDB, Confluent Cloud’s Kora engine), OLTP databases (Neon, CockroachDB serverless), and an OLAP database (ClickHouse Cloud). The workloads across these systems diverge a lot, and so do their architectural decisions, but some common patterns still emerge.

Serverless ClickHouse Cloud - ASDS Chapter 5 (part 2)

In part 1 we looked at the open-source ClickHouse architecture; in part 2 we dive into the serverless architecture of ClickHouse Cloud.

The serverless ClickHouse architecture

ClickHouse Cloud is an entirely serverless offering of ClickHouse that uses a new MergeTree engine called the SharedMergeTree. Where a traditional on-premises ClickHouse cluster is composed of a set of monolithic servers that do both query processing and storage, ClickHouse Cloud separates storage and compute, with the servers being stateless compute instances and storage served solely by cloud object storage. The SharedMergeTree engine is at the core of this serverless architecture.

Serverless ClickHouse Cloud - ASDS Chapter 5 (part 1)

See The Architecture of Serverless Data Systems introduction chapter to find the other serverless data systems. This chapter covers the first system of group 3 - the analytics database group.

ClickHouse is an open-source, column-oriented, distributed (real-time) OLAP database management system. ClickHouse Cloud offers serverless ClickHouse clusters, billed according to the compute and storage resources consumed.

Before we jump into the architecture of ClickHouse Cloud, it’s worth understanding the architecture of a standard ClickHouse cluster first. The storage architecture of ClickHouse plays a critical role in how the serverless version was designed. Part 1 of this chapter will focus on the standard, fully open-source ClickHouse, and part 2 will focus on the serverless architecture of ClickHouse Cloud.

Serverless CockroachDB - ASDS Chapter 4 (part 3)

In part 3, the focus is on heat management (the mitigation of hot spots in the storage layer) and autoscaling in the compute layer. In part 2 we looked at the Admission Control sub-system of CRDB and how it helps mitigate node overload and noisy neighbor problems. CRDB uses a combination of sharding, shard movement, and lease distribution to avoid hot spots in its shared storage layer.

Serverless CockroachDB - ASDS Chapter 4 (part 1)

CockroachDB is a distributed SQL database that aims to be Postgres-compatible. Over the years, the Postgres wire protocol has become a standard of sorts, with many database products implementing it (much like the Apache Kafka protocol has become a de facto standard in the streaming space).

While it may be Postgres-compatible, there is almost nothing about the serverless CockroachDB architecture that is shared with Neon (serverless Postgres covered in the previous chapter). What they do share, like all the serverless multi-tenant systems in this series, is the separation of storage and compute; the rest is completely different.
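
As a small illustration of what wire-protocol compatibility buys you, here is a minimal sketch (not from the chapter) that queries a CockroachDB node using the stock Postgres JDBC driver; the host, port, database and credentials are placeholder assumptions for a local insecure single-node cluster, and the org.postgresql driver is assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CockroachViaPostgresDriver {
    public static void main(String[] args) throws Exception {
        // CockroachDB speaks the Postgres wire protocol, so the ordinary Postgres
        // JDBC driver connects to it unchanged. The connection details below are
        // placeholders for a local, insecure, single-node cluster.
        String url = "jdbc:postgresql://localhost:26257/defaultdb?sslmode=disable";
        try (Connection conn = DriverManager.getConnection(url, "root", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT version()")) {
            while (rs.next()) {
                System.out.println(rs.getString(1)); // prints a CockroachDB version string
            }
        }
    }
}
```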

Neon - Serverless PostgreSQL - ASDS Chapter 3

Neon is a serverless Postgres service based on an architecture similar to Amazon Aurora. It separates the Postgres monolith into disaggregated storage and compute. The motivation behind this architecture is four-fold:

  • Aim to deliver the best price-performance Postgres service in the world.

  • Use modern replication techniques to provide high availability and high durability to Postgres.

  • Simplify the life of developers by bringing the serverless consumption model to Postgres.

  • Do all this while keeping the majority of Postgres unchanged. Rather than building a new Postgres-compatible implementation, simply leverage the pluggable storage layer to provide all the above benefits while keeping Postgres Postgres.

Kora - Serverless Kafka - ASDS Chapter 2

This is the second instalment of the Architecture of Serverless Data Systems. In the first post, I covered DynamoDB, and I will refer back to it where comparison and contrast are interesting.

Kora is the multi-tenant serverless Kafka engine inside Confluent Cloud. It was designed to offer virtual Kafka clusters on top of shared physical clusters, based on a heavily modified version of Apache Kafka. Today, as little as 20% of the code is shared with the open-source version, because the demands of large-scale multi-tenant systems diverge from the needs of single-tenant clusters.

The goal of Kora was to avoid stamping out cookie-cutter single-tenant Kafka clusters, which would miss out on the economic and reliability benefits of large-scale multi-tenancy. This architecture is evolving fast, and a year or two from now, this description will likely be stale as we continue to disaggregate the architecture from the original Kafka monolith.

Amazon DynamoDB - ASDS Chapter 1

DynamoDB is a serverless, distributed, multi-tenant NoSQL KV store that was designed and implemented from day one as a disaggregated cloud-native data system.

The goals were to build a multi-tenant system that had the following properties:

  • Consistent performance with low single-digit millisecond latency.

  • High resource utilization through multi-tenancy, in order to reduce costs, with the savings passed on to customers.

  • Unbounded table sizes, where size does not affect performance.

  • Support for ACID transactions across multiple operations and tables (a minimal sketch follows this list).
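
To ground the last bullet, here is a minimal sketch of a multi-table, all-or-nothing write using the TransactWriteItems API from the AWS SDK for Java v2; the table names, keys and attributes are hypothetical, and AWS credentials and region are assumed to be configured in the environment.

```java
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.*;

import java.util.Map;

public class DynamoTxnSketch {
    public static void main(String[] args) {
        DynamoDbClient ddb = DynamoDbClient.create();

        // Two writes to two hypothetical tables, committed atomically: either both
        // succeed or neither does (e.g. if the stock condition fails).
        TransactWriteItemsRequest request = TransactWriteItemsRequest.builder()
            .transactItems(
                TransactWriteItem.builder().put(Put.builder()
                    .tableName("orders") // hypothetical table with partition key order_id
                    .item(Map.of(
                        "order_id", AttributeValue.builder().s("o-123").build(),
                        "status", AttributeValue.builder().s("PLACED").build()))
                    .build()).build(),
                TransactWriteItem.builder().update(Update.builder()
                    .tableName("inventory") // hypothetical table with partition key sku
                    .key(Map.of("sku", AttributeValue.builder().s("sku-9").build()))
                    .updateExpression("SET stock = stock - :one")
                    .conditionExpression("stock >= :one")
                    .expressionAttributeValues(Map.of(
                        ":one", AttributeValue.builder().n("1").build()))
                    .build()).build())
            .build();

        ddb.transactWriteItems(request); // all-or-nothing across both tables
    }
}
```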

Kafka vs Redpanda Performance - Part 6 - Draining backlogs

In the last post we saw how only Apache Kafka was able to fully utilize the 2 GB/s throughput limit of the i3en.6xlarge. In this post we’re going to test the ability of Kafka and Redpanda to drain a backlog while under continued producer load.

This test starts the producers and consumers at the target throughput, then pauses the consumers until consumer lag builds up to a desired amount (also known as a backlog). The consumers are then resumed and we see how long it takes for them to catch up and return to sub-second end-to-end latency.
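
For a sense of the mechanics, here is a minimal sketch (not the actual benchmark code) of the consumer side of such a test using the Kafka consumer's pause/resume API; the broker address, topic, pause duration and drain target are assumptions.

```java
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class BacklogDrainSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("group.id", "backlog-drain-test");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("load-test-topic")); // hypothetical topic under producer load
            while (consumer.assignment().isEmpty()) {
                consumer.poll(Duration.ofMillis(200));      // wait for partition assignment
            }

            // Phase 1: pause fetching while producers keep writing, so lag (a backlog) builds.
            // Keep calling poll() so the consumer stays in the group; paused partitions
            // simply return no records.
            consumer.pause(consumer.assignment());
            long pauseUntil = System.currentTimeMillis() + Duration.ofMinutes(5).toMillis();
            while (System.currentTimeMillis() < pauseUntil) {
                consumer.poll(Duration.ofSeconds(1));
            }

            // Phase 2: resume and time the catch-up. The real benchmark tracks consumer lag
            // and end-to-end latency; a fixed record count stands in for that here.
            consumer.resume(consumer.assignment());
            long start = System.currentTimeMillis();
            long drained = 0;
            while (drained < 10_000_000L) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(500));
                drained += records.count();
            }
            System.out.printf("Drained backlog in %d ms%n", System.currentTimeMillis() - start);
        }
    }
}
```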

Kafka vs Redpanda Performance - Part 5 - Reaching the limits of the NVMe drive

In the previous post we saw how using record keys impacted both Apache Kafka and Redpanda. However, Redpanda struggled more than Kafka with the more numerous, smaller batches that result from key-based partition distribution.

Next I decided to see if I could get Apache Kafka and Redpanda to reach the absolute limit of the NVMe drive throughput on the i3en.6xlarge - 2 GB/s. To do this I deployed both systems without TLS and modified the Redpanda 1 GB/s benchmark to attempt 2 GB/s, use 10 producers/consumers and use acks=1 instead of acks=all.

Kafka vs Redpanda Performance - Part 4 - Impact of record keys

In the last post we saw that Redpanda latency can literally jump once data retention limits kick in. In this post we’re going to look at the impact of using record keys.

When we don’t use record keys, the producer's default partitioner accumulates messages into batches in a first-come-first-served manner and randomly chooses a partition to send each batch to. The new Uniform Sticky Partitioner can also probabilistically favor less overloaded partitions. This is all good for performance: even with a short linger.ms, the producer can create large batches. By contrast, sending more numerous, smaller batches can negatively impact performance.
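
A minimal producer sketch of the two cases, with placeholder broker and topic names and illustrative linger.ms/batch.size values: without a key the partitioner batches freely and picks the partition itself, while a key forces hash-based placement across all partitions, which tends to produce more, smaller batches.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class KeyedVsUnkeyedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("linger.ms", "5");        // a short linger still yields big batches when unkeyed
        props.put("batch.size", "131072");  // 128 KB upper bound per per-partition batch

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // No key: the default partitioner fills a batch for one partition at a time
            // and picks the partition itself, so batches stay large and efficient.
            producer.send(new ProducerRecord<>("events", "payload-without-key"));

            // With a key: the partition is derived from a hash of the key, so records
            // spread across all partitions and per-partition batches get smaller.
            producer.send(new ProducerRecord<>("events", "user-42", "payload-with-key"));
        }
    }
}
```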

Kafka vs Redpanda Performance - Part 3 - Hitting the retention limit

In the last post we saw that Redpanda performance degraded over time and that we needed to include a certain amount of drive over-provisioning to cope with the random IO nature of Redpanda. In this post we’re going to look at a phenomenon I see in every high-throughput test I run that reaches the retention limit.

Kafka vs Redpanda Performance - Part 1 - 4 vs 50 producers

The Redpanda benchmark and TCO analysis claims that Redpanda needs only 3 brokers on the i3en.6xlarge to reach 1 GB/s, while Apache Kafka needs 9 brokers and even then shows inferior end-to-end latency to Redpanda. I decided to see if I could reproduce those claims and whether the Redpanda performance was generalizable. That is, does the 1 GB/s with 4 producers and consumers translate to other workloads? In other words, is it a useful benchmark you could base decisions on?

I ran the Redpanda 1 GB/s benchmark against Redpanda and Apache Kafka on identical hardware at 6 different throughputs: 500, 600, 700, 800, 900 and 1000 MB/s. I also ran it with the original 4 producers and consumers, then with 50 producers and consumers. The result was significant performance degradation for Redpanda with 50 producers. The other noteworthy result was that Redpanda was unable to reach 1000 MB/s with TLS, which conflicts with the Redpanda benchmark claims.

Paper: VR Revisited - Checkpoint-Based Replica Recovery (part 6)

In part 5 we looked at basic and log-suffix recovery and came to the conclusion that perhaps a persistent State Machine Replication (SMR) protocol should not use asynchronous log flushing with a recovery protocol to handle data loss - at least not one that gets stuck after a cluster crash. Checkpoint-based recovery doesn’t change that position, but it does introduce a very important component of any SMR protocol: checkpointing application state.
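
As a rough illustration of the general idea (a simplified sketch, not the VR Revisited protocol itself), the following shows application state being checkpointed alongside a log position, with recovery loading the latest checkpoint and replaying only the log suffix; the counter state and in-memory "durable" storage are simplifying assumptions.

```java
import java.util.ArrayList;
import java.util.List;

public class CheckpointRecoverySketch {
    // A durable checkpoint pairs a snapshot of the application state with the
    // index of the last log entry reflected in that snapshot.
    record Checkpoint(long state, int appliedUpTo) {}

    public static void main(String[] args) {
        List<Long> log = new ArrayList<>();   // stands in for the replicated operation log
        long state = 0;                       // the "application state" is just a counter here
        Checkpoint latest = new Checkpoint(0, -1);

        // Normal operation: apply each entry, and checkpoint every 100 entries.
        for (int i = 0; i < 1000; i++) {
            long op = i + 1;
            log.add(op);
            state += op;
            if ((i + 1) % 100 == 0) {
                latest = new Checkpoint(state, i); // durably written in a real system
            }
        }

        // Recovery after losing the in-memory state: load the latest checkpoint and
        // replay only the log suffix after it, rather than the whole log.
        long recovered = latest.state();
        for (int i = latest.appliedUpTo() + 1; i < log.size(); i++) {
            recovered += log.get(i);
        }
        System.out.println("Recovered state matches: " + (recovered == state));
    }
}
```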