Analyses — Jack Vanlightly

Verifying Kafka transactions - Diary entry 3 - Getting confidence in the TLA+ spec

One thing I forgot to write about yesterday was how I get confidence that the TLA+ specification is doing what I expect. When model checking a spec there are usually millions or billions of states to explore, so you can’t easily confirm that states which should be reachable actually do get reached. It’s common to have a specification bug that leaves valid areas of the state space unreachable without you knowing it. This is why TLA+ practitioners learn to be a skeptical bunch and treat early success with great cynicism.
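To make that concern concrete, here is a toy Python sketch of an explicit-state exploration. It is entirely unrelated to the actual Kafka spec and is not necessarily the technique used in the post: an over-strict guard on one action silently makes a valid state unreachable, yet the invariant check still passes, so a naive run reports success.

```python
# Toy illustration only (hypothetical names, not the Kafka transactions spec):
# a minimal breadth-first state exploration in the spirit of a model checker.
from collections import deque

INIT = ("NoId", 0)  # (producer status, epoch)

def next_states(state):
    status, epoch = state
    successors = []
    # BUG: the guard should test for "NoId"; the typo disables the action
    # forever, so the "HasId" state is never reached.
    if status == "NoIdd":
        successors.append(("HasId", epoch + 1))
    return successors

def invariant(state):
    _, epoch = state
    return epoch <= 1  # trivially true because nothing ever happens

def explore(init):
    seen, frontier = {init}, deque([init])
    while frontier:
        s = frontier.popleft()
        assert invariant(s), f"invariant violated in {s}"
        for t in next_states(s):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return seen

reachable = explore(INIT)
print("distinct states:", len(reachable))                            # 1 - suspiciously few
print("HasId reachable:", any(s == "HasId" for s, _ in reachable))   # False
```

Common sanity checks in the TLA+ community include watching the distinct-state count reported by TLC and temporarily asserting that an interesting state can never occur, so the model checker is forced to produce a counterexample trace proving it is in fact reachable.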

This is what I did to get confidence in this initial specification.

Verifying Kafka transactions - Diary entry 2 - Writing an initial TLA+ spec

Strap in and fire up your nerd engines—this one's gonna be intense!

In my previous diary entry, I explained that I would begin the formal modelling of Kafka transactions by implementing a tiny initial piece of the protocol - how a producer obtains a producer id. In this post I will discuss some aspects of the TLA+ specification I have written for this initial model.

Change query support in Apache Paimon (0.8)

This is the Apache Paimon deep dive associated with the table format comparisons - Change queries and CDC blog post, which looks at how each table format supports change queries, including full CDC. This is not a how-to guide but an examination of Apache Paimon capabilities through the lens of its internal design. Note that this post is not about CDC log ingestion but about Paimon’s support for querying the changes of the table itself.

Understanding Apache Iceberg's Consistency Model Part 1

Apache Iceberg is the last table format I am covering in this series and is perhaps the most widely adopted and well-known of the table formats. I wasn’t going to write this analysis originally as I felt the book Apache Iceberg: The Definitive Guide was detailed enough. Now, having gone through the other formats, I see that the book is too high-level for what I have been covering in this series—so here we go—a deep dive into Apache Iceberg internals to understand its basic mechanics and consistency model.

Understanding Apache Paimon's Consistency Model Part 3

In this final part of the Apache Paimon series, I’ll go over the formal verification with Fizzbee.

Normally I use TLA+ for formal verification but this time I decided to try out Fizzbee, a language and model checker that maps closely to TLA+ semantics but uses a subset of Python called Starlark. Fizzbee is still relatively immature but it shows a lot of potential. I’ll be writing about my experiences with Fizzbee in a future blog post.

Understanding Apache Paimon's Consistency Model Part 1

Apache Paimon is an open-source table format that has come after the more established Apache Iceberg, Delta Lake and Apache Hudi projects. It was born in the Apache Flink project where it was known as Flink Table Store, but has since spun out as a top-level Apache project. When I first started digging into Paimon I remarked that if Iceberg, Delta and Hudi had a baby, it might be Paimon. But Paimon has a number of its own innovations that set it apart from the Big Three table formats. 

Understanding Delta Lake's consistency model

A few days ago I released my analysis of Apache Hudi’s consistency model, with the help of a TLA+ specification. This post will do the same for Delta Lake. Just like the Hudi post, I will not comment on subjects such as performance, efficiency or how use cases such as batch and streaming are supported. This post focuses solely on the consistency model using a logical model of the core Delta Lake protocol.

Understanding Apache Hudi's Consistency Model Part 3

In part 1 we built a logical model for how copy-on-write tables work in Apache Hudi, and posed a number of questions regarding consistency under different types of concurrency control, timestamp monotonicity and more. In part 2 we studied timestamp collisions, their probabilities and how to avoid them (while remaining conformant to the Hudi spec). In part 3, we’ll focus on the results of model checking the TLA+ specification, and answer those questions.

Understanding Apache Hudi's Consistency Model Part 2

In part 1 we built up an understanding of the mechanics of Hudi copy-on-write tables, with special regard to multi-writer scenarios, using a simplified logical model. In this part we’ll look at:

  • Understanding why the Hudi spec instructs the use of monotonic timestamps, by looking at the impact of timestamp collisions.

  • The probability of collisions in multi-writer scenarios where writers use their local clocks as timestamp sources (a rough back-of-the-envelope sketch follows this list).

  • Various options for avoiding collisions.
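As a purely illustrative sketch of the second bullet (the write rates and millisecond granularity below are my own assumptions, not numbers from the post), the chance of two writers picking the same local-clock timestamp can be estimated with a birthday-problem style calculation:

```python
import math

def collision_probability(writes: int, buckets: int) -> float:
    """Birthday-problem approximation: probability that at least two of
    `writes` uniformly random timestamps collide, given `buckets` distinct
    possible timestamp values."""
    return 1.0 - math.exp(-writes * (writes - 1) / (2.0 * buckets))

# Hypothetical scenario: two writers committing a combined 200 times per hour,
# drawing millisecond-granularity timestamps from their local clocks.
writes_per_hour = 200
millis_per_hour = 3_600_000
p = collision_probability(writes_per_hour, millis_per_hour)
print(f"approximate per-hour collision probability: {p:.2%}")  # ~0.55%
```

This treats commit timestamps as uniformly distributed over the window, which real workloads are not, so it is only a ballpark figure; the post works through the probabilities in more detail.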

Understanding Apache Hudi's Consistency Model Part 1

Apache Hudi is one of the leading three table formats (Apache Iceberg and Delta Lake being the other two). Whereas Apache Iceberg internals are relatively easy to understand, I found Apache Hudi more complex and harder to reason about. As a distributed systems engineer, I wanted to understand it, and I was especially interested in its consistency model with regard to multiple concurrent writers. Ultimately, I wrote a TLA+ specification to help me nail down the design and understand its consistency model.

Scaling models and multi-tenant data systems - ASDS Chapter 6

What is scaling in large-scale multi-tenant data systems, and how does that compare to single-tenant data systems? How does per-tenant scaling relate to system-wide scaling? How do scale-to-zero and cold starts come into play? Answering these questions is chapter 6 of The Architecture of Serverless Data Systems.

Rather than looking at a new system, this chapter focuses on some patterns that emerge after analyzing the first five systems. So far, I’ve included storage-oriented systems (Amazon DynamoDB, Confluent Cloud’s Kora engine), OLTP databases (Neon, CockroachDB serverless), and an OLAP database (ClickHouse Cloud). The workloads across these systems diverge a lot, so their architectural decisions also diverge, but some patterns emerge.
