Table formats

Understanding Delta Lake's consistency model

A few days ago I released my analysis of Apache Hudi’s consistency model, built with the help of a TLA+ specification. This post does the same for Delta Lake. As in the Hudi posts, I will not comment on performance, efficiency, or how use cases such as batch and streaming are supported. This post focuses solely on the consistency model, using a logical model of the core Delta Lake protocol.

Understanding Apache Hudi's Consistency Model Part 3

In part 1 we built a logical model of how copy-on-write tables work in Apache Hudi, and posed a number of consistency questions about types of concurrency control, timestamp monotonicity and more. In part 2 we studied timestamp collisions, their probabilities and how to avoid them (and so conform to the Hudi spec). In part 3, we’ll focus on the results of model checking the TLA+ specification and answer those questions.

Understanding Apache Hudi's Consistency Model Part 2

In part 1 we built up an understanding of the mechanics of Hudi copy-on-write tables, with particular attention to multi-writer scenarios, using a simplified logical model. In this part we’ll look at:

  • Why the Hudi spec instructs the use of monotonic timestamps, by looking at the impact of timestamp collisions.

  • The probability of collisions in multi-writer scenarios where writers use their local clocks as timestamp sources.

  • Various options for avoiding collisions.

Understanding Apache Hudi's Consistency Model Part 1

Apache Hudi is one of the three leading table formats (Apache Iceberg and Delta Lake being the other two). Whereas Apache Iceberg's internals are relatively easy to understand, I found Apache Hudi more complex and harder to reason about. As a distributed systems engineer, I wanted to understand it, and I was especially interested in its consistency model with regard to multiple concurrent writers. Ultimately, I wrote a TLA+ specification to help me nail down the design and understand its consistency model.