The ultimate guide to table format internals - all my writing so far

I’ve created this page to make it easier to share links to my writing on table format internals. Currently, it covers Apache Iceberg, Delta Lake, Apache Hudi, and Apache Paimon.

Basic mechanics and consistency model

Part 1 of each of the following explains the internal mechanics of the table format, and the remaining parts explain the consistency model (including under multi-writer scenarios). I have written a formal model of each in either TLA+ or Fizzbee.

Some comparisons of internals regarding specific capabilities

Change query support deep dives

These analyses look at the internals of each table format to understand its current support for change queries (including CDC queries).

PS: I also maintain the Humans of the Data Sphere publication, a biweekly look at what people are saying in the world of databases, AI, streaming, distributed systems and the data engineering/analytics space.