I’ve created this page to make it easier to share links to my writing on table format internals. Currently, it covers Apache Iceberg, Delta Lake, Apache Hudi, and Apache Paimon.
Basic mechanics and consistency model
Part 1 of each of the following series explains the internal mechanics of the table format, and the remaining parts explain its consistency model (including under multi-writer scenarios). I have written a formal model of each in either TLA+ or Fizzbee.
Apache Iceberg
Part 1 (internals)
Fizzbee specification (formal model of COW and MOR tables)
Delta Lake
One post for internals and consistency model.
TLA+ specification (formal model of COW tables)
Apache Hudi
Part 1 (internals)
Part 2 (on timestamp collisions)
Part 3 (consistency model)
TLA+ specification (formal model of COW tables)
Apache Paimon
Part 1 (internals)
Fizzbee specification (formal model)
Some comparisons of internals regarding specific capabilities
Change query support deep dives
These analyses look at the internals of each table format to understand its current support for change queries (including CDC queries).
PS: I also maintain the Humans of the Data Sphere publication, a biweekly look at what people are saying in the world of databases, AI, streaming, distributed systems and the data engineering/analytics space.