This is the first in a series of short comparisons of table format internals. While I have written in some detail about each of these formats, I think it’s interesting to look at what they have in common and what sets them apart from each other.
Question: How do the table formats represent the canonical list of data and delete files?
All of the table formats store references to a canonical set of data and delete files within a set of metadata files. Each table format takes a slightly different approach, but I’ll classify them into two categories:
The log of deltas approach (Hudi and Delta Lake)
The log of snapshots approach (Iceberg and Paimon)
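To make the distinction concrete, here is a minimal Python sketch built on heavily simplified assumptions. The names `DeltaCommit`, `Snapshot`, `replay_deltas`, `commit_snapshot` and `current_files` are hypothetical and do not match any format's real metadata schema, and files are modeled as plain path strings. In the log-of-deltas model, each log entry records only the files a commit added or removed, so a reader replays the log to reconstruct the live set. In the log-of-snapshots model, each entry describes the complete file set at that point, so a reader only needs the latest entry.

```python
from dataclasses import dataclass

# Hypothetical, simplified model: real table formats also track statistics,
# schemas, partitions, etc., and avoid rewriting full file lists on each commit.
# Files (data or delete files) are represented here as plain path strings.


@dataclass(frozen=True)
class DeltaCommit:
    """One entry in a log of deltas: only the files this commit changed."""
    added: frozenset[str] = frozenset()
    removed: frozenset[str] = frozenset()


def replay_deltas(log: list[DeltaCommit]) -> set[str]:
    """Rebuild the canonical file set by applying every delta in commit order."""
    live: set[str] = set()
    for commit in log:
        live -= commit.removed
        live |= commit.added
    return live


@dataclass(frozen=True)
class Snapshot:
    """One entry in a log of snapshots: the complete file set at that point."""
    files: frozenset[str]


def commit_snapshot(
    log: list[Snapshot],
    added: frozenset[str] = frozenset(),
    removed: frozenset[str] = frozenset(),
) -> None:
    """Append a new snapshot derived from the previous one."""
    previous = log[-1].files if log else frozenset()
    log.append(Snapshot((previous - removed) | added))


def current_files(log: list[Snapshot]) -> set[str]:
    """The canonical file set is simply the latest snapshot; no replay needed."""
    return set(log[-1].files) if log else set()


# Same two logical commits expressed in both models.
deltas = [
    DeltaCommit(added=frozenset({"a.parquet", "b.parquet"})),
    DeltaCommit(added=frozenset({"c.parquet"}), removed=frozenset({"a.parquet"})),
]
print(sorted(replay_deltas(deltas)))  # ['b.parquet', 'c.parquet']

snapshots: list[Snapshot] = []
commit_snapshot(snapshots, added=frozenset({"a.parquet", "b.parquet"}))
commit_snapshot(snapshots, added=frozenset({"c.parquet"}), removed=frozenset({"a.parquet"}))
print(sorted(current_files(snapshots)))  # ['b.parquet', 'c.parquet']
```

Both models arrive at the same file set; the difference is where the work happens. In practice neither approach is this naive: Delta Lake, for example, periodically writes checkpoints so readers need not replay the entire log, and Iceberg avoids copying the full file list into every snapshot by letting snapshots share manifest files.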
