Technology changes can be sudden (like generative AI) or slow-moving juggernauts that kick off a chain reaction that takes years to play out. I would place object storage, and the disaggregated architectures it enables, in the latter category. The open table formats, such as Apache Iceberg, Delta Lake, and Apache Hudi, form part of this chain reaction, but things aren’t stopping there.
I’ve written extensively about the open table formats (OTFs). In my original Tableflow post, I wrote that shared tables were one of the major trends, enabled by the OTFs. But why is it that OTFs make for a good sharing primitive? So far, I have focused mainly on the separation of compute and storage: OTFs allow for a headless architecture where different platforms can bring their own compute to the same data. This is all true.
But we can also view OTFs as enabling a kind of virtualization. In this post, I will start by explaining my take on OTFs and virtualization. Finally, I’ll bring it back to Confluent, the Confluent/Databricks partnership, and the future of composable data platforms.