Today I want to talk about stream analytics, batch analytics and Apache Iceberg. Stream and batch analytics work differently but both can be built on top of Iceberg, but due to their differences there can be a tug-of-war over the Iceberg table itself. In this post I am going to use two real-world systems, Apache Fluss (streaming tabular storage) and Confluent Tableflow (Kafka-to-Iceberg), as a case study for these tensions between stream and batch analytics.
Apache Fluss uses zero-copy tiering to Iceberg. Recent data is stored on Fluss servers (using Kafka replication protocol for high availability and durability) but is then moved to Iceberg for long-term storage. This results in one copy of the data.
Confluent Kora and Tableflow uses internal topic tiering and Iceberg materialization, copying Kafka topic data to Iceberg, such that we have two copies (one in Kora, one in Iceberg).
This post will explain why both have chosen different approaches and why both are totally sane, defensible decisions.





