February 21, 2025

Log Replication Disaggregation Survey - Kafka Replication Protocol

In this post, we’re going to look at the Kafka Replication Protocol and how it separates control plane and data plane responsibilities. It’s worth noting there are other systems that separate concerns in a similar way, with RabbitMQ Streams being one that I am aware of.

Jack Vanlightly

February 19, 2025

Distributed Systems

Log Replication Disaggregation Survey - Neon and MultiPaxos

Over the next series of posts, we'll explore how various real-world systems and some academic papers have implemented log replication with some form of disaggregation. In this first post we’ll look at MultiPaxos. There are no doubt many real-world implementations of MultiPaxos out there, but I want to focus on Neon’s architecture as it is illustrative of the benefits of thinking in terms of logical abstractions and responsibilities when designing complex systems.

Jack Vanlightly

February 17, 2025

Strategy and commentary

Towards composable data platforms

Technology changes can be sudden (like generative AI) or slower juggernauts that kick off a chain reaction that takes years to play out. I would place object storage and its enablement of disaggregated architectures in that latter category. The open table formats, such as Apache Iceberg, Delta Lake, and Apache Hudi, form part of this chain reaction, but things aren’t stopping there.

I’ve written extensively about the open table formats (OTFs). In my original Tableflow post, I wrote that shared tables were one of the major trends, enabled by the OTFs. But why is it that OTFs make for a good sharing primitive? I have been focused mainly on the separation of compute and storage: OTFs allow for a headless architecture where different platforms can bring their own compute to the same data. This is all true.

But we can also view OTFs as enabling a kind of virtualization. In this post, I will start by explaining my take on OTFs and virtualization. Finally, I’ll bring it back to Confluent, the Confluent/Databricks partnership, and the future of composable data platforms.

Jack Vanlightly

February 10, 2025

Distributed Systems

How to disaggregate a log replication protocol

This post continues my series looking at log replication protocols, within the context of state-machine replication (SMR) or just when the log itself is the product (such as Kafka). So far I’ve been looking at Virtual Consensus, but now I’m going to widen the view to look at how log replication protocols can be disaggregated in general (there are many ways). In the next post, I’ll do a survey of log replication systems in terms of the types of disaggregation described in this post.

Jack Vanlightly

February 6, 2025

Distributed Systems

Steady on! Separating Failure-Free Ordering from Fault-Tolerant Consensus

"True stability results when presumed order and presumed disorder are balanced. A truly stable system expects the unexpected, is prepared to be disrupted, waits to be transformed." — Tom Robbins

This post continues my series looking at log replication protocols, within the context of state-machine replication (SMR) or just when the log itself is the product (such as Kafka). I’m going to cover some of the same ground from the Introduction to Virtual Consensus in Delos post, but focus on one aspect specifically and see how it generalizes.

Jack Vanlightly

February 5, 2025

Distributed Systems

An Introduction to Virtual Consensus in Delos

This is the first of a number of posts looking at log replication protocols, mainly in the context of state machine replication (SMR). This first post will look at a log replication protocol design called Virtual Consensus from the paper: Virtual Consensus in Delos.

In 2020, a team of researchers and engineers from Facebook, led by Mahesh Balakrishnan, published their work (linked above) on a log replication design called Virtual Consensus that they had built as the log replication layer of their database, Delos.

As an Apache BookKeeper committer (non-active), I immediately saw the similarities to BookKeeper. Yet, the Virtual Consensus paper went further than BookKeeper, describing clean abstractions with clear separations of concerns. Just as the Raft paper has helped a lot of engineers implement SMR over the last 10 years, I believe the Virtual Consensus paper could do the same for the next 10. I will explain my reasons for believing this in this post.

Jack Vanlightly

February 3, 2025

Strategy and commentary

Why Snowflake wants streaming

Rumors are swirling that Snowflake intends to acquire Redpanda and many are questioning why and what impact this might have on Confluent. First, let’s remember that these are just rumors and there’s nothing official. But given that people are speculating, here are my thoughts on how to interpret such an acquisition, whether it ends up happening or not.

There are a number of market trends in play right now, such as the rise of Iceberg and open data, as well as the war with Databricks and Snowflake’s refocus on AI. While it may not be evident at first, these are all driving Snowflake towards streaming.