Log Replication Disaggregation Survey - Neon and MultiPaxos

Over the next few posts, we'll explore how various real-world systems, as well as some academic papers, have implemented log replication with some form of disaggregation. In this first post we’ll look at MultiPaxos. There are no doubt many real-world implementations of MultiPaxos out there, but I want to focus on Neon’s architecture, as it is illustrative of the benefits of thinking in terms of logical abstractions and responsibilities when designing complex systems.

Preamble

There are so many systems out there, too many for me to list without this becoming a huge research project. So I’m going to stick to systems I have directly been involved with, or have a decent level of knowledge about already. Feel free to comment on social media about interesting systems that I haven’t included.

I have classified a few ways of breaking apart a monolithic replication protocol such as Raft. The classifications are:

  • (A) Disaggregated roles/participants. The participants form clearly delineated roles that can be run as separate processes. The protocol itself may still be converged (control plane and data plane intertwined) but exercised by disaggregated components.

  • (B) Disaggregated protocol that separates the control plane from the data plane. The protocol itself is broken up into control plane and data plane duties, with clean and limited interaction between them.

  • (C) Segmented logs. Logs are formed from a chain of logical log segments.

  • (D) Pointer-based logs. Separates data from ordering.

  • (E) Separating ordering from IO. Avoids the need for a leader by allowing all nodes to write, but coordinates via a sequencer component.

  • (F) Leaderless proxies. Abstracts the specifics of the replicated log protocol from clients, via another disaggregated abstraction layer of leaderless proxies.

MultiPaxos in the wild - Neon’s Safekeepers

Disaggregation categories: (A) Disaggregated roles/participants.

Neon is a superb example of disaggregating the replication protocol roles/participants. I wrote about Neon in November 2023. 

Neon is a serverless Postgres service that runs Postgres instances with a modified storage engine. Rather than durably storing data on local disks, all write transactions go through a remote, distributed write-ahead log (WAL) service based on MultiPaxos.

Neon separates durable storage from Postgres instances. Each Postgres instance stores soft state (not required to be durable), and durable storage consists of:

  • A remote distributed write-ahead log (WAL) for high-performance durable writes. This WAL acts as a replicated short-term durable storage component.

  • An object store for long-term durable storage, with a set of Pageservers acting as a serving layer.

Fig 1. The main components of the Neon architecture.

The Safekeeper protocol employs a MultiPaxos-based approach to ensure data consistency: only one Postgres primary can perform writes at a time, data is redundantly stored across Safekeepers, primary failovers do not lose data, and so on.

The following is an excerpt from that post (it perfectly explains why a disaggregated Paxos fits their needs).

Neon post excerpt start —

Rather than adopt Raft, Neon has chosen a Paxos implementation for its WAL service. Paxos defines the roles of Proposer, Acceptor, and Learner. Each role is responsible for a different part of the protocol and there are no rules regarding where the different roles run. 

  • Proposers. A proposer simply proposes a value to be written. In Multi-Paxos, one proposer at a time is elected as the leader who proposes a sequence of values. This leader is also known as a Distinguished Proposer.

  • Acceptors. Acceptors store the proposed values, and values are committed once accepted by a majority.

  • Learners. A learner learns of committed values from the acceptors.

With Multi-Paxos, one leadership term consists of a Prepare phase where a Proposer is elected as the Distinguished Proposer by the Acceptors. Then, the second (steady-state) phase is the Accept phase, where the leader proposes a sequence of values to the Acceptors, who must store the proposed values. Learners learn of the committed values from the acceptors.
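
To make the two phases concrete, here is a minimal sketch of an acceptor in Go. It is an illustration rather than Neon's Safekeeper code: the type and method names and the flat integer term numbering are my own assumptions, and a real Prepare response would also carry back any previously accepted values so the new leader can preserve them, which I omit for brevity.

```go
package main

import "fmt"

// Entry is one slot in the replicated log.
type Entry struct {
	Term  int    // term (ballot) of the proposer that wrote it
	Value string // opaque payload, e.g. a WAL record
}

// Acceptor stores proposed values and remembers the highest term it has
// promised, so that stale proposers can be rejected.
type Acceptor struct {
	promisedTerm int
	log          []Entry
}

// Prepare is phase 1: a proposer asks to become leader for `term`. The
// acceptor promises to ignore any proposer with a lower term.
func (a *Acceptor) Prepare(term int) bool {
	if term <= a.promisedTerm {
		return false // already promised to an equal or newer proposer
	}
	a.promisedTerm = term
	return true
}

// Accept is phase 2 (steady state): the elected proposer streams values,
// and only the proposer holding the highest promised term makes progress.
func (a *Acceptor) Accept(term int, value string) bool {
	if term < a.promisedTerm {
		return false // stale leader: reject
	}
	a.promisedTerm = term
	a.log = append(a.log, Entry{Term: term, Value: value})
	return true
}

func main() {
	acc := &Acceptor{}
	fmt.Println(acc.Prepare(1))          // true: proposer elected at term 1
	fmt.Println(acc.Accept(1, "wal-01")) // true: steady-state append
	fmt.Println(acc.Prepare(2))          // true: a new proposer takes over
	fmt.Println(acc.Accept(1, "wal-02")) // false: term 1 is now fenced out
}
```

The check in Accept is the crux: once an acceptor has promised a higher term, a proposer from an older term can no longer make progress against it.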

Implementations can choose to have these processes running on different machines in a disaggregated way or have a single process act as all three roles. The latter is precisely what Raft does. The Raft leader is the Distinguished Proposer; the leader and followers are all Acceptors, and the state machine on each member that sits atop this replication layer acts as a Learner.
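
That deployment freedom is easier to see if you squint at the roles as interfaces. The sketch below is purely illustrative (none of these types come from the Raft or Neon codebases): the protocol constrains what each role must do, not where it runs.

```go
package main

import "fmt"

// The three Paxos roles as Go interfaces. These names are my own
// illustration, not an API from any real Paxos, Raft, or Neon codebase.
type Proposer interface{ Propose(value string) }
type Acceptor interface{ Store(value string) bool }
type Learner interface{ Learn(value string) }

// Raft-style convergence: one process implements all three roles.
type convergedNode struct{ log []string }

func (n *convergedNode) Propose(v string) {
	if n.Store(v) {
		n.Learn(v)
	}
}
func (n *convergedNode) Store(v string) bool { n.log = append(n.log, v); return true }
func (n *convergedNode) Learn(v string)      { fmt.Println("node applied:", v) }

// Neon-style disaggregation: each role lives in a separate component
// (stubs here; in Neon these are Postgres, Safekeepers, and Pageservers).
type safekeeperStub struct{ log []string }

func (s *safekeeperStub) Store(v string) bool { s.log = append(s.log, v); return true }

type pageserverStub struct{}

func (pageserverStub) Learn(v string) { fmt.Println("pageserver applied:", v) }

func main() {
	// Either deployment satisfies the same role contracts.
	var node Proposer = &convergedNode{}
	node.Propose("wal-01")

	var acc Acceptor = &safekeeperStub{}
	var lrn Learner = pageserverStub{}
	if acc.Store("wal-02") {
		lrn.Learn("wal-02")
	}
}
```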

Coming back to Neon, it chose Paxos because of the ability to disaggregate the roles. If a Postgres database fails, a new one must be spun up to take its place. But what happens if the first database node is actually still operating? Now we have two primaries and what is known as split-brain. Split-brain leads to data inconsistency, which we really want to avoid. What we need is a way of ensuring that only one primary can write to the WAL service at a time. We can’t prevent two primaries from existing, but we can ensure that only one can make progress while the other remains blocked. Paxos solves this problem.

Each Postgres database is a proposer, each Safekeeper is an acceptor and the Pageservers are learners. Before a Neon Postgres database can write WAL records, it must get elected by the Safekeepers as the leader (distinguished proposer). Once elected, it is free to write (or propose) a sequence of WAL records to the Safekeepers. Once a majority of Safekeepers have acknowledged a WAL record, the database treats that record as committed.
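
Putting those two guarantees together, here is a hypothetical end-to-end demonstration in Go: a record only counts as committed once a majority of Safekeepers acknowledge it, and a deposed primary can no longer commit once a replacement is elected at a higher term. The component names and the integer terms are my simplifications of the real protocol.

```go
package main

import "fmt"

// safekeeper tracks the highest term it has promised. These few lines stand
// in for Neon's Safekeepers, which also durably store the WAL records.
type safekeeper struct{ promised int }

// accept stores a record from a proposer at `term`, unless the proposer is
// stale (its term is lower than one this safekeeper has already promised).
func (s *safekeeper) accept(term int) bool {
	if term < s.promised {
		return false // fenced: a newer primary exists
	}
	s.promised = term
	return true
}

// elect runs a simplified Prepare phase: the new primary raises the
// promised term on a majority of safekeepers.
func elect(sks []*safekeeper, term int) {
	for _, s := range sks[:len(sks)/2+1] {
		if term > s.promised {
			s.promised = term
		}
	}
}

// commit appends one WAL record at `term`; the record only counts as
// committed once a majority of safekeepers acknowledge it.
func commit(sks []*safekeeper, term int) bool {
	acks := 0
	for _, s := range sks {
		if s.accept(term) {
			acks++
		}
	}
	return acks > len(sks)/2
}

func main() {
	sks := []*safekeeper{{}, {}, {}}
	elect(sks, 5)
	fmt.Println(commit(sks, 5)) // true: primary at term 5 commits records

	elect(sks, 6)               // a replacement primary is elected
	fmt.Println(commit(sks, 5)) // false: the old primary is fenced out
	fmt.Println(commit(sks, 6)) // true: only the new primary makes progress
}
```

Note that the fenced primary's record may still land on a minority of safekeepers; that is harmless, because without a majority it can never be treated as committed.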

Fig 2. Neon components mapping onto MultiPaxos roles.

Pageservers learn of the committed WAL records in their role as Paxos learners. WAL records are replayed over the latest image file, and those files are eventually uploaded to S3. The Pageservers then communicate with the Safekeepers about the index of the last applied WAL record, allowing Safekeepers to safely garbage collect that data.
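
That garbage-collection rule is simple to state in code: a Safekeeper may trim its WAL up to the oldest position that every Pageserver has reported as applied. The sketch below is my own; the LSN naming and the shape of the reports are assumptions, not Neon's actual interface.

```go
package main

import "fmt"

// gcHorizon returns the highest WAL position (LSN) that is safe to trim:
// the minimum of the last-applied positions reported by the pageservers.
// Trimming beyond this could delete records a pageserver still needs.
func gcHorizon(appliedLSNs []uint64) uint64 {
	if len(appliedLSNs) == 0 {
		return 0 // no reports yet: trim nothing
	}
	horizon := appliedLSNs[0]
	for _, lsn := range appliedLSNs[1:] {
		if lsn < horizon {
			horizon = lsn
		}
	}
	return horizon
}

// trimWAL drops records at or below the horizon; records is assumed to be
// a map from LSN to WAL record payload held by one safekeeper.
func trimWAL(records map[uint64][]byte, horizon uint64) {
	for lsn := range records {
		if lsn <= horizon {
			delete(records, lsn)
		}
	}
}

func main() {
	wal := map[uint64][]byte{100: nil, 200: nil, 300: nil}
	// Two pageservers have applied up to LSN 250 and 180 respectively.
	h := gcHorizon([]uint64{250, 180})
	trimWAL(wal, h)
	fmt.Println("horizon:", h, "remaining:", len(wal)) // horizon: 180 remaining: 2
}
```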

Excerpt end —

There is a lot more to the Safekeeper protocol if you want to dig deeper, and it shares some commonality with how Aurora handles primary failover. But this survey is more concerned with how systems get disaggregated, so I’ll leave it there.

Why I like this example

The brilliance of Paxos lies in how it formalizes consensus into distinct roles: Proposers, Acceptors, and Learners. This fundamental separation of responsibilities creates a flexible building block that engineers can compose in creative ways to solve real-world problems. Neon demonstrates this power perfectly: instead of being constrained to a traditional cluster of identical nodes all running the same code, it was able to weave consensus from heterogeneous components. Postgres instances act as Proposers, specializing in generating new values; Safekeepers focus solely on their Acceptor role of durably storing and voting on proposals; and Pageservers operate as Learners, consuming the committed log. This composition allows each component to be optimized for its specific role: Postgres for query processing, Safekeepers for durability, and Pageservers for log consumption and storage management.

I like this example because it breaks you free from the assumption that a replication protocol implementation must take the form of a dedicated cluster with a rigid deployment model. That might be the right choice, or it might not, but making the choice consciously (not in ignorance) is the key. In Neon’s case, Paxos' role separation enabled it to build a consensus protocol that maps elegantly onto its serverless architecture's natural boundaries. For me it is the perfect example of why abstractions in design matter.
