Applying Flexible Paxos to Raft — Jack Vanlightly

Applying Flexible Paxos to Raft

Image credit: ESO/G.Hüdepohl (atacamaphoto.com)

Flexible Paxos provides us the insight that Paxos only needs that election and replication quorums intersect, not all quorums. Before this it was assumed that majority quorums we required because majority quorums always intersect. So what does that mean exactly and can it be applied to Raft?

In standard Raft, an election quorum is a subset of the set of servers that have voted for the same server in the same election term and that quorum is formed of a majority. For a 3 node cluster we need 2 votes and a 5 node cluster we need 3 votes and so on.

The next question is: what are all the possible quorums that exist and are there any two quorums that do not intersect? The possible majority quorums are {n1, n2}, {n2, n3} and {n1, n3} and we see that there are no two quorums that do not intersect. This is the property we get from majority quorums.

All majorities overlap

All majority quorums overlap

If we only had minority quorums then we could have two quorums that don’t intersect. For example, a 5 node cluster has many strict minority quorums that do not overlap, for example a couple are:

  • {n1, n2} - {n3, n4}

  • {n1, n5} - {n2, n3}

If every quorum is a minority then we get split-brain scenarios. In a 5 node cluster with quorums of size 2 then two leaders can be elected and both make progress! For this reason both Raft and Paxos use majority quorums for both elections and replication.

But Flexible Paxos gave us the insight that for Paxos at least, we don’t need that all election quorums overlap or all replication quorums overlap - only that every possible replication quorum intersects with every possible election quorum.

However, the above is not true for Raft! With Raft we still need that all election quorums overlap or else we can have the problem of split-brain where two leaders get elected in the same Raft term!

Applying all of this to Raft, it means for a 5 node cluster we can have can have a replication quorum of just 1 as long as our election quorum is 5. Or a replication quorum of 2 but only with an election quorum of 4.

| Replication quorum | Election quorum |
|--------------------|-----------------|
| 5                  | 3               |
| 4                  | 3               |
| 3                  | 3               |
| 2                  | 4               |
| 1                  | 5               |

How does this change the safety and liveness properties of Raft? In the end it isn’t quite as compelling as you might think at first. Increasing or decreasing the replication quorum is fraught with consequences for safety, availability and latency. Also note that changing the replication quorum in Raft does not change the replication factor (the number of copies of each entry) - only the minimum number of copies before an entry is considered committed.

Scenarios

Let’s look at three scenarios with a 5 node Raft cluster:

  • RQ=3, EQ=3 (standard Raft)

  • RQ=2,EQ=4

  • RQ=4, EQ=2

5 node cluster, RQ=3, EQ=3 (standard Raft)

  • Safety. Tolerates up to 2 lost nodes.

  • Write availability. Tolerates up to 2 lost nodes.

  • Election availability. Tolerates up to 2 lost nodes.

  • Latency: Tolerates 2 slow nodes.

5 node cluster, RQ=2, EQ=4

We lower the replication quorum and increase the election quorum.

  • Safety. Tolerates up to 1 lost node. Less safe!

  • Write availability. Tolerates up to 3 lost nodes. More write available!

  • Election availability. Tolerates up to 1 lost node. Less election availability!

  • Latency: Tolerates 3 slow nodes. Lower, more predictable write latency!

5 node cluster, RQ=4, EQ=3

We increase the replication quorum and but can’t decrease the election quorum as the election quorum must be a majority quorum.

  • Safety. Tolerates up to 3 lost nodes. More safe!

  • Write availability. Tolerates up to 1 lost node. Less write available!

  • Election availability. Tolerates up to 2 lost nodes. Same election availability.

  • Latency: Tolerates 1 slow node. Higher, less predictable write latency.

Summary

The interesting thing is that whether we increase or decrease the replication quorum our availability decreases!

Predictably, increasing the replication quorum gives us more safety at the cost of less write availability. But you may feel that trade-off is worth it.

Lowering the replication quorum gives us less safety as expected but we don’t get the corresponding jump in availability. Availability is not just whether a leader can make progress, but that a leader can be elected at all. A system without a functioning leader but not enough nodes to elect a new one is unavailable. So in fact we get less safety and less availability - not such an attractive option.

Flexible quorums in Raft is a nuanced subject and the trade-offs should be carefully considered - though I know of no flexible Raft implementation in the wild in any case.

TLA+ specifications

The nice thing about applying flexible quorums to Raft is how simple it is. I have my own Raft TLA+ specification which is based on the original but fixes a few bugs and is optimized for model checking.

Applying flexible quorums requires only two new constants and a simple change in the BecomeLeader action and the AdvanceCommitIndex action. You can experiment with changing the replication and election quorums and see things blow up when you set quorums sizes that produce non-intersecting election and replication quorums! See the Flexible Raft spec here.

A new Raft variant

I’m working on another Raft variant which makes use of flexible quorums but borrows from other protocols to get around some of the availability issues that Flexible Raft has when lowering the replication quorum. It’s a weekend project so no timeline but watch this space.