Applying Flexible Paxos to Raft

Flexible Paxos gives us the insight that Paxos (and Raft) only require that election quorums and replication quorums intersect. But standard Raft and Paxos are configured so that every quorum intersects with every other quorum. So what does that mean exactly?

Let’s take the election quorum in Raft. An election quorum is a subset of the servers that have voted for the same server in the same election term, and that quorum must be a majority. A 3 node cluster needs 2 votes, a 5 node cluster needs 3 votes, and so on.

The next question is: what are all the possible quorums that exist and are there any two quorums that do not intersect? For a 3 node cluster, the smallest possible majority quorums are {n1, n2}, {n2, n3} and {n1, n3}, and there are no two quorums that do not intersect. This is the property we get from majority quorums.
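The intersection property can be checked by brute force. The following is a minimal sketch, not taken from any Raft or Paxos implementation: the helper names `quorums` and `all_intersect` are hypothetical, and the 4 node example with an election quorum of 3 and a replication quorum of 2 is an assumed configuration chosen only to illustrate the Flexible Paxos idea.

```python
from itertools import combinations


def quorums(nodes, size):
    """Return every subset of `nodes` that has exactly `size` members."""
    return [frozenset(c) for c in combinations(nodes, size)]


def all_intersect(quorums_a, quorums_b):
    """True if every quorum in quorums_a shares a node with every quorum in quorums_b."""
    return all(a & b for a in quorums_a for b in quorums_b)


# Standard majority quorums on a 3 node cluster.
nodes = ["n1", "n2", "n3"]
majority = quorums(nodes, len(nodes) // 2 + 1)
print(sorted(sorted(q) for q in majority))
print(all_intersect(majority, majority))  # True: every pair of majorities overlaps

# Assumed flexible configuration: 4 nodes, election quorum of 3,
# replication quorum of 2 (illustrative values only).
nodes4 = ["n1", "n2", "n3", "n4"]
election = quorums(nodes4, 3)
replication = quorums(nodes4, 2)
print(all_intersect(election, replication))     # True: 3 + 2 > 4
print(all_intersect(replication, replication))  # False: {n1, n2} vs {n3, n4}
```

In the second configuration every election quorum still overlaps every replication quorum, even though two replication quorums can be disjoint. That is the weaker requirement Flexible Paxos identifies.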

Write for others but mostly for yourself

I started my blog originally to help me get to the next level in my career and help establish myself as an authority in the areas of tech that I was focusing on. I liked writing and thought I had something to say.

Looking back at my 6 years of blogging, it’s hard to recognise myself in the engineer I was back then, before writing was a regular habit for me. It’s funny because in the end my blog was the key that unlocked the next door in my career, but not necessarily for the reasons I expected. I figured if I could write some interesting posts I could turn up to an interview and use it as a kind of portfolio, but it became so much more than that.

Tweaking the BookKeeper protocol - Unbounded Ledgers

In the last post I described the protocol changes necessary to ensure that all entries in closed ledgers reach Write Quorum (WQ) and that all entries in all but the last fragment of open ledgers also reach WQ.

In this post we’re going to look at another tweak to the protocol to allow ledgers to be unbounded and allow writes from multiple clients over their lifetime.

Tweaking the BookKeeper protocol - Guaranteeing write quorum

Introduction

Recently I wrote a blog post on my team blog about the differences between Raft and the Apache BookKeeper replication protocol. In it I covered one difference that surprises people, which is that a ledger can have multiple blocks of entries that only ever reach Ack Quorum (AQ) and not Write Quorum (WQ) due to how ensemble changes work. A Raft log, on the other hand, has the property that the replication factor (RF) reached by any given entry satisfies the following:

Prefix RF >= Entry RF >= Suffix RF

That is to say, if a given entry has reached RF of 3, then the entire log prefix must also be at 3 or above (depending on the desired RF configured). But with BookKeeper that is not the case. For example, with WQ=3/AQ=2, a given entry that has reached RF of 3 may have entries before it that only reached RF of 2.
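The rule Prefix RF >= Entry RF >= Suffix RF says that the RF reached by each entry never increases as you move along the log. Here is a minimal sketch of that check, where the function name `satisfies_prefix_rule` is hypothetical and the RF values are made up for illustration rather than taken from a real Raft log or BookKeeper ledger:

```python
def satisfies_prefix_rule(entry_rfs):
    """True if the replication factor never increases from one entry to the next.

    `entry_rfs` lists the RF reached by each entry, in log order.
    """
    return all(earlier >= later for earlier, later in zip(entry_rfs, entry_rfs[1:]))


# Illustrative values only: a Raft-like log where the tail is still catching up.
raft_like = [3, 3, 3, 2, 2]

# Illustrative values only: with WQ=3/AQ=2, a block of entries stuck at RF 2
# can be followed by entries that reached RF 3.
bookkeeper_like = [3, 2, 2, 3, 3]

print(satisfies_prefix_rule(raft_like))        # True
print(satisfies_prefix_rule(bookkeeper_like))  # False
```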

Learn about TLA+ and the formal verification of Apache BookKeeper

At the time of writing I work at Splunk in the messaging-as-a-service team (we offer Apache Pulsar as an internal Splunk service). In late 2020 and early 2021 I decided to formally verify the Apache BookKeeper protocol in TLA+. My main objective was simply to learn the protocol by reverse engineering the code into a specification, and that worked extremely well. I also found a protocol bug and an implementation bug as a result, which was an added bonus.

Using TLA+ to learn how an existing system works is an amazingly effective learning method. Yes, you can read code and docs and you might end up with a hand-wavy level of clarity. But modelling a system in something like TLA+ leaves no room for ambiguity. So I highly recommend it.

You can read about it on the Splunk messaging-as-a-service team blog: https://medium.com/splunk-maas

You can see the current state of the project in my GitHub repo: https://github.com/Vanlightly/bookkeeper-tlaplus.

At the time of writing the Splunk messaging-as-a-service team is hiring software engineers, so do contact me if you are interested in working on Apache Pulsar, Apache BookKeeper and all the tooling required to run these systems as a service.

Kafka and RabbitMQ blog posts I wrote elsewhere in 2019

Since I started working at companies that run messaging-as-a-service (84codes) or actually build the messaging systems themselves (VMware, Splunk), I have been writing blog posts, but not on my own blog. I don’t want the confusion of double posting, so I’m just going to start posting links to this content on my blog and perhaps add some commentary. So here goes for 2019:

Why I'm Not Writing Much On My Blog These Days

Firstly, I joined the RabbitMQ core team which is a demanding job that takes most of my energy, and the second reason is that I pretty much only blog about RabbitMQ now and those posts go on the RabbitMQ blog. So if you are interested in my writing about RabbitMQ, then please head over to our blog.

I also have posts I’d like to write about Apache Pulsar, Apache Kafka, Pravega, Redis and NATS. But I don’t have much time and while I think I would be impartial, I wouldn’t expect others to think so. I have skin in the game now.

But I still spend time understanding how other systems work and how they are positioned in the market. Knowing how the industry evolves and what customers expect helps us evolve RabbitMQ while keeping it “rabbity”. RabbitMQ will always aim to be a general purpose message broker, not a data platform nor a big data complex event processing system. But just like object oriented languages have benefited from incorporating some functional language paradigms, RabbitMQ can benefit from incorporating aspects of other messaging paradigms - but without losing its soul or the reasons why users already love it.

Back to writing… blog posts can be a bit like benchmarks: if it’s one vendor vs another then your scepticism level should go through the roof, probably into orbit. Not only might it be an apples to oranges comparison, but a biased one. Likewise if I am writing about why I don’t like some aspect of another messaging system, is that biased or is it an impartial analysis? So I’ll stick to RabbitMQ for now.

If you like my writing about RabbitMQ, I will be posting at least monthly on the RabbitMQ blog about things that I find interesting and that I think will be valuable to the community. Feel free to suggest subjects to me that you’d like me to cover.

A Look at Multi-Topic Subscriptions with Apache Pulsar

This is a sister post to one I am writing about multi-topic subscriptions with Apache Kafka that you can read soon on the CloudKarafka blog (link coming soon). I will provide a summary of those results before we get started with Apache Pulsar. I run the same tests against both technologies.

The objective is to get an understanding of what to expect from multi-topic subscriptions. Specifically, we are testing message ordering. Message ordering is a fundamental component of messaging systems, and even though cross-topic ordering is not guaranteed by Pulsar or Kafka, I find it interesting and useful to know what to expect.