Jack Vanlightly

Apache BookKeeper

Tweaking the BookKeeper protocol - Unbounded Ledgers

In the last post I described the necessary protocol changes to ensure that all entries in closed ledgers reached Write Quorum (WQ) and all entries in all but the last fragment in open ledgers reach write quorum.

In this post we’re going to look at another tweak to the protocol to allow ledgers to be unbounded and allow writes from multiple clients over their lifetime.

Tweaking the BookKeeper protocol - Guaranteeing write quorum

Introduction

Recently I wrote a blog post on my team blog about the differences between Raft and the Apache BookKeeper replication protocol. In it I covered one difference that surprises people which is that a ledger can have multiple blocks of entries that only ever reach Ack Quorum and not Write Quorum due to how ensemble changes work. A Raft log on the other hand has the property that the replication factor (RF) reached by any given entry matches the following:

Prefix RF >= Entry RF >= Suffix RF

That is to say, if a given entry has reached RF of 3, then the entire log prefix must also be at 3 or above (depending on the desired RF configured). But with BookKeeper that is not the case. For example, with WQ=3/AQ=2, a given entry that has reached RF of 3 may have entries before it that only reached RF of 2

Learn about TLA+ and the formal verification of Apache BookKeeper

At the time of writing I work at Splunk in the messaging-as-a-service team (we offer Apache Pulsar as in internal Splunk service). In late 2020, early 2021 I decided to formally verify the Apache BookKeeper protocol in TLA+. My main objective was to simply learn the protocol by reverse engineering the code into a specification and that worked extremely well. I also found a protocol bug and an implementation bug as a result which was an added bonus.

Using TLA+ to learn how an existing system works is an amazingly effective learning method. Yes you can read code and docs and you might end up with a hand-wavy level of clarity. But modelling a system in something like TLA+ leaves no room for ambiguity. So I highly recommend it.

You can read about it on the Splunk messaging-as-a-service team blog https://medium.com/splunk-maas:

You can see the current state of the project in my GitHub repo: https://github.com/Vanlightly/bookkeeper-tlaplus.

At the time of writing the Splunk messaging-as-a-service is hiring software engineers, so do contact me if you are interested in working on Apache Pulsar, Apache BookKeeper and all the tooling required to run these systems as a service.