October 9, 2021

Learn about TLA+ and the formal verification of Apache BookKeeper

October 9, 2021

At the time of writing I work at Splunk in the messaging-as-a-service team (we offer Apache Pulsar as in internal Splunk service). In late 2020, early 2021 I decided to formally verify the Apache BookKeeper protocol in TLA+. My main objective was to simply learn the protocol by reverse engineering the code into a specification and that worked extremely well. I also found a protocol bug and an implementation bug as a result which was an added bonus.

Using TLA+ to learn how an existing system works is an amazingly effective learning method. Yes you can read code and docs and you might end up with a hand-wavy level of clarity. But modelling a system in something like TLA+ leaves no room for ambiguity. So I highly recommend it.

You can read about it on the Splunk messaging-as-a-service team blog https://medium.com/splunk-maas:

You can see the current state of the project in my GitHub repo: https://github.com/Vanlightly/bookkeeper-tlaplus.

At the time of writing the Splunk messaging-as-a-service is hiring software engineers, so do contact me if you are interested in working on Apache Pulsar, Apache BookKeeper and all the tooling required to run these systems as a service.