Jack Vanlightly

Building A "Simple" Distributed System - Formal Verification

Building A "Simple" Distributed System - Formal Verification

In the last post, we described a protocol that should satisfy the requirements and invariants established in the first post. Today we will look at formal verification with TLA+.

Formal verification is just another (niche) tool in the toolbox. Some tools require more skill than others to use. Some tools are more expensive than others. It is up to the practioner to decide if/when/how to use them.

The hard part is that you won't necessarily know if it is beneficial to a given problem you face, if you aren't already skilled in it. If a tool is very difficult to learn, then you might never invest in it enough to be able to make that call. Or you might invest a lot of time into it, to find it isn't a great match for your problem. At which point it gets stowed in your toolbox where it may or may not get used again. I expect many software engineers see learning formal methods as a difficult (it is) and high risk venture.

So, given the above, my aim of this post is for software engineers without prior experience of TLA+ to be able to get the gist of the spec and see why it was useful for this project. Please give me feedback if I succeeded or not.

Building A "Simple" Distributed System - The Protocol

Building A "Simple" Distributed System - The Protocol

In the last post we covered what our distributed resource allocation library, Rebalanser, should do. In this post we’ll look at a protocol that could achieve those requirements, always respecting our invariants (described in the last post).

A protocol is basically a set of rules which govern how each node in a Rebalanser group acts in order to achieve the desired behaviours. Each node must communicate with the others in such a way that it can achieve consensus about the resource allocations and also guarantee that it does not start accessing a resource until another node has stopped accessing it.

Building a "Simple" Distributed System - The What

Building a "Simple" Distributed System - The What

This is a blog series where I share my approach and experience of building a distributed resource allocation library. As far as distributed systems go, it is a simple one and ideal as a tool for learning about distributed systems design, programming and testing.

The field of distributed systems is large, encompassing a myriad of academic work, algorithms, consistency models, data types, testing tools/techniques, formal verification tools and more. I will be covering just the theory, tools and techniques that were relevant for my little project.