Jack Vanlightly

Formal verification

To be atomic or non-atomic, that is the question (Fizzbee)

To be atomic or non-atomic, that is the question (Fizzbee)

After posting my last Kafka transactions diary entry, JP (the Fizzbee maintainer) wrote a refactored version using non-atomic actions and a different way of representing the network. It’s a very interesting variant and I’m tempted to switch over to his version.

When an action is not atomic, execution of an action could yield at any moment to a different action in a different role instance or even the same role instance. With this yielding we can also replace explicit message passing with direct invocation of a function on another node. The invocation and the response can all yield to other concurrent events happening across the system.

Let me use a simple example to demonstrate - Ping-Pong.

Share

An introduction to symmetry in TLA+

An introduction to symmetry in TLA+

Symmetry reduction in TLA+ is a clever trick for cutting down the size of the state space we need to explore during model checking. Think about a distributed system with interchangeable components—servers, nodes, or processes that behave identically. Without symmetry reduction, the model checker wastes time exploring states that are essentially duplicates, differing only in the labels we’ve assigned to these components. Symmetry reduction says, “Hey, if swapping the identities of these components doesn’t change the behavior of the system, let’s treat those states as one.” This massively reduces the computational effort while keeping the results valid.

In this post, I’ll show some simple examples of symmetry using trivial specs where we can actually visualize the state space. The idea is to build a mental model of how symmetry reduction works.

Share

Obtaining statistical properties through modeling and simulation

Obtaining statistical properties through modeling and simulation

Sophisticated, simulations need not be. Valuable insights, even simple scripts reveal. — Formal Methods Yoda

A couple of weeks ago I was a guest on The Geek Narrator to talk about formal verification. I spoke a lot about how modeling and simulation are tremendously powerful tools, whether you use a formal verification language (such as TLA+) or just a Python script.

This post goes through a real world example of how I used modeling and simulation to understand the statistical properties of a proposed distributed system protocol, using both Python and TLA+. There is a talk version of this post from TLA+ Conf 2022.

Share

The importance of liveness properties (with TLA+ Part 2)

The importance of liveness properties (with TLA+ Part 2)

In part 1 we introduced the concept of safety and liveness properties, then a stupidly simple gossip protocol called Gossa. Our aim is to find liveness bugs in the design and improve the design until all liveness issues are fixed.

Gossa had some problems. First it had cycles due to nodes contesting whether a peer was dead or alive. We fixed that by making deadness take precedence over aliveness but still the cluster could not converge. The next problem was that a falsely accused dead node was unable to refute its deadness as no-one would pay attention to it - deadness ruled.

The proposed fix I mentioned in part 1 was to allow a falsely accused node to refute its deadness via the introduction of a monotonic counter.

The importance of liveness properties (with TLA+ Part 1)

The importance of liveness properties (with TLA+ Part 1)

Invariants get most of the attention because they are easy to write, easy to check and find those histories which lead to really bad outcomes, such as lost data. But liveness properties are really important too and after a years of writing TLA+ specifications, I couldn’t imagine having confidence in a specification without one. This post and the next is a random walk through the world of model checking liveness properties in TLA+.

The outline is like this:

  • Part 1: I (hopefully) convince you that liveness properties are important. Then implement a gossip algorithm in TLA+ and use liveness properties to find problems.

  • Part 2: Continue evolving the algorithm, finding more and more liveness problems, overcome some challenges such as infinite state-space and impart some helpful principles - making you a better engineer and thinker by the end.