Repository Has Moved

IMPORTANT NOTICE:

As of May 2022, this repository has been migrated to a new location. The development, maintenance, and distribution of the project will now be done from the new repository at https://github.com/tlaplus-workshops/ewd998/.

Please update your bookmarks, clones, and remotes to the new repository location.

ewd998

Experience TLA+ in action by specifying distributed termination detection on a ring, due to Shmuel Safra. Each git commit introduces a new TLA+ concept. Go back to the very first commit to follow along!

v00: IDE

Click either one of the buttons to launch a zero-install IDE to give the TLA+ specification language a try:

(=> Screencast how to create the TLA+ Codespace)

v01: Problem statement - Termination detection in a ring

v01a: Termination of pleasingly parallel

For this tutorial, we assume that the distributed system nodes are organized as a ring, with one the (unique) leader[^1]. If we further assume that nodes execute independent computations, (global) termination detection becomes trivial--the leader initiates a token transfer around the ring, and each node passes the token to its next neighbor, iff the node finished its computation. When the initiator receives back the token, it knows that all (other) nodes have terminated.

This problem is too simple, and we don't need TLA+ to model it.

[^1] Perhaps by some leader election algorithm.

v01b: Termination of collaborative computation

A more interesting problem is to look at a "collaborative" computation, which implies that nodes can re-activate each other. For example, the result of a computation at node 23 is (atomically!) sent to and further processed at node 42. With the previous protocol, node 42 might have already passed on the token, causing the initiator to eventually detect (global) termination; a bug that is at least difficult to reproduce with testing! A solution is offered in EWD840:

Initiator sends a "stateful" token around the ring
Each node remembers if it activated another node
Activation taints the token (when the activator gets the token)
Initiator keeps running rounds until it receives an untainted token

v01c: Termination detection with asynchronous message delivery

What happens if we loosen the restriction that message delivery is atomic (it seldom is)? Clearly, we are back at square one:

Node 23 sends a message to 42
23 taints the token
Initiator starts a new round
Node 42 received the fresh token before receiving the activation message from 23
Boom!

The fix proposed in Shmuel Safra's EWD998, is to count in-flight messages. But will this work?

Throughout the chapters of this tutorial, we will use the TLA+ specification language to model EWD998, and check interesting properties.

v02: High-level spec AsyncTerminationDetection

TLA+ is all about abstraction, and, as we will later see, has first-class support to connect different levels of abstraction. Let's use this and write a basic spec that either falsifies our design above, or gives us sufficient confidence to invest in writing a more detailed spec.

(Credit: Stephan Merz wrote AsyncTerminationDetection)

v02a: Spec skeleton

Instead of modeling message channels, let alone modeling the transport layer, we will write a spec that models:

A ring of N nodes
The activation status of each node
The number of messages pending[^2] at a node
A send action
A receive action
A terminate action
The initial configuration of the system

Please switch to AsyncTerminationDetection.tla and read its comments. From here on, the tutorial continues there...

[^2] It's difficult to (efficiently) count pending messages in an implementation. In a TLA+ spec, we don't care about that notion of efficiency. Also, all variables are global.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Repository Has Moved

ewd998

v00: IDE

v01: Problem statement - Termination detection in a ring

v01a: Termination of pleasingly parallel

v01b: Termination of collaborative computation

v01c: Termination detection with asynchronous message delivery

v02: High-level spec AsyncTerminationDetection

v02a: Spec skeleton

Files

README.md

Latest commit

History

README.md

File metadata and controls

Repository Has Moved

ewd998

v00: IDE

v01: Problem statement - Termination detection in a ring

v01a: Termination of pleasingly parallel

v01b: Termination of collaborative computation

v01c: Termination detection with asynchronous message delivery

v02: High-level spec AsyncTerminationDetection

v02a: Spec skeleton