Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Erik's mods to EWD998 from class on 2022-07-12 #7

Open
wants to merge 2 commits into
base: 072022
Choose a base branch
from

Conversation

xkrogen
Copy link

@xkrogen xkrogen commented Jul 13, 2022

Primarily closely read the paper's description of the algorithm and made changes to match the spec to the text description. One of the biggest changes was that nodes get marked as tainted when they RECEIVE a message, not when they SEND a message. Whoops! There were also minor changes necessary to match the token's initial state (I think we previously marked it tainted at the start if the initiator is tainted, but that's not right, the token always starts untained and the initiator's taint status gets checked when the token makes its way back around).

I also had to make small adjustments to the ATD spec, most notably to change the WF criteria to only check for Terminate, instead of Next. Not sure if that's "cheating", but I think it makes sense based on my understanding: as with EWD998, we don't want to force nodes to send/rcv messages indefinitely, we just want to ensure that we detect termination.

I validated this up to 3 nodes, and am trying on 4, though it has been running for multiple hours and generated 10M+ states without finishing...

EWD998.tla Outdated
\* /\ WF_vars(Next)
/\ \A i \in Node: WF_vars(PassToken(i) \/ InitiateToken(i))

/\ (\A i \in Node: WF_vars(PassToken(i) \/ InitiateToken(i) \/ Terminate(i)))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also wondering if we should enforce Terminate here, as our termination algorithm should ideally be safe and live even for a system that never terminates? But realistically a valid computation task should terminate.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also thinking if WF termination might be too strong as it might ask the state chain to terminate sooner?

Primarily closely read the paper's description of the algorithm and made changes to match the spec to the text description. One of the biggest changes was that nodes get marked as tainted when they RECEIVE a message, not when they SEND a message. Whoops! There were also minor changes necessary to match the token's initial state (I think we previously marked it tainted at the start if the initiator is tainted, but that's not right, the token always starts untained and the initiator's taint status gets checked when the token makes its way back around).

I also had to make small adjustments to the ATD spec, most notably to change the WF criteria to only check for Terminate, instead of Next. Not sure if that's "cheating", but I think it makes sense based on my understanding: as with EWD998, we don't want to force nodes to send/rcv messages indefinitely, we just want to ensure that we detect termination.

I validated this up to 4 nodes. Note that it took about 2-3 hours to validate on a 2019 MBP for 4 nodes, generate 10M states and exploring over 350M.

Below is the output for the following config file:
```
SPECIFICATION Spec
INVARIANT  TypeOK
CONSTANT N = 4
CONSTRAINT Constraint
CONSTANT Init <- MCInit
ALIAS  Alias
INVARIANT IInv
PROPERTY ATDSpec
```

```
/Library/Java/JavaVirtualMachines/jdk11.0.13.8-msft.jdk/Contents/Home/bin/java -XX:+UseParallelGC -cp /Users/ekrogen/.vscode/extensions/alygin.vscode-tlaplus-nightly-2022.5.2815/tools/tla2tools.jar -Dtlc2.TLC.ide=vscode tlc2.TLC EWD998.tla -tool -modelcheck -config EWD998.cfg -deadlock -noTE

TLC2 Version 2.18 of Day Month 20?? (rev: 6ee6e9f)
Running breadth-first search Model-Checking with fp 57 and seed 3965354632801637035 with 1 worker on 16 cores with 7282MB heap and 64MB offheap memory [pid: 78154] (Mac OS X 12.4 x86_64, Microsoft 11.0.13 x86_64, MSBDiskFPSet, DiskStateQueue).
Starting SANY...
Parsing file /Users/ekrogen/dev/tlaplus-workshop/EWD998.tla
...
Starting... (2022-07-12 18:06:50)
Implied-temporal checking--satisfiability problem has 1 branches.
....
Finished computing initial states: 16 distinct states generated at 2022-07-12 18:06:50.
Checking temporal properties for the current state space with 2834 total distinct states at (2022-07-12 18:06:53)
Finished checking temporal properties in 00s at 2022-07-12 18:06:53
...
Progress(5) at 2022-07-12 18:06:53: 90,980 states generated (90,980 s/min), 9,131 distinct states found (9,131 ds/min), 6,296 states left on queue.
Checking temporal properties for the current state space with 96397 total distinct states at (2022-07-12 18:07:53)
Finished checking temporal properties in 01s at 2022-07-12 18:07:55
....
Progress(26) at 2022-07-13 09:01:07: 346,168,712 states generated (4,791,239 s/min), 10,338,601 distinct states found (105,372 ds/min), 163,790 states left on queue.
Progress(27) at 2022-07-13 09:02:07: 351,116,470 states generated (4,947,758 s/min), 10,434,714 distinct states found (96,113 ds/min), 110,216 states left on queue.
Progress(29) at 2022-07-13 09:03:07: 355,834,008 states generated (4,717,538 s/min), 10,532,678 distinct states found (97,964 ds/min), 65,050 states left on queue.
Checkpointing of run states/22-07-12-18-06-49.957
Checkpointing completed at (2022-07-13 09:04:08)
Progress(32) at 2022-07-13 09:04:08: 360,659,672 states generated (4,825,664 s/min), 10,629,235 distinct states found (96,557 ds/min), 17,243 states left on queue.
Progress(49) at 2022-07-13 09:04:25: 362,113,272 states generated, 10,656,918 distinct states found, 0 states left on queue.
Checking temporal properties for the complete state space with 10656918 total distinct states at (2022-07-13 09:04:25)
Finished checking temporal properties in 02min 36s at 2022-07-13 09:07:01
Model checking completed. No error has been found.
  Estimates of the probability that TLC did not check all reachable states
  because two distinct states had the same fingerprint:
  calculated (optimistic):  val = 2.0E-4
  based on the actual fingerprints:  val = 6.6E-9
Progress(49) at 2022-07-13 09:07:02: 362,113,272 states generated (402,253 s/min), 10,656,918 distinct states found (11,838 ds/min), 0 states left on queue.
362113272 states generated, 10656918 distinct states found, 0 states left on queue.
The depth of the complete state graph search is 49.
The average outdegree of the complete state graph is 1 (minimum is 0, the maximum 18 and the 95th percentile is 3).
Finished in 54013024ms at (2022-07-13 09:07:02)
```
…Termination action and apply fairness on that
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants