Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add scripts to produce performance benchmarks across all examples #118

Open
tchajed opened this issue Jul 14, 2023 · 3 comments
Open

Add scripts to produce performance benchmarks across all examples #118

tchajed opened this issue Jul 14, 2023 · 3 comments

Comments

@tchajed
Copy link
Collaborator

tchajed commented Jul 14, 2023

We should have a way to run performance benchmarks across all the examples, which encodes the configuration for algorithms like inference and bounded model checking and gathers timing and memory usage.

I envision this being written in Rust, with a small library of functions for running the flyvy binary and gathering statistics. We'll use the same mechanism as time, but perhaps implemented directly in Rust calling getrusage directly rather than wrapping the system binary. This will make it possible to get gather memory usage over time as well by sampling getrusage.

Using getrusage isn't completely trivial because it only gets statistics for the calling process (and children), not for an arbitrary child pid. I may implement this by starting out using time and then replacing it with our own version that reports the results in JSON.

@wilcoxjay
Copy link
Collaborator

Thanks for filing this issue! I have a couple of questions.

  • Will it work on macOS as well as Linux?
  • When you say "not for an arbitrary child pid", you seem to imply that /usr/bin/time (or the shell buitin time maybe? I wasn't clear) does have that feature. How does it implement that?

@tchajed
Copy link
Collaborator Author

tchajed commented Jul 17, 2023

macOS has the same getrusage so yes, it should be the same.

You can get resource usage for all children of the calling process with RUSAGE_CHILDREN. /usr/bin/time works because it measures itself + the child it spawns, and we assume that time itself has basically no overhead. The benchmarking can also do this but then it needs to spawn a process that spawns temporal-verifier.

@wilcoxjay
Copy link
Collaborator

I see, so the problem with using getrusage with RUSAGE_CHILDREN from our hypothetical top-level rust benchmarking script is that it would measure the sum of all the benchmarks, rather than the individual benchmarks? And so the intermediate process would be responsible for calling getrusage with RUSAGE_CHILDREN so that it only measures the one benchmark.

tchajed added a commit that referenced this issue Aug 24, 2023
The user-visible part of this is a library that allows running
`temporal-verifier` with a timeout and gathering some runtime resource
usage statistics as well as exit status, and a function to produce a
nice table of all the results. I've used this in the simplest possible
way to generate the results of running `temporal-verifier verify` over
all the benchmarks, with the results shown below.

This is the core infrastructure needed for #118. The remaining work
(hopefully small) is to add configurations to do something similar for
running qalpha and the bounded model checkers. These will be a bit more
work since those have more parameters to configure.

The infrastructure to monitor a benchmark is highly Unix-specific so it
works on macOS and Linux and will not work on Windows (but it should
work on WSL).

To produce this table run `cargo run -r --bin benchmark -- verify
--time-limit 30s`.

```txt
╭─────────┬──────────────────────────────────────────┬─────────┬────────┬──────────┬──────────┬────────┬─────────────╮
│ command │ file                                     │ outcome │ time s │ cpu util │ solver % │    mem │ params      │
├─────────┼──────────────────────────────────────────┼─────────┼────────┼──────────┼──────────┼────────┼─────────────┤
│ verify  │ consensus.fly                            │ fail    │    0.1 │     2.5× │      99% │   24MB │ --solver=z3 │
│ verify  │ consensus_epr.fly                        │         │    0.1 │     2.4× │      99% │   24MB │ --solver=z3 │
│ verify  │ consensus_forall.fly                     │         │    0.1 │     2.6× │      99% │   23MB │ --solver=z3 │
│ verify  │ fast_paxos_epr.fly                       │         │   24.1 │     1.4× │     100% │  172MB │ --solver=z3 │
│ verify  │ fol/block_cache_system_frozen_async.fly  │ timeout │   30.0 │     1.3× │     100% │ 1817MB │ --solver=z3 │
│ verify  │ fol/bosco_3t_safety.fly                  │ timeout │   30.0 │     0.7× │     100% │  313MB │ --solver=z3 │
│ verify  │ fol/cache.fly                            │         │    0.4 │     6.5× │     100% │   25MB │ --solver=z3 │
│ verify  │ fol/client_server_ae.fly                 │ fail    │    0.0 │     1.3× │      98% │   23MB │ --solver=z3 │
│ verify  │ fol/client_server_db_ae.fly              │         │    0.0 │     2.0× │      99% │   21MB │ --solver=z3 │
│ verify  │ fol/consensus_epr.fly                    │         │    0.1 │     2.6× │      99% │   24MB │ --solver=z3 │
│ verify  │ fol/consensus_forall.fly                 │         │    0.1 │     2.8× │      99% │   23MB │ --solver=z3 │
│ verify  │ fol/consensus_wo_decide.fly              │         │    0.1 │     2.3× │      99% │   23MB │ --solver=z3 │
│ verify  │ fol/fast_paxos_epr.fly                   │         │   24.2 │     1.5× │     100% │  175MB │ --solver=z3 │
│ verify  │ fol/fast_paxos_forall.fly                │ timeout │   30.0 │     0.9× │     100% │  195MB │ --solver=z3 │
│ verify  │ fol/fast_paxos_forall_choosable.fly      │ timeout │   30.0 │     0.0× │     100% │   35MB │ --solver=z3 │
│ verify  │ fol/firewall.fly                         │         │    0.0 │     1.0× │      97% │   21MB │ --solver=z3 │
│ verify  │ fol/flexible_paxos_epr.fly               │         │    0.2 │     2.5× │     100% │   25MB │ --solver=z3 │
│ verify  │ fol/flexible_paxos_forall.fly            │         │    0.4 │     1.9× │     100% │   32MB │ --solver=z3 │
│ verify  │ fol/flexible_paxos_forall_choosable.fly  │         │    2.9 │     1.1× │     100% │   56MB │ --solver=z3 │
│ verify  │ fol/hybrid_reliable_broadcast_cisa.fly   │         │    0.4 │     4.2× │     100% │   31MB │ --solver=z3 │
│ verify  │ fol/learning_switch.fly                  │         │    0.1 │     2.1× │      99% │   21MB │ --solver=z3 │
│ verify  │ fol/lockserv.fly                         │         │    0.1 │     3.1× │      99% │   22MB │ --solver=z3 │
│ verify  │ fol/multi_paxos_epr.fly                  │ timeout │   30.0 │     0.0× │     100% │   26MB │ --solver=z3 │
│ verify  │ fol/paxos_epr.fly                        │         │    1.0 │     1.4× │     100% │   39MB │ --solver=z3 │
│ verify  │ fol/paxos_forall.fly                     │         │    0.2 │     3.3× │     100% │   28MB │ --solver=z3 │
│ verify  │ fol/paxos_forall_choosable.fly           │ timeout │   30.0 │     0.0× │     100% │   28MB │ --solver=z3 │
│ verify  │ fol/raft.fly                             │ fail    │    0.0 │     0.4× │      88% │    1MB │ --solver=z3 │
│ verify  │ fol/ring_id.fly                          │         │    0.1 │     1.9× │      99% │   23MB │ --solver=z3 │
│ verify  │ fol/ring_id_not_dead.fly                 │         │    0.5 │     1.5× │     100% │   33MB │ --solver=z3 │
│ verify  │ fol/sharded_kv.fly                       │         │    0.1 │     1.9× │      99% │   23MB │ --solver=z3 │
│ verify  │ fol/sharded_kv_no_lost_keys.fly          │         │    0.1 │     1.8× │      99% │   24MB │ --solver=z3 │
│ verify  │ fol/stoppable_paxos_epr.fly              │ timeout │   30.0 │     0.2× │     100% │   94MB │ --solver=z3 │
│ verify  │ fol/stoppable_paxos_forall.fly           │ timeout │   30.0 │     0.3× │     100% │   90MB │ --solver=z3 │
│ verify  │ fol/stoppable_paxos_forall_choosable.fly │ timeout │   30.0 │     0.1× │     100% │   46MB │ --solver=z3 │
│ verify  │ fol/ticket.fly                           │         │    0.1 │     4.4× │     100% │   23MB │ --solver=z3 │
│ verify  │ fol/toy_consensus_epr.fly                │         │    0.1 │     1.9× │      99% │   23MB │ --solver=z3 │
│ verify  │ fol/toy_consensus_forall.fly             │         │    0.0 │     1.8× │      98% │   22MB │ --solver=z3 │
│ verify  │ fol/vertical_paxos_epr.fly               │         │    3.1 │     1.2× │     100% │   68MB │ --solver=z3 │
│ verify  │ fol/vertical_paxos_forall.fly            │         │    0.3 │     3.7× │     100% │   27MB │ --solver=z3 │
│ verify  │ fol/vertical_paxos_forall_choosable.fly  │         │    1.2 │     1.7× │     100% │   49MB │ --solver=z3 │
│ verify  │ lockserver.fly                           │         │    0.0 │     3.2× │      99% │   22MB │ --solver=z3 │
│ verify  │ paxos_epr.fly                            │         │    0.9 │     1.2× │     100% │   37MB │ --solver=z3 │
│ verify  │ pingpong.fly                             │         │    0.0 │     0.5× │      92% │   19MB │ --solver=z3 │
╰─────────┴──────────────────────────────────────────┴─────────┴────────┴──────────┴──────────┴────────┴─────────────╯
total:    43
ok:       31
timeout:  9
fail:     3
```

To produce this table run `cargo run --bin benchmark -- verify
--time-limit 30s --solver cvc5`.

```txt
╭─────────┬──────────────────────────────────────────┬─────────┬────────┬──────────┬──────────┬──────┬───────────────╮
│ command │ file                                     │ outcome │ time s │ cpu util │ solver % │  mem │ params        │
├─────────┼──────────────────────────────────────────┼─────────┼────────┼──────────┼──────────┼──────┼───────────────┤
│ verify  │ consensus.fly                            │ fail    │    0.2 │     2.2× │     100% │ 20MB │ --solver=cvc5 │
│ verify  │ consensus_epr.fly                        │         │    1.1 │     1.5× │     100% │ 33MB │ --solver=cvc5 │
│ verify  │ consensus_forall.fly                     │         │    0.3 │     1.9× │     100% │ 25MB │ --solver=cvc5 │
│ verify  │ fast_paxos_epr.fly                       │ timeout │   30.0 │     0.1× │     100% │ 39MB │ --solver=cvc5 │
│ verify  │ fol/block_cache_system_frozen_async.fly  │         │    7.3 │     6.7× │     100% │ 59MB │ --solver=cvc5 │
│ verify  │ fol/bosco_3t_safety.fly                  │ timeout │   30.0 │     0.5× │     100% │ 78MB │ --solver=cvc5 │
│ verify  │ fol/cache.fly                            │         │    1.6 │     8.2× │     100% │ 36MB │ --solver=cvc5 │
│ verify  │ fol/client_server_ae.fly                 │ fail    │    0.0 │     1.6× │      98% │ 13MB │ --solver=cvc5 │
│ verify  │ fol/client_server_db_ae.fly              │         │    0.1 │     2.8× │     100% │ 17MB │ --solver=cvc5 │
│ verify  │ fol/consensus_epr.fly                    │         │    1.0 │     1.6× │     100% │ 32MB │ --solver=cvc5 │
│ verify  │ fol/consensus_forall.fly                 │         │    0.3 │     2.1× │     100% │ 25MB │ --solver=cvc5 │
│ verify  │ fol/consensus_wo_decide.fly              │         │    0.2 │     1.8× │     100% │ 25MB │ --solver=cvc5 │
│ verify  │ fol/fast_paxos_epr.fly                   │ timeout │   30.0 │     0.1× │     100% │ 43MB │ --solver=cvc5 │
│ verify  │ fol/fast_paxos_forall.fly                │ timeout │   30.0 │     0.1× │     100% │ 45MB │ --solver=cvc5 │
│ verify  │ fol/fast_paxos_forall_choosable.fly      │ timeout │   30.0 │     0.5× │     100% │ 63MB │ --solver=cvc5 │
│ verify  │ fol/firewall.fly                         │         │    0.2 │     1.9× │     100% │ 23MB │ --solver=cvc5 │
│ verify  │ fol/flexible_paxos_epr.fly               │         │   22.8 │     2.2× │     100% │ 90MB │ --solver=cvc5 │
│ verify  │ fol/flexible_paxos_forall.fly            │         │    2.4 │     2.4× │     100% │ 42MB │ --solver=cvc5 │
│ verify  │ fol/flexible_paxos_forall_choosable.fly  │         │    3.8 │     2.6× │     100% │ 56MB │ --solver=cvc5 │
│ verify  │ fol/hybrid_reliable_broadcast_cisa.fly   │         │   22.8 │     3.3× │     100% │ 74MB │ --solver=cvc5 │
│ verify  │ fol/learning_switch.fly                  │         │    1.0 │     1.6× │     100% │ 29MB │ --solver=cvc5 │
│ verify  │ fol/lockserv.fly                         │         │    0.1 │     4.5× │     100% │ 15MB │ --solver=cvc5 │
│ verify  │ fol/multi_paxos_epr.fly                  │         │   18.1 │     2.9× │     100% │ 91MB │ --solver=cvc5 │
│ verify  │ fol/paxos_epr.fly                        │ timeout │   30.0 │     0.2× │     100% │ 61MB │ --solver=cvc5 │
│ verify  │ fol/paxos_forall.fly                     │         │    2.4 │     3.8× │     100% │ 44MB │ --solver=cvc5 │
│ verify  │ fol/paxos_forall_choosable.fly           │         │    3.6 │     2.8× │     100% │ 56MB │ --solver=cvc5 │
│ verify  │ fol/raft.fly                             │ fail    │    0.0 │     0.4× │      89% │  1MB │ --solver=cvc5 │
│ verify  │ fol/ring_id.fly                          │         │    0.4 │     1.5× │     100% │ 30MB │ --solver=cvc5 │
│ verify  │ fol/ring_id_not_dead.fly                 │         │    4.4 │     2.5× │     100% │ 45MB │ --solver=cvc5 │
│ verify  │ fol/sharded_kv.fly                       │         │    0.2 │     2.8× │     100% │ 18MB │ --solver=cvc5 │
│ verify  │ fol/sharded_kv_no_lost_keys.fly          │         │    1.0 │     1.4× │     100% │ 35MB │ --solver=cvc5 │
│ verify  │ fol/stoppable_paxos_epr.fly              │ timeout │   30.0 │     0.1× │     100% │ 36MB │ --solver=cvc5 │
│ verify  │ fol/stoppable_paxos_forall.fly           │ timeout │   30.0 │     2.1× │     100% │ 81MB │ --solver=cvc5 │
│ verify  │ fol/stoppable_paxos_forall_choosable.fly │ timeout │   30.0 │     0.6× │     100% │ 85MB │ --solver=cvc5 │
│ verify  │ fol/ticket.fly                           │         │    0.3 │     3.9× │     100% │ 27MB │ --solver=cvc5 │
│ verify  │ fol/toy_consensus_epr.fly                │         │    0.4 │     2.1× │     100% │ 34MB │ --solver=cvc5 │
│ verify  │ fol/toy_consensus_forall.fly             │         │    0.1 │     2.6× │      99% │ 19MB │ --solver=cvc5 │
│ verify  │ fol/vertical_paxos_epr.fly               │ timeout │   30.0 │     0.1× │     100% │ 50MB │ --solver=cvc5 │
│ verify  │ fol/vertical_paxos_forall.fly            │ timeout │   30.0 │     0.3× │     100% │ 56MB │ --solver=cvc5 │
│ verify  │ fol/vertical_paxos_forall_choosable.fly  │ timeout │   30.0 │     1.0× │     100% │ 72MB │ --solver=cvc5 │
│ verify  │ lockserver.fly                           │         │    0.1 │     4.7× │     100% │ 15MB │ --solver=cvc5 │
│ verify  │ paxos_epr.fly                            │ timeout │   30.0 │     0.2× │     100% │ 59MB │ --solver=cvc5 │
│ verify  │ pingpong.fly                             │         │    0.0 │     0.6× │      89% │  7MB │ --solver=cvc5 │
╰─────────┴──────────────────────────────────────────┴─────────┴────────┴──────────┴──────────┴──────┴───────────────╯
total:    43
ok:       27
timeout:  13
fail:     3
```

---------

Signed-off-by: Tej Chajed <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants