-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add scripts to produce performance benchmarks across all examples #118
Comments
Thanks for filing this issue! I have a couple of questions.
|
macOS has the same You can get resource usage for all children of the calling process with |
I see, so the problem with using |
The user-visible part of this is a library that allows running `temporal-verifier` with a timeout and gathering some runtime resource usage statistics as well as exit status, and a function to produce a nice table of all the results. I've used this in the simplest possible way to generate the results of running `temporal-verifier verify` over all the benchmarks, with the results shown below. This is the core infrastructure needed for #118. The remaining work (hopefully small) is to add configurations to do something similar for running qalpha and the bounded model checkers. These will be a bit more work since those have more parameters to configure. The infrastructure to monitor a benchmark is highly Unix-specific so it works on macOS and Linux and will not work on Windows (but it should work on WSL). To produce this table run `cargo run -r --bin benchmark -- verify --time-limit 30s`. ```txt ╭─────────┬──────────────────────────────────────────┬─────────┬────────┬──────────┬──────────┬────────┬─────────────╮ │ command │ file │ outcome │ time s │ cpu util │ solver % │ mem │ params │ ├─────────┼──────────────────────────────────────────┼─────────┼────────┼──────────┼──────────┼────────┼─────────────┤ │ verify │ consensus.fly │ fail │ 0.1 │ 2.5× │ 99% │ 24MB │ --solver=z3 │ │ verify │ consensus_epr.fly │ │ 0.1 │ 2.4× │ 99% │ 24MB │ --solver=z3 │ │ verify │ consensus_forall.fly │ │ 0.1 │ 2.6× │ 99% │ 23MB │ --solver=z3 │ │ verify │ fast_paxos_epr.fly │ │ 24.1 │ 1.4× │ 100% │ 172MB │ --solver=z3 │ │ verify │ fol/block_cache_system_frozen_async.fly │ timeout │ 30.0 │ 1.3× │ 100% │ 1817MB │ --solver=z3 │ │ verify │ fol/bosco_3t_safety.fly │ timeout │ 30.0 │ 0.7× │ 100% │ 313MB │ --solver=z3 │ │ verify │ fol/cache.fly │ │ 0.4 │ 6.5× │ 100% │ 25MB │ --solver=z3 │ │ verify │ fol/client_server_ae.fly │ fail │ 0.0 │ 1.3× │ 98% │ 23MB │ --solver=z3 │ │ verify │ fol/client_server_db_ae.fly │ │ 0.0 │ 2.0× │ 99% │ 21MB │ --solver=z3 │ │ verify │ fol/consensus_epr.fly │ │ 0.1 │ 2.6× │ 99% │ 24MB │ --solver=z3 │ │ verify │ fol/consensus_forall.fly │ │ 0.1 │ 2.8× │ 99% │ 23MB │ --solver=z3 │ │ verify │ fol/consensus_wo_decide.fly │ │ 0.1 │ 2.3× │ 99% │ 23MB │ --solver=z3 │ │ verify │ fol/fast_paxos_epr.fly │ │ 24.2 │ 1.5× │ 100% │ 175MB │ --solver=z3 │ │ verify │ fol/fast_paxos_forall.fly │ timeout │ 30.0 │ 0.9× │ 100% │ 195MB │ --solver=z3 │ │ verify │ fol/fast_paxos_forall_choosable.fly │ timeout │ 30.0 │ 0.0× │ 100% │ 35MB │ --solver=z3 │ │ verify │ fol/firewall.fly │ │ 0.0 │ 1.0× │ 97% │ 21MB │ --solver=z3 │ │ verify │ fol/flexible_paxos_epr.fly │ │ 0.2 │ 2.5× │ 100% │ 25MB │ --solver=z3 │ │ verify │ fol/flexible_paxos_forall.fly │ │ 0.4 │ 1.9× │ 100% │ 32MB │ --solver=z3 │ │ verify │ fol/flexible_paxos_forall_choosable.fly │ │ 2.9 │ 1.1× │ 100% │ 56MB │ --solver=z3 │ │ verify │ fol/hybrid_reliable_broadcast_cisa.fly │ │ 0.4 │ 4.2× │ 100% │ 31MB │ --solver=z3 │ │ verify │ fol/learning_switch.fly │ │ 0.1 │ 2.1× │ 99% │ 21MB │ --solver=z3 │ │ verify │ fol/lockserv.fly │ │ 0.1 │ 3.1× │ 99% │ 22MB │ --solver=z3 │ │ verify │ fol/multi_paxos_epr.fly │ timeout │ 30.0 │ 0.0× │ 100% │ 26MB │ --solver=z3 │ │ verify │ fol/paxos_epr.fly │ │ 1.0 │ 1.4× │ 100% │ 39MB │ --solver=z3 │ │ verify │ fol/paxos_forall.fly │ │ 0.2 │ 3.3× │ 100% │ 28MB │ --solver=z3 │ │ verify │ fol/paxos_forall_choosable.fly │ timeout │ 30.0 │ 0.0× │ 100% │ 28MB │ --solver=z3 │ │ verify │ fol/raft.fly │ fail │ 0.0 │ 0.4× │ 88% │ 1MB │ --solver=z3 │ │ verify │ fol/ring_id.fly │ │ 0.1 │ 1.9× │ 99% │ 23MB │ --solver=z3 │ │ verify │ fol/ring_id_not_dead.fly │ │ 0.5 │ 1.5× │ 100% │ 33MB │ --solver=z3 │ │ verify │ fol/sharded_kv.fly │ │ 0.1 │ 1.9× │ 99% │ 23MB │ --solver=z3 │ │ verify │ fol/sharded_kv_no_lost_keys.fly │ │ 0.1 │ 1.8× │ 99% │ 24MB │ --solver=z3 │ │ verify │ fol/stoppable_paxos_epr.fly │ timeout │ 30.0 │ 0.2× │ 100% │ 94MB │ --solver=z3 │ │ verify │ fol/stoppable_paxos_forall.fly │ timeout │ 30.0 │ 0.3× │ 100% │ 90MB │ --solver=z3 │ │ verify │ fol/stoppable_paxos_forall_choosable.fly │ timeout │ 30.0 │ 0.1× │ 100% │ 46MB │ --solver=z3 │ │ verify │ fol/ticket.fly │ │ 0.1 │ 4.4× │ 100% │ 23MB │ --solver=z3 │ │ verify │ fol/toy_consensus_epr.fly │ │ 0.1 │ 1.9× │ 99% │ 23MB │ --solver=z3 │ │ verify │ fol/toy_consensus_forall.fly │ │ 0.0 │ 1.8× │ 98% │ 22MB │ --solver=z3 │ │ verify │ fol/vertical_paxos_epr.fly │ │ 3.1 │ 1.2× │ 100% │ 68MB │ --solver=z3 │ │ verify │ fol/vertical_paxos_forall.fly │ │ 0.3 │ 3.7× │ 100% │ 27MB │ --solver=z3 │ │ verify │ fol/vertical_paxos_forall_choosable.fly │ │ 1.2 │ 1.7× │ 100% │ 49MB │ --solver=z3 │ │ verify │ lockserver.fly │ │ 0.0 │ 3.2× │ 99% │ 22MB │ --solver=z3 │ │ verify │ paxos_epr.fly │ │ 0.9 │ 1.2× │ 100% │ 37MB │ --solver=z3 │ │ verify │ pingpong.fly │ │ 0.0 │ 0.5× │ 92% │ 19MB │ --solver=z3 │ ╰─────────┴──────────────────────────────────────────┴─────────┴────────┴──────────┴──────────┴────────┴─────────────╯ total: 43 ok: 31 timeout: 9 fail: 3 ``` To produce this table run `cargo run --bin benchmark -- verify --time-limit 30s --solver cvc5`. ```txt ╭─────────┬──────────────────────────────────────────┬─────────┬────────┬──────────┬──────────┬──────┬───────────────╮ │ command │ file │ outcome │ time s │ cpu util │ solver % │ mem │ params │ ├─────────┼──────────────────────────────────────────┼─────────┼────────┼──────────┼──────────┼──────┼───────────────┤ │ verify │ consensus.fly │ fail │ 0.2 │ 2.2× │ 100% │ 20MB │ --solver=cvc5 │ │ verify │ consensus_epr.fly │ │ 1.1 │ 1.5× │ 100% │ 33MB │ --solver=cvc5 │ │ verify │ consensus_forall.fly │ │ 0.3 │ 1.9× │ 100% │ 25MB │ --solver=cvc5 │ │ verify │ fast_paxos_epr.fly │ timeout │ 30.0 │ 0.1× │ 100% │ 39MB │ --solver=cvc5 │ │ verify │ fol/block_cache_system_frozen_async.fly │ │ 7.3 │ 6.7× │ 100% │ 59MB │ --solver=cvc5 │ │ verify │ fol/bosco_3t_safety.fly │ timeout │ 30.0 │ 0.5× │ 100% │ 78MB │ --solver=cvc5 │ │ verify │ fol/cache.fly │ │ 1.6 │ 8.2× │ 100% │ 36MB │ --solver=cvc5 │ │ verify │ fol/client_server_ae.fly │ fail │ 0.0 │ 1.6× │ 98% │ 13MB │ --solver=cvc5 │ │ verify │ fol/client_server_db_ae.fly │ │ 0.1 │ 2.8× │ 100% │ 17MB │ --solver=cvc5 │ │ verify │ fol/consensus_epr.fly │ │ 1.0 │ 1.6× │ 100% │ 32MB │ --solver=cvc5 │ │ verify │ fol/consensus_forall.fly │ │ 0.3 │ 2.1× │ 100% │ 25MB │ --solver=cvc5 │ │ verify │ fol/consensus_wo_decide.fly │ │ 0.2 │ 1.8× │ 100% │ 25MB │ --solver=cvc5 │ │ verify │ fol/fast_paxos_epr.fly │ timeout │ 30.0 │ 0.1× │ 100% │ 43MB │ --solver=cvc5 │ │ verify │ fol/fast_paxos_forall.fly │ timeout │ 30.0 │ 0.1× │ 100% │ 45MB │ --solver=cvc5 │ │ verify │ fol/fast_paxos_forall_choosable.fly │ timeout │ 30.0 │ 0.5× │ 100% │ 63MB │ --solver=cvc5 │ │ verify │ fol/firewall.fly │ │ 0.2 │ 1.9× │ 100% │ 23MB │ --solver=cvc5 │ │ verify │ fol/flexible_paxos_epr.fly │ │ 22.8 │ 2.2× │ 100% │ 90MB │ --solver=cvc5 │ │ verify │ fol/flexible_paxos_forall.fly │ │ 2.4 │ 2.4× │ 100% │ 42MB │ --solver=cvc5 │ │ verify │ fol/flexible_paxos_forall_choosable.fly │ │ 3.8 │ 2.6× │ 100% │ 56MB │ --solver=cvc5 │ │ verify │ fol/hybrid_reliable_broadcast_cisa.fly │ │ 22.8 │ 3.3× │ 100% │ 74MB │ --solver=cvc5 │ │ verify │ fol/learning_switch.fly │ │ 1.0 │ 1.6× │ 100% │ 29MB │ --solver=cvc5 │ │ verify │ fol/lockserv.fly │ │ 0.1 │ 4.5× │ 100% │ 15MB │ --solver=cvc5 │ │ verify │ fol/multi_paxos_epr.fly │ │ 18.1 │ 2.9× │ 100% │ 91MB │ --solver=cvc5 │ │ verify │ fol/paxos_epr.fly │ timeout │ 30.0 │ 0.2× │ 100% │ 61MB │ --solver=cvc5 │ │ verify │ fol/paxos_forall.fly │ │ 2.4 │ 3.8× │ 100% │ 44MB │ --solver=cvc5 │ │ verify │ fol/paxos_forall_choosable.fly │ │ 3.6 │ 2.8× │ 100% │ 56MB │ --solver=cvc5 │ │ verify │ fol/raft.fly │ fail │ 0.0 │ 0.4× │ 89% │ 1MB │ --solver=cvc5 │ │ verify │ fol/ring_id.fly │ │ 0.4 │ 1.5× │ 100% │ 30MB │ --solver=cvc5 │ │ verify │ fol/ring_id_not_dead.fly │ │ 4.4 │ 2.5× │ 100% │ 45MB │ --solver=cvc5 │ │ verify │ fol/sharded_kv.fly │ │ 0.2 │ 2.8× │ 100% │ 18MB │ --solver=cvc5 │ │ verify │ fol/sharded_kv_no_lost_keys.fly │ │ 1.0 │ 1.4× │ 100% │ 35MB │ --solver=cvc5 │ │ verify │ fol/stoppable_paxos_epr.fly │ timeout │ 30.0 │ 0.1× │ 100% │ 36MB │ --solver=cvc5 │ │ verify │ fol/stoppable_paxos_forall.fly │ timeout │ 30.0 │ 2.1× │ 100% │ 81MB │ --solver=cvc5 │ │ verify │ fol/stoppable_paxos_forall_choosable.fly │ timeout │ 30.0 │ 0.6× │ 100% │ 85MB │ --solver=cvc5 │ │ verify │ fol/ticket.fly │ │ 0.3 │ 3.9× │ 100% │ 27MB │ --solver=cvc5 │ │ verify │ fol/toy_consensus_epr.fly │ │ 0.4 │ 2.1× │ 100% │ 34MB │ --solver=cvc5 │ │ verify │ fol/toy_consensus_forall.fly │ │ 0.1 │ 2.6× │ 99% │ 19MB │ --solver=cvc5 │ │ verify │ fol/vertical_paxos_epr.fly │ timeout │ 30.0 │ 0.1× │ 100% │ 50MB │ --solver=cvc5 │ │ verify │ fol/vertical_paxos_forall.fly │ timeout │ 30.0 │ 0.3× │ 100% │ 56MB │ --solver=cvc5 │ │ verify │ fol/vertical_paxos_forall_choosable.fly │ timeout │ 30.0 │ 1.0× │ 100% │ 72MB │ --solver=cvc5 │ │ verify │ lockserver.fly │ │ 0.1 │ 4.7× │ 100% │ 15MB │ --solver=cvc5 │ │ verify │ paxos_epr.fly │ timeout │ 30.0 │ 0.2× │ 100% │ 59MB │ --solver=cvc5 │ │ verify │ pingpong.fly │ │ 0.0 │ 0.6× │ 89% │ 7MB │ --solver=cvc5 │ ╰─────────┴──────────────────────────────────────────┴─────────┴────────┴──────────┴──────────┴──────┴───────────────╯ total: 43 ok: 27 timeout: 13 fail: 3 ``` --------- Signed-off-by: Tej Chajed <[email protected]>
We should have a way to run performance benchmarks across all the examples, which encodes the configuration for algorithms like inference and bounded model checking and gathers timing and memory usage.
I envision this being written in Rust, with a small library of functions for running the flyvy binary and gathering statistics. We'll use the same mechanism as
time
, but perhaps implemented directly in Rust callinggetrusage
directly rather than wrapping the system binary. This will make it possible to get gather memory usage over time as well by samplinggetrusage
.Using
getrusage
isn't completely trivial because it only gets statistics for the calling process (and children), not for an arbitrary child pid. I may implement this by starting out usingtime
and then replacing it with our own version that reports the results in JSON.The text was updated successfully, but these errors were encountered: