The modulAr consTrained Horn clauses modEl validatioN frAmework, ATHENA for short, is a framework for the validation of models produced by CHC solvers. It executes the selected CHC solver(s) for a given set of inputs and if a result is SAT it produces and solves the necessary SMT instances to validate the CHC model. Different SMT solvers can be selected for validation, and if the selected solver produces UNSAT proofs capable of being checked, the framework automatically executes the necessary proof checker.
CHC solvers supported: Eldarica, Golem, and Spacer.
SMT solvers supported: cvc5, OpenSMT, SMTInterpol, veriT, and Z3.
Proof checkers supported: alfc 1, Carcara, LFSC checker, SMTInterpol checker, and TSWC.
ATHENA has been developed for a Linux environment and requires the following dependencies:
- GNU parallel to execute the selected tools.
- Python 3 to gather and display the results obtained.
- gnuplot to produce plots on demand.
To execute the framework, go to the scripts
folder and execute:
evaluate_all.sh [CHC solver] [SMT solver] [SMT mode] [Target] [Num. of threads]
[CHC solver]
is one of all
, none
, eldarica
, golem
, or spacer
.
[SMT solver]
is one of all
, none
, cvc5
, opensmt
, smtinterpol
, verit
, or z3
.
[SMT mode]
is one of all
, proof
, or noProof
.
[Target]
is the folder containing the CHC benchmarks, e.g., test
.
[Num. of threads]
is a positive integer.
By default ATHENA searches for the tools it needs in a local folder named binaries
, containing subfolders for chc_solvers
, smt_solvers
, and smt_checkers
. The tools' paths and options, as well as the time and memory limits, can be set in the run_*.sh
files. Please follow, or update, the current tool paths before executing; the tools' binaries have to be gathered separately.
The input CHC files have to follow the CHC-COMP format. If your files are following a different format, please use the CHC-COMP formatter to adjust them.
To gather the average sizes of models and proofs, as well as the average runtimes and memory consumptions, execute log_stats_all.sh
followed by python3 avg_stats_all.py
in the scripts
folder. After that, the classification of individual tool results obtained, e.g., SAT or UNSAT, can be seen by executing stats_all.sh
. To generate plots displaying the sizes of models and proofs execute make_gnuplot_proof-size.sh
.
To aggregate all individual tool results and to establish the validity of the CHC benchmarks execute log_results_all.sh
. After that, the aggregate results can be displayed by executing results_all.sh
.
The framework is currently capable of validating models produced for LIA and ALIA benchmarks, and has been used in the validation of the benchmarks from CHC-COMP 2022 and CHC-COMP 2024.
-
The 2022 benchmarks used came from the LIA-lin and LIA-nonlin tracks, and have been validated with the following tools: Eldarica v2.0.8, Golem v0.4.2, Z3 (Spacer) v4.12.1, cvc5 v1.0.5, OpenSMT v2.5.0, SMTInterpol (checker) v2.5-1259-gf8124b1f, veriT v2021.06.2-rmx, Carcara v1.0.0, LFSC checker 9aab068, and the latest version of TSWC.
-
The 2024 benchmarks used came from the LIA-lin, LIA-nonlin, LIA-Arrays-lin, and LIA-Arrays-nonlin tracks, and have been validated with the following tools: Eldarica v2.1, Golem v0.5.0, Z3 (Spacer) v4.13, cvc5 v1.1.2, OpenSMT v2.6.0, SMTInterpol (checker) v2.5-1256-g55d6ba76, veriT v2021.06.2-rmx, alfc eca2cbd, Carcara v1.1.0, LFSC checker 5a127db, and the latest version of TSWC.
A large part of the SMT instance generator was implemented by Fedor Gromov.
To know more about the theory behind ATHENA, as well as its applications, have a look at our iFM23 paper.
Footnotes
-
The alfc checker has been recently rebranded ethos. According to @ajreynol this was only a name change on the checker side, with the codebase carrying over from alfc. Together with this change, however, the relation between cvc5 and alfc/ethos has been refactored and ATHENA has not yet been updated to accommodate for it. For the time being, support is available for cvc5 v1.1.2 and alfc eca2cbd. ↩