This repository contains the code for multiple Preparedness evals that use nanoeval and alcatraz.
- Python 3.11 (3.12 is untested; 3.13 will break chz)
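
The version constraint above can be checked before installing anything. The following is a minimal sketch, not part of this repository: the function name `classify_python_version` and the labels it prints are hypothetical, and the mapping only restates the note above (3.11 supported, 3.12 untested, 3.13 breaks chz). Versions the note does not mention are reported as unknown rather than guessed.

```shell
# Hypothetical helper (not provided by this repository).
# Takes a "major.minor" version string and prints a status label
# based only on the requirement stated in this README.
classify_python_version() {
    case "$1" in
        3.11) echo "supported" ;;
        3.12) echo "untested" ;;
        3.13) echo "unsupported" ;;   # the README notes 3.13 will break chz
        *)    echo "unknown" ;;       # not covered by the README
    esac
}

# Example usage against the interpreter on PATH:
#   classify_python_version "$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
```

Keeping the classification separate from the interpreter lookup means the function can be exercised with plain strings, without any particular Python installed.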

Install the prerequisite packages in editable mode:

    for proj in nanoeval alcatraz nanoeval_alcatraz; do
        pip install -e project/"$proj"
    done

Evals:
- PaperBench
- SWELancer (Forthcoming)
- MLE-bench (Forthcoming)