Codebase for quickly implementing experiments with toy models of superposition.
To set up the environment, see the setup docs for detailed instructions.
To reproduce figures from Anthropic's paper "Toy Models of Superposition", see the sample script.
This codebase is heavily adapted from the ARENA 3.0 codebase, designed and maintained by Callum McDougall. Many thanks to Callum and the ARENA team!