This repository contains the code for a thesis exploring how large language models can derive state abstractions for a simple grid‑world game. Heavy computation runs in Rust for performance and memory safety, while Python orchestrates configuration, LLM calls and evaluation.
For full documentation, visit dennislent.github.io/llm-abstraction
├── src/ # Rust crate with core game logic and utilities
├── llm_abstraction/ # Python package wrapping the Rust library and analysis helpers
├── main.py # Command line entry point for experiments
├── container/ # Apptainer/Singularity definition used in CI and HPC environments
└── tests/ # Python and Rust tests
- Python 3.10+
- rustup (nightly toolchain)
- Linux (tested on Ubuntu 24.04 LTS & Manjaro 6.12)
Install all dependencies and build the Rust extension with:
./setup.sh
Configuration lives in config.yml and config_prompts.yml. The CLI exposes several commands:
python main.py preview-prompts # print generated prompts
python main.py preview-maps # save map PNGs and metadata to outputs/
python main.py mcts # run baseline MCTS agents
python main.py score-prompts -i 0 -m llama2 # score abstractions for a model
python main.py benchmark-llm -i 0 -m llama2 # run MCTS with LLM abstraction
python main.py analysis # produce plots and ranking tables
Results are written to the outputs/ directory.
- config.yml – specifies grid maps, simulation settings under mcts_variables, and which prompt compositions to use via llm.
- config_prompts.yml – defines reusable prompt fragments referenced by config.yml.
The utilities read these files to decide which maps to process, how prompts are assembled and how evaluations run.
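As a rough illustration of how prompt compositions could be assembled from reusable fragments, the sketch below uses plain dictionaries in place of the parsed YAML files. The key names (`compositions`, `role`, `task`, `format`), the fragment texts and the helper `assemble_prompts` are hypothetical and are not taken from the project's actual schema; see config.yml and config_prompts.yml for the real layout.

```python
# Hypothetical stand-ins for the parsed contents of config_prompts.yml and
# config.yml. All key names and fragment texts are illustrative assumptions.
prompt_fragments = {
    "role": "You are an expert in reinforcement learning.",
    "task": "Group equivalent grid cells into abstract states.",
    "format": "Answer with a JSON list of clusters.",
}
config = {
    "llm": {
        "compositions": [
            ["role", "task"],
            ["role", "task", "format"],
        ]
    }
}


def assemble_prompts(config, fragments):
    """Join the fragments referenced by each composition into one prompt string."""
    prompts = []
    for names in config["llm"]["compositions"]:
        # Fail early if a composition references a fragment that is not defined.
        missing = [name for name in names if name not in fragments]
        if missing:
            raise KeyError(f"unknown prompt fragments: {missing}")
        prompts.append("\n".join(fragments[name] for name in names))
    return prompts


prompts = assemble_prompts(config, prompt_fragments)
```

Keeping fragments in a separate file means a single fragment can be reworded once and reused across every composition that references it.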
Python style is enforced with flake8. The list of ignored rules and their justification is documented in docs/flake8-ignores.md.
The rust_core Python module exposes the Rust computation routines:
- PyRunner – run simulations and MCTS from Python
- max_returns and min_turns – compute theoretical bounds for a world
- visualize_world_map and visualize_abstraction – render grids and abstractions as PNGs
- generate_representations_py – produce JSON, text and adjacency list representations
- generate_mdp – build transition and reward matrices with cluster labels
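To make the relationship between a transition matrix and cluster labels concrete, here is a minimal pure-Python sketch of collapsing ground states into abstract states. It does not call rust_core and does not reflect the signature or output format of generate_mdp. The function `aggregate_transitions`, the uniform weighting of states within a cluster, and the toy four-state world are all assumptions made for illustration.

```python
def aggregate_transitions(P, labels):
    """Collapse a ground transition matrix into an abstract one.

    P[s][t] is the probability of moving from ground state s to t, and
    labels[s] is the cluster (abstract state) of s. Ground states inside a
    cluster are weighted uniformly, which is an assumption of this sketch.
    """
    k = max(labels) + 1
    sizes = [labels.count(c) for c in range(k)]
    A = [[0.0] * k for _ in range(k)]
    for s, row in enumerate(P):
        for t, p in enumerate(row):
            # Average the outgoing probability mass over the source cluster.
            A[labels[s]][labels[t]] += p / sizes[labels[s]]
    return A


# Four ground states on a line; the agent always moves right and the last
# state is absorbing.
P = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
]
labels = [0, 0, 1, 1]  # states 0 and 1 form cluster 0; states 2 and 3 form cluster 1
A = aggregate_transitions(P, labels)
# A == [[0.5, 0.5], [0.0, 1.0]]
```

Each row of the abstract matrix still sums to 1, so the result remains a valid transition matrix over the smaller state space.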
Run the full test suite (Rust and Python):
cargo test
pytest
Continuous integration additionally runs a small end‑to‑end check that executes the CLI on sample data to ensure the Python and Rust components integrate correctly.
This project is released under the MIT license.