Agent leveraging auxiliary tools to improve performance in selected problem domains. This is the reproduction package for our research paper *Logic.py: Bridging the Gap between LLMs and Constraint Solvers*.
- Currently, there is no default LLM inference provider available. We use an
  internal provider for our experiments, which is not part of the open source
  release. To get started, create an implementation of `chat_completion.py`
  for your inference back-end, e.g. in `inference/your_inference_provider.py`.
  Then set the following two variables in a `.env` file:

  ```
  CHAT_COMPLETION_MODULE_PATH=path/to/your/chat_completion.py
  CHAT_COMPLETION_CLASS_NAME=YourChatCompletion
  ```

  If your provider requires secrets, we suggest adding them to the `.env` file
  as well and loading them using `os.getenv(...)`. You can use the
  `.env-example` as a starting point. An illustrative provider sketch is shown
  below.
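  As an illustration only, a provider implementation might look like the
  following minimal sketch. The base-class name, the method name and
  signature, and the environment variables `YOUR_PROVIDER_ENDPOINT` and
  `YOUR_PROVIDER_API_KEY` are assumptions made for this example; consult
  `chat_completion.py` in this repository for the actual interface your class
  must implement.

  ```python
  # inference/your_inference_provider.py -- illustrative sketch only;
  # the base class and method signature below are assumptions, not the
  # repository's actual interface.
  import os

  import requests  # assumed HTTP client for this hypothetical back-end

  from chat_completion import ChatCompletion  # base-class name is an assumption


  class YourChatCompletion(ChatCompletion):
      """Example provider that forwards chat messages to an HTTP endpoint."""

      def __init__(self) -> None:
          # Secrets come from the .env file rather than being hard-coded.
          self.endpoint = os.getenv("YOUR_PROVIDER_ENDPOINT")
          self.api_key = os.getenv("YOUR_PROVIDER_API_KEY")

      def chat_completion(self, messages: list[dict]) -> str:
          # Forward the chat history to the back-end and return the reply text.
          response = requests.post(
              self.endpoint,
              headers={"Authorization": f"Bearer {self.api_key}"},
              json={"messages": messages},
              timeout=60,
          )
          response.raise_for_status()
          return response.json()["choices"][0]["message"]["content"]
  ```

  If you follow this sketch, the (hypothetical) `YOUR_PROVIDER_ENDPOINT` and
  `YOUR_PROVIDER_API_KEY` entries would go into the same `.env` file as the
  two variables above.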
- The next step is to set up the Conda environment:

  ```
  conda env create --file environment.yml
  conda activate polymath
  ```

- Log into your Hugging Face account to download datasets:

  ```
  huggingface-cli login
  ```

  On Hugging Face, you need to request access to the following two datasets.
  Access is granted immediately upon filling out a form:

  - https://huggingface.co/datasets/yale-nlp/FOLIO
  - https://huggingface.co/datasets/allenai/ZebraLogicBench-private
- Finally, install datasets and remaining dependencies:

  ```
  ./scripts/setup.sh
  conda env update --file environment.yml
  ```

Note: Some unit tests expect a working LLM inference setup.
To run all tests, use:

```
python -m unittest discover
```

To only run specific tests, you can run:

```
python -m unittest agent.symex.tests.test_module_with_type_info_factory -k test_single
```

To run the ZebraLogicBench benchmark set using our logic agent, use:

```
python -m agent.logic.zebra_benchmark
```

This will produce an output JSON file that we evaluate using the original
ZeroEval environment.
To set up a ZeroEval Conda environment, follow these instructions adapted
from their README.md:
```
cd lib/ZeroEval
conda create -n zeroeval python=3.10
conda activate zeroeval
pip install vllm -U
pip install -r requirements.txt
```

Afterwards, you can run their evaluation using:

```
python src/evaluation/zebra_grid_eval.py
```

This will update `result_dirs/zebra-grid.summary.md` to include the output
JSON generated by our logic agent.
To run the FOLIO benchmark set using our logic agent, use:

```
python -m agent.logic.folio_benchmark
```

Support for τ-bench is a work in progress; this section will be updated once
the integration is complete.
This repository uses open source repositories, or public forks thereof, to make it easy for users to build the respective libraries and tools. This is purely for convenience, and users are free to download these same tools and libraries from their original repositories.
Polymath is licensed under CC BY-NC 4.0, as found in the LICENSE file.