Agent leveraging auxiliary tools to improve performance in selected problem domains. This is the reproduction package for our research paper *Logic.py: Bridging the Gap between LLMs and Constraint Solvers*.
- Currently, there is no default LLM inference provider available. We use an
  internal provider for our experiments, which is not part of the open source
  release. To get started, create an implementation of `chat_completion.py`
  for your inference back-end, e.g. in `inference/your_inference_provider.py`.
  Then set the following two variables in a `.env` file:

  ```
  CHAT_COMPLETION_MODULE_PATH=path/to/your/chat_completion.py
  CHAT_COMPLETION_CLASS_NAME=YourChatCompletion
  ```

  If your provider requires secrets, we suggest adding them to the `.env` file
  as well and loading them using `os.getenv(...)`. You can use the
  `.env-example` as a starting point. An illustrative provider sketch is shown
  below.
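  As an illustration only, a provider implementation might look like the
  following minimal sketch. The base-class name, the method name and
  signature, and the environment variables `YOUR_PROVIDER_ENDPOINT` and
  `YOUR_PROVIDER_API_KEY` are assumptions made for this example; consult
  `chat_completion.py` in this repository for the actual interface your class
  must implement.

  ```python
  # inference/your_inference_provider.py -- illustrative sketch only;
  # the base class and method signature below are assumptions, not the
  # repository's actual interface.
  import os

  import requests  # assumed HTTP client for this hypothetical back-end

  from chat_completion import ChatCompletion  # base-class name is an assumption


  class YourChatCompletion(ChatCompletion):
      """Example provider that forwards chat messages to an HTTP endpoint."""

      def __init__(self) -> None:
          # Secrets come from the .env file rather than being hard-coded.
          self.endpoint = os.getenv("YOUR_PROVIDER_ENDPOINT")
          self.api_key = os.getenv("YOUR_PROVIDER_API_KEY")

      def chat_completion(self, messages: list[dict]) -> str:
          # Forward the chat history to the back-end and return the reply text.
          response = requests.post(
              self.endpoint,
              headers={"Authorization": f"Bearer {self.api_key}"},
              json={"messages": messages},
              timeout=60,
          )
          response.raise_for_status()
          return response.json()["choices"][0]["message"]["content"]
  ```

  If you follow this sketch, the (hypothetical) `YOUR_PROVIDER_ENDPOINT` and
  `YOUR_PROVIDER_API_KEY` entries would go into the same `.env` file as the
  two variables above.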
- The next step is to set up the Conda environment:

  ```
  conda env create --file environment.yml
  conda activate polymath
  ```

- Log into your Hugging Face account to download datasets:

  ```
  huggingface-cli login
  ```

  On Hugging Face, you need to request access to the following two datasets.
  Access is granted immediately upon filling out a form:

  - https://huggingface.co/datasets/yale-nlp/FOLIO
  - https://huggingface.co/datasets/allenai/ZebraLogicBench-private
- Finally, install datasets and remaining dependencies:

  ```
  ./scripts/setup.sh
  conda env update --file environment.yml
  ```

Note: Some unit tests expect a working LLM inference setup.
To run all tests, use:

```
python -m unittest discover
```

To only run specific tests, you can run:

```
python -m unittest agent.symex.tests.test_module_with_type_info_factory -k test_single
```

To run the ZebraLogicBench benchmark set using our logic agent, use:

```
python -m agent.logic.zebra_benchmark
```

This will produce an output JSON file that we evaluate using the original
ZeroEval environment.
To set up a ZeroEval Conda environment, follow these instructions adapted
from their README.md:
```
cd lib/ZeroEval
conda create -n zeroeval python=3.10
conda activate zeroeval
pip install vllm -U
pip install -r requirements.txt
```

Afterwards, you can run their evaluation using:

```
python src/evaluation/zebra_grid_eval.py
```

This will update `result_dirs/zebra-grid.summary.md` to include the output
JSON generated by our logic agent.
To run the FOLIO benchmark set using our logic agent, use:

```
python -m agent.logic.folio_benchmark
```

Support for τ-bench is a work in progress; this section will be updated once
the integration is complete.
This repository uses open source repositories, or public forks thereof, to make it easy for users to build the respective libraries and tools. This is purely for convenience, and users are free to download these same tools and libraries from their original repositories.
Polymath is licensed under CC BY-NC 4.0, as found in the LICENSE file.