All notable changes to this project will be documented in this file.
- The InterCode webpage has been modified to be a leaderboard style 🏆.
- If you evaluate on InterCode and would like to put your results on the leaderboard, please create an issue or email John directly 📧.
- We wrote a standalone report describing the operational InterCode-CTF 🚩 environment, a dataset of 100 task instances, and our initial experiments.
- 🚨 New Environment! The recently released SWE-bench benchmark introduces software engineering as a task. To support agent-based approaches, we have released the IC-SWE-bench environment, which presents the SWE-bench task in an interactive setting!
✍🏻 John
I am pleased to announce that, since its initial release, InterCode has been extended to support a number of new languages and datasets. The additions are summarized as follows:
- New Supported Datasets:
- Python Support:
  - Interpreter-Style Environment + Dockerfile
  - Single Turn + Try Again results on Python Environment + MBPP will be uploaded soon to the `data/results` folder
  - Try it out with `python run_demo.py python`
- CTF Environment:
  - `ctf_env.py` has been rewritten to:
    - Depend on a single Dockerfile for multiple task instances
    - Use the `InterCodeEnv` abstraction such that it is implemented in just 30 lines
  - The CTF environment has been integrated into the `run_demo.py` script. Try it out with `python run_demo.py ctf`
  - The CTF dataset will continue to grow as we source and create more problems.
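To illustrate why building on a shared abstraction keeps an environment short, here is a minimal, self-contained sketch. It does not use InterCode's actual API: `BaseEnv`, `ToyCTFEnv`, the `submit ` prefix, and the task record fields are all hypothetical names chosen for illustration, and no Docker container is started. The idea it shows is that the base class owns the episode loop while the subclass supplies only task-specific behavior, with every task instance served by the same environment object.

```python
from abc import ABC, abstractmethod


class BaseEnv(ABC):
    """Hypothetical stand-in for an environment abstraction (not InterCode's class)."""

    def __init__(self, tasks):
        self.tasks = tasks  # list of {"query": ..., "gold": ...} records
        self.task = None

    def reset(self, index):
        """Select a task instance; the same environment object is reused."""
        self.task = self.tasks[index]
        return self.task["query"]

    def step(self, action):
        """Run an action, or score the episode when the agent submits an answer."""
        if action.startswith("submit "):
            return None, self.get_reward(action[len("submit "):]), True
        return self.exec_action(action), 0.0, False

    @abstractmethod
    def exec_action(self, action):
        """Execute one action and return an observation."""

    @abstractmethod
    def get_reward(self, answer):
        """Score a submitted answer."""


class ToyCTFEnv(BaseEnv):
    """Only task-specific behavior lives in the subclass."""

    def exec_action(self, action):
        # A real environment would run the command in a container; this echoes it.
        return f"ran: {action}"

    def get_reward(self, answer):
        # Reward 1.0 only when the submitted flag matches the task's gold flag.
        return 1.0 if answer == self.task["gold"] else 0.0


# Two made-up task instances sharing one environment.
tasks = [
    {"query": "Find the flag in /ctf/0", "gold": "flag{zero}"},
    {"query": "Find the flag in /ctf/1", "gold": "flag{one}"},
]
env = ToyCTFEnv(tasks)
```

Calling `env.reset(1)` followed by `env.step("submit flag{one}")` walks through one episode on the second task. For the real class and its method names, consult `ctf_env.py` and the wiki.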
✍🏻 John
Introducing the initial release of InterCode, a lightweight, flexible, and easy-to-use framework for designing interactive code environments. Please view the README.md and wiki pages for information on how to build and use InterCode.