OpenELM

This repository is a replication of Evolution Through Large Models (ELM), a recent paper from OpenAI exploring the links between large language models (LLMs) and evolutionary computing, with a particular focus on code generation.

LLMs trained on datasets of code, such as OpenAI’s Codex, have shown good results in automated code generation. However, when we are interested in a class of programs that is rarely found in the training distribution, evolutionary algorithms provide a way to generate code by mutating known, or "seed", programs. The ELM approach shows that an LLM trained on code can suggest intelligent mutations for genetic programming (GP) algorithms. Genetic algorithms explore the search space with random perturbations, but they typically need to be heavily customised with domain knowledge before they make desirable changes. LLMs provide a way of encoding this domain knowledge and guiding the genetic algorithm towards intelligent exploration of the search space.

This project aims to replicate the ELM paper in the original Sodarace environment, before applying the technique to more complex code generation problems.

For more details, see our full research proposal at https://carperai.notion.site/ELM-e8f37b2649944259b1abf9ccaa4edae2 and the release blog post at https://carper.ai/openelm-release.

Architecture

Roughly, ELM consists of a pipeline of different components:

+-------------+                     +-------------+         
|  MapElites  | <-----------------> | Environment | 
+------+------+                     +------+------+         
       |                                   ^                         
       | collect samples                   |                         
       v                                   v                         
+------+---------+     finetune    +-------+--------+    mutate and execute   +----------------+
| Conditional RL | --------------> | Language model | <---------------------> | Sandbox server |
+----------------+                 +----------------+                         +----------------+

So far we have implemented MapElites, Environment, part of the Language model mutation operator (prompt mutation), and the Sandbox server.

In the next stage, we will complete the RL pipeline for conditional generation.
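The loop between the MapElites and Environment boxes can be sketched in a few lines. This is a minimal, generic MAP-Elites sketch and not the code in this repository: the names `map_elites`, `gaussian_mutate`, and `toy_evaluate` are hypothetical, the genome is a list of floats rather than a program, and the Gaussian mutation is only a placeholder for the role the language model plays in ELM.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Genome = List[float]        # hypothetical stand-in for a program
Niche = Tuple[int, ...]     # index of a cell in the behaviour grid


def map_elites(
    seed: Genome,
    mutate: Callable[[Genome, random.Random], Genome],
    evaluate: Callable[[Genome], Tuple[float, Niche]],
    steps: int = 1000,
    rng: Optional[random.Random] = None,
) -> Dict[Niche, Tuple[float, Genome]]:
    """Return an archive mapping each niche to its best (fitness, genome)."""
    rng = rng or random.Random(0)
    archive: Dict[Niche, Tuple[float, Genome]] = {}
    fitness, niche = evaluate(seed)
    archive[niche] = (fitness, seed)
    for _ in range(steps):
        # Pick a random elite from the archive as the parent.
        _, parent = rng.choice(list(archive.values()))
        child = mutate(parent, rng)
        fitness, niche = evaluate(child)
        # Keep the child if its niche is empty or it beats the current elite.
        if niche not in archive or fitness > archive[niche][0]:
            archive[niche] = (fitness, child)
    return archive


def gaussian_mutate(genome: Genome, rng: random.Random) -> Genome:
    # Placeholder mutation (assumption): ELM would ask a language model
    # to propose an edit to a program here instead of adding noise.
    return [x + rng.gauss(0.0, 0.1) for x in genome]


def toy_evaluate(genome: Genome) -> Tuple[float, Niche]:
    # Toy fitness: closer to the origin is better.
    fitness = -sum(x * x for x in genome)
    # Toy behaviour descriptor: first two coordinates binned into a 10x10 grid
    # covering [-1, 1], with out-of-range values clamped to the edge cells.
    niche = tuple(min(9, max(0, int((x + 1.0) * 5))) for x in genome[:2])
    return fitness, niche
```

Calling `map_elites([0.5, -0.5], gaussian_mutate, toy_evaluate, steps=300)` returns a dictionary keyed by grid cell. Passing the mutation operator as a function is a deliberate choice in this sketch: it shows where a different operator, such as a prompt-based one, could be swapped in without touching the loop.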

Running ELM

Currently, we can run the MAP-Elites algorithm on a few environments, apply prompt mutations, and connect to the sandbox server. The RL components are still in progress.

Setting up the sandbox

Ideally, follow the sandboxing readme to set up the sandbox in a Docker container. For quick testing, you can instead run the server directly:

cd elm/sandbox/server
export FLASK_APP=index.py
flask run
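The sandbox exists so that generated programs are executed away from the main process. As a rough illustration of that idea only, the sketch below runs a candidate program in a child Python interpreter with a timeout. It is not the repository's sandbox server, the name `run_candidate` is hypothetical, and a subprocess with a timeout is not a security boundary, so it does not replace the Docker setup.

```python
import subprocess
import sys


def run_candidate(source: str, timeout_s: float = 5.0) -> dict:
    """Run Python source in a child interpreter and report the outcome."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", source],  # separate process, same interpreter
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The child is killed when the timeout expires.
        return {"ok": False, "stdout": "", "stderr": "timeout"}
    return {
        "ok": proc.returncode == 0,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }
```

A program that raises an exception or runs past the timeout comes back with `ok` set to `False`, which a caller could treat as a failed mutation.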

Running the MAP-Elites

We have implemented a few toy environments as well as the Sodarace environment from the ELM paper. To try them, first set up the sandbox server on the same machine, then follow the steps below.

First, download the codegen-350M model.

wget -P checkpoints https://storage.googleapis.com/sfr-codegen-research/checkpoints/codegen-350M-mono.tar.gz && tar -xvf checkpoints/codegen-350M-mono.tar.gz -C checkpoints/
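Where `wget` is not available, the same download-and-extract step can be approximated with the Python standard library. This is a sketch under assumptions: the helper name `fetch_and_extract` is hypothetical and not part of this repository, and it skips the download if the archive is already present.

```python
import tarfile
import urllib.request
from pathlib import Path


def fetch_and_extract(url: str, dest_dir: str = "checkpoints") -> Path:
    """Download a .tar.gz archive into dest_dir and unpack it there."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Name the local file after the last path component of the URL.
    archive_path = dest / url.rsplit("/", 1)[-1]
    if not archive_path.exists():
        urllib.request.urlretrieve(url, str(archive_path))
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest)
    return dest


# Example call, using the URL from the command above:
# fetch_and_extract(
#     "https://storage.googleapis.com/sfr-codegen-research/checkpoints/codegen-350M-mono.tar.gz"
# )
```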

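Once the download finishes, run MAP-Elites with prompt mutations using codegen-350M. The first command below uses the default configuration; the second selects the configuration named elm_image_cfg.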

python3 run_elm.py run_name=test
python3 run_elm.py --config-name=elm_image_cfg run_name=test

Milestones & Progress

Weekly meetings are held in the EleutherAI Discord at 20:00 UTC on Fridays.

Sodarace environment implemented
Stage 1: Diff Models & MAP-Elites
- Prompt Engineering on CodeGen
- Train diff model
- MAP-Elites implemented
Stage 2: Train LLM on generated data
Stage 3: Conditional generation with PPO
