We provide a lightweight implementation of the PPO finetuning performed in "Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning".
We leverage Lamorel's custom modules and updaters to add a value head on top of the LLM and to finetune all of its weights with the PPO loss.
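Conceptually, the value head is a small MLP that maps the LLM's final hidden state to a scalar value estimate, trained jointly with the policy using PPO's clipped surrogate objective. Below is a minimal PyTorch sketch of that idea; the class and function names (and the MLP sizes) are illustrative, not Lamorel's actual API:

```python
import torch
import torch.nn as nn


class ValueHead(nn.Module):
    # Illustrative value head: maps the LLM's final hidden state to a scalar
    # value estimate (hypothetical sizes, not the exact architecture used).
    def __init__(self, hidden_size: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, 1024),
            nn.Sigmoid(),
            nn.Linear(1024, 1),
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, hidden_size), e.g. the last token's representation
        return self.mlp(hidden_states).squeeze(-1)


def ppo_clipped_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Standard PPO clipped surrogate objective (returned as a loss to minimize).
    ratio = torch.exp(log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()


if __name__ == "__main__":
    head = ValueHead(hidden_size=768)
    values = head(torch.randn(4, 768))
    print(values.shape)  # torch.Size([4])
```

In the actual example, Lamorel's custom-module mechanism runs a head like this on top of the LLM's forward pass, and a custom updater combines this policy loss with a value loss before backpropagating through all weights.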
- Install the BabyAI-Text environment
- Install the required packages:
```shell
pip install -r requirements.txt
```
To launch the example on a single GPU on a local machine:
- Spawn both processes (the RL process collecting data and the LLM process):
```shell
python -m lamorel_launcher.launch \
       --config-path PROJECT_PATH/examples/PPO_finetuning/ \
       --config-name local_gpu_config \
       rl_script_args.path=PROJECT_PATH/examples/PPO_finetuning/main.py \
       rl_script_args.output_dir=YOUR_OUTPUT_DIR
```