We provide a lightweight implementation of the PPO finetuning performed in "Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning".
We leverage Lamorel's custom modules and updaters to add a value head on top of the LLM and to finetune all of its weights with the PPO loss.
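Conceptually, the value head is a small MLP that maps the LLM's final hidden state to a scalar value estimate, trained jointly with the policy using PPO's clipped surrogate objective. Below is a minimal PyTorch sketch of that idea; the class and function names (and the MLP sizes) are illustrative, not Lamorel's actual API:

```python
import torch
import torch.nn as nn


class ValueHead(nn.Module):
    # Illustrative value head: maps the LLM's final hidden state to a scalar
    # value estimate (hypothetical sizes, not the exact architecture used).
    def __init__(self, hidden_size: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, 1024),
            nn.Sigmoid(),
            nn.Linear(1024, 1),
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, hidden_size), e.g. the last token's representation
        return self.mlp(hidden_states).squeeze(-1)


def ppo_clipped_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Standard PPO clipped surrogate objective (returned as a loss to minimize).
    ratio = torch.exp(log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()


if __name__ == "__main__":
    head = ValueHead(hidden_size=768)
    values = head(torch.randn(4, 768))
    print(values.shape)  # torch.Size([4])
```

In the actual example, Lamorel's custom-module mechanism runs a head like this on top of the LLM's forward pass, and a custom updater combines this policy loss with a value loss before backpropagating through all weights.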
- Install the BabyAI-Text environment
- Install the required packages:
```shell
pip install -r requirements.txt
```
To launch the example on a single GPU on a local machine:
- Spawn both processes (the RL process collecting data and the LLM process):
```shell
python -m lamorel_launcher.launch \
       --config-path PROJECT_PATH/examples/PPO_finetuning/ \
       --config-name local_gpu_config \
       rl_script_args.path=PROJECT_PATH/examples/PPO_finetuning/main.py \
       rl_script_args.output_dir=YOUR_OUTPUT_DIR
```