WIP "Faster" grpo trainer #371

Draft: wants to merge 46 commits into main


edbeeching (Collaborator) commented Feb 19, 2025

This PR adds a GRPO trainer that runs on N + 1 nodes: N nodes for training and 1 node dedicated to generation. Very experimental, so expect rough edges!

Usage:

For training, run:

accelerate launch --config_file=recipes/accelerate_configs/zero3.yaml scripts/remote_grpo.py \
    --config recipes/Qwen2.5-1.5B-Instruct/grpo/config_remote.yaml

This will automatically spin up an SGLang server on a separate Slurm node and use it for generation.

For development, first spin up an SGLang server on a separate node:

python3 -m sglang.launch_server --model-path Qwen/Qwen2.5-1.5B-Instruct \
    --port=30010 --skip-tokenizer-init --mem-fraction-static 0.7 \
    --host=0.0.0.0 --dp-size=8

Then run training, passing the server's address (here, its hostname) via --remote_gen_model_url:

accelerate launch --config_file=recipes/accelerate_configs/zero3.yaml scripts/remote_grpo.py \
    --config recipes/Qwen2.5-1.5B-Instruct/grpo/config_remote.yaml \
    --remote_gen_model_url ip-26-0-160-103
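
In the development workflow the server and the trainer are started by hand, so it can help to confirm the server is accepting connections before launching training. The sketch below is not part of this PR: `wait_for_port` is a hypothetical helper that only checks that a TCP port is open, and it makes no assumption about SGLang's HTTP API. The port 30010 comes from the launch command above; whether the trainer expects the port as part of `--remote_gen_model_url` is not stated here.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 300.0,
                  interval_s: float = 5.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: an open port only shows that something is listening,
    not that the model has finished loading.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError if the port is closed or unreachable.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, `wait_for_port("ip-26-0-160-103", 30010)` would block for up to five minutes and return `True` as soon as the generation node accepts a connection.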

TODO

  • Remove hard-coded filepath for temporary checkpoints
  • Refactor reference model log probs to happen within generation step
  • Implement μ iterations from GRPO
  • Validate against TRL
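
For context on the "μ iterations" item: in GRPO, μ is the number of policy updates performed on each batch of generations. From the second update onward the policy differs from the one that sampled the completions, so the objective uses a clipped probability ratio against the stored sampling log-probabilities. The following is a minimal pure-Python sketch of that idea and is not the code in this PR. It assumes sequence-level log-probabilities (GRPO is usually applied per token), omits the KL penalty against the reference model, and uses the population standard deviation for normalization; `update_fn` is a hypothetical stand-in for an optimizer step.

```python
import math
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_objective(new_logps, old_logps, advantages, clip_eps=0.2):
    """PPO-style clipped surrogate, averaged over the group (KL term omitted)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new / pi_old
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


def run_mu_iterations(old_logps, rewards, update_fn, mu=2):
    """Reuse one generation batch for `mu` policy updates.

    `old_logps` stay fixed across iterations; only the current policy's
    log-probabilities change. Returns the objective seen before each update.
    """
    advantages = group_advantages(rewards)
    new_logps = list(old_logps)  # first iteration: policy equals the sampler
    history = []
    for _ in range(mu):
        history.append(clipped_objective(new_logps, old_logps, advantages))
        new_logps = update_fn(new_logps, advantages)  # hypothetical optimizer step
    return history
```

With μ = 1 the ratio is always 1 and the clipping never activates, which is why the clipped form only matters once the same generations are reused for several updates.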

@troy12x

troy12x commented Feb 22, 2025

please my goat push this to main pleaseeee

@troy12x

troy12x commented Feb 22, 2025

my project is waiting for ur code no joke

@troy12x

troy12x commented Feb 24, 2025

hi 👉👈

@troy12x

troy12x commented Feb 24, 2025

please finish this my goat

@qgallouedec
Member

@troy12x please avoid spamming 🙏 it doesn't help, be sure that we're working hard on this.

@troy12x

troy12x commented Feb 25, 2025

sry i dont mean but i really need you guys to finish this fast

@troy12x

troy12x commented Feb 25, 2025

good luck !


4 participants