A simple toy implementation of GRPO (Group Relative Policy Optimization) with LLMs.
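For context, the core of GRPO is to sample a group of completions per prompt and score each completion's reward relative to the group's mean and standard deviation, which removes the need for a learned value baseline. Below is a minimal NumPy sketch of that advantage computation (illustrative only; the function name and `eps` are assumptions, not this repo's API):

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray, eps: float = 1e-4) -> np.ndarray:
    """Group-relative advantages for one prompt's group of completions.

    rewards: shape (num_generations,), one scalar reward per completion
    sampled from the same prompt.
    """
    # Normalize against the group; eps avoids division by zero when all
    # completions in the group receive the same reward.
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: four completions of one prompt scored 0/1 by a verifier.
print(grpo_advantages(np.array([1.0, 0.0, 0.0, 1.0])))  # ~[1, -1, -1, 1]
```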
```sh
$ git clone https://github.com/Ktakuya332C/simple-rlvr.git
$ cd simple-rlvr
$ poetry install
```
```sh
$ # The command below will run, but scores won't go up because the scale is too small
$ poetry run python -m rlvr.main \
    --model=sbintuitions/tiny-lm-chat \
    --num-rollout-workers=2 \
    --num-reference-workers=2 \
    --num-grpo-learners=2 \
    --batch-size-sync=16 \
    --batch-size-update=8 \
    --batch-size-backward=4 \
    --batch-size-rollout=2 \
    --batch-size-reference=2 \
    --num-generations=2 \
    --max-length=512 \
    --temperature=1.0
```
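One plausible reading of the batch-size flags is that they nest, from the number of samples gathered between weight syncs down to the per-backward-pass micro-batch used for gradient accumulation. The sketch below only illustrates that guessed nesting; the real semantics live in `rlvr.main`, so treat every comment here as an assumption:

```python
# Hypothetical interpretation of the flags above, not this repo's code.
BATCH_SIZE_SYNC = 16      # samples collected between weight syncs to rollout workers?
BATCH_SIZE_UPDATE = 8     # samples per optimizer step?
BATCH_SIZE_BACKWARD = 4   # samples per backward pass (gradient accumulation)?
NUM_GENERATIONS = 2       # completions per prompt, i.e. the GRPO group size

# If the nesting holds, each level should divide the one above it.
assert BATCH_SIZE_SYNC % BATCH_SIZE_UPDATE == 0
assert BATCH_SIZE_UPDATE % BATCH_SIZE_BACKWARD == 0
assert BATCH_SIZE_BACKWARD % NUM_GENERATIONS == 0
```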
```sh
$ poetry run black .
$ poetry run pytest -xsvv tests
```
- You may need to set `GLOO_SOCKET_IFNAME=lo0` to run this script on a Mac.
- This design is largely influenced by RLlib Flow.
Not supported yet:

- missing-EOS penalty (a possible shape is sketched below)
- bf16 (NumPy does not support bf16)
- FSDP (DDP works without GPUs, but FSDP does not)
- vLLM integration (`collective_rpc` support is not yet released)
- large-scale experiments
- tensor parallelism
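On the missing-EOS penalty item: the usual idea is to penalize completions that hit `--max-length` without ever emitting the EOS token, so the policy is not rewarded for truncated, unterminated text. A minimal sketch of one way to fold this into the rewards (the names, token id, and penalty value are all assumptions, not this repo's API):

```python
import numpy as np

EOS_TOKEN_ID = 2           # assumption: depends on the tokenizer
MISSING_EOS_PENALTY = 1.0  # assumption: a tunable hyperparameter

def apply_missing_eos_penalty(
    rewards: np.ndarray, token_ids: list[list[int]]
) -> np.ndarray:
    """Subtract a penalty from completions that never emitted EOS.

    rewards: shape (num_completions,), verifier rewards.
    token_ids: generated token ids for each completion.
    """
    penalized = rewards.copy()
    for i, ids in enumerate(token_ids):
        if EOS_TOKEN_ID not in ids:  # truncated at max length, no EOS
            penalized[i] -= MISSING_EOS_PENALTY
    return penalized

# Example: the second completion never produced EOS and gets penalized.
print(apply_missing_eos_penalty(np.array([1.0, 1.0]), [[5, 7, 2], [5, 7, 9]]))
```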