@dwisdom0 (Contributor) commented Sep 5, 2025

What does this PR do?

This PR fixes the GRPO example in the quickstart so that it runs. Before this PR, the example passed a keyword argument, reward_function, that GRPOTrainer.__init__() does not accept. After this PR, a user can copy and paste the example and run it as is.

The example in the GRPOTrainer reference documentation has the correct keyword argument reward_funcs.
https://huggingface.co/docs/trl/en/grpo_trainer#trl.GRPOTrainer.example

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    reward_funcs=reward_func,
    train_dataset=dataset,
)
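
For context, a full script around this call might look roughly like the sketch below. Only the GRPOTrainer call comes from the snippet above. The reward function body, the dataset name, and the main() wrapper are assumptions added for illustration, and the reward function assumes completions arrive as plain strings.

```python
def reward_func(completions, **kwargs):
    """Toy reward (an assumption, not from this PR): count distinct characters.

    Assumes each completion is a plain string.
    """
    return [float(len(set(completion))) for completion in completions]


def main():
    # Imported lazily so reward_func can be tried without trl installed.
    from datasets import load_dataset
    from trl import GRPOTrainer

    # Assumption: dataset chosen for illustration; this PR does not name one.
    dataset = load_dataset("trl-lib/tldr", split="train")

    trainer = GRPOTrainer(
        model="Qwen/Qwen2-0.5B-Instruct",
        reward_funcs=reward_func,  # keyword is reward_funcs, not reward_function
        train_dataset=dataset,
    )
    trainer.train()


# Quick check of the reward function alone; no model or dataset needed.
print(reward_func(["aab", "abc"]))  # prints [2.0, 3.0]
```

Calling main() starts the actual training run, which downloads the model and dataset first.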

Before (erroring out)

Traceback (most recent call last):
  File "/Users/freebie/code/python/llm_chess_rlvr/grpo_hello_world.py", line 9, in <module>
    trainer = GRPOTrainer(
              ^^^^^^^^^^^^
TypeError: GRPOTrainer.__init__() got an unexpected keyword argument 'reward_function'
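
The error above is ordinary Python behavior for an unknown keyword argument. A minimal sketch of how to reproduce it and list the accepted keywords without loading a model, using a hypothetical StandInTrainer class that mirrors only the keyword names shown in this PR (it is not the real GRPOTrainer signature):

```python
import inspect


class StandInTrainer:
    """Hypothetical stand-in; mirrors only the keyword names shown in this PR."""

    def __init__(self, model, reward_funcs, train_dataset=None):
        self.model = model
        self.reward_funcs = reward_funcs
        self.train_dataset = train_dataset


def accepted_keywords(cls):
    """Return the parameter names accepted by cls.__init__, excluding self."""
    params = inspect.signature(cls.__init__).parameters
    return [name for name in params if name != "self"]


def try_build(**kwargs):
    """Return None if construction succeeds, else the TypeError message."""
    try:
        StandInTrainer(**kwargs)
    except TypeError as exc:
        return str(exc)
    return None


# The misspelled keyword fails the same way as in the traceback above.
print(try_build(model="m", reward_function=len))
# The corrected keyword constructs without error.
print(try_build(model="m", reward_funcs=len))
print(accepted_keywords(StandInTrainer))
```

The same accepted_keywords helper could be pointed at the real class to confirm which name it expects, assuming trl is installed.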

After (running successfully)

config.json: 100%|██████████████████████████████████████████████████████████████████████████████████| 659/659 [00:00<00:00, 8.35MB/s]
model.safetensors: 100%|██████████████████████████████████████████████████████████████████████████| 988M/988M [00:12<00:00, 81.9MB/s]
generation_config.json: 100%|███████████████████████████████████████████████████████████████████████| 242/242 [00:00<00:00, 3.79MB/s]
tokenizer_config.json: 7.30kB [00:00, 22.6MB/s]
vocab.json: 2.78MB [00:00, 52.5MB/s]
merges.txt: 1.67MB [00:00, 159MB/s]
tokenizer.json: 7.03MB [00:00, 194MB/s]
The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': None, 'pad_token_id': 151643}.
  0%|                                                                                                     | 0/350166 [00:00<?, ?it/s]

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@qgallouedec (Member) left a comment

Thanks!

@kashif kashif merged commit f5c2fec into huggingface:main Sep 6, 2025
@dwisdom0 dwisdom0 deleted the patch-1 branch September 6, 2025 17:06
SamY724 pushed a commit to SamY724/trl that referenced this pull request Sep 6, 2025