Questions Regarding Completion Length Change in Reproducing SimpleRL-Reason #465

Open
nonstopfor opened this issue Mar 4, 2025 · 5 comments

Comments

@nonstopfor
I ran the following command from the README to reproduce the SimpleRL-Reason results.

ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/zero2.yaml \
    --num_processes=7 src/open_r1/grpo.py \
    --config recipes/Qwen2.5-Math-7B/grpo/config_simple_rl.yaml

My training dynamics are shown below (I slightly modified the config to remove the format reward, since I observed that it stays at zero during training):

Image

The accuracy does increase substantially, which suggests the training process is probably correct. However, the completion length keeps decreasing, whereas SimpleRL-Reason reports that it starts increasing after around 20 training steps.

Can anyone explain this?

@TimeLovercc
Same here. The format reward stays at 0 throughout training. However, the format reward in this repo is different from the one in SimpleRL-Reason. Could that difference be the reason?
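
To illustrate how a format reward can sit at zero while accuracy improves, here is a minimal sketch. It is not the code from either repository. It assumes a hypothetical reward that only accepts a `<think>...</think><answer>...</answer>` layout; `TAG_PATTERN` and `tag_format_reward` are illustrative names. Under such a check, a completion that gives its final answer in `\boxed{}` without the tags scores 0 even when the answer is correct.

```python
import re

# Hypothetical pattern (an assumption, not taken from either repo):
# reasoning inside <think> tags, followed by the answer inside <answer> tags.
TAG_PATTERN = re.compile(r"^<think>.*?</think>\s*<answer>.*?</answer>$", re.DOTALL)


def tag_format_reward(completions):
    """Return 1.0 for each completion that matches the tag layout, else 0.0."""
    return [1.0 if TAG_PATTERN.match(text.strip()) else 0.0 for text in completions]


samples = [
    "<think>2 + 2 = 4</think>\n<answer>4</answer>",        # tagged layout
    "We compute 2 + 2 = 4, so the answer is \\boxed{4}.",  # boxed answer, no tags
]
print(tag_format_reward(samples))  # [1.0, 0.0]
```

If the base model rarely emits the tags on its own, every sampled completion gets the same zero reward, so this term contributes no learning signal either way.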

@linkezh
linkezh commented Mar 5, 2025

Are you all using the default settings? Why does my accuracy reward keep decreasing 😢

@TimeLovercc
@linkezh My global batch size is smaller; the other settings are the same. The accuracy reward keeps increasing.

@linkezh
linkezh commented Mar 5, 2025

@TimeLovercc Thanks! I guess I must have screwed something up. 😰

@0205090923
I see exactly the same thing. Also, how did you evaluate Qwen2.5-Math-7B? Its maximum length is 4096, and the provided script failed.
