I ran the commands shown in the README to reproduce the results of SimpleRL-Reason.
My training dynamics are as follows (I slightly modified the config to remove the format reward, since I observed that it stayed at zero throughout training):
Accuracy does increase substantially, which suggests the training process may be correct. However, the completion length keeps decreasing, whereas SimpleRL-Reason reports that it should start increasing after around 20 training steps.
Can anyone explain this?
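For anyone comparing setups, here is a minimal sketch of what dropping the format reward can look like, assuming a TRL-style GRPO run where reward functions are passed as a list of callables. The model name, dataset, and the `accuracy_reward` body below are placeholders for illustration, not this repo's actual code:

```python
# Hypothetical sketch: dropping the format reward from a TRL-style GRPO run.
# `accuracy_reward` is a stand-in for the repo's real reward function, and the
# model/dataset names are placeholders.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def accuracy_reward(completions, **kwargs):
    # Placeholder: a real implementation would extract each completion's final
    # answer, compare it with the ground truth, and return 1.0 / 0.0 per sample.
    return [0.0 for _ in completions]

dataset = load_dataset("trl-lib/tldr", split="train")  # stand-in dataset

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",   # stand-in model
    reward_funcs=[accuracy_reward],        # format reward removed: accuracy only
    args=GRPOConfig(output_dir="grpo-no-format"),
    train_dataset=dataset,
)
trainer.train()
```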
Same here; the format reward stays at 0 the whole time. Note, though, that the format reward in this repo differs from the one in SimpleRL-Reason. Could that difference be the cause?
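For context, here is a minimal sketch of the kind of tag-matching format reward used in these repos. The exact regex below is an assumption (the real patterns differ between this repo and SimpleRL-Reason, which is precisely the difference being asked about), but it shows why the reward can sit at zero:

```python
import re

# Sketch of a tag-based format reward, assuming the expected output format is
# "<think>...</think><answer>...</answer>". The exact pattern (e.g. required
# newlines) varies between repos, which can change whether it ever matches.
FORMAT_PATTERN = re.compile(r"^<think>.*?</think>\s*<answer>.*?</answer>$", re.DOTALL)

def format_reward(completions, **kwargs):
    """Return 1.0 for completions matching the expected format, else 0.0."""
    contents = [c[0]["content"] if isinstance(c, list) else c for c in completions]
    return [1.0 if FORMAT_PATTERN.match(text) else 0.0 for text in contents]

# A base model that never emits the expected tags scores 0.0 on every sample,
# so the logged format reward stays flat at zero.
print(format_reward(["The answer is 42."]))                            # [0.0]
print(format_reward(["<think>reasoning</think><answer>42</answer>"]))  # [1.0]
```

If the system prompt never instructs the model to produce these tags, or the regex demands formatting (such as exact newlines) that the model does not emit, every completion fails the match and the reward contributes nothing to training.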