Is there a smaller data set that supports GRPO training? #485

laogonggong847 · 2025-03-06T15:05:01Z

Hello @lewtun, thank you for your excellent work!

Is there a smaller-scale data set that supports GRPO training? I am currently training GRPO on 8 A800 GPUs, and it takes 3 days (this is not the code data set).
In addition, I ran GRPO training twice on the same machine with the same configuration and found a big difference in how the final loss and reward converged. Is this normal for reinforcement learning? How can I make sure that each run produces approximately the same result (or that the loss and reward converge as closely as possible)?
