Is there a smaller data set that supports GRPO training? #485

Open
laogonggong847 opened this issue Mar 6, 2025 · 0 comments

laogonggong847 commented Mar 6, 2025

Hello @lewtun, thank you for your excellent work!

Is there a smaller dataset that supports GRPO training? I am currently running GRPO training on 8 A800 GPUs, and it takes 3 days (on a non-code dataset).
In addition, I have run GRPO training twice on the same machine with the same configuration, and the final loss and reward converged quite differently between the two runs. Is this normal for reinforcement learning? How can I make sure that each run gives approximately the same result (or that the loss and reward converge as closely as possible)?
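
To make both questions concrete, here is a minimal sketch of one possible workaround: drawing a smaller, fixed subset of the prompts with a pinned seed so that every run sees the same data. The `subsample` helper and the `prompts` list are hypothetical stand-ins, not part of this repository, and this only fixes data selection. It does not make the training itself deterministic, since sampling during generation and other framework-level randomness would still need their own seeds.

```python
import random


def subsample(examples, k, seed=42):
    """Return k examples chosen deterministically for a given seed."""
    # A local RNG keeps the selection independent of global random state.
    rng = random.Random(seed)
    # Sort the chosen indices so the subset keeps the original ordering.
    indices = sorted(rng.sample(range(len(examples)), k))
    return [examples[i] for i in indices]


# Hypothetical stand-in for the real training prompts.
prompts = [f"problem-{i}" for i in range(1000)]

# Two calls with the same seed select exactly the same subset.
small_a = subsample(prompts, k=100, seed=42)
small_b = subsample(prompts, k=100, seed=42)
```

With a subset like this, two runs would at least start from identical data, which should help separate data-order effects from the remaining sources of run-to-run variance.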
