Hello @lewtun, thank you for your excellent work!
Is there a smaller-scale dataset that supports GRPO training? I am currently training GRPO on 8 A800s, and it takes 3 days (not a code dataset).
In addition, I ran GRPO training twice on the same machine with the same configuration and found a big difference in how the final loss and reward converged. Is this normal for reinforcement learning? How can I make sure that each run gives approximately the same result (or that the loss and reward converge as closely as possible across runs)?