[Feature Request]: Support Parallel Reward Calculation for Time-consuming Methods #406

Open
AIBionics opened this issue Feb 27, 2025 · 0 comments


Hello,

I would like to propose a feature enhancement to make reward calculation more efficient, particularly for time-consuming reward methods. Currently, the system waits for all rollouts to complete before starting reward calculation. I propose running reward calculation in parallel with generation, so that the reward for each rollout is computed as soon as that rollout completes.
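
To illustrate the idea, here is a minimal sketch rather than a proposed implementation. It assumes rollouts can be consumed one at a time as they finish; `generate_rollouts`, `slow_reward`, and `rollouts_with_overlapped_rewards` are hypothetical names that do not refer to anything in this project.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def generate_rollouts(prompts):
    """Hypothetical rollout generator: yields each rollout as soon as it finishes."""
    for idx, prompt in enumerate(prompts):
        time.sleep(0.01)  # stand-in for generation time
        yield idx, prompt + " -> response"


def slow_reward(rollout):
    """Hypothetical slow reward function, e.g. a request to a remote RM."""
    time.sleep(0.05)  # stand-in for network or compute latency
    return float(len(rollout))


def rollouts_with_overlapped_rewards(prompts, max_workers=4):
    """Submit each reward computation as soon as its rollout is available."""
    rewards = [None] * len(prompts)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {}
        for idx, rollout in generate_rollouts(prompts):
            # The reward runs in a worker thread while the next rollout is generated.
            futures[idx] = pool.submit(slow_reward, rollout)
        # Collect results in the original rollout order.
        for idx, future in futures.items():
            rewards[idx] = future.result()
    return rewards


if __name__ == "__main__":
    print(rollouts_with_overlapped_rewards(["a", "bb", "ccc"]))
```

Threads fit the remote RM case because the work is mostly waiting on I/O. A CPU-bound reward function would probably need a process pool instead.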

This change would be especially beneficial when rewards come from a remote reward model (RM) or another computationally intensive reward function. If fully independent per-rollout reward calculation is not feasible in the short term, computing rewards independently for small batches of rollouts would be a valuable intermediate step.
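
The small-batch variant could look roughly like the sketch below. It is only an illustration under the assumption that the reward function accepts a list of rollouts; `batched`, `batch_reward`, and `rewards_by_micro_batch` are hypothetical names, not existing APIs.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batched(iterable, size):
    """Yield consecutive lists of up to `size` items from `iterable`."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def batch_reward(batch):
    """Hypothetical batched reward function (placeholder scoring by length)."""
    return [float(len(rollout)) for rollout in batch]


def rewards_by_micro_batch(rollout_stream, batch_size=2, max_workers=2):
    """Score each small batch as soon as it fills, independently of later batches."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each batch is submitted the moment it is complete; the stream is read lazily.
        futures = [
            pool.submit(batch_reward, batch)
            for batch in batched(rollout_stream, batch_size)
        ]
        # Flatten per-batch results back into rollout order.
        return [reward for future in futures for reward in future.result()]


if __name__ == "__main__":
    print(rewards_by_micro_batch(iter(["a", "bb", "ccc"]), batch_size=2))
```

With `batch_size=1` this reduces to the fully per-rollout case, so the batch size could be a tunable trade-off between request overhead and latency.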

Thank you for considering this feature request. I look forward to your thoughts and any potential updates on this front.
