[Feature Request]: Support Parallel Reward Calculation for Time-consuming Methods #406

AIBionics · 2025-02-27T11:49:32Z

Hello,

I would like to propose a feature that makes reward calculation more efficient, particularly for time-consuming reward methods. Currently, the system waits for all rollouts to complete before starting any reward calculation. This request asks for parallel processing so that the reward calculation for each rollout can begin as soon as that rollout completes, overlapping with the generation of the remaining rollouts.

This change would be especially beneficial for scenarios involving a remote reward model (RM) and other computationally intensive reward functions. If fully independent per-rollout reward calculation is not feasible in the short term, supporting independent calculation across small batches would also be a valuable intermediate step.
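
To illustrate the idea, here is a minimal sketch in plain Python of what I mean by starting the reward calculation as soon as each rollout finishes. It is not based on this project's code: `generate_rollouts`, `slow_reward`, and `rollouts_with_streaming_rewards` are hypothetical names, and the `time.sleep` calls only stand in for generation and remote RM latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def generate_rollouts(prompts):
    """Hypothetical rollout generator: yields each rollout as soon as it is ready."""
    for prompt in prompts:
        time.sleep(0.01)  # stand-in for generation latency
        yield prompt + " -> response"


def slow_reward(rollout):
    """Hypothetical slow reward function, e.g. a call to a remote reward model."""
    time.sleep(0.05)  # stand-in for network or compute latency
    return float(len(rollout))


def rollouts_with_streaming_rewards(prompts, max_workers=8):
    """Submit each rollout for scoring as soon as it is generated."""
    rollouts, futures = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for rollout in generate_rollouts(prompts):
            rollouts.append(rollout)
            # Scoring starts now, while later rollouts are still being generated.
            futures.append(pool.submit(slow_reward, rollout))
        # Collect in submission order so rewards[i] matches rollouts[i].
        rewards = [future.result() for future in futures]
    return rollouts, rewards
```

A thread pool fits the remote RM case because the work is mostly waiting on I/O; a CPU-heavy reward function would probably need a process pool instead. The small-batch fallback would look the same, with a short list of rollouts submitted per task rather than a single rollout.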

Thank you for considering this feature request. I look forward to your thoughts and any potential updates on this front.
