shawnye2000 changed the title from "What is the difference for reward_model_type=='vm' and reward_model_type=='prm'?" to "What is the difference between reward_model_type=='vm' and reward_model_type=='prm'?" on Jan 10, 2025
In my opinion, the VM phase can be roughly regarded as a training phase. In this phase, the MCTS (Monte Carlo Tree Search) algorithm mainly explores, running simulations to find the best path. This can be seen as a process of exploration and labeling.
The PRM phase, on the other hand, is more like a pure inference phase. Here the algorithm only needs to explore the paths; it does not select among them.
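To make the distinction above concrete, here is a minimal toy sketch of my reading. It is not this repository's code: `TREE`, `OUTCOME`, `explore_and_label`, and `explore_only` are hypothetical names, the search is a simplified UCT loop that walks straight to a leaf instead of doing a separate rollout, and the step scorer in the second function merely reuses the labels from the first as a stand-in for a learned process reward model.

```python
import math
import random

# Hypothetical toy reasoning tree: node -> children (leaves have no children).
TREE = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1"],
    "a1": [], "a2": [], "b1": [],
}
# Hypothetical terminal outcomes (1.0 means the final answer is correct).
OUTCOME = {"a1": 0.0, "a2": 1.0, "b1": 0.0}


def explore_and_label(n_sim=200, c=1.4, seed=0):
    """'vm'-like reading: simulate, back up values, then select one path."""
    rng = random.Random(seed)
    visits = {n: 0 for n in TREE}
    value = {n: 0.0 for n in TREE}
    for _ in range(n_sim):
        node, path = "root", ["root"]
        while TREE[node]:
            unvisited = [ch for ch in TREE[node] if visits[ch] == 0]
            if unvisited:
                node = rng.choice(unvisited)
            else:
                parent_n = visits[node]
                # UCB1: mean value plus an exploration bonus.
                node = max(
                    TREE[node],
                    key=lambda ch: value[ch] / visits[ch]
                    + c * math.sqrt(math.log(parent_n) / visits[ch]),
                )
            path.append(node)
        reward = OUTCOME[node]
        for n in path:  # back up the outcome along the visited path
            visits[n] += 1
            value[n] += reward
    # "Labeling": the mean backed-up value of every node.
    labels = {n: value[n] / max(visits[n], 1) for n in TREE}
    # "Selection": greedily follow the best-labeled child.
    best, node = ["root"], "root"
    while TREE[node]:
        node = max(TREE[node], key=labels.get)
        best.append(node)
    return best, labels


def explore_only(step_score):
    """'prm'-like reading: enumerate paths and score each step; no selection."""
    results = []

    def walk(node, path):
        if not TREE[node]:
            results.append((path, [step_score(n) for n in path[1:]]))
            return
        for ch in TREE[node]:
            walk(ch, path + [ch])

    walk("root", ["root"])
    return results


best_path, labels = explore_and_label()
scored_paths = explore_only(labels.get)  # stand-in for a learned step scorer
```

In this sketch the first function returns a single chosen path plus per-node labels, while the second returns every path with per-step scores and leaves any choice to the caller.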
This may not be entirely accurate. Additions and discussion are welcome!
No description provided.