I noticed that is_alpacafarm_rm defaults to False. Is this correct? According to https://github.com/tatsu-lab/alpaca_farm, it seems that the golden reward used in the PPO experiment is alpacafarm_rm.
Hi, thanks for your work and code. I ran into a model loading error when loading reward-model-human to evaluate answers.
My questions are as follows:
I downloaded the pretrained weights from https://huggingface.co/tatsu-lab/alpaca-farm-reward-model-human-wdiff, as mentioned in https://github.com/tatsu-lab/alpaca_farm. Is this the right approach? After doing so, I ran into the problem below with the following config in config_rl.yaml: