Model loading error when conducting PPO experiment #17

tsWen0309 · 2024-12-10T07:57:38Z

Hi, thanks for your work and code. I ran into a model loading error when loading reward-model-human to evaluate answers.
My questions are as follows:

How can I properly get the weights of alpaca_farm_models/reward-model-human?
I downloaded the pretrained weights from https://huggingface.co/tatsu-lab/alpaca-farm-reward-model-human-wdiff, as mentioned in https://github.com/tatsu-lab/alpaca_farm. Is this the right approach? After doing so, I got a loading error with the following config in config_rl.yaml:

pythia_rlhf_individual:
  output_dir: runs/ppo_individual
  datasets:
    - alpaca_farm

  gold_config:
    model_name: tatsu-lab/alpaca-farm-reward-model-human-wdiff
    is_alpacafarm_rm: False
    batch_size: 32

  rank_config:
    is_reward_model: true
    model_names: 
      - models/rm-pythia-44m_seed1
    cache_dir: .cache
    pooling: last
    residual_dropout: 0.01
    use_flash_attention: false
    dtype: bf16
    batch_size: 128

  sft_config:
    is_reward_model: false
    model_name: tlc4418/pythia_1.4b_sft_policy
    pretrained_path: models/hf_pythia_1.4b_sft
    cache_dir: .cache
    quantization: false
    seq2seqmodel: false
    freeze_layer:
    num_layers_unfrozen: 2
    residual_dropout: 0.2
    use_flash_attention: false
    dtype: bf16
    batch_size: 32
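A minimal sketch of a pre-flight check on the gold_config section above, mirroring it as a Python dict. `check_gold_config` is a hypothetical helper, not part of the repository. The first check relies on the `-wdiff` suffix, which marks AlpacaFarm checkpoints published as weight diffs rather than directly loadable weights. The second check is only an assumption that `is_alpacafarm_rm` should be true for an AlpacaFarm reward model. That is exactly the open question in this issue, not confirmed behaviour.

```python
# Hypothetical sanity check for the gold reward model settings.
# The dict below mirrors gold_config from config_rl.yaml.


def check_gold_config(gold_config: dict) -> list:
    """Return human-readable warnings about the gold reward model settings."""
    warnings = []
    model_name = gold_config.get("model_name", "")

    # AlpacaFarm publishes "-wdiff" checkpoints as weight diffs against
    # LLaMA-7B; they need to be recovered before they can be loaded.
    if model_name.endswith("-wdiff"):
        warnings.append(
            f"{model_name} looks like a weight diff; recover full weights first"
        )

    # ASSUMPTION (unconfirmed): is_alpacafarm_rm should be True when the
    # gold model is an AlpacaFarm reward model.
    if "alpaca-farm-reward-model" in model_name and not gold_config.get(
        "is_alpacafarm_rm", False
    ):
        warnings.append(
            "is_alpacafarm_rm is False but model_name points to an "
            "AlpacaFarm reward model"
        )
    return warnings


gold_config = {
    "model_name": "tatsu-lab/alpaca-farm-reward-model-human-wdiff",
    "is_alpacafarm_rm": False,
    "batch_size": 32,
}

for warning in check_gold_config(gold_config):
    print("WARNING:", warning)
```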

I noticed that is_alpacafarm_rm is set to False by default. Is this correct? From the information at https://github.com/tatsu-lab/alpaca_farm, it seems that the gold reward used in the PPO experiment is the AlpacaFarm reward model.
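On the first question: the `-wdiff` suffix indicates that the Hub checkpoint stores a weight difference relative to LLaMA-7B, not the full reward model weights. The toy sketch below only illustrates the idea, assuming the diff is additive (recovered = base + diff for each parameter), as in Stanford Alpaca's weight_diff.py. Plain lists stand in for tensors, and `recover_weights` is a hypothetical helper, not AlpacaFarm's recovery script.

```python
# Toy sketch of additive weight-diff recovery (assumption: recovered = base + diff).


def recover_weights(base: dict, diff: dict) -> dict:
    """Add each diff parameter to the matching base parameter."""
    recovered = {}
    for name, delta in diff.items():
        base_param = base[name]  # raises KeyError if the base lacks this parameter
        if len(base_param) != len(delta):
            raise ValueError(f"shape mismatch for {name}")
        # Element-wise sum restores the fine-tuned values.
        recovered[name] = [b + d for b, d in zip(base_param, delta)]
    return recovered


base = {"layer.weight": [0.5, -1.0, 2.0]}
diff = {"layer.weight": [0.25, 1.0, -0.5]}
print(recover_weights(base, diff))  # {'layer.weight': [0.75, 0.0, 1.5]}
```

In practice, the recovery step described in the AlpacaFarm README should be used instead. It requires a local LLaMA-7B checkpoint converted to Hugging Face format.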
