Fine-tuning pretrained model #51

Open
ashleyyy94 opened this issue Mar 21, 2019 · 3 comments

Comments

@ashleyyy94

I'm trying to fine-tune the provided pretrained model on my custom dataset. The command is:

nvidia-docker run -it --rm -v `pwd`:/decaNLP/ -u $(id -u):$(id -g) bmccann/decanlp:cuda9_torch041 bash -c "python /decaNLP/train.py --load /decaNLP/mqan_decanlp_better_sampling_cove_cpu/iteration_560000.pth --resume --train_tasks mwo

While trying to initialise the MQAN model, it throws this error:
RuntimeError: Error(s) in loading state_dict for MultitaskQuestionAnsweringNetwork: Missing key(s) in state_dict: "encoder_embeddings.projection.linear.weight", "encoder_embeddings.projection.linear.bias". Unexpected key(s) in state_dict: "cove.rnn1.weight_ih_l0", "cove.rnn1.weight_hh_l0", "cove.rnn1.bias_ih_l0", "cove.rnn1.bias_hh_l0", "cove.rnn1.weight_ih_l0_reverse", "cove.rnn1.weight_hh_l0_reverse", "cove.rnn1.bias_ih_l0_reverse", "cove.rnn1.bias_hh_l0_reverse", "cove.rnn1.weight_ih_l1", "cove.rnn1.weight_hh_l1", "cove.rnn1.bias_ih_l1", "cove.rnn1.bias_hh_l1", "cove.rnn1.weight_ih_l1_reverse", "cove.rnn1.weight_hh_l1_reverse", "cove.rnn1.bias_ih_l1_reverse", "cove.rnn1.bias_hh_l1_reverse", "project_cove.linear.weight", "project_cove.linear.bias".

Kindly advise how to go about fine-tuning the model. Thank you.

@hot-cheeto

Hello,
I am having issues with fine-tuning the pretrained model mqan_decanlp_better_sampling_cove_cpu. I ran the following command:

python train.py --name test_run --load /path/to/mqan_decanlp_better_sampling_cove_cpu/iteration_560000.pth --resume --device 0 --cove --train_tasks new_task

but I received the following error message: ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group

I have double-checked the parameters in config.json in mqan_decanlp_better_sampling_cove_cpu.

What could be the problem? Am I missing something?

Thank you in advance!

@diarmidmackenzie

These queries are quite old, but I've been hitting the same problems. Posting some answers in case they might be useful for others.

@ashleyyy94 It looks like you were running without the --cove parameter. The pretrained model you were trying to load was trained with CoVe, so you need to pass --cove as well to continue training from it.
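For example, a command along these lines should get past the missing/unexpected key error (same paths as in your original command; new_task is just a placeholder for your own task name, and the --resume issue discussed below may still apply):

nvidia-docker run -it --rm -v `pwd`:/decaNLP/ -u $(id -u):$(id -g) bmccann/decanlp:cuda9_torch041 bash -c "python /decaNLP/train.py --load /decaNLP/mqan_decanlp_better_sampling_cove_cpu/iteration_560000.pth --cove --train_tasks new_task"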

@hot-cheeto I have been hitting this problem too. I don't fully understand it, but it seems to be possible to work around it by dropping the --resume parameter.

The issue seems to be that the stored optimizer state has a mismatched number of parameters.

It has 153 parameters.

>>> import torch
>>> a = torch.load("/decaNLP/results/checkpoints/iteration_560000_rank_0_optim.pth", map_location='cpu')
>>> len(a['param_groups'][0]['params'])
153

Whereas if I start training with the same parameters as you, the optimizer state only has 137 parameters (16 fewer).

>>> b = torch.load("/decaNLP/diarmid_learning/1/iteration_1000_rank_0_optim.pth", map_location='cpu')
>>> len(b['param_groups'][0]['params'])
137
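If anyone wants to dig further, a minimal sketch like the one below (assuming these files are ordinary torch.optim state dicts, which the keys above suggest) lists one stored buffer shape per parameter, which might help pin down which 16 entries exist only in the 560k checkpoint:

import torch

old = torch.load("/decaNLP/results/checkpoints/iteration_560000_rank_0_optim.pth", map_location="cpu")
new = torch.load("/decaNLP/diarmid_learning/1/iteration_1000_rank_0_optim.pth", map_location="cpu")

def shapes(sd):
    # 'state' maps parameter ids to per-parameter buffers (e.g. exp_avg for Adam);
    # use the shape of the first tensor buffer as a proxy for the parameter's shape
    return [next((tuple(v.shape) for v in bufs.values() if torch.is_tensor(v)), None)
            for bufs in sd["state"].values()]

old_shapes, new_shapes = shapes(old), shapes(new)
print(len(old_shapes), len(new_shapes))
print([s for s in old_shapes if s not in set(new_shapes)])  # shapes present only in the 560k state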

I have not yet understood what accounts for these extra 16 optimizer parameters, so I have no idea how to correct for them. But I believe it is reasonable to discard the optimizer state and continue training from the model state, and I seem to have got reasonable results doing so.

You can do that by dropping the --resume parameter.

Once you have a learning checkpoint that you have generated yourself, you can then continue from this checkpoint using the --resume parameter.
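Concretely, the sequence would look something like this (the run name and paths are placeholders; the flags are the same ones used in the commands above):

python train.py --name my_finetune --load /path/to/mqan_decanlp_better_sampling_cove_cpu/iteration_560000.pth --device 0 --cove --train_tasks new_task

python train.py --name my_finetune --load /path/to/your/own/checkpoints/iteration_1000.pth --resume --device 0 --cove --train_tasks new_task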

@diarmidmackenzie

Adding a note that you can't set "strict=False" on the call to load_state_dict for the optimizer. The reason why is explained here: pytorch/pytorch#3852.
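To illustrate the difference (a toy example, not decaNLP code): nn.Module.load_state_dict accepts strict=False and will ignore missing/unexpected keys, but torch.optim.Optimizer.load_state_dict has no such flag and raises the ValueError above whenever the saved param groups don't line up with the optimizer's:

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
# unexpected keys are simply ignored when strict=False
model.load_state_dict({**model.state_dict(), "unexpected.weight": torch.zeros(1)}, strict=False)

small = torch.optim.Adam(nn.Linear(4, 2).parameters())   # one group with 2 parameters
big = torch.optim.Adam(nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2)).parameters())  # 4 parameters
try:
    small.load_state_dict(big.state_dict())               # there is no strict=False here
except ValueError as err:
    print(err)  # "loaded state dict contains a parameter group that doesn't match ..."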

I suspect there has been some change to the model since the pre-trained data referenced in the README was generated.

The pre-trained data logs say:
process_0 - MultitaskQuestionAnsweringNetwork has 18,199,502 parameters

What I see when training is:
process_0 - MultitaskQuestionAnsweringNetwork has 14,589,902 trainable parameters

The wording "parameters" vs. "trainable parameters" in these logs seems to imply that the pre-trained data set was generated from code prior to this commit (26 Oct 2018):
2c837ea
(even if the training itself seems to have taken place in December 2018, it seems to have been using older code).

The 3.5M difference in the number of trainable parameters seems concerning too, and it doesn't seem to be down to configuration (unless I have missed something).
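For anyone comparing these numbers themselves, the two counts correspond to summing over all parameters vs. only parameters with requires_grad=True, e.g. (a generic sketch, not decaNLP internals; frozen pretrained embeddings would only show up in the first count):

import torch.nn as nn

model = nn.Sequential(nn.Embedding(1000, 300), nn.Linear(300, 10))
model[0].weight.requires_grad = False   # freeze the embedding, as one might for GloVe/CoVe

total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"{total:,} parameters, {trainable:,} trainable parameters")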

There have been quite a few changes to the repo since 26 Oct 2018 (including a bunch on Oct 26 itself). I've not analyzed them all, but it seems plausible that one of these changes might have resulted in the incompatibility of the optimizer's stored state, causing this problem.
