
How to restore a model? #4

Open
mfxss opened this issue Apr 16, 2017 · 9 comments

Comments

mfxss commented Apr 16, 2017

Is it ok if I add S.load_npz(model_file, model) after model = BiLstmContext(args.deep, args.gpu, reader.word2index, context_word_units, lstm_hidden_units, target_word_units, loss_func, True, args.dropout) in train_context2vec.py without using common.model_reader?
Thank you very much.

orenmel (Owner) commented Apr 18, 2017

Note that the model_reader also loads the word2index mapping, which is essential for applying the model.
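If you bypass common.model_reader, you would need to keep that mapping around yourself. A minimal sketch (the pickle file name is just an example, not something the repo writes for you):

import pickle

# alongside S.save_npz(model_file, model), persist the vocabulary mapping
with open(model_file + '.word2index.pkl', 'wb') as f:
    pickle.dump(reader.word2index, f)

# when restoring without common.model_reader, load it back before building the model
with open(model_file + '.word2index.pkl', 'rb') as f:
    word2index = pickle.load(f)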

mfxss (Author) commented Apr 19, 2017

Here is what I modified.

# cs = [reader.trimmed_word2count[w] for w in range(len(reader.trimmed_word2count))]
# loss_func = L.NegativeSampling(target_word_units, cs, NEGATIVE_SAMPLING_NUM, args.ns_power)
if args.context == 'lstm':
    model_reader = ModelReader(model_param_file)  # load the saved model instead of building a new one
    model = model_reader.model

It seems that train=False and the assert train == False in model_reader.py would also need to be modified.
And the trained word embeddings are stored in model.loss_func.W.data.
Am I right? If I missed something that should be modified, please tell me.
Thank you very much.

orenmel (Owner) commented Apr 19, 2017

This seems ok. The only thing is that the model_reader doesn't bother to initialize the loss_func with the correct values, because it's currently not supported in train mode. If your purpose is to further train a model that you load, then you should make sure you initialize the model's loss_func correctly with the true cs values.

mfxss (Author) commented Apr 19, 2017

I am a little confused. Do the cs values change after each epoch? What does cs stand for?
Here is my new code.

# cs holds the corpus count of each target word, used for the negative-sampling distribution
cs = [reader.trimmed_word2count[w] for w in range(len(reader.trimmed_word2count))]
loss_func = L.NegativeSampling(target_word_units, cs, NEGATIVE_SAMPLING_NUM, args.ns_power)
if args.context == 'lstm':
    # model_reader = ModelReader(model_param_file)
    # model = model_reader.model
    # build the network exactly as in the first run, then overwrite its parameters with the saved ones
    model = BiLstmContext(args.deep, args.gpu, reader.word2index, context_word_units,
                          lstm_hidden_units, target_word_units, loss_func, True, args.dropout)
    S.load_npz(model_file, model)

I can use reader's word2index. Is reader's word2index the same as model_reader's?
Also, how can I restore the word embedding matrix w into the model? Would loss_func.W.data = model_reader.w work?

orenmel (Owner) commented Apr 19, 2017

Just to make sure, could you please describe what your end-goal here is? Are you trying to load one of our existing models and continue training it for more epochs? Using which corpus?

mfxss (Author) commented Apr 20, 2017

My goal is to train a ukWaC model like yours, but with different parameters. I ran for one epoch and then had to stop for some reason. Now I want to load the model and continue.
I noticed that the word embeddings are also trained during training, so how do I load those word embeddings from the targets file? That is all I am wondering.
Oh, I read the model file, and it seems that it saves loss_func.W.data, so there is no need to load the word embedding targets file again. Right?
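Here is a quick way to check this (a sketch; the path is whatever file S.save_npz wrote, and the exact key name may differ slightly):

import numpy as np

# chainer's save_npz stores parameters under slash-separated link names
params = np.load('context2vec.model')
key = [k for k in params.files if k.endswith('loss_func/W')][0]
print(key, params[key].shape)  # rows should match the target vocabulary size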
Thank you.

orenmel (Owner) commented Apr 20, 2017

Ok. So as long as you are using the exact same corpus that you used in the first epoch, then your code should work fine (since reader.word2index would be identical to the one used in the first epoch). And yes, there's no need to load the word embedding targets.
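If you want to be extra careful, you can also compare the new reader's mapping against the one from the first run (a sketch, assuming you saved it with pickle as suggested above):

import pickle

with open(model_file + '.word2index.pkl', 'rb') as f:
    saved_word2index = pickle.load(f)
assert reader.word2index == saved_word2index, 'vocabulary differs from the first epoch'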

mfxss (Author) commented Apr 24, 2017

With more epochs, the loss began to increase; did this happen to you? The WSD accuracy also became lower.

orenmel (Owner) commented Apr 24, 2017

As you can see from the code, I never continued training an existing model. In the case of ukWaC, I trained for one epoch; later, I trained for 3 epochs from scratch, and the latter model performed better. I wouldn't expect the training loss to increase in your case, but maybe there's something I'm missing. One thing that does come to mind is that to do this properly you should also save (and later restore) the Adam optimizer state along with the model.
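Something along these lines should work with chainer's serializers (a sketch; model and model_file as in train_context2vec.py):

from chainer import optimizers, serializers

optimizer = optimizers.Adam()
optimizer.setup(model)

# after an epoch: save the optimizer state next to the model parameters
serializers.save_npz(model_file, model)
serializers.save_npz(model_file + '.optimizer', optimizer)

# when resuming: rebuild the model and optimizer first, then restore both,
# so Adam's per-parameter moment estimates and step count carry over
serializers.load_npz(model_file, model)
serializers.load_npz(model_file + '.optimizer', optimizer)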
