2 questions #23

Open
matteogabella opened this issue Jul 22, 2019 · 3 comments

Comments


matteogabella commented Jul 22, 2019

Hi Abraham, and first of all, thank you for your amazing work. I have a couple of questions:

  • What is the purpose of the backup?
    Why do you perform a backup in certain cases? Is it necessary for resuming training, or is it just a precaution?
    Why would someone need to use the backup later?

  • In order to convert the model to TFLite format (to experiment a bit in a mobile environment) I need to freeze the model. Most of the guides I have read say that, starting from the checkpoint file, I need to pass the output nodes to the 'convert_variables_to_constants' function.

So I used 'print_tensors_in_checkpoint_file' to get the nodes of your model, and I obtained the list below (followed by a rough sketch of the freezing code I am trying).
Which are the output nodes? Do you think I need to pass all the tensors with 'decoder' in their path?

thank you

tensor_name: model/decoder/attention_decoder_cell/attention_layer/kernel
tensor_name: model/decoder/attention_decoder_cell/bahdanau_attention/attention_b
tensor_name: model/decoder/attention_decoder_cell/bahdanau_attention/attention_g
tensor_name: model/decoder/attention_decoder_cell/bahdanau_attention/attention_v
tensor_name: model/decoder/attention_decoder_cell/bahdanau_attention/query_layer/kernel
tensor_name: model/decoder/attention_decoder_cell/multi_rnn_cell/cell_0/basic_lstm_cell/bias
tensor_name: model/decoder/attention_decoder_cell/multi_rnn_cell/cell_0/basic_lstm_cell/kernel
tensor_name: model/decoder/attention_decoder_cell/multi_rnn_cell/cell_1/basic_lstm_cell/bias
tensor_name: model/decoder/attention_decoder_cell/multi_rnn_cell/cell_1/basic_lstm_cell/kernel
tensor_name: model/decoder/attention_decoder_cell/multi_rnn_cell/cell_2/basic_lstm_cell/bias
tensor_name: model/decoder/attention_decoder_cell/multi_rnn_cell/cell_2/basic_lstm_cell/kernel
tensor_name: model/decoder/attention_decoder_cell/multi_rnn_cell/cell_3/basic_lstm_cell/bias
tensor_name: model/decoder/attention_decoder_cell/multi_rnn_cell/cell_3/basic_lstm_cell/kernel
tensor_name: model/decoder/memory_layer/kernel
tensor_name: model/decoder/output_dense/bias
tensor_name: model/decoder/output_dense/kernel
tensor_name: model/encoder/bidirectional_rnn/bw/multi_rnn_cell/cell_0/basic_lstm_cell/bias
tensor_name: model/encoder/bidirectional_rnn/bw/multi_rnn_cell/cell_0/basic_lstm_cell/kernel
tensor_name: model/encoder/bidirectional_rnn/bw/multi_rnn_cell/cell_1/basic_lstm_cell/bias
tensor_name: model/encoder/bidirectional_rnn/bw/multi_rnn_cell/cell_1/basic_lstm_cell/kernel
tensor_name: model/encoder/bidirectional_rnn/fw/multi_rnn_cell/cell_0/basic_lstm_cell/bias
tensor_name: model/encoder/bidirectional_rnn/fw/multi_rnn_cell/cell_0/basic_lstm_cell/kernel
tensor_name: model/encoder/bidirectional_rnn/fw/multi_rnn_cell/cell_1/basic_lstm_cell/bias
tensor_name: model/encoder/bidirectional_rnn/fw/multi_rnn_cell/cell_1/basic_lstm_cell/kernel
tensor_name: model/encoder/shared_embeddings_matrix
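
For reference, this is a minimal TF 1.x sketch of how I list the checkpoint variables and try to freeze the graph; the checkpoint path and the output node name are placeholders, not necessarily the real op names in your graph:

```python
import tensorflow as tf
from tensorflow.python.tools.inspect_checkpoint import print_tensors_in_checkpoint_file

ckpt = "path/to/model.ckpt"  # placeholder checkpoint path

# List the variables stored in the checkpoint (this is how I got the list above).
print_tensors_in_checkpoint_file(ckpt, tensor_name="", all_tensors=False,
                                 all_tensor_names=True)

# Freeze: restore the graph, then fold all variables into constants.
with tf.Session() as sess:
    saver = tf.train.import_meta_graph(ckpt + ".meta")
    saver.restore(sess, ckpt)
    frozen_graph_def = tf.graph_util.convert_variables_to_constants(
        sess,
        sess.graph.as_graph_def(),
        output_node_names=["model/decoder/output_dense/BiasAdd"])  # assumed op name
    with tf.gfile.GFile("frozen_model.pb", "wb") as f:
        f.write(frozen_graph_def.SerializeToString())
```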

@AbrahamSanders (Owner) commented

Hi @matteogabella

a) What's the purpose of the backup?
Normally, a model is trained until some validation metric stops improving for a given number of epochs, and then training stops. However, there is no good automatic validation metric for open-ended dialog quality. Metrics such as BLEU and perplexity exist, but they only work well when there is a single correct way for the model to respond, as in neural machine translation. In conversational modeling there are many valid ways to answer the same question, and therefore no known way to automatically measure the "correctness" of each response.

Since there is no good automatic validation metric, an early stopping mechanism cannot be used. Instead, the training routine saves a full backup of the entire model at the loss intervals specified in the hparams. This way, a human can converse with each backed-up model and manually judge which one strikes the best balance of generalization and context sensitivity.

For example, an under-trained model might respond with "I don't know" for every single prompt, while an overtrained model might respond with a very detailed answer which is completely out of context because the question wasn't posed exactly the way it showed up in training. A human can choose the best backup point and delete the rest.
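
In rough pseudocode, the backup behavior amounts to something like the sketch below (illustrative only; the actual hparam name and saver logic live in the training code):

```python
import os

# Illustrative sketch only -- the real hparam name and saver calls may differ.
backup_on_training_loss = [2.0, 1.5, 1.0, 0.5]  # assumed: loss thresholds from hparams

def maybe_backup(saver, sess, current_loss, backup_dir, remaining_thresholds):
    """Save a full checkpoint the first time the training loss crosses each threshold."""
    while remaining_thresholds and current_loss <= remaining_thresholds[0]:
        threshold = remaining_thresholds.pop(0)
        target_dir = os.path.join(backup_dir, "backup_loss_{:.2f}".format(threshold))
        os.makedirs(target_dir)                      # one backup folder per threshold
        saver.save(sess, os.path.join(target_dir, "model.ckpt"))
    return remaining_thresholds
```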

b) Which are the output nodes?
model/decoder/output_dense/...
These are the weights and biases of the fully connected layer that sits on top of the decoder LSTM output. The softmax that models the probability distribution over the vocabulary is computed from the activations of output_dense.
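
As a rough illustration (not the project's exact code; names and shapes are placeholders), the role of output_dense in TF 1.x looks like this:

```python
import tensorflow as tf

vocab_size = 10000                                   # placeholder vocabulary size
# decoder_outputs: [batch, time, rnn_units] activations from the decoder LSTM stack
decoder_outputs = tf.placeholder(tf.float32, [None, None, 512])

# output_dense projects the decoder activations onto vocabulary-sized logits
# (its kernel and bias are the output_dense tensors listed in the checkpoint).
logits = tf.layers.dense(decoder_outputs, units=vocab_size, name="output_dense")

# Softmax over the logits gives the next-word probability distribution.
word_probs = tf.nn.softmax(logits, axis=-1)
```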

Let me know if you have any more questions and thank you for your interest in the project!

@matteogabella (Author) commented

Thank you, and sorry to bother you again... but if
tensor_name: model/decoder/output_dense/bias
tensor_name: model/decoder/output_dense/kernel

are the output nodes, which are the INPUT nodes?

@AbrahamSanders (Owner) commented

The input node would be the shared_embeddings_matrix, since the model input is a sequence of embedding indices that is converted to a sequence of word vectors using the embedding lookup function. This sequence of word vectors is then fed into the encoder RNN's bidirectional cells.
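
A minimal sketch of that input path (illustrative names and dimensions, not the project's exact code):

```python
import tensorflow as tf

vocab_size, embedding_dim = 10000, 256               # placeholder dimensions
input_ids = tf.placeholder(tf.int32, [None, None])   # [batch, time] embedding indices

# shared_embeddings_matrix: one row per vocabulary entry, seen in the checkpoint.
shared_embeddings_matrix = tf.get_variable(
    "shared_embeddings_matrix", [vocab_size, embedding_dim])

# Embedding lookup converts the index sequence into a sequence of word vectors,
# which is then fed into the encoder RNN's bidirectional cells.
word_vectors = tf.nn.embedding_lookup(shared_embeddings_matrix, input_ids)
```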
