2 questions #23

Open
matteogabella opened this issue Jul 22, 2019 · 3 comments

Comments


matteogabella commented Jul 22, 2019

Hi Abraham, and first of all, thank you for your amazing work. I have a couple of questions:

  • What is the purpose of the backup?
    Why do you perform a backup in certain cases? Is it necessary for resuming training, or is it just a precaution?
    Why would someone need to use the backup later?

  • In order to convert the model to TFLite format (to experiment a bit in a mobile environment) I need to freeze the model. Most of the guides I have read say that, starting from the checkpoint file, I need to pass the output nodes to the 'convert_variables_to_constants' function.

So I used 'print_tensors_in_checkpoint_file' to get the nodes of your model, and I obtained the list below (followed by a rough sketch of the freezing code I am trying).
Which are the output nodes? Do you think I need to pass all the tensors with 'decoder' in their path?

thank you

tensor_name: model/decoder/attention_decoder_cell/attention_layer/kernel
tensor_name: model/decoder/attention_decoder_cell/bahdanau_attention/attention_b
tensor_name: model/decoder/attention_decoder_cell/bahdanau_attention/attention_g
tensor_name: model/decoder/attention_decoder_cell/bahdanau_attention/attention_v
tensor_name: model/decoder/attention_decoder_cell/bahdanau_attention/query_layer/kernel
tensor_name: model/decoder/attention_decoder_cell/multi_rnn_cell/cell_0/basic_lstm_cell/bias
tensor_name: model/decoder/attention_decoder_cell/multi_rnn_cell/cell_0/basic_lstm_cell/kernel
tensor_name: model/decoder/attention_decoder_cell/multi_rnn_cell/cell_1/basic_lstm_cell/bias
tensor_name: model/decoder/attention_decoder_cell/multi_rnn_cell/cell_1/basic_lstm_cell/kernel
tensor_name: model/decoder/attention_decoder_cell/multi_rnn_cell/cell_2/basic_lstm_cell/bias
tensor_name: model/decoder/attention_decoder_cell/multi_rnn_cell/cell_2/basic_lstm_cell/kernel
tensor_name: model/decoder/attention_decoder_cell/multi_rnn_cell/cell_3/basic_lstm_cell/bias
tensor_name: model/decoder/attention_decoder_cell/multi_rnn_cell/cell_3/basic_lstm_cell/kernel
tensor_name: model/decoder/memory_layer/kernel
tensor_name: model/decoder/output_dense/bias
tensor_name: model/decoder/output_dense/kernel
tensor_name: model/encoder/bidirectional_rnn/bw/multi_rnn_cell/cell_0/basic_lstm_cell/bias
tensor_name: model/encoder/bidirectional_rnn/bw/multi_rnn_cell/cell_0/basic_lstm_cell/kernel
tensor_name: model/encoder/bidirectional_rnn/bw/multi_rnn_cell/cell_1/basic_lstm_cell/bias
tensor_name: model/encoder/bidirectional_rnn/bw/multi_rnn_cell/cell_1/basic_lstm_cell/kernel
tensor_name: model/encoder/bidirectional_rnn/fw/multi_rnn_cell/cell_0/basic_lstm_cell/bias
tensor_name: model/encoder/bidirectional_rnn/fw/multi_rnn_cell/cell_0/basic_lstm_cell/kernel
tensor_name: model/encoder/bidirectional_rnn/fw/multi_rnn_cell/cell_1/basic_lstm_cell/bias
tensor_name: model/encoder/bidirectional_rnn/fw/multi_rnn_cell/cell_1/basic_lstm_cell/kernel
tensor_name: model/encoder/shared_embeddings_matrix
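
For reference, this is a minimal TF 1.x sketch of how I list the checkpoint variables and try to freeze the graph; the checkpoint path and the output node name are placeholders, not necessarily the real op names in your graph:

```python
import tensorflow as tf
from tensorflow.python.tools.inspect_checkpoint import print_tensors_in_checkpoint_file

ckpt = "path/to/model.ckpt"  # placeholder checkpoint path

# List the variables stored in the checkpoint (this is how I got the list above).
print_tensors_in_checkpoint_file(ckpt, tensor_name="", all_tensors=False,
                                 all_tensor_names=True)

# Freeze: restore the graph, then fold all variables into constants.
with tf.Session() as sess:
    saver = tf.train.import_meta_graph(ckpt + ".meta")
    saver.restore(sess, ckpt)
    frozen_graph_def = tf.graph_util.convert_variables_to_constants(
        sess,
        sess.graph.as_graph_def(),
        output_node_names=["model/decoder/output_dense/BiasAdd"])  # assumed op name
    with tf.gfile.GFile("frozen_model.pb", "wb") as f:
        f.write(frozen_graph_def.SerializeToString())
```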

@AbrahamSanders (Owner) commented

Hi @matteogabella

a) What's the purpose of the backup?
Normally, a model is trained until some validation metric stops improving for a given number of epochs, and then training stops. However, there is no good automatic validation metric for open-ended dialog quality. Metrics such as BLEU and perplexity exist, but they only work well when there is a single correct way for the model to respond, as in neural machine translation. In conversational modeling there are many valid ways to answer the same question, and therefore no known way to automatically measure the "correctness" of each response.

Since there is no good automatic validation metric, an early stopping mechanism cannot be used. Instead, the training routine saves a full backup of the entire model at the loss intervals specified in the hparams. This way, a human can converse with each backed-up model and manually judge which one strikes the best balance of generalization and context sensitivity.

For example, an under-trained model might respond with "I don't know" for every single prompt, while an overtrained model might respond with a very detailed answer which is completely out of context because the question wasn't posed exactly the way it showed up in training. A human can choose the best backup point and delete the rest.
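
In rough pseudocode, the backup behavior amounts to something like the sketch below (illustrative only; the actual hparam name and saver logic live in the training code):

```python
import os

# Illustrative sketch only -- the real hparam name and saver calls may differ.
backup_on_training_loss = [2.0, 1.5, 1.0, 0.5]  # assumed: loss thresholds from hparams

def maybe_backup(saver, sess, current_loss, backup_dir, remaining_thresholds):
    """Save a full checkpoint the first time the training loss crosses each threshold."""
    while remaining_thresholds and current_loss <= remaining_thresholds[0]:
        threshold = remaining_thresholds.pop(0)
        target_dir = os.path.join(backup_dir, "backup_loss_{:.2f}".format(threshold))
        os.makedirs(target_dir)                      # one backup folder per threshold
        saver.save(sess, os.path.join(target_dir, "model.ckpt"))
    return remaining_thresholds
```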

b) Which are the output nodes?
model/decoder/output_dense/...
These are the weights and biases of the fully connected layer that sits on top of the decoder LSTM output. The softmax that models the probability distribution over the vocabulary is computed from the activations of output_dense.
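
As a rough illustration (not the project's exact code; names and shapes are placeholders), the role of output_dense in TF 1.x looks like this:

```python
import tensorflow as tf

vocab_size = 10000                                   # placeholder vocabulary size
# decoder_outputs: [batch, time, rnn_units] activations from the decoder LSTM stack
decoder_outputs = tf.placeholder(tf.float32, [None, None, 512])

# output_dense projects the decoder activations onto vocabulary-sized logits
# (its kernel and bias are the output_dense tensors listed in the checkpoint).
logits = tf.layers.dense(decoder_outputs, units=vocab_size, name="output_dense")

# Softmax over the logits gives the next-word probability distribution.
word_probs = tf.nn.softmax(logits, axis=-1)
```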

Let me know if you have any more questions and thank you for your interest in the project!

@matteogabella (Author) commented

Thank you, and sorry to bother you again... but if
tensor_name: model/decoder/output_dense/bias
tensor_name: model/decoder/output_dense/kernel

are the output nodes, which are the INPUT nodes?

@AbrahamSanders (Owner) commented

The input node would be the shared_embeddings_matrix, since the model input is a sequence of embedding indices that is converted to a sequence of word vectors using the embedding lookup function. This sequence of word vectors is then fed into the encoder RNN's bidirectional cells.
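
A minimal sketch of that input path (illustrative names and dimensions, not the project's exact code):

```python
import tensorflow as tf

vocab_size, embedding_dim = 10000, 256               # placeholder dimensions
input_ids = tf.placeholder(tf.int32, [None, None])   # [batch, time] embedding indices

# shared_embeddings_matrix: one row per vocabulary entry, seen in the checkpoint.
shared_embeddings_matrix = tf.get_variable(
    "shared_embeddings_matrix", [vocab_size, embedding_dim])

# Embedding lookup converts the index sequence into a sequence of word vectors,
# which is then fed into the encoder RNN's bidirectional cells.
word_vectors = tf.nn.embedding_lookup(shared_embeddings_matrix, input_ids)
```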
