This repository has been archived by the owner on Apr 11, 2021. It is now read-only.
I noticed that you run the attention through a sigmoid because you were having numerical problems:
https://github.com/codekansas/keras-language-modeling/blob/master/attention_lstm.py#L54
This may work, but I think that should actually be a softmax. The paper you cite only says that the attention weight should be proportional to the exponentiated score, while another paper [1] writes the normalization explicitly as `a_t = exp(e_t) / sum_k exp(e_k)`, i.e. a softmax over the scores.
[1] https://www.cs.cmu.edu/~diyiy/docs/naacl16.pdf
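To illustrate the difference, here is a minimal NumPy sketch (not the repo's actual code; the function and variable names are hypothetical): a sigmoid squashes each score independently, so the resulting weights need not sum to 1, whereas a max-shifted softmax gives a proper distribution without the overflow problems a naive `exp()` would have.

```python
import numpy as np

def stable_softmax(scores):
    # Softmax is invariant to shifting all scores by a constant,
    # so subtracting the max avoids overflow in exp() without
    # changing the result.
    shifted = scores - np.max(scores)
    exps = np.exp(shifted)
    return exps / exps.sum()

scores = np.array([2.0, 1.0, 1000.0])  # a large score would overflow naive exp()

weights = stable_softmax(scores)
assert np.isclose(weights.sum(), 1.0)  # a proper attention distribution

# Sigmoid, by contrast, maps each score to (0, 1) independently,
# so the "weights" do not form a distribution over timesteps.
sigmoid = 1.0 / (1.0 + np.exp(-scores))
```

Replacing the sigmoid with a shifted softmax like this would keep the numerical stability while matching the normalization in [1].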