
Sigmoid in AttentionLSTM #30

Open
bkj opened this issue Nov 22, 2016 · 0 comments
bkj commented Nov 22, 2016

I noticed that you run the attention through a sigmoid because you were having numerical problems:

https://github.com/codekansas/keras-language-modeling/blob/master/attention_lstm.py#L54

This may work, but I think it should actually be a softmax. The paper you cite only says that the activation should be proportional to

exp(dot(m, U_s))

In another paper [1], they explicitly say it should be

softmax(dot(m, U_s))

[1] https://www.cs.cmu.edu/~diyiy/docs/naacl16.pdf
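For reference, here is a small NumPy sketch (not the repo's code, just an illustration) of the difference: a sigmoid squashes each score independently into (0, 1), so the weights don't sum to 1, while a softmax normalizes across timesteps so each weight is proportional to exp(score) and the weights sum to 1. Subtracting the max score before exponentiating also keeps the softmax numerically stable.

```python
import numpy as np

def sigmoid_weights(scores):
    # elementwise sigmoid: each weight lies in (0, 1), but they need not sum to 1
    return 1.0 / (1.0 + np.exp(-scores))

def softmax_weights(scores):
    # softmax over timesteps: weights are proportional to exp(score) and sum to 1
    shifted = scores - scores.max()  # subtract max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

scores = np.array([2.0, -1.0, 0.5])   # hypothetical dot(m, U_s) scores
print(sigmoid_weights(scores))        # ~[0.88, 0.27, 0.62], sums to ~1.77
print(softmax_weights(scores))        # ~[0.79, 0.04, 0.18], sums to 1.0
```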
