This repository has been archived by the owner on Apr 11, 2021. It is now read-only.
I noticed that you run the attention through a sigmoid because you were having numerical problems:
https://github.com/codekansas/keras-language-modeling/blob/master/attention_lstm.py#L54
This may work, but I think that should actually be a softmax. The paper you cite only says that the attention weight should be proportional to the exponentiated score, while another paper [1] writes the normalization explicitly as `a_t = exp(e_t) / sum_k exp(e_k)`, i.e. a softmax over the scores.
[1] https://www.cs.cmu.edu/~diyiy/docs/naacl16.pdf
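To illustrate the difference, here is a minimal NumPy sketch (not the repo's actual code; the function and variable names are hypothetical): a sigmoid squashes each score independently, so the resulting weights need not sum to 1, whereas a max-shifted softmax gives a proper distribution without the overflow problems a naive `exp()` would have.

```python
import numpy as np

def stable_softmax(scores):
    # Softmax is invariant to shifting all scores by a constant,
    # so subtracting the max avoids overflow in exp() without
    # changing the result.
    shifted = scores - np.max(scores)
    exps = np.exp(shifted)
    return exps / exps.sum()

scores = np.array([2.0, 1.0, 1000.0])  # a large score would overflow naive exp()

weights = stable_softmax(scores)
assert np.isclose(weights.sum(), 1.0)  # a proper attention distribution

# Sigmoid, by contrast, maps each score to (0, 1) independently,
# so the "weights" do not form a distribution over timesteps.
sigmoid = 1.0 / (1.0 + np.exp(-scores))
```

Replacing the sigmoid with a shifted softmax like this would keep the numerical stability while matching the normalization in [1].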