fix the distribution calculation by dividing by the square root of query_size before applying the softmax
Dobiasd committed Dec 31, 2023
1 parent 67264ea commit 8f630a8
Showing 1 changed file with 2 additions and 1 deletion.
include/fdeep/layers/multi_head_attention_layer.hpp: 3 changes (2 additions, 1 deletion)
@@ -97,7 +97,8 @@ class multi_head_attention_layer : public layer
 // https://github.com/keras-team/keras/blob/v2.14.0/keras/layers/attention/multi_head_attention.py
 // https://gist.github.com/sevagh/b71d253a347a9b59c026580625452fc5
 const tensor scores = dot_product_tensors(query, transpose(key), std::vector<int>({2, 1}), false);
-const tensor distribution = softmax(scores);
+const std::size_t query_size = query.shape().depth_;
+const tensor distribution = softmax(transform_tensor(fplus::multiply_with(1 / std::sqrt(query_size)), scores));
 return dot_product_tensors(distribution, value, std::vector<int>({2, 1}), false);
 }
 protected:
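To illustrate what this commit changes, here is a minimal single-head sketch of scaled dot-product attention, softmax(Q * K^T / sqrt(d)) * V, written with plain `std::vector` instead of fdeep's `tensor` type. The names `Matrix`, `softmax_rows` and `scaled_dot_product_attention` are hypothetical and not part of fdeep; `depth` plays the role of `query_size` (`query.shape().depth_`) in the diff. It is an illustrative sketch under those assumptions, not fdeep's implementation. Scaling the query before the dot product instead would give the same scores, since the factor is a scalar.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical dense row-major matrix type, used only for this sketch.
using Matrix = std::vector<std::vector<double>>;

// Row-wise softmax. Subtracting the row maximum before exp() avoids
// overflow without changing the result.
Matrix softmax_rows(Matrix m) {
    for (auto& row : m) {
        const double max_val = *std::max_element(row.begin(), row.end());
        double sum = 0.0;
        for (double& v : row) {
            v = std::exp(v - max_val);
            sum += v;
        }
        for (double& v : row) {
            v /= sum;
        }
    }
    return m;
}

// Single-head scaled dot-product attention: softmax(Q * K^T / sqrt(d)) * V
// q: n_q x d, k: n_k x d, v: n_k x d_v. Returns n_q x d_v.
Matrix scaled_dot_product_attention(const Matrix& q, const Matrix& k, const Matrix& v) {
    assert(!q.empty() && !k.empty() && k.size() == v.size());
    const std::size_t depth = q.front().size();  // analogous to query_size
    const double scale = 1.0 / std::sqrt(static_cast<double>(depth));

    // scores[i][j] = (q_i . k_j) * scale: the scaling happens before the
    // softmax, which is the point of the fix in this commit.
    Matrix scores(q.size(), std::vector<double>(k.size(), 0.0));
    for (std::size_t i = 0; i < q.size(); ++i) {
        for (std::size_t j = 0; j < k.size(); ++j) {
            assert(q[i].size() == depth && k[j].size() == depth);
            double dot = 0.0;
            for (std::size_t c = 0; c < depth; ++c) {
                dot += q[i][c] * k[j][c];
            }
            scores[i][j] = dot * scale;
        }
    }

    const Matrix distribution = softmax_rows(scores);

    // Each output row is the distribution-weighted sum of the value rows.
    const std::size_t value_depth = v.front().size();
    Matrix out(q.size(), std::vector<double>(value_depth, 0.0));
    for (std::size_t i = 0; i < q.size(); ++i) {
        for (std::size_t j = 0; j < v.size(); ++j) {
            for (std::size_t c = 0; c < value_depth; ++c) {
                out[i][c] += distribution[i][j] * v[j][c];
            }
        }
    }
    return out;
}
```

With q = {{1, 0}} and both k and v set to the 2x2 identity, the first output element is e^(1/sqrt(2)) / (e^(1/sqrt(2)) + 1), roughly 0.67, whereas the unscaled version (the code before this commit) gives e / (e + 1), roughly 0.73. Dividing by the square root of the depth keeps the scores from growing with the vector size, so the softmax does not become overly peaked for large depths.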