Replies: 2 comments 5 replies
Is this correct?
Doc updated
Context / Scenario
Read the document Cosine Similarity.
What happened?
The document Cosine Similarity contains the following text:
Cosine similarity is particularly useful when working with high-dimensional data such as word embeddings because it takes into account both the magnitude and direction of each vector. This makes it more robust than other measures like Euclidean distance, which only considers the direction.
This is simply false.
Example
Cosine similarity first scales each vector to unit length, wiping out all magnitude effects. It then measures how closely the two vectors are aligned, using the cosine of the angle between them. The vectors [1, 1] and [100, 100] have a cosine similarity of 1, i.e. they are treated as identical.
The Euclidean distance does not normalize the magnitude, and the above two vectors have an L2 distance of about 140.
Now comparing [1, 1] to [10, 10], the cosine similarity is still 1, but the L2 distance is now about 12.73.
Thus the Euclidean distance clearly takes the magnitude into account while the cosine similarity does not.
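For anyone who wants to check the numbers above, here is a minimal sketch in plain Python. The function names `cosine_similarity` and `euclidean_distance` are my own, chosen for illustration; they are not taken from the KM docs or codebase.

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the L2 norms,
    # which is the cosine of the angle between a and b.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    # L2 norm of the difference vector; no normalization is applied.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

base = [1, 1]
for other in ([10, 10], [100, 100]):
    cos = cosine_similarity(base, other)
    dist = euclidean_distance(base, other)
    print(f"{other}: cosine={cos:.4f}, L2={dist:.2f}")
```

Scaling the second vector changes the L2 distance but leaves the cosine similarity at 1, which is the opposite of what the quoted paragraph claims.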
This was an issue raised by @aguzev in #651 but it appears there was a misunderstanding and the error was not corrected.
Further Discussion Cautioning the Use of Cosine
Additionally, calling cosine similarity a more robust measure is not accurate, as there is ongoing discussion about its usefulness.
Proposed edits
My suggestion is to simply remove the paragraph, and the article would be a solid piece. As it stands, the description is easily read as a false mathematical statement.
Importance
a fix would make my life easier
Platform, Language, Versions
KM Version 0.62
Relevant log output