Bug with jaro_winkler function? #47

Open
flo-blg opened this issue Jan 30, 2020 · 0 comments

Hi,

I'm seeing a strange result from the jaro_winkler function, which looks like a bug:

In [73]: Levenshtein.jaro_winkler('guerrilla girls', 'guerilla girls')
Out[73]: 0.9295238095238095

I was surprised to see such a low score for a single "r" omission in a 15-character string.

So I replaced the second "r" in the first string with a "b". The only thing that changes in this test is that the omitted character in the second string is now a "b" instead of an "r".

And now the score is much higher, and much closer to what I expected:

In [74]: Levenshtein.jaro_winkler('guerbilla girls', 'guerilla girls')
Out[74]: 0.9866666666666667

I ran the same two tests with another library (jaro-winkler), and it returns the same score in both cases, equal to python-Levenshtein's result for the second test:

In [77]: jaro.jaro_winkler_metric('guerrilla girls', 'guerilla girls')
Out[77]: 0.9866666666666667
In [78]: jaro.jaro_winkler_metric('guerbilla girls', 'guerilla girls')
Out[78]: 0.9866666666666667
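
For reference, computing the score by hand with the textbook Jaro-Winkler definition (prefix scale 0.1, prefix length capped at 4) also gives about 0.9867 for both pairs. Here is the minimal sketch I used to check; it is an independent implementation, not this library's code:

def jaro(s1, s2):
    if not s1 or not s2:
        return 0.0
    # Characters may match only within half the longer length, minus one.
    max_dist = max(len(s1), len(s2)) // 2 - 1
    s2_used = [False] * len(s2)
    s1_matches = []  # matched characters of s1, in s1 order
    for i, c in enumerate(s1):
        lo = max(0, i - max_dist)
        hi = min(len(s2), i + max_dist + 1)
        for j in range(lo, hi):
            if not s2_used[j] and s2[j] == c:
                s2_used[j] = True
                s1_matches.append(c)
                break
    m = len(s1_matches)
    if m == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order, halved.
    s2_matches = [s2[j] for j in range(len(s2)) if s2_used[j]]
    t = sum(a != b for a, b in zip(s1_matches, s2_matches)) / 2
    return (m / len(s1) + m / len(s2) + (m - t) / m) / 3

def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)

print(jaro_winkler('guerrilla girls', 'guerilla girls'))   # ≈ 0.9867
print(jaro_winkler('guerbilla girls', 'guerilla girls'))   # ≈ 0.9867

Both calls print approximately 0.9867, which matches the jaro-winkler library, so python-Levenshtein's 0.9295 for the first pair looks off to me.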

What do you think? The first result seems wrong, doesn't it?
