generator repeats same result several times. #20

Open
sjscotti opened this issue Jan 16, 2023 · 0 comments


sjscotti commented Jan 16, 2023

Hi,
I just tried out your search capability with a short "needle" string of 14 characters and a very long "haystack" string of over 50,000 characters. The routine finds the needle very quickly, but the same match is reported multiple times. Here is the result for a distance of 5:

ngramLen < 10
needle George Latimer
result [
  { start: 9599, end: 9613, dist: 5 },
  { start: 9599, end: 9613, dist: 5 },
  { start: 9599, end: 9613, dist: 5 },
  { start: 9598, end: 9613, dist: 5 },
  { start: 9598, end: 9613, dist: 4 },
  { start: 9598, end: 9613, dist: 4 },
  { start: 9598, end: 9613, dist: 5 },
  { start: 9598, end: 9613, dist: 4 },
  { start: 9598, end: 9613, dist: 5 },
  { start: 9598, end: 9613, dist: 5 }
]

Note: the actual substring in the haystack is "george \\ latimer".

Also, the start value seems to be slightly off (according to my text editor, the match starts at column 9611).
Is there a reason for the multiple detections of the same substring, and for the different dist values returned?
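
In the meantime, a possible workaround on the caller's side would be to collapse matches that share the same start and end, keeping the lowest dist for each span. This is only a minimal sketch: it assumes the results are plain objects shaped like the output above, and `dedupeMatches` is a hypothetical helper of my own, not part of the library.

```javascript
// Hypothetical helper (not part of the library): collapse matches that share
// the same start/end span, keeping the smallest dist seen for each span.
function dedupeMatches(matches) {
  const best = new Map();
  for (const m of matches) {
    const key = `${m.start}:${m.end}`;
    const prev = best.get(key);
    if (prev === undefined || m.dist < prev.dist) {
      best.set(key, m);
    }
  }
  // A Map iterates in first-insertion order, so spans keep the order in
  // which they were first reported.
  return [...best.values()];
}

// The ten results pasted above.
const results = [
  { start: 9599, end: 9613, dist: 5 },
  { start: 9599, end: 9613, dist: 5 },
  { start: 9599, end: 9613, dist: 5 },
  { start: 9598, end: 9613, dist: 5 },
  { start: 9598, end: 9613, dist: 4 },
  { start: 9598, end: 9613, dist: 4 },
  { start: 9598, end: 9613, dist: 5 },
  { start: 9598, end: 9613, dist: 4 },
  { start: 9598, end: 9613, dist: 5 },
  { start: 9598, end: 9613, dist: 5 },
];

const deduped = dedupeMatches(results);
console.log(deduped);
// prints two entries: { start: 9599, end: 9613, dist: 5 } and { start: 9598, end: 9613, dist: 4 }
```

This only hides the repeats after the fact; it does not explain why the same span is reported with different dist values, or why the start value is off.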
Thanks!
