
ValueError: end_index must be non-negative (again) #32

Open
jtlz2 opened this issue Jun 26, 2020 · 8 comments

jtlz2 commented Jun 26, 2020

This presents just as in #13. See below to reproduce. Awesome module, thanks!

Version info:

Python 2.7.16 |Anaconda custom (64-bit)| (default, Aug 22 2019, 10:59:10)
fuzzysearch.__version__ = 0.7.2

import fuzzysearch
fuzzysearch.find_near_matches('ABC 0123456', 'ABC', max_l_dist=1).next()

Traceback:


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-40-9ccf0e63dac4> in <module>()
----> 1 fuzzysearch.find_near_matches('ABC 0123456', 'ABC', max_l_dist=1).next()

/anaconda2/lib/python2.7/site-packages/fuzzysearch/__init__.pyc in find_near_matches(subsequence, sequence, max_substitutions, max_insertions, max_deletions, max_l_dist)
     55     search_class = choose_search_class(search_params)
     56     matches = search_class.search(subsequence, sequence, search_params)
---> 57     return search_class.consolidate_matches(matches)
     58
     59

/anaconda2/lib/python2.7/site-packages/fuzzysearch/levenshtein.pyc in consolidate_matches(cls, matches)
    159     @classmethod
    160     def consolidate_matches(cls, matches):
--> 161         return consolidate_overlapping_matches(matches)
    162
    163     @classmethod

/anaconda2/lib/python2.7/site-packages/fuzzysearch/common.pyc in consolidate_overlapping_matches(matches)
    186 def consolidate_overlapping_matches(matches):
    187     """Replace overlapping matches with a single, "best" match."""
--> 188     groups = group_matches(matches)
    189     best_matches = [get_best_match_in_group(group) for group in groups]
    190     return sorted(best_matches)

/anaconda2/lib/python2.7/site-packages/fuzzysearch/common.pyc in group_matches(matches)
    162 def group_matches(matches):
    163     groups = []
--> 164     for match in matches:
    165         overlapping_groups = [g for g in groups if g.is_match_in_group(match)]
    166         if not overlapping_groups:

/anaconda2/lib/python2.7/site-packages/fuzzysearch/levenshtein.pyc in search(cls, subsequence, sequence, search_params)
    154     def search(cls, subsequence, sequence, search_params):
    155         for match in find_near_matches_levenshtein(subsequence, sequence,
--> 156                                                    search_params.max_l_dist):
    157             yield match
    158

/anaconda2/lib/python2.7/site-packages/fuzzysearch/levenshtein_ngram.pyc in find_near_matches_levenshtein_ngrams(subsequence, sequence, max_l_dist)
    175         start_index = max(0, ngram_start - max_l_dist)
    176         end_index = min(seq_len, seq_len - subseq_len + ngram_end + max_l_dist)
--> 177         for index in search_exact(subsequence[ngram_start:ngram_end], sequence, start_index, end_index):
    178             # try to expand left and/or right according to n_ngram
    179             dist_right, right_expand_size = _expand(

/anaconda2/lib/python2.7/site-packages/fuzzysearch/search_exact.pyc in search_exact(subsequence, sequence, start_index, end_index)
     69         try:
     70             return search_exact_byteslike(subsequence, sequence,
---> 71                                           start_index, end_index)
     72         except (TypeError, UnicodeEncodeError):
     73             return _search_exact(subsequence, sequence, start_index, end_index)

ValueError: end_index must be non-negative
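The failing index can be worked out by hand from the traceback. The signature shown there is find_near_matches(subsequence, sequence, ...), so in the call above 'ABC 0123456' is the subsequence and 'ABC' is the sequence being searched. The sketch below plugs those values into the two formulas at lines 175-176 of levenshtein_ngram.py. The helper name `window` is hypothetical, and the n-gram length subseq_len // (max_l_dist + 1) is an assumption about how the library splits the subsequence, not something the traceback shows.

```python
# Evaluate the start_index/end_index formulas from levenshtein_ngram.py
# (traceback lines 175-176) for the inputs of the reproduction above.
subsequence = 'ABC 0123456'  # first positional argument of find_near_matches
sequence = 'ABC'             # second positional argument
max_l_dist = 1

subseq_len = len(subsequence)  # 11
seq_len = len(sequence)        # 3


def window(ngram_start, ngram_end):
    """Hypothetical helper: the (start_index, end_index) pair from lines 175-176."""
    start_index = max(0, ngram_start - max_l_dist)
    end_index = min(seq_len, seq_len - subseq_len + ngram_end + max_l_dist)
    return start_index, end_index


# end_index < 0 whenever ngram_end < subseq_len - seq_len - max_l_dist (7 here)
negative_ends = [end for end in range(1, subseq_len + 1) if window(0, end)[1] < 0]
print(negative_ends)  # [1, 2, 3, 4, 5, 6]

# Assumption: n-grams of length subseq_len // (max_l_dist + 1), placed back to back
ngram_len = subseq_len // (max_l_dist + 1)  # 5
for ngram_start in range(0, subseq_len - ngram_len + 1, ngram_len):
    print('ngram_start=%d -> %r' % (ngram_start, window(ngram_start, ngram_start + ngram_len)))
# ngram_start=0 -> (0, -2)
# ngram_start=5 -> (4, 3)
```

Under that assumption the first n-gram already yields end_index = -2, which would trip the "end_index must be non-negative" check. The formula contains seq_len - subseq_len, so it is a subsequence longer than the searched sequence that drives end_index below zero; with the arguments the other way round (subsequence 'ABC', sequence 'ABC 0123456') that term is +8 and end_index cannot go negative.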
@taleinat
Owner

> Awesome module, thanks!

Thanks for the kind words, I'm happy you're finding it useful! It would be great to hear what you're using it for.

@taleinat
Owner

@jtlz2, which platform are you running this on? Windows, Linux or macOS? Which exact version, and 32- or 64-bit?

@taleinat
Owner

taleinat commented Jun 26, 2020

@jtlz2, could you try running the same code with bytes objects rather than strings? I.e.:

fuzzysearch.find_near_matches(b'ABC 0123456', b'ABC', max_l_dist=1).next()

@jtlz2
Author

jtlz2 commented Jun 26, 2020

@taleinat Apologies - macOS 10.13.6.

We are trialling it for OCR post-processing.

The error comes out the same when using bytes as you suggest (ValueError at L71).

Thanks again!
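The fallback at lines 69-73 of search_exact.py only catches TypeError and UnicodeEncodeError, so a ValueError raised by the native search_exact_byteslike reaches the caller instead of falling back to the pure-Python _search_exact. On Python 2 a b'...' literal is an ordinary str, so the bytes variant suggested above goes through exactly the same path as the original call. A minimal sketch of that dispatch (Python 3), where fast_search and slow_search are hypothetical stand-ins and the order of the checks inside fast_search (type first, then end_index) is an assumption rather than the library's actual C code:

```python
# Hypothetical sketch of the try/except dispatch in search_exact.py
# (traceback lines 69-73). fast_search and slow_search are stand-ins,
# not fuzzysearch functions; the checks in fast_search are assumptions.


def slow_search(subsequence, sequence, start_index, end_index):
    """Pure-Python exact search for matches lying within [start_index, end_index)."""
    end_index = max(end_index, 0)  # treat a negative bound as an empty window
    matches = []
    index = sequence.find(subsequence, start_index, end_index)
    while index != -1:
        matches.append(index)
        index = sequence.find(subsequence, index + 1, end_index)
    return matches


def fast_search(subsequence, sequence, start_index, end_index):
    """Stand-in for the native search_exact_byteslike (assumed behaviour)."""
    if not isinstance(sequence, (bytes, bytearray)):
        raise TypeError('expected a bytes-like object')
    if end_index < 0:
        raise ValueError('end_index must be non-negative')
    return slow_search(subsequence, sequence, start_index, end_index)


def search_exact(subsequence, sequence, start_index=0, end_index=None):
    if end_index is None:
        end_index = len(sequence)
    try:
        return fast_search(subsequence, sequence, start_index, end_index)
    except (TypeError, UnicodeEncodeError):
        # Only these two exception types trigger the pure-Python fallback;
        # a ValueError from the fast path propagates unchanged.
        return slow_search(subsequence, sequence, start_index, end_index)


print(search_exact(b'ABC', b'xxABCxx'))       # [2]
print(search_exact('ABC', 'xxABCxx', 0, -2))  # [] (str takes the fallback)
try:
    search_exact(b'ABC', b'xxABCxx', 0, -2)
except ValueError as exc:
    print('ValueError: %s' % exc)             # ValueError: end_index must be non-negative
```

Under Python 3 the str call falls back and returns an empty list while the bytes call raises; under Python 2, where 'ABC' is itself a byte string, both calls would raise in this sketch.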

@taleinat
Owner

taleinat commented Jun 27, 2020

@jtlz2, I've started working on this. It seems like a problem with the native (C) extensions.

In the meantime, you may install fuzzysearch without the native extensions by fetching a source archive, unpacking it, and running python setup.py install --noexts.

@taleinat
Owner

@jtlz2, I've fixed what appears to be the source of this issue. The fix is available in version 0.7.3 which I've just released. Please let me know if it resolves this issue for you!

@jtlz2
Author

jtlz2 commented Jul 2, 2020

@taleinat Still get the same problem in 0.7.3 :\

@taleinat
Owner

taleinat commented Jul 3, 2020

> Still get the same problem in 0.7.3 :\

☹️

This seems to be related to the Anaconda distribution somehow: it only appears to happen there, not with Python from python.org or with Python built from the main git repo. I'll have to investigate further when I have more time.
