
Fresh QuickUMLS installation on Windows 10 returning empty [] - no errors #82

Open
khankanz opened this issue Mar 22, 2022 · 2 comments

Comments

khankanz commented Mar 22, 2022

Describe the bug

  • running the test script after a fresh Windows installation returns []

Environment

  • conda env with python 3.8.10, quickumls 1.4.0.post1, quickumls-simstring 1.1.5r2
  • Windows 10

Additional context

  • followed the installation guide. Ran into issues installing simstring but was able to resolve them by following the steps from /Georgetown-IR-Lab/simstring
  • had to update core.py to load spacy with: self.nlp = spacy.load('en_core_web_sm')
  • the following test script returns []:

    from quickumls import QuickUMLS

    matcher = QuickUMLS(quickumls_fp='FILEPATH TO QUICKUMLS')
    text = "The ulna has dislocated posteriorly from the trochlea of the humerus."
    print(matcher.match(text, best_match=True, ignore_syntax=False))

@khankanz (Author)

Happy to provide additional details. Not sure which direction to investigate further.

@khankanz (Author)

I've started digging into the source code and writing scripts to test individual pieces of it. The simstring reader's retrieve function keeps returning no matches, while the exact same script works perfectly fine on my Ubuntu VM:

import os
import unicodedata

import six
from quickumls_simstring import simstring

def safe_unicode(s):
    if six.PY2:
        # in Python 3, there is no ambiguity about whether
        # a string is encoded in bytes format or not
        try:
            s = u'%s' % s
        except UnicodeDecodeError:
            s = u'%s' % s.decode('utf-8')

    return u'{}'.format(unicodedata.normalize('NFKD', s))

def prepare_string_for_db_input(s):
    if six.PY2:
        print('s > six.PY2', s)
        return s.encode('utf-8')
    else:
        print('s > NO six.PY2', s)
        return s

path = "FILEPATH/umls-simstring.db"
print(os.path.join(path, 'umls-terms.simstring'))
db = simstring.reader(os.path.join(path, 'umls-terms.simstring'))
# use cosine similarity with a threshold of 0.6
db.measure = simstring.cosine
db.threshold = 0.6
term = "elbow, ula"

print('term ready for db lookup:', prepare_string_for_db_input(safe_unicode(term)))
print(db.retrieve(prepare_string_for_db_input(safe_unicode(term))))
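To make the cosine / 0.6 setting above concrete, here is a minimal pure-Python sketch of what a cosine-threshold lookup computes. It does not call simstring at all; `char_ngrams`, `cosine` and `retrieve` are hypothetical helpers, and the sketch assumes character trigram features with `$` padding on each side, which only approximates how the QuickUMLS simstring database was actually built. It can help check whether a query like "elbow, ula" should be expected to clear the threshold, independent of the Windows build.

```python
import math


def char_ngrams(s, n=3):
    """Return the set of character n-grams of s.

    Assumption: s is padded with n-1 '$' marks on each side to mimic
    begin/end markers; SimString's real feature extraction depends on
    how the database was written and also counts repeated n-grams.
    """
    padded = "$" * (n - 1) + s + "$" * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def cosine(x, y, n=3):
    """Cosine similarity over n-gram sets: |X & Y| / sqrt(|X| * |Y|)."""
    fx, fy = char_ngrams(x, n), char_ngrams(y, n)
    if not fx or not fy:
        return 0.0
    return len(fx & fy) / math.sqrt(len(fx) * len(fy))


def retrieve(query, terms, threshold=0.6):
    """Brute-force stand-in for a simstring lookup: return every term
    whose cosine similarity with the query is at least the threshold."""
    return [t for t in terms if cosine(query, t) >= threshold]


if __name__ == "__main__":
    # hypothetical toy dictionary, not real UMLS content
    terms = ["elbow, ulna", "ulna", "humerus"]
    print(retrieve("elbow, ula", terms))
```

Under these assumptions "elbow, ula" scores roughly 0.80 against "elbow, ulna", well above 0.6, which suggests (but does not prove) that an empty result from the real reader for this term is not simply a threshold effect.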
