Running the test script after a fresh Windows installation returns []
**Environment**
conda env with python 3.8.10, quickumls 1.4.0.post1, quickumls-simstring 1.1.5r2
Windows 10
**Additional context**
I followed the installation guide. I ran into issues with the simstring installation but was able to resolve them by following the steps from /Georgetown-IR-Lab/simstring.
I had to update core.py to load spaCy with: self.nlp = spacy.load('en_core_web_sm')
The following test script returns []:
from quickumls import QuickUMLS

matcher = QuickUMLS(quickumls_fp='FILEPATH TO QUICKUMLS')
text = "The ulna has dislocated posteriorly from the trochlea of the humerus."
print(matcher.match(text, best_match=True, ignore_syntax=False))
I started digging into the source code and writing scripts to test pieces of it. I'm noticing that the retrieve function keeps returning null on Windows, while this same script works perfectly fine on my Ubuntu VM.
from quickumls_simstring import simstring
import os, six, unicodedata

def safe_unicode(s):
    if six.PY2:
        # in Python 3, there is no ambiguity on whether
        # a string is encoded in bytes format or not
        try:
            s = u'%s' % s
        except UnicodeDecodeError:
            s = u'%s' % s.decode('utf-8')
    return u'{}'.format(unicodedata.normalize('NFKD', s))

def prepare_string_for_db_input(s):
    if six.PY2:
        print('s > six.PY2', s)
        return s.encode('utf-8')
    else:
        print('s > NO six.PY2', s)
        return s

path = "FILEPATH/umls-simstring.db"
print(os.path.join(path, 'umls-terms.simstring'))
db = simstring.reader(os.path.join(path, 'umls-terms.simstring'))

# Use cosine & threshold 0.6
db.measure = simstring.cosine
db.threshold = 0.6

term = "elbow, ula"
print('term ready for db lookup:', prepare_string_for_db_input(safe_unicode(term)))
print(db.retrieve(prepare_string_for_db_input(safe_unicode(term))))
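Since the same script behaves differently on Windows and Ubuntu, one thing that may be worth ruling out is that the database files themselves differ between the two machines. The sketch below is only a sanity check under that assumption, not a diagnosis: it uses the standard library alone, does not call simstring, and the helper names `summarize_db_dir` and `find_empty_files` are hypothetical names introduced here for illustration.

```python
import os


def summarize_db_dir(db_dir):
    """Return a sorted list of (filename, size_in_bytes) for the regular files in db_dir."""
    entries = []
    for name in sorted(os.listdir(db_dir)):
        full_path = os.path.join(db_dir, name)
        # Skip subdirectories; only regular files are reported.
        if os.path.isfile(full_path):
            entries.append((name, os.path.getsize(full_path)))
    return entries


def find_empty_files(entries):
    """Return the names of files whose reported size is zero bytes."""
    return [name for name, size in entries if size == 0]
```

Running `summarize_db_dir` on the `umls-simstring.db` directory on both machines and comparing the two listings would show whether any file is missing or zero-length on the Windows side. If the listings match, the difference is more likely in how the reader opens or queries the files than in the files themselves.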