-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding possibility to select cim10 and atc in eds.cim10 and eds.drugs #314
base: master
Are you sure you want to change the base?
Conversation
55433ab
to
c804921
Compare
Coverage Report
Files without new missing coverage
264 files skipped due to complete coverage. Coverage success: total of 97.76% is above 97.76% 🎉 |
Quality Gate passedIssues Measures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @svittoz, thank you ! I've left a few comments and suggestions
@@ -30,4 +30,6 @@ def get_patterns() -> Dict[str, List[str]]: | |||
|
|||
patterns = df.groupby("code")["patterns"].agg(list).to_dict() | |||
|
|||
patterns = {k: v for k, v in patterns.items() if k in cim10} if cim10 else patterns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not (which would fail for unknown cim10 codes) and might be a bit faster
patterns = {k: v for k, v in patterns.items() if k in cim10} if cim10 else patterns | |
patterns = {k: patterns[k] for k in codes} if codes else patterns |
@@ -5,7 +5,7 @@ | |||
from edsnlp import BASE_DIR | |||
|
|||
|
|||
def get_patterns() -> Dict[str, List[str]]: | |||
def get_patterns(cim10: List[str] = None) -> Dict[str, List[str]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe standardize the cim10/atc into a "code" parameter ?
def get_patterns(cim10: List[str] = None) -> Dict[str, List[str]]: | |
def get_patterns(codes: List[str] = None) -> Dict[str, List[str]]: |
@@ -28,6 +28,7 @@ def create_component( | |||
name: str = "cim10", | |||
*, | |||
attr: str = "NORM", | |||
cim10: List[str] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cim10: List[str] = None, | |
codes: List[str] = None, |
cim10 : str | ||
List of cim10 to retrieve. If None, all cim10 will be searched, | ||
resulting in higher computation time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Standardize cim10/atc/codes
cim10 : str | |
List of cim10 to retrieve. If None, all cim10 will be searched, | |
resulting in higher computation time. | |
codes : str | |
CIM10 codes to retrieve. If None, synonyms for all codes will be searched | |
resulting in higher computation time. |
@@ -104,7 +108,7 @@ def create_component( | |||
nlp=nlp, | |||
name=name, | |||
regex=dict(), | |||
terms=get_patterns(), | |||
terms=get_patterns(cim10), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
terms=get_patterns(cim10), | |
terms=get_patterns(codes), |
@@ -83,6 +84,9 @@ def create_component( | |||
The name of the component | |||
attr : str | |||
The default attribute to use for matching. | |||
atc : str | |||
List of atc to retrieve. If None, all atc will be searched, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
List of atc to retrieve. If None, all atc will be searched, | |
codes : str | |
ATC codes to retrieve. If None, synonyms for all codes will be searched | |
resulting in higher computation time. |
@@ -111,7 +115,7 @@ def create_component( | |||
nlp=nlp, | |||
name=name, | |||
regex=dict(), | |||
terms=get_patterns(), | |||
terms=get_patterns(atc), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
terms=get_patterns(atc), | |
terms=get_patterns(codes), |
@@ -6,6 +6,8 @@ | |||
drugs_file = BASE_DIR / "resources" / "drugs.json" | |||
|
|||
|
|||
def get_patterns() -> Dict[str, List[str]]: | |||
def get_patterns(atc: List[str] = None) -> Dict[str, List[str]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def get_patterns(atc: List[str] = None) -> Dict[str, List[str]]: | |
def get_patterns(codes: List[str] = None) -> Dict[str, List[str]]: |
with open(drugs_file, "r") as f: | ||
return json.load(f) | ||
patterns = json.load(f) | ||
patterns = {k: v for k, v in patterns.items() if k in atc} if atc else patterns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
patterns = {k: v for k, v in patterns.items() if k in atc} if atc else patterns | |
patterns = {k: patterns[k] for k in codes} if codes else patterns |
text = """ | ||
Leucocytes ¦ ¦ ¦4.2 ¦ ¦4.0-10.0 | ||
Hémoglobine ¦ ¦9.0 - ¦ ¦13-14 | ||
""" | ||
doc = blank_nlp(text) | ||
doc = matcher(doc) | ||
|
||
assert len(doc.spans["measurements"]) == 0 | ||
assert len(doc.spans["quantities"]) == 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a backward compatibility test ?
assert len(doc.spans["quantities"]) == 0 | |
assert len(doc.spans["quantities"]) == 0 | |
def test_measurements(blank_nlp): | |
blank_nlp.add_pipe("eds.measurements") | |
assert blank_nlp("La tumeur fait 4 cm").spans["measurements"][0]._.value.mm == 40 |
ae75dc5
to
430ef22
Compare
2038fb9
to
232ca91
Compare
Description
Currently,
eds.cim10
andeds.drugs
does not allow to select cim10 or ATC of interest, resulting in high computational time when used.TODO : see also
eds.umls
?Checklist