UMLS thesaurus #1206

rpietro · 2017-07-20T21:31:20Z

rpietro
Jul 20, 2017

Any thoughts on the possibility of integrating spaCy with the UMLS thesaurus https://www.nlm.nih.gov/research/umls/knowledge_sources/metathesaurus/ for classification of medical texts?

honnibal · 2017-07-21T22:51:27Z

honnibal
Jul 21, 2017
Maintainer

In general I'd like to do more to support medical text processing -- so, maybe! However, at first glance it seems that the UMLS data isn't easy to license? This makes me much less inclined to include native functionality for it in the main library.

What do you need, mostly? If you want to annotate texts with concepts from UMLS, I think the PhraseMatcher class would be a good option. You can also add lexical flag attributes, or customize the .similarity() function to refer to the thesaurus.
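A minimal sketch of the PhraseMatcher suggestion, assuming the spaCy v3 API (`matcher.add(key, patterns)`). The concept keys `MI_CONCEPT` and `HTN_CONCEPT` and their strings are hypothetical placeholders; real keys would be UMLS CUIs loaded from the licensed Metathesaurus files. It only covers the PhraseMatcher option, not lexical flags or a custom `.similarity()`.

```python
import spacy
from spacy.matcher import PhraseMatcher

# Hypothetical mini-thesaurus: concept key -> surface strings.
# Real data would come from the (licensed) UMLS Metathesaurus.
CONCEPTS = {
    "MI_CONCEPT": ["heart attack", "myocardial infarction"],
    "HTN_CONCEPT": ["high blood pressure", "hypertension"],
}

nlp = spacy.blank("en")  # tokenizer only, no trained model needed
# attr="LOWER" makes the matching case-insensitive
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
for key, terms in CONCEPTS.items():
    # make_doc only tokenizes, which is all PhraseMatcher patterns need
    matcher.add(key, [nlp.make_doc(term) for term in terms])


def annotate(text):
    """Return (concept key, matched text) pairs found in text."""
    doc = nlp(text)
    return [
        (nlp.vocab.strings[match_id], doc[start:end].text)
        for match_id, start, end in matcher(doc)
    ]


print(annotate("History of Hypertension and a prior heart attack."))
```

Matching on `LOWER` is a design choice for the example; matching on `ORTH` would be stricter, and neither handles spelling variants.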


ines · 2017-11-09T16:31:38Z

ines
Nov 9, 2017
Maintainer

Update: Now that spaCy v2.0 is out, this might be a good use case for a custom pipeline extension! https://spacy.io/usage/processing-pipelines#custom-components
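A minimal sketch of such a component, assuming the spaCy v3 API (`@Language.factory`, `nlp.add_pipe` by name, `Doc.set_extension`); the v2.0 API linked above registered components differently. The factory name `umls_concept_matcher`, the `umls_concepts` attribute and the concept keys are hypothetical, and the tiny concept table stands in for licensed UMLS data.

```python
import spacy
from spacy.language import Language
from spacy.matcher import PhraseMatcher
from spacy.tokens import Doc, Span

# Hypothetical concept table standing in for UMLS data.
CONCEPTS = {
    "MI_CONCEPT": ["myocardial infarction", "heart attack"],
    "HTN_CONCEPT": ["hypertension"],
}

# Custom attribute: doc._.umls_concepts will hold the matched spans.
if not Doc.has_extension("umls_concepts"):
    Doc.set_extension("umls_concepts", default=None)


@Language.factory("umls_concept_matcher")
def create_umls_concept_matcher(nlp, name):
    # Build the matcher once, when the component is added to the pipeline.
    matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
    for key, terms in CONCEPTS.items():
        matcher.add(key, [nlp.make_doc(t) for t in terms])

    def umls_concept_matcher(doc):
        # Each Span carries the concept key as its label.
        doc._.umls_concepts = [
            Span(doc, start, end, label=match_id)
            for match_id, start, end in matcher(doc)
        ]
        return doc

    return umls_concept_matcher


nlp = spacy.blank("en")
nlp.add_pipe("umls_concept_matcher")
doc = nlp("Patient with hypertension, no prior heart attack.")
for span in doc._.umls_concepts:
    print(span.label_, span.text)
```

Storing the spans in a custom attribute rather than `doc.ents` keeps the sketch from clashing with an existing NER component; writing to `doc.ents` instead is a reasonable alternative.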


putssander · 2018-04-16T09:01:41Z

putssander
Apr 16, 2018

I am currently thinking about writing a spaCy component that writes annotations to a UIMA XML format. Would this be feasible, and does it make sense? I would like to combine spaCy with Apache cTAKES components (e.g. the UMLS dictionary lookup), but I'm not sure how compatible the annotations are.

An alternative approach would be to use cTAKES to create databases from UMLS and re-implement the cTAKES database lookup module as a spaCy module.


GregSilverman · 2018-07-04T18:10:32Z

GregSilverman
Jul 4, 2018

We're actually about to undertake extending our tool, BioMedICUS, to integrate its pipeline components with spaCy. We are definitely open to collaboration in the endeavor, especially with respect to UMLS integration and decoupling from UIMA XMI CAS uglitude.


putssander · 2018-07-09T13:05:31Z

putssander
Jul 9, 2018

I am currently using QuickUMLS, which is built on spaCy, for concept extraction. QuickUMLS is tightly coupled to its dependencies leveldb and simstring, which are not very Windows-compatible, so turning QuickUMLS into a spaCy extension component would require rewriting parts of it. For now I have created a dockerized version of QuickUMLS with JSON in/out over TCP and will send QuickUMLS a PR soon. I'm doing the same for pyContextNLP at the moment.
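QuickUMLS relies on simstring for approximate (rather than exact) string matching against the UMLS term list. The sketch below illustrates that general idea, not simstring's actual algorithm: it brute-forces character-trigram Jaccard similarity over a short hypothetical term list, with a 0.6 threshold chosen only for this example. A real matcher adds an index so it does not have to score every term.

```python
def trigrams(s):
    """Character trigrams of a lowercased string, padded so word edges count too."""
    s = f"  {s.lower()}  "
    return {s[i:i + 3] for i in range(len(s) - 2)}


def jaccard(a, b):
    """Jaccard similarity of two sets."""
    return len(a & b) / len(a | b) if a or b else 0.0


def approximate_lookup(query, terms, threshold=0.6):
    """Return (term, score) pairs scoring at least threshold, best first."""
    q = trigrams(query)
    scored = [(t, jaccard(q, trigrams(t))) for t in terms]
    return sorted(
        [(t, s) for t, s in scored if s >= threshold],
        key=lambda pair: -pair[1],
    )


TERMS = ["hypertension", "hypotension", "heart attack"]  # hypothetical term list
print(approximate_lookup("hypertention", TERMS))
```

For the misspelling `hypertention`, `hypertension` shares 11 of 17 distinct trigrams (score 11/17, about 0.65) and is returned, while `hypotension` scores 7/20 and is filtered out.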

Besides using QuickUMLS, I started mapping spaCy's output to an XML format to couple spaCy to other NLP libraries and programming languages. I first looked at UIMA XMI CAS but picked the NAF document format (XML). Here is my WIP of a dockerized version of spaCy with TCP output in NAF. The NAF XML document can be mapped to the CAS namespace within Java (cTAKES namespace example). I stopped my attempt to couple spaCy to the cTAKES dictionary lookup when I found out that it requires a chunker, which I would need for Dutch. QuickUMLS was my quick alternative ;).
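A minimal sketch of mapping spaCy-style token output to a NAF-like document, assuming a simplified subset of NAF: a raw layer, a text layer of `wf` elements, and a terms layer whose `span/target` elements point back at them. It omits the NAF header, sentence and paragraph attributes and all other layers, so treat it as illustrative rather than schema-valid NAF. Tokens are plain dicts so the example runs without a model; with spaCy you would fill them from `token.text`, `token.idx`, `token.lemma_` and `token.pos_`. The Dutch sample sentence and its annotations are hand-written.

```python
import xml.etree.ElementTree as ET


def to_naf(text, tokens, lang="nl"):
    """Build a simplified NAF-style XML string with raw, text and terms layers.

    tokens: list of dicts with "text", "idx" (character offset), "lemma", "pos".
    """
    naf = ET.Element("NAF", {"xml:lang": lang})
    ET.SubElement(naf, "raw").text = text
    text_layer = ET.SubElement(naf, "text")
    terms_layer = ET.SubElement(naf, "terms")
    for i, tok in enumerate(tokens, start=1):
        # Word form: surface token with its character offset and length
        wf = ET.SubElement(text_layer, "wf", {
            "id": f"w{i}",
            "offset": str(tok["idx"]),
            "length": str(len(tok["text"])),
        })
        wf.text = tok["text"]
        # Term: linguistic annotation that points back to the word form
        term = ET.SubElement(terms_layer, "term", {
            "id": f"t{i}", "lemma": tok["lemma"], "pos": tok["pos"],
        })
        span = ET.SubElement(term, "span")
        ET.SubElement(span, "target", {"id": f"w{i}"})
    return ET.tostring(naf, encoding="unicode")


TEXT = "Patiënt heeft hypertensie."
TOKENS = [
    {"text": "Patiënt", "idx": 0, "lemma": "patiënt", "pos": "NOUN"},
    {"text": "heeft", "idx": 8, "lemma": "hebben", "pos": "VERB"},
    {"text": "hypertensie", "idx": 14, "lemma": "hypertensie", "pos": "NOUN"},
    {"text": ".", "idx": 25, "lemma": ".", "pos": "PUNCT"},
]
naf_xml = to_naf(TEXT, TOKENS)
print(naf_xml)
```

Keeping the token layer separate from the annotation layers is what lets a Java consumer map the document onto CAS types layer by layer.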

@GregSilverman
Nice medical NLP pipeline! What would your approach be to integrating with spaCy? My current approach is to dockerize NLP modules with XML/JSON in/out, have them communicate over TCP, and chain and scale them using Kubernetes and Spring Cloud Data Flow. With this approach you can scale, keep models in memory, and combine multiple programming languages with low communication overhead.
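A minimal sketch of wrapping a module behind a TCP socket with JSON in/out, using only the standard library. The newline-delimited JSON framing, the `{"text": ...}` request shape and the `annotate` placeholder are assumptions for illustration; the actual protocol of the dockerized modules isn't shown in this thread. The model would be loaded once at server start, which is what keeps it in memory between requests.

```python
import json
import socket
import socketserver
import threading


def annotate(request):
    """Placeholder for the wrapped NLP module (e.g. a QuickUMLS or spaCy call)."""
    return {"tokens": request["text"].split()}


class JsonLineHandler(socketserver.StreamRequestHandler):
    """Reads one JSON object per line and writes one JSON reply per line."""

    def handle(self):
        for line in self.rfile:  # iterates until the client closes the connection
            reply = annotate(json.loads(line))
            self.wfile.write((json.dumps(reply) + "\n").encode("utf-8"))


def serve(host="127.0.0.1", port=0):
    """Start a threaded server in the background; port=0 picks a free port."""
    server = socketserver.ThreadingTCPServer((host, port), JsonLineHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call(address, payload):
    """Send one JSON request and return the decoded JSON reply."""
    with socket.create_connection(address) as sock:
        sock.sendall((json.dumps(payload) + "\n").encode("utf-8"))
        with sock.makefile("r", encoding="utf-8") as reader:
            return json.loads(reader.readline())


server = serve()
print(call(server.server_address, {"text": "no prior heart attack"}))
server.shutdown()
server.server_close()
```

Newline framing keeps the client trivial in any language; a length-prefixed framing would be the usual alternative if payloads could contain raw newlines (JSON serialization escapes them, so here it is safe).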

Writing the modules as custom spaCy pipeline extensions would make them more accessible to use, but less reusable.


GregSilverman · 2018-07-10T21:27:48Z

GregSilverman
Jul 10, 2018

@putssander, at this point we are evaluating options.

I also thought about using Docker. I'm currently deep in a similar project using a Kubernetes cluster and am just wrapping up local testing of the BioMedICUS -> Elasticsearch piece (once we deploy this remotely, we plan on running BioMedICUS, CLAMP, cTAKES and MetaMap all in parallel). See NLP-ADAPT... Please excuse the lack of proper documentation (it's all in our private Slack channel and will eventually be a full wiki).

We should definitely stay in touch regarding this project.


GregSilverman · 2018-07-11T23:29:46Z

GregSilverman
Jul 11, 2018

@putssander, I discussed this with my colleagues today and will give it a whirl. I am going to make a Docker microservice that does specific tagged pattern matching in spaCy (based on a manually annotated gold-standard set of documents). I'll then convert its output to properly annotated CAS XMI (how? at this point, no idea!) and combine these annotated artifacts with the artifacts selected from the other NLP engines running in NLP-ADAPT, using a tool one of our developers created called AMICUS. We're planning to evaluate the spaCy results against the TagEx pipeline component in BioMedICUS.
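One common way to score pattern-matching output against a gold-standard set (and to compare two taggers on the same documents) is micro-averaged precision, recall and F1 over annotated spans. A minimal sketch, assuming exact matching on `(start_char, end_char, label)` tuples; partial-overlap scoring is a frequent alternative, and the sample spans and labels below are made up. With spaCy, predicted spans would come from `span.start_char`, `span.end_char` and `span.label_`.

```python
def micro_prf(documents):
    """Micro-averaged exact-match precision, recall and F1.

    documents: iterable of (predicted_spans, gold_spans) pairs, one pair per
    document, where each span is a (start_char, end_char, label) tuple.
    """
    tp = n_pred = n_gold = 0
    for predicted, gold in documents:
        pred, ref = set(predicted), set(gold)
        tp += len(pred & ref)  # spans with identical boundaries and label
        n_pred += len(pred)
        n_gold += len(ref)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Two hypothetical documents: (predicted spans, gold spans)
DOCS = [
    ([(0, 12, "DX"), (20, 30, "DX")], [(0, 12, "DX"), (20, 32, "DX")]),
    ([(5, 9, "MED")], [(5, 9, "MED"), (15, 22, "DX")]),
]
print(micro_prf(DOCS))
```

Here 2 of 3 predicted spans are exact matches and 2 of 4 gold spans are found, giving precision 2/3, recall 1/2 and F1 4/7. The boundary mismatch `(20, 30)` vs `(20, 32)` counts as both a false positive and a false negative under exact matching.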


putssander · 2018-07-18T09:01:39Z

putssander
Jul 18, 2018

@GregSilverman Let me know if you make progress with your spaCy integration. I noticed from the AMICUS repository that one of your developers already has extensive knowledge of the CAS XMI approach.

I'm not sure CAS XMI is required as an intermediate format. From the official UIMA website I understand it should be possible to combine the languages within the UIMA C++ framework (not sure how much CAS XMI serialization is done there). I don't want to dive too much into UIMA; I prefer to use SCDF on Kubernetes + NAF, as I described in an earlier comment.

Here is another approach to integrating spaCy with UIMA by exposing spaCy over REST.
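A minimal sketch of the REST variant using only the standard library, not the linked project's code. The `/annotate` route, the `{"text": ...}` payload and the `annotate` placeholder are hypothetical; a real service would run `doc = nlp(text)` inside `annotate` and serialize whatever the UIMA side needs.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def annotate(text):
    """Placeholder; a real service would run doc = nlp(text) and serialize it."""
    return {"tokens": text.split()}


class AnnotateHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/annotate":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps(annotate(payload["text"])).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the example


def start(port=0):
    """Start the server in a background thread; port=0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), AnnotateHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post(url, payload):
    """POST a JSON payload and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Bypass any proxy configured in the environment for this local example.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(req) as resp:
        return json.loads(resp.read())


server = start()
host, port = server.server_address
print(post(f"http://{host}:{port}/annotate", {"text": "no prior heart attack"}))
server.shutdown()
server.server_close()
```

Compared with raw TCP, REST adds per-request HTTP overhead but is easy to call from a Java UIMA annotator with any HTTP client.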

If you want to try out my approach, let me know; I can help you get it running.


ines · 2019-09-29T16:03:02Z

ines
Sep 29, 2019
Maintainer

Also see scispaCy's UMLS linker: https://github.com/allenai/scispacy#umlsentitylinker-alpha-feature
