position returned by pymetamap #47

Open
ShoRit opened this issue May 10, 2020 · 4 comments

Comments

@ShoRit

ShoRit commented May 10, 2020

Hi Anthony,

Firstly, thank you for the wonderful implementation of metamap.

However, I was running into some issues while extracting the keywords using pymetamap.

For example, given the sentence "John had a huge heart-attack", could you point me to how to extract the exact position of the keyword identified by pymetamap? It shows position = 17:12, but in several cases I see that the reported character position is off by 1-2 characters.

Could you provide some insight into this?

@AnthonyMRios
Owner

Hi ShoRit,

Can you provide some examples? This is going to be an issue with MetaMap, not with the wrapper. But, if you share an example, I can look into it.

@yuliaoh

yuliaoh commented May 23, 2020

Metamap positions are not 0-indexed, that must be why it appears off

@kaushikacharya
Contributor

> Metamap positions are not 0-indexed, that must be why it appears off

@ShoRit @yuliaoh
My understanding is that it's 0-indexed.
The MMI output documentation states:

Positional Information – Bar separated list of positional information doubles showing StartPos, colon (:), and Length of each trigger identified in the Trigger Information field. StartPos begins at position zero (0) of the input text.

Here's the output from the MetaMap 2020 release:

echo "heart attack" | ./public_mm/bin/metamap -N -Q 4 -y --sldi
outputs
USER|MMI|5.18|Myocardial Infarction|C0027051|[dsyn]|["HEART ATTACK"-tx-1-"heart attack"-noun-0]|TX|0/12|

Another example:
echo "John had a huge heart attack" | ./public_mm/bin/metamap -N -Q 4 -y --sldi
outputs
USER|MMI|3.75|Myocardial Infarction|C0027051|[dsyn]|["HEART ATTACK"-tx-1-"heart attack"-noun-0]|TX|16/12|

As you can see, it's 0-indexed. I passed the same input arguments that pymetamap uses.
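To check the 0-indexing claim against the input text, the positional field can be used directly as a slice. The following is only a sketch: `first_span` is a hypothetical helper, the field index 8 is read off the sample line above, and it assumes a single `StartPos/Length` pair and no `|` inside the trigger field.

```python
def first_span(mmi_line):
    """Return (start, length) from the positional field of one MMI line.

    Assumption: the positional field is the 9th pipe-separated field and
    holds a single "start/length" pair, as in the sample output above.
    """
    fields = mmi_line.rstrip("\n").split("|")
    start, length = fields[8].split("/")
    return int(start), int(length)


text = "John had a huge heart attack"
line = ('USER|MMI|3.75|Myocardial Infarction|C0027051|[dsyn]|'
        '["HEART ATTACK"-tx-1-"heart attack"-noun-0]|TX|16/12|')

start, length = first_span(line)
# A 0-indexed start can be used as a Python slice offset unchanged.
print(text[start:start + length])  # heart attack
```

Because the slice needs no adjustment, the raw MetaMap offset here matches Python's own 0-based indexing.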

Then why is pymetamap's output 1-indexed?

It's the way pymetamap passes the input text that makes the output appear 1-indexed.
Take the example from the pymetamap README:

In [3]: sents = ['Heart Attack', 'John had a huge heart attack']

In [4]: concepts,error = mm.extract_concepts(sents,[1,2])

In [5]: for concept in concepts:
   ...:     print(concept)
   ...:
ConceptMMI(index='1', mm='MMI', score='5.18', preferred_name='Myocardial Infarction', cui='C0027051', semtypes='[dsyn]', trigger='["HEART ATTACK"-tx-1-"Heart Attack"-noun-0]', location='TX', pos_info='1/12', tree_codes='')
ConceptMMI(index='2', mm='MMI', score='3.75', preferred_name='Myocardial Infarction', cui='C0027051', semtypes='[dsyn]', trigger='["HEART ATTACK"-tx-1-"heart attack"-noun-0]', location='TX', pos_info='17/12', tree_codes='')

Here is the code that makes the positions appear 1-indexed in pymetamap's output:

https://github.com/AnthonyMRios/pymetamap/blob/master/pymetamap/SubprocessBackend.py#L174

                        if input_text is None:
                            input_text = '{0!r}|{1!r}\n'.format(identifier, sentence).encode('utf8')
                        else:
                            input_text += '{0!r}|{1!r}\n'.format(identifier, sentence).encode('utf8')

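Have a look at the difference between the two strings. With the !r conversion the sentence is wrapped in quote characters, so the opening quote occupies position 0 and every offset shifts by one: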

In [12]: '{0!r}'.format('Heart Attack')
Out[12]: "'Heart Attack'"

In [13]: '{0}'.format('Heart Attack')
Out[13]: 'Heart Attack'

This has been nicely explained by mgilson in https://stackoverflow.com/a/38418132/282155 using an example as well as the Python documentation.
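The shift can be reproduced without running MetaMap by building the input line both ways and locating the trigger in the text part. This is a sketch that assumes MetaMap counts offsets from the first character after the `|` separator, as the outputs above suggest; the variable names are illustrative only.

```python
sentence = "John had a huge heart attack"
identifier = 2

# Input line as pymetamap builds it (repr conversion) and without it.
quoted = "{0!r}|{1!r}\n".format(identifier, sentence)
plain = "{0}|{1}\n".format(identifier, sentence)

# Keep only the text part: everything after the first "|", minus the newline.
quoted_text = quoted.split("|", 1)[1].rstrip("\n")
plain_text = plain.split("|", 1)[1].rstrip("\n")

print(plain_text.find("heart attack"))   # 16
print(quoted_text.find("heart attack"))  # 17
```

The one-character difference is the opening quote added by `repr`, which matches the 16 versus 17 seen in the two outputs.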

@AnthonyMRios
Owner

Nice catch @kaushikacharya. I will look into creating a fix for this, it seems reasonably easy.
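Until a fix is available, callers can compensate on their side. The helper below is a hypothetical workaround, not part of pymetamap: it assumes the only distortion is the single leading quote added by `repr` (so it does not cover sentences where `repr` also inserts escape characters) and that `pos_info` holds one `start/length` pair.

```python
def to_zero_based_span(pos_info, shift=1):
    """Convert a pymetamap pos_info such as '17/12' to a (begin, end) slice.

    shift=1 compensates for the leading quote added by repr();
    pass shift=0 if the positions are already 0-indexed.
    """
    start, length = (int(part) for part in pos_info.split("/"))
    begin = start - shift
    return begin, begin + length


sentence = "John had a huge heart attack"
begin, end = to_zero_based_span("17/12")
print(sentence[begin:end])  # heart attack
```

Keeping the shift as a parameter means the same call sites keep working after the wrapper stops quoting its input: only the default needs to change.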
