noun chunks are missed when there are `()` #1840

pengyu · 2018-01-15T01:43:51Z

The following example shows that the noun chunk "phospholipase C (PLC) δ1" is not extracted correctly: it is split into `phospholipase C (PLC` and `δ1`. This usually happens when the text contains parentheses `()`. Can this bug be fixed systematically?

$ cat main2.py 
#!/usr/bin/env python
# vim: set noexpandtab tabstop=2 shiftwidth=2 softtabstop=-1 fileencoding=utf-8:

import spacy
nlp = spacy.load('en', disable=['tokenizer', 'ner', 'textcat'])
# 'tagger' and 'parser' cannot be disabled, because noun_chunks needs their output.

doc = nlp(u'We previously revealed that the expression of phospholipase C (PLC) δ1, one of the most basal PLCs, is down-regulated in colon adenocarcinoma, and that the KRAS signaling pathway suppresses PLCδ1 expression.')
print([x for x in doc.noun_chunks])
$ ./main2.py 
[We, the expression, phospholipase C (PLC, δ1, the most basal PLCs, colon adenocarcinoma, the KRAS, PLCδ1 expression]


ines · 2018-12-14T11:28:18Z

The noun chunks depend on the part-of-speech tags and dependency parse, so this issue likely comes down to incorrect predictions made by the tagger or parser.

I'm merging this with #3052. We've now added a master thread for incorrect predictions and related reports – see the issue for more details.
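
To check whether a split chunk really comes from the tagger or parser, it can help to print the tag, dependency label and head of each token next to the noun chunks. The following is only a minimal sketch: `token_rows` and `print_parse` are hypothetical helper names, not part of spaCy, and the model name `en_core_web_sm` is an assumption (the report above loads the `'en'` shortcut).

```python
def token_rows(doc):
    """Return (text, POS tag, dependency label, head text) for each token."""
    return [(t.text, t.pos_, t.dep_, t.head.text) for t in doc]


def print_parse(text, model="en_core_web_sm"):
    """Parse `text` and print one row per token, followed by the noun chunks.

    `model` is an assumed model name; use whichever English model is installed.
    """
    import spacy  # imported lazily so token_rows works without a model installed

    nlp = spacy.load(model)
    doc = nlp(text)
    for row in token_rows(doc):
        print("{:<15}{:<8}{:<10}{}".format(*row))
    print(list(doc.noun_chunks))
```

Calling `print_parse` on the sentence from the report should make it visible how the tokens `(`, `PLC` and `)` were tagged and attached, which is what the chunker works from.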

lock · 2019-01-13T16:59:02Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

ines added the "performance" and "lang / en" (English language data and models) labels on Jan 15, 2018

ines added the "perf / accuracy" (Performance: accuracy) label and removed the "performance" label on Aug 15, 2018

ines closed this as completed Dec 14, 2018

lock bot locked as resolved and limited conversation to collaborators Jan 13, 2019
