I've realised that `doc.noun_chunks` sometimes yields a noun chunk that is embedded in a longer one. I have only seen this in a few examples involving relative clauses introduced by "which" ("in which", "over which", "for which").
Take the sentence "Including equity share of refineries in which the Group has a stake."
Both "the Group" and "in which the Group has a stake" are returned as noun chunks, so the first is nested inside the second. This does not normally happen. Below are a few examples so you can reproduce and study this.
How to reproduce the behaviour
```python
import spacy
nlp = spacy.load('en_core_web_md')
text0 = "American company listed on NASDAQ in which the Group holds a 23.51% interest as of December 31, 2016."
text1 = "Including equity share of refineries in which the Group has a stake."
text2 = "Prices for oil and natural gas may fluctuate widely due to many\nfactors over which TOTAL has no control."
text3 = "This\nscope, which is different from the “operated domain” mentioned\nabove, includes all the assets in which the Group has a financial\ninterest or rights to production.\n "
text4 = "GHG emissions are also published on an equity interest basis, i.e.,\nby consolidating the Group share of the emissions of all assets in\nwhich the Group has a financial interest or rights to production.\n "
text5 = "From this profit, minus prior losses, if any, the following items are\ndeducted in the order indicated:\n 1) 5% to constitute the legal reserve fund, until said fund reaches\n10% of the share capital;\n 2) the amounts set by the Shareholders’ Meeting to fund reserves\nfor which it determines the allocation or use; and\n 3) the amounts that the Shareholders’ Meeting decides to retain.\n "
texts = [text0, text1, text2, text3, text4, text5]
# Print every noun chunk spaCy finds in each text
for i, t in enumerate(texts):
    print('# Noun chunks in text {}:'.format(i))
    doc = nlp(t)
    for chunk in doc.noun_chunks:
        print(chunk)
```
These are my comments on the texts analyzed (in each case, the first chunk is nested inside the second):
Text 0: "the Group" and "in which the Group holds a 23.51% interest".
Text 1: "the Group" and "in which the Group has a stake".
Text 2: "TOTAL" and "over which TOTAL has no control".
Text 3: "the Group" and "in which the Group has a financial".
Text 4: no issue in this example; this is the behaviour I expected.
Text 5: "it" and "for which it determines the allocation".
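To flag affected texts automatically rather than by eye, a small helper can compare chunk offsets. This is a minimal sketch: `nested_chunks` is a hypothetical helper, not part of spaCy, and it only assumes each chunk is given as a `(start, end)` pair of token indices with `end` exclusive, which matches how spaCy's `Span.start` and `Span.end` behave.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def nested_chunks(spans: List[Span]) -> List[Tuple[Span, Span]]:
    """Return (inner, outer) pairs where `inner` lies entirely inside a different span `outer`."""
    pairs = []
    for inner in spans:
        for outer in spans:
            if inner != outer and outer[0] <= inner[0] and inner[1] <= outer[1]:
                pairs.append((inner, outer))
    return pairs


# Hypothetical offsets chosen by hand: chunk (3, 5) sits inside chunk (1, 8).
print(nested_chunks([(0, 1), (3, 5), (1, 8)]))
```

With a parsed `doc`, the helper could be called as `nested_chunks([(c.start, c.end) for c in doc.noun_chunks])`, and each pair mapped back to text with `doc[start:end]`. The double loop is quadratic, which is harmless for the handful of chunks in one sentence.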
The noun chunks depend on the part-of-speech tags and dependency parse, so this issue likely comes down to incorrect predictions made by the tagger or parser.
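To illustrate how a parser error could produce exactly this pattern, here is a simplified, self-contained sketch of a dependency-based noun-chunk rule: each nominal token carrying a noun-phrase dependency label yields a span from the leftmost token of its subtree up to the token itself. The label sets are an assumption modeled loosely on spaCy's English iterator, the two parses are hand-written and hypothetical (not output from `en_core_web_md`), and the sketch does no filtering of overlapping spans, so it is not spaCy's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed label sets, modeled loosely on spaCy's English noun-chunk iterator.
NP_DEPS = {"nsubj", "dobj", "nsubjpass", "pcomp", "pobj", "dative", "appos", "attr", "ROOT"}
NOMINAL_POS = {"NOUN", "PROPN", "PRON"}


@dataclass
class Tok:
    text: str
    pos: str   # coarse part-of-speech tag
    dep: str   # dependency label
    head: int  # index of the syntactic head; the root points to itself


def left_edges(tokens: List[Tok]) -> List[int]:
    """For each token, the index of the leftmost token in its subtree."""
    edge = list(range(len(tokens)))
    for i in range(len(tokens)):
        j = i
        # Walk up to the root, lowering the left edge of every ancestor of token i.
        while tokens[j].head != j:
            j = tokens[j].head
            edge[j] = min(edge[j], i)
    return edge


def noun_chunks(tokens: List[Tok]) -> List[Tuple[int, int]]:
    """(start, end) spans from each nominal head's left edge to the head; no overlap filtering."""
    edges = left_edges(tokens)
    return [
        (edges[i], i + 1)
        for i, tok in enumerate(tokens)
        if tok.pos in NOMINAL_POS and tok.dep in NP_DEPS
    ]


# Hypothetical hand-written parse of "refineries in which the Group has a stake":
# the relative clause headed by "has" attaches to "refineries".
good = [Tok(*row) for row in [
    ("refineries", "NOUN", "ROOT", 0), ("in", "ADP", "prep", 5),
    ("which", "DET", "pobj", 1), ("the", "DET", "det", 4),
    ("Group", "PROPN", "nsubj", 5), ("has", "VERB", "relcl", 0),
    ("a", "DET", "det", 7), ("stake", "NOUN", "dobj", 5),
]]

# Hypothetical faulty parse: only two heads change, so "has" (and with it
# "in which the Group") ends up inside the subtree of "stake".
bad = list(good)
bad[5] = Tok("has", "VERB", "relcl", 7)
bad[7] = Tok("stake", "NOUN", "appos", 0)

for name, toks in [("good", good), ("bad", bad)]:
    print(name, [" ".join(t.text for t in toks[s:e]) for s, e in noun_chunks(toks)])
```

In this sketch the first parse gives "refineries", "the Group" and "a stake", while the second gives "refineries", "the Group" and "in which the Group has a stake": two wrong head attachments are enough to produce the kind of nested pair reported above, even though the chunking rule itself is unchanged.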
I'm merging this with #3052. We've now added a master thread for incorrect predictions and related reports – see the issue for more details.