Support for constituency parsing #59
-
This is part of the current road-map, using something like this approach: https://www.cs.cmu.edu/~nasmith/papers/kong+rush+smith.naacl15.pdf In the meantime, I'd like to provide just base noun phrases and prepositional phrases using heuristics. This might be ready within the next month or so.
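The kind of base-NP heuristic mentioned above could look roughly like this. This is a toy sketch over (word, POS) pairs, not spaCy's actual implementation; the function name and the coarse tag set are assumptions for illustration:

```python
def base_noun_chunks(tagged):
    """Toy heuristic: a base NP is an optional run of determiners and
    adjectives followed by one or more nouns. `tagged` is a list of
    (word, coarse_pos) pairs using Universal POS tags."""
    chunks, current = [], []

    def flush():
        # Only emit the candidate if it actually contains a noun head.
        if any(p in ("NOUN", "PROPN") for _, p in current):
            chunks.append(" ".join(w for w, _ in current))
        current.clear()

    for word, pos in tagged:
        if pos in ("NOUN", "PROPN"):
            current.append((word, pos))
        elif pos in ("DET", "ADJ"):
            # A determiner/adjective after a noun starts a new chunk.
            if any(p in ("NOUN", "PROPN") for _, p in current):
                flush()
            current.append((word, pos))
        else:
            flush()
    flush()
    return chunks
```

For example, `base_noun_chunks` over a tagged version of "The quick fox jumps over the lazy dog" yields the two base NPs "The quick fox" and "the lazy dog".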
-
Has there been any progress on the noun phrase extraction mentioned above?
-
I've seen some progress on NP extraction. Pretty smart, I'd say :-)
-
Is constituency parsing still on the roadmap?
-
It's 2017: is constituency parsing still in the works, out of the works, or almost done?
-
@honnibal Would be great to know if/where this is on the current roadmap.
-
Quick update: This might be a nice use case for the new custom processing pipeline components and extension attributes introduced in v2.0!
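The v2.0 pattern being referred to is a plain function that receives a `Doc`, writes its output to a custom extension attribute, and returns the `Doc`. A minimal sketch, assuming a toy chunking heuristic; the attribute name `np_chunks` and the component logic are illustrative, not part of spaCy:

```python
import spacy
from spacy.tokens import Doc

# Register a custom extension attribute (the name is an assumption).
Doc.set_extension("np_chunks", default=None, force=True)

def np_chunk_component(doc):
    """Toy component: each maximal run of NOUN/PROPN tokens is one chunk."""
    chunks, start = [], None
    for i, token in enumerate(doc):
        if token.pos_ in ("NOUN", "PROPN"):
            if start is None:
                start = i
        elif start is not None:
            chunks.append(doc[start:i])
            start = None
    if start is not None:
        chunks.append(doc[start:len(doc)])
    doc._.np_chunks = chunks
    return doc

nlp = spacy.blank("en")  # blank pipeline; no statistical model needed here
```

In a real v2 pipeline you would register it with `nlp.add_pipe(np_chunk_component)` so it runs on every call to `nlp(...)`; chunks are then available on `doc._.np_chunks`.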
-
The author of the "Transforming Dependencies" paper has an open-source implementation: https://github.com/ikekonglp/PAD
-
You might also want to take a look at AllenNLP: it's open source, built on top of spaCy, and just released constituency parsing.
-
For anyone interested in English constituency parsing I now have a release version out for the paper I'll be presenting at ACL this year ("Constituency Parsing with a Self-Attentive Encoder"). The package ships with a pre-trained English model (95 F1 on the Penn Treebank WSJ test set) and spaCy integration via extension attributes.
-
@nikitakit Ah, this is cool and a great use of the extension attributes. Looking forward to playing with it! Would you mind if we added the project to the spaCy Universe?
-
@ines Sure, happy to have a mention included in the spaCy Universe!
-
@nikitakit Wow, great paper! I need to try your attention component in the parser and NER! Seems simple enough...

I'm intrigued that the results keep going up even at a window of 20. Not by much, but it's still helping. It surprises me -- I would've thought the long-range effects would be very weak.

There are a few other things that are interesting here. Thanks for carefully exploring this question of the effects of factoring the different sources of information. Together with Dozat and Manning's results, I think we can say we now have two systems indicating this can be important. I wonder whether it's just a dataset-size issue (it might be too easy to overfit on these small parsing problems), or whether we'll find more problems where controlling the flow of information in the network proves useful. If so, I think we'll end up back towards factor graphs, and back towards something that looks like feature engineering.

I haven't read all of the paper yet, and I haven't gone through the attention mechanism carefully, so perhaps this doesn't make sense, but: if you restrict the window of the attention component and then stack those layers, wouldn't you get an increasing "receptive field", just like in CNNs? So, imagine you encode with a window size of 5, then encode again with another window size of 5. Aren't you drawing information from up to 10 words away?
-
@honnibal Glad you liked the paper! There's definitely a lot more to explore in terms of trying self-attention for different tasks, and also figuring out whether factoring information helps across tasks or could be generalized in some way.

For the attention windowing, you're absolutely right that stacking two layers with window size 5 gives an effective receptive field of size 10. That's why it's surprising that 8 layers, each with window size 10, still don't quite match the accuracy of the un-windowed model. I don't have a convincing explanation for why this is the case.
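The receptive-field arithmetic in this exchange can be made concrete. A minimal sketch, assuming a symmetric window where each layer lets a position attend to tokens up to `window` positions away (the function name is illustrative):

```python
def effective_receptive_field(window, num_layers):
    # Each layer extends how far information can propagate by `window`
    # tokens, so stacking layers compounds linearly, just as with CNNs.
    return window * num_layers

# Two layers of window 5 reach 10 tokens away, as noted above.
print(effective_receptive_field(5, 2))   # -> 10
# Yet 8 layers of window 10 (reach 80) still trail the un-windowed model.
print(effective_receptive_field(10, 8))  # -> 80
```

The puzzle in the thread is precisely that a reach of 80 tokens, far longer than most sentences, still underperforms unrestricted attention.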
-
Hi guys, besides the great plugin provided by Nikita (thank you so much, btw), are there still any plans to bring constituency parsing into the core API, or have we settled on using the plugin? Thanks!
-
Interested in this as well, thanks!
-
@honnibal @ines I see that there are new |
-
I'm not familiar with the publication history behind constituency parsing (although names like Michael Collins and Fei Xia seem to appear often), so please forgive any naivety. Can we use the |
-
It would be great if spaCy offered some sort of constituency-parsing information. I think the API could look similar to the one used for NER.