Training dependency parser and POS tagger on modified OntoNotes 5.0 #13454

skarokin · 2024-04-23T22:56:44Z

I have the OntoNotes 5.0 dataset, which I understand is what spaCy's pretrained dependency parser and POS tagger are trained on. I need to modify this dataset and train both components on it.

I understand that OntoNotes 5.0 is constituency parsed, not dependency parsed. Is there a way to convert between the two? Furthermore, are there any modifications I need to make to the OntoNotes 5.0 dataset to train the POS tagger?

Finally, as far as I understand, the parser relies on the tagger. How can I ensure that, in actual usage, my custom tagging model is used when running my own trained parser?

honnibal · 2024-04-23T23:00:18Z

Maintainer

There are a variety of dependency converters, but we use the ClearNLP one because it makes use of the trace nodes and function tags to get better results. We also aligned the data to raw source texts where available so that we could train the parser on non-sentence-segmented text.

The parser doesn't depend on the POS tagger. You can train them separately.


skarokin Apr 23, 2024
Author

Thanks for the quick reply.

My goal is to use the dependency relation between two words, together with their POS tags, to detect whether an input is ungrammatical. To do this, I will copy a subset of the OntoNotes 5.0 dataset and modify the copy by changing the part-of-speech tags of some word(s) while keeping the dependency relations unchanged.

Since you mentioned that the tagger and parser are completely separate components, do I only need to train the tagger to achieve this goal?
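
To make the modification described above concrete, here is a minimal sketch in plain Python. It assumes each sentence has already been converted to dependencies and is held as a list of tokens with a tag, a head index, and a dependency label. The `Token` class, the `retag` helper, and the example sentence are hypothetical illustrations, not part of spaCy or OntoNotes.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Token:
    form: str    # surface word
    tag: str     # fine-grained POS tag (Penn Treebank style)
    head: int    # index of the syntactic head; the root points to itself
    deprel: str  # dependency label


def retag(sentence: List[Token], index: int, new_tag: str) -> List[Token]:
    """Return a copy of `sentence` in which only the tag at `index` changes.

    Heads and dependency labels are carried over untouched.
    """
    if not 0 <= index < len(sentence):
        raise IndexError(f"token index {index} out of range")
    return [
        replace(tok, tag=new_tag) if i == index else tok
        for i, tok in enumerate(sentence)
    ]


# Hypothetical example sentence (invented, not taken from OntoNotes).
original = [
    Token("She", "PRP", 1, "nsubj"),
    Token("walks", "VBZ", 1, "ROOT"),
    Token("home", "RB", 1, "advmod"),
]

# Swap the verb's tag to create an agreement error; the tree is unchanged.
modified = retag(original, 1, "VBP")
```

Because `Token` is frozen and `retag` builds a new list, the original sentence stays intact, so both the original and the modified copy can be kept in the training data.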
