Custom morphological analysis #4519
Replies: 2 comments
-
The plan is to have a model that can learn to do the morphological analysis, but it's still in progress. The assignment through Some languages do have a huge number of tags currently, with results that are much better than I'd expect (Dutch has 800 at 91% accuracy), but I really don't know how spaCy might do on Polish. I think the morphological analysis component just isn't quite ready at this point, so for the current version you can explode the tagset if you want to have some morphological tags, or leave it at coarser-grained tags for now.
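To make "exploding the tagset" concrete, here is a minimal sketch in plain Python. It assumes the coarse tag and the feature values are joined with colons, as in the SUBST:SG:ACC:F example from the question. The helper names and the toy sample are hypothetical and are not part of spaCy.

```python
from collections import Counter

def explode_tag(coarse, feature_values):
    """Join a coarse tag and its ordered feature values into one fine-grained tag."""
    # A token without features keeps its coarse tag unchanged.
    return ":".join([coarse, *feature_values]) if feature_values else coarse

def tagset_counts(annotated):
    """Count the distinct exploded tags in (coarse_tag, feature_values) pairs."""
    return Counter(explode_tag(coarse, feats) for coarse, feats in annotated)

# Toy sample (hypothetical data, for illustration only).
sample = [
    ("SUBST", ["SG", "ACC", "F"]),
    ("SUBST", ["SG", "ACC", "F"]),
    ("SUBST", ["PL", "NOM", "F"]),
    ("INTERP", []),
]
counts = tagset_counts(sample)
```

Running the same count over the real training data shows how many labels the tagger would have to learn and how many of them occur only a handful of times, which is a reasonable basis for deciding how far to explode the tagset.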
-
The upcoming v3 will have a (And a note about the v3 alpha Polish models: because of issues training from multiple datasets for the Please feel free to open a new issue if you run into any problems!
-
Hey!
I'm working on POS tagging for Polish for my model. Polish has very rich morphology, and we would like to offer some information about it to the user. But because of the sheer number of tags, training the POS tagger to recognize all the fine-grained tags of the NKJP tagset that we work with is impossible (it has thousands of combinations of a few dozen morphological features). For this reason we've cut the tagset down to 35 part-of-speech tags and completely disregard morphology in the tagger.
However, we have a separate tool that does morphological analysis very well, and we aim to integrate it. Ideally we would write the features directly to token.morph, but this is prohibited. If I understand correctly, an indirect assignment would have to go through the tag_map, but I'd rather not risk causing trouble for the POS tagger this way.
I was also thinking about writing this information directly to token.tag_, e.g. turning SUBST (corresponding to UD NOUN) into SUBST:SG:ACC:F, but I am worried that this would cause problems somewhere else.
How would you go about this task?
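To illustrate the composite-tag idea, here is a minimal sketch of how a tag such as SUBST:SG:ACC:F could be split back into a coarse tag and a set of features further down the pipeline. It is plain Python and does not touch spaCy. The function names and the small feature table are hypothetical, and the mapping to UD-style feature names covers only a few values for illustration.

```python
# Hypothetical lookup from tag components to UD-style (feature, value) pairs.
# Only a few values are listed; a real table would cover the whole tagset.
FEATURE_VALUES = {
    "SG": ("Number", "Sing"),
    "PL": ("Number", "Plur"),
    "NOM": ("Case", "Nom"),
    "ACC": ("Case", "Acc"),
    "F": ("Gender", "Fem"),
}

def split_tag(tag):
    """Split e.g. 'SUBST:SG:ACC:F' into the coarse tag and a feature dict."""
    coarse, *values = tag.split(":")
    feats = {}
    for value in values:
        if value not in FEATURE_VALUES:
            raise ValueError(f"unknown feature value {value!r} in tag {tag!r}")
        name, ud_value = FEATURE_VALUES[value]
        feats[name] = ud_value
    return coarse, feats

def format_feats(feats):
    """Render a feature dict as 'Case=Acc|Gender=Fem|Number=Sing'."""
    return "|".join(f"{name}={value}" for name, value in sorted(feats.items()))

coarse, feats = split_tag("SUBST:SG:ACC:F")
feats_string = format_feats(feats)
```

The coarse tag stays recoverable as the first component, so any code that expects one of the 35 plain tags can keep working on the part before the first colon.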