Incorrect tagging by a trained model for Tibetan #13549

I finally identified the cause of the poor tagging by testing with other languages: the configuration file lists the pipeline as ["tok2vec", "tagger"], but it should be ["tok2vec", "morphologizer"]. The "tagger" component trains a model for XPOS (language-specific part-of-speech tags), while the "morphologizer" component trains a model for UPOS (universal part-of-speech tags).
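To make the difference concrete, here is a minimal sketch that reads the pipeline line from a config excerpt and reports which tag set it would be trained on, following the explanation above. The config excerpt and the helper names `read_pipeline` and `predicted_tagsets` are hypothetical and only for illustration. It uses Python's standard `configparser` rather than the training library's own config loader, so it only handles this simple case.

```python
import configparser
import json

# Hypothetical excerpt of a training config; only the [nlp] pipeline line matters here.
CONFIG_TEXT = """
[nlp]
pipeline = ["tok2vec","tagger"]
"""


def read_pipeline(config_text):
    """Return the component names listed under [nlp] pipeline."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    # The value is written as a JSON-style list of strings.
    return json.loads(parser["nlp"]["pipeline"])


def predicted_tagsets(pipeline):
    """Map component names to the tag sets they train, per the post above."""
    tagsets = []
    if "tagger" in pipeline:
        tagsets.append("XPOS")  # language-specific tags
    if "morphologizer" in pipeline:
        tagsets.append("UPOS")  # universal tags
    return tagsets


pipeline = read_pipeline(CONFIG_TEXT)
print(pipeline, predicted_tagsets(pipeline))
```

With the excerpt above, the check reports XPOS only, which matches the misconfiguration described: a model trained this way never learns UPOS.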

This is the simplest explanation for the issue, but there is a second problem in our training dataset: MISC, the last column in the CoNLL-U file, is missing. I discovered this by modifying CoNLL-U files and training German and Chinese POS taggers from scratch:

  1. ID, FORM, LEMMA and UPOS: This …
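A quick way to spot a missing MISC column is to count the tab-separated fields on every token line. The sketch below assumes the standard ten-column CoNLL-U layout. The helper name `lines_missing_columns` and the sample sentence are made up for illustration and are not taken from the dataset discussed here.

```python
# The ten CoNLL-U columns, in order; MISC is the last one.
CONLLU_FIELDS = [
    "ID", "FORM", "LEMMA", "UPOS", "XPOS",
    "FEATS", "HEAD", "DEPREL", "DEPS", "MISC",
]


def lines_missing_columns(text):
    """Return 1-based numbers of token lines that lack all ten columns."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Blank lines separate sentences; lines starting with '#' are comments.
        if not line.strip() or line.startswith("#"):
            continue
        if len(line.split("\t")) != len(CONLLU_FIELDS):
            bad.append(number)
    return bad


# Illustrative sample: the second token line has no MISC column.
complete = "\t".join(["1", "Hund", "Hund", "NOUN", "NN", "_", "0", "root", "_", "_"])
truncated = "\t".join(["1", "Hund", "Hund", "NOUN", "NN", "_", "0", "root", "_"])
sample = "# text = Hund\n" + complete + "\n\n" + truncated + "\n"

print(lines_missing_columns(sample))
```

Running this over each training file before conversion would show whether any token line is short a column.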

Answer selected by ykyogoku