About Traditional Chinese model #6025

spaCy's basic Chinese language support contains little that is specific to Simplified Chinese, except perhaps the stop words. If you have a jieba dictionary or a pkuseg model for Traditional Chinese characters, word segmentation should work fine.
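To illustrate why a dictionary for Traditional Chinese is enough, here is a minimal sketch of dictionary-driven segmentation. It uses greedy forward maximum matching, which is a deliberately simplified stand-in and not jieba's or pkuseg's actual algorithm. The function name `segment` and both toy word lists are hypothetical, made up for this example.

```python
def segment(text, vocab):
    """Greedy forward maximum matching (toy stand-in, not jieba's algorithm).

    At each position, take the longest word found in `vocab`;
    fall back to a single character when nothing matches.
    """
    max_len = max((len(word) for word in vocab), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy word lists; a real dictionary would be far larger.
traditional_vocab = {"我", "喜歡", "機器", "學習", "機器學習"}
simplified_vocab = {"我", "喜欢", "机器", "学习", "机器学习"}

text = "我喜歡機器學習"  # Traditional Chinese input

# With a Traditional Chinese dictionary, multi-character words are found.
print(segment(text, traditional_vocab))

# With a Simplified-only dictionary, the same text degrades to single characters.
print(segment(text, simplified_vocab))
```

The segmentation logic never looks at which script is in use. Only the contents of the word list change the result, which is why swapping in a Traditional Chinese dictionary or model is usually all the segmenter needs.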

The provided Chinese models, such as zh_core_web_sm, are trained on OntoNotes 5, which contains only Simplified Chinese (see p. 27 of the documentation: https://catalog.ldc.upenn.edu/docs/LDC2013T19/OntoNotes-Release-5.0.pdf). If you know of data with a permissive license that could be used to train models for Traditional Chinese (NER data is typically the hardest to find), we'd be happy to look into whether we could provide additional models.

Answer selected by ines
Labels
usage General spaCy usage lang / zh Chinese language data and models

This discussion was converted from issue #6025 on December 11, 2020 00:00.