About Traditional Chinese model #6025

pactera-song · 2020-09-04T07:06:34Z

Does spaCy support Traditional Chinese (zh-TW or zh-HK)?

adrianeboyd · 2020-09-04T10:34:33Z

In the basic Chinese language support, there's not much specific to simplified Chinese, except maybe the stop words. If you have a jieba dictionary or pkuseg model for traditional Chinese characters, it should work fine.

The provided Chinese models like zh_core_web_sm are trained on OntoNotes 5, which only contains simplified Chinese (see p. 27 in the docs: https://catalog.ldc.upenn.edu/docs/LDC2013T19/OntoNotes-Release-5.0.pdf). If you know of data with a permissive license that can be used to train models for traditional Chinese (typically it's hardest to find NER data), we'd be happy to look into whether we could provide additional models.
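To illustrate the point about dictionaries, here is a minimal sketch of a dictionary-driven segmenter. It is a toy greedy forward-maximum-matching routine, not jieba's or pkuseg's actual algorithm, and the function name `segment` and both word lists are hypothetical. It only shows that the matching logic does not care about the script, while the dictionary has to contain the character forms that appear in the text.

```python
def segment(text, dictionary, max_len=4):
    """Toy greedy forward maximum matching (a stand-in, not jieba's algorithm).

    At each position, take the longest dictionary word that matches;
    fall back to a single character when nothing matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy dictionaries with the same three words in each script.
traditional_dict = {"臺灣", "大學", "圖書館"}
simplified_dict = {"台湾", "大学", "图书馆"}

text = "臺灣大學圖書館"

# With traditional entries, the words are found.
print(segment(text, traditional_dict))

# With simplified-only entries, nothing matches and the text
# degrades to one token per character.
print(segment(text, simplified_dict))
```

The same effect is the reason a segmenter backed by simplified-only resources tends to split traditional text poorly, while swapping in a traditional-character dictionary or model needs no change to the rest of the pipeline.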
