additional language dependencies #33
Comments
@balmas - I spent all day adding these libraries, and here are my results:
I ran into problems with Japanese and Korean.
Japanese: I was able to find versions of Unidic and Mecab for our environment, but I didn't find a version of SudachiPy that works with Cython. There is a compiled library with SudachiPy and Cython - https://github.com/polm/fugashi
Korean: I was able to install natto-py.
I could continue with it tomorrow. It is really difficult to build the container during my evening/night, as it takes much more time.
@balmas, how much time do you think is worth spending on Korean and Japanese support?
@irina060981 let's not worry about those for the moment. Thanks.
#31 identified a tokenizer error with Chinese due to a missing dependency.
The spaCy documentation lists additional dependencies for a number of languages at https://spacy.io/usage/models#languages:
Japanese: Unidic, Mecab, SudachiPy
Russian: pymorphy2
Ukrainian: pymorphy2
Thai: pythainlp
Korean: mecab-ko, mecab-ko-dic, natto-py
Vietnamese: Pyvi
@irina060981 if you can confirm the Chinese fix works (and the Dockerfile fix too), maybe you can add these dependencies too?
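One way to confirm which of these dependencies actually made it into the container is a small check script, run inside the built image. This is only a sketch: the import names in `OPTIONAL_MODULES` are assumptions based on how each package is usually imported (for example, natto-py as `natto`), they are not taken from this issue, and system-level pieces such as MeCab, mecab-ko and mecab-ko-dic are not covered. The helper `missing_modules` is a hypothetical name.

```python
from importlib.util import find_spec

# Assumed import names for the Python packages listed above.
# These are illustrative guesses; adjust them to match the packages
# that are actually installed in the image.
OPTIONAL_MODULES = {
    "Japanese": ["sudachipy"],
    "Russian": ["pymorphy2"],
    "Ukrainian": ["pymorphy2"],
    "Thai": ["pythainlp"],
    "Korean": ["natto"],
    "Vietnamese": ["pyvi"],
}


def missing_modules(modules_by_language):
    """Return {language: [module names that cannot be found]}."""
    missing = {}
    for language, modules in modules_by_language.items():
        # find_spec returns None when a top-level module is not installed.
        absent = [name for name in modules if find_spec(name) is None]
        if absent:
            missing[language] = absent
    return missing


if __name__ == "__main__":
    for language, absent in missing_modules(OPTIONAL_MODULES).items():
        print(f"{language}: missing {', '.join(absent)}")
```

The script only checks that the modules can be located, not that the tokenizers work, so a failing language would still need a real tokenization test.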