Install `fugashi`, `unidic`, `unidic-lite`, and `ipadic` as dependencies to MLServer HuggingFace to support hosting Japanese language models #1506

jbauer2718 · 2023-12-08T18:57:29Z

Because Japanese mixes phonetic scripts (hiragana and katakana) with Chinese characters (kanji), tokenizers for Japanese models need special segmentation algorithms and dictionaries. A popular example is the BERT Japanese model:

https://huggingface.co/transformers/v4.11.3/_modules/transformers/models/bert_japanese/tokenization_bert_japanese.html

Without these dependencies, `mlserver_huggingface/common.py` raises an error when it tries to load the tokenizer for the pipeline.
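A quick way to check whether an environment is affected is to see which of the requested packages can be imported. The sketch below is a hypothetical helper (`missing_japanese_deps` is not part of MLServer or transformers). It assumes the import names `fugashi`, `unidic`, `unidic_lite`, and `ipadic`; note that the PyPI package `unidic-lite` is imported as `unidic_lite`.

```python
import importlib.util

# Import names for the packages requested in this issue (assumed mapping).
# The PyPI package "unidic-lite" is imported as "unidic_lite".
JAPANESE_TOKENIZER_DEPS = ("fugashi", "unidic", "unidic_lite", "ipadic")


def missing_japanese_deps(modules=JAPANESE_TOKENIZER_DEPS):
    """Return the names in `modules` that cannot be found in this environment.

    Hypothetical helper: it only checks that each top-level module is
    discoverable; it does not load any dictionary data.
    """
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_japanese_deps()
    if missing:
        print("Missing Japanese tokenizer dependencies:", ", ".join(missing))
    else:
        print("All Japanese tokenizer dependencies are importable.")
```

Running this inside the MLServer HuggingFace runtime environment would list any of the packages that still need to be installed before a Japanese tokenizer can load.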

To reproduce, use any Japanese model. Here is an example.

jbauer2718 · 2023-12-08T19:10:16Z

If someone adds me as a contributor, I am happy to fix this issue and write a test for it.

sakoush · 2023-12-11T08:33:44Z

@jbauer2718 many thanks for reporting this issue and offering to fix it. You can create a PR based on changes from your fork and we can look at it.

jbauer2718 · 2023-12-12T19:37:17Z

Hey @sakoush, I just added the above-linked PR for the team's review.

jbauer2718 linked a pull request on Dec 12, 2023 that will close this issue: Add Japanese language dependencies #1511 (Open)
