Train/adapt to other languages #43

ng-4r · 2023-03-14T09:12:48Z

Hi!

I see that it is possible to use MUSS with other languages:

If you are going to add a new language to this project, download the files of the target language from https://huggingface.co/edugp/kenlm/tree/main/wikipedia into the folder resources/models/language_models/wikipedia. These language models are used to filter high-quality sentences in the paraphrase mining phase.

But what if the target language is not listed in the kenlm repository? I would like to try this system on Italian.

louismartin · 2023-03-19T19:42:14Z

Hi there,

Sorry for the delay.

KenLM is only used to clean the Common Crawl data, if I remember correctly.
You can probably clean the data with other heuristics instead, or skip cleaning altogether (at the risk of worse performance).
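
To make "other heuristics" concrete, here is a minimal sketch of a rule-based sentence filter that needs no language model. It is not what MUSS does internally: the function names (`looks_clean`, `filter_sentences`) and every threshold are assumptions chosen for illustration and would need tuning on real data for the target language.

```python
import re

# Hypothetical pattern for obvious web debris (URLs, markup characters).
NOISE_PATTERN = re.compile(r"https?://|www\.|[<>{}|]")


def looks_clean(sentence, min_words=5, max_words=50,
                min_alpha_ratio=0.8, max_repeat_ratio=0.5):
    """Return True if the sentence passes a few cheap quality checks."""
    text = sentence.strip()
    words = text.split()
    # Reject fragments and run-on lines.
    if not (min_words <= len(words) <= max_words):
        return False
    # Reject lines dominated by digits, punctuation, or symbols.
    alpha = sum(ch.isalpha() or ch.isspace() for ch in text)
    if alpha / len(text) < min_alpha_ratio:
        return False
    # Reject lines containing URLs or markup characters.
    if NOISE_PATTERN.search(text):
        return False
    # Reject lines where a single word makes up most of the sentence.
    lowered = [w.lower() for w in words]
    most_common = max(lowered.count(w) for w in set(lowered))
    if most_common / len(words) > max_repeat_ratio:
        return False
    return True


def filter_sentences(sentences):
    """Keep clean sentences, dropping exact (case-insensitive) duplicates."""
    seen = set()
    kept = []
    for sentence in sentences:
        key = sentence.strip().lower()
        if key in seen:
            continue
        seen.add(key)
        if looks_clean(sentence):
            kept.append(sentence.strip())
    return kept
```

A filter like this is cruder than perplexity filtering, since it cannot tell fluent text from grammatical nonsense, but it removes the most obvious crawl noise for any language.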

Another option is to use the ChatGPT API, which is very good at text simplification in multiple languages.

ng-4r · 2023-03-22T18:30:17Z

Hi!

Thank you very much for your reply. So I can replace that part with other methods.

I'm aware of GPT's capabilities, but I'm studying this topic and want to compare different models, including GPT with zero-/few-shot learning.
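
For the zero-/few-shot side of such a comparison, a prompt could be assembled roughly as below. This is only a sketch: `build_prompt`, the instruction wording, and the Italian example pair are made up for illustration, and sending the prompt to a model is left out on purpose.

```python
def build_prompt(sentence, examples=(), language="Italian"):
    """Build a simplification prompt.

    With no examples this is a zero-shot prompt; each (complex, simple)
    pair in `examples` adds one demonstration, giving a few-shot prompt.
    """
    lines = [
        f"Simplify the following {language} sentence. "
        f"Keep the meaning and answer in {language}.",
        "",
    ]
    for complex_text, simple_text in examples:
        lines += [f"Complex: {complex_text}", f"Simple: {simple_text}", ""]
    # The sentence to simplify goes last, with the answer slot left open.
    lines += [f"Complex: {sentence}", "Simple:"]
    return "\n".join(lines)


# Illustrative demonstration pair (invented, not taken from any dataset).
demo = [(
    "L'utilizzo di tale dispositivo è subordinato all'autorizzazione preventiva.",
    "Per usare questo dispositivo serve prima un permesso.",
)]

zero_shot = build_prompt("La riunione è stata posticipata a data da destinarsi.")
few_shot = build_prompt("La riunione è stata posticipata a data da destinarsi.", demo)
```

Keeping the template fixed and varying only the number of demonstrations makes the zero-shot and few-shot results directly comparable.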
