Skip to content
This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Train/adapt to other languages #43

Open
ng-4r opened this issue Mar 14, 2023 · 2 comments
Open

Train/adapt to other languages #43

ng-4r opened this issue Mar 14, 2023 · 2 comments

Comments

@ng-4r
Copy link

ng-4r commented Mar 14, 2023

Hi!

I see that it is possible to use MUSS with other languages:

If you are going to add a new language to this project, in folder resources/models/language_models/wikipedia donwload the files of the target language from https://huggingface.co/edugp/kenlm/tree/main/wikipedia. These language models are used to filter high quality sentences in the paraphrase mining phase.

But what if the target language is not listed in the kenlm repository? I would like to try this system on Italian

@louismartin
Copy link
Contributor

Hi there,

Sorry for the delay.

Kenlm is only used to clean the common crawl data if I remember correctly.
You can probably find other ways to clean the data using other heuristics, or not clean it at all (but get potentially worse performance).

Another solution is also to use the ChatGPT API which is very good at text simplification in multiple languages.

@ng-4r
Copy link
Author

ng-4r commented Mar 22, 2023

Hi!

thank you very much for your reply. So I can replace that part with other methods.

I know GPT capabilities, but I'm studying this topic and I want to make a comparison of different models, including GPT with zero-/few-shot learning

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants