Support for new languages #111

Open
pyai88 opened this issue Jan 11, 2025 · 2 comments

Comments

pyai88 commented Jan 11, 2025

Hello, thank you for the great work. I have a couple of questions:

  1. Should this method work on a new language out-of-the-box? I'm seeing properties from the training language in my output, so I'm wondering if I've made a mistake.
  2. If it doesn't work out-of-the-box, would fine-tuning the pre-trained model be preferable to training from scratch?
  3. To help me estimate the training cost, could you provide guidance on how much data is typically needed for a new language and how long you trained your pre-trained model for?

Thank you in advance.

@Plachtaa (Owner)

Hi there,
If the output in an unseen language sounds accented, you can try fine-tuning the current checkpoint on the language you want.
I can't give an estimate of how many hours of data are required; my only suggestion is to use as much as you have.

@EmreOzkose

Hi @pyai88, did you train the model on a new language? How many hours of data did you train on?
