Speaker_id during inference #5
Hey @Srija616, you can use the kwarg `forward_params`:

```python
forward_params = {"speaker_id": XXXX}

text = "वहीं पंजाब सरकार ने सरबत खालसा के आयोजन के लिए, पंजाब के भठिंडा ज़िले में, तलवंडी साबो में, जगह देने से मना कर दिया है।"
speech = synthesiser(text, forward_params=forward_params)
```

Did you finetune using the multi-speaker feature from the training code?
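For completeness, here is a minimal end-to-end sketch of that suggestion. The checkpoint path `./vits_finetuned_hindi` is taken from the snippet further down, and speaker id `0` is just an illustrative index into the model's speaker-embedding table:

```python
import scipy.io.wavfile
from transformers import pipeline

# Load the finetuned checkpoint; device=0 runs on the first GPU.
synthesiser = pipeline("text-to-speech", "./vits_finetuned_hindi", device=0)

text = "वहीं पंजाब सरकार ने सरबत खालसा के आयोजन के लिए, पंजाब के भठिंडा ज़िले में, तलवंडी साबो में, जगह देने से मना कर दिया है।"

# forward_params is passed through to the model's forward call,
# which for VITS accepts speaker_id.
speech = synthesiser(text, forward_params={"speaker_id": 0})

scipy.io.wavfile.write(
    "hindi_speaker0.wav",
    rate=speech["sampling_rate"],
    data=speech["audio"][0],
)
```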
@ylacombe Yes, we have two speakers for Hindi (male and female), and these are the two params we tweaked to enable multi-speaker training. Just wondering if there are other params that need to be defined for multi-speaker training. We are also facing two issues:

Adding the wandb charts for our Hindi and English runs: @ylacombe, was wondering if you have some thoughts on why these losses are going to infinity or NaN. It is possible that we are missing something trivial. I can share the generated samples over mail if you'd like to hear them.
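As an aside on the multi-speaker params: in `transformers`, the two `VitsConfig` fields that govern multi-speaker VITS are `num_speakers` and `speaker_embedding_size`. Whether those are exactly the two params tweaked above is an assumption; a minimal sketch of how they fit together:

```python
from transformers import VitsConfig, VitsModel

# Start from the MMS Hindi config and enable two speakers.
# (Assumption: these are the two multi-speaker knobs referred to above.)
config = VitsConfig.from_pretrained("facebook/mms-tts-hin")
config.num_speakers = 2              # e.g. one male, one female speaker
config.speaker_embedding_size = 256  # dimension of the learned speaker embeddings

# A model built from this config carries a speaker-embedding table with
# num_speakers rows; speaker_id at inference time indexes into it.
model = VitsModel(config)
print(model.embed_speaker)  # Embedding(2, 256)
```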
Hey @Srija616, sorry for the late response! Nice project! Can you send me your training config? I do have some great results from single-speaker fine-tuning.
Hello @ylacombe, as I am currently finetuning mms_tts_ell on a single-speaker dataset, would it be possible to assist me with the training configuration? My dataset consists of ~4 hours.
Hi @ylacombe! I have multi-speaker data with which I have trained the Hindi checkpoint. I want to generate a particular speaker's voice during inference. Is there any way to do that using the inference code given in the README?
Here is how my current code looks:
```python
import time

import scipy.io.wavfile
from transformers import pipeline

model_id = "./vits_finetuned_hindi"
synthesiser = pipeline("text-to-speech", model_id, device=0)  # add device=0 if you want to use a GPU

speech = synthesiser("वहीं पंजाब सरकार ने सरबत खालसा के आयोजन के लिए, पंजाब के भठिंडा ज़िले में, तलवंडी साबो में, जगह देने से मना कर दिया है।")

scipy.io.wavfile.write("hindi_1.wav", rate=speech["sampling_rate"], data=speech["audio"][0])
```