Training 🍵 Matcha‐TTS with different dataset & languages
Hello! Thank you for your interest in 🍵 Matcha-TTS.
For training with a different dataset, most parameters will be the same as in ljspeech.yaml, so you can essentially just copy that file. Generally, I prefer resampling all my audio files to a 22050 Hz sampling rate instead of changing the audio parameters, as this solves the problem of finding a different vocoder.
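In practice you would resample with a proper tool such as sox, ffmpeg (`ffmpeg -i in.wav -ar 22050 out.wav`), or `librosa.resample`. Purely to illustrate what the resampling step does, here is a minimal linear-interpolation sketch in numpy — a real pipeline should use a band-limited resampler to avoid aliasing:

```python
import numpy as np

def resample_linear(audio: np.ndarray, orig_sr: int, target_sr: int) -> np.ndarray:
    """Toy resampler: linear interpolation onto the new sample grid.

    Illustrative only -- real pipelines should use a band-limited
    resampler (ffmpeg, sox, librosa) to avoid aliasing artifacts.
    """
    # number of output samples after changing the sampling rate
    n_out = int(round(len(audio) * target_sr / orig_sr))
    # positions of the new samples on the original time axis
    x_old = np.arange(len(audio))
    x_new = np.linspace(0, len(audio) - 1, num=n_out)
    return np.interp(x_new, x_old, audio)

# e.g. a 1-second clip at 44.1 kHz becomes 22050 samples at 22.05 kHz
```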
Then you can generate the mean and standard deviation for your dataset (for better standardisation) using the steps I have added in README.md.
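Conceptually, what those README steps compute is the global mean and standard deviation over every mel-spectrogram frame in the dataset. A minimal numpy sketch of that idea, where `mel_files` is a hypothetical list of precomputed `.npy` mel spectrograms (the actual repo script works from the dataset config instead):

```python
import numpy as np

def mel_statistics(mel_files):
    """Accumulate the global mean/std over all mel values in a dataset.

    Uses running sums so arbitrarily large datasets fit in memory.
    `mel_files` is a hypothetical list of paths to .npy mel arrays.
    """
    total, total_sq, count = 0.0, 0.0, 0
    for path in mel_files:
        mel = np.load(path)            # assumed shape: (n_mels, n_frames)
        total += mel.sum()
        total_sq += (mel ** 2).sum()
        count += mel.size
    mean = total / count
    # Var[X] = E[X^2] - E[X]^2
    std = np.sqrt(total_sq / count - mean ** 2)
    return mean, std
```

The resulting values are what go into `mel_mean` and `mel_std` in the config below.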
The major changes you would require:
In `YOUR_DATASET.yaml`:

```yaml
name: NAME_YOUR_DATASET_ANYTHING_ARBITRARY
train_filelist_path: NEW_FILEPATHS
valid_filelist_path: NEW_FILEPATHS
data_statistics:
  mel_mean: <generate (better) or use lj_speech's value>
  mel_std: <generate (better) or use lj_speech's value>
cleaners: [?chinese_cleaner?]  # you will need to set up text normalisation rules as stated below
```
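The filelist paths point at plain-text filelists in the LJ Speech convention the repo's existing filelists use: one utterance per line, `wav_path|transcript` (for multi-speaker data, `wav_path|speaker_id|transcript`). A sketch with hypothetical paths and Mandarin transcripts:

```
data/your_dataset/wavs/utt_0001.wav|这是第一句话的文本。
data/your_dataset/wavs/utt_0002.wav|这是第二句话的文本。
```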
You can take a look at vctk.yaml and do something similar: use the defaults from ljspeech.yaml and override what you need for your specific dataset.
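As a sketch of that override pattern (file name and paths here are hypothetical — check vctk.yaml for the exact keys the repo uses), Hydra lets the new config inherit everything from ljspeech.yaml and then override only what changes:

```yaml
# configs/data/your_dataset.yaml (hypothetical path)
defaults:
  - ljspeech   # inherit the audio/feature parameters from ljspeech.yaml
  - _self_     # then let the overrides below take precedence

name: your_dataset
train_filelist_path: data/filelists/your_dataset_train.txt
valid_filelist_path: data/filelists/your_dataset_valid.txt
```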
For phonemisation (again, I have no experience training with the majority of other datasets, but you can change the phonemizer language here): I think for Mandarin it is `zh` with the `espeak` backend and `cmn` with the `espeak-ng` backend.
Is a relatively small dataset (like a 20-min dataset) okay?
I haven't tested it, but the monotonic alignment is very useful for jointly learning to align and to speak, even with a small dataset (especially if it is a studio-recorded dataset of read speech). I feel it depends largely on the dataset quality and on whether it can be aligned monotonically. However, fine-tuning should mostly work better than training from scratch in such scenarios, so you can first train on a larger dataset and then fine-tune on your specific one — similar to what we did for OverFlow, where it worked very well.
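Since the training setup follows the lightning-hydra-template, one way that warm-start workflow could look from the command line (experiment names and the checkpoint path are hypothetical; `ckpt_path` is the template's resume hook):

```shell
# 1. Train on the large dataset first
python matcha/train.py experiment=big_dataset

# 2. Fine-tune on the small dataset, initialising from the
#    large-dataset checkpoint
python matcha/train.py experiment=your_dataset \
    ckpt_path=logs/train/runs/<big_dataset_run>/checkpoints/last.ckpt
```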