Can I train the Chinese model? #111
Replies: 22 comments 13 replies
-
Look at issue #41 to check the current progress.
-
You can, but with the current English PL-BERT the quality won't be as good as originally reported. I'm working on a multilingual PL-BERT now, and it may take one or two months to finish.
-
See yl4579/StyleTTS#10 for more details.
-
@yl4579 I trained StyleTTS2 successfully on Chinese data, and it sounds very good. Since wavlm-base-plus only supports English, I used a Chinese HuBERT model as the SLM. Now I want to train one model for both Chinese and English, but I cannot find a pre-trained model that supports both languages. Do you have any suggestions for the SLM?
-
You can try the Whisper encoder, which was trained on multiple languages. You can also try the multilingual wav2vec 2.0: https://huggingface.co/facebook/wav2vec2-large-xlsr-53
-
Did you use the English PL-BERT or did you train PL-BERT with Chinese data?
-
I trained PL-BERT with Chinese data.
-
What is your modeling unit? IPA or Pinyin?
-
@Moonmore The modeling unit is pinyin. test.zip is a synthesized sample.
-
Did you use pinyin tones when training the Chinese PL-BERT? I believe StyleTTS uses F0 for Chinese tones. Can a PL-BERT trained with tones work with StyleTTS?
-
I trained the Chinese PL-BERT without pinyin tones. A PL-BERT with tones may well work too, so you can try it.
-
How many samples did you use to train the Chinese PL-BERT?
-
@zhouyong64 I used about 84,000,000 text sentences to train the Chinese PL-BERT model.
-
Sounds really good. Is the pinyin unit you mentioned not decomposed into phones? And how do you align the PL-BERT input with the text input?
-
@Moonmore @zhouyong64 Sorry for the wrong information yesterday. I trained PL-BERT with tones, and trained the ASR model without tones.
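A minimal sketch of what that split could look like, assuming tones are written as trailing digits (as in ni3, hao3) and the text is already tokenized into phones. The helper name `strip_tones` and the token lists are hypothetical illustrations, not code from the StyleTTS2 or PL-BERT repositories.

```python
import re

def strip_tones(phones):
    """Drop a trailing tone digit (1-5) from every phone token.

    Assumption: tones are encoded as a digit at the end of the token.
    """
    return [re.sub(r"[1-5]$", "", p) for p in phones]

# Toned units, as one might feed to a PL-BERT trained with tones.
plbert_tokens = ["n", "i3", "h", "ao3"]

# Toneless units, as one might feed to an ASR text aligner trained without tones.
asr_tokens = strip_tones(plbert_tokens)

# Both sequences keep the same length, so positions still line up one to one.
print(plbert_tokens, asr_tokens)
```

Because stripping the tone never adds or removes a token, the toned and toneless sequences stay position-aligned, which is what lets two models with different vocabularies share one phone segmentation.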
-
So can I understand it this way: all text-related models (the text encoder and the BERT model) are trained on the same phoneme units, and each produces one feature per minimal pronunciation unit? For example, ni3 hao3 -> n i3 h ao3: the input length is 4, and the model's output length is also 4. And how do you construct the PL-BERT labels?
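The decomposition in this example can be sketched as below. This only illustrates the question's "ni3 hao3 -> n i3 h ao3" case and is not confirmed to be the preprocessing @hermanseu used. The function names are hypothetical, and treating "y" and "w" as initials is an assumption.

```python
# Standard pinyin initials; two-letter initials come first so that
# "zh" is matched before "z". Treating "y"/"w" as initials is an assumption.
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]

def split_syllable(syllable):
    """Split one tone-numbered pinyin syllable into [initial, final+tone]."""
    for initial in INITIALS:
        rest = syllable[len(initial):]
        # Require a real final (starting with a letter) after the initial.
        if syllable.startswith(initial) and rest and rest[0].isalpha():
            return [initial, rest]
    # Zero-initial syllables such as "ai4" or "er2" stay as a single unit.
    return [syllable]

def to_phones(pinyin_text):
    """Convert space-separated pinyin syllables into a flat phone list."""
    phones = []
    for syllable in pinyin_text.split():
        phones.extend(split_syllable(syllable))
    return phones

print(to_phones("ni3 hao3"))
```

With this segmentation, "ni3 hao3" yields four units, so a model that emits one vector per input token would also produce four outputs.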
-
@Moonmore
-
@hermanseu Thank you for your reply.
-
How can the above be applied to StyleTTS2? Is there already a complete repo specialized for Mandarin that uses this G2PW, which I could look at? As a non-expert I am looking at the puzzle pieces but don't see the entire picture. Perhaps it's too early in the development.
-
@hermanseu, a quick question: when you trained the ASR module, do you mean that the phonemes were decomposed without tones? For example:
-
@hermanseu Hi, I have a question about using the Whisper encoder as the speech language model (SLM). The Whisper encoder requires preprocessing the audio, which, in the forward pass of WavLMLoss, seems to require detaching the gradients of y_rec. Won't this affect training, or have I misunderstood something? I look forward to your response.
-
I have successfully implemented StyleTTS in Chinese and English, but I'm running into an issue with the speech rate: the shorter the sentence, the slower the speech, and the longer the sentence, the faster the speech. Does anyone else have the same problem?
-
I want to train the Chinese model. Do you support mixed input in Chinese and English?