[BUG]: Lower quality than the examples on the demo page #334
Comments
Hi, to me the generated speech sounds like it is trying to whisper. You can run generation several times and keep the best result. Quality could also be improved by fine-tuning on high-quality whispered speech or by adding more whispered speech at the training stage. Answer for: #340 (comment)
Since I did not take part in training MaskGCT or in generating the demos, I do not know the details. Could @HeCheng0625 help with this?
I have the same problem. I followed the steps on the page 'https://github.com/open-mmlab/Amphion/tree/main/models/tts/maskgct' and started the Gradio demo, but the generated audio is not as good as the official examples. I used the audio downloaded from 'https://maskgct.github.io/audios/icl_smaples/icl_10.wav' and the target text '顿时,气氛变得沉郁起来。乍看之下,一切的困扰仿佛都围绕在我身边。我皱着眉头,感受着那份压力,但我知道我不能放弃,不能认输。于是,我深吸一口气,心底的声音告诉我:“无论如何,都要冷静下来,重新开始。”', both taken from the first example under 'Zero-shot In-context Learning'.
Describe the bug
After following the installation instructions (plus replacing phonemizer with https://github.com/justinjohn0306/phonemizer to make it work on Win 11), and using the same examples from the demo page, I was unable to replicate the quality of the examples. For example, the whispering voice always outputs something between a whisper and a normal voice. I tried both the inference script and the Gradio app, with the same result. Additionally, the duration calculator seems to be broken for Chinese - it makes the output twice as fast when set to auto.
This is the demo page result:
https://vocaroo.com/15JxVNPRScwD
This is mine:
https://vocaroo.com/13b14dZCkNau
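One way to sanity-check the auto duration for Chinese text is to estimate the expected length by hand and compare it with what the model produces. The sketch below is not MaskGCT's actual duration rule; it assumes a simple proportional rule (target duration scales with the ratio of countable units in the target and prompt texts), and the function names `count_speech_units` and `estimate_target_seconds` are hypothetical. Counting one unit per CJK character and one per Latin word is a crude approximation.

```python
import re

# Assumption: one "speech unit" per CJK ideograph and one per Latin word.
# This is a rough heuristic for illustration, not the project's own rule.
_CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")
_LATIN_WORD = re.compile(r"[A-Za-z0-9']+")


def count_speech_units(text: str) -> int:
    """Count CJK characters and Latin words; punctuation is ignored."""
    return len(_CJK_CHAR.findall(text)) + len(_LATIN_WORD.findall(text))


def estimate_target_seconds(
    prompt_seconds: float, prompt_text: str, target_text: str
) -> float:
    """Scale the prompt duration by the target/prompt unit ratio."""
    prompt_units = count_speech_units(prompt_text)
    if prompt_units == 0:
        raise ValueError("prompt text contains no countable units")
    return prompt_seconds * count_speech_units(target_text) / prompt_units


if __name__ == "__main__":
    # A 4-second prompt with 4 characters and a target with 8 characters.
    print(estimate_target_seconds(4.0, "你好世界", "你好世界你好世界"))
```

If the estimate is roughly double the length of the audio produced with the auto setting, entering the estimate as an explicit duration instead of auto may be a usable workaround until the calculator is fixed.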
How To Reproduce
Steps to reproduce the behavior:
Follow the installation instructions on Win 11, using the replacement phonemizer, and generate audio.
Expected behavior
Quality should be the same as the examples
Screenshots
Environment Information
Additional context
Thank you very much for this project