How Much Audio Data Is Needed for Fine-Tuning Voice Tone? #838
JiangNanDream asked this question in Q&A
I fine-tuned on about 20 minutes of audio and tested the checkpoints at 200, 5,500, and 10,000 steps after merging the weights. In every case, the output was worse than the original model's, and it could not even generate natural-sounding speech.
I want the model to produce a voice tone and pitch that more closely match my real voice.