Subtle jitter in generated speech? #636

ragequitninja · 2024-10-31T12:39:29Z

When training a new TTS model (using the LJSpeech dataset) at high quality (22.05 kHz), I notice that some parts of sentences have a very slight jitter during inference. It is subtle but annoying because it sounds unnatural. I am already at epoch 2500+.

I understand that increasing the model parameters will make the model larger and slower on CPU, but before spending more GPU resources, I wonder whether anyone has already tried training a larger model (e.g. by increasing n_heads and/or n_layers). If so, did it make the speech sound more natural overall?

Or am I thinking about this the wrong way?
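
To put rough numbers on the size question, here is a minimal sketch that estimates the weight count of a generic transformer-style text encoder. It is an assumption-heavy approximation, not the actual model code: the function name `estimate_encoder_params`, the layer layout (four attention projections, a two-convolution feed-forward block, two layer norms), and the example sizes are all hypothetical and not taken from any real config. It suggests that in standard multi-head attention, `n_heads` only splits the hidden dimension and adds no weights on its own, while `n_layers` scales the count linearly.

```python
def estimate_encoder_params(hidden: int, filter_channels: int,
                            n_heads: int, n_layers: int,
                            kernel_size: int = 3) -> int:
    """Rough weight count for a generic transformer-style encoder stack.

    Hypothetical layout, for illustration only; a real model may add
    embeddings, relative position tables, and other modules.
    """
    if hidden % n_heads != 0:
        raise ValueError("hidden must be divisible by n_heads")

    # Attention: Q, K, V and output projections, each hidden x hidden plus bias.
    # The heads split `hidden` between them, so n_heads does not appear here.
    attn = 4 * (hidden * hidden + hidden)

    # Feed-forward: two 1-D convolutions (hidden -> filter -> hidden) with biases.
    ffn = (hidden * filter_channels * kernel_size + filter_channels) \
        + (filter_channels * hidden * kernel_size + hidden)

    # Two layer norms per layer, each with a scale and a shift vector.
    norms = 2 * (2 * hidden)

    return n_layers * (attn + ffn + norms)


if __name__ == "__main__":
    # Placeholder sizes chosen only to show the scaling behaviour.
    base = estimate_encoder_params(192, 768, n_heads=2, n_layers=6)
    more_heads = estimate_encoder_params(192, 768, n_heads=4, n_layers=6)
    more_layers = estimate_encoder_params(192, 768, n_heads=2, n_layers=12)
    print(f"base:        {base:,}")
    print(f"more heads:  {more_heads:,}")
    print(f"more layers: {more_layers:,}")
```

Under these assumptions, doubling `n_heads` leaves the estimate unchanged and doubling `n_layers` doubles it, so only the latter (or a wider hidden size) would actually cost more CPU time at inference.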

Replies: 0 comments