Audio Length Limitation and FlashAttention Warning in Parler TTS #126
Comments
I have the same issue. Audio length is truncated.
The default training configuration in parler-tts caps audio at 30 seconds and text at 600 characters. Either fine-tune it on longer data, or, if your text would produce more than 30 seconds of audio or is longer than 600 characters, split it at punctuation (. or ,) and send the pieces separately.
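For anyone who wants to try the splitting approach, here is a minimal sketch of one way to do it. It is not part of parler-tts: `split_text` and `MAX_CHARS` are hypothetical names, and the 600-character limit is simply the figure quoted in the comment above. It breaks only where a period or comma is followed by whitespace, so something like "3.14" stays intact.

```python
import re

MAX_CHARS = 600  # assumed limit, taken from the comment above


def split_text(text, max_chars=MAX_CHARS):
    """Split text after '.' or ',' into chunks of at most max_chars characters."""
    # Break only where punctuation is followed by whitespace.
    pieces = re.split(r"(?<=[.,])\s+", text.strip())
    chunks, current = [], ""
    for piece in pieces:
        if not piece:
            continue
        candidate = f"{current} {piece}".strip()
        if len(candidate) <= max_chars:
            # The piece still fits in the chunk being built.
            current = candidate
            continue
        if current:
            chunks.append(current)
        current = piece
        # Fallback: a single piece longer than the limit is cut hard.
        while len(current) > max_chars:
            chunks.append(current[:max_chars])
            current = current[max_chars:]
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to the model as its own prompt. The character limit does not guarantee the audio stays under 30 seconds, so a smaller `max_chars` may be safer for slow speaking styles.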
I have already applied the suggested method of splitting the text if it exceeds 30 seconds or 600 characters by using punctuation (.,). However, when I combine the audio segments, there is an inconsistency in the voice tone, even when a specific voice prompt is set.
I could get it to work with this PR: #110. The main idea is to generate once with a small prompt like
You then need to remove the encoded audio from each output to get consistent results without the prefix prompt.
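A rough sketch of that last trimming step, working on decoded waveforms rather than on anything inside the model. This is an assumption on my part, not the code from PR #110: `strip_prefix_and_join` and `prefix_len` are hypothetical names, and it assumes every generated segment begins with the same known number of prefix samples.

```python
import numpy as np


def strip_prefix_and_join(segments, prefix_len):
    """Drop the first prefix_len samples of each segment, then concatenate."""
    # Assumption: every segment starts with the same prefix audio.
    trimmed = [np.asarray(seg)[prefix_len:] for seg in segments]
    return np.concatenate(trimmed)
```

If the prefix length differs between outputs, it would have to be measured per segment instead of passed in as a single number.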
Is anyone else getting strangely disfluent output when Parler reads text that mixes numbers and letters? For example: "my id card is 5o613123jkl"
You could also experiment with model.generate(min_new_tokens=1720, **generation_kwargs).
I have been working with Parler TTS and encountered an issue where I am unable to generate audio longer than 20 seconds. Despite trying various methods, such as streaming and splitting the text into chunks, the audio output is still truncated to around 15-20 seconds.
Additionally, I received a warning stating that FlashAttention is not installed. Could this be the cause of the issue? I would appreciate any guidance or suggestions on how to handle longer input text effectively.