
Bad Alignment #69

Open
neuronx1 opened this issue Jan 25, 2022 · 1 comment


@neuronx1

neuronx1 commented Jan 25, 2022

Hi @cschaefer26,

thanks for your great repository.

Unfortunately, I get really bad results, and I think the reason is bad alignment.

I'm training the models on a German dataset containing 900 samples, each between 5 and 30 seconds long. The audio is 22050 Hz, 16-bit mono. I ran your preprocessing step.
My TensorBoard looks like this (as you can see, there is no alignment):
[Two TensorBoard screenshots]
What's the reason for this, and how can I fix it?
I really appreciate any help!

Thanks in advance!

@cschaefer26

Hi, could you show the attention score? The generated attention does not matter; what's used for duration extraction is the ground-truth-aligned one. 900 samples is quite few for building up attention with Tacotron. What language are the samples in, and are you using phonemes? For a small dataset like this, one could pretrain a Tacotron model on a different dataset until attention has built up and then continue training on the smaller dataset. It could also make sense to set trim_long_silences=True and vad_max_silence_length=6 or so, which shortens long silent parts in the audio and helps attention to build up.
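To illustrate what a maximum silence length does, here is a minimal, simplified sketch. It assumes the audio has already been split into fixed-size windows and that some voice activity detector has flagged each window as speech or silence; it also assumes vad_max_silence_length counts such windows. The function names `dilate_speech_mask` and `trim_long_silences` are hypothetical, and this is one common dilation-based approach, not this repository's actual code: every speech window "protects" its neighbours within half the limit, so short pauses survive intact while long pauses are cut down.

```python
from typing import List, Sequence


def dilate_speech_mask(is_speech: Sequence[bool], max_silence_windows: int) -> List[bool]:
    """Keep every window lying within max_silence_windows // 2 windows of a speech window.

    Silent gaps of at most 2 * (max_silence_windows // 2) windows are kept whole;
    longer gaps are shortened to that length (half kept on each side of the cut).
    """
    half = max_silence_windows // 2
    n = len(is_speech)
    keep = [False] * n
    for i, speech in enumerate(is_speech):
        if speech:
            # Mark the speech window and its neighbours on both sides as kept.
            for j in range(max(0, i - half), min(n, i + half + 1)):
                keep[j] = True
    return keep


def trim_long_silences(samples: Sequence[float], is_speech: Sequence[bool],
                       window_length: int, max_silence_windows: int) -> List[float]:
    """Drop the samples of every window that falls outside the dilated speech mask."""
    keep = dilate_speech_mask(is_speech, max_silence_windows)
    out: List[float] = []
    for w, kept in enumerate(keep):
        if kept:
            out.extend(samples[w * window_length:(w + 1) * window_length])
    return out


if __name__ == "__main__":
    # One speech window, a 10-window pause, one speech window (hypothetical data).
    mask = [True] + [False] * 10 + [True]
    print(dilate_speech_mask(mask, 6))
    print(trim_long_silences(list(range(24)), mask, window_length=2, max_silence_windows=6))
```

With a limit of 6 windows, the 10-window pause in the example shrinks to 6 windows, while a pause of 6 windows or fewer would be left untouched. A smaller limit therefore removes more of the long pauses, which is the effect the suggestion above relies on.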
