
Not getting alignment properly #628

Open

hongseoi opened this issue May 16, 2024 · 3 comments

hongseoi commented May 16, 2024

Hi!
I trained tacotron2 for more than 60,000 steps, but the attention alignment never converges properly.
The alignment plot is shown below. Does anyone know the cause of this?

[attached: alignment plot and training chart]

I'm training on 100 samples of elderly speakers' voices selected from the Common Voice dataset.

Training performance was not good in earlier attempts, so I looked through other issues and adjusted the optimization hyperparameters in hparams.py:

        # excerpt from hparams.py (optimization-related settings)
        use_saved_learning_rate=False,
        learning_rate=0.25*1e-3,
        weight_decay=1e-6,
        grad_clip_thresh=1.0,
        batch_size=16,  # default was 64
        mask_padding=True  # set model's padded outputs to padded values
    )

But sadly it didn't work.

hongseoi commented May 31, 2024

I used SoX to remove silence from the audio files. It's not a complete success yet, but there has been some improvement.


import glob
import os
import subprocess

def remove_silence(input_file, output_file):
    try:
        # Trim silence with sox; the reverse / silence / reverse sequence
        # applies the same trimming to the end of the file as well.
        subprocess.run([
            'sox', input_file, output_file,
            'silence', '2', '0.1', '1%',
            'reverse',
            'silence', '2', '0.1', '1%',
            'reverse'
        ], check=True)
        print(f'Successfully removed silence from {input_file} and saved to {output_file}')
    except subprocess.CalledProcessError as e:
        print(f'Error occurred: {e}')

def process_folder(input_folder, output_folder):
    # Create the output folder if it does not exist yet.
    os.makedirs(output_folder, exist_ok=True)

    # Process every wav file in the input folder.
    for wav_file in glob.glob(os.path.join(input_folder, '*.wav')):
        file_name = os.path.basename(wav_file)
        output_wav = os.path.join(output_folder, file_name)
        remove_silence(wav_file, output_wav)

# glob does not expand '~', so expand it explicitly.
input_folder = os.path.expanduser('~/data/train')
output_folder = os.path.expanduser('~/data/processed_train')

process_folder(input_folder, output_folder)

hongseoi commented Jun 13, 2024


It was a really simple problem:

  • Resample your audio to 22050 Hz (the data used for the pretrained model has a 22050 Hz sample rate, and the pretrained model is adjusted to that).
  • Check your audio bit depth and change max_wav_value in hparams.py accordingly when you change the sample rate (see the sketch below).
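For the resampling step, here is a minimal sketch in the same style as the silence-removal script above, calling sox via subprocess. The target values (22050 Hz, 16-bit PCM, which corresponds to max_wav_value=32768.0 in hparams.py) and the folder names are assumptions for illustration; match them to whatever your pretrained checkpoint actually expects.

import glob
import os
import subprocess

TARGET_RATE = 22050   # sample rate the pretrained model was trained on (assumed)
TARGET_BITS = 16      # 16-bit PCM corresponds to max_wav_value = 32768.0

def resample(input_file, output_file):
    # sox converts to the target sample rate and bit depth given as
    # output format options before the output filename.
    subprocess.run([
        'sox', input_file,
        '-r', str(TARGET_RATE),
        '-b', str(TARGET_BITS),
        output_file
    ], check=True)

def process_folder(input_folder, output_folder):
    os.makedirs(output_folder, exist_ok=True)
    for wav_file in glob.glob(os.path.join(input_folder, '*.wav')):
        resample(wav_file, os.path.join(output_folder, os.path.basename(wav_file)))

# Example folder names only; '~' is expanded because glob does not do it.
process_folder(os.path.expanduser('~/data/processed_train'),
               os.path.expanduser('~/data/resampled_train'))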
