
Not getting alignment properly #628

Open

hongseoi opened this issue May 16, 2024 · 3 comments

hongseoi commented May 16, 2024

Hi!
I trained tacotron2 for more than 60,000 steps, but the attention alignment never converges properly.
The alignment plot is shown below. Does anyone know the cause of this?

[attached: alignment plot and training chart]

I'm training on 100 samples of elderly speakers' voices selected from the Common Voice dataset.

Training performance was not good in earlier attempts, so I looked through other issues and adjusted the optimization hyperparameters in hparams.py:

        # excerpt from hparams.py (optimization-related settings)
        use_saved_learning_rate=False,
        learning_rate=0.25*1e-3,
        weight_decay=1e-6,
        grad_clip_thresh=1.0,
        batch_size=16,  # default was 64
        mask_padding=True  # set model's padded outputs to padded values
    )

But sadly it didn't work.

hongseoi commented May 31, 2024

I used SoX to remove silence from the audio files. It's not a complete success yet, but there has been some improvement.


import glob
import os
import subprocess

def remove_silence(input_file, output_file):
    try:
        # Trim silence with sox; the reverse / silence / reverse sequence
        # applies the same trimming to the end of the file as well.
        subprocess.run([
            'sox', input_file, output_file,
            'silence', '2', '0.1', '1%',
            'reverse',
            'silence', '2', '0.1', '1%',
            'reverse'
        ], check=True)
        print(f'Successfully removed silence from {input_file} and saved to {output_file}')
    except subprocess.CalledProcessError as e:
        print(f'Error occurred: {e}')

def process_folder(input_folder, output_folder):
    # Create the output folder if it does not exist yet.
    os.makedirs(output_folder, exist_ok=True)

    # Process every wav file in the input folder.
    for wav_file in glob.glob(os.path.join(input_folder, '*.wav')):
        file_name = os.path.basename(wav_file)
        output_wav = os.path.join(output_folder, file_name)
        remove_silence(wav_file, output_wav)

# glob does not expand '~', so expand it explicitly.
input_folder = os.path.expanduser('~/data/train')
output_folder = os.path.expanduser('~/data/processed_train')

process_folder(input_folder, output_folder)

hongseoi commented Jun 13, 2024


It was a really simple problem:

  • Resample your audio to 22050 Hz (the data used for the pretrained model has a 22050 Hz sample rate, and the pretrained model is adjusted to that).
  • Check your audio bit depth and change max_wav_value in hparams.py accordingly when you change the sample rate (see the sketch below).
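For the resampling step, here is a minimal sketch in the same style as the silence-removal script above, calling sox via subprocess. The target values (22050 Hz, 16-bit PCM, which corresponds to max_wav_value=32768.0 in hparams.py) and the folder names are assumptions for illustration; match them to whatever your pretrained checkpoint actually expects.

import glob
import os
import subprocess

TARGET_RATE = 22050   # sample rate the pretrained model was trained on (assumed)
TARGET_BITS = 16      # 16-bit PCM corresponds to max_wav_value = 32768.0

def resample(input_file, output_file):
    # sox converts to the target sample rate and bit depth given as
    # output format options before the output filename.
    subprocess.run([
        'sox', input_file,
        '-r', str(TARGET_RATE),
        '-b', str(TARGET_BITS),
        output_file
    ], check=True)

def process_folder(input_folder, output_folder):
    os.makedirs(output_folder, exist_ok=True)
    for wav_file in glob.glob(os.path.join(input_folder, '*.wav')):
        resample(wav_file, os.path.join(output_folder, os.path.basename(wav_file)))

# Example folder names only; '~' is expanded because glob does not do it.
process_folder(os.path.expanduser('~/data/processed_train'),
               os.path.expanduser('~/data/resampled_train'))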
