
# Automatic Speech Recognition (ASR) with PyTorch

[About](#about) • [Installation](#installation) • [How To Use](#how-to-use) • [Credits](#credits) • [License](#license)

## About

This repository contains a system for training the DeepSpeech2 model for an ASR task.

## Installation

Follow these steps to install the project:

  1. (Optional) Create and activate new environment using conda or venv (+pyenv).

    a. conda version:

    # create env
    conda create -n project_env python=PYTHON_VERSION
    
    # activate env
    conda activate project_env

    b. venv (+pyenv) version:

    # create env
    ~/.pyenv/versions/PYTHON_VERSION/bin/python3 -m venv project_env

    # alternatively, using the default python version
    python3 -m venv project_env

    # activate env
    source project_env/bin/activate
  2. Install all required packages:

    pip install -r requirements.txt
  3. Install pre-commit:

    pre-commit install

## How To Use

To train a model, run the following command:

python3 train.py -cn=deepspeech2

The model trains for 50 epochs on all datasets from LibriSpeech.

To run inference (evaluate the model or save predictions):

Download the model:

python3 download_model.py

To get predictions on the test-clean dataset:

python3 inference.py -cn=inference

To get predictions on the test-other dataset:

python3 inference.py -cn=inference_other

To calculate CER/WER:

python3 calc_wer_cer.py --dir_path dir

Where `dir` is the path to your predictions directory (for example, "/ASR/data/saved/predict/test").
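For reference, WER and CER are both Levenshtein edit distances normalized by the reference length, computed over words and characters respectively. A minimal sketch of the metrics (an illustration, not the repository's `calc_wer_cer.py` implementation):

```python
def edit_distance(ref, hyp):
    # Classic single-row dynamic-programming Levenshtein distance.
    dp = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(hyp) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,          # deletion
                        dp[j - 1] + 1,      # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev = cur
    return dp[-1]

def wer(ref, hyp):
    # Word error rate: edit distance over word sequences.
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / max(len(ref_words), 1)

def cer(ref, hyp):
    # Character error rate: edit distance over characters.
    return edit_distance(ref, hyp) / max(len(ref), 1)
```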

## About the work

All graphs from the experiments that led to my solution can be found here (there are also separate conclusions for each of the augmentations).

I will describe my work in the same order as the graphs are arranged. First of all, I built a baseline and a one-batch overfitting test (which improves later), changed the max learning rate to 1e-3, added log-scaling to the spectrograms (which at least makes them easier to read), and wrote a beam search from scratch (the corresponding graph shows how it works: it explores all possible options). As proof that the beam search works correctly, I display its output in every training run.
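The beam search described above can be sketched as follows. This is a simplified illustration, not the repository's implementation: it keeps the `beam_size` best CTC-collapsed prefixes per frame, and unlike a full CTC prefix beam search it does not merge the probabilities of distinct alignments or handle repeated labels separated by blanks.

```python
import math

def beam_search(log_probs, beam_size=10, blank=0):
    # log_probs: T x V list of per-frame log-probabilities.
    beams = {(): 0.0}  # collapsed prefix (tuple of labels) -> best log-score
    for frame in log_probs:
        candidates = {}
        for prefix, score in beams.items():
            for v, lp in enumerate(frame):
                if v == blank or (prefix and prefix[-1] == v):
                    new_prefix = prefix  # blank or repeat: emits no new label
                else:
                    new_prefix = prefix + (v,)
                s = score + lp
                if candidates.get(new_prefix, -math.inf) < s:
                    candidates[new_prefix] = s
        # Keep only the beam_size highest-scoring prefixes.
        beams = dict(sorted(candidates.items(),
                            key=lambda kv: kv[1], reverse=True)[:beam_size])
    return max(beams.items(), key=lambda kv: kv[1])[0]
```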

You can see that all these graphs show a strange loss and poor metrics: the mistake was that I computed the lengths of the output probability sequences incorrectly.
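This is a common bug: the lengths passed to the CTC loss must describe the time dimension after the convolutional front-end has downsampled it, not the raw spectrogram lengths. The standard per-layer formula (the kernel/stride/padding values used as examples here are illustrative, not the repository's actual ones):

```python
def conv_output_length(length, kernel_size, stride, padding):
    # Output length of one axis after a standard (non-dilated) convolution.
    return (length + 2 * padding - kernel_size) // stride + 1
```

Applying it once per conv layer in the front-end gives the sequence lengths that should be fed to the CTC loss.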

At this point, the training run had the following hyperparameters:

  • start lr: 1e-4
  • max lr: 1e-3
  • num epochs: 50 (200 iterations each)
  • batch size: 10
  • train dataset: train-clean-100
  • beam size: 10
  • model parameters: 28,086,844
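The start/max learning-rate pair above suggests a one-cycle schedule. A pure-Python sketch of such a schedule (the `pct_start` and `final_lr` values are assumptions for illustration, not taken from the repository's config):

```python
import math

def one_cycle_lr(step, total_steps, max_lr=1e-3, start_lr=1e-4,
                 final_lr=1e-6, pct_start=0.3):
    # Cosine-annealed one-cycle schedule: warm up from start_lr to
    # max_lr over the first pct_start of training, then anneal down
    # to final_lr over the rest.
    warmup = int(total_steps * pct_start)
    if step < warmup:
        t = step / max(warmup, 1)
        lo, hi = start_lr, max_lr
    else:
        t = (step - warmup) / max(total_steps - warmup, 1)
        lo, hi = max_lr, final_lr
    return lo + (hi - lo) * (1 - math.cos(math.pi * t)) / 2
```

With 50 epochs of 200 iterations, `total_steps` would be 10000.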

I added four augmentations: LowPassFilter, HighPassFilter, ColorNoise, and BandPassFilter, each triggered with a probability of about 1/4. The result on clean data turned out slightly worse than without them, but I accepted that trade-off: the model handles the "other" data a little better and is less prone to overfitting.
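The "each augmentation fires independently with probability ~1/4" pattern can be sketched like this (the four transforms are identity placeholders standing in for the real filters, not actual signal-processing code):

```python
import random

def rand_apply(transform, p=0.25):
    # Wrap a waveform transform so it fires with probability p.
    def wrapped(wave):
        return transform(wave) if random.random() < p else wave
    return wrapped

# Identity placeholders standing in for the real audio transforms.
low_pass = rand_apply(lambda w: w)     # LowPassFilter placeholder
high_pass = rand_apply(lambda w: w)    # HighPassFilter placeholder
color_noise = rand_apply(lambda w: w)  # ColorNoise placeholder
band_pass = rand_apply(lambda w: w)    # BandPassFilter placeholder

def augment(wave):
    # Apply each augmentation independently with probability ~1/4.
    for aug in (low_pass, high_pass, color_noise, band_pass):
        wave = aug(wave)
    return wave
```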

My next and final step was to expand the amount of training data (using all three training datasets) and increase the batch size to 64.

## Final model

## Credits

This repository is based on a PyTorch Project Template.

## License

License