pulumboom/avss_project

About

This repository contains implementations of several architectures and training configs for solving the BSS (blind speech separation) problem.

Our best weights for ConvTasNet are available: here

Report on the completed work: here

Installation

Installation may depend on your task. The general steps are the following:

  1. (Optional) Create and activate a new environment using venv (+pyenv).

    # create env
    ~/.pyenv/versions/PYTHON_VERSION/bin/python3 -m venv project_env
    
    # alternatively, using default python version
    python3 -m venv project_env
    
    # activate env
    source project_env/bin/activate
  2. Install all required packages:

    pip install -r requirements.txt
  3. Install pre-commit:

    pre-commit install

How To Use

To train a model, run the following command:

python3 train.py -cn=convtasnet HYDRA_CONFIG_ARGUMENTS

where HYDRA_CONFIG_ARGUMENTS are optional Hydra override arguments (key=value pairs).
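
For example, a run that overrides the batch size could look like the following. This is only a sketch: it assumes the training config exposes the same dataloader.batch_size field that the inference config (described below) does, and the value 16 is purely illustrative.

# override config values with Hydra's key=value syntax (keys depend on the chosen config)
python3 train.py -cn=convtasnet dataloader.batch_size=16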

To run inference (evaluate the model or save predictions):

python3 inference.py -cn=inference.yaml

In the inference.yaml you can specify (see the example command after the list):

  • model - name of the model config (and of the model itself)
  • datasets.test.dataset_path - path to the CustomDirDataset of the following format:
NameOfTheDirectoryWithUtterances
├── audio
│   ├── mix
│   │   ├── FirstSpeakerID1_SecondSpeakerID1.wav # may also be flac or mp3
│   │   ├── FirstSpeakerID2_SecondSpeakerID2.wav
│   │   .
│   │   .
│   │   .
│   │   └── FirstSpeakerIDn_SecondSpeakerIDn.wav
│   ├── s1 # ground truth for the speaker s1, may not be given
│   │   ├── FirstSpeakerID1_SecondSpeakerID1.wav # may also be flac or mp3
│   │   ├── FirstSpeakerID2_SecondSpeakerID2.wav
│   │   .
│   │   .
│   │   .
│   │   └── FirstSpeakerIDn_SecondSpeakerIDn.wav
│   └── s2 # ground truth for the speaker s2, may not be given
│       ├── FirstSpeakerID1_SecondSpeakerID1.wav # may also be flac or mp3
│       ├── FirstSpeakerID2_SecondSpeakerID2.wav
│       .
│       .
│       .
│       └── FirstSpeakerIDn_SecondSpeakerIDn.wav
└── mouths # contains video information for all speakers
    ├── FirstOrSecondSpeakerID1.npz # npz mouth-crop
    ├── FirstOrSecondSpeakerID2.npz
    .
    .
    .
    └── FirstOrSecondSpeakerIDn.npz
  • dataloader.batch_size - batch size
  • inferencer.save_path - path to the directory where predictions are saved (in subfolders s1 and s2 with [name].wav files). If the provided path is not absolute, predictions are stored in the ./data/saved/[save_path] folder. By default, save_path=inference_result.
  • inferencer.from_pretrained - path to the file with model weights
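
These fields can also be overridden from the command line instead of editing the config. A minimal sketch, where the paths are placeholders you must replace with your own dataset and checkpoint locations:

# hypothetical paths; replace with your own dataset directory and weights file
python3 inference.py -cn=inference.yaml \
    datasets.test.dataset_path=/path/to/NameOfTheDirectoryWithUtterances \
    inferencer.from_pretrained=/path/to/model_weights.pth \
    inferencer.save_path=inference_result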

To calculate metrics:

python3 metrics_eval.py -cn=metrics_eval.yaml

In the metrics_eval.yaml you can specify (see the example command after the list):

  • metrics - metrics config name (e.g. audio_metrics, which computes "SI-SNRi" and "SDRi"). The defaults.metrics.inference._target entries can be set to PESQ, SDRi, SI-SNRi, or STOI.
  • pred_path - path to the directory with predictions (in subfolders s1 and s2 with [name].wav files).
  • true_path - path to the directory with true sources (in subfolders s1 and s2 with [name].wav files).
  • show_all - if True, metrics are shown for each file; otherwise only the mean values are shown.
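
For example, to score saved predictions against ground-truth sources (a sketch; both paths are placeholders to replace with your own directories):

# hypothetical paths; point to your predictions and ground-truth folders
python3 metrics_eval.py -cn=metrics_eval.yaml \
    pred_path=data/saved/inference_result \
    true_path=/path/to/ground_truth \
    show_all=False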

Credits

This repository is based on a heavily modified fork of the pytorch-template and asr_project_template repositories.

License

License
