This repository contains implementations of several architectures and training configs for the blind speech separation (BSS) problem.
Our best weights for ConvTasNet are available here.

A report on the completed work is available here.
Installation may depend on your task. The general steps are the following:
- (Optional) Create and activate a new environment using `venv` (+ `pyenv`):

  ```bash
  # create env
  ~/.pyenv/versions/PYTHON_VERSION/bin/python3 -m venv project_env

  # alternatively, using the default python version
  python3 -m venv project_env

  # activate env
  source project_env/bin/activate
  ```

- Install all required packages:

  ```bash
  pip install -r requirements.txt
  ```

- Install `pre-commit`:

  ```bash
  pre-commit install
  ```
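Once the hooks are installed, it can be useful to run them once over the whole codebase to check that everything passes:

```bash
# run all configured pre-commit hooks on every file in the repository
pre-commit run --all-files
```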
To train a model, run the following command:

```bash
python3 train.py -cn=convtasnet HYDRA_CONFIG_ARGUMENTS
```

where `HYDRA_CONFIG_ARGUMENTS` are optional Hydra override arguments.
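Hydra overrides use `key=value` syntax. For example, a hypothetical run that changes the batch size and the number of epochs (the exact keys depend on the configs shipped in this repository; `trainer.n_epochs` is an assumed key shown for illustration):

```bash
# override config values directly from the command line;
# dataloader.batch_size is a key described later in this README,
# trainer.n_epochs is assumed here for illustration only
python3 train.py -cn=convtasnet dataloader.batch_size=8 trainer.n_epochs=50
```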
To run inference (evaluate the model or save predictions):

```bash
python3 inference.py -cn=inference.yaml
```

In `inference.yaml` you can specify:

- `model` - name of the model config and the model itself
- `datasets.test.dataset_path` - path to the `CustomDirDataset` of the following format:
  ```
  NameOfTheDirectoryWithUtterances
  ├── audio
  │   ├── mix
  │   │   ├── FirstSpeakerID1_SecondSpeakerID1.wav # may also be flac or mp3
  │   │   ├── FirstSpeakerID2_SecondSpeakerID2.wav
  │   │   .
  │   │   .
  │   │   .
  │   │   └── FirstSpeakerIDn_SecondSpeakerIDn.wav
  │   ├── s1 # ground truth for the speaker s1, may not be given
  │   │   ├── FirstSpeakerID1_SecondSpeakerID1.wav # may also be flac or mp3
  │   │   ├── FirstSpeakerID2_SecondSpeakerID2.wav
  │   │   .
  │   │   .
  │   │   .
  │   │   └── FirstSpeakerIDn_SecondSpeakerIDn.wav
  │   └── s2 # ground truth for the speaker s2, may not be given
  │       ├── FirstSpeakerID1_SecondSpeakerID1.wav # may also be flac or mp3
  │       ├── FirstSpeakerID2_SecondSpeakerID2.wav
  │       .
  │       .
  │       .
  │       └── FirstSpeakerIDn_SecondSpeakerIDn.wav
  └── mouths # contains video information for all speakers
      ├── FirstOrSecondSpeakerID1.npz # npz mouth-crop
      ├── FirstOrSecondSpeakerID2.npz
      .
      .
      .
      └── FirstOrSecondSpeakerIDn.npz
  ```
- `dataloader.batch_size` - batch size
- `inferencer.save_path` - path to the directory where predictions are saved (in subfolders `s1` and `s2` with `[name].wav` files). If the provided path is not absolute, predictions are stored in the `./data/saved/[save_path]` folder. By default, `save_path=inference_result`.
- `inferencer.from_pretrained` - path to the file with model weights
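Putting these options together, a typical inference call might look like the following (all paths below are placeholders for your own data and downloaded weights):

```bash
# evaluate a pretrained checkpoint on a custom test directory and save predictions
python3 inference.py -cn=inference.yaml \
    datasets.test.dataset_path=data/NameOfTheDirectoryWithUtterances \
    dataloader.batch_size=4 \
    inferencer.save_path=inference_result \
    inferencer.from_pretrained=weights/convtasnet_best.pth
```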
To calculate metrics:

```bash
python3 metrics_eval.py -cn=metrics_eval.yaml
```
In `metrics_eval.yaml` you can specify:

- `metrics` - metrics config name (e.g. `audio_metrics` computes "SI-SNRi" and "SDRi"). In `defaults.metrics.inference._target_` you can use `PESQ`, `SDRi`, `SI-SNRi`, or `STOI`.
- `pred_path` - path to the directory with predictions (in subfolders `s1` and `s2` with `[name].wav` files).
- `true_path` - path to the directory with the true sources (in subfolders `s1` and `s2` with `[name].wav` files).
- `show_all` - if `True`, metrics are shown for each file; otherwise only the mean value is shown.
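For instance, to score saved predictions against ground-truth sources and print per-file metrics (the paths below are placeholders):

```bash
# compare predicted s1/s2 folders with the reference s1/s2 folders
python3 metrics_eval.py -cn=metrics_eval.yaml \
    pred_path=data/saved/inference_result \
    true_path=data/NameOfTheDirectoryWithUtterances/audio \
    show_all=True
```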
This repository is based on a heavily modified fork of the pytorch-template and asr_project_template repositories.