This repository contains Ruijie Tao's unofficial reimplementation of the standard ECAPA-TDNN.
It is adapted from voxceleb_trainer and TaoRuijie/ECAPA-TDNN.
In this project, we train the ECAPA-TDNN model on the CN-Celeb2 dataset.
| Dataset | CN-Celeb2 |
|---|---|
| EER | 2.99% |
| Accuracy | 79.60% |
Create the environment:
conda create -n ECAPA python=3.7
conda activate ECAPA
pip install -r requirements.txt
Datasets for training:
- CN-Celeb2 training set;
wget https://us.openslr.org/resources/82/cn-celeb2_v2.tar.gzaa
wget https://us.openslr.org/resources/82/cn-celeb2_v2.tar.gzab
wget https://us.openslr.org/resources/82/cn-celeb2_v2.tar.gzac
cat cn-celeb2_v2.tar.gzaa cn-celeb2_v2.tar.gzab cn-celeb2_v2.tar.gzac > cn-celeb2_v2.tar.gz
tar -xzvf cn-celeb2_v2.tar.gz
- MUSAN dataset;
wget https://us.openslr.org/resources/17/musan.tar.gz
tar -xzvf musan.tar.gz
- RIR dataset.
wget https://us.openslr.org/resources/28/rirs_noises.zip
unzip rirs_noises.zip
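On systems without `cat`, the step that joins the three CN-Celeb2 archive parts can be reproduced in Python. The following is a minimal sketch, not part of this repository; the helper name `join_parts` is hypothetical, and the file names are the ones from the `wget` commands above.

```python
import shutil
from pathlib import Path


def join_parts(parts, output):
    """Concatenate the files in `parts`, in the given order, into `output`."""
    with open(output, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # copyfileobj streams in chunks, so large parts are not loaded into memory
                shutil.copyfileobj(src, out)
    return Path(output)


# Usage (run from the download directory; order of the parts matters):
# join_parts(
#     ["cn-celeb2_v2.tar.gzaa", "cn-celeb2_v2.tar.gzab", "cn-celeb2_v2.tar.gzac"],
#     "cn-celeb2_v2.tar.gz",
# )
```

The joined file can then be extracted with the `tar` command shown above.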
We provide data/train_files.txt and data/test_pairs.txt, which you can use to train and evaluate.
Alternatively, use GenerateList() in tools.py to generate train_list.txt and test_pairs.txt randomly.
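To illustrate what random list generation might look like, here is a minimal sketch. It is not the GenerateList() from tools.py. The function name `build_lists`, the `<root>/<speaker>/<utterance>` directory layout, the `*.wav` pattern, and the line formats (`speaker path` for training, `label path1 path2` for trials, following the convention used by voxceleb_trainer) are all assumptions. For brevity it draws trials from the same tree as the training list, whereas a real split would hold out test speakers.

```python
import random
from pathlib import Path


def build_lists(root, num_pairs=1000, pattern="*.wav", seed=0):
    """Return (train_lines, pair_lines) for an assumed <root>/<speaker>/<utterance> tree."""
    rng = random.Random(seed)
    root = Path(root)

    # Map each speaker directory name to its utterance paths, relative to root.
    by_speaker = {}
    for spk_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        utts = sorted(f.relative_to(root).as_posix() for f in spk_dir.rglob(pattern))
        if utts:
            by_speaker[spk_dir.name] = utts

    speakers = list(by_speaker)
    multi = [s for s in speakers if len(by_speaker[s]) >= 2]
    if len(speakers) < 2 or not multi:
        raise ValueError("need two speakers and one speaker with two utterances")

    # Assumed training-list format: "<speaker> <relative path>"
    train_lines = [f"{s} {u}" for s in speakers for u in by_speaker[s]]

    # Assumed trial format: "<label> <path1> <path2>", label 1 = same speaker.
    pair_lines = []
    for i in range(num_pairs):
        if i % 2 == 0:  # target trial: two distinct utterances of one speaker
            a, b = rng.sample(by_speaker[rng.choice(multi)], 2)
            pair_lines.append(f"1 {a} {b}")
        else:  # non-target trial: one utterance from each of two speakers
            s1, s2 = rng.sample(speakers, 2)
            pair_lines.append(f"0 {rng.choice(by_speaker[s1])} {rng.choice(by_speaker[s2])}")
    return train_lines, pair_lines
```

Alternating target and non-target trials keeps the two classes balanced, which makes the resulting EER easier to interpret.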
Then change the data paths in train.py and train the ECAPA-TDNN model end-to-end with:
python train.py --save_path exps/[your_exp_name]
Every test_step epochs, the system is evaluated and the EER is printed.
The results are saved in exps/[your_exp_name]/score.txt. The model is saved in exps/[your_exp_name]/model.
In my case, I trained for 80 epochs on one 3090 GPU. Each epoch takes 90 minutes.
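For readers unfamiliar with the metric, the equal error rate (EER) is the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal, dependency-free sketch of one way to estimate it from trial scores and labels. The function name `compute_eer` is hypothetical, and this repository may compute the EER differently.

```python
def compute_eer(scores, labels):
    """Estimate the EER from similarity scores and labels (1 = same speaker, 0 = different)."""
    trials = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(1 for _, label in trials if label == 1)
    n_neg = len(trials) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one target and one non-target trial")

    # Start with a threshold above every score: nothing is accepted.
    false_accepts, false_rejects = 0, n_pos
    best_gap, eer = 1.0, 0.5

    i = 0
    while i < len(trials):
        # Lower the threshold to accept all trials sharing this score (handles ties).
        j = i
        while j < len(trials) and trials[j][0] == trials[i][0]:
            if trials[j][1] == 1:
                false_rejects -= 1
            else:
                false_accepts += 1
            j += 1
        far = false_accepts / n_neg
        frr = false_rejects / n_pos
        # Keep the threshold where the two error rates are closest.
        if abs(far - frr) < best_gap:
            best_gap = abs(far - frr)
            eer = (far + frr) / 2
        i = j
    return eer
```

With a finite trial list the two error rates rarely cross exactly, so the sketch reports their average at the closest point. Implementations that interpolate along the ROC curve can give slightly different values.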
Original ECAPA-TDNN paper
@inproceedings{desplanques2020ecapa,
title={{ECAPA-TDNN: Emphasized Channel Attention, propagation and aggregation in TDNN based speaker verification}},
author={Desplanques, Brecht and Thienpondt, Jenthe and Demuynck, Kris},
booktitle={Interspeech 2020},
pages={3830--3834},
year={2020}
}
Ruijie Tao's reimplementation report
@article{das2021hlt,
title={HLT-NUS SUBMISSION FOR 2020 NIST Conversational Telephone Speech SRE},
author={Das, Rohan Kumar and Tao, Ruijie and Li, Haizhou},
journal={arXiv preprint arXiv:2111.06671},
year={2021}
}
VoxCeleb_trainer paper
@inproceedings{chung2020in,
title={In defence of metric learning for speaker recognition},
author={Chung, Joon Son and Huh, Jaesung and Mun, Seongkyu and Lee, Minjae and Heo, Hee Soo and Choe, Soyeon and Ham, Chiheon and Jung, Sunghwan and Lee, Bong-Jin and Han, Icksang},
booktitle={Interspeech},
year={2020}
}
We studied many useful projects during our coding process, including:
Thanks to these authors for open-sourcing their code!