This work is the main assignment for the CentraleSupelec course Deep Learning led by Valentin Petit and Maria Vakalopolou. You can find the report HERE.
The audio demo for AUTOVC can be found here
- Python 3
- Numpy
- PyTorch >= v0.4.1
- TensorFlow >= v1.3 (only for tensorboard)
- librosa
- tqdm
- wavenet_vocoder: pip install wavenet_vocoder (for more information, please refer to https://github.com/r9y9/wavenet_vocoder)
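
Before running anything, it can help to check that the packages listed above are importable. The snippet below is a minimal sketch using only the standard library. It assumes the usual import names (for example, PyTorch is imported as torch) and does not check version numbers.

```python
import importlib.util

# Import names for the packages listed above (PyTorch is imported as "torch").
REQUIRED_MODULES = ["numpy", "torch", "tensorflow", "librosa", "tqdm", "wavenet_vocoder"]


def find_missing(modules):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies found.")
```

TensorFlow is only needed for tensorboard, so a missing tensorflow entry should not block conversion.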
AUTOVC | Speaker Encoder | WaveNet Vocoder |
---|---|---|
link | link | link |
To apply the style of speaker p228 to the file p225/p225_003.wav, run:
python converter.py --source='p225/p225_003.wav' --target='p228'
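
To convert several files in one go, the same command can be driven from a short script. This is only a sketch: it relies on nothing beyond the --source and --target flags shown above, the helper names build_command and convert_all are hypothetical, and it assumes the script is run from the repository root. By default it only prints the commands it would run.

```python
import subprocess
import sys


def build_command(source, target):
    """Build the argument list for one converter.py call."""
    return [sys.executable, "converter.py", f"--source={source}", f"--target={target}"]


def convert_all(pairs, dry_run=True):
    """Run (or only print, when dry_run is True) one conversion per (source, target) pair."""
    commands = [build_command(source, target) for source, target in pairs]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True raises CalledProcessError if converter.py exits with an error.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # The pair from the example above; add further (source, target) pairs here.
    convert_all([("p225/p225_003.wav", "p228")])
```

Pass dry_run=False to actually launch the conversions.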
We have included a small set of training audio files in the wav folder. This set is far too small for real training and is intended for code verification only. Please prepare your own dataset for training.
1. Generate spectrogram data from the wav files: py .\make_spect.py --dataset='voxceleb'
2. Generate training metadata, including the GE2E speaker embedding (use one-hot embeddings instead if you are not doing zero-shot conversion): py .\make_metadata.py --dataset='voxceleb'
3. Run the main training script: python main.py, or python main_circular.py for CycleAutoVC. Several training parameters (learning rate, dataset, bottleneck dimension, ...) can be passed on the command line. To display the full list, run python main.py -h or python main_circular.py -h
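
For readers unfamiliar with command-line parameters, here is a minimal sketch of how options such as the learning rate, dataset and bottleneck dimension are typically declared with argparse. The flag names (--lr, --dataset, --dim_neck) and default values are assumptions made for illustration. The actual names are the ones printed by the -h option.

```python
import argparse


def build_parser():
    # Flag names and defaults are illustrative assumptions, not the repository's actual ones.
    parser = argparse.ArgumentParser(description="Training options (sketch)")
    parser.add_argument("--lr", type=float, default=1e-4, help="learning rate")
    parser.add_argument("--dataset", type=str, default="voxceleb", help="dataset name")
    parser.add_argument("--dim_neck", type=int, default=32, help="bottleneck dimension")
    return parser


# Equivalent to passing "--lr 0.0005 --dim_neck 16" on the command line (hypothetical flags).
args = build_parser().parse_args(["--lr", "0.0005", "--dim_neck", "16"])
print(args.lr, args.dataset, args.dim_neck)
```

Any option left out of the command falls back to its default, which is why only the values that differ from the defaults need to be given.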