This project is the main assignment for the Deep Learning course at CentraleSupelec, led by Valentin Petit and Maria Vakalopolou. The report is available HERE.

Audio Demo of AutoVC (from original authors)

The audio demo for AUTOVC can be found here

Dependencies

  • Python 3
  • Numpy
  • PyTorch >= v0.4.1
  • TensorFlow >= v1.3 (only for tensorboard)
  • librosa
  • tqdm
  • wavenet_vocoder (install with pip install wavenet_vocoder; for more information, please refer to https://github.com/r9y9/wavenet_vocoder)

Pre-trained models

AUTOVC | Speaker Encoder | WaveNet Vocoder
link | link | link

1.Voice Conversion

To apply the style of speaker p228 to the file p225/p225_003.wav, run:

python converter.py --source='p225/p225_003.wav' --target='p228'
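
For illustration, the command above could be parsed along the following lines. This is a minimal sketch, not the actual converter.py: only the --source and --target flags come from the command shown, while the parse_args helper and the assumption that the source speaker is the folder containing the utterance are hypothetical.

```python
import argparse
from pathlib import PurePosixPath


def parse_args(argv=None):
    """Parse the two flags used in the command above (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Voice conversion (sketch)")
    parser.add_argument("--source", required=True,
                        help="utterance to convert, e.g. p225/p225_003.wav")
    parser.add_argument("--target", required=True,
                        help="speaker whose style is applied, e.g. p228")
    return parser.parse_args(argv)


args = parse_args(["--source=p225/p225_003.wav", "--target=p228"])
# Assumption: the source speaker ID is the folder that contains the utterance.
source_speaker = PurePosixPath(args.source).parent.name
print(source_speaker, "->", args.target)  # p225 -> p228
```

Passing an explicit argument list to parse_args, as done here, makes the parsing easy to try out without invoking the script from a shell.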

2.Train model

We have included a small set of training audio files in the wav folder. This set is too small for real training and is intended for code verification only. Please prepare your own dataset for training.

1.Generate spectrogram data from the wav files: py .\make_spect.py --dataset='voxceleb'

2.Generate training metadata, including the GE2E speaker embedding (please use one-hot embeddings if you are not doing zero-shot conversion): py .\make_metadata.py --dataset='voxceleb'

3.Run the main training script: python main.py, or python main_circular.py for CycleAutoVC. Training parameters (learning rate, dataset, bottleneck dimension, ...) can be passed on the command line. To display the list of parameters, run python main.py -h or python main_circular.py -h
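
Step 2 mentions one-hot embeddings as the alternative to GE2E embeddings when zero-shot conversion is not needed. The sketch below shows what such embeddings could look like. It is an illustration only: the one_hot_embeddings helper is hypothetical, and the format in which make_metadata.py actually stores embeddings is not shown here.

```python
def one_hot_embeddings(speaker_ids):
    """Map each distinct speaker ID to a one-hot vector (hypothetical helper)."""
    speakers = sorted(set(speaker_ids))  # fixed order so indices are stable
    dim = len(speakers)
    return {
        spk: [1.0 if i == idx else 0.0 for i in range(dim)]
        for idx, spk in enumerate(speakers)
    }


# Duplicate IDs (several utterances per speaker) collapse to one vector each.
embeddings = one_hot_embeddings(["p228", "p225", "p225"])
print(embeddings["p225"])  # [1.0, 0.0]
print(embeddings["p228"])  # [0.0, 1.0]
```

A one-hot vector has one dimension per training speaker, so it cannot represent a speaker that was not seen during training. That is why this option only suits the non-zero-shot setting.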