This repository contains the official NNabla implementation of NVC-Net: End-to-End Adversarial Voice Conversion.
Abstract:
Hyperparameters: All hyper-parameters are defined in hparams.py.
Audio samples: Some audio samples can be found at the demo website http://nvcnet.github.io/
Install Python >= 3.6, then set up the Python dependencies from requirements.txt:
pip install -r ./requirements.txt
Note that this requirements.txt does not include nnabla-ext-cuda.
If you have a CUDA environment, we highly recommend installing nnabla-ext-cuda and using GPU devices. See the NNabla CUDA extension package installation guide.
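Once nnabla-ext-cuda is installed, you can verify that the GPU backend is usable before training. The snippet below is a minimal check using NNabla's standard extension-context API; the device id is just an example.

```python
import nnabla as nn
from nnabla.ext_utils import get_extension_context

# Request the cuDNN extension context; this raises if nnabla-ext-cuda
# is not installed or no compatible GPU is available.
ctx = get_extension_context("cudnn", device_id="0")
nn.set_default_context(ctx)
print("Using context:", ctx)
```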
Alternatively, one can build the Docker image:
bash scripts/docker_build.sh
We use the VCTK dataset. Download the dataset, then run the following command to prepare the training and validation sets.
python preprocess.py -i <path to `VCTK-Corpus/wav48/`> \
-o <path to save precomputed inputs> \
-s data/list_of_speakers.txt \
--make-test
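Before preprocessing, it can be useful to confirm that every speaker listed in data/list_of_speakers.txt is actually present in the downloaded corpus. The sketch below assumes one speaker id (e.g. p225) per line in the list file; adjust the paths to your setup.

```python
from pathlib import Path

wav_root = Path("VCTK-Corpus/wav48")              # path to the downloaded corpus
speaker_file = Path("data/list_of_speakers.txt")  # speaker list shipped with this repo

# Assumes one speaker id per line, e.g. "p225".
speakers = [s.strip() for s in speaker_file.read_text().splitlines() if s.strip()]
missing = [s for s in speakers if not (wav_root / s).is_dir()]

if missing:
    print("Missing speakers:", ", ".join(missing))
else:
    print(f"All {len(speakers)} speakers found under {wav_root}.")
```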
- The list of speakers used for training the model can be found here.
- The list of speakers used for the traditional subjective evaluation can be found here.
- The list of speakers used for the zero-shot evaluation can be found here.
- Gender information about the speakers can be found here.
All hyper-parameters used for training are defined in hparams.py. These parameters can also be overridden from the command line.
mpirun -n <number of GPUs> python main.py -c cudnn -d <list of GPUs e.g., 0,1,2,3> \
--output_path log/baseline/ \
--batch_size 8 \
...
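For a quick single-GPU run, the same flags can be passed without mpirun. The sketch below simply assembles the documented command with subprocess; whether main.py supports a single-process run and which hyper-parameters are exposed as flags depends on hparams.py, so treat the exact options as assumptions.

```python
import subprocess

# Hypothetical single-GPU launch; flag names mirror the mpirun example above.
cmd = [
    "python", "main.py",
    "-c", "cudnn",
    "-d", "0",                        # single GPU
    "--output_path", "log/baseline/",
    "--batch_size", "8",              # hyper-parameters from hparams.py can be overridden this way
]
subprocess.run(cmd, check=True)
```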
The conversion can be performed as follows.
python inference.py -c cudnn -d <list of GPUs e.g., 0,1> \
-m <path to pretrained model> \
-i <input audio file> \
-r <reference audio file> \
-o <output audio file>
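To convert a whole folder of utterances to one target voice, you can call inference.py once per file. This is only a convenience sketch built on the documented CLI; the directory names, the model path, and the reference file are placeholders.

```python
import subprocess
from pathlib import Path

model = "log/baseline/model.h5"   # placeholder: path to your pretrained model
reference = "ref/p225_001.wav"    # placeholder: utterance of the target speaker
out_dir = Path("converted")
out_dir.mkdir(exist_ok=True)

# Convert every wav file in ./inputs to the voice of the reference speaker.
for wav in sorted(Path("inputs").glob("*.wav")):
    subprocess.run([
        "python", "inference.py", "-c", "cudnn", "-d", "0",
        "-m", model,
        "-i", str(wav),
        "-r", reference,
        "-o", str(out_dir / wav.name),
    ], check=True)
```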
@article{nguyen2021nvc,
title={NVC-Net: End-to-End Adversarial Voice Conversion},
author={Nguyen, Bac and Cardinaux, Fabien},
journal={arXiv preprint arXiv:2106.00992},
year={2021}
}