aserquen/Real-Time-Voice-Cloning-Spanish

 
 

Real-Time Voice Cloning in Spanish

This repository is a fork of Real Time Voice Cloning (RTVC) with a synthesizer that works for the Spanish language. See my paper for a more detailed explanation. Demo audio from all the Spanish models we trained (plus a sample from RacoonML's trained model) is available here.

Papers implemented (by CorentinJ)

| URL (arXiv ID) | Designation | Title | Implementation source |
| --- | --- | --- | --- |
| 1806.04558 | SV2TTS | Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis | This repo |
| 1802.08435 | WaveRNN (vocoder) | Efficient Neural Audio Synthesis | fatchord/WaveRNN |
| 1703.10135 | Tacotron (synthesizer) | Tacotron: Towards End-to-End Speech Synthesis | fatchord/WaveRNN |
| 1710.10467 | GE2E (encoder) | Generalized End-To-End Loss for Speaker Verification | This repo |
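
The papers listed here fit together as a three-stage pipeline: the encoder turns a reference recording into a speaker embedding, the synthesizer turns text plus that embedding into a mel spectrogram, and the vocoder turns the spectrogram into audio. The sketch below only illustrates that data flow. `clone_voice`, the type aliases, and the `fake_*` stages are hypothetical stand-ins made up for this example, not this repository's API.

```python
from typing import Callable, List

# Illustrative type aliases (assumptions, not the repository's own types).
Waveform = List[float]
Embedding = List[float]
MelSpectrogram = List[List[float]]


def clone_voice(
    reference_wav: Waveform,
    text: str,
    encoder: Callable[[Waveform], Embedding],
    synthesizer: Callable[[str, Embedding], MelSpectrogram],
    vocoder: Callable[[MelSpectrogram], Waveform],
) -> Waveform:
    """Chain the three stages: encoder -> synthesizer -> vocoder."""
    embedding = encoder(reference_wav)   # GE2E role: recording -> speaker vector
    mel = synthesizer(text, embedding)   # Tacotron role: text + speaker -> mel frames
    return vocoder(mel)                  # WaveRNN role: mel frames -> audio samples


# Toy stand-ins so the sketch runs without any trained model.
def fake_encoder(wav: Waveform) -> Embedding:
    return [sum(wav) / max(len(wav), 1)]


def fake_synthesizer(text: str, embedding: Embedding) -> MelSpectrogram:
    # One two-bin "frame" per character, filled with the speaker value.
    return [[embedding[0]] * 2 for _ in text]


def fake_vocoder(mel: MelSpectrogram) -> Waveform:
    # Flatten the frames into a single sample sequence.
    return [value for frame in mel for value in frame]


if __name__ == "__main__":
    audio = clone_voice([0.5, 1.5], "hola", fake_encoder, fake_synthesizer, fake_vocoder)
    print(len(audio), "samples")
```

Passing the stages in as callables keeps the sketch independent of any particular model files. In a real run, each callable would wrap the corresponding trained model.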

Dataset used

Mozilla's Common Voice Spanish dataset

Setup

1. Install Requirements

Python 3.6 or 3.7 is needed to run the toolbox.

  • Install PyTorch (>=1.1.0).
  • Install ffmpeg.
  • Run pip install -r requirements.txt to install the remaining necessary packages.
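
Before moving on, it can help to confirm that the three prerequisites above are actually in place. The following is a minimal sketch of such a check, assuming only the requirements stated in this section (Python 3.6 or 3.7, PyTorch >= 1.1.0, ffmpeg on the PATH). `parse_version` and `check_environment` are hypothetical helpers written for this example and are not shipped with the repository.

```python
import re
import shutil
import sys


def parse_version(text):
    """Return the leading numeric part of a version string as a 3-tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    parts = [int(p) for p in match.group(0).split(".")] if match else []
    parts += [0] * (3 - len(parts))  # pad "1.1" to (1, 1, 0)
    return tuple(parts[:3])


def check_environment():
    """Report, per prerequisite, whether it appears to be satisfied."""
    report = {}
    # The README asks for Python 3.6 or 3.7.
    report["python"] = sys.version_info[:2] in ((3, 6), (3, 7))
    # PyTorch must be importable and at least version 1.1.0.
    try:
        import torch
        report["torch"] = parse_version(torch.__version__) >= (1, 1, 0)
    except (ImportError, OSError):
        report["torch"] = False
    # ffmpeg must be reachable on the PATH.
    report["ffmpeg"] = shutil.which("ffmpeg") is not None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print("{:<8}{}".format(name, "ok" if ok else "missing or unsupported"))
```

The check only reports status and never raises, so it can be run at any point during setup.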

2. Download Pretrained Models

Download the latest here.

3. Try the demo CLI

python demo_cli.py

If all tests pass, you're good to go.

4. Launch the Toolbox

You can then try the toolbox: python demo_toolbox.py
