中文文档 (Chinese documentation)
This repository is the official PyTorch implementation of our IJCAI 2022 paper, in which we propose SyntaSpeech, a syntax-aware non-autoregressive text-to-speech model.
SyntaSpeech is built on PortaSpeech (NeurIPS 2021) with three new features:
- We propose the Syntactic Graph Builder (Sec. 3.1) and Syntactic Graph Encoder (Sec. 3.2), which are shown to be effective in extracting syntactic features that improve the prosody modeling and duration prediction of the TTS model (see the graph-encoding sketch after this list).
- We introduce Multi-Length Adversarial Training (Sec. 3.3), which replaces the flow-based post-net in PortaSpeech, speeding up inference and improving audio naturalness (see the discriminator sketch after this list).
- We support three datasets: LJSpeech (single-speaker English), Biaobei (single-speaker Chinese), and LibriTTS (multi-speaker English).
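To give a rough feel for the first idea, here is a minimal sketch (not the repo's actual implementation; the edge list, feature size, and single `GraphConv` layer are toy assumptions) of how dependency-parse edges can be turned into a DGL graph and encoded with a graph convolution to produce syntax-aware word-level features:

```python
# A minimal sketch of syntactic graph building + encoding with DGL.
# Assumes dgl and torch are installed; edges and features are toy values.
import dgl
import torch
from dgl.nn import GraphConv

# Toy dependency edges (head -> dependent) for a 4-word sentence.
src = torch.tensor([1, 1, 3, 1])
dst = torch.tensor([0, 2, 2, 3])
g = dgl.graph((src, dst), num_nodes=4)
g = dgl.add_self_loop(g)            # avoid zero in-degree nodes

node_feats = torch.randn(4, 192)    # e.g., word-level text encodings
conv = GraphConv(192, 192)
syntactic_feats = conv(g, node_feats)  # [4, 192] syntax-aware features
print(syntactic_feats.shape)
```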
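And for the second idea, a minimal sketch (again hypothetical, not the repo's exact code; the discriminator architecture and window lengths are illustrative assumptions) of multi-length adversarial training: random mel-spectrogram windows of several lengths are scored by a discriminator, so the generator is judged at multiple temporal scales instead of relying on a flow-based post-net:

```python
# Sketch of multi-length adversarial training on mel-spectrogram windows.
import torch
import torch.nn as nn

class MelDiscriminator(nn.Module):
    """Scores a mel window treated as a 1-channel image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 32, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1),
        )

    def forward(self, mel):                 # mel: [B, T, n_mels]
        return self.net(mel.unsqueeze(1))   # -> [B, 1] realness score

def random_window(mel, win_len):
    """Crop a random window of win_len frames from each mel in the batch."""
    t = mel.size(1)
    start = torch.randint(0, max(t - win_len, 1), (1,)).item()
    return mel[:, start:start + win_len]

disc = MelDiscriminator()
fake_mel = torch.randn(4, 400, 80)          # generator output (toy values)
for win_len in (32, 64, 128):               # multiple window lengths
    score = disc(random_window(fake_mel, win_len))
    print(win_len, score.shape)             # [4, 1] per window length
```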
conda create -n synta python=3.7
conda activate synta
pip install -U pip
pip install Cython numpy==1.19.1
pip install torch==1.9.0
pip install -r requirements.txt
# install dgl for graph neural networks; dgl-cu102 supports RTX 2080, dgl-cu113 supports RTX 3090
pip install dgl-cu102 dglgo -f https://data.dgl.ai/wheels/repo.html
sudo apt install -y sox libsox-fmt-mp3
bash mfa_usr/install_mfa.sh # install the forced-alignment tool
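After installation, a quick sanity check can confirm that the environment was set up correctly (a minimal sketch, assuming the conda environment above was created as described):

```python
# Verify that torch and dgl import and that CUDA is visible before training.
import torch
import dgl

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("dgl:", dgl.__version__)
```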
Please follow the steps below to run this repo.
You can directly use our binarized datasets for LJSpeech and Biaobei. Download them and unzip them into the data/binary/ folder.
As for LibriTTS, you can download the raw dataset and process it with our data_gen modules. Detailed instructions can be found in docs/prepare_data.
We provide pre-trained vocoders for the three datasets: HiFi-GAN for LJSpeech and Biaobei, and ParallelWaveGAN for LibriTTS. Download and unzip them into the checkpoints/ folder.
Then you can train SyntaSpeech on the three datasets.
cd <the root_dir of your SyntaSpeech folder>
export PYTHONPATH=./
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config egs/tts/lj/synta.yaml --exp_name lj_synta --reset # training in LJSpeech
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config egs/tts/biaobei/synta.yaml --exp_name biaobei_synta --reset # training in Biaobei
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config egs/tts/libritts/synta.yaml --exp_name libritts_synta --reset # training in LibriTTS
tensorboard --logdir=checkpoints/lj_synta
tensorboard --logdir=checkpoints/biaobei_synta
tensorboard --logdir=checkpoints/libritts_synta
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config egs/tts/lj/synta.yaml --exp_name lj_synta --reset --infer # inference in LJSpeech
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config egs/tts/biaobei/synta.yaml --exp_name biaobei_synta --reset --infer # inference in Biaobei
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config egs/tts/libritts/synta.yaml --exp_name libritts_synta --reset --infer # inference in LibriTTS
Audio samples from the paper can be found on our demo page.
We also provide a HuggingFace demo page for LJSpeech. Try your own sentences there!
@article{ye2022syntaspeech,
  title={SyntaSpeech: Syntax-Aware Generative Adversarial Text-to-Speech},
  author={Ye, Zhenhui and Zhao, Zhou and Ren, Yi and Wu, Fei},
  journal={arXiv preprint arXiv:2204.11792},
  year={2022}
}
Our code is based on the following repos: