Home
TTS is a deep-learning-based text2speech solution. It favors simplicity over large, complex models, yet aims to achieve state-of-the-art results.
Currently, we provide two model architectures, based on Tacotron and Tacotron2. Both include many improvements over the original architectures, especially in the attention module.
The Tacotron-based model is smaller and targets faster training and inference, whereas the Tacotron2-based model is almost 3 times larger but achieves better results by using a neural vocoder (WaveRNN, WaveNet, etc.). Choose the architecture that best serves your needs.
So far, based on our experiments, TTS performs on par with or better than other open-source text2speech solutions. It also supports various languages (English, German, Chinese, etc.) with very little change.
Tacotron model:
(thanks to @yweweler)
Tacotron2 model (from the paper):