My name is Yuancheng Wang (王远程). I'm a first-year Ph.D. student at the Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), supervised by Professor Zhizheng Wu. Before that, I received my B.S. degree from CUHK-Shenzhen. I also collaborate with Xu Tan (谭旭) from Microsoft Research Asia.
My research interests include text-to-speech synthesis, text-to-audio generation, and unified audio representation and generation. I am one of the main contributors to and leaders of the open-source Amphion toolkit.
I developed NaturalSpeech 3, an advanced text-to-speech model built on factorized speech representation and modeling.
- 2024.10: 🔥 We released the code and checkpoints of MaskGCT (2.5k+ ⭐️ in one week); MaskGCT has been adopted by 趣丸千音.
- 2024.09: 🎉 Our paper SD-Eval got accepted by NeurIPS 2024.
- 2024.09: 🔥 We released MaskGCT, a new SOTA large-scale TTS system based on masked generative models.
- 2024.08: 🎉 Our papers Amphion and Emilia got accepted by IEEE SLT 2024.
- 2024.07: 🔥 We released Emilia, an extensive and diverse multilingual speech dataset for large-scale speech generation, with 101k hours of speech in six languages covering varied speaking styles.
- 2024.05: 🎉 Our paper Factorized Diffusion Models are Natural and Zero-shot Speech Synthesizers, aka NaturalSpeech 3, got accepted by ICML 2024 as an Oral presentation!
- 2024.03: 🎉 We are delighted to release NaturalSpeech 3, an advanced version of the NaturalSpeech series with speech factorization. We also released the FACodec checkpoints and a demo in the HuggingFace Amphion Space.
- 2023.11: 🔥 We released Amphion v0.1 (⭐️ 7k+), an open-source toolkit for audio, music, and speech generation.
- 2023.09: 🎉 My first paper on audio generation and editing, AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models, got accepted by NeurIPS 2023!