From 142af82d0dcb244c68700c37cf8008950fed7e0a Mon Sep 17 00:00:00 2001
From: Harry He <68176557+HarryHe11@users.noreply.github.com>
Date: Thu, 22 Aug 2024 14:20:05 +0800
Subject: [PATCH 1/7] Update README.md

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index fccbae6e..220ddc0b 100644
--- a/README.md
+++ b/README.md
@@ -28,6 +28,7 @@
 In addition to the specific generation tasks, Amphion includes several **vocoders** and **evaluation metrics**. A vocoder is an important module for producing high-quality audio signals, while evaluation metrics are critical for ensuring consistent metrics in generation tasks. Moreover, Amphion is dedicated to advancing audio generation in real-world applications, such as building **large-scale datasets** for speech synthesis.
 
 ## πŸš€Β News
+- **2024/08/22**: The **Emilia** dataset is now publicly avaiable! Try it now at [Opendatalab](https://opendatalab.com/Amphion/Emilia)!
 - **2024/07/01**: Amphion now releases **Emilia**, the first open-source multilingual in-the-wild dataset for speech generation with over 101k hours of speech data, and the **Emilia-Pipe**, the first open-source preprocessing pipeline designed to transform in-the-wild speech data into high-quality training data with annotations for speech generation! [![arXiv](https://img.shields.io/badge/arXiv-Paper-COLOR.svg)](https://arxiv.org/abs/2407.05361) [![hf](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Dataset-yellow)](https://huggingface.co/datasets/amphion/Emilia) [![demo](https://img.shields.io/badge/WebPage-Demo-red)](https://emilia-dataset.github.io/Emilia-Demo-Page/) [![readme](https://img.shields.io/badge/README-Key%20Features-blue)](preprocessors/Emilia/README.md)
 - **2024/06/17**: Amphion has a new release for its **VALL-E** model! It uses Llama as its underlying architecture and has better model performance, faster training speed, and more readable codes compared to our first version. [![readme](https://img.shields.io/badge/README-Key%20Features-blue)](egs/tts/VALLE_V2/README.md)
 - **2024/03/12**: Amphion now support **NaturalSpeech3 FACodec** and release pretrained checkpoints. [![arXiv](https://img.shields.io/badge/arXiv-Paper-COLOR.svg)](https://arxiv.org/abs/2403.03100) [![hf](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-model-yellow)](https://huggingface.co/amphion/naturalspeech3_facodec) [![hf](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-demo-pink)](https://huggingface.co/spaces/amphion/naturalspeech3_facodec) [![readme](https://img.shields.io/badge/README-Key%20Features-blue)](models/codec/ns3_codec/README.md)

From 1cbb4256483c96a5fcab920a77acb946c7406013 Mon Sep 17 00:00:00 2001
From: Harry He <68176557+HarryHe11@users.noreply.github.com>
Date: Thu, 22 Aug 2024 14:27:58 +0800
Subject: [PATCH 2/7] Update README.md

---
 preprocessors/Emilia/README.md | 28 ++++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/preprocessors/Emilia/README.md b/preprocessors/Emilia/README.md
index 1b5dd523..f99ddd15 100644
--- a/preprocessors/Emilia/README.md
+++ b/preprocessors/Emilia/README.md
@@ -6,12 +6,13 @@
 This is the official repository πŸ‘‘ for the **Emilia** dataset and the source code for **Emilia-Pipe** speech data preprocessing pipeline.
 
 ## News πŸ”₯
+- **2024/08/22**: The **Emilia** dataset is now publicly avaiable! Explore the most extensive and diverse speech generation dataset now at [Opendatalab](https://opendatalab.com/Amphion/Emilia)! πŸ‘‘
 - **2024/07/08**: Our preprint [paper](https://arxiv.org/abs/2407.05361) is now available! πŸ”₯πŸ”₯πŸ”₯
 - **2024/07/03**: We welcome everyone to check our [homepage](https://emilia-dataset.github.io/Emilia-Demo-Page/) for our brief introduction for Emilia dataset and our demos!
 - **2024/07/01**: We release of Emilia and Emilia-Pipe! We welcome everyone to explore it! πŸŽ‰πŸŽ‰πŸŽ‰
 
 ## About ⭐️
-🎀 **Emilia** is a comprehensive, multilingual dataset with the following features:
+The **Emilia** is a comprehensive, multilingual dataset with the following features:
 - containing over *101k* hours of speech data;
 - covering six different languages: *English (En), Chinese (Zh), German (De), French (Fr), Japanese (Ja), and Korean (Ko)*;
 - containing diverse speech data with *various speaking styles*;
@@ -20,15 +21,26 @@ Detailed description for the dataset could be found in our [paper](https://arxiv
 πŸ› οΈ **Emilia-Pipe** is the first open-source preprocessing pipeline designed to transform raw, in-the-wild speech data into high-quality training data with annotations for speech generation. This pipeline can process one hour of raw audio into model-ready data in just a few minutes, requiring only the raw speech data.
 
-*To use the Emilia dataset, you can download the raw audio files from our provided source URL list on [HuggingFace](https://huggingface.co/datasets/amphion/Emilia) and use our open-source [Emilia-Pipe](.) preprocessing pipeline to preprocess the raw data and rebuild the dataset.*
+## Dataset Usage 🎀
+The Emilia dataset is now publicly available at [OpenDataLab](https://opendatalab.com/Amphion/Emilia)!
 
-*Please note that Emilia doesn't own the copyright of the audios; the copyright remains with the original owners of the video or audio. Additionally, users can easily use Emilia-Pipe to preprocess their own raw speech data for custom needs.*
+To download the Emilia dataset, please follow these steps:
 
-By open-sourcing the Emilia-Pipe code, we aim to enable the speech community to collaborate on large-scale speech generation research.
+1. Fill out the [Application Form](https://speechteam.feishu.cn/share/base/form/shrcn7z8VODrVkOelbx0YUeJDOh) to receive the PASSWORD.
+2. Visit the [OpenXLab dataset](https://openxlab.org.cn/datasets/Amphion/Emilia/tree/main/raw) and click the "Apply Download" button.
+3. Enter the PASSWORD you received in step 1 into the "Detailed Purpose Description" input box and submit your download request. Applications will only be approved if the correct PASSWORD is provided. Once approved, you can enjoy using the dataset!
 
-This following README will introduce the installation and usage guide of the Emilia-Pipe.
 
-## Pipeline Overview πŸ‘€
+The Emilia dataset will be structured as follows:
+
+- **Speech Data**: High-quality audio recordings in .mp3 format.
+- **Transcriptions**: Corresponding text transcriptions for each audio file.
+
+*Please note that Emilia does not own the copyright to the audio files; the copyright remains with the original owners of the videos or audio. Users are permitted to use this dataset only for non-commercial purposes under the CC BY-NC-4.0 license.*
+
+
+## Emilia-Pipe Overview πŸ‘€
+If you wish to re-build Emilia, you may download the raw audio files from the [provided URL list](https://huggingface.co/datasets/amphion/Emilia) and use our open-source [Emilia-Pipe](https://github.com/open-mmlab/Amphion/tree/main/preprocessors/Emilia) preprocessing pipeline to preprocess the raw data. Additionally, users can easily use Emilia-Pipe to preprocess their own raw speech data for custom needs. By open-sourcing the Emilia-Pipe code, we aim to enable the speech community to collaborate on large-scale speech generation research.
 
 The Emilia-Pipe includes the following major steps:
@@ -152,7 +164,7 @@ If you use the Emilia dataset or the Emilia-Pipe pipeline, please cite the follo
   title={Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation},
   author={He, Haorui and Shang, Zengqiang and Wang, Chaoren and Li, Xuyuan and Gu, Yicheng and Hua, Hua and Liu, Liwei and Yang, Chen and Li, Jiaqi and Shi, Peiyang and Wang, Yuancheng and Chen, Kai and Zhang, Pengyuan and Wu, Zhizheng},
   journal={arXiv},
-  volume={abs/2407.05361}
+  volume={abs/2407.05361},
   year={2024}
 }
 ```
@@ -161,7 +173,7 @@ If you use the Emilia dataset or the Emilia-Pipe pipeline, please cite the follo
   title={Amphion: An Open-Source Audio, Music and Speech Generation Toolkit},
   author={Zhang, Xueyao and Xue, Liumeng and Gu, Yicheng and Wang, Yuancheng and He, Haorui and Wang, Chaoren and Chen, Xi and Fang, Zihao and Chen, Haopeng and Zhang, Junan and Tang, Tze Ying and Zou, Lexiao and Wang, Mingxuan and Han, Jun and Chen, Kai and Li, Haizhou and Wu, Zhizheng},
   journal={arXiv},
-  volume={abs/2312.09911}
+  volume={abs/2312.09911},
   year={2024},
 }
 ```

From e3426e5138483a3dae3fc906340395874dc7316a Mon Sep 17 00:00:00 2001
From: Harry He <68176557+HarryHe11@users.noreply.github.com>
Date: Thu, 22 Aug 2024 14:30:05 +0800
Subject: [PATCH 3/7] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 220ddc0b..0805e61b 100644
--- a/README.md
+++ b/README.md
@@ -28,7 +28,7 @@
 In addition to the specific generation tasks, Amphion includes several **vocoders** and **evaluation metrics**. A vocoder is an important module for producing high-quality audio signals, while evaluation metrics are critical for ensuring consistent metrics in generation tasks. Moreover, Amphion is dedicated to advancing audio generation in real-world applications, such as building **large-scale datasets** for speech synthesis.
 
 ## πŸš€Β News
-- **2024/08/22**: The **Emilia** dataset is now publicly avaiable! Try it now at [Opendatalab](https://opendatalab.com/Amphion/Emilia)!
+- **2024/08/22**: The **Emilia** dataset is now publicly avaiable! Explore the most extensive and diverse speech generation dataset now at [Opendatalab](https://opendatalab.com/Amphion/Emilia)! πŸ‘‘
 - **2024/07/01**: Amphion now releases **Emilia**, the first open-source multilingual in-the-wild dataset for speech generation with over 101k hours of speech data, and the **Emilia-Pipe**, the first open-source preprocessing pipeline designed to transform in-the-wild speech data into high-quality training data with annotations for speech generation! [![arXiv](https://img.shields.io/badge/arXiv-Paper-COLOR.svg)](https://arxiv.org/abs/2407.05361) [![hf](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Dataset-yellow)](https://huggingface.co/datasets/amphion/Emilia) [![demo](https://img.shields.io/badge/WebPage-Demo-red)](https://emilia-dataset.github.io/Emilia-Demo-Page/) [![readme](https://img.shields.io/badge/README-Key%20Features-blue)](preprocessors/Emilia/README.md)
 - **2024/06/17**: Amphion has a new release for its **VALL-E** model! It uses Llama as its underlying architecture and has better model performance, faster training speed, and more readable codes compared to our first version. [![readme](https://img.shields.io/badge/README-Key%20Features-blue)](egs/tts/VALLE_V2/README.md)
 - **2024/03/12**: Amphion now support **NaturalSpeech3 FACodec** and release pretrained checkpoints. [![arXiv](https://img.shields.io/badge/arXiv-Paper-COLOR.svg)](https://arxiv.org/abs/2403.03100) [![hf](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-model-yellow)](https://huggingface.co/amphion/naturalspeech3_facodec) [![hf](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-demo-pink)](https://huggingface.co/spaces/amphion/naturalspeech3_facodec) [![readme](https://img.shields.io/badge/README-Key%20Features-blue)](models/codec/ns3_codec/README.md)

From 3efd8c0a127a8d0c4a0b6bf9387d8af6fbf25f0b Mon Sep 17 00:00:00 2001
From: Harry He <68176557+HarryHe11@users.noreply.github.com>
Date: Thu, 22 Aug 2024 15:09:49 +0800
Subject: [PATCH 4/7] Update README.md

---
 preprocessors/Emilia/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/preprocessors/Emilia/README.md b/preprocessors/Emilia/README.md
index f99ddd15..bd84e5e7 100644
--- a/preprocessors/Emilia/README.md
+++ b/preprocessors/Emilia/README.md
@@ -6,7 +6,7 @@
 This is the official repository πŸ‘‘ for the **Emilia** dataset and the source code for **Emilia-Pipe** speech data preprocessing pipeline.
 
 ## News πŸ”₯
-- **2024/08/22**: The **Emilia** dataset is now publicly avaiable! Explore the most extensive and diverse speech generation dataset now at [Opendatalab](https://opendatalab.com/Amphion/Emilia)! πŸ‘‘
+- **2024/08/22**: The **Emilia** dataset is now publicly avaiable! Explore the most extensive and diverse speech generation dataset now at [OpenXLab](https://openxlab.org.cn/datasets/Amphion/Emilia)! πŸ‘‘
 - **2024/07/08**: Our preprint [paper](https://arxiv.org/abs/2407.05361) is now available! πŸ”₯πŸ”₯πŸ”₯
 - **2024/07/03**: We welcome everyone to check our [homepage](https://emilia-dataset.github.io/Emilia-Demo-Page/) for our brief introduction for Emilia dataset and our demos!
 - **2024/07/01**: We release of Emilia and Emilia-Pipe! We welcome everyone to explore it! πŸŽ‰πŸŽ‰πŸŽ‰

From d6ee7175ad08c10f7125105df47d849fbd90bd7c Mon Sep 17 00:00:00 2001
From: Harry He <68176557+HarryHe11@users.noreply.github.com>
Date: Thu, 22 Aug 2024 15:10:54 +0800
Subject: [PATCH 5/7] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 0805e61b..6782cec5 100644
--- a/README.md
+++ b/README.md
@@ -28,7 +28,7 @@
 In addition to the specific generation tasks, Amphion includes several **vocoders** and **evaluation metrics**. A vocoder is an important module for producing high-quality audio signals, while evaluation metrics are critical for ensuring consistent metrics in generation tasks. Moreover, Amphion is dedicated to advancing audio generation in real-world applications, such as building **large-scale datasets** for speech synthesis.
 
 ## πŸš€Β News
-- **2024/08/22**: The **Emilia** dataset is now publicly avaiable! Explore the most extensive and diverse speech generation dataset now at [Opendatalab](https://opendatalab.com/Amphion/Emilia)! πŸ‘‘
+- **2024/08/22**: The **Emilia** dataset is now publicly available! Explore the most extensive and diverse speech generation dataset now at [OpenXLab](https://openxlab.org.cn/datasets/Amphion/Emilia)! πŸ‘‘
 - **2024/07/01**: Amphion now releases **Emilia**, the first open-source multilingual in-the-wild dataset for speech generation with over 101k hours of speech data, and the **Emilia-Pipe**, the first open-source preprocessing pipeline designed to transform in-the-wild speech data into high-quality training data with annotations for speech generation! [![arXiv](https://img.shields.io/badge/arXiv-Paper-COLOR.svg)](https://arxiv.org/abs/2407.05361) [![hf](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Dataset-yellow)](https://huggingface.co/datasets/amphion/Emilia) [![demo](https://img.shields.io/badge/WebPage-Demo-red)](https://emilia-dataset.github.io/Emilia-Demo-Page/) [![readme](https://img.shields.io/badge/README-Key%20Features-blue)](preprocessors/Emilia/README.md)
 - **2024/06/17**: Amphion has a new release for its **VALL-E** model! It uses Llama as its underlying architecture and has better model performance, faster training speed, and more readable codes compared to our first version. [![readme](https://img.shields.io/badge/README-Key%20Features-blue)](egs/tts/VALLE_V2/README.md)
 - **2024/03/12**: Amphion now support **NaturalSpeech3 FACodec** and release pretrained checkpoints. [![arXiv](https://img.shields.io/badge/arXiv-Paper-COLOR.svg)](https://arxiv.org/abs/2403.03100) [![hf](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-model-yellow)](https://huggingface.co/amphion/naturalspeech3_facodec) [![hf](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-demo-pink)](https://huggingface.co/spaces/amphion/naturalspeech3_facodec) [![readme](https://img.shields.io/badge/README-Key%20Features-blue)](models/codec/ns3_codec/README.md)

From 7aa39a2447952990ffd09a421133d4df358be47e Mon Sep 17 00:00:00 2001
From: Harry He <68176557+HarryHe11@users.noreply.github.com>
Date: Thu, 22 Aug 2024 15:11:10 +0800
Subject: [PATCH 6/7] Update README.md

---
 preprocessors/Emilia/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/preprocessors/Emilia/README.md b/preprocessors/Emilia/README.md
index bd84e5e7..d13853ff 100644
--- a/preprocessors/Emilia/README.md
+++ b/preprocessors/Emilia/README.md
@@ -6,7 +6,7 @@
 This is the official repository πŸ‘‘ for the **Emilia** dataset and the source code for **Emilia-Pipe** speech data preprocessing pipeline.
 
 ## News πŸ”₯
-- **2024/08/22**: The **Emilia** dataset is now publicly avaiable! Explore the most extensive and diverse speech generation dataset now at [OpenXLab](https://openxlab.org.cn/datasets/Amphion/Emilia)! πŸ‘‘
+- **2024/08/22**: The **Emilia** dataset is now publicly available! Explore the most extensive and diverse speech generation dataset now at [OpenXLab](https://openxlab.org.cn/datasets/Amphion/Emilia)! πŸ‘‘
 - **2024/07/08**: Our preprint [paper](https://arxiv.org/abs/2407.05361) is now available! πŸ”₯πŸ”₯πŸ”₯
 - **2024/07/03**: We welcome everyone to check our [homepage](https://emilia-dataset.github.io/Emilia-Demo-Page/) for our brief introduction for Emilia dataset and our demos!
 - **2024/07/01**: We release of Emilia and Emilia-Pipe! We welcome everyone to explore it! πŸŽ‰πŸŽ‰πŸŽ‰

From e8b2bde877e54f2f53a8ca8fbb8008954094febc Mon Sep 17 00:00:00 2001
From: Yuan_Tuo
Date: Thu, 22 Aug 2024 15:20:50 +0800
Subject: [PATCH 7/7] Update README.md

---
 preprocessors/Emilia/README.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/preprocessors/Emilia/README.md b/preprocessors/Emilia/README.md
index d13853ff..6424cfc0 100644
--- a/preprocessors/Emilia/README.md
+++ b/preprocessors/Emilia/README.md
@@ -1,10 +1,12 @@
-## Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation
+# Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation
 
 [![arXiv](https://img.shields.io/badge/arXiv-Paper-COLOR.svg)](https://arxiv.org/abs/2407.05361) [![hf](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Dataset-yellow)](https://huggingface.co/datasets/amphion/Emilia) [![demo](https://img.shields.io/badge/WebPage-Demo-red)](https://emilia-dataset.github.io/Emilia-Demo-Page/)
 
 This is the official repository πŸ‘‘ for the **Emilia** dataset and the source code for **Emilia-Pipe** speech data preprocessing pipeline.
 
+
+
 ## News πŸ”₯
 - **2024/08/22**: The **Emilia** dataset is now publicly available! Explore the most extensive and diverse speech generation dataset now at [OpenXLab](https://openxlab.org.cn/datasets/Amphion/Emilia)! πŸ‘‘
 - **2024/07/08**: Our preprint [paper](https://arxiv.org/abs/2407.05361) is now available! πŸ”₯πŸ”₯πŸ”₯
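The seven commits above are in the mbox format produced by `git format-patch`, so a series like this is normally applied with `git am`, which recreates each commit with the author, date, and subject taken from the mail headers. A minimal self-contained sketch of that round trip, using a throwaway demo repository rather than an actual Amphion checkout (the repo name, file contents, and patch filename below are illustrative, not from this series):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Make a repo with one "Update README.md" commit on top of a base commit.
git init -q repo && cd repo
echo "base" > README.md
git add README.md && git commit -qm "base"
echo "news entry" >> README.md
git add README.md && git commit -qm "Update README.md"

# Export the top commit in the same mbox format as the series above...
git format-patch -1 --stdout HEAD > ../series.patch

# ...then rewind the branch and re-apply the patch as a commit.
git reset -q --hard HEAD~1
git am ../series.patch        # prints: Applying: Update README.md
git log -1 --format=%s        # the commit subject is restored
```

For a real series saved as individual files, `git am *.patch` applies them in order; if one fails to apply, `git am --show-current-patch=diff` shows the offending patch, and after resolving conflicts you continue with `git am --continue` (or back out with `git am --abort`).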