- **2024/10/19**: We release **MaskGCT**, a fully non-autoregressive TTS model that eliminates the need for explicit alignment information between text and speech supervision. MaskGCT is trained on the [Emilia](https://huggingface.co/datasets/amphion/Emilia-Dataset) dataset and achieves SOTA zero-shot TTS performance.
## Issues

If you encounter any issue when using MaskGCT, feel free to open an issue in this repository. Please describe it in **English**: this makes keyword searching easier and lets more people participate in the discussion.

## Quickstart
### Clone and Environment

In this part, follow the steps below to clone the repository and install the environment.

1. Clone the repository. You can choose (a) a partial clone or (b) a full clone.
2. Install the environment by following the guide below.

#### 1. (a) Partial clone

Since the whole Amphion repository is large, you can use sparse-checkout to download only the needed code.

Before you start installing, make sure you are in the `Amphion` directory. If not, use `cd` to enter it.

Since we use `phonemizer` to convert text to phonemes, you need to install `espeak-ng` first. More details can be found [here](https://bootphon.github.io/phonemizer/install.html). Choose the correct installation command according to your operating system:
```bash
# For Debian-like distribution (e.g. Ubuntu, Mint, etc.)
sudo apt-get install espeak-ng
# For RedHat-like distribution (e.g. CentOS, Fedora, etc.)
sudo yum install espeak-ng

# For Windows
# Please visit https://github.com/espeak-ng/espeak-ng/releases to download .msi installer
```
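After installing, it can help to confirm that an `espeak-ng` executable is visible on `PATH` before running anything that depends on `phonemizer`. The following is a minimal sketch using only the Python standard library. The helper name `find_espeak` is hypothetical and is not part of Amphion or `phonemizer`. Note that `phonemizer` may also locate espeak-ng by other means, so a missing `PATH` entry is a hint rather than a definitive failure.

```python
import shutil
from typing import Optional, Sequence


def find_espeak(candidates: Sequence[str] = ("espeak-ng", "espeak")) -> Optional[str]:
    """Return the path of the first candidate executable found on PATH, or None.

    `find_espeak` is a hypothetical helper for illustration only.
    """
    for name in candidates:
        path = shutil.which(name)  # None when the executable is not on PATH
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    location = find_espeak()
    if location is None:
        print("espeak-ng not found on PATH; install it before using phonemizer.")
    else:
        print(f"Found espeak executable at: {location}")
```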
It is recommended to use conda to configure the environment. You can use the following command to create and activate a new conda environment.

You can use the following code to generate speech from text and a speech prompt (the code is also provided in [inference.py](../../../models/tts/maskgct/maskgct_inference.py)).

Run it with `python -m models.tts.maskgct.maskgct_inference`.
prompt_text=" We do not break. We never give in. We never back down."
target_text="In this paper, we introduce MaskGCT, a fully non-autoregressive TTS model that eliminates the need for explicit alignment information between text and speech supervision."
# Specify the target duration (in seconds). If target_len = None, we use a simple rule to predict the target duration.
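
The comment above mentions a simple rule for predicting the target duration when `target_len` is `None`, but the rule itself is not shown here. One plausible sketch scales the prompt duration by the ratio of target text length to prompt text length. The function name `estimate_target_duration`, its parameters, and the character-count heuristic are assumptions for illustration, not MaskGCT's actual implementation.

```python
from typing import Optional


def estimate_target_duration(
    prompt_duration_s: float,
    prompt_text: str,
    target_text: str,
    target_len: Optional[float] = None,
) -> float:
    """Return a target duration in seconds.

    Hypothetical rule: if target_len is given, use it as-is; otherwise assume
    the target is spoken at the same characters-per-second rate as the prompt.
    """
    if target_len is not None:
        return float(target_len)
    prompt_chars = len(prompt_text.strip())
    if prompt_chars == 0:
        raise ValueError("prompt_text must contain at least one character")
    target_chars = len(target_text.strip())
    return prompt_duration_s * target_chars / prompt_chars


if __name__ == "__main__":
    # Illustrative values only; the 4.0 s prompt duration is made up.
    seconds = estimate_target_duration(
        prompt_duration_s=4.0,
        prompt_text=" We do not break. We never give in. We never back down.",
        target_text="In this paper, we introduce MaskGCT.",
    )
    print(f"Estimated target duration: {seconds:.2f} s")
```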