
Commit 908c85d

fix #252
1 parent 72112a6 commit 908c85d

2 files changed (+13, -35 lines)

egs/tts/VALLE_V2/README.md

12 additions & 34 deletions

````diff
@@ -17,26 +17,29 @@ To ensure your transformers library can run the code, we recommend additionally
 pip install -U transformers==4.41.2
 ```
 
-<!-- espeak-ng is required to run G2p. To install it, you could refer to:
-https://github.com/espeak-ng/espeak-ng/blob/master/docs/guide.md
-
-For Linux, it should be `sudo apt-get install espeak-ng`.
-For Windows, refer to the above link.
-If you do not have sudo privilege, you could build the library by following the last section of this readme. -->
-
 ## Inferencing pretrained VALL-E models
 ### Download pretrained weights
-You need to download our pretrained weights from huggingface.
+You need to download our pretrained weights from huggingface. Our models are trained on the MLS dataset (45k hours of English, contains 10-20s speech).
 
 Script to download AR and NAR model checkpoint:
 ```bash
 huggingface-cli download amphion/valle valle_ar_mls_196000.bin valle_nar_mls_164000.bin --local-dir ckpts
 ```
 Script to download codec model (SpeechTokenizer) checkpoint:
 ```bash
-huggingface-cli download amphion/valle speechtokenizer_hubert_avg/SpeechTokenizer.pt speechtokenizer_hubert_avg/config.json --local-dir ckpts
+mkdir -p ckpts/speechtokenizer_hubert_avg && huggingface-cli download amphion/valle SpeechTokenizer.pt config.json --local-dir ckpts/speechtokenizer_hubert_avg
+```
+
+If you cannot access huggingface, consider using the huggingface mirror to download:
+```bash
+HF_ENDPOINT=https://hf-mirror.com huggingface-cli download amphion/valle valle_ar_mls_196000.bin valle_nar_mls_164000.bin --local-dir ckpts
+```
+Script to download codec model (SpeechTokenizer) checkpoint:
+```bash
+mkdir -p ckpts/speechtokenizer_hubert_avg && HF_ENDPOINT=https://hf-mirror.com huggingface-cli download amphion/valle SpeechTokenizer.pt config.json --local-dir ckpts/speechtokenizer_hubert_avg
 ```
 
+
 ### Inference in IPython notebook
 
 We provide our pretrained VALL-E model that is trained on 45k hours MLS dataset.
````
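An aside on this hunk's download commands: the same checkpoints can be fetched from Python with `huggingface_hub`, the library behind `huggingface-cli`. The sketch below is not part of the commit; it assumes `huggingface_hub` is installed, and it sets `HF_ENDPOINT` before the import because the library reads that variable at import time.

```python
# Sketch: Python equivalent of the huggingface-cli commands in the hunk above.
import os

# Optional mirror override, as in the README; huggingface_hub reads
# HF_ENDPOINT when it is imported, so set it before the import below.
# os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

from huggingface_hub import hf_hub_download

# AR and NAR model checkpoints -> ckpts/
for name in ["valle_ar_mls_196000.bin", "valle_nar_mls_164000.bin"]:
    hf_hub_download(repo_id="amphion/valle", filename=name, local_dir="ckpts")

# Codec (SpeechTokenizer) checkpoint and config -> ckpts/speechtokenizer_hubert_avg/
for name in ["SpeechTokenizer.pt", "config.json"]:
    hf_hub_download(
        repo_id="amphion/valle",
        filename=name,
        local_dir="ckpts/speechtokenizer_hubert_avg",
    )
```

`local_dir` is created if it does not exist, so the explicit `mkdir -p` from the shell version is not needed here. The second hunk of the same README follows.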
````diff
@@ -111,31 +114,6 @@ You should also select a reasonable batch size at the "batch_size" entry (curren
 
 You can change other experiment settings in the `/egs/tts/VALLE_V2/exp_ar_libritts.json` such as the learning rate, optimizer and the dataset.
 
-Here we choose `libritts` dataset we added and set `use_dynamic_dataset` false.
-
-Config `use_dynamic_dataset` is used to solve the problem of inconsistent sequence length and improve gpu utilization, here we set it to false for simplicity.
-
-```json
-"dataset": {
-    "use_dynamic_batchsize": false,
-    "name": "libritts"
-},
-```
-
-We also recommend changing "num_hidden_layers" if your GPU memory is limited.
-
-**Set smaller batch_size if you are out of memory😢😢**
-
-I used batch_size=3 to successfully run on a single card, if you'r out of memory, try smaller.
-
-```json
-"batch_size": 3,
-"max_tokens": 11000,
-"max_sentences": 64,
-"random_seed": 0
-```
-
-
 ### Run the command to Train AR model
 (Make sure your current directory is at the Amphion root directory).
 Run:
````
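The hunk above deletes the README's hand-edited config walkthrough. For anyone who still wants the old behavior (fixed-size batches, a small `batch_size` to avoid OOM), here is a hedged sketch of making the same edit programmatically. It assumes `exp_ar_libritts.json` is plain JSON at the path the README names; the key names come from the deleted snippet, and the exact nesting of `batch_size` in your copy may differ.

```python
# Sketch: apply the settings from the deleted README snippet to the
# experiment config. Key names come from that snippet; the nesting of
# "batch_size" is an assumption -- check your copy of the file.
import json
from pathlib import Path

cfg_path = Path("egs/tts/VALLE_V2/exp_ar_libritts.json")
cfg = json.loads(cfg_path.read_text())

# Fixed-size batches, as the deleted text recommended "for simplicity".
cfg["dataset"]["use_dynamic_batchsize"] = False
cfg["dataset"]["name"] = "libritts"

# The deleted text reports batch_size=3 fits on a single card.
cfg["batch_size"] = 3

cfg_path.write_text(json.dumps(cfg, indent=4))
```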

egs/tts/VALLE_V2/demo.ipynb

1 addition & 1 deletion

````diff
@@ -78,7 +78,7 @@
 "# prepare inference data\n",
 "import librosa\n",
 "import torch\n",
-"wav, _ = librosa.load('./egs/tts/valle_v2/example.wav', sr=16000)\n",
+"wav, _ = librosa.load('./egs/tts/VALLE_V2/example.wav', sr=16000)\n",
 "wav = torch.tensor(wav, dtype=torch.float32)\n",
 "from IPython.display import Audio\n",
 "Audio(wav, rate = 16000)"
````
