# Jets Recipe

In this recipe, we will show how to train [Jets](https://arxiv.org/abs/2203.16852) using Amphion's infrastructure. Jets is an end-to-end text-to-speech (E2E-TTS) model which jointly trains FastSpeech2 and HiFi-GAN.

There are four stages in total:

1. Data preparation
2. Feature extraction
3. Training
4. Inference

> **NOTE:** You need to run every command of this recipe in the `Amphion` root path:
>
> ```bash
> cd Amphion
> ```

## 1. Data Preparation

### Dataset Download

You can use LJSpeech to train the TTS model. How to download the dataset is detailed [here](../../datasets/README.md).
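
If you have not prepared LJSpeech yet, a minimal sketch of the download steps follows. The destination directory is an assumption for illustration; the official instructions linked above take precedence.

```bash
# Download and extract LJSpeech-1.1 (about 2.6 GB).
# The destination path is illustrative; use whatever path you will
# later put into exp_config.json as the LJSpeech dataset path.
mkdir -p /path/to/datasets
wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
tar -xjf LJSpeech-1.1.tar.bz2 -C /path/to/datasets
```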

### Configuration

After downloading the dataset, you can set the dataset paths in `exp_config.json`. Note that you can change the `dataset` list to use your preferred datasets.

```json
    "dataset": [
        "LJSpeech",
    ],
    "dataset_path": {
        // TODO: Fill in your dataset path
        "LJSpeech": "[LJSpeech dataset path]",
    },
```

## 2. Feature Extraction

### Configuration

Specify the `processed_dir` and the `log_dir` for saving the processed data and the checkpoints in `exp_config.json`:

```json
    // TODO: Fill in the output log path
    "log_dir": "ckpts/tts",
    "preprocess": {
        // TODO: Fill in the output data path
        "processed_dir": "data",
        ...
    },
```

### Run

Run `run.sh` as the preprocessing stage (set `--stage 1`):

```bash
sh egs/tts/Jets/run.sh --stage 1
```
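
After this stage finishes, the extracted features should appear under the `processed_dir` you configured. A quick sanity check, assuming the default `processed_dir` of `data` (the exact subdirectory layout may differ):

```bash
# List the processed artifacts for LJSpeech; the path is an assumption
# based on the default "processed_dir": "data" configured above.
ls data/LJSpeech
```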

## 3. Training

### Configuration

We provide the default hyperparameters in `exp_config.json`. They work on a single NVIDIA 24GB GPU. You can adjust them based on your GPU machine.

```json
"train": {
    "batch_size": 16,
}
```

### Run

Run `run.sh` as the training stage (set `--stage 2`). Specify an experiment name to run the following command. The TensorBoard logs and checkpoints will be saved in `ckpts/tts/[YourExptName]`.

```bash
sh egs/tts/Jets/run.sh --stage 2 --name [YourExptName]
```

> **NOTE:** `CUDA_VISIBLE_DEVICES` is set to `"0"` by default. We recommend using only one GPU for training.
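
To monitor training progress, you can point TensorBoard at the experiment directory (this assumes TensorBoard is installed in your environment):

```bash
# The log directory follows the ckpts/tts/[YourExptName] convention above.
tensorboard --logdir ckpts/tts/[YourExptName]
```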

## 4. Inference

### Configuration

For inference, you need to specify the following configurations when running `run.sh`:

| Parameters            | Description                                          | Example                                                                             |
| --------------------- | ---------------------------------------------------- | ----------------------------------------------------------------------------------- |
| `--infer_expt_dir`    | The experiment directory which contains `checkpoint` | `ckpts/tts/[YourExptName]`                                                           |
| `--infer_output_dir`  | The output directory to save inferred audios         | `ckpts/tts/[YourExptName]/result`                                                    |
| `--infer_mode`        | The inference mode, e.g., `batch`                    | Use `batch` to generate a batch of speech at a time                                  |
| `--infer_dataset`     | The dataset used for inference                       | For the LJSpeech dataset, the inference dataset would be `LJSpeech`                  |
| `--infer_testing_set` | The subset of the inference dataset, e.g., `test`    | For LJSpeech, use `test`, the split created from LJSpeech during feature extraction  |

### Run

For example, if you want to generate speech for the whole testing set split from LJSpeech, just run:

```bash
sh egs/tts/Jets/run.sh --stage 3 \
    --infer_expt_dir ckpts/tts/[YourExptName] \
    --infer_output_dir ckpts/tts/[YourExptName]/result \
    --infer_mode "batch" \
    --infer_dataset "LJSpeech" \
    --infer_testing_set "test"
```
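
The synthesized audios are written to the directory passed as `--infer_output_dir`, so you can inspect them afterwards:

```bash
# List the generated audio files; the path matches --infer_output_dir above.
ls ckpts/tts/[YourExptName]/result
```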

### Issues and Solutions

```
NotImplementedError: Using RTX 3090 or 4000 series doesn't support faster communication broadband via P2P or IB. Please set `NCCL_P2P_DISABLE="1"` and `NCCL_IB_DISABLE="1" or use `accelerate launch` which will do this automatically.
2024-02-24 10:57:49 | INFO | torch.distributed.distributed_c10d | Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
```

This error is raised because NVIDIA RTX 3090 and 4000 series GPUs do not support peer-to-peer (P2P) or InfiniBand (IB) communication for faster multi-GPU data transfer. The check comes from the Hugging Face `accelerate` library, which Amphion uses for distributed training and inference.

To fix this issue, set the following environment variables in your terminal before running your script:

```bash
export NCCL_P2P_DISABLE=1
export NCCL_IB_DISABLE=1
```
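
Alternatively, you can set them inline for a single run, e.g., for the training stage:

```bash
NCCL_P2P_DISABLE=1 NCCL_IB_DISABLE=1 sh egs/tts/Jets/run.sh --stage 2 --name [YourExptName]
```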

### Note

Extensive logging messages related to `torch._subclasses.fake_tensor` and `torch._dynamo.output_graph` may appear during inference. We have not found an effective way to suppress these logs, but they do not affect the inference results.

## Citation

```bibtex
@article{lim2022jets,
  title={JETS: Jointly training FastSpeech2 and HiFi-GAN for end to end text to speech},
  author={Lim, Dan and Jung, Sunghee and Kim, Eesung},
  journal={arXiv preprint arXiv:2203.16852},
  year={2022}
}
```