[Feature]: can MaskGCT process Chinese zero-shot TTS? #302

hildazzz · 2024-10-29T11:48:34Z

When I run inference for Chinese TTS, I get the following error:
RuntimeError: The size of tensor a (1649) must match the size of tensor b (1758) at non-singleton dimension 3

I set the language to "zh". Could you let me know:

Does the current MaskGCT support Chinese?
If so, what did I do wrong, and how can I fix it?

Thank you very much!
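For readers unfamiliar with this message: it is PyTorch's broadcasting error, raised when two tensors are added and one dimension has different sizes that are both greater than 1. The sketch below re-implements that rule in plain Python to show how "dimension 3" is reached. The helper name `first_broadcast_mismatch` and the concrete shapes are illustrative assumptions, not taken from the MaskGCT code; only the sizes 1649 and 1758 come from the error above.

```python
from typing import Optional, Sequence, Tuple


def first_broadcast_mismatch(
    a: Sequence[int], b: Sequence[int]
) -> Optional[Tuple[int, int, int]]:
    """Return (dim, size_a, size_b) for the first incompatible dimension,
    scanning from the trailing dimension, or None if the shapes broadcast."""
    ndim = max(len(a), len(b))
    # Left-pad the shorter shape with 1s so dimensions align from the right.
    pa = (1,) * (ndim - len(a)) + tuple(a)
    pb = (1,) * (ndim - len(b)) + tuple(b)
    for dim in range(ndim - 1, -1, -1):
        # Two sizes are compatible if they are equal or one of them is 1.
        if pa[dim] != pb[dim] and pa[dim] != 1 and pb[dim] != 1:
            return dim, pa[dim], pb[dim]
    return None


# Assumed layout: (batch, heads, query_len, key_len) for the attention
# weights and (batch, 1, query_len, key_len) for the mask. The batch size,
# head count, and query length here are made up for illustration.
attn_weights_shape = (1, 16, 1649, 1649)
mask_shape = (1, 1, 1649, 1758)

print(first_broadcast_mismatch(attn_weights_shape, mask_shape))
# → (3, 1649, 1758)
```

Under these assumptions, the mismatch means the mask was built for a different sequence length than the one the attention layer actually saw.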


HeCheng0625 · 2024-10-29T15:24:11Z

Hi, the current MaskGCT supports Chinese (in fact, we support six languages: en, zh, fr, de, kr, ja). Can you give me more details about the error, for example a screenshot?

hildazzz · 2024-10-30T09:01:01Z

Here is the full traceback:

Traceback (most recent call last):
  File "/try/Amphion/test.py", line 120, in <module>
    recovered_audio = maskgct_inference_pipeline.maskgct_inference(
  File "/try/Amphion/models/tts/maskgct/maskgct_utils.py", line 261, in maskgct_inference
    combine_semantic_code, _ = self.text2semantic(
  File "/root/miniforge3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/try/Amphion/models/tts/maskgct/maskgct_utils.py", line 175, in text2semantic
    predict_semantic = self.t2s_model.reverse_diffusion(
  File "/root/miniforge3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/try/Amphion/models/tts/maskgct/maskgct_t2s.py", line 292, in reverse_diffusion
    mask_embeds = self.diff_estimator(
  File "/root/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/try/Amphion/models/tts/maskgct/llama_nar.py", line 621, in forward
    layer_outputs = decoder_layer(
  File "/root/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/try/Amphion/models/tts/maskgct/llama_nar.py", line 173, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/root/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniforge3/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 378, in forward
    attn_weights = attn_weights + causal_mask
RuntimeError: The size of tensor a (1008) must match the size of tensor b (1019) at non-singleton dimension 3

This happens mostly when "zh" is used as the language or target_language. It sometimes disappears when the target_text is shorter.
Does the target text length have a limit or preferred range in this work?
Thanks for your time!
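Since the error sometimes goes away with shorter target text, one possible workaround (not confirmed by the maintainers in this thread) is to split a long target text at sentence boundaries and synthesize each piece separately. A minimal sketch follows; the function name `split_target_text` and the `max_chars` default are hypothetical, and whether text length is the actual root cause is unverified.

```python
import re
from typing import List


def split_target_text(text: str, max_chars: int = 50) -> List[str]:
    """Group sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole, not cut.
    """
    # Split right after Chinese or Latin sentence-ending punctuation,
    # keeping the punctuation attached to its sentence (Python 3.7+).
    pieces = re.split(r"(?<=[。！？；.!?;])", text)
    chunks: List[str] = []
    current = ""
    for piece in pieces:
        if not piece.strip():
            continue  # skip empty or whitespace-only fragments
        if current.strip() and len(current) + len(piece) > max_chars:
            chunks.append(current.strip())
            current = piece
        else:
            current += piece
    if current.strip():
        chunks.append(current.strip())
    return chunks


text = "今天天气很好。我们去公园散步吧！你觉得怎么样？"
for chunk in split_target_text(text, max_chars=20):
    print(chunk)
```

Each chunk would then be passed to the inference call in place of the full target text, and the resulting audio segments concatenated. Prosody may differ slightly across chunk boundaries.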

hildazzz added the enhancement (New feature or request) label Oct 29, 2024

yuantuo666 mentioned this issue Oct 31, 2024: Update MaskGCT env setup and notebook #316 (Merged)
