[Feature]: can MaskGCT process Chinese zero-shot TTS? #302

Open
hildazzz opened this issue Oct 29, 2024 · 2 comments
Labels: enhancement (New feature or request)

@hildazzz

When trying inference for Chinese TTS, it fails with the following error:
RuntimeError: The size of tensor a (1649) must match the size of tensor b (1758) at non-singleton dimension 3
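
Roughly how the call is set up (a sketch following the repo's demo usage; the prompt path and texts below are placeholders, and the pipeline object is assumed to be built as in models/tts/maskgct/maskgct_utils.py):

import soundfile as sf

# Placeholder prompt audio and texts; maskgct_inference_pipeline is assumed to be
# constructed as in the repo's demo script.
prompt_wav_path = "./prompt_zh.wav"
prompt_text = "提示音频对应的文本。"  # transcript of the prompt audio
target_text = "这是一段较长的中文目标文本，用于零样本语音合成。"  # longer Chinese target text

# Prompt language and target language are both set to "zh".
recovered_audio = maskgct_inference_pipeline.maskgct_inference(
    prompt_wav_path, prompt_text, target_text, "zh", "zh"
)
sf.write("output_zh.wav", recovered_audio, 24000)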

As shown above, I set the language to “zh”, so could you let me know:

  1. Does the current MaskGCT support Chinese?
  2. If so, what did I do wrong, and how can I fix it?

Thank you very much!

hildazzz added the enhancement label on Oct 29, 2024
@HeCheng0625 (Collaborator)

Hi, the current MaskGCT supports Chinese (in fact, we support six languages: en, zh, fr, de, kr, ja). Can you give me more details about the error, for example a screenshot?

@hildazzz (Author) commented Oct 30, 2024

> Hi, the current MaskGCT supports Chinese (in fact, we support six languages: en, zh, fr, de, kr, ja). Can you give me more details about the error, for example a screenshot?

like this:

Traceback (most recent call last):
  File "/try/Amphion/test.py", line 120, in <module>
    recovered_audio = maskgct_inference_pipeline.maskgct_inference(
  File "/try/Amphion/models/tts/maskgct/maskgct_utils.py", line 261, in maskgct_inference
    combine_semantic_code, _ = self.text2semantic(
  File "/root/miniforge3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/try/Amphion/models/tts/maskgct/maskgct_utils.py", line 175, in text2semantic
    predict_semantic = self.t2s_model.reverse_diffusion(
  File "/root/miniforge3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/try/Amphion/models/tts/maskgct/maskgct_t2s.py", line 292, in reverse_diffusion
    mask_embeds = self.diff_estimator(
  File "/root/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/try/Amphion/models/tts/maskgct/llama_nar.py", line 621, in forward
    layer_outputs = decoder_layer(
  File "/root/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/try/Amphion/models/tts/maskgct/llama_nar.py", line 173, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/root/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniforge3/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 378, in forward
    attn_weights = attn_weights + causal_mask
RuntimeError: The size of tensor a (1008) must match the size of tensor b (1019) at non-singleton dimension 3

This mostly happens when "zh" is used for language or target_language; sometimes the error disappears when target_text is made shorter.
Is there a setting or recommended limit for the target text length in this work?
Thanks for your time!
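
For reference, the repo's demo script passes a target_len argument (the target duration in seconds) to the same call; the demo's comments suggest that when it is left as None, the pipeline predicts the duration from the text. A minimal sketch of pinning it explicitly, reusing the placeholder variables from the earlier snippet (whether this avoids the mismatch is an open question here):

# target_len is the target duration in seconds; None lets the pipeline estimate it.
recovered_audio = maskgct_inference_pipeline.maskgct_inference(
    prompt_wav_path, prompt_text, target_text, "zh", "zh", target_len=12
)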
