[Feature]: can MaskGCT process Chinese zero-shot TTS? #302
Hi, the current MaskGCT supports Chinese (in fact, we support six languages: en, zh, fr, de, kr, ja). Can you give me more details about the error, for example a screenshot?
Like this:

Traceback (most recent call last):
  File "/try/Amphion/test.py", line 120, in <module>
    recovered_audio = maskgct_inference_pipeline.maskgct_inference(
  File "/try/Amphion/models/tts/maskgct/maskgct_utils.py", line 261, in maskgct_inference
    combine_semantic_code, _ = self.text2semantic(
  File "/root/miniforge3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/try/Amphion/models/tts/maskgct/maskgct_utils.py", line 175, in text2semantic
    predict_semantic = self.t2s_model.reverse_diffusion(
  File "/root/miniforge3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/try/Amphion/models/tts/maskgct/maskgct_t2s.py", line 292, in reverse_diffusion
    mask_embeds = self.diff_estimator(
  File "/root/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/try/Amphion/models/tts/maskgct/llama_nar.py", line 621, in forward
    layer_outputs = decoder_layer(
  File "/root/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/try/Amphion/models/tts/maskgct/llama_nar.py", line 173, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/root/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniforge3/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 378, in forward
    attn_weights = attn_weights + causal_mask
RuntimeError: The size of tensor a (1008) must match the size of tensor b (1019) at non-singleton dimension 3

This mostly happens when "zh" is used as the language or target_language. Sometimes it disappears when target_text is made shorter.
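
For readers unfamiliar with this kind of error: an elementwise add such as `attn_weights + causal_mask` requires the two sizes at every dimension to be equal, or one of them to be 1. The sketch below re-implements that broadcasting rule in plain Python (it is not PyTorch code, and `broadcast_shape` is a hypothetical helper). Only the sizes 1008 and 1019 at dimension 3 come from the traceback; the other dimensions are assumed for illustration. It shows where the rule fails, not why the mask ends up longer than the attention weights.

```python
def broadcast_shape(a, b):
    """Return the broadcast shape of a and b, or raise if they are incompatible."""
    ndim = max(len(a), len(b))
    # Left-pad the shorter shape with 1s so both shapes have the same rank.
    a = (1,) * (ndim - len(a)) + tuple(a)
    b = (1,) * (ndim - len(b)) + tuple(b)
    out = []
    for dim, (x, y) in enumerate(zip(a, b)):
        if x == y or y == 1:
            out.append(x)
        elif x == 1:
            out.append(y)
        else:
            raise ValueError(
                f"size {x} must match size {y} at non-singleton dimension {dim}"
            )
    return tuple(out)


# Assumed shapes: only 1008 and 1019 at dimension 3 are taken from the traceback.
attn_weights_shape = (1, 16, 1008, 1008)
causal_mask_shape = (1, 1, 1008, 1019)

try:
    broadcast_shape(attn_weights_shape, causal_mask_shape)
except ValueError as err:
    print(err)
```

Dimension 1 broadcasts without trouble (16 against 1), while dimension 3 cannot, because neither 1008 nor 1019 is 1. Printing the real shapes of `attn_weights` and `causal_mask` just before the failing line would confirm which of the two has the unexpected length.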
When I try inference for Chinese TTS, I get the following error:
RuntimeError: The size of tensor a (1649) must match the size of tensor b (1758) at non-singleton dimension 3
I have chosen the language “zh”. Could you let me know:
Thank you very much!
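
Since the thread reports that the error sometimes disappears with a shorter target text, one way to test that observation is to split the Chinese text at sentence-ending punctuation and synthesize each piece separately. This is a minimal sketch, not part of Amphion: `split_text` and `max_chars` are hypothetical names, and whether shorter chunks actually avoid the error is untested here.

```python
import re


def split_text(text, max_chars=50):
    """Group sentences into chunks of at most max_chars characters where possible."""
    # Split after Chinese or ASCII sentence-ending punctuation, keeping the punctuation.
    sentences = [s for s in re.split(r"(?<=[。！？!?；;])", text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current += sentence
    if current:
        chunks.append(current)
    # A single sentence longer than max_chars is kept whole as its own chunk.
    return chunks


for chunk in split_text("你好。今天天气很好！我们去公园吧？", max_chars=8):
    print(chunk)
```

Each chunk could then be passed as the target text in a separate inference call and the resulting audio concatenated, at the cost of possible prosody breaks between chunks.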