[Help]: MaskGCT's results were very strange #359

WhiteNightMo · 2024-11-21T03:17:56Z

Problem Overview

I modified models/tts/maskgct/maskgct_inference.py as follows:

    # inference
    prompt_wav_path = "./models/tts/maskgct/wav/5s.wav"
    save_path = "generated_audio7.wav"
    prompt_text = "想要交友吗？快来SOUL啊"
    target_text = "新用户真的可以享年化利率最低3.6%的优惠"
    # Specify the target duration (in seconds). If target_len = None, we use a simple rule to predict the target duration.
    target_len = None
    maskgct_inference_pipeline = MaskGCT_Inference_Pipeline(
        semantic_model,
        semantic_codec,
        codec_encoder,
        codec_decoder,
        t2s_model,
        s2a_model_1layer,
        s2a_model_full,
        semantic_mean,
        semantic_std,
        device,
    )

    recovered_audio = maskgct_inference_pipeline.maskgct_inference(
        prompt_wav_path, prompt_text, target_text, "zh", "zh", target_len=target_len
    )

    sf.write(save_path, recovered_audio, 24000)

Run command:

python -m models.tts.maskgct.maskgct_inference

The output did not meet my expectations.

My original file:
5s.zip

Output file:
generated_audio7.zip


HeCheng0625 · 2024-11-21T05:15:13Z

It seems the predicted target length is too long. You can specify an appropriate target length yourself via target_len.
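One rough way to pick a value for target_len is to scale the prompt's speaking rate to the target text. The sketch below is only an illustration under stated assumptions: `count_spoken_units` and `estimate_target_len` are hypothetical helpers (not part of MaskGCT), the 5.0-second prompt duration is assumed rather than measured, and this is not necessarily the rule the pipeline applies when target_len = None.

```python
import unicodedata


def count_spoken_units(text: str) -> int:
    """Count characters that are not whitespace, punctuation, or symbols."""
    return sum(
        1
        for ch in text
        if not ch.isspace() and unicodedata.category(ch)[0] not in ("P", "S")
    )


def estimate_target_len(prompt_seconds: float, prompt_text: str, target_text: str) -> float:
    """Scale the prompt's seconds-per-character rate to the target text."""
    prompt_units = count_spoken_units(prompt_text)
    if prompt_units == 0:
        raise ValueError("prompt_text has no countable characters")
    return prompt_seconds / prompt_units * count_spoken_units(target_text)


# Hypothetical usage: 5.0 is an assumed duration for the prompt clip.
target_len = estimate_target_len(
    5.0, "想要交友吗？快来SOUL啊", "新用户真的可以享年化利率最低3.6%的优惠"
)
```

Character counts are a crude proxy for spoken length: digits and Latin letters (such as "3.6" or "SOUL") are usually read as more than one syllable each, so the estimate may run short and is best treated as a starting point to adjust by ear.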

WhiteNightMo · 2024-11-21T06:44:52Z

I tried setting target_len to 8, but the output audio skipped the first three words and was slow overall. With target_len set to 10, the audio read the full text, but the speech was very slow.
10s.zip

decajcd · 2024-11-22T01:49:20Z

Has this been resolved?

WhiteNightMo · 2024-11-22T01:54:21Z

Not yet. I haven't been able to get it working.

decajcd · 2024-11-22T01:58:14Z

I can't tune it either. The output is either too fast or gibberish.

WhiteNightMo · 2024-11-22T01:59:27Z

That's rough. In my case it is either too slow or gibberish.

decajcd · 2024-11-22T02:03:59Z

I also get background noise.

digitalboy · 2024-11-24T05:00:44Z

Has anyone fixed this?

ruby11dog · 2024-11-25T04:03:19Z

Your prompt audio and prompt text do not match exactly.
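A quick way to check for this kind of mismatch is to compare prompt_text against a transcript of the prompt audio. The sketch below assumes you already have such a transcript (written down by ear or produced by any speech recognizer); `normalize` and `prompt_text_similarity` are hypothetical helper names, not part of MaskGCT.

```python
import difflib
import unicodedata


def normalize(text: str) -> str:
    """Lowercase and drop whitespace, punctuation, and symbols."""
    return "".join(
        ch.lower()
        for ch in text
        if not ch.isspace() and unicodedata.category(ch)[0] not in ("P", "S")
    )


def prompt_text_similarity(transcript: str, prompt_text: str) -> float:
    """Return a similarity ratio in [0, 1]; 1.0 means the normalized texts are identical."""
    matcher = difflib.SequenceMatcher(None, normalize(transcript), normalize(prompt_text))
    return matcher.ratio()
```

A ratio below 1.0 suggests that words are missing from or added to prompt_text relative to what is actually spoken in the prompt clip, which is worth fixing before tuning target_len.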
