[Help]: It takes too long to load the model during inference #322

Open
zziC7 opened this issue Nov 1, 2024 · 2 comments

zziC7 commented Nov 1, 2024

Problem Overview

When I run maskgct_inference.py, I noticed that loading the models takes far longer than the inference itself.

Steps Taken

  1. I recorded how long each step takes.

a. build stage

    start_time = time.time()
    
    # 1. build semantic model (w2v-bert-2.0)
    semantic_model, semantic_mean, semantic_std = build_semantic_model(device)
    # 2. build semantic codec
    semantic_codec = build_semantic_codec(cfg.model.semantic_codec, device)
    # 3. build acoustic codec
    codec_encoder, codec_decoder = build_acoustic_codec(
        cfg.model.acoustic_codec, device
    )
    # 4. build t2s model
    t2s_model = build_t2s_model(cfg.model.t2s_model, device)
    # 5. build s2a model
    s2a_model_1layer = build_s2a_model(cfg.model.s2a_model.s2a_1layer, device)
    s2a_model_full = build_s2a_model(cfg.model.s2a_model.s2a_full, device)
    
    end_time = time.time()
    build_time = end_time - start_time
    print(f"build_time: {build_time} seconds")

b. download stage

    start_time = time.time()
    # download checkpoint
    # download semantic codec ckpt
    semantic_code_ckpt = hf_hub_download(
        "amphion/MaskGCT", filename="semantic_codec/model.safetensors"
    )
    # download acoustic codec ckpt
    codec_encoder_ckpt = hf_hub_download(
        "amphion/MaskGCT", filename="acoustic_codec/model.safetensors"
    )
    codec_decoder_ckpt = hf_hub_download(
        "amphion/MaskGCT", filename="acoustic_codec/model_1.safetensors"
    )
    # download t2s model ckpt
    t2s_model_ckpt = hf_hub_download(
        "amphion/MaskGCT", filename="t2s_model/model.safetensors"
    )
    # download s2a model ckpt
    s2a_1layer_ckpt = hf_hub_download(
        "amphion/MaskGCT", filename="s2a_model/s2a_model_1layer/model.safetensors"
    )
    s2a_full_ckpt = hf_hub_download(
        "amphion/MaskGCT", filename="s2a_model/s2a_model_full/model.safetensors"
    )
    end_time = time.time()
    download_time = end_time - start_time
    print(f"download_time: {download_time} seconds")

c. load stage

    start_time = time.time()
    # load semantic codec
    safetensors.torch.load_model(semantic_codec, semantic_code_ckpt)
    # load acoustic codec
    safetensors.torch.load_model(codec_encoder, codec_encoder_ckpt)
    safetensors.torch.load_model(codec_decoder, codec_decoder_ckpt)
    # load t2s model
    safetensors.torch.load_model(t2s_model, t2s_model_ckpt)
    # load s2a model
    safetensors.torch.load_model(s2a_model_1layer, s2a_1layer_ckpt)
    safetensors.torch.load_model(s2a_model_full, s2a_full_ckpt)
    end_time = time.time()
    load_time = end_time - start_time
    print(f"load_time: {load_time} seconds")

d. inference stage

    # (this runs inside a loop over the lines of the input text file,
    #  hence the line_num variable)
    start_time = time.time()
    recovered_audio = maskgct_inference_pipeline.maskgct_inference(
        prompt_wav_path, prompt_text, target_text, "zh", "zh", target_len=10
    )
    end_time = time.time()
    infer_time = end_time - start_time
    print(f"Inference time for line {line_num}: {infer_time} seconds")

Then I found:

build_time: 202.2886848449707 seconds
download_time: 60.34766387939453 seconds
load_time: 32.2975959777832 seconds
Inference time for line 2: 14.074496269226074 seconds

Expected Outcome

Does this mean that every time I want to run inference on a piece of audio, I have to wait this long for the models to load?
Or is there something wrong with my setup?

@JohnHerry

Hi, what is the inference speed of this model? It is described as a NAR (non-autoregressive) architecture, but there are two large models in the pipeline, so I suspect it will be no faster than earlier AR-based pipelines.

@yuantuo666
Collaborator

Hi, the models only need to be loaded once. You can use the Gradio demo or a Jupyter notebook to keep them resident in memory between requests.
Besides, since not all required dependencies are pre-downloaded, it still takes time to fetch them from the web; this cost is only incurred the first time you generate a sentence in a given language.
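
In other words, pay the build/download/load cost once and reuse the pipeline object for every sentence. A rough sketch of that pattern, using the names from the snippets above (the input texts are illustrative):

    # Load-once, infer-many pattern: only the per-sentence inference cost recurs.
    # maskgct_inference_pipeline is built a single time, as in the snippets above.
    texts = ["第一句话。", "第二句话。", "第三句话。"]  # illustrative target sentences
    for target_text in texts:
        recovered_audio = maskgct_inference_pipeline.maskgct_inference(
            prompt_wav_path, prompt_text, target_text, "zh", "zh", target_len=10
        )
        # save or play recovered_audio; the models stay resident in memory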
