
[Help]: Is there any way to speed up inference of MaskGCT #344

Open
treya-lin opened this issue Nov 11, 2024 · 1 comment
Comments

@treya-lin
Contributor

treya-lin commented Nov 11, 2024

Problem Overview

Currently it takes about 5-6 seconds to generate an audio clip under 10 seconds long, with a prompt audio of roughly 10+ seconds, on one 3090 Ti. It uses about 12 GB of VRAM at 100% GPU utilization, so it seems unlikely that a second instance can run on the same card.

So I want to go a bit further into optimizing the speed, but I don't know where to start. Is there any way to speed it up? For example, can it do batch inference, use quantization, or be exported to an ONNX model? I am not sure how this type of TTS model is usually optimized. Any suggestions, guides, or relevant resources to look into would be much appreciated!
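One low-effort option on an Ampere card like the 3090 Ti is half-precision autocast. The sketch below is not MaskGCT's actual inference code (which is not shown in this thread); a small stand-in module is used as a placeholder, and the pattern would be wrapped around the real model's forward pass.

```python
import torch

# Stand-in for the MaskGCT pipeline -- the real model and its inference
# call are assumptions here; only the autocast pattern is the point.
model = torch.nn.Sequential(
    torch.nn.Linear(80, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 80),
).eval()

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Dummy input standing in for acoustic features.
features = torch.randn(1, 100, 80, device=device)

# inference_mode() skips autograd bookkeeping; fp16 autocast roughly
# halves activation memory and speeds up matmul-heavy layers on GPU.
with torch.inference_mode():
    if device == "cuda":
        with torch.autocast(device_type="cuda", dtype=torch.float16):
            out = model(features)
    else:
        out = model(features)

print(out.shape)
```

Whether this preserves quality for MaskGCT specifically would need listening tests; fp16 is usually safe for inference, but that is an assumption here.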

Expected Outcome

Faster inference without quality loss

Environment Information

  • Operating System: Ubuntu 22.04.5 LTS
  • Python Version: 3.10
  • Driver & CUDA Version: cuda 11.8
  • Error Messages and Logs:
@yuantuo666
Collaborator

Hi, we are working on a lightweight version of MaskGCT, which will require less VRAM and offer faster inference.
In the meantime, you may try methods like quantization to speed up inference.
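For reference, PyTorch's dynamic quantization is one such method. The sketch below uses a small placeholder module standing in for a Linear-heavy transformer block (the actual MaskGCT modules are an assumption); it shows the mechanics, not a drop-in recipe.

```python
import torch

# Placeholder for a Linear-heavy transformer block; the real MaskGCT
# architecture is not shown in this thread.
fp32_model = torch.nn.Sequential(
    torch.nn.Linear(512, 2048),
    torch.nn.ReLU(),
    torch.nn.Linear(2048, 512),
).eval()

# Dynamic quantization stores Linear weights as int8 and quantizes
# activations on the fly. Note it targets CPU inference; for GPU
# speedups, fp16/bf16 or TensorRT-style toolchains are the usual route.
int8_model = torch.ao.quantization.quantize_dynamic(
    fp32_model, {torch.nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
with torch.inference_mode():
    y = int8_model(x)

print(y.shape)
```

The int8 model's weights take roughly a quarter of the fp32 memory for the quantized layers, at some accuracy cost that should be checked against audio quality.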
