
[Help]: Is there any way to speed up inference of MaskGCT #344

Open
treya-lin opened this issue Nov 11, 2024 · 1 comment
Comments

@treya-lin
Contributor

treya-lin commented Nov 11, 2024

Problem Overview

Currently it takes about 5-6 seconds to generate an audio clip under 10 seconds long, with a prompt audio of roughly 10+ seconds, on one 3090 Ti. It uses about 12 GB of VRAM at 100% GPU utilization, so it seems unlikely that a second instance can run on the same card.

So I want to go a bit further into optimizing the speed, but I don't know where to start. Is there any way to speed it up? For example, can it do batch inference, use quantization, or be exported to an ONNX model? I am not sure how this type of TTS model is usually optimized. Any suggestions, guides, or relevant resources to look into would be much appreciated!
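One low-effort option on an Ampere card like the 3090 Ti is half-precision autocast. The sketch below is not MaskGCT's actual inference code (which is not shown in this thread); a small stand-in module is used as a placeholder, and the pattern would be wrapped around the real model's forward pass.

```python
import torch

# Stand-in for the MaskGCT pipeline -- the real model and its inference
# call are assumptions here; only the autocast pattern is the point.
model = torch.nn.Sequential(
    torch.nn.Linear(80, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 80),
).eval()

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Dummy input standing in for acoustic features.
features = torch.randn(1, 100, 80, device=device)

# inference_mode() skips autograd bookkeeping; fp16 autocast roughly
# halves activation memory and speeds up matmul-heavy layers on GPU.
with torch.inference_mode():
    if device == "cuda":
        with torch.autocast(device_type="cuda", dtype=torch.float16):
            out = model(features)
    else:
        out = model(features)

print(out.shape)
```

Whether this preserves quality for MaskGCT specifically would need listening tests; fp16 is usually safe for inference, but that is an assumption here.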

Expected Outcome

Faster inference without quality loss

Environment Information

  • Operating System: Ubuntu 22.04.5 LTS
  • Python Version: 3.10
  • Driver & CUDA Version: cuda 11.8
  • Error Messages and Logs:
@yuantuo666
Collaborator

Hi, we are working on a lightweight version of MaskGCT, which will require less VRAM and offer faster inference.
In the meantime, you may try methods like quantization to speed up inference.
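For reference, PyTorch's dynamic quantization is one such method. The sketch below uses a small placeholder module standing in for a Linear-heavy transformer block (the actual MaskGCT modules are an assumption); it shows the mechanics, not a drop-in recipe.

```python
import torch

# Placeholder for a Linear-heavy transformer block; the real MaskGCT
# architecture is not shown in this thread.
fp32_model = torch.nn.Sequential(
    torch.nn.Linear(512, 2048),
    torch.nn.ReLU(),
    torch.nn.Linear(2048, 512),
).eval()

# Dynamic quantization stores Linear weights as int8 and quantizes
# activations on the fly. Note it targets CPU inference; for GPU
# speedups, fp16/bf16 or TensorRT-style toolchains are the usual route.
int8_model = torch.ao.quantization.quantize_dynamic(
    fp32_model, {torch.nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
with torch.inference_mode():
    y = int8_model(x)

print(y.shape)
```

The int8 model's weights take roughly a quarter of the fp32 memory for the quantized layers, at some accuracy cost that should be checked against audio quality.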
