Currently it takes about 5-6 seconds to generate under 10 seconds of audio from a 10+ second prompt on a single 3090 Ti. It uses about 12 GB of VRAM and 100% GPU utilization, so running a second instance on the same card doesn't seem feasible.
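To put those numbers in a form that is easy to compare across optimizations, here is a minimal sketch that turns them into a real-time factor (RTF, generation time divided by audio duration). The function name `real_time_factor` is made up for illustration, and the 10-second audio length is an assumed upper bound, since the clips above are described only as "below 10 seconds"; shorter clips would give a higher RTF.

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """Return generation time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return generation_seconds / audio_seconds


# Figures from the report above: 5-6 s of compute per clip.
# ASSUMPTION: 10 s of audio, the upper bound of "below 10 seconds".
rtf_low = real_time_factor(5.0, 10.0)
rtf_high = real_time_factor(6.0, 10.0)

print(f"RTF is roughly {rtf_low:.2f} to {rtf_high:.2f} or higher")
```

Re-measuring the same ratio after each change (batching, lower precision, export) gives a consistent way to tell whether it actually helped.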
I'd like to optimize the speed further but don't know where to start. Is there any way to speed it up? For example, does it support batch inference or quantization, or can it be exported to an ONNX model? I'm not sure how this type of TTS model can be optimized. Any suggestions, guides, or relevant resources would be much appreciated!
Expected Outcome
Faster inference without quality loss
Environment Information
Operating System: Ubuntu 22.04.5 LTS
Python Version: 3.10
Driver & CUDA Version: CUDA 11.8
Error Messages and Logs:
Hi, we are working on a lightweight version of MaskGCT, which will use less VRAM and generate speech faster.
In the meantime, you could try methods such as quantization to speed up inference.
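For anyone unfamiliar with what quantization does, here is a minimal, framework-free sketch of symmetric int8 weight quantization. It is not MaskGCT code and does not use any MaskGCT API; the helper names `quantize_int8` and `dequantize` and the sample weights are made up for illustration. In practice you would use your framework's quantization tooling instead, and any speedup depends on whether the hardware and kernels support the lower precision.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale would do.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the int8 codes."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only to illustrate the round trip.
weights = [0.12, -0.5, 0.33, 1.27, -1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding error per weight is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is stored in 8 bits instead of 32, which is where the memory saving comes from. The rounding step does introduce a small error, so quality should be checked by listening tests or metrics rather than assumed to be unchanged.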