
[Help]: Gradio playground for MaskGCT load fails with less than 16GB VRAM #328

Open · pdeegan opened this issue Nov 3, 2024 · 4 comments

pdeegan commented Nov 3, 2024

Problem Overview

Attempting to test the MaskGCT Gradio playground locally:
https://github.com/open-mmlab/Amphion/tree/main/models/tts/maskgct

Steps Taken

Ran the steps in the README for the full install.

When executing python -m models.tts.maskgct.gradio_demo, it fails after loading facebook/w2v-bert-2.0 with: torch.cuda.OutOfMemoryError: tried to allocate... 7.75 GiB total capacity ... 6.64 GiB already allocated.
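
For reference, a quick way to confirm how much VRAM is actually free right before the demo loads its models (a minimal diagnostic sketch, assuming a CUDA-enabled PyTorch install; not part of the repo):

```python
import torch

# Report free vs. total device memory, in GiB, before loading anything.
free, total = torch.cuda.mem_get_info()
print(f"free: {free / 1024**3:.2f} GiB / total: {total / 1024**3:.2f} GiB")
```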

Expected Outcome

The Gradio demo loads and runs inference within the available 8GB of VRAM.

Is there another set of models that can make this possible to test with 8GB of VRAM?
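
One generic mitigation (an assumption on my part; the MaskGCT README does not document it) is loading the larger encoders in half precision, for example the facebook/w2v-bert-2.0 semantic encoder via transformers:

```python
import torch
from transformers import Wav2Vec2BertModel

# Hypothetical sketch: float16 weights roughly halve this encoder's VRAM
# footprint. The rest of the MaskGCT pipeline may need matching changes
# (input dtypes, the other model stages, etc.), which is untested here.
model = Wav2Vec2BertModel.from_pretrained(
    "facebook/w2v-bert-2.0",
    torch_dtype=torch.float16,
).to("cuda")
model.eval()
```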

Environment Information

  • Operating System: Debian 12
  • Python Version: 3.11
  • Driver & CUDA Version: 550 with CUDA 12.1
  • GPU: NVIDIA RTX 4060 Laptop GPU with 8GB VRAM

zziC7 commented Nov 4, 2024

Hello, I had the same problem.
When I submit inference in Gradio, it shows CUDA out of memory even when my prompt audio and target text are short.
My GPU is an RTX 3080 Ti (12GB).

treya-lin (Contributor) commented

My experience trying this model on a 3090 Ti:

Merely loading the model takes up about 10GB of VRAM, and normal inference (with prompt audio of reasonable length, e.g. 10+ seconds) takes up about 12GB, so I am afraid your hardware is not adequate for this model without further optimization.
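
For anyone who wants to reproduce these numbers, a minimal sketch using PyTorch's peak-memory counters (the load/inference steps are placeholders for what gradio_demo does; not code from the repo):

```python
import torch

torch.cuda.reset_peak_memory_stats()

# ... load the MaskGCT models here, as gradio_demo does ...
print(f"peak VRAM after load: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GiB")

# ... run one inference with a ~10 s prompt here ...
print(f"peak VRAM after inference: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GiB")
```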

@zziC7 @pdeegan


Tybost commented Nov 4, 2024

> Hello, I had the same problem. When I submit inference in Gradio, it shows CUDA out of memory even when my prompt audio and target text are short. My GPU is an RTX 3080 Ti (12GB).

Try NVIDIA Control Panel > Manage 3D Settings > CUDA - Sysmem Fallback Policy > Prefer Sysmem Fallback; once you exceed your VRAM, the driver will start using system RAM instead.
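
That setting is a Windows driver feature, so it won't help on the reporter's Debian setup. On Linux the closest knob I know of is PyTorch's allocator config, which only reduces fragmentation-related OOMs rather than spilling to system RAM (a partial workaround at best, and it assumes fragmentation is a factor here):

```python
import os

# Must be set before torch initializes CUDA, hence before the import.
# expandable_segments reduces allocator fragmentation; it does NOT page
# out to system RAM the way the Windows sysmem-fallback policy does.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # noqa: E402
```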


GUUser91 commented Nov 18, 2024

I uploaded a modified version of the original Hugging Face MaskGCT Gradio app that uses about 10.4GB (10GB + 370MB) of VRAM. The file is called app-old-target-duration-correction.py
https://huggingface.co/Bluebomber182/MaskGCT-TTS-Old-App/tree/main
Here's a screenshot of what's eating up the VRAM on my 4090. The app-old.py file is the Gradio app for MaskGCT.
[screenshot: nvtop showing per-process VRAM usage]
