
Clear the torch cuda cache after response #301

Open
wants to merge 4 commits into main
Conversation

@RandomGitUser321 commented Jun 26, 2024

If a user is running 2.5 with int4 quantization, it can just barely fit into 8 GB of VRAM (an extremely common VRAM size) without using any shared memory. If you then change the settings and switch from sampling to beam search with the default parameters, the GPU will use more than 8 GB of VRAM and roll over into the shared pool, which slows things down drastically.

If the user then switches back to sampling mode to regain the lost speed, VRAM will still hold leftover allocations from the beam-search run, so generation remains slow.

This PR simply purges the CUDA cache after each response, so that switching between settings no longer leaves stale allocations in VRAM that keep you stuck at reduced speed.

EDIT: Updated to only run the command if device == "cuda"
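The change described above amounts to something like the following sketch. The function and argument names (`generate_response`, `model`, `inputs`) are illustrative, not the project's actual code; the relevant parts are the `device == "cuda"` guard and the `torch.cuda.empty_cache()` call after generation.

```python
import torch


def generate_response(model, inputs: dict, device: str):
    """Run one generation pass, then release cached GPU memory so
    allocations left over from a beam-search run don't linger into
    later sampling-mode requests."""
    output = model.generate(**inputs)

    # Only run on CUDA devices (no-op / error-prone elsewhere).
    if device == "cuda":
        # Frees unused blocks held by PyTorch's caching allocator.
        # Tensors that are still referenced are NOT freed, so this
        # is safe to call after every response.
        torch.cuda.empty_cache()

    return output
```

Note that `empty_cache()` only returns cached-but-unused blocks to the driver; it does not force garbage collection of live tensors, so calling it after each response trades a small allocator overhead for predictable VRAM headroom.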
