
Buffer overflow with Llama 3 8B #109

Open
renepeinl opened this issue Jun 11, 2024 · 0 comments

Comments

@renepeinl

I tested with Ubuntu 24.04 LTS on two different PCs, both having 16 GB of main memory and no dedicated GPU. I therefore run all models solely on the CPU.
I was able to run Mistral 7B (AWQ Int4) together with Whisper small and Piper TTS without any problems.
However, when trying to run Llama 3 8B (AWQ Int4), the model loads, but the process crashes with a buffer overflow as soon as I issue the first query, even without ASR and TTS running in parallel. I monitored main memory with top, and RAM does not appear to be full.
Any suggestions on how to get Llama 3 running with that HW configuration?
Any plans to support Phi 3 any time soon?
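For what it's worth, here is a small sketch of how memory could be logged during generation to rule out (or confirm) exhaustion more reliably than watching top by eye. It only uses standard `free` and `awk`; the sample count and interval are arbitrary:

```shell
# Sample memory a few times while the model generates.
# "available" (column 7 of `free -m`) is what matters for OOM,
# since "free" excludes reclaimable page cache.
for i in 1 2 3; do
  free -m | awk '/^Mem:/ {print strftime("%T"), "used:", $3 "MB", "available:", $7 "MB"}'
  sleep 1
done
```

If "available" never drops near zero before the crash, the buffer overflow is likely a genuine bug (e.g. an undersized buffer for the model's context or vocabulary size) rather than the system running out of RAM.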
