
Buffer overflow with Llama 3 8B #109

Open
renepeinl opened this issue Jun 11, 2024 · 0 comments

Comments

@renepeinl

I tested with Ubuntu 24.04 LTS on two different PCs, both having 16 GB of main memory and no dedicated GPU. I therefore run all models solely on the CPU.
I was able to run Mistral 7B (AWQ Int4) together with Whisper small and Piper TTS without any problems.
However, when trying to run Llama 3 8B (AWQ Int4), the model loads, but the process crashes with a buffer overflow as soon as I issue the first query, even without ASR and TTS running in parallel. I monitored main memory with top, and RAM does not appear to be full.
Any suggestions on how to get Llama 3 running with that HW configuration?
Any plans to support Phi 3 any time soon?
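For what it's worth, here is a small sketch of how memory could be logged during generation to rule out (or confirm) exhaustion more reliably than watching top by eye. It only uses standard `free` and `awk`; the sample count and interval are arbitrary:

```shell
# Sample memory a few times while the model generates.
# "available" (column 7 of `free -m`) is what matters for OOM,
# since "free" excludes reclaimable page cache.
for i in 1 2 3; do
  free -m | awk '/^Mem:/ {print strftime("%T"), "used:", $3 "MB", "available:", $7 "MB"}'
  sleep 1
done
```

If "available" never drops near zero before the crash, the buffer overflow is likely a genuine bug (e.g. an undersized buffer for the model's context or vocabulary size) rather than the system running out of RAM.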
