I tested on two different PCs running Ubuntu 24.04 LTS, both with 16 GB of RAM and no dedicated GPU, so all models run solely on the CPU.
I was able to run Mistral 7B (AWQ Int4) together with Whisper small and Piper TTS without any problems.
However, when I try to run Llama 3 8B (AWQ Int4), the model loads but crashes with a buffer overflow as soon as I issue the first query, even without ASR and TTS running in parallel. I monitored main memory with top and could not see RAM filling up.
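For reference, top only shows the instantaneous RSS, so a transient spike during the first query could be missed. One way to rule that out is to check the peak resident set (VmHWM) of the process that loads the model. A minimal sketch, assuming a Linux /proc filesystem; the PID would be whatever process serves the Llama checkpoint:

```python
import sys

def peak_rss_mib(pid: int) -> float:
    """Return the peak resident set size (VmHWM) of a process in MiB (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmHWM:"):
                # The value in /proc/<pid>/status is reported in kB.
                return int(line.split()[1]) / 1024
    raise RuntimeError("VmHWM not found in /proc status")

if __name__ == "__main__":
    pid = int(sys.argv[1])  # PID of the process loading the model
    print(f"PID {pid}: peak RSS ≈ {peak_rss_mib(pid):.0f} MiB")
```

Running this against the model process right after the failure shows whether the peak actually approached the 16 GB limit even though top looked fine.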
Any suggestions on how to get Llama 3 running with this hardware configuration?
Any plans to support Phi 3 any time soon?