bug: Excessive RAM overhead in Cortex when loading a model #4727
Comments
That is a rough estimate. Can we know the current model and llama.cpp settings? We are planning to integrate a more accurate estimation tool, rearrange the settings, and provide better guidance on which settings cause which side effects and what the benefit of each is. For example, disabling the cache or changing the KV cache quantization level reduces memory consumption but is slower.
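For a rough sense of what the cache settings buy, here is a minimal sketch of the standard KV-cache size formula. The per-element sizes approximate llama.cpp's f16/q8_0/q4_0 cache types (the quantized ones include block scales); the model dimensions are hypothetical:

```python
# Sketch: KV cache bytes = 2 (K and V) * layers * context * kv_heads * head_dim * bytes/elem.
BYTES_PER_ELEM = {"f16": 2.0, "q8_0": 1.0625, "q4_0": 0.5625}  # approx. llama.cpp cache types

def kv_cache_gib(n_layers, ctx_len, n_kv_heads, head_dim, cache_type):
    elems = 2 * n_layers * ctx_len * n_kv_heads * head_dim
    return elems * BYTES_PER_ELEM[cache_type] / 2**30

# Hypothetical 7B-class model: 32 layers, 8 KV heads, head_dim 128, 8K context.
for t in BYTES_PER_ELEM:
    print(t, f"{kv_cache_gib(32, 8192, 8, 128, t):.2f} GiB")
```

Roughly halving the per-element size halves the cache, which is why the quantization level matters so much at long contexts.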
64K is a pretty high context length; I was using 8192. I will check how LM Studio and AnythingLLM deal with memory to have a comparison. My settings (although I see this memory issue with any model):
Jan version
0.5.15
Describe the Bug
At first, it looked to me like Cortex loads the model twice. Jan's memory usage got out of control, approaching double the size of the model. I loaded Mistral-Small-24B-Instruct-2501-Q8_0, which is 23.33 GB, and after loading, memory went up by 38 GB. That's 14.67 GB of overhead!
Perhaps it loads the model with a very big context window that balloons the memory usage?
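If the context window really does default to something like 64K, the KV cache alone could account for most of the gap. A back-of-the-envelope sketch, assuming this model has 40 layers, 8 KV heads, and head_dim 128 (assumptions; verify against the GGUF metadata) and an unquantized f16 cache:

```python
# Assumed dims for Mistral-Small-24B (check GGUF metadata): 40 layers,
# 8 KV heads, head_dim 128; f16 cache (2 bytes/elem), 64K context.
n_layers, ctx, n_kv_heads, head_dim, bytes_per_elem = 40, 65536, 8, 128, 2
kv_gib = 2 * n_layers * ctx * n_kv_heads * head_dim * bytes_per_elem / 2**30
print(f"{kv_gib:.1f} GiB")  # ~10 GiB of KV cache on top of the 23.33 GB of weights
```

Add llama.cpp's compute/graph buffers, which also grow with context, and the observed ~14.67 GB gap is in the right ballpark.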
I used to run some models on my laptop with LM Studio and they worked without issues, but today I tried them with Jan and they failed due to lack of memory. I then tried to load a small 3B model that usually takes 3 GB of RAM with LM Studio, and noticed that after loading the model in Jan, my RAM usage increased by ~6 GB. So a laptop with 16 GB of RAM now cannot load a 7 GB model :(
Many users with bootstrapped systems at home are clearly at a disadvantage with this memory leak.
It also causes a "Model failed to load" error for users who think they have enough RAM to run the model. In my experience, the industry rule of thumb for memory (VRAM + RAM) requirements is model size + 1 GB.
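That rule of thumb only holds at small context sizes; once the KV cache is counted, the estimate shifts a lot. A sketch using the same assumed dimensions as above (all numbers illustrative, not Cortex's actual estimator):

```python
def estimated_total_gib(weights_gib, n_layers, ctx, n_kv_heads, head_dim,
                        kv_bytes_per_elem=2.0, fixed_overhead_gib=1.0):
    """Weights + f16 KV cache + a flat allowance for compute buffers (a sketch)."""
    kv = 2 * n_layers * ctx * n_kv_heads * head_dim * kv_bytes_per_elem / 2**30
    return weights_gib + kv + fixed_overhead_gib

# 23.33 GB Q8_0 weights; assumed 40 layers / 8 KV heads / head_dim 128:
print(f"{estimated_total_gib(23.33, 40, 8192, 8, 128):.1f} GiB")   # ~25.6 at 8K ctx
print(f"{estimated_total_gib(23.33, 40, 65536, 8, 128):.1f} GiB")  # ~34.3 at 64K ctx
```

This is why a "model size + 1 GB" check passes at 8K context but fails badly if the loader silently defaults to 64K.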
Steps to Reproduce
No response
Screenshots / Logs
No response
What is your OS?
Windows 11