[ERROR] Worker (pid:25134) was sent SIGKILL! Perhaps out of memory? #556
I think a worker count of 2 will require double the GPU memory: 13 GB × 2 > 24 GB.
I have observed that 13 GB is used when the worker count is 2.
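For context, each gunicorn worker is a separate OS process that loads its own copy of the model onto the GPU, so memory use scales with the worker count. A minimal sketch of a single-worker launch (app:app is a placeholder for the actual WSGI module):

# One worker keeps a single copy of the model in VRAM; threads still allow
# some request concurrency without duplicating the model.
gunicorn --workers 1 --threads 4 --timeout 300 app:app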
There is a known issue with safetensors that only shows up on some systems. Windows especially suffers from it, but I've seen it reported on some Linux systems as well. I think it has to do with memory mapping not working properly when you have too many files open at once, or something like that. There is an option to bypass safetensors when loading models, which can be enabled with the fasttensors config option.
Does it depend on the NVIDIA driver version and CUDA version? At present, Driver Version: 535.183.01 and CUDA Version: 12.2. We are running it on Ubuntu 22.04.
No, it's an issue with safetensors and/or possibly the OS kernel. Try using one of the options above to see if it helps.
I tried setting config.fasttensors = True, but it did not work out. I tried this on a g4dn.xlarge instance, but the model is not loading.
Can you share the code that fails? The config option has to be set after:
config = ExLlamaV2Config(model_dir)
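To illustrate the ordering, a minimal sketch assuming the usual exllamav2 loading flow (model_dir is a placeholder path; the key point is that fasttensors is set after the config is constructed and before the model is loaded):

from exllamav2 import ExLlamaV2, ExLlamaV2Config

config = ExLlamaV2Config(model_dir)   # construct the config first
config.fasttensors = True             # then enable the safetensors bypass
model = ExLlamaV2(config)
model.load()                          # loading happens last, with the option in effect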
I am running it on Kubernetes with a g5.xlarge GPU instance.
I'm not sure there's any way to prevent PyTorch from using a lot of virtual memory. But just out of interest, what do you get from the following?
cat /proc/sys/vm/overcommit_memory
ulimit -v
for the command …
I'm not sure about the implications actually, but I think you might want to try changing the overcommit mode.
🤷
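As a sketch, the overcommit mode can be changed with the standard Linux sysctl (value 1 tells the kernel to always overcommit; whether that is appropriate depends on the workload):

sudo sysctl vm.overcommit_memory=1
# or, equivalently:
echo 1 | sudo tee /proc/sys/vm/overcommit_memory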
@turboderp Reducing the cache size appeared to help because the cache state was being serialized. If anyone else hits this issue, passing the …
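As a hedged sketch of the idea, the exllamav2 cache size is governed by the max_seq_len argument when constructing the cache (2048 here is illustrative):

from exllamav2 import ExLlamaV2Cache

# A smaller max_seq_len allocates a smaller KV cache, so there is less
# state to hold in memory (and less to serialize).
cache = ExLlamaV2Cache(model, max_seq_len = 2048)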
Hi @turboderp!
I am using an A10 GPU with 24 GB of VRAM for inference with Llama 3. I am running gunicorn with a worker count of 2, but it fails with "Perhaps out of memory?". Only 13 GB of the 24 GB is in use, yet it still reports running out of VRAM.