[ERROR] Worker (pid:25134) was sent SIGKILL! Perhaps out of memory? #556

Open
UTSAV-44 opened this issue Jul 18, 2024 · 12 comments

@UTSAV-44

UTSAV-44 commented Jul 18, 2024

Hi, turboderp!

I am using an A10 GPU with 24 GB of VRAM for inference with Llama 3. I am running gunicorn with a worker count of 2, but it fails with "Perhaps out of memory?". It is only using 13 GB out of 24 GB, yet it still reports running out of VRAM.

@remichu-ai

I think setting workers to 2 will double the GPU memory requirement: 13 * 2 > 24.
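
A minimal sketch of the setting in question, assuming the worker count lives in a gunicorn.conf.py file (the file name and values here are hypothetical for this deployment):

# gunicorn.conf.py -- hypothetical config illustrating the point above.
# Each gunicorn worker is a separate process that loads its own copy of the model,
# so VRAM usage scales with the worker count: 2 workers x ~13 GB > 24 GB.
workers = 2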

@UTSAV-44
Author

I think setting workers to 2 will double the GPU memory requirement: 13 * 2 > 24.

I have observed that 13 GB is used when the worker count is 2.

@turboderp
Owner

turboderp commented Jul 18, 2024

There is a known issue with safetensors that only shows up on some systems. Windows especially suffers from it, but I've seen it reported on some Linux systems as well. I think it has to do with memory mapping not working properly when you have too many files open at once, or something like that.

There is an option to bypass safetensors when loading models. It can be enabled with -fst on the command line, by setting the EXLLAMA_FASTTENSORS env variable, or by setting config.fasttensors = True in Python.
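
A minimal sketch of the Python-side routes, assuming the config API mentioned above (the model path and the env-variable value are placeholders/assumptions):

import os

# Env-variable route: set before the model is loaded. The variable name is as given
# above; the exact value checked is an assumption here.
os.environ["EXLLAMA_FASTTENSORS"] = "1"

# Python route: set the flag on the config object before loading.
from exllamav2 import ExLlamaV2Config

config = ExLlamaV2Config("/path/to/model")  # placeholder path
config.fasttensors = True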

@UTSAV-44
Author

UTSAV-44 commented Jul 18, 2024

Does it depend on the NVIDIA driver version and CUDA version? At present, Driver Version: 535.183.01 and CUDA Version: 12.2. We are running it on Ubuntu 22.04.

@turboderp
Owner

No, it's an issue with safetensors and/or possibly the OS kernel. Try using one of the options above to see if it helps.

@UTSAV-44
Author

I tried setting config.fasttensors = True, but it did not work. I also tried this on a g4dn.xlarge instance, but the model does not load.

@turboderp
Owner

Can you share the code that fails? The config option has to be set after config.prepare() is called but before model.load().
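
A minimal sketch of that ordering, assuming the explicit-prepare() style of building the config (the path is a placeholder, and load() stands in for whichever load call is used):

from exllamav2 import ExLlamaV2, ExLlamaV2Config

config = ExLlamaV2Config()
config.model_dir = "/path/to/model"  # placeholder
config.prepare()                     # read the model's config first
config.fasttensors = True            # set AFTER prepare() ...

model = ExLlamaV2(config)
model.load()                         # ... and BEFORE load()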

@UTSAV-44
Author

from exllamav2 import ExLlamaV2, ExLlamaV2Cache_Q4, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

config = ExLlamaV2Config(model_dir)
config.fasttensors = True
self.model = ExLlamaV2(config)

# Q4 cache, allocated lazily so the model can be loaded with autosplit
self.cache = ExLlamaV2Cache_Q4(self.model, max_seq_len=256*96, lazy=True)
self.model.load_autosplit(self.cache, progress=True)

print("Loading tokenizer...")
self.tokenizer = ExLlamaV2Tokenizer(config)

self.generator = ExLlamaV2DynamicGenerator(
    model=self.model,
    cache=self.cache,
    tokenizer=self.tokenizer,
)

self.generator.warmup()

I am running it on Kubernetes with a g5.xlarge GPU instance.

@turboderp
Owner

I'm not sure there's any way to prevent PyTorch from using a lot of virtual memory. But just out of interest, what do you get from the following?

cat /proc/sys/vm/overcommit_memory
ulimit -v

@UTSAV-44
Author

For cat /proc/sys/vm/overcommit_memory I got 1, and for ulimit -v I got unlimited.

@turboderp
Owner

I'm not sure about the implications actually, but I think you might want to try changing the overcommit mode.

sudo sysctl vm.overcommit_memory=0

or

sudo sysctl vm.overcommit_memory=2

🤷

@brthor

brthor commented Aug 17, 2024

@turboderp
EDIT: The SIGKILL issue I detailed here was caused by serialization of exllama state by the Hugging Face datasets.map() function when the exllama model is pre-initialized, and is unrelated to exllama.

Reducing the cache size appeared to help because the cache state was being serialized.

If anyone else hits this issue, passing new_fingerprint='some_rnd_str' to datasets.map() will prevent the serialization.
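
A minimal sketch of that workaround, assuming a Hugging Face datasets.Dataset; the dataset name and mapping function are hypothetical placeholders:

from datasets import load_dataset

def process_batch(batch):
    # Hypothetical stand-in for a function that closes over the pre-initialized
    # exllama model/cache.
    return batch

ds = load_dataset("some/dataset", split="train")  # placeholder dataset

# Passing new_fingerprint lets datasets skip hashing/serializing the function's
# closure (which would otherwise pull in the exllama state), per the note above.
ds = ds.map(process_batch, batched=True, new_fingerprint="some_rnd_str")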
