gpu memory size recommended for pruning the llama2-7b-chat-hf model #44

rsong0606 · 2024-04-29T19:07:18Z

Great work team!

Currently, I am pruning on the llama2-7b-chat-hf model from hugging face.

python main.py

--model NousResearch/Llama-2-7b-chat-hf
--prune_method wanda
--sparsity_ratio 0.5
--sparsity_type 2:4
--save out/llama_7b-chat-hf/structured/wanda/

got this error message:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 MiB. GPU 0 has a total capacity of 21.99 GiB of which 11.69 MiB is free. Including non-PyTorch memory, this process has 21.98 GiB memory in use. Of the allocated memory 20.84 GiB is allocated by PyTorch, and 61.18 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+

Eric-mingjie · 2024-04-30T12:46:02Z

I think you need at least 14GB GPU memory to load the 7b model in fp16.

rsong0606 · 2024-04-30T14:19:28Z

@Eric-mingjie Thanks Eric, mine is 24 GB GPU memory. Given that at least 14GB would be used to load the model. I still have ~10 GB left in Nvidia L4. Are there any extra activities taking more memory and can we avoid in the arguments?

kast424 · 2024-05-08T19:42:17Z

Mine has 80GB of GPU RAM >>>>>NVIDIA A100 (and H100) GPU in Stanage has 80GB of GPU RAM
still got this error.
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 512.00 MiB. GPU

complete error for reference:

torch 2.3.0
transformers 4.41.0.dev0
accelerate 0.31.0.dev0

of gpus: 1

loading llm model mistralai/Mistral-7B-Instruct-v0.2
^MLoading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]^MLoading checkpoint shards: 33%|███▎ | 1/3 [00:12<00:24, 12.09s/it]^MLoading checkpoint shards: 67%|██████▋ | 2/3 [00:29<00:15,$
use device cuda:0
pruning starts
loading calibdation data
dataset loading complete
Traceback (most recent call last):
File "/mnt/parscratch/users/acq22stk/teamproject/wanda/main.py", line 110, in
main()
File "/mnt/parscratch/users/acq22stk/teamproject/wanda/main.py", line 69, in main
prune_wanda(args, model, tokenizer, device, prune_n=prune_n, prune_m=prune_m)
File "/mnt/parscratch/users/acq22stk/teamproject/wanda/lib/prune.py", line 160, in prune_wanda
outs[j] = layer(inps[j].unsqueeze(0), attention_mask=attention_mask, position_ids=position_ids)[0]
File "/users/acq22stk/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/users/acq22stk/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/users/acq22stk/.conda/envs/prune_llm/lib/python3.9/site-packages/transformers/models/mistral/modeling_mistral.py", line 754, in forward
hidden_states = self.input_layernorm(hidden_states)
File "/users/acq22stk/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/users/acq22stk/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/users/acq22stk/.conda/envs/prune_llm/lib/python3.9/site-packages/transformers/models/mistral/modeling_mistral.py", line 85, in forward
hidden_states = hidden_states.to(torch.float32)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 512.00 MiB. GPU

nehaprakriya · 2024-08-06T22:41:15Z

I have the same error with the Mixtral 8x7B model using 4 A6000 GPUs (48GiB memory per device).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gpu memory size recommended for pruning the llama2-7b-chat-hf model #44

gpu memory size recommended for pruning the llama2-7b-chat-hf model #44

rsong0606 commented Apr 29, 2024

Eric-mingjie commented Apr 30, 2024

rsong0606 commented Apr 30, 2024

kast424 commented May 8, 2024

nehaprakriya commented Aug 6, 2024

gpu memory size recommended for pruning the llama2-7b-chat-hf model #44

gpu memory size recommended for pruning the llama2-7b-chat-hf model #44

Comments

rsong0606 commented Apr 29, 2024

Eric-mingjie commented Apr 30, 2024

rsong0606 commented Apr 30, 2024

kast424 commented May 8, 2024

of gpus: 1

nehaprakriya commented Aug 6, 2024