
How should I use an HF model? For any batch_size it gives a CUDA out of memory error. #991

Open
puraminy opened this issue Mar 6, 2022 · 1 comment

Comments


puraminy commented Mar 6, 2022

I am trying to load a HuggingFace model, but whatever batch_size I give it, it throws a CUDA out-of-memory error.
I even used t5-small, and I didn't have this problem with much larger models.

The error occurs right after saving the first checkpoint.
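
For reference, the call path in the traceback goes through t5's HfPyTorchModel. Below is a minimal sketch of that usage, modelled on the GPU example in the t5 README; the argument values (model dir, sequence lengths, learning rate) are placeholders rather than my exact config, and some argument names may differ slightly:

```python
import functools

import torch
import transformers
import t5.models

# Minimal sketch of the HfPyTorchModel usage (values are placeholders).
device = torch.device("cuda")
model = t5.models.HfPyTorchModel("t5-small", "/tmp/hf_model/", device)

# finetune() in t5/models/hf_model.py forwards these kwargs to train()
# (see the traceback below); the OOM happens even for small batch_size.
model.train(
    mixture_or_task_name="_sup",
    steps=1000,
    save_steps=100,
    sequence_length={"inputs": 128, "targets": 128},
    split="train",
    batch_size=8,
    optimizer=functools.partial(transformers.AdamW, lr=1e-3),
)
```

The full log and traceback are below.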

2022-03-06 20:15:01,951:INFO:absl: Loading from /home/pouramini/t5logs/last-large/sup/HF810_lr_0.001_0/model-0.checkpoint
============================ Training =========================
2022-03-06 20:15:02,050:INFO:absl: Loading from /home/pouramini/pret/t5-small/model-0.checkpoint
2022-03-06 20:15:02,111:WARNING:absl: _sup is both a Task and a Mixture, returning Mixture
2022-03-06 20:15:02.112207: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-06 20:15:02.114921: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-06 20:15:02.115217: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-06 20:15:02.115631: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-03-06 20:15:02.115902: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-06 20:15:02.116195: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-06 20:15:02.116464: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-06 20:15:02.300687: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-06 20:15:02.301026: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-06 20:15:02.301306: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-06 20:15:02.301572: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6211 MB memory:  -> device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1
2022-03-06 20:15:02,368:INFO:absl: Automatically caching small dataset in memory: '_sup:train'
2022-03-06 20:15:02,785:WARNING:absl: _sup is both a Task and a Mixture, returning Mixture
2022-03-06 20:15:03,057:INFO:absl: Saving checkpoint for step 0
Traceback (most recent call last):
  File "/home/pouramini/rainbow/bin/fine-tune.py", line 553, in <module>
    fine_tune()
  File "/home/pouramini/anaconda3/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/pouramini/anaconda3/lib/python3.9/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/pouramini/anaconda3/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/pouramini/anaconda3/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/pouramini/rainbow/bin/fine-tune.py", line 481, in fine_tune
    model.finetune(
  File "/home/pouramini/text-to-text-transfer-transformer/t5/models/hf_model.py", line 572, in finetune
    self.train(mixture_or_task_name, finetune_steps, **train_kwargs)
  File "/home/pouramini/text-to-text-transfer-transformer/t5/models/hf_model.py", line 341, in train
    outputs = self._model(
  File "/home/pouramini/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/pouramini/anaconda3/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py", line 1612, in forward
    decoder_outputs = self.decoder(
  File "/home/pouramini/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/pouramini/anaconda3/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py", line 1005, in forward
    layer_outputs = layer_module(
  File "/home/pouramini/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/pouramini/anaconda3/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py", line 666, in forward
    cross_attention_outputs = self.layer[1](
  File "/home/pouramini/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/pouramini/anaconda3/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py", line 581, in forward
    attention_output = self.EncDecAttention(
  File "/home/pouramini/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/pouramini/anaconda3/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py", line 518, in forward
    attn_output = unshape(torch.matmul(attn_weights, value_states))  # (batch_size, seq_length, dim)
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 7.93 GiB total capacity; 936.32 MiB already allocated; 13.44 MiB free; 950.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
2022-03-06 20:15:04.744503: W tensorflow/core/kernels/data/cache_dataset_ops.cc:768] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset  will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
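
One detail that stands out in the log: TensorFlow, which t5 uses here for the tf.data input pipeline, created a GPU device with 6211 MB (the gpu_device.cc line above), while PyTorch itself has reserved under 1 GiB and finds only 13.44 MiB free on the 7.93 GiB card. So the OOM may come from TensorFlow pre-allocating the GPU rather than from the batch size. A minimal sketch of two things worth trying before the model is constructed; these are standumented TensorFlow/PyTorch settings, but whether they fix this particular run is an assumption:

```python
import os

# Allocator hint quoted in the error message above; a documented
# PYTORCH_CUDA_ALLOC_CONF option (takes effect if set before the first
# CUDA allocation in PyTorch).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import tensorflow as tf

# Keep the tf.data input pipeline off the GPU so TensorFlow does not
# reserve the ~6 GB reported in the log; must run before TF touches the GPU.
tf.config.set_visible_devices([], "GPU")

import torch

# Quick check of what PyTorch itself is holding around the failing step.
print(f"allocated: {torch.cuda.memory_allocated() / 2**20:.1f} MiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 2**20:.1f} MiB")
```

Setting `TF_FORCE_GPU_ALLOW_GROWTH=true` in the environment is an alternative way to stop TensorFlow from grabbing most of the card up front.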

yfqiu98 commented Apr 16, 2022

Any update on this issue? Thanks.
