This repository has been archived by the owner on Oct 31, 2022. It is now read-only.

OOM on 345M with GPU #69

Open
babaraza opened this issue Feb 1, 2021 · 4 comments


babaraza commented Feb 1, 2021

Hi,
I am getting an error when trying to train the 345M model on the GPU. If I use the CPU it trains fine, albeit very slowly. I am using an Nvidia GTX 1070 and have CUDA and cuDNN installed.

The interactive_conditional_samples.py and generate_unconditional_samples.py scripts work fine on the GPU, so I know the GPU is working. I only hit the OOM when trying to train.

I tried using the "--optimizer sgd" flag with the default batch_size of 1:
python train.py --dataset data.npz --model_name 345M --optimizer sgd

Error (truncated):
2021-01-31 21:52:28.744480: W tensorflow/core/common_runtime/bfc_allocator.cc:419] Allocator (GPU_0_bfc) ran out of memory trying to allocate 4.00MiB (rounded to 4194304). Current allocation summary follows.
2021-01-31 21:52:28.744901: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (256): Total Chunks: 276, Chunks in use: 266. 69.0KiB allocated for chunks. 66.5KiB in use in bin. 1.0KiB client-requested in use in bin.
2021-01-31 21:52:28.745756: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (512): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-01-31 21:52:28.746117: I tensorflow/core/common_runtime/bfc_allocator.cc:869] Bin (1024): Total Chunks: 3, Chunks in use: 1. 4.0KiB allocated for chunks. 1.3KiB in use in bin. 1.0KiB client-requested in use in bin.

...

2021-01-31 21:52:28.966844: I tensorflow/core/common_runtime/bfc_allocator.cc:921] Sum Total of in-use chunks: 6.64GiB
2021-01-31 21:52:28.966890: I tensorflow/core/common_runtime/bfc_allocator.cc:923] total_region_allocated_bytes_: 7135713536 memory_limit_: 7135713690 available bytes: 154 curr_region_allocation_bytes_: 8589934592
2021-01-31 21:52:28.966950: I tensorflow/core/common_runtime/bfc_allocator.cc:929] Stats:
Limit: 7135713690
InUse: 7134272000
MaxInUse: 7134274048
NumAllocs: 3078
MaxAllocSize: 268435456

2021-01-31 21:52:28.967091: W tensorflow/core/common_runtime/bfc_allocator.cc:424] ***************************************


2021-01-31 21:52:28.967156: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at cwise_ops_common.cc:82 : Resource exhausted: OOM when allocating tensor with shape[1,16,1024,1024] and type bool on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
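
For context, here is a rough back-of-envelope estimate of where the memory goes (a minimal sketch in Python, assuming the published GPT-2 medium hyperparameters: 24 layers, 16 heads, 1024-token context). The failing [1, 16, 1024, 1024] tensor matches one per-layer attention map at those sizes:

n_params = 345e6                    # GPT-2 medium parameter count
n_layer, n_head, n_ctx = 24, 16, 1024
fp32 = 4                            # bytes per FP32 value

weights = n_params * fp32           # ~1.29 GiB of parameters
grads   = n_params * fp32           # ~1.29 GiB, one gradient per parameter
adam    = 2 * n_params * fp32       # ~2.57 GiB more for Adam's m and v moments

# One attention-score map per layer, shape [1, n_head, n_ctx, n_ctx];
# backprop keeps all of them around until the backward pass.
attn = n_layer * n_head * n_ctx * n_ctx * fp32   # ~1.5 GiB

gib = 2 ** 30
print(f"weights + grads: {(weights + grads) / gib:.2f} GiB")
print(f"adam moments:    {adam / gib:.2f} GiB")
print(f"attn scores:     {attn / gib:.2f} GiB")

Even with --optimizer sgd dropping the Adam buffers, weights, gradients, attention maps, and the rest of the saved activations add up to roughly the 6.64 GiB the allocator reports in use, leaving an 8 GB 1070 (only ~7.1 GB of it usable) essentially no headroom.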

Any idea on how to resolve this?
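
A generic TensorFlow 1.x mitigation that sometimes helps with allocator fragmentation (though it cannot make a model fit that genuinely exceeds the card's memory) is letting the allocator grow on demand. A minimal sketch of that knob, which is plain TF1 and not necessarily a switch this repo's train.py exposes:

import tensorflow as tf  # TensorFlow 1.x API, which this codebase targets

config = tf.ConfigProto()
config.gpu_options.allow_growth = True   # allocate VRAM incrementally, not all up front
sess = tf.Session(config=config)
# ...build the training graph and call sess.run(...) as usual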


jaimu97 commented Feb 23, 2021

On a single 1070? I don't think that's possible. I'm currently training a 10 GB dataset with the 345M model on a 3090, and it's using ~17 GB of VRAM:

[Screenshot: 345M training session showing ~17 GB of VRAM in use]

@babaraza (Author)

Thank you for your reply. I am trying to get my hands on a 3080 or 3090 for this very reason, and your screenshot and message just confirmed I need the 3090!


jaimu97 commented Feb 23, 2021

I honestly couldn't recommend getting a 3090 just for training/fine-tuning 345M GPT-2. 117M is definitely good enough for every use case (for me, anyway) if your 1070 can handle it. I only use it to train when I'm at work. ;)

@babaraza (Author)

I agree; it won’t be solely for training, I just wanted to justify it for my gaming needs ;) With that said, it’s almost impossible to find a 3080/3090 at a good price, so I’m waiting it out. I do appreciate everyone’s input. I did train the 117M model and it does seem to give good results. There are also the Google Colab notebooks everyone’s been using for training, since they give you free access to an Nvidia T4.
