
Running On Multiple GPUs #27

Open
playmakerbugger opened this issue Jun 23, 2021 · 5 comments

@playmakerbugger

Hi, I am running the image harmonization part of the model with --train_stages 6, --max_size 350, and --lr_scale 0.5 to increase the quality of the images.

However, once I get to the second stage of training, it crashes because it runs out of CUDA memory. I altered the torch device so the model can use more than one GPU (say, GPUs 0 and 1) and wrapped the model in DataParallel so that it can run in parallel on multiple GPUs. However, it still only runs on one GPU.

Do you have any suggestions to fix this issue?

@tohinz
Owner

tohinz commented Jun 23, 2021

Without seeing the code it's difficult to say.
Have you changed how the --gpu parameter is handled (in the main_train.py file)?
By default it's set to 0, and later in the code we do (see here):

```python
if torch.cuda.is_available():
    torch.cuda.set_device(opt.gpu)
```

You might have to change that to get it to work.
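For context, torch.cuda.set_device only selects the default CUDA device; it does not spread work across GPUs on its own. Below is a minimal sketch of the kind of DataParallel change being attempted, assuming a toy stand-in model and GPU IDs 0 and 1 (none of these names come from the repository's actual code):

```python
import torch
import torch.nn as nn

# Toy stand-in for the generator built in main_train.py; the real
# model comes from the repository.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())

if torch.cuda.is_available():
    torch.cuda.set_device(0)   # selects the default device only
    model = model.cuda()
    if torch.cuda.device_count() > 1:
        # Replicate the module on GPUs 0 and 1; each forward pass
        # splits the input batch along dim 0 across the replicas.
        model = nn.DataParallel(model, device_ids=[0, 1])

    x = torch.randn(4, 3, 64, 64).cuda()  # batch of 4 -> 2 per GPU
    y = model(x)                          # output gathered on GPU 0
```

Note that DataParallel only helps when the batch dimension can be split; it replicates the whole model on every device rather than partitioning it.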

@playmakerbugger
Author

Hi, I'm still having the problem. I changed that line to a torch device covering the two GPUs (passed to set_device). It still runs on one GPU.
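Two hedged observations that may explain this, neither confirmed against the repository's code: torch.cuda.set_device accepts a single device, so passing two GPUs there cannot enable multi-GPU execution; and nn.DataParallel scatters inputs along the batch dimension, so single-image training with a batch size of 1 leaves no work for a second GPU. A small illustration with a toy linear layer:

```python
import torch
import torch.nn as nn

# set_device takes exactly one device; it cannot register two GPUs.
torch.cuda.set_device(0)

# DataParallel scatters each input along dim 0 across the replicas,
# so a batch of size 1 is handled by a single GPU regardless.
model = nn.DataParallel(nn.Linear(8, 8).cuda(), device_ids=[0, 1])

x1 = torch.randn(1, 8).cuda()  # batch size 1: effectively one GPU
x4 = torch.randn(4, 8).cuda()  # batch size 4: two samples per GPU
print(model(x1).shape, model(x4).shape)
```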

@tohinz
Owner

tohinz commented Jun 30, 2021

Sorry for the late response.
What kind of GPU are you running this on and how much VRAM does it have?
I run all of my experiments on a single GPU with ~12GB VRAM without problems.

@playmakerbugger
Author

GPU 0, with about 30000 MiB of VRAM.

@Liz1317

Liz1317 commented Jul 16, 2022

I have the same problem. Is there any solution?
