DDP number of training iterations independent of #GPUs? #6337
Unanswered
timothybrooks asked this question in DDP / multi-GPU / multi-node

Hello--
I am using DDP with anywhere from 1 to 8 GPUs. The PL documentation explains that it will wrap my training DataLoader in a DistributedSampler, which gives each process a unique partition of the dataset. In that case, since it is also my understanding that the batch size is per GPU, I would expect the total number of iterations to decrease proportionally to the number of GPUs. However, the number of training batch iterations is constant (~100k in my case) regardless of how many GPUs I use. It is equal to `dataset_size / batch_size`, while I expected it to be `dataset_size / (batch_size * num_gpus)`.
Am I missing something here? Why isn't PL partitioning the dataset such that each process sees a unique subset?
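For concreteness, here is a minimal sketch (plain PyTorch, with hypothetical dataset and batch sizes) of the two counts being compared: without a `DistributedSampler` every process iterates the full dataset, while with one each rank only sees roughly `dataset_size / num_gpus` samples.

```python
# Minimal sketch (assumption: plain PyTorch, hypothetical sizes) of the
# per-process iteration counts with and without a DistributedSampler.
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

dataset = TensorDataset(torch.zeros(100_000, 1))  # hypothetical dataset of 100k samples
batch_size = 32
num_gpus = 8

# Without a DistributedSampler, every process iterates the full dataset:
full_loader = DataLoader(dataset, batch_size=batch_size)
print(len(full_loader))  # ceil(100_000 / 32) == 3125 batches per process

# With a DistributedSampler, each rank gets ~1/num_gpus of the samples,
# so the per-rank number of batches shrinks accordingly:
sampler = DistributedSampler(dataset, num_replicas=num_gpus, rank=0)
ddp_loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler)
print(len(ddp_loader))   # ceil(ceil(100_000 / 8) / 32) == 391 batches per rank
```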
Replies: 2 comments 3 replies
- May it be related to my post, i.e. that each GPU is loading the whole dataset, so that the number of iterations is constant in your case? Edit: I see that you had already reacted to it. :)
2 replies
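A quick way to check whether each process really iterates the full dataset is to print the per-process batch count once training starts. This is a hypothetical diagnostic hook added to the existing LightningModule; `trainer.num_training_batches` is the attribute the Lightning 1.x progress bar reads, so the exact name may differ across versions.

```python
# Hypothetical diagnostic: print how many batches this process will actually run.
# If DDP partitioning is active, the count should shrink as GPUs are added;
# if every GPU loads the whole dataset, it stays at dataset_size / batch_size.
import pytorch_lightning as pl


class MyModel(pl.LightningModule):
    def on_train_start(self):
        print(f"global_rank={self.global_rank}: "
              f"{self.trainer.num_training_batches} training batches per epoch")
```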
- How are you setting the number of GPUs? You have to pass it to the Trainer(), otherwise it won't have any effect.
1 reply
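For completeness, a minimal sketch of passing the GPU count to the Trainer. The argument names are version-dependent (roughly `gpus=` / `accelerator='ddp'` in the Lightning 1.2 era of this discussion, `devices=` / `strategy='ddp'` in later releases), and `model` / `train_loader` are assumed to be defined elsewhere.

```python
# Minimal sketch: the number of GPUs and the DDP mode are arguments of the
# Trainer itself; otherwise training runs in a single process.
import pytorch_lightning as pl

# Lightning ~1.2 style (contemporary with this discussion):
trainer = pl.Trainer(gpus=8, accelerator="ddp", max_epochs=10)

# Later releases use different argument names (assumption: 1.7+ / 2.x):
# trainer = pl.Trainer(accelerator="gpu", devices=8, strategy="ddp", max_epochs=10)

trainer.fit(model, train_loader)  # model and train_loader defined elsewhere
```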