Manually syncing gradients in DDP with manual_backward #6340
Unanswered
paramhanji asked this question in DDP / multi-GPU / multi-node
I have a specific application that requires multiple gradient updates. Here is a basic example:
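(A minimal sketch of the pattern; the toy network and the two loss terms below are placeholders for my actual objectives.)

```python
import torch
import torch.nn.functional as F
import pytorch_lightning as pl


class MultiStepModule(pl.LightningModule):
    """Toy module illustrating several manual_backward calls per training step."""

    def __init__(self):
        super().__init__()
        self.automatic_optimization = False  # take control of the optimization loop
        self.net = torch.nn.Linear(10, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        opt = self.optimizers()
        opt.zero_grad()

        # First pass: gradients are written into .grad
        loss1 = F.mse_loss(self.net(x), y)
        self.manual_backward(loss1)

        # Second pass: gradients accumulate on top of the previous ones
        loss2 = F.l1_loss(self.net(x), y)
        self.manual_backward(loss2)

        # Single gradient-descent step using the summed gradients
        opt.step()

    def configure_optimizers(self):
        return torch.optim.SGD(self.net.parameters(), lr=0.01)
```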
As you can see, I use `manual_backward()` to compute and accumulate gradients from multiple passes, and perform a single gradient-descent step at the end. Is there any way I can use multi-GPU training for such a training step? Surely I will run into race conditions if I attempt multi-GPU training without any modifications to the above snippet! Generally, the computation and syncing of gradients across GPUs is handled automatically by Lightning, and the DDP docs state that gradients are averaged across processes during the backward pass.
I'd like to find out whether PyTorch Lightning has any mechanism for syncing gradients manually. Happy to hear about alternative approaches as well.
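For reference, the kind of manual syncing I have in mind would be roughly the following sketch using raw `torch.distributed` (an assumption on my part, not an existing Lightning API; it presumes the default process group is already initialized and that every parameter has a gradient):

```python
import torch.distributed as dist


def sync_gradients(module):
    """Average each parameter's gradient across all processes."""
    world_size = dist.get_world_size()
    for p in module.parameters():
        if p.grad is not None:
            # Sum the gradient tensors from all ranks, then divide to get the mean
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size
```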