Out of memory #9

hongyuntw · 2022-06-07T15:38:03Z

Hi author
I referred to your code, including the GradNorm part, and adapted it to train my own transformer-based model.
Everything works, but as the number of iterations grows, a "CUDA out of memory" error occurs.
Have you encountered the same error during training?
I think it is caused by retain_graph in these two calls:

loss.backward(retain_graph=True)
and
gygw = torch.autograd.grad(task_losses[k], W.parameters(), retain_graph=True)

Am I right?
And is there any way to avoid this error as the number of iterations grows?
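For reference, here is a minimal sketch of the pattern in question, not the repository's actual training loop. The toy model, the `heads` and `loss_history` names, and the omission of the GradNorm weight update are all assumptions made for illustration. It shows two habits that generally help keep memory flat with `retain_graph=True`: releasing the graph on the last `torch.autograd.grad` call of each iteration, and storing only Python floats (via `.item()`) across iterations so that no tensor kept from an earlier step holds its graph alive.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy setup: one shared layer W and two task heads.
W = nn.Linear(4, 4)
heads = nn.ModuleList([nn.Linear(4, 1), nn.Linear(4, 1)])
params = list(W.parameters()) + list(heads.parameters())
opt = torch.optim.SGD(params, lr=0.01)

loss_history = []  # plain floats only, never graph-attached tensors

for step in range(3):
    x = torch.randn(8, 4)
    y = torch.randn(8, 1)

    shared = W(x)
    task_losses = [nn.functional.mse_loss(h(shared), y) for h in heads]
    loss = sum(task_losses)

    opt.zero_grad()
    # The graph is still needed for the per-task gradients below.
    loss.backward(retain_graph=True)

    grad_norms = []
    for k in range(len(task_losses)):
        is_last = k == len(task_losses) - 1
        # Keep the graph only while another grad call still needs it.
        gygw = torch.autograd.grad(
            task_losses[k], list(W.parameters()), retain_graph=not is_last
        )
        grad_norms.append(torch.norm(gygw[0]).item())

    opt.step()
    # .item() copies the value out, so this iteration's graph can be freed.
    loss_history.append(loss.item())
```

If memory still grows with a loop shaped like this, it may be worth checking for any tensor that outlives the iteration, such as a running loss sum or stored initial losses that were not detached.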

Thank you for your nice code :)
