Why ddp is better than ddp_spawn #12253
Unanswered
binzhougithub
asked this question in
DDP / multi-GPU / multi-node
Replies: 1 comment
-
Same question here. Does anyone have any insight into this?
-
I am currently using the ddp_spawn strategy, but it seems that ddp is better, because I found the following on this page:
========
We use DDP this way because ddp_spawn has a few limitations (due to Python and PyTorch):
Since .spawn() trains the model in subprocesses, the model on the main process does not get updated.
Dataloader(num_workers=N), where N is large, bottlenecks training with DDP… ie: it will be VERY slow or won’t work at all. This is a PyTorch limitation.
========
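If I understand the first point, it comes down to how Python's spawn start method works: the model object is pickled into each subprocess, so the subprocesses train a copy, not the parent's object. A minimal stdlib sketch of that copy semantics (a plain dict stands in for a model's state; no actual subprocess or Lightning API here):

```python
import pickle

# Parent-process "model" state.
model = {"weight": 0.0}

# What a spawned child process receives: a pickled copy.
child_model = pickle.loads(pickle.dumps(model))

# "Training" mutates only the child's copy.
child_model["weight"] = 1.0

print(model["weight"])  # → 0.0  (the parent's object is unchanged)
```

This is why frameworks that use `.spawn()` have to ship updated weights back (e.g. via a checkpoint) if the main process needs the trained model afterwards.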
I don't fully understand these two points. On the first one, about the model on the main process: my training/validation steps run in the subprocesses, so why would the stale model on the main process matter to me? On the second one, if a large `num_workers` is "a PyTorch limitation", why doesn't the "ddp" strategy suffer from the same issue?
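My current understanding (please correct me if this is wrong): with `strategy="ddp"`, Lightning relaunches the training script as one independent OS process per device, and each of those processes creates its dataloader workers the normal way; with `ddp_spawn`, the dataloader workers are nested under subprocesses spawned from the single parent interpreter, which is what the quoted docs say interacts badly with a large `num_workers`. Switching is just a `Trainer` argument (sketch against the Lightning 1.x API; assumes your own `LightningModule`/datamodule):

```python
from pytorch_lightning import Trainer

trainer = Trainer(
    accelerator="gpu",
    devices=2,
    strategy="ddp",         # relaunches the script once per GPU
    # strategy="ddp_spawn", # would instead spawn subprocesses from this one
)
```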
Thanks