Why ddp is better than ddp_spawn #12253
Unanswered
binzhougithub
asked this question in
DDP / multi-GPU / multi-node
Replies: 1 comment
-
Same question here. Does anyone have any insight into this?
-
I am currently using the ddp_spawn strategy, but it seems that ddp is better, because I found the following on this page:
========
We use DDP this way because ddp_spawn has a few limitations (due to Python and PyTorch):
Since .spawn() trains the model in subprocesses, the model on the main process does not get updated.
Dataloader(num_workers=N), where N is large, bottlenecks training with DDP… ie: it will be VERY slow or won’t work at all. This is a PyTorch limitation.
========
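If I understand the first point, it comes down to how Python's spawn start method works: the model object is pickled into each subprocess, so the subprocesses train a copy, not the parent's object. A minimal stdlib sketch of that copy semantics (a plain dict stands in for a model's state; no actual subprocess or Lightning API here):

```python
import pickle

# Parent-process "model" state.
model = {"weight": 0.0}

# What a spawned child process receives: a pickled copy.
child_model = pickle.loads(pickle.dumps(model))

# "Training" mutates only the child's copy.
child_model["weight"] = 1.0

print(model["weight"])  # → 0.0  (the parent's object is unchanged)
```

This is why frameworks that use `.spawn()` have to ship updated weights back (e.g. via a checkpoint) if the main process needs the trained model afterwards.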
I don't fully understand these two points. On the first one, about the model on the main process: my training/validation steps run in the subprocesses, so why would the stale model on the main process matter to me? On the second one, if a large `num_workers` is "a PyTorch limitation", why doesn't the "ddp" strategy suffer from the same issue?
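My current understanding (please correct me if this is wrong): with `strategy="ddp"`, Lightning relaunches the training script as one independent OS process per device, and each of those processes creates its dataloader workers the normal way; with `ddp_spawn`, the dataloader workers are nested under subprocesses spawned from the single parent interpreter, which is what the quoted docs say interacts badly with a large `num_workers`. Switching is just a `Trainer` argument (sketch against the Lightning 1.x API; assumes your own `LightningModule`/datamodule):

```python
from pytorch_lightning import Trainer

trainer = Trainer(
    accelerator="gpu",
    devices=2,
    strategy="ddp",         # relaunches the script once per GPU
    # strategy="ddp_spawn", # would instead spawn subprocesses from this one
)
```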
Thanks