
Conversation

@adrianosantospb

Changing the previous CustomDataParallel to DistributedDataParallel to improve the training speed.

```diff
 def save_checkpoint(model, name):
-    if isinstance(model, CustomDataParallel):
+    if isinstance(model, torch.nn.parallel.DistributedDataParallel):
         torch.save(model.module.model.state_dict(), os.path.join(opt.saved_path, name))
```
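One practical wrinkle on the save/load side: a `state_dict` taken from the wrapper itself (rather than from `.module`) carries a `module.` prefix on every key, which then fails to load into a plain model. A minimal sketch of the usual key-stripping workaround, with an illustrative dict standing in for a real checkpoint (the helper name is hypothetical, not from this PR):

```python
# Hypothetical helper: drop the "module." prefix that DP/DDP wrappers
# add to every state_dict key, so the checkpoint loads into an
# unwrapped model. Plain ints stand in for real tensors here.
def strip_module_prefix(state_dict):
    prefix = "module."
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in state_dict.items()
    }

wrapped = {"module.backbone.weight": 1, "module.head.bias": 2}
assert strip_module_prefix(wrapped) == {"backbone.weight": 1, "head.bias": 2}
```

Saving `model.module.model.state_dict()` as the patched function does avoids the prefix in the first place; the stripper is only needed for checkpoints saved the other way.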
@zylo117 (Owner) Aug 25, 2020

I'm not sure whether DDP's `.module` is the real model, the way it is with DP.

@zylo117 (Owner) Aug 25, 2020

AFAIK, it's `model.model.state_dict()` for DDP, the same as for the ordinary model. Or are both OK?

@zylo117 (Owner) commented Aug 25, 2020

Hi, thanks for your contribution.
Have you tested it by training and then successfully loading the last saved weights? I think the weight saving & loading could be a problem.
Another concern is that the console output will be printed N times (N = the number of GPUs) and the TensorBoard events will be recorded N times as well.
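The duplicated printing/logging is usually handled by guarding those calls so only the rank-0 process runs them. A minimal sketch, assuming the `RANK` environment variable set by the standard PyTorch launchers (the helper names are illustrative, not from this PR):

```python
import os

# Sketch: under torchrun / torch.distributed.launch, each process gets
# a RANK env var; only rank 0 should print and write TensorBoard logs.
# Defaulting to "0" keeps single-GPU (non-distributed) runs working.
def is_main_process():
    return int(os.environ.get("RANK", "0")) == 0

def log(msg):
    if is_main_process():
        print(msg)  # likewise, guard writer.add_scalar(...) calls
```

With this guard, an N-GPU run would print each message and record each TensorBoard scalar once instead of N times.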

@adrianosantospb (Author) commented Aug 25, 2020

It's a pleasure. You did a great job. Yes, I tested it this afternoon, but I'm training a big model now. Tomorrow I will run more tests to check; this model is running on my machine at work.



3 participants