Update DDP tutorial for the correct order of set_device #1285

fegin · 2024-09-13T20:24:02Z

Summary: `set_device` should be called before `init_process_group`.


netlify · 2024-09-13T20:24:17Z

✅ Deploy Preview for pytorch-examples-preview canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | `b46fba4` |
| 🔍 Latest deploy log | https://app.netlify.com/sites/pytorch-examples-preview/deploys/66e49f64f48f9d0008b89436 |

Summary: Add "set_device" call to keep things consistent between all DDP tutorials. This was inspired by the following change in the PyTorch repo: pytorch/examples#1285 (review)

Test Plan: Ran tutorial with the applied changes and we see:

"""
Running basic DDP example on rank 3.
Running basic DDP example on rank 1.
Running basic DDP example on rank 2.
Running basic DDP example on rank 0.
Finished running basic DDP example on rank 0.
Finished running basic DDP example on rank 1.
Finished running basic DDP example on rank 3.
Finished running basic DDP example on rank 2.
Running DDP checkpoint example on rank 2.
Running DDP checkpoint example on rank 1.
Running DDP checkpoint example on rank 0.
Running DDP checkpoint example on rank 3.
Finished DDP checkpoint example on rank 0.
Finished DDP checkpoint example on rank 3.
Finished DDP checkpoint example on rank 1.
Finished DDP checkpoint example on rank 2.
Running DDP with model parallel example on rank 0.
Running DDP with model parallel example on rank 1.
Finished running DDP with model parallel example on rank 0.
Finished running DDP with model parallel example on rank 1.
"""

Summary:
1. Add "set_device" call to keep things consistent between all DDP tutorials. This was inspired by the following change in the PyTorch repo: pytorch/examples#1285 (review)
2. Fix up the tutorial and add additional prints when the model exits.

Test Plan: Ran tutorial with the applied changes and we see:

"""
Running basic DDP example on rank 3.
Running basic DDP example on rank 1.
Running basic DDP example on rank 2.
Running basic DDP example on rank 0.
Finished running basic DDP example on rank 0.
Finished running basic DDP example on rank 1.
Finished running basic DDP example on rank 3.
Finished running basic DDP example on rank 2.
Running DDP checkpoint example on rank 2.
Running DDP checkpoint example on rank 1.
Running DDP checkpoint example on rank 0.
Running DDP checkpoint example on rank 3.
Finished DDP checkpoint example on rank 0.
Finished DDP checkpoint example on rank 3.
Finished DDP checkpoint example on rank 1.
Finished DDP checkpoint example on rank 2.
Running DDP with model parallel example on rank 0.
Running DDP with model parallel example on rank 1.
Finished running DDP with model parallel example on rank 0.
Finished running DDP with model parallel example on rank 1.
"""

Summary:
1. Add "set_device" call to keep things consistent between all DDP tutorials. This was inspired by the following change in the PyTorch repo: pytorch/examples#1285 (review)
2. Fix up the tutorial and add additional prints when the model exits.

Test Plan: Ran tutorial with the applied changes.

Co-authored-by: Svetlana Karslioglu <[email protected]>

Update DDP tutorial for the correct order of set_device

b46fba4


facebook-github-bot added the cla signed label Sep 13, 2024

wz337 approved these changes Sep 13, 2024


fegin merged commit a308b4e into main Sep 17, 2024
7 of 8 checks passed

fegin deleted the chienchin/ddp_tutorial_fix branch September 17, 2024 07:15

c-p-i-o mentioned this pull request Oct 28, 2024

[doc] Fix to DDP tutorial pytorch/tutorials#3120

Merged

4 tasks
