Update DDP tutorial for the correct order of set_device #1285

Merged (1 commit) on Sep 17, 2024

fegin (Contributor) commented Sep 13, 2024

Summary:
`set_device` should be called before `init_process_group`.

netlify bot commented Sep 13, 2024

Deploy Preview for pytorch-examples-preview canceled.

🔨 Latest commit: b46fba4
🔍 Latest deploy log: https://app.netlify.com/sites/pytorch-examples-preview/deploys/66e49f64f48f9d0008b89436

fegin merged commit a308b4e into main on Sep 17, 2024
7 of 8 checks passed
fegin deleted the chienchin/ddp_tutorial_fix branch on September 17, 2024 07:15
c-p-i-o added a commit to pytorch/tutorials that referenced this pull request Oct 28, 2024
Summary:
Add "set_device" call to keep things consistent between all DDP
tutorials.
This was inspired by the following change in the PyTorch repo:
pytorch/examples#1285 (review)

Test Plan:
Ran tutorial with the applied changes and we see:
"""
Running basic DDP example on rank 3.
Running basic DDP example on rank 1.
Running basic DDP example on rank 2.
Running basic DDP example on rank 0.
Finished running basic DDP example on rank 0.
Finished running basic DDP example on rank 1.
Finished running basic DDP example on rank 3.
Finished running basic DDP example on rank 2.
Running DDP checkpoint example on rank 2.
Running DDP checkpoint example on rank 1.
Running DDP checkpoint example on rank 0.
Running DDP checkpoint example on rank 3.
Finished DDP checkpoint example on rank 0.
Finished DDP checkpoint example on rank 3.
Finished DDP checkpoint example on rank 1.
Finished DDP checkpoint example on rank 2.
Running DDP with model parallel example on rank 0.
Running DDP with model parallel example on rank 1.
Finished running DDP with model parallel example on rank 0.
Finished running DDP with model parallel example on rank 1.
"""
c-p-i-o added a commit to pytorch/tutorials that referenced this pull request Oct 29, 2024
c-p-i-o added a commit to pytorch/tutorials that referenced this pull request Oct 29, 2024
Summary:
1. Add "set_device" call to keep things consistent between all DDP
tutorials.
This was inspired by the following change in the PyTorch repo:
pytorch/examples#1285 (review)
2. Fix up the tutorial and add additional prints when the model exits.

Test Plan:
Ran tutorial with the applied changes and we see:
"""
Running basic DDP example on rank 3.
Running basic DDP example on rank 1.
Running basic DDP example on rank 2.
Running basic DDP example on rank 0.
Finished running basic DDP example on rank 0.
Finished running basic DDP example on rank 1.
Finished running basic DDP example on rank 3.
Finished running basic DDP example on rank 2.
Running DDP checkpoint example on rank 2.
Running DDP checkpoint example on rank 1.
Running DDP checkpoint example on rank 0.
Running DDP checkpoint example on rank 3.
Finished DDP checkpoint example on rank 0.
Finished DDP checkpoint example on rank 3.
Finished DDP checkpoint example on rank 1.
Finished DDP checkpoint example on rank 2.
Running DDP with model parallel example on rank 0.
Running DDP with model parallel example on rank 1.
Finished running DDP with model parallel example on rank 0.
Finished running DDP with model parallel example on rank 1.
"""
svekars added a commit to pytorch/tutorials that referenced this pull request Oct 30, 2024
Summary:
1. Add "set_device" call to keep things consistent between all DDP
tutorials.
This was inspired by the following change in the PyTorch repo:
pytorch/examples#1285 (review)
2. Fix up the tutorial and add additional prints when the model exits.

Test Plan:
Ran tutorial with the applied changes.
"""

Co-authored-by: Svetlana Karslioglu <[email protected]>
svekars added a commit to pytorch/tutorials that referenced this pull request Oct 31, 2024