warning added for single GPU and NCCL #1226

Merged 1 commit on Feb 3, 2024

Jaiaid
Contributor

@Jaiaid Jaiaid commented Feb 1, 2024

  • Linked the relevant issue to hint to single-GPU users what to do if they hit an NCCL internal check error
  • Mentioned the required NCCL version


netlify bot commented Feb 1, 2024

Deploy Preview for pytorch-examples-preview canceled.

Name Link
🔨 Latest commit 493da93
🔍 Latest deploy log https://app.netlify.com/sites/pytorch-examples-preview/deploys/65bbcefbaf65700007818b3b

@Jaiaid
Contributor Author

Jaiaid commented Feb 1, 2024

Re-committing this change with a slight modification: it now emits a warning rather than an assertion. The GPU count variable is also set to one so that the later world_size initialization works, although I think this is semantically confusing on a CPU-only platform.
This is meant to spare users some possible frustration when using the default dist backend, which is NCCL.
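
The behavior described above can be sketched roughly as follows. This is only an illustration, not the merged code: the helper name `resolve_ngpus`, its parameters, the warning wording, and the `nodes` variable are all hypothetical, and the detected GPU count is passed in as a plain integer so the sketch runs without PyTorch installed.

```python
import warnings


def resolve_ngpus(detected_gpus: int, dist_backend: str = "nccl") -> int:
    """Hypothetical helper: warn instead of asserting, then return a usable GPU count."""
    if dist_backend == "nccl" and detected_gpus == 1:
        # A warning (not an assertion) lets the run continue while flagging the risk.
        warnings.warn(
            "Single GPU detected with the nccl backend; NCCL internal checks "
            "may fail in this setup. Consider passing a different --dist-backend."
        )
    # Fall back to 1 when no GPU is detected so that a later world_size
    # computation does not collapse to zero (illustrative assumption).
    return max(detected_gpus, 1)


ngpus_per_node = resolve_ngpus(detected_gpus=0)
nodes = 2  # hypothetical number of nodes
world_size = ngpus_per_node * nodes
print(world_size)
```

The fallback to one is what makes the count read oddly on a CPU-only machine: the variable no longer means "number of GPUs", only "number of workers per node".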

@msaroufim msaroufim self-requested a review February 3, 2024 05:58
@msaroufim msaroufim merged commit 8c246ba into pytorch:main Feb 3, 2024
7 checks passed
@sebastian-burlacu

Does this warning imply that the default backend is nccl? If so, the '--dist-backend' argument could be documented better, as I would assume the first example given, with no --dist-backend specified, should work fine without giving a warning. Since this is a warning, I am also wary of the word 'requires', which would imply that it errors out if used incorrectly, yet it still seems to work fine?

@Jaiaid
Contributor Author

Jaiaid commented Apr 3, 2024

@sebastian-burlacu
Yes, the default backend is nccl; see line 68.
This warning is just meant to give users context on where things may go wrong.
I may have stated the NCCL version requirement incorrectly, but this error did happen to me, and the relevant issue link is provided with the warning.
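
For readers following the documentation question, here is a minimal sketch of how a `--dist-backend` option that defaults to nccl might be declared so that the default shows up in `--help`. The parser description and help wording are assumptions for illustration, not a copy of the script's actual argument definition.

```python
import argparse

parser = argparse.ArgumentParser(description="distributed training sketch")
parser.add_argument(
    "--dist-backend",
    default="nccl",
    type=str,
    # %(default)s is expanded by argparse, so the help text always
    # states the real default value.
    help="distributed backend (default: %(default)s)",
)

# No flag given: the default applies.
args = parser.parse_args([])
print(args.dist_backend)

# Explicit override on the command line.
override = parser.parse_args(["--dist-backend", "gloo"])
print(override.dist_backend)
```

Stating the default in the help text would make it clearer why a run without `--dist-backend` can still trigger the single-GPU warning.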
