
Fabric: add support for 'auto' accelerator #20473

Draft
wants to merge 2 commits into base: master
Conversation

GeminiLn commented Dec 7, 2024

What does this PR do?

This PR adds support for --accelerator=auto in the Fabric CLI. When auto is passed as the accelerator, the code now dynamically resolves it to the best available hardware accelerator:

  • Uses cuda if GPUs are available.
  • Falls back to mps for Apple Silicon machines.
  • Defaults to cpu if no accelerators are available.

This ensures that users can specify --accelerator=auto without needing to manually detect hardware availability.
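The resolution order described above can be sketched as a small pure-Python helper. This is a hypothetical illustration, not the PR's actual code: the function name and the boolean parameters stand in for the real hardware availability checks (e.g. `torch.cuda.is_available()`):

```python
def resolve_accelerator(flag: str, cuda_available: bool, mps_available: bool) -> str:
    """Hypothetical sketch of the 'auto' resolution order described above."""
    if flag != "auto":
        return flag  # explicit choices pass through unchanged
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"
```

For example, `resolve_accelerator("auto", False, True)` resolves to `"mps"`, while an explicit `"cpu"` is returned as-is regardless of available hardware.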

Fixes Issue #20451

No breaking changes introduced.


Before submitting
  • Was this discussed/agreed via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

@github-actions github-actions bot added the fabric lightning.fabric.Fabric label Dec 7, 2024
lantiga (Collaborator) commented Dec 8, 2024

Thanks for the draft PR!

So if you don't specify --accelerator, Fabric will already auto-discover. That is, autodiscovery will happen if accelerator is None. I would avoid duplicating the logic, and instead make sure that "auto" follows the same path as None.
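One way to read this suggestion: normalize "auto" to None before the rest of the pipeline runs, so both values reach the existing auto-discovery path. A minimal sketch under that assumption (`normalize_accelerator` is a hypothetical helper, not an existing Fabric function):

```python
from typing import Optional

def normalize_accelerator(flag: Optional[str]) -> Optional[str]:
    """Map "auto" to None so the existing None auto-discovery path handles both."""
    return None if flag == "auto" else flag
```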

GeminiLn (Author) commented Dec 8, 2024

> Thanks for the draft PR!
>
> So if you don't specify --accelerator, Fabric will already auto-discover. That is, autodiscovery will happen if accelerator is None. I would avoid duplicating the logic, and instead make sure that "auto" follows the same path as None.

Thanks for the information!

I'm not sure if I understand it correctly. I ran a quick test on src/lightning/fabric/cli.py and noticed that if --accelerator is None, _get_num_processes calls the CPUAccelerator. Should auto follow the same logic? If so, it seems that simply including auto in _SUPPORTED_ACCELERATORS would be enough.
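The behavior described here can be modeled with a simplified dispatch. This is a hypothetical sketch of the flow under discussion, not Fabric's actual _get_num_processes; `cuda_count` stands in for the real CUDA device query:

```python
import os

def get_num_processes(accelerator, cuda_count=0):
    """Simplified, hypothetical model of the dispatch discussed above."""
    if accelerator == "cuda":
        return cuda_count
    # None falls through to the same branch as "cpu", so leaving
    # --accelerator unset never reaches the GPU branch at this point.
    return os.cpu_count() or 1
```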

lantiga (Collaborator) commented Dec 8, 2024

Auto-detection of the accelerator type (cpu vs cuda, etc.) will only happen if accelerator is None, so adding "auto" to the list will only make the value allowed; it will not trigger auto-detection.

What you are referring to is auto-detection of the number of devices. In theory, not specifying accelerator and devices should detect GPUs and set devices to the number of GPUs available. If that's not the case, we need to fix it.

GeminiLn (Author) commented Dec 8, 2024

Thanks for clarifying!

Here’s where I’m confused: if the auto-detection of the accelerator works correctly, I expect accelerator=cuda when _get_num_processes is called, which should then set the number of GPUs (I’m running the code on three NVIDIA L40S).

However, if I leave accelerator=None, _get_num_processes still sees accelerator=None. As a result, the CPUAccelerator is used, because None doesn't match any accelerator branch in _get_num_processes.

lantiga (Collaborator) commented Dec 8, 2024

Yes, you are right: the logic that handles the accelerator when None is provided is triggered later. We need to make sure we don't fall back to picking 1 device in that case (as we do for the cpu case), but instead defer to auto-detection.

GeminiLn (Author) commented Dec 8, 2024

In that case, which file should we update to support accelerator=auto? In src/lightning/fabric/cli.py, simply adding auto to _SUPPORTED_ACCELERATORS seems sufficient. I tested the code with only that change, and Fabric correctly chooses lightning.fabric.accelerators.cuda.CUDAAccelerator.
