
Fabric: add support for 'auto' accelerator #20473

Draft
wants to merge 2 commits into base: master
Conversation

GeminiLn commented Dec 7, 2024

What does this PR do?

This PR adds support for --accelerator=auto in the Fabric CLI. When auto is passed as the accelerator, the code now dynamically resolves it to the best available hardware accelerator:

  • Uses cuda if GPUs are available.
  • Falls back to mps for Apple Silicon machines.
  • Defaults to cpu if no accelerators are available.

This ensures that users can specify --accelerator=auto without needing to manually detect hardware availability.
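The resolution order described above can be sketched as a small pure-Python helper. This is a hypothetical illustration, not the PR's actual code: the function name and the boolean parameters stand in for the real hardware availability checks (e.g. `torch.cuda.is_available()`):

```python
def resolve_accelerator(flag: str, cuda_available: bool, mps_available: bool) -> str:
    """Hypothetical sketch of the 'auto' resolution order described above."""
    if flag != "auto":
        return flag  # explicit choices pass through unchanged
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"
```

For example, `resolve_accelerator("auto", False, True)` resolves to `"mps"`, while an explicit `"cpu"` is returned as-is regardless of available hardware.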

Fixes Issue #20451

No breaking changes introduced.


Before submitting
  • Was this discussed/agreed via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

@github-actions github-actions bot added the fabric lightning.fabric.Fabric label Dec 7, 2024
lantiga (Collaborator) commented Dec 8, 2024

Thanks for the draft PR!

So if you don't specify --accelerator, Fabric will already auto-discover. That is, autodiscovery will happen if accelerator is None. I would avoid duplicating the logic, and instead make sure that "auto" follows the same path as None.
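One way to read this suggestion: normalize "auto" to None before the rest of the pipeline runs, so both values reach the existing auto-discovery path. A minimal sketch under that assumption (`normalize_accelerator` is a hypothetical helper, not an existing Fabric function):

```python
from typing import Optional

def normalize_accelerator(flag: Optional[str]) -> Optional[str]:
    """Map "auto" to None so the existing None auto-discovery path handles both."""
    return None if flag == "auto" else flag
```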

GeminiLn (Author) commented Dec 8, 2024

> Thanks for the draft PR!
>
> So if you don't specify --accelerator, Fabric will already auto-discover. That is, autodiscovery will happen if accelerator is None. I would avoid duplicating the logic, and instead make sure that "auto" follows the same path as None.

Thanks for the information!

I'm not sure if I understand it correctly. I ran a quick test on src/lightning/fabric/cli.py and noticed that if --accelerator is None, _get_num_processes calls the CPUAccelerator. Should auto follow the same logic? If so, it seems that simply including auto in _SUPPORTED_ACCELERATORS would be enough.
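The behavior described here can be modeled with a simplified dispatch. This is a hypothetical sketch of the flow under discussion, not Fabric's actual _get_num_processes; `cuda_count` stands in for the real CUDA device query:

```python
import os

def get_num_processes(accelerator, cuda_count=0):
    """Simplified, hypothetical model of the dispatch discussed above."""
    if accelerator == "cuda":
        return cuda_count
    # None falls through to the same branch as "cpu", so leaving
    # --accelerator unset never reaches the GPU branch at this point.
    return os.cpu_count() or 1
```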

lantiga (Collaborator) commented Dec 8, 2024

Auto-detection of the accelerator type (cpu vs cuda, etc.) will only happen if accelerator is None, so adding "auto" to the list will only make the value allowed; it will not trigger auto-detection.

What you are referring to is auto-detection of the number of devices. In theory, not specifying accelerator and devices should detect GPUs and set devices to the number of GPUs available. If that's not the case, we need to fix it.

GeminiLn (Author) commented Dec 8, 2024

Thanks for clarifying!

Here’s where I’m confused: if the auto-detection of the accelerator works correctly, I expect accelerator=cuda when _get_num_processes is called, which should then set the number of GPUs (I’m running the code on three NVIDIA L40S).

However, if I leave accelerator=None, _get_num_processes still sees accelerator=None. As a result, the CPUAccelerator is used, because None doesn't match any accelerator branch in _get_num_processes.

lantiga (Collaborator) commented Dec 8, 2024

Yes, you are right: the logic that handles the accelerator when None is provided is triggered later. We need to make sure we don't fall back to picking 1 device in that case (as we do for the cpu case), but instead defer to auto-detection.

GeminiLn (Author) commented Dec 8, 2024

In that case, which file should we update to support accelerator=auto? In src/lightning/fabric/cli.py, simply adding auto to _SUPPORTED_ACCELERATORS seems sufficient. I tested the code with only that change, and Fabric correctly chooses lightning.fabric.accelerators.cuda.CUDAAccelerator.
