Support for clusters with AMD GPUs? #2402

Open
lasuomela opened this issue May 31, 2024 · 3 comments
Labels
enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@lasuomela

Hello,

I would like to run Habitat on a cluster with AMD Instinct MI250X GPUs. However, at least the version installed with conda (habitat-sim=0.3.0 [withbullet headless]) fails to run with the error

Platform::WindowlessEglApplication::tryCreateContext(): unable to find CUDA device 0 among 9 EGL devices in total

From the docs it is not completely clear whether the conda packages are built with CUDA support, so I'm wondering whether building Habitat from source without CUDA would work with the AMD GPUs.

However, reading through #1511, I get the impression that Habitat actually depends on the NVIDIA OpenGL driver. Is this so, or is there a way to run Habitat without NVIDIA GPUs?

@lasuomela
Author

Answering my own question after a bit of digging: it seems that AMD GPUs / ROCm do not provide a way to map a GPU's 'CUDA device index' to its 'EGL device index'.

See:
https://doc.magnum.graphics/magnum/classMagnum_1_1Platform_1_1WindowlessEglApplication.html#Platform-WindowlessEglApplication-device-selection

The Magnum library needs this mapping to place the simulator instance on the same GPU as PyTorch, so enabling AMD support seems unlikely.
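To make the failure mode concrete, here is a simplified Python model of the lookup described above. It is a sketch under stated assumptions, not Magnum's actual code: it assumes device selection walks the enumerated EGL devices and picks the one whose CUDA ordinal (as exposed by NVIDIA's `EGL_NV_device_cuda` extension via `EGL_CUDA_DEVICE_NV`) matches the requested CUDA device. The `EglDevice` class and `find_egl_device_for_cuda` function are hypothetical names chosen for illustration. On a driver that does not report a CUDA ordinal, no device ever matches, which produces an error of the same shape as the one in the original report.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class EglDevice:
    """Hypothetical stand-in for one device returned by EGL enumeration."""
    egl_index: int
    vendor: str
    # CUDA ordinal as reported through EGL_NV_device_cuda (EGL_CUDA_DEVICE_NV).
    # Assumed to be None when the driver lacks that extension, e.g. on AMD.
    cuda_device: Optional[int] = None


def find_egl_device_for_cuda(devices: Sequence[EglDevice], cuda_device: int) -> int:
    """Return the EGL device index whose CUDA ordinal equals cuda_device."""
    for dev in devices:
        if dev.cuda_device == cuda_device:
            return dev.egl_index
    # No enumerated EGL device advertises the requested CUDA ordinal.
    raise RuntimeError(
        f"unable to find CUDA device {cuda_device} "
        f"among {len(devices)} EGL devices in total"
    )


# Hypothetical NVIDIA node: EGL order need not match CUDA order,
# which is why an explicit mapping is needed at all.
nvidia = [EglDevice(0, "NVIDIA", cuda_device=1), EglDevice(1, "NVIDIA", cuda_device=0)]
print(find_egl_device_for_cuda(nvidia, 0))  # prints 1

# Hypothetical AMD node with nine EGL devices, none reporting a CUDA ordinal.
amd = [EglDevice(i, "AMD") for i in range(9)]
try:
    find_egl_device_for_cuda(amd, 0)
except RuntimeError as err:
    print(err)  # unable to find CUDA device 0 among 9 EGL devices in total
```

In this model, supporting AMD would require some other source for the `cuda_device` field (for example, matching by PCI bus ID), which is the kind of driver-side functionality the question below asks about.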

@erikwijmans, could you comment? I see you wrote the Magnum code for NVIDIA GPU device selection. Would similar functionality be possible with the Mesa driver?

Br,
Lauri

@aclegg3
Contributor

aclegg3 commented Jul 10, 2024

Hey @lasuomela

@erikwijmans is not actively working on the project at this time.

I also don't see much motivation for our team to support AMD GPUs in the foreseeable future.

However, thanks for pointing out the primary issue! I would love to see you or someone else solve this problem and unblock the feature, but for now I don't think we'll have the bandwidth to do so.

@aclegg3 added the enhancement and help wanted labels Jul 10, 2024
@lasuomela
Author

All right, I see. I don't have the time to pursue this either, but at least this issue makes the concern visible.

Thanks!
-Lauri
