
How to configure nvidia-container-runtime to expose only certain GPUs from the host to docker #836

Open
lordofire opened this issue Dec 16, 2024 · 0 comments


lordofire commented Dec 16, 2024

Hi there,

In our use case, we have one Kubernetes node (custom-designed hardware) with 3 GPUs: 2 used for container workloads and 1 used for display. In the current setup, all three GPUs are exposed to containers by default. I would like to know how to configure nvidia-container-runtime and Docker so that only 2 GPUs are exposed by default to any pod scheduled on this node. Specifically:

  1. When the nvidia-device-plugin advertises `nvidia.com/gpu` in the node's capacity, it should report 2 instead of 3.
  2. When a pod sets `NVIDIA_VISIBLE_DEVICES=all` on this node, it should see only 2 GPUs instead of 3.

Note that we cannot simply remove that third GPU from the node, since it is still needed for non-container GPU workloads (display).
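For context, here is a sketch of how GPU visibility can be limited per container today with `NVIDIA_VISIBLE_DEVICES` or Docker's `--gpus` flag. The GPU UUIDs below are placeholders, and the assumption that indices 0 and 1 are the compute GPUs would need to be verified on the actual node; this also only covers a single `docker run`, not the node-wide default that this issue is asking about.

```shell
# List GPUs with their UUIDs; verify which two are the compute GPUs.
nvidia-smi -L

# Expose only the two compute GPUs to a container by UUID
# (UUIDs are more stable than indices across reboots).
# GPU-xxxxxxxx / GPU-yyyyyyyy are placeholders for real UUIDs.
docker run --rm \
  -e NVIDIA_VISIBLE_DEVICES=GPU-xxxxxxxx,GPU-yyyyyyyy \
  nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

# Equivalent using Docker's native --gpus flag with device indices,
# assuming indices 0 and 1 are the compute GPUs:
docker run --rm --gpus '"device=0,1"' \
  nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```

The open question is how to make this restriction the default for all pods on the node, so that `NVIDIA_VISIBLE_DEVICES=all` resolves to only the two compute GPUs and the device plugin reports a capacity of 2.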

I have searched this repo as well as elsewhere online, but have not found a good solution so far. Thanks in advance for the help.
Jianan.
