InstaSlice injects the NVIDIA_VISIBLE_DEVICES environment variable into pod specs to assign containers to specific MIG slices. However, pod specs are immutable once created, and a pod is created before a decision about MIG slice allocation can be made. To work around this limitation, the InstaSlice webhook injects a reference to a ConfigMap into the pod spec. This indirection makes it possible to populate the ConfigMap later with the ID of the chosen MIG slice. Once the ConfigMap is ready, InstaSlice ungates the pod.
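To make the indirection concrete, here is a minimal sketch of the kind of JSON Patch a mutating webhook could return. It is not InstaSlice's actual code. The function name `build_env_patch`, the ConfigMap name, and the choice of `env[].valueFrom.configMapKeyRef` (rather than, say, `envFrom`) are assumptions made for illustration, and the pod is assumed to be gated already.

```python
def build_env_patch(pod: dict, configmap_name: str,
                    key: str = "NVIDIA_VISIBLE_DEVICES") -> list:
    """Return JSON Patch operations that point every container's
    NVIDIA_VISIBLE_DEVICES variable at a key in a ConfigMap.

    Hypothetical helper: the ConfigMap name and key are illustrative only.
    """
    env_var = {
        "name": "NVIDIA_VISIBLE_DEVICES",
        # The value is resolved from the ConfigMap when the container is
        # started, not when the pod object is created.
        "valueFrom": {"configMapKeyRef": {"name": configmap_name, "key": key}},
    }
    ops = []
    for i, container in enumerate(pod.get("spec", {}).get("containers", [])):
        if "env" in container:
            # Append to the existing env list.
            ops.append({"op": "add",
                        "path": f"/spec/containers/{i}/env/-",
                        "value": env_var})
        else:
            # Create the env list with a single entry.
            ops.append({"op": "add",
                        "path": f"/spec/containers/{i}/env",
                        "value": [env_var]})
    return ops


if __name__ == "__main__":
    import json
    pod = {"spec": {"containers": [{"name": "main"}]}}
    print(json.dumps(build_env_patch(pod, "example-slice-config"), indent=2))
```

The point of the sketch is that the pod spec only ever carries a reference. The slice ID itself lives in the ConfigMap and can be written after the pod object exists.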
Because the MIG slice ID is read from the ConfigMap after a delay, a user or process with permission to create, update, patch, or delete ConfigMaps in the pod's namespace could alter the content of the ConfigMap before the container runtime configures and starts the container. The container could then gain access to unintended slices and possibly interfere with other pods.
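As a rough aid for auditing this exposure, the sketch below checks whether a set of RBAC rules (in the shape of the `rules` list of a Role) would let its subjects modify ConfigMaps. The helper name `can_write_configmaps` is hypothetical, and the check is deliberately conservative: it ignores `resourceNames` restrictions and treats any matching write verb as a risk.

```python
# Verbs named in the issue as sufficient to tamper with the ConfigMap.
WRITE_VERBS = {"create", "update", "patch", "delete"}


def can_write_configmaps(rules: list) -> bool:
    """Return True if any rule grants a write verb on ConfigMaps.

    `rules` is a list of dicts with the RBAC PolicyRule fields
    `apiGroups`, `resources`, and `verbs`. ConfigMaps live in the
    core API group, which RBAC spells as the empty string.
    """
    for rule in rules:
        groups = rule.get("apiGroups", [])
        resources = rule.get("resources", [])
        verbs = rule.get("verbs", [])

        in_core_group = "" in groups or "*" in groups
        covers_configmaps = "configmaps" in resources or "*" in resources
        grants_write = "*" in verbs or bool(WRITE_VERBS & set(verbs))

        if in_core_group and covers_configmaps and grants_write:
            return True
    return False


if __name__ == "__main__":
    read_only = [{"apiGroups": [""], "resources": ["configmaps"],
                  "verbs": ["get", "list"]}]
    print(can_write_configmaps(read_only))
```

Any Role or ClusterRole for which this returns true, once bound in the pod's namespace, identifies subjects who could change the slice ID before the container starts.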
For the intended use case, i.e., backend clusters with no or limited user access, this is acceptable. However, we should document InstaSlice's dependency on unaltered ConfigMaps in general. Note that typical GPU clusters with standard deployments of the NVIDIA GPU operator are susceptible to similar abuse, as they permit access to GPUs without requesting an nvidia.com/gpu resource in the first place. See NVIDIA/k8s-device-plugin#61 for details.
Alternative approaches could be considered to remove the dependency on a ConfigMap (with other drawbacks):
- InstaSlice could target (i.e., intercept) deployments or jobs instead of pods.
- InstaSlice could delete and recreate the pod once an allocation decision has been made.