
Question: Comparison to Nvidia GPU Operator + GPU Feature Discovery #9

Open
gadkins opened this issue Sep 12, 2023 · 2 comments

gadkins commented Sep 12, 2023

Apologies if Issues is the wrong place for my question, but I don't see a Discussions forum for this repo.

I've read your Medium article, which provides a nice summary of what problem nvshare is solving.

However, I also came across this blog post from VMware, which describes GPU virtualization in Kubernetes via NVIDIA's GPU Operator and GPU Feature Discovery. That setup adds labels to the Nodes, such as nvidia.com/vgpu.present=true, and facilitates fractional allocation of GPUs to Pods.

How does nvshare differ and/or what additional value does it provide?

grgalex (Owner) commented Sep 14, 2023

@gadkins

The GPU Operator and GPU Feature Discovery are auxiliary mechanisms that make it easier to manage GPUs in a K8s cluster. They make life easier by automatically installing the NVIDIA drivers and Device Plugin (in the case of the Operator) and by automatically adding labels/taints to nodes with GPUs (in the case of GPU Feature Discovery).
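For context, GPU Feature Discovery advertises node labels (e.g. nvidia.com/gpu.product, nvidia.com/gpu.count) that workloads can select on. A minimal sketch of a Pod using such a label; the label value and image are illustrative, not taken from any specific cluster:

```yaml
# Illustrative Pod spec: the nodeSelector key is a label that GPU Feature
# Discovery adds automatically; the value shown here is an example.
apiVersion: v1
kind: Pod
metadata:
  name: cuda-example
spec:
  nodeSelector:
    nvidia.com/gpu.product: Tesla-V100-SXM2-16GB  # added by GPU Feature Discovery
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04
      resources:
        limits:
          nvidia.com/gpu: 1  # resource advertised by the NVIDIA Device Plugin
```

Note that none of this changes how processes on the node actually share the GPU; it only affects scheduling.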

AFAIK, Nvidia offers two mechanisms for sharing a GPU between multiple containers. These are exposed to a Kubernetes cluster through the official device plugin [1].

TL;DR

  1. Multi-instance GPU (MIG) requires special hardware and chops a GPU into disjoint pieces, each with its own memory and computation units. There is no sharing between the pieces. Each process/container exclusively uses one or more pieces. If you want a process to use the whole GPU, it gets all the slices, so there is no sharing.
  2. "Nvidia Device Plugin GPU Sharing" lets processes/containers loose on the same GPU. They can cause each other to go OOM. Quite chaotic.

To get an experience akin to (2), in nvshare, turn the nvshare-scheduler OFF through the CLI. This will use the default "CUDA black-box" scheduling.

Instead of going OOM, processes may then thrash the GPU.

1. MIG (Multi-Instance-GPU)

This requires special hardware (Ampere architecture GPUs). The GPU's hardware is segmented in a way that allows the driver to offer "true" splits of the GPU as independent devices.

You can skim through the official docs [2] for an overview on how that works.
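On Kubernetes, MIG instances surface as their own extended resources that containers request exclusively. A hedged sketch, assuming a device plugin configuration that exposes per-profile resource names and an A100 partitioned into 1g.5gb instances:

```yaml
# Illustrative container spec fragment requesting one MIG slice; the exact
# resource name depends on the GPU model and the configured MIG profiles.
resources:
  limits:
    nvidia.com/mig-1g.5gb: 1  # one 1g.5gb MIG instance, used exclusively
```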

2. "Nvidia Device Plugin GPU Sharing"

NVIDIA device plugin 0.12.0 officially provides an option to enable sharing a GPU between multiple containers (https://developer.nvidia.com/blog/improving-gpu-utilization-in-kubernetes/).

  • They call the black-box NVIDIA scheduling mechanism "time-slicing"
  • CUDA 11.1 allows configuring the context-switch interval (albeit with only 4 discrete values: DEFAULT, SHORT, MEDIUM, LONG)
  • They simply advertise multiple nvidia.com/gpu devices for every physical GPU
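For reference, the device plugin's time-slicing is enabled through its config file; a sketch of such a config (the replica count is illustrative):

```yaml
# Illustrative NVIDIA device plugin config: each physical GPU is advertised
# as 4 nvidia.com/gpu devices; GPU memory is NOT partitioned or protected.
version: v1
sharing:
  timeSlicing:
    resources:
      - name: nvidia.com/gpu
        replicas: 4
```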

Memory is still the core problem:

Quoting them:

The tradeoffs with time-slicing are increased latency, jitter, and potential out-of-memory (OOM)
conditions when many different applications are time-slicing on the GPU.

This merely solves the 1-1 GPU-to-container assignment problem on K8s and does nothing to prevent OOM and friction between co-located apps.

I'll quote my thesis [3] (the abstract and first chapter are especially worth a read) on this very important distinction that we must always keep in mind when evaluating these alternative approaches:

While the problem of exclusive assignment of GPUs can be solved trivially
(for example by tweaking device-plugin to advertise a greater number of nvidia.com/gpu
than physical GPUs), the **CORE ISSUE** is that of managing the friction
between co-located tasks (how 2+ processes on the same node behave,
irrespective of Kubernetes) and that is hard to solve.

[1] https://github.com/NVIDIA/k8s-device-plugin
[2] https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html
[3] https://github.com/grgalex/nvshare/blob/main/grgalex-thesis.pdf

gadkins (Author) commented Sep 14, 2023

Great answer! Thank you!

Ahhh, I did not realize that the Nvidia device plugin for GPU sharing does not gracefully handle fair-sharing of memory.
