
Commit daa04d5

rename AKS to Azure Kubernetes Service

Signed-off-by: vsoch <[email protected]>

1 parent: 379e7db

File tree

1 file changed (+1, −1 lines)


_posts/2025/2025-02-22-rootless-usernetes-gpu.md

+1 −1
@@ -9,7 +9,7 @@ This is a first prototype to get GPU devices working in [User-space Kubernetes](
 
 We want to test User-space Kubernetes "Usernetes" ability to run a GPU workload, and compare between Kubernetes (as provided by a cloud) and the equivalent user-space setup deployed with the same resources on the VM equivalent. Google Cloud has excellent tooling for deploying GPU and installed drivers for GKE, so I was able to get this [vanilla setup](https://github.com/converged-computing/flux-usernetes/tree/main/google/experiment/mnist-gpu/test/gke) working and tested in under an hour. The setup of the same stack, but on user-space Kubernetes on Compute Engine deployed with a custom VM base on Terraform, would prove to be more challenging.
 
-I've designed various driver installers for previous work, including [infiniband on AKS](https://github.com/converged-computing/aks-infiniband-install) and more experimental ones like [deploying a Flux instance alongside the Kubelet](https://github.com/converged-computing/flux-distribute). NVIDIA GPU drivers are typically installed in a similar fashion, in the simplest case with [nvidia device plugin](https://github.com/NVIDIA/k8s-device-plugin/blob/main/deployments/static/nvidia-device-plugin.yml) but now NVIDIA has exploded their software and Kubernetes tooling so everything but the kitchen sink is installed with the [GPU Operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html#operator-install-guide). Getting this working in user-space was uncharted territory, because we had two layers to work through - first the physical node to the rootless docker node (control plane or kubelet) and then from that node to the container in a pod deployed by containerd. Even just for the case of one layer of abstraction, I found many unsolved issues on GitHub and no single source of truth for how to do it. Needless to say, I wasn't sure about the complexity that would be warranted to get this working, or if I could do it at all.
+I've designed various driver installers for previous work, including [infiniband on Azure Kubernetes Service](https://github.com/converged-computing/aks-infiniband-install) and more experimental ones like [deploying a Flux instance alongside the Kubelet](https://github.com/converged-computing/flux-distribute). NVIDIA GPU drivers are typically installed in a similar fashion, in the simplest case with [nvidia device plugin](https://github.com/NVIDIA/k8s-device-plugin/blob/main/deployments/static/nvidia-device-plugin.yml) but now NVIDIA has exploded their software and Kubernetes tooling so everything but the kitchen sink is installed with the [GPU Operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html#operator-install-guide). Getting this working in user-space was uncharted territory, because we had two layers to work through - first the physical node to the rootless docker node (control plane or kubelet) and then from that node to the container in a pod deployed by containerd. Even just for the case of one layer of abstraction, I found many unsolved issues on GitHub and no single source of truth for how to do it. Needless to say, I wasn't sure about the complexity that would be warranted to get this working, or if I could do it at all.
 
 ### Resources and Cloud
 

Comments (0)