Waiting for gpu node to be ready before scheduling pods using NVML #615

Open
easyrider14 opened this issue Nov 23, 2023 · 3 comments

Comments

@easyrider14

Hi everyone

I'm facing an issue with gpu-operator when scaling my Kubernetes cluster.
When a GPU node is added to the cluster, gpu-operator will, among other things, install the container runtime and drivers.
I have a daemonset which uses NVML, but it is scheduled on the newly added GPU node as soon as the node is available. At that point the driver is not ready, so initializing NVML fails. The container in my pod exits, but the pod is restarted rather than deleted and recreated, so NVML initialization still fails. Which criteria should I use in my daemonset definition to make sure my pod can initialize NVML and run correctly once it is scheduled on the node?

Thanks

@tariq1890
Contributor

You can consider adding the gpu-operator-validator as an init container to your daemonset. This way, the daemonset pod would block until the nvidia-driver-daemonset transitions to the Ready/Running state.

Sample snippet

      initContainers:
      # Init container that runs the operator's validator before the main container starts
      - name: driver-validation
        image: "nvcr.io/nvidia/cloud-native/gpu-operator-validator:v23.9.0"
        imagePullPolicy: IfNotPresent
        command: ['sh', '-c']
        args: ["nvidia-validator"]
        env:
          # Keep waiting until validation succeeds instead of failing immediately
          - name: WITH_WAIT
            value: "true"
          # Validate the driver component
          - name: COMPONENT
            value: driver
        securityContext:
          privileged: true
          seLinuxOptions:
            level: "s0"
        # Each volume name below needs a matching entry under the pod's volumes
        volumeMounts:
          - name: driver-install-path
            mountPath: /run/nvidia/driver
            mountPropagation: HostToContainer
          - name: run-nvidia-validations
            mountPath: /run/nvidia/validations
            mountPropagation: Bidirectional
          - name: host-root
            mountPath: /host
            readOnly: true
            mountPropagation: HostToContainer
          - name: host-dev-char
            mountPath: /host-dev-char
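
The snippet above mounts four named volumes, so the pod spec also has to define them. A sketch of matching `volumes` entries follows. The hostPath values are assumptions modeled on the mount paths and on the layout of the operator's own validator daemonset, so check them against the gpu-operator manifests for your version before relying on them:

```yaml
      volumes:
        # Assumed host location where the driver container exposes the driver root
        - name: driver-install-path
          hostPath:
            path: /run/nvidia/driver
        # Assumed host directory where the validator writes its status files
        - name: run-nvidia-validations
          hostPath:
            path: /run/nvidia/validations
            type: DirectoryOrCreate
        # Host root filesystem, mounted read-only in the init container
        - name: host-root
          hostPath:
            path: /
        # Assumed source for the /host-dev-char mount
        - name: host-dev-char
          hostPath:
            path: /dev/char
```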

@cdesiniotis
Contributor

@easyrider14 is your pod requesting a GPU using resource requests/limits (e.g. requesting an nvidia.com/gpu resource)? This is the recommended way for requesting GPUs in Kubernetes and would solve this issue. The pod would not get scheduled on the newly added node until the GPU device-plugin is up and running (which only starts after both the NVIDIA driver and NVIDIA Container Toolkit are installed). If using resource requests/limits is not an option for you, then something along the lines of what @tariq1890 suggested would work.
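
A minimal sketch of the resource-request approach, for illustration only. The container name and image are hypothetical placeholders; `nvidia.com/gpu` is the extended resource advertised by the NVIDIA device plugin:

```yaml
      containers:
      - name: nvml-agent                               # hypothetical container name
        image: registry.example.com/nvml-agent:latest  # placeholder image
        resources:
          limits:
            nvidia.com/gpu: 1   # pod stays Pending until a node advertises this resource
```

For extended resources, setting only the limit is sufficient because Kubernetes uses it as the request value. Note that the requested GPU is allocated to this pod for as long as it runs.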

@easyrider14
Author

easyrider14 commented Nov 28, 2023

> You can consider adding the gpu-operator-validator as an init container to your daemonset. This way, the daemonset pod would block until the nvidia-driver-daemonset transitions to the Ready/Running state.

Hi @tariq1890

I had already tried this after digging through the gpu-operator manifest files, but I still get the same result.
I ran a simple test with an initContainer that just waits 10 minutes before exiting (a plain sleep 10m on an alpine image).
When the initContainer exits, my container runs but still fails with no access to NVML. I thought the container would only be created after the initContainer finishes, but that does not seem to be the case. It looks as if the container is created up front but not started until the initContainer terminates. If I delete the pod, the container is recreated and restarted, and it has access to NVML right away.

@cdesiniotis I don't need or want resources to be reserved for this pod, as its main job is keeping the state of the node's available resources in an etcd database. It is not a continuously running workload, just an update of available RAM/CPU/GPU at regular intervals. I don't want to reserve and block resources for that.
