Failed to get sandbox runtime: no runtime for nvidia is configured #432

Bec-k · 2022-11-02T16:49:06Z

The template below is mostly useful for bug reports and support questions. Feel free to remove anything which doesn't apply to you and add more information where it makes sense.

1. Quick Debug Checklist

Are you running on an Ubuntu 18.04 node?
Are you running Kubernetes v1.13+?
Are you running Docker (>= 18.06) or CRIO (>= 1.13+)?
Do you have i2c_core and ipmi_msghandler loaded on the nodes?
Did you apply the CRD (kubectl describe clusterpolicies --all-namespaces)

1. Issue or feature description

nov 02 18:00:58 beck containerd[10237]: time="2022-11-02T18:00:58.738797825+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:gpu-feature-discovery-qfjgk,Uid:02c7d4ad-db02-4145-846b-616a94416008,Namespace:gpu-operator,Attempt:2,} failed, error" error="failed to get sandbox runtime: no runtime for \"nvidia\" is configured"

2. Steps to reproduce the issue

3. Information to attach (optional if deemed irrelevant)

(base) beck@beck:/$ ls -la /run/nvidia/
total 4
drwxr-xr-x  4 root root  100 nov  2 18:48 .
drwxr-xr-x 39 root root 1140 nov  2 18:47 ..
drwxr-xr-x  2 root root   40 nov  2 17:59 driver
-rw-r--r--  1 root root    7 nov  2 18:48 toolkit.pid
drwxr-xr-x  2 root root   80 nov  2 18:48 validations

Driver folder is empty:

(base) beck@beck:/$ ls -la /run/nvidia/driver/
total 0
drwxr-xr-x 2 root root 40 nov  2 17:59 .
drwxr-xr-x 4 root root 80 nov  2 18:48 ..

The text was updated successfully, but these errors were encountered:

Bec-k · 2022-11-02T16:50:29Z

(base) beck@beck:/$ sudo ctr run --rm -t \
    --runc-binary=/usr/bin/nvidia-container-runtime \
    --env NVIDIA_VISIBLE_DEVICES=all \
    docker.io/nvidia/cuda:11.0.3-base-ubuntu20.04 \
    cuda-11.0.3-base-ubuntu20.04 nvidia-smi
Wed Nov  2 16:50:04 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   53C    P0    46W /  N/A |    601MiB /  8192MiB |      9%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

Bec-k · 2022-11-02T16:51:37Z

When i launch nvidia/cuda image via containerd cli, it is correctly detects and outputs my Nvidia GeForce video card, but for some reason, it doesn't see inside pods when deployed via helm.

shivamerla · 2022-11-02T16:53:34Z

Can you run kubectl get pods -n gpu-operator to show which pods are running. If you deployed with driver enabled, it takes 3-5 minutes for the drivers to be installed and followed by nvidia runtime setup. If you have already installed them on the host, please specify --set driver.enabled=false --set toolkit.enabled=false.

Bec-k · 2022-11-02T16:53:37Z

I was checking /etc/containerd/config.toml , it is changing it contantly back and forth.
containerd is always gets restarted by itself, because it fails to cleanup sandboxes and dead shims.

Bec-k · 2022-11-02T16:54:08Z

(base) beck@beck:/$ kubectl get pods -n gpu-operator
NAME                                                              READY   STATUS            RESTARTS      AGE
gpu-feature-discovery-w7vk6                                       1/1     Running           0             6m23s
gpu-operator-59b9d49c6f-7282l                                     1/1     Running           0             6m41s
nvidia-container-toolkit-daemonset-9rvz8                          1/1     Running           7 (67s ago)   5m56s
nvidia-cuda-validator-7mp9j                                       0/1     Init:0/1          0             4m4s
nvidia-dcgm-exporter-2ktzc                                        0/1     PodInitializing   0             6m24s
nvidia-device-plugin-daemonset-wvvh4                              0/1     PodInitializing   0             5m57s
nvidia-gpu-operator-node-feature-discovery-master-68495df8t9vd7   1/1     Running           0             6m41s
nvidia-gpu-operator-node-feature-discovery-worker-8gc88           1/1     Running           0             6m40s
nvidia-gpu-operator-node-feature-discovery-worker-stwpp           1/1     Running           9 (26s ago)   5m58s
nvidia-operator-validator-ptdgd                                   0/1     Init:Error        0             5m55s

shivamerla · 2022-11-02T16:55:59Z

you can disable toolkit as well by editing kubectl edit clusterpolicy and setting toolkit.enabled=false. Looks like you have nvidia-container-runtime already configured on the host and containerd config updated manually?

shivamerla · 2022-11-02T16:56:47Z

Can you also paste logs of nvidia-container-toolkit-daemonset-9rvz8 pod, curious as to why it is restarting. Which containerd and OS version is this?

Bec-k · 2022-11-02T17:03:04Z

Nope, didn't help. I have updated it, pod was removed and still complaining about:

Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox runtime: no runtime for "nvidia" is configured

Bec-k · 2022-11-02T17:03:30Z

I have removed all pods, to trigger everything from scratch.

Bec-k · 2022-11-02T17:03:53Z

(base) beck@beck:/$ kubectl get pods -n gpu-operator
NAME                                                              READY   STATUS     RESTARTS   AGE
gpu-feature-discovery-8b8ls                                       0/1     Init:0/1   0          115s
gpu-operator-59b9d49c6f-gkk4j                                     1/1     Running    0          2m20s
nvidia-dcgm-exporter-6bmlt                                        0/1     Init:0/1   0          115s
nvidia-device-plugin-daemonset-f7xgb                              0/1     Init:0/1   0          117s
nvidia-gpu-operator-node-feature-discovery-master-68495df8kscw7   1/1     Running    0          2m20s
nvidia-gpu-operator-node-feature-discovery-worker-pcxwq           1/1     Running    0          2m20s
nvidia-gpu-operator-node-feature-discovery-worker-s2jjn           1/1     Running    0          2m20s
nvidia-operator-validator-rwt6z                                   0/1     Init:0/4   0          117s

Bec-k · 2022-11-02T17:05:25Z

here are error from systemd containerd logs:
https://gist.github.com/denissabramovs/a77e97972b5aa01c86955d812d3e8188

Bec-k · 2022-11-02T17:07:33Z

Here is updated, latest one:
https://gist.github.com/denissabramovs/2272051bb2f684f623cd15273ea6dd25

Bec-k · 2022-11-02T17:08:16Z

at least, now containerd is not constantly restarting, it is already up for 9 minutes:

● containerd.service - containerd container runtime
     Loaded: loaded (/lib/systemd/system/containerd.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2022-11-02 18:57:44 EET; 9min ago

Bec-k · 2022-11-02T17:12:47Z

All 3 systemd services are up and running on GPU node:

(base) beck@beck:/$ sudo systemctl status --no-pager kubelet containerd docker | grep active
     Active: active (running) since Wed 2022-11-02 18:57:49 EET; 14min ago
     Active: active (running) since Wed 2022-11-02 18:57:44 EET; 14min ago
     Active: active (running) since Wed 2022-11-02 19:09:12 EET; 3min 11s ago

Bec-k · 2022-11-02T17:16:24Z

Sorry, missed your message. Here it is:

(base) beck@beck:/$ cat /etc/os-release 
PRETTY_NAME="Ubuntu 22.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.1 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Bec-k · 2022-11-02T17:16:41Z

(base) beck@beck:/$ containerd --version
containerd containerd.io 1.6.9 1c90a442489720eec95342e1789ee8a5e1b9536f

wjentner · 2022-11-02T17:16:45Z

@denissabramovs this is a wild guess: are you using containerd 1.6.9? I believe we had problems with this version and the operator. We downgraded to containerd 1.6.8 and things started working again.

Bec-k · 2022-11-02T17:23:08Z

revision=9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6 version=1.6.8:

nov 02 19:20:49 beck containerd[202761]: time="2022-11-02T19:20:49.723337417+02:00" level=info msg="starting containerd" revision=9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6 version=1.6.8
...
...
...
nov 02 19:22:34 beck containerd[202761]: time="2022-11-02T19:22:34.246953180+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:nvidia-device-plugin-daemonset-lbnzw,Uid:fd4f1d3f-29d2-4d11-a724-96f4ed107cd5,Namespace:gpu-operator,Attempt:0,} failed, error" error="failed to get sandbox runtime: no runtime for \"nvidia\" is configured"

Killed/re-scheduled all pods in gpu-operator namespace after downgrading containerd.

Bec-k · 2022-11-02T17:30:16Z

Oh wow! @wjentner you actually were right, i have re-enabled above mentioned toolkit and after downgrade, it finished without problems and all pods are up and running now!

Bec-k · 2022-11-02T17:33:08Z

Good that i have captured both logs @shivamerla , adding those below.

These logs are from failing toolkit:
https://gist.github.com/denissabramovs/0c3ad150ea2b9450a91b430a91704d08

These from successful toolkit:
https://gist.github.com/denissabramovs/343c8fb0169866133fa1cc35b9d5365c

Hope this helps to find the problem and resolve it. It seems that they are different after all.

shivamerla · 2022-11-02T17:36:12Z

Thanks @denissabramovs will check these out and try to repro with 1.6.9 containerd version.

Bec-k · 2022-11-02T17:38:59Z

If you won't be able to reproduce, please ping me and i'll try to reproduce it locally again. Then we could catch that issue and possibly make some patch together.
In any case, thank you guys.

klueska · 2022-11-08T14:11:29Z

Issue diagnosed and workaround MR can be found here:
https://gitlab.com/nvidia/kubernetes/gpu-operator/-/merge_requests/568

as kind has upgraded its containerd version to 1.9 which triggered issues to gpu-operator (see issue NVIDIA/gpu-operator#432) so we sticked kind version with containerd 1.8 also fix gpu installation

wjentner · 2022-12-03T00:13:27Z

@klueska thanks! When will this be released? I assume it has been also tested with contained 1.6.10 which has been released recently?

cdesiniotis · 2022-12-14T02:12:29Z

Hi @denissabramovs @wjentner. We just released v22.9.1. This includes the workaround mentioned above for resolving the containerd issues. Please give it a try and let us know if there are any issues.

wjentner · 2022-12-16T17:35:50Z

Thanks @cdesiniotis, I can confirm that it works with containerd 1.6.12 as well.
Edit: 1.6.14 is also working.

tuxtof · 2023-01-09T14:39:07Z

Hi @cdesiniotis @klueska

it seems i have exactly the same issue with

OS: CentOS 7.9.2009
Kernel: 3.10.0-1160.76.1.el7.x86_64
Containerd: 1.6.9 & 1.6.14 (tested both)
Gpu Operator: v22.9.1

my nvidia-driver-daemonset is looping
module build seems OK, i see them appear in lsmod
but after few second they disappear and everything restart

it failed after

nvidia-driver-ctr Post-install sanity check passed.
nvidia-driver-ctr
nvidia-driver-ctr Installation of the kernel module for the NVIDIA Accelerated Graphics Driver for Linux-x86_64 (version: 525.60.13) is now complete.
nvidia-driver-ctr
nvidia-driver-ctr Parsing kernel module parameters...
nvidia-driver-ctr Loading ipmi and i2c_core kernel modules...
nvidia-driver-ctr Loading NVIDIA driver kernel modules...
nvidia-driver-ctr + modprobe nvidia
nvidia-driver-ctr + modprobe nvidia-uvm
nvidia-driver-ctr + modprobe nvidia-modeset
nvidia-driver-ctr + set +o xtrace -o nounset
nvidia-driver-ctr Starting NVIDIA persistence daemon...
nvidia-driver-ctr ls: cannot access /proc/driver/nvidia-nvswitch/devices/*: No such file or directory
nvidia-driver-ctr Mounting NVIDIA driver rootfs...
nvidia-driver-ctr Done, now waiting for signal
nvidia-driver-ctr Caught signal
nvidia-driver-ctr Stopping NVIDIA persistence daemon...
nvidia-driver-ctr Unloading NVIDIA driver kernel modules...
nvidia-driver-ctr Unmounting NVIDIA driver rootfs...

if i downgrade containerd to 1.6.8 everything is fixed

xhejtman · 2023-01-10T01:00:54Z

There is another issue with containerd:
containerd/containerd#7843

if containerd is restared (version 1.6.9 and above), most pods are restarted, so together with nvidia container toolkit pod they end in endless restarting loop as toolkit tries to restart containerd which restarts the toolkit and driver and everything loops again. There is a fix for containerd, but it may not land yet everywhere.

@tuxtof, I think you are hitting exactly this issue.

shivamerla · 2023-01-10T05:07:47Z

thanks @xhejtman for linking the relevant issue.

tuxtof · 2023-01-10T06:58:44Z

thanks @xhejtman

so what is the situation ? , GPU operator is no more working with containerd version 1.6.9 and above

danlenar · 2023-01-10T20:36:35Z

I am no longer experiencing the issue once upgrading to containerd 1.6.15.

Containerd 1.6.15 contains the fix to
containerd/containerd#7843

tuxtof · 2023-01-11T14:05:19Z

Ok i confirm the freshly released docker RPM containerd 1.6.15 fix the issue on my side too

Nice

msherm2 · 2023-10-12T14:45:23Z

I am currently having this issue with RHEL 8.8, rke2, containerd 1.6.24.

gpu-operator gpu-operator-6f97b7b47c-vzfnm 1/1 Running 0 19h
gpu-operator gpu-operator-node-feature-discovery-master-77984d5f58-zd88s 1/1 Running 0 19h
gpu-operator gpu-operator-node-feature-discovery-worker-4k9mt 1/1 Running 0 19h
gpu-operator gpu-operator-node-feature-discovery-worker-kch9g 1/1 Running 0 19h
gpu-operator gpu-operator-node-feature-discovery-worker-mwqpc 1/1 Running 0 19h
gpu-operator nvidia-container-toolkit-daemonset-79p9v 1/1 Running 0 17h
gpu-operator nvidia-container-toolkit-daemonset-hnrwf 1/1 Running 0 17h
gpu-operator nvidia-container-toolkit-daemonset-vnddf 1/1 Running 0 17h
gpu-operator nvidia-dcgm-exporter-khlll 0/1 Init:0/1 0 19h
gpu-operator nvidia-dcgm-exporter-qmd4t 0/1 Init:0/1 0 19h
gpu-operator nvidia-dcgm-exporter-ts5hs 0/1 Init:0/1 0 19h
gpu-operator nvidia-device-plugin-daemonset-6zw7m 0/1 Init:0/1 0 19h
gpu-operator nvidia-device-plugin-daemonset-btgtz 0/1 Init:0/1 0 19h
gpu-operator nvidia-device-plugin-daemonset-n4lhk 0/1 Init:0/1 0 19h
gpu-operator nvidia-operator-validator-bqrsb 0/1 Init:0/4 0 19h
gpu-operator nvidia-operator-validator-fnn5w 0/1 Init:0/4 0 19h
gpu-operator nvidia-operator-validator-g95nc 0/1 Init:0/4 0 19h

The following seems to function properly as long as runtime is default or set to runc, but if the runtime is set to nvidia, there is an error:

nerdctl run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

FATA[0000] failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/default/f3c271c63533254369a34950e71f085e5119af387e5a85793203057ac0c7f5d4/log.json: no such file or directory): exec: "nvidia": executable file not found in $PATH: unknown

nerdctl run --rm --gpus all ubuntu nvidia-smi

Thu Oct 12 14:43:22 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla T4 On | 00000000:00:1E.0 Off | 0 |
| N/A 25C P8 9W / 70W | 2MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+

shivamerla · 2023-10-12T17:05:46Z

@msherm2 did you configure the container-toolkit correctly for RKE2 as documented here?

toolkit:
   env:
   - name: CONTAINERD_CONFIG
     value: /var/lib/rancher/rke2/agent/etc/containerd/config.toml.tmpl
   - name: CONTAINERD_SOCKET
     value: /run/k3s/containerd/containerd.sock
   - name: CONTAINERD_RUNTIME_CLASS
     value: nvidia
   - name: CONTAINERD_SET_AS_DEFAULT
     value: "true"

msherm2 · 2023-10-12T17:53:37Z

@shivamerla yes this is my helm chart configuration:

Note: I have tested both files for CONTAINERD_CONFIG,
/etc/containerd/config.toml as well as /var/lib/rancher/rke2/agent/etc/containerd/config.toml.tmpl

helm install gpu-operator -n gpu-operator
nvidia/gpu-operator $HELM_OPTIONS
--set driver.enabled=false
--set gfd.enabled=false
--set operator.defaultRuntime="containerd"
--set toolkit.enabled=true
--set toolkit.version=v1.14.2-ubi8
--set toolkit.env[0].name=CONTAINERD_CONFIG
--set toolkit.env[0].value=/etc/containerd/config.toml
--set toolkit.env[1].name=CONTAINERD_SOCKET
--set toolkit.env[1].value=/run/k3s/containerd/containerd.sock
--set toolkit.env[2].name=CONTAINERD_RUNTIME_CLASS
--set toolkit.env[2].value=nvidia
--set toolkit.env[3].name=CONTAINERD_SET_AS_DEFAULT
--set-string toolkit.env[3].value="true"

msherm2 · 2023-10-13T13:49:03Z

Update: I followed the instructions here to install containerd using this method, and I believe the critical part is enabling systemd cgroup. Since doing this, I am able to schedule the pods and workloads.

hotspoons · 2023-10-23T03:20:46Z

Update: I followed the instructions here to install containerd using this method, and I believe the critical part is enabling systemd cgroup. Since doing this, I am able to schedule the pods and workloads.

Totally unrelated to the gpu operator, but this fixed my problem with getting the spin wasm shim working on a Rocky 8 cluster. Many thanks!

SaadKaleem · 2024-01-09T00:15:21Z

Update: I followed the instructions here to install containerd using this method, and I believe the critical part is enabling systemd cgroup. Since doing this, I am able to schedule the pods and workloads.

Thanks! with sudo privileges, I generated the configuration via containerd config default and also modified it to add NVIDIA run-times (below). GPU Feature Discovery alongside the NVIDIA device plugin as DaemonSets seems to be working fine now on a cluster managed via kubeadm.

I'm not using the GPU-operator, since I already have the drivers and container toolkit installed on the host machine.

Will keep monitoring for any intermittent pod sandbox crashes though.

Versions:
containerd.io 1.6.25
nvdp/nvidia-device-plugin 0.14.3 (via Helm with runtimeClassName as "nvidia", with gfd enabled)

[/etc/containerd/config.toml]:

    [plugins."io.containerd.grpc.v1.cri".containerd]
      default_runtime_name = "runc"
      disable_snapshot_annotations = true
      discard_unpacked_layers = false
      ignore_rdt_not_enabled_errors = false
      no_pivot = false
      snapshotter = "overlayfs"

      [plugins."io.containerd.grpc.v1.cri".containerd.default_runtime]
        base_runtime_spec = ""
        cni_conf_dir = ""
        cni_max_conf_num = 0
        container_annotations = []
        pod_annotations = []
        privileged_without_host_devices = false
        runtime_engine = ""
        runtime_path = ""
        runtime_root = ""
        runtime_type = ""

        [plugins."io.containerd.grpc.v1.cri".containerd.default_runtime.options]

     [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]

        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
          base_runtime_spec = ""
          cni_conf_dir = ""
          cni_max_conf_num = 0
          container_annotations = []
          pod_annotations = []
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_path = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"

          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
            BinaryName = ""
            CriuImagePath = ""
            CriuPath = ""
            CriuWorkPath = ""
            IoGid = 0
            IoUid = 0
            NoNewKeyring = false
            NoPivotRoot = false
            Root = ""
            ShimCgroup = ""
            SystemdCgroup = true

        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
          runtime_type = "io.containerd.runc.v2"
          runtime_engine = ""
          runtime_root = ""
          privileged_without_host_devices = false
            [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
              BinaryName = "/usr/bin/nvidia-container-runtime"
              SystemdCgroup = true

chokosabe · 2024-08-03T06:26:37Z

@shivamerla yes this is my helm chart configuration:

Note: I have tested both files for CONTAINERD_CONFIG, /etc/containerd/config.toml as well as /var/lib/rancher/rke2/agent/etc/containerd/config.toml.tmpl

helm install gpu-operator -n gpu-operator
nvidia/gpu-operator $HELM_OPTIONS
--set driver.enabled=false
--set gfd.enabled=false
--set operator.defaultRuntime="containerd"
--set toolkit.enabled=true
--set toolkit.version=v1.14.2-ubi8
--set toolkit.env[0].name=CONTAINERD_CONFIG
--set toolkit.env[0].value=/etc/containerd/config.toml
--set toolkit.env[1].name=CONTAINERD_SOCKET
--set toolkit.env[1].value=/run/k3s/containerd/containerd.sock
--set toolkit.env[2].name=CONTAINERD_RUNTIME_CLASS
--set toolkit.env[2].value=nvidia
--set toolkit.env[3].name=CONTAINERD_SET_AS_DEFAULT
--set-string toolkit.env[3].value="true"

@msherm2 which containerd process did you end up using?
From your answer, it isn't too clear.

The rke2 containerd or the node based containerd?

Thanks.

benlsheets mentioned this issue Dec 2, 2022

Possible incompatibility with cpumanager, memorymanager, or topologymanager. #455

Open

klueska mentioned this issue Dec 3, 2022

Failure to deploy on new containerd releases NVIDIA/nvidia-container-toolkit#39

Closed

shivamerla mentioned this issue Mar 23, 2023

Failing to install nvidia drivers on a new GPU node on a fresh LTS Ubuntu 22.04 #504

Open

hotspoons mentioned this issue Oct 23, 2023

failed to get sandbox runtime: no runtime for "spin" is configured (vanilla Kubernetes) deislabs/containerd-wasm-shims#165

Closed

chokosabe mentioned this issue Aug 3, 2024

Deployment Issue: Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox runtime: no runtime for "spin" is configured spinkube/spin-operator#289

Closed

Failed to get sandbox runtime: no runtime for nvidia is configured #432

Failed to get sandbox runtime: no runtime for nvidia is configured #432

Comments

Bec-k commented Nov 2, 2022 • edited Loading

1. Quick Debug Checklist

1. Issue or feature description

2. Steps to reproduce the issue

3. Information to attach (optional if deemed irrelevant)

Bec-k commented Nov 2, 2022

Bec-k commented Nov 2, 2022 • edited Loading

shivamerla commented Nov 2, 2022

Bec-k commented Nov 2, 2022

Bec-k commented Nov 2, 2022

shivamerla commented Nov 2, 2022

shivamerla commented Nov 2, 2022

Bec-k commented Nov 2, 2022

Bec-k commented Nov 2, 2022

Bec-k commented Nov 2, 2022 • edited Loading

Bec-k commented Nov 2, 2022

Bec-k commented Nov 2, 2022

Bec-k commented Nov 2, 2022 • edited Loading

Bec-k commented Nov 2, 2022 • edited Loading

Bec-k commented Nov 2, 2022

Bec-k commented Nov 2, 2022

wjentner commented Nov 2, 2022

Bec-k commented Nov 2, 2022 • edited Loading

Bec-k commented Nov 2, 2022

Bec-k commented Nov 2, 2022

shivamerla commented Nov 2, 2022

Bec-k commented Nov 2, 2022

klueska commented Nov 8, 2022

wjentner commented Dec 3, 2022

cdesiniotis commented Dec 14, 2022

wjentner commented Dec 16, 2022 • edited Loading

tuxtof commented Jan 9, 2023 • edited Loading

xhejtman commented Jan 10, 2023 • edited Loading

shivamerla commented Jan 10, 2023

tuxtof commented Jan 10, 2023

danlenar commented Jan 10, 2023

tuxtof commented Jan 11, 2023

msherm2 commented Oct 12, 2023 • edited Loading

shivamerla commented Oct 12, 2023

msherm2 commented Oct 12, 2023

msherm2 commented Oct 13, 2023

hotspoons commented Oct 23, 2023

SaadKaleem commented Jan 9, 2024 • edited Loading

chokosabe commented Aug 3, 2024

Bec-k commented Nov 2, 2022 •

edited

Loading

Bec-k commented Nov 2, 2022 •

edited

Loading

Bec-k commented Nov 2, 2022 •

edited

Loading

Bec-k commented Nov 2, 2022 •

edited

Loading

Bec-k commented Nov 2, 2022 •

edited

Loading

Bec-k commented Nov 2, 2022 •

edited

Loading

wjentner commented Dec 16, 2022 •

edited

Loading

tuxtof commented Jan 9, 2023 •

edited

Loading

xhejtman commented Jan 10, 2023 •

edited

Loading

msherm2 commented Oct 12, 2023 •

edited

Loading

SaadKaleem commented Jan 9, 2024 •

edited

Loading