Commit 63d98ea

Pin nvidia-container-toolkit to version 1.16.2 (#5852)
Yesterday's nvidia-container-toolkit v1.17.0 [release](https://github.com/NVIDIA/nvidia-container-toolkit/releases/tag/v1.17.0) seems to have broken some of our domain images, causing `docker run --gpus all [image]` to fail with the error:

```
$ docker run --gpus all [IMAGE]
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: error parsing IMEX info: unsupported IMEX channel value: all: unknown.
ERRO[0000] error waiting for container: context canceled
```

Pinning the toolkit to the previous version (1.16.2) to mitigate the failure for now.

Testing:
- Validated locally
- TBD: Currently testing on a domain repo
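To check whether a runner already has a toolkit version older than the release named above, a version comparison along these lines may help. This is a minimal sketch, not part of the commit: `version_lt` and `toolkit_predates_1_17_0` are hypothetical helper names, the comparison relies on GNU `sort -V`, and treating every version below 1.17.0 as unaffected is an assumption drawn only from the commit message.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when version $1 is strictly older than version $2.
# Assumes GNU coreutils, whose `sort -V` orders version strings naturally.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Hypothetical helper: succeed when the given toolkit version predates v1.17.0,
# the release the commit message suspects of causing the IMEX parsing error.
toolkit_predates_1_17_0() {
  version_lt "$1" "1.17.0"
}
```

On an RPM-based host the installed version could be passed in with `rpm -q --queryformat '%{VERSION}' nvidia-container-toolkit`, and on a Debian-based host with `dpkg-query -W --showformat='${Version}' nvidia-container-toolkit`.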
1 parent 49fb39b commit 63d98ea

1 file changed: +2 -2 lines changed


.github/actions/setup-nvidia/action.yml

Lines changed: 2 additions & 2 deletions
@@ -40,7 +40,7 @@ runs:
 fi

 sudo yum-config-manager --add-repo "${YUM_REPO_URL}"
-sudo yum install -y nvidia-docker2
+sudo yum install -y nvidia-docker2 nvidia-container-toolkit-1.16.2
 sudo systemctl restart docker
 )
 }
@@ -51,7 +51,7 @@ runs:
 # Install nvidia-driver package if not installed
 status="$(dpkg-query -W --showformat='${db:Status-Status}' nvidia-docker2 2>&1)"
 if [ ! $? = 0 ] || [ ! "$status" = installed ]; then
-sudo apt-get install -y nvidia-docker2
+sudo apt-get install -y nvidia-docker2 nvidia-container-toolkit-1.16.2
 sudo systemctl restart docker
 fi
 )
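The diff passes the same `name-version` argument to both package managers. `yum` accepts that form, while `apt-get` normally pins with `name=version` and expects the full Debian version string. The sketch below shows how the two forms could be generated. `pin_spec` is a hypothetical helper that is not part of the commit, and the `1.16.2-1` revision suffix in the usage note is an assumption about how the Debian package is versioned.

```shell
#!/bin/sh
# Hypothetical helper: print the install argument that pins PACKAGE to VERSION
# for the given package manager.
#   $1 = package manager (yum, dnf, apt, apt-get)
#   $2 = package name
#   $3 = version string as that package manager reports it
pin_spec() {
  case "$1" in
    yum|dnf)     printf '%s-%s\n' "$2" "$3" ;;   # RPM style: name-version
    apt|apt-get) printf '%s=%s\n' "$2" "$3" ;;   # Debian style: name=version
    *)
      echo "unsupported package manager: $1" >&2
      return 1
      ;;
  esac
}
```

For example, `sudo apt-get install -y nvidia-docker2 "$(pin_spec apt-get nvidia-container-toolkit 1.16.2-1)"` would request the pinned Debian package, assuming that version string is what the repository publishes. `apt-cache madison nvidia-container-toolkit` lists the versions actually available.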
