Parsing default IMEX info fails for legacy images #797
Comments
Can anyone help me with this? 2024-11-15T11:55:12Z create container gshaibi/gpu-burn:latest (tested on NVIDIA Container Toolkit CLI version 1.17.1)
Possible workaround. Set
Or for k8s Pod Spec, set:
We have just released v1.17.2 that should address this issue. Please let us know if the problem persists.
I am on v1.17.2 now, but I think this is a different problem. nvidia-smi shows all GPUs without problems. 2024-11-16T09:26:40.108376033Z Failed to initialize NVML: Unknown Error
Fixed it by: I think version 1.17.2 fixed the problem with the IMEX channel. Many thanks for the quick fix!
@higi do you know why |
I think it's just for the NVML test, for AI tools. The NVML test doesn't work without setting this. This script fixed it: wget https://raw.githubusercontent.com/jjziets/vasttools/main/nvml_fix.py. Anyway, the IMEX channel error was fixed by your fix.
Since the latest 1.17.x versions, containers with images considered "legacy" and that do not have the NVIDIA_IMEX_CHANNELS environment variable set fail to start with the following error:
It seems the NVIDIA_IMEX_CHANNELS environment variable is defaulted to all here for "legacy" images:
here for "legacy" images:nvidia-container-toolkit/internal/config/image/cuda_image.go
Line 145 in 1995925
That value cannot be parsed by https://github.com/NVIDIA/libnvidia-container/blob/63d366ee3b4183513c310ac557bf31b05b83328f/src/cli/common.c#L446.
An occurrence of that issue has been reported here for example: pytorch/test-infra#5852.
That case should ideally be more gracefully handled.
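To illustrate what "more gracefully handled" could mean, here is a minimal Python sketch of a tolerant parser. It is not the toolkit's or libnvidia-container's actual code. The function name parse_imex_channels is hypothetical, and the sketch assumes the variable normally holds a comma-separated list of non-negative channel IDs. Treating all (or an unset variable) as "no explicit channels requested" is one possible policy, not necessarily the one the maintainers chose.

```python
from typing import List, Mapping


def parse_imex_channels(env: Mapping[str, str]) -> List[int]:
    """Return the requested IMEX channel IDs (hypothetical helper).

    Assumption: the value is a comma-separated list of non-negative
    integers. An unset, empty, or "all" value yields an empty list
    instead of a hard failure.
    """
    raw = env.get("NVIDIA_IMEX_CHANNELS", "").strip()

    # Graceful path: do not fail container start on the legacy default.
    if raw == "" or raw.lower() == "all":
        return []

    channels: List[int] = []
    for token in raw.split(","):
        token = token.strip()
        # Reject anything that is not a plain ASCII decimal number.
        if not (token.isascii() and token.isdigit()):
            raise ValueError(f"invalid IMEX channel ID: {token!r}")
        channels.append(int(token))
    return channels
```

With this policy, a legacy image that ends up with all would start without IMEX channels, while a genuinely malformed value such as 0,x would still be reported as an error.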