Open
Labels
bug, lifecycle/stale
Description
Since the latest 1.17.x releases, containers whose images are considered "legacy" and that do not set the NVIDIA_IMEX_CHANNELS environment variable fail to start with the following error:
Error: container create failed: time="2024-11-13T16:24:41Z" level=error msg="runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli.real: error parsing IMEX info: unsupported IMEX channel value: all\n"
It seems that, for "legacy" images, NVIDIA_IMEX_CHANNELS defaults to "all" here:
return NewVisibleDevices("all")
The value "all" cannot be parsed by libnvidia-container's CLI, which rejects it here: https://github.com/NVIDIA/libnvidia-container/blob/63d366ee3b4183513c310ac557bf31b05b83328f/src/cli/common.c#L446.
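To make the mismatch concrete, here is a minimal Go model of the two sides. It is a hypothetical sketch, not the toolkit's or libnvidia-container's actual code: imexChannelsForImage stands in for the toolkit-side default described above, and parseImexChannels stands in for the CLI-side parser. The assumption that the parser accepts only numeric channel IDs is inferred from the error message alone.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseImexChannels is a HYPOTHETICAL stand-in for the CLI-side parser.
// ASSUMPTION: it accepts only a comma-separated list of non-negative
// integer channel IDs and rejects anything else, including "all".
func parseImexChannels(value string) ([]int, error) {
	var ids []int
	for _, field := range strings.Split(value, ",") {
		field = strings.TrimSpace(field)
		id, err := strconv.Atoi(field)
		if err != nil || id < 0 {
			return nil, fmt.Errorf("unsupported IMEX channel value: %s", field)
		}
		ids = append(ids, id)
	}
	return ids, nil
}

// imexChannelsForImage is a HYPOTHETICAL model of the toolkit-side default:
// an explicit NVIDIA_IMEX_CHANNELS wins; legacy images without it get "all".
func imexChannelsForImage(env map[string]string, legacy bool) string {
	if v, ok := env["NVIDIA_IMEX_CHANNELS"]; ok {
		return v
	}
	if legacy {
		return "all"
	}
	// Non-legacy images: treated here as requesting no IMEX channels
	// (an assumption made only for this sketch).
	return ""
}

func main() {
	// A legacy image with the variable unset: the default flows straight
	// into a parser that does not understand it.
	value := imexChannelsForImage(map[string]string{}, true)
	if _, err := parseImexChannels(value); err != nil {
		fmt.Println("hook would fail:", err)
	}
}
```

The point of the model is only that the producer and consumer disagree on the vocabulary: one side emits "all" as a default, the other side has no case for it.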
For example, this issue was reported in pytorch/test-infra#5852.
Ideally, this case should be handled more gracefully.
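One possible way to degrade gracefully is sketched below in Go. This is a hypothetical helper, not a proposed patch to either project: resolveImexChannels is an illustrative name, and treating "unset" as "request no IMEX channels" (rather than "all") is a design assumption. It still rejects an explicit, unparseable value, but with an error that names the offending variable.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// resolveImexChannels is a HYPOTHETICAL helper illustrating graceful handling.
// ASSUMPTIONS: channel IDs are non-negative integers, and the downstream CLI
// does not accept "all" (as the error in this report suggests).
//   - Unset or empty NVIDIA_IMEX_CHANNELS: request no IMEX channels
//     instead of defaulting to "all".
//   - Set but invalid: return an error that names the variable.
func resolveImexChannels(env map[string]string) ([]int, error) {
	raw, ok := env["NVIDIA_IMEX_CHANNELS"]
	if !ok || strings.TrimSpace(raw) == "" {
		return nil, nil
	}
	var ids []int
	for _, field := range strings.Split(raw, ",") {
		id, err := strconv.Atoi(strings.TrimSpace(field))
		if err != nil || id < 0 {
			return nil, fmt.Errorf("NVIDIA_IMEX_CHANNELS: unsupported value %q", field)
		}
		ids = append(ids, id)
	}
	return ids, nil
}

func main() {
	// Legacy image, variable unset: no channels and no error, so the
	// container can still start.
	ids, err := resolveImexChannels(map[string]string{})
	fmt.Println(len(ids), err)

	// Explicit but unsupported value: a clear, attributable error.
	_, err = resolveImexChannels(map[string]string{"NVIDIA_IMEX_CHANNELS": "all"})
	fmt.Println(err)
}
```

An alternative would be to expand "all" into explicit channel IDs before invoking the CLI, but that requires knowing which channels exist on the host, which this sketch does not model.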