Description
Since the latest 1.17.x versions, containers whose images are considered "legacy" and that do not set the NVIDIA_IMEX_CHANNELS environment variable fail to start with the following error:
Error: container create failed: time="2024-11-13T16:24:41Z" level=error msg="runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli.real: error parsing IMEX info: unsupported IMEX channel value: all\n"
It seems the NVIDIA_IMEX_CHANNELS environment variable defaults to all for "legacy" images, a value that cannot be parsed by https://github.com/NVIDIA/libnvidia-container/blob/63d366ee3b4183513c310ac557bf31b05b83328f/src/cli/common.c#L446.
An occurrence of this issue has been reported in pytorch/test-infra#5852, for example.
Ideally, that case should be handled more gracefully.
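As a rough illustration of what more graceful handling could look like, here is a minimal Python sketch. It is not the libnvidia-container code (which is C). The function name `parse_imex_channels` and the `treat_all_as_unset` flag are hypothetical, and the sketch assumes the variable holds a comma-separated list of non-negative integer channel IDs.

```python
def parse_imex_channels(value, treat_all_as_unset=True):
    """Parse an NVIDIA_IMEX_CHANNELS-style value into a list of channel IDs.

    Hypothetical helper for illustration only; it assumes a comma-separated
    list of non-negative integers and is not the real parser.
    """
    value = value.strip()
    if value == "":
        # Nothing requested.
        return []
    if value == "all":
        if treat_all_as_unset:
            # Graceful path: a defaulted "all" is treated as "no channels"
            # instead of aborting container creation.
            return []
        # Strict path: mirrors the failure described in this issue.
        raise ValueError("unsupported IMEX channel value: all")

    channels = []
    for item in value.split(","):
        item = item.strip()
        if not (item.isascii() and item.isdigit()):
            raise ValueError(f"unsupported IMEX channel value: {item}")
        channels.append(int(item))
    return channels


if __name__ == "__main__":
    print(parse_imex_channels("0,1"))
    print(parse_imex_channels("all"))
```

Whether a defaulted `all` should map to no channels, to every available channel, or to a warning is a design decision for the maintainers. The sketch only shows that the value can be recognised explicitly rather than rejected as a parse error.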