
Parsing default IMEX info fails for legacy images #797

@astefanutti


Since the latest 1.17.x releases, containers that use images considered "legacy" and that do not set the NVIDIA_IMEX_CHANNELS environment variable fail to start with the following error:

Error: container create failed: time="2024-11-13T16:24:41Z" level=error msg="runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli.real: error parsing IMEX info: unsupported IMEX channel value: all\n" 

It seems that, for "legacy" images, the NVIDIA_IMEX_CHANNELS environment variable defaults to all here:

return NewVisibleDevices("all")

The value all cannot then be parsed by https://github.com/NVIDIA/libnvidia-container/blob/63d366ee3b4183513c310ac557bf31b05b83328f/src/cli/common.c#L446.

An occurrence of this issue has been reported in pytorch/test-infra#5852, for example.

Ideally, this case should be handled more gracefully.
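
To illustrate what more graceful handling could look like, here is a minimal Go sketch. It is not the toolkit's or libnvidia-container's actual code: resolveImexChannels and parseImexChannels are hypothetical helpers, and the sketch assumes that IMEX channels are requested as a comma-separated list of non-negative integers. The idea is simply that an unset NVIDIA_IMEX_CHANNELS yields no channel request instead of being defaulted to all, while an explicit all is still rejected with a clear error.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// resolveImexChannels is a hypothetical helper: it returns the raw
// NVIDIA_IMEX_CHANNELS value to pass on, or "" when none was requested.
// An unset variable is never turned into "all".
func resolveImexChannels(env map[string]string) string {
	return strings.TrimSpace(env["NVIDIA_IMEX_CHANNELS"])
}

// parseImexChannels is a hypothetical parser. It assumes channels are given
// as a comma-separated list of non-negative integers. An empty value yields
// no channels; "all" and any other non-numeric value are rejected.
func parseImexChannels(value string) ([]int, error) {
	if value == "" {
		return nil, nil
	}
	var channels []int
	for _, field := range strings.Split(value, ",") {
		id, err := strconv.Atoi(strings.TrimSpace(field))
		if err != nil || id < 0 {
			return nil, fmt.Errorf("unsupported IMEX channel value: %s", field)
		}
		channels = append(channels, id)
	}
	return channels, nil
}

func main() {
	// Legacy image without NVIDIA_IMEX_CHANNELS: nothing is requested,
	// so there is nothing to parse and no error.
	channels, err := parseImexChannels(resolveImexChannels(map[string]string{}))
	fmt.Println(channels, err)

	// An explicit "all" is still rejected, as in the reported error.
	_, err = parseImexChannels("all")
	fmt.Println(err)
}
```

Whether the default should be dropped on the toolkit side or all should be accepted on the parser side is a design choice for the maintainers; the sketch only shows the first option.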


Labels: bug, lifecycle/stale
