Parsing default IMEX info fails for legacy images #797

Open
@astefanutti

Since the latest 1.17.x versions, containers whose images are considered "legacy" and that do not set the NVIDIA_IMEX_CHANNELS environment variable fail to start with the following error:

Error: container create failed: time="2024-11-13T16:24:41Z" level=error msg="runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli.real: error parsing IMEX info: unsupported IMEX channel value: all\n" 

It seems that, for "legacy" images, the NVIDIA_IMEX_CHANNELS environment variable defaults to "all" here:

return NewVisibleDevices("all")

The value "all" cannot be parsed by https://github.com/NVIDIA/libnvidia-container/blob/63d366ee3b4183513c310ac557bf31b05b83328f/src/cli/common.c#L446.

For example, an occurrence of this issue has been reported in pytorch/test-infra#5852.

Ideally, that case should be handled more gracefully.
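To make the mismatch concrete, here is a minimal sketch in Go. It is not the actual toolkit or libnvidia-container code: `parseIMEXChannels` and `imexChannelsFromEnv` are hypothetical names, and the assumption that the parser only accepts a comma-separated list of non-negative channel IDs is mine, inferred from the error message above. The sketch shows how a defaulted "all" would be rejected, and one possible way to avoid it by not defaulting the IMEX value at all when the variable is unset.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseIMEXChannels is a hypothetical stand-in for a strict parser. It assumes
// only a comma-separated list of non-negative channel IDs is valid, so a
// value such as "all" is rejected.
func parseIMEXChannels(value string) ([]int, error) {
	if value == "" {
		// Nothing requested: no channels, no error.
		return nil, nil
	}
	var channels []int
	for _, field := range strings.Split(value, ",") {
		field = strings.TrimSpace(field)
		id, err := strconv.Atoi(field)
		if err != nil || id < 0 {
			return nil, fmt.Errorf("unsupported IMEX channel value: %s", field)
		}
		channels = append(channels, id)
	}
	return channels, nil
}

// imexChannelsFromEnv is a hypothetical, more graceful default: when
// NVIDIA_IMEX_CHANNELS is unset, it returns an empty request instead of
// falling back to "all", whether or not the image is "legacy".
func imexChannelsFromEnv(env map[string]string) string {
	if value, ok := env["NVIDIA_IMEX_CHANNELS"]; ok {
		return value
	}
	return ""
}

func main() {
	// The reported failure: a defaulted "all" reaches the strict parser.
	if _, err := parseIMEXChannels("all"); err != nil {
		fmt.Println("error parsing IMEX info:", err)
	}

	// With the alternative default, an unset variable requests no channels.
	channels, err := parseIMEXChannels(imexChannelsFromEnv(map[string]string{}))
	fmt.Println(len(channels), err)
}
```

Another option under the same assumptions would be to keep the default and teach the parser to accept "all" explicitly; which of the two is preferable depends on what "all" is meant to signify for IMEX channels.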

    Labels

    bug (Issue/PR to expose/discuss/fix a bug)
