
docs/gpu.md: docker.io/nvidia/cuda:9.0-base: not found #2755

Closed
AkihiroSuda opened this issue Jan 16, 2024 · 2 comments · Fixed by #2756 · May be fixed by #3441
Labels: documentation (Improvements or additions to documentation)
Milestone: v2.0.0
AkihiroSuda (Member) commented Jan 16, 2024
https://github.com/containerd/nerdctl/blob/v1.7.2/docs/gpu.md#options-for-nerdctl-run---gpus

The nvidia/cuda:9.0-base image no longer seems to exist:

$ nerdctl run -it --rm --gpus all nvidia/cuda:9.0-base nvidia-smi
docker.io/nvidia/cuda:9.0-base: resolving      |--------------------------------------| 
elapsed: 1.1 s                  total:   0.0 B (0.0 B/s)                                         
INFO[0001] trying next host - response was http.StatusNotFound  host=registry-1.docker.io
FATA[0001] failed to resolve reference "docker.io/nvidia/cuda:9.0-base": docker.io/nvidia/cuda:9.0-base: not found

The plain ubuntu image still works, though:

$ nerdctl run -it --rm --gpus all ubuntu nvidia-smi
Tue Jan 16 07:27:01 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.146.02             Driver Version: 535.146.02   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:00:1E.0 Off |                    0 |
| N/A   24C    P8               8W /  70W |      2MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

cc @ktock

@AkihiroSuda AkihiroSuda added the documentation Improvements or additions to documentation label Jan 16, 2024
@AkihiroSuda AkihiroSuda added this to the v2.0.0 milestone Jan 16, 2024
yankay (Contributor) commented Jan 16, 2024

The Nvidia CUDA images have been updated; current tags are listed at https://catalog.ngc.nvidia.com/orgs/nvidia/containers/cuda/tags.
The nvidia/cuda:9.0-base reference can be updated to nvidia/cuda:12.3.1-base-ubuntu20.04, as in https://docs.docker.com/compose/gpu-support/#example-of-a-compose-file-for-running-a-service-with-access-to-1-gpu-device.


Additional information: in my environment, the Nvidia driver is installed via https://github.com/NVIDIA/gpu-operator, so the --runtime flag must be configured, as described in https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/sample-workload.html. With that, this command works successfully:

nerdctl run -it --rm --gpus all --runtime=/usr/local/nvidia/toolkit/nvidia-container-runtime docker.io/nvidia/cuda:12.3.1-base-ubuntu20.04  nvidia-smi

Tue Jan 16 10:16:12 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla P40                      On  | 00000000:03:00.0 Off |                    0 |
| N/A   25C    P8               9W / 250W |      0MiB / 23040MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

But plain nerdctl run -it --rm --gpus all docker.io/nvidia/cuda:12.3.1-base-ubuntu20.04 nvidia-smi (without the --runtime flag) fails with the error message:

FATA[0000] failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "nvidia-smi": executable file not found in $PATH: unknown
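A general note on that error (context, not from the thread): nvidia-smi is shipped by the host driver package, not by the CUDA base image; the NVIDIA container runtime injects it into the container at create time. With the default runtime nothing mounts it in, so runc's entrypoint lookup fails the same way it would for any missing binary:

```shell
# Hedged illustration (plain shell, no container): the FATA line above is
# runc's standard exec-lookup failure; any binary absent from the image's PATH
# yields the same class of error. "nvidia-smi-hypothetical" below is a
# deliberately nonexistent name used only for the demonstration:
command -v nvidia-smi-hypothetical >/dev/null 2>&1 || echo 'not found in PATH'
```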

jxfruit commented Jul 9, 2024

Same issue here; it works with Docker. Why was this closed, and what is the solution?
