Build cuda:12.4.0-cudnn8-devel-ubuntu22.04 docker image and host it in pytorch AWS #1811
Alternatively you could just use the existing DockerHub image with cudnn9? Or is that not valid to build/support? I wasn't aware of existing issues when I saw the CI failure for a PR I'm involved in, but looked into it here: pytorch/pytorch#125632 (comment)

A quick fix is to just have the docker matrix generate a versionless `cudnn` tag. Otherwise, won't you need to build (or republish) all the nvidia images being used from DockerHub? The CI is failing specifically because it's trying to pull an invalid tag for `nvidia/cuda`.

So you need to avoid building that in the docker matrix, and separately build/publish your AWS image, or as I've suggested just add the logic to select the appropriate tag.
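A minimal sketch of that versionless-tag approach, assuming the matrix composes tags from a CUDA version and OS string; the variable names and version values here are illustrative, not the actual matrix config:

```bash
# Sketch only: compose the base image tag without a cudnn major version,
# matching NVIDIA's renamed tags (e.g. 12.4.1-cudnn-devel-ubuntu22.04).
CUDA_VERSION=12.4.1
OS=ubuntu22.04
TAG="${CUDA_VERSION}-cudnn-devel-${OS}"   # no "cudnn8"/"cudnn9" suffix
docker pull "nvidia/cuda:${TAG}"
```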
Fixes #125094

Please note: the Docker CUDA 12.4 failure is an existing issue, related to the docker image no longer being available on DockerHub:

```
docker.io/nvidia/cuda:12.4.0-cudnn8-devel-ubuntu22.04: docker.io/nvidia/cuda:12.4.0-cudnn8-devel-ubuntu22.04: not found
```

https://github.com/pytorch/pytorch/actions/runs/8974959068/job/24648540236?pr=125617

Here is the reference issue: https://gitlab.com/nvidia/container-images/cuda/-/issues/225

Tracked on our side: pytorch/builder#1811

Pull Request resolved: #125617
Approved by: https://github.com/huydhn, https://github.com/malfet
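For reference, one way to check whether a tag still exists on DockerHub without pulling the whole image, using only the standard docker CLI (no project-specific tooling assumed):

```bash
# Fails with "no such manifest" for the removed cudnn8 tag,
# succeeds for the renamed versionless-cudnn tag.
docker manifest inspect docker.io/nvidia/cuda:12.4.0-cudnn8-devel-ubuntu22.04
docker manifest inspect docker.io/nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04
```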
For cuda 12.2 with cudnn, Dockerfile:

For cuda 12.4 with no cudnn:

For cuda 12.4.1 with cudnn (9):
Fixes #125526 [#1811](pytorch/builder#1811)

Adopt `syntax=docker/dockerfile:1`, which has been stable since 2018, while still best practice to declare in 2024.

- Syntax features dependent upon the syntax directive version are [documented here](https://hub.docker.com/r/docker/dockerfile).
- While you can set a fixed minor version, [Docker officially advises to only pin the major version](https://docs.docker.com/build/dockerfile/frontend/#stable-channel):

```
We recommend using docker/dockerfile:1, which always points to the latest stable release of the version 1 syntax, and receives both "minor" and "patch" updates for the version 1 release cycle. BuildKit automatically checks for updates of the syntax when performing a build, making sure you are using the most current version.
```

**Support for building with Docker prior to v23 (released in Feb 2023)**

NOTE: 18.06 may not be the accurate minimum version for using `docker/dockerfile:1`; according to the [DockerHub tag history](https://hub.docker.com/layers/docker/dockerfile/1.0/images/sha256-92f5351b2fca8f7e2f452aa9aec1c34213cdd2702ca92414eee6466fab21814a?context=explore), 1.0 of the syntax seems to be from Dec 2018, which is probably why `docker/dockerfile:experimental` was paired with it in this file.

Personally, I'd favor only supporting builds with Docker v23. This is only relevant for someone building this Dockerfile locally; the user could still extend the already built and published image from a registry on older versions of Docker without any concern for this directive, which only applies to building this Dockerfile, not images that extend it.

However, if you're reluctant, you may want to refer others to [this Docker docs page](https://docs.docker.com/build/buildkit/#getting-started), where they should only need the env `DOCKER_BUILDKIT=1`; presumably the requirement for `experimental` was dropped with `syntax=docker/dockerfile:1` in releases of Docker since Dec 2018. Affected users can often quite easily install a newer version of Docker on their OS, as per Docker's official guidance (usually via including an additional repo to the package manager).

**Reference links**

Since one of these was already included in the inline note (now a broken link), I've included the relevant links mentioned above. You could alternatively rely on git blame with a commit message referencing the links or this PR for more information. Feel free to remove any of the reference links; they're mostly only relevant for maintainers to be aware of (which this PR itself has detailed adequately above).

Pull Request resolved: #125632
Approved by: https://github.com/malfet
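As a concrete illustration of the directive described above (the Dockerfile contents and image tag are placeholders, not the repo's actual Dockerfile):

```bash
# Write a minimal Dockerfile whose first line carries the syntax directive;
# the directive only affects builds of this Dockerfile, not images extending it.
cat > Dockerfile <<'EOF'
# syntax=docker/dockerfile:1
FROM ubuntu:22.04
RUN echo "built with BuildKit"
EOF

# Docker v23+ uses BuildKit by default; older releases need to opt in:
DOCKER_BUILDKIT=1 docker build -t example:local .
```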
Separate arm64 and amd64 docker builds (#125617)

Fixes #125094

Please note: the Docker CUDA 12.4 failure is an existing issue, related to the docker image no longer being available on DockerHub:

```
docker.io/nvidia/cuda:12.4.0-cudnn8-devel-ubuntu22.04: docker.io/nvidia/cuda:12.4.0-cudnn8-devel-ubuntu22.04: not found
```

https://github.com/pytorch/pytorch/actions/runs/8974959068/job/24648540236?pr=125617

Here is the reference issue: https://gitlab.com/nvidia/container-images/cuda/-/issues/225

Tracked on our side: pytorch/builder#1811

Pull Request resolved: #125617
Approved by: https://github.com/huydhn, https://github.com/malfet

(cherry picked from commit b29d77b)
Co-authored-by: atalman <[email protected]>
Build Nvidia docker image: cuda:12.4.0-cudnn8-devel-ubuntu22.04
See reference issue here:
https://gitlab.com/nvidia/container-images/cuda/-/issues/225
Upload it to pytorch AWS so this workflow can be fixed:
https://github.com/pytorch/pytorch/actions/runs/8974959068/job/24648540236?pr=125617
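A hedged sketch of what hosting the image in AWS could look like once it has been built or pulled locally; the ECR account ID and region below are invented placeholders, not the actual pytorch AWS registry:

```bash
# Authenticate to the (placeholder) ECR registry, then retag and push.
aws ecr get-login-password --region us-east-1 \
  | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com

SRC=nvidia/cuda:12.4.0-cudnn8-devel-ubuntu22.04
DST=123456789012.dkr.ecr.us-east-1.amazonaws.com/nvidia/cuda:12.4.0-cudnn8-devel-ubuntu22.04
docker tag "$SRC" "$DST"    # assumes the image already exists locally
docker push "$DST"
```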