
cudatoolkit: reintroduce version 11.7.0 to master #185557

Merged: 4 commits merged into NixOS:master from dg/cudatoolkit_11_7_0 on Oct 5, 2022


@dguibert (Member) commented Aug 7, 2022

Description of changes

This patch fixes and resubmits #179912, which was reverted on master because of the failure reported in https://gist.github.com/GrahamcOfBorg/45ac7f5bc9e02a74cb1e4264f365417f

Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 22.11 Release Notes (or backporting 22.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
    • (Release notes changes) Ran nixos/doc/manual/md-to-db.sh to update generated release notes
  • Fits CONTRIBUTING.md.

@samuela (Member) left a comment


Thanks for resubmitting this @dguibert and apologies that this has been more involved than expected!

I think these changes look good, but I'd actually like us to go a bit further and decouple TensorRT and cudatoolkit evaluation! It's totally fine for an upgrade to one of the packages to end up marking other derivations as broken, but breaking evaluation is a huge problem. Having to constantly keep the two in sync is an unnecessary burden on both packages and really came back to bite us in #179912.

Could we refactor pkgs/development/libraries/science/math/tensorrt/extension.nix a bit such that upgrading cudatoolkit can be done without breaking the evaluation of tensorrt?

@aidalgol (Contributor) commented

> Could we refactor pkgs/development/libraries/science/math/tensorrt/extension.nix a bit such that upgrading cudatoolkit can be done without breaking the evaluation of tensorrt?

@samuela Sorry, I only just saw this PR. Since I introduced TensorRT to nixpkgs, I feel I should take this on. Could you elaborate a bit on what exactly you're proposing?

@samuela (Member) commented Sep 26, 2022

Hi @aidalgol, I think you nailed it in #192958. IIRC the issue was that TensorRT's evaluation would break if cudaVersion was upgraded without editing TensorRT's derivation. But it looks like #192958 should fix that!

@samuela (Member) commented Sep 26, 2022

@dguibert If you rebase off of #192958, then I think this should be good to go. Note that adding 11.7 to the supported versions of cudnn and TensorRT is still necessary in order to prevent those packages from being marked meta.broken, but we shouldn't have any evaluation issues from here on out (thanks @aidalgol!!)
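The distinction between "marked meta.broken" and "breaking evaluation" can be illustrated with a minimal Nix sketch. It assumes a hypothetical attribute set `supportedVersions` keyed by CUDA version; it is not the actual contents of tensorrt/extension.nix or of #192958, which may be structured differently. A plain lookup such as `supportedVersions.${cudaVersion}` aborts evaluation when the key is missing, while the `or` fallback turns an unsupported CUDA version into a broken package that still evaluates.

```nix
# Hypothetical sketch only: not the real nixpkgs TensorRT expression.
{ cudaVersion ? "11.7" }:
let
  # Hypothetical table of TensorRT releases per supported CUDA version.
  supportedVersions = {
    "11.6" = { version = "8.4.0.6"; };
  };

  # A bare `supportedVersions.${cudaVersion}` would fail with
  # "attribute '11.7' missing" and break evaluation of everything that
  # references this package; `or null` falls back instead of throwing.
  release = supportedVersions.${cudaVersion} or null;
in
{
  version = if release == null then "unsupported" else release.version;
  # Unsupported CUDA versions only mark the package broken.
  meta.broken = release == null;
}
```

Saved as `sketch.nix`, `nix-instantiate --eval --strict sketch.nix` evaluates successfully with `meta.broken = true` for the default 11.7, and `--argstr cudaVersion 11.6` yields a non-broken result. Adding "11.7" to the table is what clears the broken flag, which matches the note above that the supported-version lists still need updating.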

@SuperSandro2000 added the "2.status: merge conflict" label on Sep 26, 2022
@dguibert force-pushed the dg/cudatoolkit_11_7_0 branch from 1a9b818 to 3cb66cf on October 4, 2022 15:21
@dguibert (Member, Author) commented Oct 4, 2022

Hi @samuela and @SuperSandro2000,
I took the time to rebase this PR and apply your suggestions ;-)

@ofborg (bot) added the "8.has: clean-up" label and removed the "2.status: merge conflict" label on Oct 4, 2022
@samuela (Member) left a comment


Nice! I think this looks good overall. Running nixpkgs-review now...

@samuela (Member) commented Oct 5, 2022

EDIT: this run is a little wonky since NIXPKGS_ALLOW_BROKEN was not set; running a new nixpkgs-review now.

Result of nixpkgs-review pr 185557 run on x86_64-linux

4 packages marked as broken and skipped:
  • cudaPackages.nvidia_driver
  • python310Packages.cupy
  • python39Packages.cupy
  • truecrack-cuda
9 packages failed to build:
  • cudaPackages.tensorrt (cudaPackages.tensorrt_8_4_0)
  • ethminer (ethminer-cuda)
  • mathematica-cuda
  • python310Packages.numbaWithCuda
  • python310Packages.pyrealsense2WithCuda
  • python310Packages.tensorrt
  • python39Packages.numbaWithCuda
  • python39Packages.pyrealsense2WithCuda
  • python39Packages.tensorrt
59 packages built:
  • colmapWithCuda
  • cudaPackages.cuda_cccl
  • cudaPackages.cuda_cudart
  • cudaPackages.cuda_cuobjdump
  • cudaPackages.cuda_cupti
  • cudaPackages.cuda_cuxxfilt
  • cudaPackages.cuda_demo_suite
  • cudaPackages.cuda_documentation
  • cudaPackages.cuda_gdb
  • cudaPackages.cuda_memcheck
  • cudaPackages.cuda_nsight
  • cudaPackages.cuda_nvcc
  • cudaPackages.cuda_nvdisasm
  • cudaPackages.cuda_nvml_dev
  • cudaPackages.cuda_nvprof
  • cudaPackages.cuda_nvprune
  • cudaPackages.cuda_nvrtc
  • cudaPackages.cuda_nvtx
  • cudaPackages.cuda_nvvp
  • cudaPackages.cuda_sanitizer_api
  • cudatoolkit (cudaPackages.cudatoolkit, cudatoolkit_11)
  • cudaPackages.cudnn (cudaPackages.cudnn_8_4_0)
  • cudaPackages.cudnn_8_3_2
  • cudaPackages.cutensor
  • cudaPackages.fabricmanager
  • cudaPackages.libcublas
  • cudaPackages.libcufft
  • cudaPackages.libcufile
  • cudaPackages.libcurand
  • cudaPackages.libcusolver
  • cudaPackages.libcusparse
  • cudaPackages.libnpp
  • cudaPackages.libnvidia_nscq
  • cudaPackages.libnvjpeg
  • cudaPackages.nccl
  • cudaPackages.nsight_compute
  • cudaPackages.nsight_systems
  • cudaPackages.nvidia_fs
  • forge
  • gpu-burn
  • gromacsCudaMpi
  • gwe
  • katagoWithCuda
  • librealsenseWithCuda
  • magma
  • nvtop
  • nvtop-nvidia
  • python310Packages.TheanoWithCuda
  • python310Packages.pycuda
  • python310Packages.pynvml
  • python310Packages.tensorflowWithCuda
  • python310Packages.torchWithCuda
  • python39Packages.TheanoWithCuda
  • python39Packages.pycuda
  • python39Packages.pynvml
  • python39Packages.tensorflowWithCuda
  • python39Packages.torchWithCuda
  • xgboostWithCuda
  • xpraWithNvenc



@samuela (Member) commented Oct 5, 2022

Here are the failure logs:

error: builder for '/nix/store/ybsmhdrmjy8xarlvf53kzlj4bbfw7b20-Mathematica_13.1.0_BNDL_LINUX.sh.drv' failed with exit code 1;
       last 8 log lines:
       >
       > ***
       > This nix expression requires that Mathematica_13.1.0_BNDL_LINUX.sh is
       > already part of the store. Find the file on your Mathematica CD
       > and add it to the nix store with nix-store --add-fixed sha256 <FILE>.
       >
       > ***
       >
       For full logs, run 'nix log /nix/store/ybsmhdrmjy8xarlvf53kzlj4bbfw7b20-Mathematica_13.1.0_BNDL_LINUX.sh.drv'.
error: 1 dependencies of derivation '/nix/store/nas8dqxwnya0h3bc5ddnygbzhi1ydjr6-mathematica-cuda-13.1.0.drv' failed to build
error: builder for '/nix/store/yg5akr0m7llg7d4ngjwd9i96jw6sz1qw-TensorRT-8.4.0.6.Linux.x86_64-gnu.cuda-11.6.cudnn8.3.tar.gz.drv' failed with exit code 1;
       last 10 log lines:
       > download the 8.4.0.6 Linux x86_64 TAR package for CUDA 11.7 from
       > https://developer.nvidia.com/tensorrt.
       >
       > Once you have downloaded the file, add it to the store with the following
       > command, and try building this derivation again.
       >
       > $ nix-store --add-fixed sha256 TensorRT-8.4.0.6.Linux.x86_64-gnu.cuda-11.6.cudnn8.3.tar.gz
       >
       > ***
       >
       For full logs, run 'nix log /nix/store/yg5akr0m7llg7d4ngjwd9i96jw6sz1qw-TensorRT-8.4.0.6.Linux.x86_64-gnu.cuda-11.6.cudnn8.3.tar.gz.drv'.
error: 1 dependencies of derivation '/nix/store/9lqc02hrkn62fv9nsd6iw81kdyy55qmj-cudatoolkit-11.7-tensorrt-8.4.0.6.drv' failed to build
error: 2 dependencies of derivation '/nix/store/1qlmd5zrk7l2bm77r0p36qg76mnygcr2-python3.10-tensorrt-8.4.0.6.drv' failed to build
error: 2 dependencies of derivation '/nix/store/7lim89bfbd19k8rbgjcpkkkdh4cysh1r-python3.9-tensorrt-8.4.0.6.drv' failed to build
error: builder for '/nix/store/052hqk4l07mmx9vzyh0zx6ppmgr99iw0-python3.10-numba-0.56.2.drv' failed with exit code 1;
       last 10 log lines:
       > unpacking source archive /nix/store/s69sx2va20j7casl68dsf0b2xjjv0ic2-numba-0.56.2.tar.gz
       > source root is numba-0.56.2
       > setting SOURCE_DATE_EPOCH to timestamp 1662090503 of file numba-0.56.2/numba/_version.py
       > patching sources
       > applying patch /nix/store/hmp6p98pg3gkjcr9nvb5a825dhcch41n-cuda_path.patch
       > patching file numba/cuda/cuda_paths.py
       > Hunk #2 FAILED at 32.
       > Hunk #3 succeeded at 61 with fuzz 2 (offset -10 lines).
       > Hunk #4 succeeded at 77 (offset -10 lines).
       > 1 out of 4 hunks FAILED -- saving rejects to file numba/cuda/cuda_paths.py.rej
       For full logs, run 'nix log /nix/store/052hqk4l07mmx9vzyh0zx6ppmgr99iw0-python3.10-numba-0.56.2.drv'.
error: builder for '/nix/store/h9q30xk65f89k4lnw68gdm3zffkka41g-python3.9-numba-0.56.2.drv' failed with exit code 1;
       last 10 log lines:
       > unpacking source archive /nix/store/s69sx2va20j7casl68dsf0b2xjjv0ic2-numba-0.56.2.tar.gz
       > source root is numba-0.56.2
       > setting SOURCE_DATE_EPOCH to timestamp 1662090503 of file numba-0.56.2/numba/_version.py
       > patching sources
       > applying patch /nix/store/hmp6p98pg3gkjcr9nvb5a825dhcch41n-cuda_path.patch
       > patching file numba/cuda/cuda_paths.py
       > Hunk #2 FAILED at 32.
       > Hunk #3 succeeded at 61 with fuzz 2 (offset -10 lines).
       > Hunk #4 succeeded at 77 (offset -10 lines).
       > 1 out of 4 hunks FAILED -- saving rejects to file numba/cuda/cuda_paths.py.rej
       For full logs, run 'nix log /nix/store/h9q30xk65f89k4lnw68gdm3zffkka41g-python3.9-numba-0.56.2.drv'.
error: builder for '/nix/store/ylafyjflmppbnxq5zjypqhxpdyg39ciz-ethminer-0.19.0.drv' failed with exit code 2;
       last 10 log lines:
       > /nix/store/5i10bszy2380gn0z1a6d4gf4pyhg51ps-cli11-2.2.0/include/CLI/App.hpp:594:35: note:   no known conversion for argument 2 from 'unsigned int' to 'CLI::callback_t' {aka 'std::function<bool(const std::vector<std::__cxx11::basic_string<char> >&)>'}
       >   594 |                        callback_t option_callback,
       >       |                        ~~~~~~~~~~~^~~~~~~~~~~~~~~
       > /nix/store/5i10bszy2380gn0z1a6d4gf4pyhg51ps-cli11-2.2.0/include/CLI/App.hpp:701:13: note: candidate: 'CLI::Option* CLI::App::add_option(std::string)'
       >   701 |     Option *add_option(std::string option_name) {
       >       |             ^~~~~~~~~~
       > /nix/store/5i10bszy2380gn0z1a6d4gf4pyhg51ps-cli11-2.2.0/include/CLI/App.hpp:701:13: note:   candidate expects 1 argument, 4 provided
       > make[2]: *** [ethminer/CMakeFiles/ethminer.dir/build.make:76: ethminer/CMakeFiles/ethminer.dir/main.cpp.o] Error 1
       > make[1]: *** [CMakeFiles/Makefile2:516: ethminer/CMakeFiles/ethminer.dir/all] Error 2
       > make: *** [Makefile:156: all] Error 2
       For full logs, run 'nix log /nix/store/ylafyjflmppbnxq5zjypqhxpdyg39ciz-ethminer-0.19.0.drv'.
error: builder for '/nix/store/bam31rgddbrr4k4awcv30f2y8yvskiz6-librealsense-2.45.0.drv' failed with exit code 1;
       last 10 log lines:
       >    84 |             self.deleter = [](void* ptr){ delete[] ptr; };
       >       |                                                    ^~~
       > /build/source/wrappers/python/pyrs_internal.cpp: In lambda function:
       > /build/source/wrappers/python/pyrs_internal.cpp:100:52: warning: deleting 'void*' is undefined [-Wdelete-incomplete]
       >   100 |             self.deleter = [](void* ptr){ delete[] ptr; };
       >       |                                                    ^~~
       > [182/219] Building CXX object wrappers/python/CMakeFiles/pyrealsense2.dir/pyrs_sensor.cpp.o
       > [183/219] Building CXX object wrappers/python/CMakeFiles/pyrealsense2.dir/c_files.cpp.o
       > [184/219] Building CXX object wrappers/python/CMakeFiles/pybackend2.dir/pybackend.cpp.o
       > ninja: build stopped: subcommand failed.
       For full logs, run 'nix log /nix/store/bam31rgddbrr4k4awcv30f2y8yvskiz6-librealsense-2.45.0.drv'.
error: builder for '/nix/store/i5gfqsph1rnff5q674y7wdpf5zc1ql9w-librealsense-2.45.0.drv' failed with exit code 1;
       last 10 log lines:
       > /build/source/wrappers/python/pyrs_internal.cpp:100:52: warning: deleting 'void*' is undefined [-Wdelete-incomplete]
       >   100 |             self.deleter = [](void* ptr){ delete[] ptr; };
       >       |                                                    ^~~
       > [174/219] Building CXX object wrappers/python/CMakeFiles/pyrealsense2.dir/pyrs_frame.cpp.o
       > [175/219] Building CXX object CMakeFiles/realsense2.dir/src/rs.cpp.o
       > [176/219] Building CXX object wrappers/python/CMakeFiles/pyrealsense2.dir/pyrs_sensor.cpp.o
       > [177/219] Building CXX object wrappers/python/CMakeFiles/pyrealsense2.dir/c_files.cpp.o
       > [178/219] Building CXX object wrappers/python/CMakeFiles/pybackend2.dir/pybackend.cpp.o
       > [179/219] Building C object CMakeFiles/realsense2.dir/third-party/sqlite/sqlite3.c.o
       > ninja: build stopped: subcommand failed.
       For full logs, run 'nix log /nix/store/i5gfqsph1rnff5q674y7wdpf5zc1ql9w-librealsense-2.45.0.drv'.
error: 9 dependencies of derivation '/nix/store/pa79sf239g5x61lvv6zbv951gyyq8xc0-env.drv' failed to build
error: 1 dependencies of derivation '/nix/store/394lrlh4bpkaxnafknlfwadp3qardczh-review-shell.drv' failed to build

@samuela (Member) commented Oct 5, 2022

TensorRT and mathematica require special downloads.

ethminer, numbaWithCuda, and pyrealsense2WithCuda are all failing on master. We should mark them as broken.

So this change LGTM modulo the formatting changes mentioned above.

@dguibert force-pushed the dg/cudatoolkit_11_7_0 branch from 3cb66cf to a0e9973 on October 5, 2022 06:06
@dguibert (Member, Author) commented Oct 5, 2022

I've amended the formatting changes.
Thx for reviewing this.

(it's time to package version 11.8, released a few days ago 😉)

@samuela merged commit 6a55613 into NixOS:master on Oct 5, 2022
@samuela (Member) commented Oct 5, 2022

Thanks so much for your PR @dguibert, and thank you for your patience getting it merged! Can't believe that 11.8 is already here haha

@dguibert (Member, Author) commented Oct 6, 2022

The 11.8 update has been proposed in #194705.
