Releases: brycelelbach/cub_historical_2019_2020

CUB 1.10.0 (NVIDIA HPC SDK 20.9)

16 Sep 05:44

Summary

CUB 1.10.0 is the major release accompanying the NVIDIA HPC SDK 20.9 release. It drops support for C++03, GCC < 5, Clang < 6, and MSVC < 2017. It also overhauls CMake support. Finally, we now have a Code of Conduct for contributors: https://github.com/thrust/cub/blob/main/CODE_OF_CONDUCT.md

Breaking Changes

  • C++03 is no longer supported.
  • GCC < 5, Clang < 6, and MSVC < 2017 are no longer supported.
  • C++11 is deprecated. Using this dialect will generate a compile-time warning. These warnings can be suppressed by defining CUB_IGNORE_DEPRECATED_CPP_DIALECT or CUB_IGNORE_DEPRECATED_CPP_11. Suppression is only a short-term solution. We will be dropping support for C++11 in the near future.
  • CMake < 3.15 is no longer supported.
  • The default branch on GitHub is now called main.

Other Enhancements

Bug Fixes

  • NVIDIA/thrust#1244: Check for macro collisions with system headers during header testing.
  • thrust/thrust#1153: Switch to placement new instead of assignment to construct items in uninitialized memory. Thanks to Hugh Winkler for this contribution.
  • thrust/cub#38: Fix cub::DeviceHistogram for size_t OffsetTs. Thanks to Leo Fang for this contribution.
  • thrust/cub#35: Fix GCC 5 maybe-uninitialized warning. Thanks to Rong Ou for this contribution.
  • thrust/cub#36: Qualify namespace for va_printf in _CubLog. Thanks to Andrei Tchouprakov for this contribution.
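The thrust/thrust#1153 fix is about constructing objects in raw, uninitialized memory. The following is a minimal, host-only C++ sketch of the distinction, not Thrust's or CUB's code; the helper names (construct_copies, destroy_copies, total_length_of_copies) are hypothetical. Assigning through a pointer to raw storage calls operator= on an object whose lifetime never began, which is undefined behavior for non-trivial types, while placement new constructs the object in place.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <string>

// Hypothetical helper: constructs n copies of `value` in raw storage.
// Placement new begins each object's lifetime. Writing `p[i] = value;`
// instead would call T::operator= on memory that holds no T yet, which
// is undefined behavior for non-trivial types such as std::string.
template <typename T>
T* construct_copies(void* raw, std::size_t n, const T& value) {
  assert(raw != nullptr || n == 0);
  T* p = static_cast<T*>(raw);
  for (std::size_t i = 0; i < n; ++i) {
    ::new (static_cast<void*>(p + i)) T(value);
  }
  return p;
}

// Objects created with placement new must be destroyed explicitly
// before their storage is released.
template <typename T>
void destroy_copies(T* p, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    p[i].~T();
  }
}

// Usage sketch: builds n copies of s in uninitialized storage and
// returns their combined length.
inline std::size_t total_length_of_copies(std::size_t n, const std::string& s) {
  std::allocator<std::string> alloc;
  std::string* p = alloc.allocate(n);  // storage only, no strings yet
  construct_copies(p, n, s);
  std::size_t total = 0;
  for (std::size_t i = 0; i < n; ++i) total += p[i].size();
  destroy_copies(p, n);
  alloc.deallocate(p, n);
  return total;
}
```

In standard C++, std::uninitialized_fill_n performs the same placement-new construction and also destroys already-built elements if a constructor throws, which this sketch does not attempt.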

CUB 1.9.10-1 (NVIDIA HPC SDK 20.7, CUDA Toolkit 11.1)

27 Jul 22:19

Summary

CUB 1.9.10-1 is the minor release accompanying the NVIDIA HPC SDK 20.7 release and the CUDA Toolkit 11.1 release.

Bug Fixes

  • #1217: Move the static local in cub::DeviceCount to a separate host-only function, because NVC++ doesn't support static locals in host-device functions.
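The shape of the #1217 fix can be sketched in plain host C++. This is an illustration under assumptions, not CUB's code: query_device_count_uncached is a hypothetical stand-in for a cudaGetDeviceCount call (it only counts its invocations), and the __host__/__device__ annotations are omitted because this sketch compiles as ordinary C++. The point is that the function-local static lives in a host-only helper, and the public entry point merely forwards to it.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for an expensive runtime query such as
// cudaGetDeviceCount; it records how often it is called.
inline std::atomic<int>& uncached_call_count() {
  static std::atomic<int> calls{0};
  return calls;
}

inline int query_device_count_uncached() {
  ++uncached_call_count();
  return 4;  // pretend the machine has four GPUs
}

// Host-only helper that owns the static local. Since C++11 the
// initializer runs exactly once, even under concurrent first calls.
inline int device_count_cached_value() {
  static const int result = query_device_count_uncached();
  return result;
}

// Public entry point: forwards to the helper and declares no static
// local itself, which is the constraint the fix works around.
inline int device_count() {
  const int n = device_count_cached_value();
  assert(n >= 0);
  return n;
}
```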

CUB 1.9.10 (NVIDIA HPC SDK 20.5)

19 May 09:22

Summary

CUB 1.9.10 is the release accompanying the NVIDIA HPC SDK 20.5 release. It adds CMake find_package support. C++03, C++11, GCC < 5, Clang < 6, and MSVC < 2017 are now deprecated. Starting with the upcoming 1.10.0 release, C++03 support will be dropped entirely.

Breaking Changes

  • Thrust now checks that it is compatible with the version of CUB found in your include path, generating an error if it is not. If you are using your own version of CUB, it may be too old. It is recommended to simply delete your own version of CUB and use the version of CUB that comes with Thrust.
  • C++03 and C++11 are deprecated. Using these dialects will generate a compile-time warning. These warnings can be suppressed by defining CUB_IGNORE_DEPRECATED_CPP_DIALECT (to suppress C++03 and C++11 deprecation warnings) or CUB_IGNORE_DEPRECATED_CPP_11 (to suppress C++11 deprecation warnings). Suppression is only a short-term solution. We will be dropping support for C++03 in the 1.10.0 release and C++11 in the near future.
  • GCC < 5, Clang < 6, and MSVC < 2017 are deprecated. Using these compilers will generate a compile-time warning. These warnings can be suppressed by defining CUB_IGNORE_DEPRECATED_COMPILER. Suppression is only a short-term solution. We will be dropping support for these compilers in the near future.
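The dialect suppression rules above form a small truth table. The sketch below models them in plain C++. The function and struct names are hypothetical, and the two booleans stand in for whether the user defined the real macros before including CUB. The mapping from `__cplusplus` (or `_MSVC_LANG` on MSVC) to a dialect year is a common detection approach, not necessarily the exact logic in <cub/util_cpp_dialect.cuh>.

```cpp
#include <cassert>

// Hypothetical helper: maps a __cplusplus-style value to a dialect year.
constexpr int dialect_from_value(long value) {
  return value < 201103L ? 2003
       : value < 201402L ? 2011
       : value < 201703L ? 2014
       : value < 202002L ? 2017
                         : 2020;
}

// MSVC reports __cplusplus as 199711L unless /Zc:__cplusplus is used,
// so _MSVC_LANG is the more reliable value there.
#if defined(_MSVC_LANG)
constexpr long kLanguageValue = _MSVC_LANG;
#else
constexpr long kLanguageValue = __cplusplus;
#endif

constexpr int kCurrentDialect = dialect_from_value(kLanguageValue);

// Which suppression macros the user defined, modeled as booleans.
struct Suppressions {
  bool ignore_cpp_dialect;  // models CUB_IGNORE_DEPRECATED_CPP_DIALECT
  bool ignore_cpp_11;       // models CUB_IGNORE_DEPRECATED_CPP_11
};

// Rules as stated above: the dialect macro silences both C++03 and
// C++11 warnings; the C++11 macro silences only the C++11 warning.
constexpr bool warns_about_dialect(int dialect, Suppressions s) {
  return (dialect == 2003 && !s.ignore_cpp_dialect) ||
         (dialect == 2011 && !s.ignore_cpp_dialect && !s.ignore_cpp_11);
}
```

Defining CUB_IGNORE_DEPRECATED_CPP_11 therefore does nothing for a C++03 build; only CUB_IGNORE_DEPRECATED_CPP_DIALECT covers both.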

New Features

  • CMake find_package support. Point CMake at the cmake folder in your CUB include directory (e.g. cmake -DCUB_DIR=/usr/local/cuda/include/cub/cmake/ .), then add CUB to your CMake project with find_package(CUB REQUIRED CONFIG).

CUB 1.9.9 (CUDA 11.0)

19 May 09:13

Summary

CUB 1.9.9 is the release accompanying the CUDA Toolkit 11.0 release. It introduces CMake support, version macros, platform detection machinery, and support for NVC++, which uses Thrust (and thus CUB) to implement GPU-accelerated C++17 Parallel Algorithms. Additionally, the scan dispatch layer was refactored and modernized. C++03, C++11, GCC < 5, Clang < 6, and MSVC < 2017 are now deprecated. Starting with the upcoming 1.10.0 release, C++03 support will be dropped entirely.

Breaking Changes

  • Thrust now checks that it is compatible with the version of CUB found in your include path, generating an error if it is not. If you are using your own version of CUB, it may be too old. It is recommended to simply delete your own version of CUB and use the version of CUB that comes with Thrust.
  • C++03 and C++11 are deprecated. Using these dialects will generate a compile-time warning. These warnings can be suppressed by defining CUB_IGNORE_DEPRECATED_CPP_DIALECT (to suppress C++03 and C++11 deprecation warnings) or CUB_IGNORE_DEPRECATED_CPP_11 (to suppress C++11 deprecation warnings). Suppression is only a short-term solution. We will be dropping support for C++03 in the 1.10.0 release and C++11 in the near future.
  • GCC < 5, Clang < 6, and MSVC < 2017 are deprecated. Using these compilers will generate a compile-time warning. These warnings can be suppressed by defining CUB_IGNORE_DEPRECATED_COMPILER. Suppression is only a short-term solution. We will be dropping support for these compilers in the near future.

New Features

  • CMake support. Thanks to Francis Lemaire for this contribution.
  • Refactored and modernized scan dispatch layer. Thanks to Francis Lemaire for this contribution.
  • Policy hooks for device-wide reduce, scan, and radix sort facilities to simplify tuning and allow users to provide custom policies. Thanks to Francis Lemaire for this contribution.
  • <cub/version.cuh>: CUB_VERSION, CUB_VERSION_MAJOR, CUB_VERSION_MINOR, CUB_VERSION_SUBMINOR, and CUB_PATCH_NUMBER.
  • Platform detection machinery:
    • <cub/util_cpp_dialect.cuh>: Detects the C++ standard dialect.
    • <cub/util_compiler.cuh>: Detects the host and device compilers.
    • <cub/util_deprecated.cuh>: Defines CUB_DEPRECATED.
    • <cub/config.cuh>: Includes <cub/util_arch.cuh>, <cub/util_compiler.cuh>, <cub/util_cpp_dialect.cuh>, <cub/util_deprecated.cuh>, <cub/util_macro.cuh>, and <cub/util_namespace.cuh>.
  • cub::DeviceCount and cub::DeviceCountUncached, abstractions for cudaGetDeviceCount; cub::DeviceCount caches the result.
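Version macros like these are usually compared numerically. The sketch below assumes, since these notes do not state it, that CUB_VERSION packs the version as major * 100000 + minor * 100 + subminor, so 1.9.9 would be 100909. The helper names are hypothetical, and kExampleVersion is a local stand-in; a real build would read the value from <cub/version.cuh>.

```cpp
#include <cassert>

// Assumed encoding: major * 100000 + minor * 100 + subminor.
constexpr int encode_version(int major, int minor, int subminor) {
  return major * 100000 + minor * 100 + subminor;
}
constexpr int version_major(int v) { return v / 100000; }
constexpr int version_minor(int v) { return v / 100 % 1000; }
constexpr int version_subminor(int v) { return v % 100; }

// Stand-in for CUB_VERSION; replace with the real macro in a CUB build.
constexpr int kExampleVersion = encode_version(1, 9, 9);

// Compile-time feature gate, e.g. "require at least 1.10.0".
constexpr bool at_least(int v, int major, int minor, int subminor) {
  return v >= encode_version(major, minor, subminor);
}

static_assert(version_minor(encode_version(1, 10, 0)) == 10,
              "two-digit minor versions survive the round trip");
```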

Other Enhancements

  • Lazily initialize the per-device CUDA attribute caches, because CUDA context creation is expensive and adds up with large CUDA binaries on machines with many GPUs. Thanks to the NVIDIA PyTorch team for bringing this to our attention.
  • Make cub::SwitchDevice avoid setting/resetting the device if the current device is the same as the target device.
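The cub::SwitchDevice change follows a common RAII pattern: switch devices for a scope, then restore the previous one, but skip both calls when nothing would change. A host-only sketch under stated assumptions: FakeRuntime is a hypothetical stand-in for cudaGetDevice/cudaSetDevice that counts set calls, and ScopedDevice is an illustrative name, not CUB's class.

```cpp
#include <cassert>

// Hypothetical stand-in for the CUDA runtime's current-device state.
struct FakeRuntime {
  int current_device = 0;
  int set_calls = 0;
  int get() const { return current_device; }
  void set(int device) {
    ++set_calls;
    current_device = device;
  }
};

// Switches to `target` for the guard's lifetime and restores the old
// device on destruction, touching the runtime only when they differ.
class ScopedDevice {
 public:
  ScopedDevice(FakeRuntime& rt, int target)
      : rt_(rt), old_(rt.get()), needs_reset_(old_ != target) {
    if (needs_reset_) rt_.set(target);
  }
  ~ScopedDevice() {
    if (needs_reset_) rt_.set(old_);
  }
  ScopedDevice(const ScopedDevice&) = delete;
  ScopedDevice& operator=(const ScopedDevice&) = delete;

 private:
  FakeRuntime& rt_;
  int old_;           // device active when the guard was created
  bool needs_reset_;  // true only if the guard actually switched
};
```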

Bug Fixes

  • Add an explicit failure parameter to the CAS in the CUB attribute cache to work around a GCC 4.8 bug.
  • Revert a change in reductions that changed the signedness of the lane_id variable to suppress a warning, as this change introduced a bug in optimized device code.
  • Fix initialization in cub::ExclusiveSum. Thanks to Conor Hoekstra for this contribution.
  • Fix initialization of the std::array in the CUB attribute cache.
  • Fix -Wsign-compare warnings. Thanks to Elias Stehle for this contribution.
  • Fix test_block_reduce.cu to build without parameters. Thanks to Francis Lemaire for this contribution.
  • Add missing includes to grid_even_share.cuh. Thanks to Francis Lemaire for this contribution.
  • Add missing includes to thread_search.cuh. Thanks to Francis Lemaire for this contribution.
  • Add missing includes to cub.cuh. Thanks to Felix Kallenborn for this contribution.
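The GCC 4.8 workaround concerns the compare-and-swap overload that takes separate success and failure memory orders. A minimal sketch of a lazily filled cache slot claimed with such a CAS; the state names and functions are hypothetical, and this is not CUB's attribute-cache code.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical states for a lazily filled cache entry.
enum SlotState : int { kEmpty = 0, kFilling = 1, kReady = 2 };

struct CacheSlot {
  std::atomic<int> state{kEmpty};
  int value = 0;
};

// Returns true if this caller won the right to fill the slot. Both
// memory orders are passed explicitly: acq_rel on success, acquire on
// failure (a failure order may not be release or acq_rel).
inline bool try_claim(CacheSlot& slot) {
  int expected = kEmpty;
  return slot.state.compare_exchange_strong(
      expected, kFilling, std::memory_order_acq_rel, std::memory_order_acquire);
}

// Only the thread that claimed the slot may publish into it.
inline void publish(CacheSlot& slot, int v) {
  assert(slot.state.load(std::memory_order_relaxed) == kFilling);
  slot.value = v;
  slot.state.store(kReady, std::memory_order_release);
}

// Readers see `value` only after observing kReady via an acquire load.
inline bool try_read(const CacheSlot& slot, int& out) {
  if (slot.state.load(std::memory_order_acquire) != kReady) return false;
  out = slot.value;
  return true;
}
```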

CUB 1.9.8-1 (NVIDIA HPC SDK 20.3)

19 May 09:05

Summary

CUB 1.9.8-1 is a variant of 1.9.8 accompanying the NVIDIA HPC SDK 20.3 release. It contains modifications necessary to serve as the implementation of NVC++'s GPU-accelerated C++17 Parallel Algorithms.

CUB 1.9.8 (CUDA 11.0 Early Access)

19 May 09:02

Summary

CUB 1.9.8 is the first release of CUB to be officially supported and included in the CUDA Toolkit.
When compiling CUB in C++11 mode, CUB now caches calls to CUDA attribute query APIs, which improves performance of these queries by 20x to 50x when they are called concurrently by multiple host threads.

Enhancements

  • (C++11 or later) Cache calls to cudaFuncGetAttributes and cudaDeviceGetAttribute within cub::PtxVersion and cub::SmVersion. These CUDA APIs acquire a lock on a CUDA driver/runtime mutex and perform poorly under contention; with caching, they are 20x to 50x faster when called concurrently. Thanks to Bilge Acun for bringing this issue to our attention.
  • DispatchReduce now takes an OutputT template parameter so that users can specify the intermediate type explicitly.
  • Radix sort tuning policy updates to fix performance issues for element types smaller than 4 bytes.
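The caching described above can be sketched as a per-device cache that is filled lazily and then read without taking any lock. This is a host-only illustration under assumptions, not CUB's implementation: expensive_attribute_query is a hypothetical stand-in for a lock-protected call such as cudaDeviceGetAttribute (it returns made-up values and counts calls), and the technique shown is double-checked locking with a per-slot atomic flag.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for an expensive, lock-protected runtime query.
inline std::atomic<int>& attribute_query_calls() {
  static std::atomic<int> calls{0};
  return calls;
}

inline int expensive_attribute_query(int device) {
  ++attribute_query_calls();
  return 700 + 10 * device;  // made-up attribute value
}

// Each slot is computed at most once, on first use; later readers
// never touch the mutex.
template <std::size_t MaxDevices>
class PerDeviceCache {
 public:
  int get(int device) {
    assert(device >= 0 && static_cast<std::size_t>(device) < MaxDevices);
    Slot& slot = slots_[static_cast<std::size_t>(device)];
    // Fast path: this acquire load pairs with the release store below,
    // so a reader that sees `ready` also sees `value`.
    if (slot.ready.load(std::memory_order_acquire)) {
      return slot.value;
    }
    std::lock_guard<std::mutex> lock(mutex_);
    // Slow path: re-check under the lock in case another thread won.
    if (!slot.ready.load(std::memory_order_relaxed)) {
      slot.value = expensive_attribute_query(device);
      slot.ready.store(true, std::memory_order_release);
    }
    return slot.value;
  }

 private:
  struct Slot {
    std::atomic<bool> ready{false};
    int value = 0;
  };
  std::array<Slot, MaxDevices> slots_;
  std::mutex mutex_;
};
```

Repeated queries for the same device then cost one atomic load instead of a trip through the runtime's mutex, which is where contention between host threads arises.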

Bug Fixes

  • Change initialization style from copy initialization to direct initialization (which is more permissive) in AgentReduce to allow a wider range of types to be used with it.
  • Fix bad signed/unsigned comparisons in WarpReduce.
  • Fix computation of valid lanes in warp-level reduction primitive to correctly handle the case where there are 0 input items per warp.
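The AgentReduce change relies on a language rule: direct initialization (`T x(v);`) also considers explicit constructors, while copy initialization (`T x = v;`) does not. A minimal sketch with a hypothetical type and helper, not CUB's code:

```cpp
#include <cassert>

// Hypothetical type whose converting constructor is explicit: it can be
// direct-initialized from an int but not copy-initialized from one.
struct Accumulator {
  explicit Accumulator(int v) : value(v) {}
  int value;
};

template <typename T, typename InitT>
T make_initial(InitT init) {
  // T result = init;  // copy-initialization: ill-formed for Accumulator
  T result(init);      // direct-initialization: explicit ctors are viable
  return result;
}
```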

CUB 1.8.0

19 May 08:59

Summary

CUB 1.8.0 introduces changes to the cub::Shuffle* interfaces.

Breaking Changes

  • The interfaces of cub::ShuffleIndex, cub::ShuffleUp, and cub::ShuffleDown have been changed to allow for better computation of the PTX SHFL control constant for logical warps smaller than 32 threads.
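The PTX SHFL control constant is the `c` operand of `shfl.sync`. Per the PTX ISA, it packs a segment mask (bits 12:8) that splits the warp into logical sub-warps, plus a clamp value (bits 4:0). The host-side sketch below follows that description; it is not CUB's implementation, and the function names are hypothetical. For a power-of-two logical width w, the segment mask is 32 - w, and the clamp is 0 for `up` and 0x1f for `idx`, `down`, and `bfly`.

```cpp
#include <cassert>

// Packs the shfl.sync `c` operand for a logical warp of `width` lanes
// (a power of two in [1, 32]): segment mask in bits [12:8], clamp in
// bits [4:0].
constexpr unsigned shfl_c_operand(unsigned width, bool is_up) {
  return ((32u - width) << 8) | (is_up ? 0u : 0x1fu);
}

// Host-side model of shfl.idx source-lane selection: the segment base
// comes from the calling lane, the offset within the segment from src_lane.
inline unsigned shfl_idx_source_lane(unsigned lane, unsigned src_lane,
                                     unsigned width) {
  assert(width >= 1 && width <= 32 && (width & (width - 1)) == 0);
  const unsigned segmask = 32u - width;  // equals ~(width - 1) in 5 bits
  return (lane & segmask) | (src_lane & ~segmask & 31u);
}
```

For example, with 16-lane logical warps, lane 20 asking for source lane 3 reads from physical lane 19, the fourth lane of its own half-warp.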

Bug Fixes

  • NVIDIA#112: Fix cub::WarpScan's broadcast of warp-wide aggregate for logical warps smaller than 32 threads.

CUB 1.7.5

19 May 08:56

Summary

CUB 1.7.5 adds support for radix sorting __half keys and improved sorting performance for 1-byte keys. It was incorporated into Thrust 1.9.2.

Enhancements

  • Radix sort support for __half keys.
  • Radix sort tuning policy updates to improve 1-byte key performance.
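Radix sorting floating-point keys such as __half relies on a standard bit trick. The bits are remapped so that unsigned integer order matches numeric order: for negative values all bits are flipped, and for non-negative values only the sign bit is flipped. The host-only sketch below applies that trick to raw IEEE-754 binary16 bit patterns (as `std::uint16_t`, since no CUDA headers are assumed), followed by a two-pass LSD radix sort with 8-bit digits. Function names are hypothetical, NaN ordering is ignored, and this is not CUB's kernel code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Maps binary16 bits to keys whose unsigned order matches float order.
inline std::uint16_t twiddle_in(std::uint16_t bits) {
  const std::uint16_t mask = (bits & 0x8000u) ? 0xFFFFu : 0x8000u;
  return static_cast<std::uint16_t>(bits ^ mask);
}

// Inverse mapping: a set high bit means the original was non-negative.
inline std::uint16_t twiddle_out(std::uint16_t key) {
  const std::uint16_t mask = (key & 0x8000u) ? 0x8000u : 0xFFFFu;
  return static_cast<std::uint16_t>(key ^ mask);
}

// LSD radix sort of binary16 bit patterns, one 8-bit digit per pass.
inline void radix_sort_half_bits(std::vector<std::uint16_t>& data) {
  std::vector<std::uint16_t> keys(data.size()), tmp(data.size());
  for (std::size_t i = 0; i < data.size(); ++i) keys[i] = twiddle_in(data[i]);
  for (int shift = 0; shift < 16; shift += 8) {
    // Digit counts shifted by one, then turned into exclusive prefix sums.
    std::array<std::size_t, 257> offsets{};
    for (std::uint16_t k : keys) ++offsets[((k >> shift) & 0xFFu) + 1];
    for (int d = 0; d < 256; ++d) offsets[d + 1] += offsets[d];
    // Stable scatter keeps the previous pass's order within each digit.
    for (std::uint16_t k : keys) tmp[offsets[(k >> shift) & 0xFFu]++] = k;
    keys.swap(tmp);
  }
  for (std::size_t i = 0; i < data.size(); ++i) data[i] = twiddle_out(keys[i]);
}
```

With 16-bit keys, two 8-bit passes cover the whole key; 1-byte keys would need a single pass.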

Bug Fixes

  • Syntax tweaks to mollify Clang.
  • NVIDIA#127: cub::DeviceRunLengthEncode::Encode returns incorrect results.
  • NVIDIA#128: 7-bit sorting passes fail for SM61 with large values.
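Bugs such as NVIDIA#127 are easiest to confirm against a sequential reference. cub::DeviceRunLengthEncode::Encode outputs the first key of each run of equal consecutive keys, each run's length, and the number of runs. The host-only sketch below computes those same three results. The function name is hypothetical, and it uses std::vector rather than CUB's device-pointer interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential run-length encode: unique keys, run lengths, run count.
template <typename T>
std::size_t run_length_encode(const std::vector<T>& in,
                              std::vector<T>& unique_out,
                              std::vector<std::size_t>& counts_out) {
  unique_out.clear();
  counts_out.clear();
  for (std::size_t i = 0; i < in.size(); ++i) {
    if (i == 0 || !(in[i] == in[i - 1])) {
      unique_out.push_back(in[i]);  // a new run starts here
      counts_out.push_back(1);
    } else {
      ++counts_out.back();  // extend the current run
    }
  }
  assert(unique_out.size() == counts_out.size());
  return unique_out.size();  // number of runs
}
```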

CUB 1.7.4

19 May 08:56

Summary

CUB 1.7.4 is a minor release that was incorporated into Thrust 1.9.1-2.

Bug Fixes

  • NVIDIA#114: Can't pair non-trivially-constructible values in radix sort.
  • NVIDIA#115: cub::WarpReduce segmented reduction is broken in CUDA 9 for logical warp sizes smaller than 32.

CUB 1.7.3

19 May 08:56

Summary

CUB 1.7.3 is a minor release.

Bug Fixes

  • NVIDIA#110: cub::DeviceHistogram null-pointer exception bug for iterator inputs.