Releases: ROCm/rocBLAS

rocBLAS 5.0.0 for ROCm 7.0.1

17 Sep 16:37

rocBLAS code for ROCm 7.0.1 did not change. The library was rebuilt for the updated ROCm 7.0.1 stack.

rocBLAS 5.0.0 for ROCm 7.0.0

16 Sep 06:32

Added

  • gfx950 support
  • ROCBLAS_LAYER = 8 internal API logging for gemm debugging
  • Support for AOCL 5.0 gcc build as a client reference library
  • PkgConfig support for client reference library fallback detection
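
ROCBLAS_LAYER is treated as a bit mask, so the new value 8 should be combinable with the existing logging bits. The sketch below is plain C++, not rocBLAS code: the enumerator and function names are hypothetical, and the meanings of bits 1, 2, and 4 (trace, bench, profile) are an assumption taken from the rocBLAS logging documentation rather than from these notes.

```cpp
// Illustrative only: models ROCBLAS_LAYER as a bit mask. All names are hypothetical.
enum LayerBit : unsigned {
    kTrace       = 1u, // assumption: trace logging
    kBench       = 2u, // assumption: bench logging
    kProfile     = 4u, // assumption: profile logging
    kInternalApi = 8u  // internal API logging, added in this release
};

// Combine two logging bits into a single ROCBLAS_LAYER value.
constexpr unsigned combine_layers(unsigned a, unsigned b) { return a | b; }

// Report whether a given bit is set in a ROCBLAS_LAYER value.
constexpr bool layer_enabled(unsigned layer, unsigned bit) { return (layer & bit) != 0u; }
```

Under these assumptions, combine_layers(kTrace, kInternalApi) evaluates to 9, so ROCBLAS_LAYER=9 would request both trace logging and internal API logging.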

Changed

  • CMAKE_CXX_COMPILER is now passed on during compilation for a Tensile build
  • Changed the default atomics mode from allowed to not allowed
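
These notes do not say why the default atomics mode changed. One commonly cited trade-off is that atomic accumulation adds partial results in whatever order threads finish, and floating-point addition is not associative, so results can differ from run to run. The sketch below is plain C++ with hypothetical function names, not rocBLAS code; it only shows that summation order can change a float result. Applications that prefer the earlier behavior can presumably opt back in per handle with rocblas_set_atomics_mode and rocblas_atomics_allowed; check the API reference for the installed version.

```cpp
// Illustrative only: the same three floats summed in two different orders.
// Function names are hypothetical and unrelated to the rocBLAS API.

// Adds a and b first, then c.
constexpr float sum_left_first(float a, float b, float c) { return (a + b) + c; }

// Adds b and c first, then a.
constexpr float sum_right_first(float a, float b, float c) { return a + (b + c); }
```

With a = 1e20f, b = -1e20f, and c = 1.0f, the first order cancels the large terms before adding c, while the second order loses c to rounding when it is added to -1e20f, so the two results differ.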

Removed

  • Support code for non-production gfx targets
  • rocblas_hgemm_kernel_name, rocblas_sgemm_kernel_name, and rocblas_dgemm_kernel_name API functions
  • Use of warpSize as a constexpr
  • Use of deprecated behavior of hipPeekLastError
  • rocblas_float8.h and rocblas_hip_f8_impl.h files
  • rocblas_gemm_ex3, rocblas_gemm_batched_ex3, rocblas_gemm_strided_batched_ex3 API functions

Optimized

  • Optimized gemm by using gemv kernels when applicable
  • Optimized gemv for small m and n with a large batch count on gfx942
  • Improved the performance of Level 1 dot for all precisions and variants when N > 100000000 on gfx942
  • Improved the performance of Level 1 asum and nrm2 for all precisions and variants on gfx942
  • Improved the performance of Level 2 sger (single precision) on gfx942
  • Improved the performance of Level 3 dgmm for all precisions and variants on gfx942

Resolved issues

  • Fixed environment variable path-based logging to append multiple handle output to the same file
  • Support numerics when trsm is running with rocblas_status_perf_degraded
  • Fixed the build dependency installation of joblib on some operating systems
  • Return rocblas_status_internal_error when rocblas_[set,get]_[matrix,vector] is called with a host pointer in place of a device pointer
  • Reduced the default verbosity level for internal GEMM backend information
  • Updated from the deprecated rocm-cmake to ROCmCMakeBuildTools
  • Corrected AlmaLinux gfortran package dependencies

Upcoming changes

  • Using negative indices to select the default solution for gemm_ex with rocblas_gemm_algo_solution_index is deprecated

rocBLAS 4.4.1 for ROCm 6.4.3

07 Aug 14:20
f08d23e

rocBLAS code for ROCm 6.4.3 did not change. The library was rebuilt for the updated ROCm 6.4.3 stack.

rocBLAS 4.4.1 for ROCm 6.4.2

21 Jul 16:54
f08d23e

Resolved issues

  • Zero imaginary portion of diagonal of C matrix for cherk/zherk for gfx90a/gfx942 with problem sizes k > 500

rocBLAS 4.4.0 for ROCm 6.4.1

20 May 13:16
80e5394

rocBLAS code for ROCm 6.4.1 did not change. The library was rebuilt for the updated ROCm 6.4.1 stack.

rocBLAS 4.4.0 for ROCm 6.4.0

11 Apr 13:35
80e5394

Added

  • rocTX support in rocBLAS (not available on Windows or in the static library version on Linux)
  • On gfx12, all functions now support full rocblas_int dynamic range for batch_count
  • --ninja build option
  • Support for GPU_TARGETS cmake variable

Changed

  • The rocblas-test client excludes the stress tests unless YAML-based testing or gtest_filter adds them
  • The default OpenMP thread count for rocBLAS clients is reduced to less than the logical core count
  • gemm_ex testing and timing reuse device memory
  • gemm_ex timing initializes matrices on the device

Optimized

  • Significantly reduced workspace memory requirements for Level 1 ILP64: iamax and iamin
  • Reduced workspace memory requirements for Level 1 ILP64: dot, asum, nrm2
  • Improved the performance of Level 2 gemv for the problem sizes (TransA == N && m > 2*n) and (TransA == T)
  • Improved the performance of Level 3 syrk and herk for the problem size (k > 500 && n < 4000)
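
The size conditions in the gemv and syrk/herk items above can be written as simple predicates, for example to check whether a workload falls in the named ranges. This is a minimal sketch in plain C++: the Transpose enum and the function names are hypothetical stand-ins, not rocBLAS types, and the conditions are copied from the notes as stated.

```cpp
#include <cstdint>

// Hypothetical stand-in for the transpose argument; not a rocBLAS type.
enum class Transpose { N, T, C };

// True when a gemv problem matches (TransA == N && m > 2*n) or (TransA == T).
// Conjugate transpose is not mentioned in the notes, so it returns false here.
constexpr bool gemv_in_improved_range(Transpose trans_a, std::int64_t m, std::int64_t n) {
    return (trans_a == Transpose::N && m > 2 * n) || trans_a == Transpose::T;
}

// True when a syrk/herk problem matches (k > 500 && n < 4000).
constexpr bool syrk_herk_in_improved_range(std::int64_t n, std::int64_t k) {
    return k > 500 && n < 4000;
}
```

For instance, a non-transposed gemv with m = 10000 and n = 1000 satisfies the first condition, while a square non-transposed problem does not.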

Resolved issues

  • gfx12: ger, geam, geam_ex, dgmm, trmm, symm, hemm, ILP64 gemm, and larger data support
  • Added a gfortran package dependency for Azure Linux OS
  • Outdated SLES OS package dependencies (cxxtools and joblib) in install.sh -d
  • Code object stripping for RPM packages

Upcoming changes

  • Deprecated the cmake variable AMDGPU_TARGETS. Use GPU_TARGETS instead.

rocBLAS 4.3.0 for ROCm 6.3.3

19 Feb 17:47
8ebd6c1

rocBLAS code for ROCm 6.3.3 did not change. The library was rebuilt for the updated ROCm 6.3.3 stack.

rocBLAS 4.3.0 for ROCm 6.3.2

28 Jan 15:44
8ebd6c1

rocBLAS code for ROCm 6.3.2 did not change. The library was rebuilt for the updated ROCm 6.3.2 stack.

rocBLAS 4.3.0 for ROCm 6.3.1

20 Dec 16:12
8ebd6c1

rocBLAS code for ROCm 6.3.1 did not change. The library was rebuilt for the updated ROCm 6.3.1 stack.

rocBLAS 4.3.0 for ROCm 6.3.0

03 Dec 19:49
8ebd6c1

Added

  • Level 3 and EX functions have an additional ILP64 API for both C and FORTRAN (_64 name suffix) with int64_t function arguments
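
The _64 variants matter when an argument no longer fits in a 32-bit integer. The sketch below is plain C++ with hypothetical helper names, not rocBLAS code; it assumes the default rocblas_int is 32 bits wide and checks whether a set of gemm-style dimensions would need the int64_t interface.

```cpp
#include <cstdint>
#include <limits>

// True when value is representable as a 32-bit signed integer.
constexpr bool fits_in_int32(std::int64_t value) {
    return value >= std::numeric_limits<std::int32_t>::min()
        && value <= std::numeric_limits<std::int32_t>::max();
}

// Hypothetical helper: true when any dimension exceeds the 32-bit range,
// which is the case the _64 API with int64_t arguments is meant to cover.
constexpr bool needs_ilp64(std::int64_t m, std::int64_t n, std::int64_t k) {
    return !(fits_in_int32(m) && fits_in_int32(n) && fits_in_int32(k));
}
```

Under that assumption, a problem with m = n = 8 and k = 3000000000 would need the _64 entry point, while m = n = k = 4096 would not.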

Changed

  • amdclang is used as the default compiler instead of hipcc
  • Internal performance scripts use amd-smi instead of the deprecated rocm-smi

Optimized

  • Improved performance of Level 2 gbmv
  • Improved performance of Level 2 gemv for float and double precisions for problem sizes (TransA == N && m==n && m % 128 == 0) measured on a gfx942 GPU

Resolved issues

  • Fixed stbsv_strided_batched_64 Fortran binding

Upcoming changes

  • rocblas_Xgemm_kernel_name APIs are deprecated