Releases: ROCm/rocBLAS
rocBLAS 5.0.0 for ROCm 7.0.1
rocBLAS code for ROCm 7.0.1 did not change. The library was rebuilt for the updated ROCm 7.0.1 stack.
rocBLAS 5.0.0 for ROCm 7.0.0
Added
- gfx950 support
- ROCBLAS_LAYER = 8 internal API logging for gemm debugging
- Support for AOCL 5.0 gcc build as a client reference library
- Allow PkgConfig for client reference library fallback detection
Changed
- CMAKE_CXX_COMPILER is now passed on during compilation for a Tensile build
- Change default atomics mode from allowed to not allowed
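The release note does not state why the atomics default changed. One common reason for restricting atomic accumulation, assumed here, is that atomics let partial sums arrive in a varying order, and floating-point addition is not associative, so results may not be bit-wise reproducible. The sketch below is plain C++ with no rocBLAS calls; sum_in_order is a hypothetical helper, and IEEE single-precision evaluation is assumed.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: add the values strictly left to right in single precision.
// Reordering the inputs models partial sums arriving in a different order.
inline float sum_in_order(const std::vector<float>& values) {
    assert(!values.empty());
    float acc = 0.0f;
    for (float v : values) {
        acc += v;  // each addition rounds to the nearest float
    }
    return acc;
}
```

Summing {1.0e8f, 1.0f, -1.0e8f} and {1.0e8f, -1.0e8f, 1.0f} with this helper gives different results, because 1.0f is lost when it is added to 1.0e8f first.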
Removed
- Support code for non-production gfx targets
- rocblas_hgemm_kernel_name, rocblas_sgemm_kernel_name, and rocblas_dgemm_kernel_name API functions
- Use of warpSize as a constexpr
- Use of deprecated behavior of hipPeekLastError
- rocblas_float8.h and rocblas_hip_f8_impl.h files
- rocblas_gemm_ex3, rocblas_gemm_batched_ex3, rocblas_gemm_strided_batched_ex3 API functions
Optimized
- Optimized gemm by using gemv kernels when applicable
- Optimized gemv for small m and n with a large batch count on gfx942
- Improved the performance of Level 1 dot for all precisions and variants when N > 100000000 on gfx942
- Improved the performance of Level 1 asum and nrm2 for all precisions and variants on gfx942
- Improved the performance of Level 2 sger (single precision) on gfx942
- Improved the performance of Level 3 dgmm for all precisions and variants on gfx942
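The gemm-via-gemv item relies on a simple identity: when B and C each have a single column (n == 1), C = alpha * A * B + beta * C is the matrix-vector product y = alpha * A * x + beta * y. The sketch below shows this with host-side reference loops. The names ref_gemv, ref_gemm and gemm_dispatch are hypothetical, and the notes do not say which conditions rocBLAS actually uses to select gemv kernels, so the n == 1 test is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference column-major gemv, no transpose: y = alpha * A * x + beta * y.
// A is m x n with leading dimension lda >= m.
void ref_gemv(int m, int n, float alpha, const std::vector<float>& A, int lda,
              const std::vector<float>& x, float beta, std::vector<float>& y) {
    assert(lda >= m);
    for (int i = 0; i < m; ++i) {
        float acc = 0.0f;
        for (int j = 0; j < n; ++j) {
            acc += A[i + static_cast<std::size_t>(j) * lda] * x[j];
        }
        y[i] = alpha * acc + beta * y[i];
    }
}

// Reference column-major gemm, no transposes: C = alpha * A * B + beta * C.
// A is m x k, B is k x n, C is m x n.
void ref_gemm(int m, int n, int k, float alpha, const std::vector<float>& A, int lda,
              const std::vector<float>& B, int ldb, float beta,
              std::vector<float>& C, int ldc) {
    assert(lda >= m && ldb >= k && ldc >= m);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                acc += A[i + static_cast<std::size_t>(p) * lda] *
                       B[p + static_cast<std::size_t>(j) * ldb];
            }
            float& c = C[i + static_cast<std::size_t>(j) * ldc];
            c = alpha * acc + beta * c;
        }
    }
}

// Hypothetical dispatcher (assumed condition): a single-column B and C
// turns the gemm into a gemv on A (m x k) with x = B and y = C.
void gemm_dispatch(int m, int n, int k, float alpha, const std::vector<float>& A, int lda,
                   const std::vector<float>& B, int ldb, float beta,
                   std::vector<float>& C, int ldc) {
    if (n == 1) {
        ref_gemv(m, k, alpha, A, lda, B, beta, C);
        return;
    }
    ref_gemm(m, n, k, alpha, A, lda, B, ldb, beta, C, ldc);
}
```

Both paths accumulate in the same order here, so they agree exactly. A real GPU kernel may reduce in a different order and differ in the last bits.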
Resolved issues
- Fixed environment variable path-based logging to append multiple handle output to the same file
- Support numerics when trsm is running with rocblas_status_perf_degraded
- Fixed the build dependency installation of joblib on some operating systems
- Return rocblas_status_internal_error when rocblas_[set,get]_[matrix,vector] is called with a host pointer in place of a device pointer
- Reduced the default verbosity level for internal GEMM backend information
- Updated from the deprecated rocm-cmake to ROCmCMakeBuildTools
- Corrected AlmaLinux gfortran package dependencies
Upcoming changes
- Deprecated the use of negative indices to indicate the default solution is being used for gemm_ex with rocblas_gemm_algo_solution_index
rocBLAS 4.4.1 for ROCm 6.4.3
rocBLAS code for ROCm 6.4.3 did not change. The library was rebuilt for the updated ROCm 6.4.3 stack.
rocBLAS 4.4.1 for ROCm 6.4.2
Resolved issues
- Zero imaginary portion of diagonal of C matrix for cherk/zherk for gfx90a/gfx942 with problem sizes k > 500
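For context on the herk fix: in C = alpha * A * A^H + beta * C with real alpha and beta, each diagonal entry is a sum of squared magnitudes, so it is real. The reference BLAS convention is that the imaginary parts of the diagonal are set to zero on exit. The sketch below is a host-side reference loop, not the rocBLAS kernel. ref_herk is a hypothetical name, and it writes the full matrix rather than one triangle to keep the example short.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;

// Reference column-major herk, no transpose, full matrix written:
// C = alpha * A * A^H + beta * C with real alpha and beta.
// A is n x k (lda >= n), C is n x n (ldc >= n).
void ref_herk(int n, int k, float alpha, const std::vector<cfloat>& A, int lda,
              float beta, std::vector<cfloat>& C, int ldc) {
    assert(lda >= n && ldc >= n);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) {
            cfloat acc(0.0f, 0.0f);
            for (int p = 0; p < k; ++p) {
                acc += A[i + static_cast<std::size_t>(p) * lda] *
                       std::conj(A[j + static_cast<std::size_t>(p) * lda]);
            }
            cfloat& c = C[i + static_cast<std::size_t>(j) * ldc];
            c = alpha * acc + beta * c;
            if (i == j) {
                // Force the diagonal to be exactly real, discarding any
                // imaginary residue carried in from the input C.
                c = cfloat(c.real(), 0.0f);
            }
        }
    }
}
```

A quick check is to call it with a C whose diagonal carries stray imaginary values and confirm they are zero afterwards.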
rocBLAS 4.4.0 for ROCm 6.4.1
rocBLAS code for ROCm 6.4.1 did not change. The library was rebuilt for the updated ROCm 6.4.1 stack.
rocBLAS 4.4.0 for ROCm 6.4.0
Added
- rocTX support in rocBLAS (not available on Windows or in the static library version on Linux)
- On gfx12, all functions now support full rocblas_int dynamic range for batch_count
- --ninja build option
- Support for GPU_TARGETS cmake variable
Changed
- rocblas-test client removes the stress tests unless YAML-based testing or gtest_filter adds them
- rocblas clients OpenMP default threading is reduced to be less than the logical core count
- gemm_ex testing and timing reuses device memory
- gemm_ex timing initializes matrices on device
Optimized
- Significantly reduced workspace memory requirements for Level 1 ILP64: iamax and iamin
- Reduced workspace memory requirements for Level 1 ILP64: dot, asum, nrm2
- Improved the performance of Level 2 gemv for the problem sizes (TransA == N && m > 2*n) and (TransA == T)
- Improved the performance of Level 3 syrk and herk for the problem size (k > 500 && n < 4000)
Resolved issues
- gfx12: ger, geam, geam_ex, dgmm, trmm, symm, hemm, ILP64 gemm, and larger data support
- Added a gfortran package dependency for Azure Linux OS
- Outdated SLES OS package dependencies (cxxtools and joblib) in install.sh -d
- Code object stripping for RPM packages
Upcoming changes
- Deprecated the cmake variable AMDGPU_TARGETS. Use GPU_TARGETS instead.
rocBLAS 4.3.0 for ROCm 6.3.3
rocBLAS code for ROCm 6.3.3 did not change. The library was rebuilt for the updated ROCm 6.3.3 stack.
rocBLAS 4.3.0 for ROCm 6.3.2
rocBLAS code for ROCm 6.3.2 did not change. The library was rebuilt for the updated ROCm 6.3.2 stack.
rocBLAS 4.3.0 for ROCm 6.3.1
rocBLAS code for ROCm 6.3.1 did not change. The library was rebuilt for the updated ROCm 6.3.1 stack.
rocBLAS 4.3.0 for ROCm 6.3.0
Added
- Level 3 and EX functions have an additional ILP64 API for both C and FORTRAN (_64 name suffix) with int64_t function arguments
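To illustrate when int64_t arguments matter, the sketch below checks whether a set of size-like arguments (dimensions, leading dimensions, batch counts) still fits in a 32-bit signed integer. It also shows that a column-major element offset can exceed that range even when each dimension fits. Both helpers are hypothetical and are not part of rocBLAS. They only model the arithmetic, not how the library decides between the 32-bit and _64 entry points.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <limits>

// Hypothetical helper: true if every size-like argument fits in a
// 32-bit signed integer, so a 32-bit interface could represent it.
inline bool fits_in_int32(std::initializer_list<std::int64_t> args) {
    for (std::int64_t v : args) {
        assert(v >= 0);  // sizes and leading dimensions are non-negative here
        if (v > std::numeric_limits<std::int32_t>::max()) {
            return false;
        }
    }
    return true;
}

// Column-major offset of element (i, j), computed in 64-bit arithmetic
// so the product j * ld cannot wrap around at 2^31.
inline std::int64_t element_offset(std::int64_t i, std::int64_t j, std::int64_t ld) {
    return i + j * ld;
}
```

For example, a vector of 3,000,000,000 elements cannot be described by a 32-bit length, and the last element of a 50000 x 50000 matrix sits at offset 2,499,999,999, which is past the 32-bit signed maximum of 2,147,483,647.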
Changed
- amdclang is used as the default compiler instead of hipcc
- Internal performance scripts use amd-smi instead of the deprecated rocm-smi
Optimized
- Improved performance of Level 2 gbmv
- Improved performance of Level 2 gemv for float and double precisions for problem sizes (TransA == N && m==n && m % 128 == 0) measured on a gfx942 GPU
Resolved issues
- Fixed stbsv_strided_batched_64 Fortran binding
Upcoming changes
- rocblas_Xgemm_kernel_name APIs are deprecated