Releases: saadrahim/rocFFT
rocFFT-1.0.11 for ROCm 4.2.0
Optimizations
- Improved performance for single precision kernels exercising all except radix-2/7 butterfly ops.
- Minor optimization for C2R 3D 100, 200 cube sizes.
- Optimized some C2C/R2C 3D 64, 81, 100, 128, 200, 256 rectangular sizes.
- When factoring, test to see if remaining length is explicitly supported.
- Explicitly add radix-7 lengths 14, 21, and 224 to list of supported lengths.
- Optimized R2C 2D/3D 128, 200, 256 cube sizes.
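The factoring change above (testing whether the remaining length is explicitly supported) can be illustrated with a minimal sketch. This is not rocFFT's implementation: the function name `factor_length`, the radix order, and the contents of `SUPPORTED_LENGTHS` are assumptions chosen for illustration, with 14, 21, and 224 taken from the list item above.

```python
# Hypothetical tables for illustration only; not rocFFT's actual data.
RADICES = (2, 3, 5, 7)              # butterfly radices, tried in this order
SUPPORTED_LENGTHS = {14, 21, 224}   # lengths assumed to have a dedicated kernel


def factor_length(n):
    """Split n into kernel lengths, stopping early at a supported length."""
    factors = []
    remaining = n
    while remaining > 1:
        # Before peeling off another radix, test whether the remaining
        # length is explicitly supported and can be handled in one step.
        if remaining in SUPPORTED_LENGTHS:
            factors.append(remaining)
            return factors
        for radix in RADICES:
            if remaining % radix == 0:
                factors.append(radix)
                remaining //= radix
                break
        else:
            # No radix divides the remaining length.
            raise ValueError(f"length {n} has a factor outside {RADICES}")
    return factors
```

Under these assumptions, `factor_length(448)` peels off one radix-2 step and then stops at 224, rather than continuing to factor 224 into single radices.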
Fixed
rocFFT-1.0.10 for ROCm 4.1.0
Added
- Explicitly specify MAX_THREADS_PER_BLOCK through __launch_bounds__ for all kernels.
- Switch to new syntax for specifying AMD GPU architecture names and features.
Optimizations
- Optimized C2C/R2C 3D 64, 81, 100, 128, 200, 256 cube sizes.
- Improved performance of the standalone out-of-place transpose kernel.
- Optimized 1D length 40000 C2C case.
- Enabled radix-7 for size 336.
- New radix-11 and radix-13 kernels; used in length 11 and 13 (and some of their multiples) transforms.
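To see why sizes such as 336 or multiples of 11 and 13 depend on the radices named above, it helps to look at their prime factors. The sketch below is plain arithmetic, not rocFFT code: `prime_factors` and `needs_only` are hypothetical helper names, and the check says nothing about which lengths rocFFT actually routes to which kernels.

```python
def prime_factors(n):
    """Return the prime factorization of n as a {prime: exponent} dict."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        # Whatever is left is a single prime larger than sqrt(original n).
        factors[n] = factors.get(n, 0) + 1
    return factors


def needs_only(n, radices):
    """True if every prime factor of n is one of the given radices."""
    return set(prime_factors(n)) <= set(radices)


print(prime_factors(336))    # 336 = 2^4 * 3 * 7
print(prime_factors(40000))  # 40000 = 2^6 * 5^4
print(needs_only(143, (2, 3, 5, 7)))          # 143 = 11 * 13
print(needs_only(143, (2, 3, 5, 7, 11, 13)))
```

Length 336 contains a factor of 7, so it cannot be decomposed with radices 2, 3, and 5 alone, and 143 = 11 × 13 only decomposes once both 11 and 13 are available.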
Changed
- rocFFT now automatically allocates a work buffer if the plan requires one but none is provided.
- An explicit rocfft_status_invalid_work_buffer error is now returned when a work buffer of insufficient size is provided.
- Updated online documentation.
- Updated the Debian package name so that the version is separated by '_'.
- Adjusted accuracy test tolerances and how they are compared.
Fixed
- Fixed 4x4x8192 accuracy failure.
Known Issues
- None
rocFFT-1.0.5 for ROCm 3.7.0
New Features
- Optimized middle-size power-of-2 2D C2C transforms using fused kernels
- Changed package dependency to hip-rocclr
- Fixed build issue with C++17
- Improved test infrastructure
Known Issues
- None
rocFFT-1.0.4 for ROCm 3.6.0
New Features
- Fixed a non-unit stride issue with middle-size 1D transforms
- Updated client package installation path
- Improved internal device memory usage check
- Improved logging
- Improved test infrastructure
Known Issues
- None
rocFFT-1.0.4 for ROCm 3.6.0
New Features
- Bidiagonalization of general matrices
- Optimizations to LU factorization
Known Issues
- None