rocFFT-1.0.10 for ROCm 4.1.0
Added
- Explicitly specified MAX_THREADS_PER_BLOCK through __launch_bounds__ for all kernels.
- Switched to the new syntax for specifying AMD GPU architecture names and features.
Optimizations
- Optimized 3D C2C/R2C transforms for cube sizes 64, 81, 100, 128, 200, and 256.
- Improved performance of the standalone out-of-place transpose kernel.
- Optimized the 1D C2C transform of length 40000.
- Enabled radix-7 for size 336.
- Added new radix-11 and radix-13 kernels, used for transforms of length 11 and 13 and for some of their multiples.
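
As a rough illustration of why new radix kernels matter, the sketch below factors a transform length into a set of radices. The radix set and the greedy largest-first order are assumptions made for this example only; these notes do not describe how rocFFT actually chooses or orders its kernels, and factor_into_radices is a hypothetical helper, not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed radix set, for illustration only. The radices rocFFT really
// supports, and the order it applies them in, are not stated in these notes.
static const std::size_t kAssumedRadices[] = {13, 11, 7, 5, 3, 2};

// Hypothetical helper: greedily factor `length` into the assumed radices,
// largest first. Returns false (and leaves `factors` empty) if `length`
// has a prime factor outside the assumed set.
bool factor_into_radices(std::size_t length, std::vector<std::size_t>& factors)
{
    assert(length > 0);
    factors.clear();
    for (std::size_t radix : kAssumedRadices) {
        while (length % radix == 0) {
            factors.push_back(radix);
            length /= radix;
        }
    }
    if (length != 1) {
        factors.clear();
        return false;
    }
    return true;
}
```

Under these assumptions, length 143 factors as 13 x 11 and length 336 as 7 x 3 x 2 x 2 x 2 x 2, while a length such as 17 cannot be covered by the assumed set and would need a different algorithm.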
Changed
- rocFFT now automatically allocates a work buffer if the plan requires one but none is provided.
- An explicit rocfft_status_invalid_work_buffer error is now returned when a work buffer of insufficient size is provided.
- Updated online documentation.
- Updated the Debian package name so that the version is separated by '_'.
- Adjusted accuracy test tolerances and how they are compared.
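
The two work buffer entries above can be read as a small decision rule. The sketch below models that rule in plain C++ so the cases are easy to see. It is not rocFFT source or API: WorkBufferAction and resolve_work_buffer are hypothetical names, and the handling of a plan that needs no work buffer is an assumption, since these notes do not cover that case.

```cpp
#include <cstddef>

// Hypothetical model of the behaviour described in these notes.
// None of these names exist in rocFFT.
enum class WorkBufferAction {
    NotNeeded,       // assumed: the plan requires no work buffer at all
    UseProvided,     // the caller's buffer is large enough
    AutoAllocate,    // the plan needs a buffer and the caller gave none
    InvalidTooSmall  // maps to rocfft_status_invalid_work_buffer in the notes
};

// Decide what to do given the plan's required size and the caller's buffer.
WorkBufferAction resolve_work_buffer(std::size_t required_bytes,
                                     const void* provided,
                                     std::size_t provided_bytes)
{
    if (required_bytes == 0) {
        return WorkBufferAction::NotNeeded;
    }
    if (provided == nullptr) {
        return WorkBufferAction::AutoAllocate;
    }
    if (provided_bytes < required_bytes) {
        return WorkBufferAction::InvalidTooSmall;
    }
    return WorkBufferAction::UseProvided;
}
```

In practice this means callers that never set a work buffer no longer need to, while callers that do set one should size it from the plan's reported requirement to avoid the new error.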
Fixed
- Fixed an accuracy failure for the 4x4x8192 transform size.
Known Issues
- None