rocFFT-1.0.10 for ROCm 4.1.0

@saadrahim saadrahim released this 23 Mar 01:06
c3110db

Added

  • Explicitly specify MAX_THREADS_PER_BLOCK through __launch_bounds__ for all kernels.
  • Switch to new syntax for specifying AMD GPU architecture names and features.

Optimizations

  • Optimized 3D C2C/R2C transforms for cube sizes 64, 81, 100, 128, 200, and 256.
  • Improved performance of the standalone out-of-place transpose kernel.
  • Optimized 1D length 40000 C2C case.
  • Enabled radix-7 for size 336.
  • Added new radix-11 and radix-13 kernels, used in transforms of length 11 and 13 (and some of their multiples).
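
To illustrate why radix-7, radix-11 and radix-13 kernels matter for lengths such as 336 or multiples of 11 and 13, the sketch below splits a transform length into factors drawn from a fixed radix list. It is not rocFFT's plan-selection logic: the radix list, the greedy strategy, and the names `kRadices` and `factor_into_radices` are assumptions made only for this example.

```cpp
#include <cassert>
#include <vector>

// Radices assumed for this sketch, largest first. The release notes mention
// radix-7, radix-11 and radix-13; the remaining entries are illustrative.
static const int kRadices[] = {13, 11, 7, 5, 3, 2};

// Greedily split n into factors drawn from kRadices.
// Returns an empty vector when n has a prime factor outside the list
// (and also for n == 1, which needs no factors).
std::vector<int> factor_into_radices(int n)
{
    assert(n > 0);
    std::vector<int> factors;
    for (int r : kRadices) {
        while (n % r == 0) {
            factors.push_back(r);
            n /= r;
        }
    }
    if (n != 1) {
        factors.clear(); // leftover prime factor: not expressible with these radices
    }
    return factors;
}
```

With this list, 336 splits as 7 x 3 x 2 x 2 x 2 x 2, so a length of 336 only decomposes fully when a radix-7 step is available; likewise 143 splits as 13 x 11.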

Changed

  • rocFFT now automatically allocates a work buffer if the plan requires one but none is provided.
  • An explicit rocfft_status_invalid_work_buffer error is now returned when a work buffer of insufficient size is provided.
  • Updated online documentation.
  • Updated the Debian package name to separate the version with '_'.
  • Adjusted accuracy test tolerances and how they are compared.
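
The two work-buffer changes above can be summarized as a small decision procedure. The sketch below is a host-only model of that behavior, not rocFFT code: `PlanModel`, `WorkBuffer`, `ExecResult`, `Status` and `execute_model` are hypothetical names, and only the "invalid work buffer" outcome corresponds to a status named in these notes (`rocfft_status_invalid_work_buffer`). In the library itself, the required size is queried with `rocfft_plan_get_work_buffer_size` and a buffer is supplied with `rocfft_execution_info_set_work_buffer`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical status codes for this sketch.
enum class Status { Success, InvalidWorkBuffer };

// Hypothetical stand-in for a plan: records only how much scratch space it needs.
struct PlanModel {
    std::size_t required_work_bytes;
};

// Hypothetical stand-in for a caller-supplied work buffer (ptr == nullptr means "none").
struct WorkBuffer {
    void*       ptr;
    std::size_t size;
};

struct ExecResult {
    Status status;
    bool   auto_allocated; // true when the model allocated scratch space itself
};

// Models the behavior described in the "Changed" list.
ExecResult execute_model(const PlanModel& plan, const WorkBuffer& user)
{
    if (plan.required_work_bytes == 0) {
        return {Status::Success, false}; // plan needs no work buffer
    }
    if (user.ptr == nullptr) {
        // No buffer provided: allocate one automatically for this execution.
        std::vector<unsigned char> scratch(plan.required_work_bytes);
        (void)scratch;
        return {Status::Success, true};
    }
    if (user.size < plan.required_work_bytes) {
        // Buffer provided but too small: report an explicit error.
        return {Status::InvalidWorkBuffer, false};
    }
    return {Status::Success, false}; // caller's buffer is large enough
}
```

Callers that want to control device memory use can still query the size and pass their own buffer; the automatic allocation only applies when none is provided.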

Fixed

  • Fixed 4x4x8192 accuracy failure.

Known Issues

  • None