Releases: saadrahim/rocFFT

rocFFT-1.0.11 for ROCm 4.2.0

10 May 23:13
a470ba6

Optimizations

  • Improved performance of single-precision kernels for all butterfly radices except radix-2 and radix-7.
  • Minor optimization for C2R 3D cube sizes 100 and 200.
  • Optimized some C2C/R2C 3D rectangular sizes with dimensions drawn from 64, 81, 100, 128, 200, and 256.
  • When factoring a length, check whether the remaining length is explicitly supported.
  • Explicitly added radix-7 lengths 14, 21, and 224 to the list of supported lengths.
  • Optimized R2C 2D/3D cube sizes 128, 200, and 256.
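The factoring change can be illustrated with a minimal sketch. It assumes hypothetical tables (`SUPPORTED_LENGTHS`, `RADICES`) and a hypothetical `factor_length` helper. rocFFT's real kernel tables and factoring logic are not described in these notes, so this only shows the idea of checking whether the remaining length is explicitly supported before splitting it further.

```python
# Hypothetical tables for illustration only; not rocFFT's actual lists.
SUPPORTED_LENGTHS = {14, 21, 224, 64, 81, 100, 128, 200, 256}
RADICES = [13, 11, 8, 7, 5, 4, 3, 2]


def factor_length(n):
    """Hypothetical helper: split n into radix factors, stopping early
    when the remaining length is itself an explicitly supported length."""
    factors = []
    remaining = n
    while remaining > 1:
        # Check the remaining length before splitting it further.
        if remaining in SUPPORTED_LENGTHS:
            factors.append(remaining)
            return factors
        for radix in RADICES:
            if remaining % radix == 0:
                factors.append(radix)
                remaining //= radix
                break
        else:
            # No radix divides what is left.
            raise ValueError(f"length {n} has an unsupported prime factor")
    return factors
```

For example, `factor_length(704)` peels off one radix-11 step and then stops at the supported length 64, giving `[11, 64]`.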

Fixed

  • Fixed potential crashes in small 3D transforms with unusual strides. (ROCm#311)
  • Fixed potential crashes when executing transforms on multiple devices. (ROCm#310)

rocFFT-1.0.10 for ROCm 4.1.0

23 Mar 01:06
c3110db

Added

  • Explicitly specified MAX_THREADS_PER_BLOCK through __launch_bounds__ for all kernels.
  • Switched to the new syntax for specifying AMD GPU architecture names and features.

Optimizations

  • Optimized C2C/R2C 3D cube sizes 64, 81, 100, 128, 200, and 256.
  • Improved performance of the standalone out-of-place transpose kernel.
  • Optimized the 1D C2C case of length 40000.
  • Enabled radix-7 for size 336.
  • New radix-11 and radix-13 kernels, used in transforms of lengths 11 and 13 and some of their multiples.
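A radix-p butterfly for a prime p such as 11 or 13 computes, at its core, a length-p DFT. The sketch below evaluates that DFT directly with the standard forward sign convention. The function name `dft_prime_radix` is hypothetical, and this is a reference computation for illustration, not rocFFT's generated kernel code.

```python
import cmath


def dft_prime_radix(x):
    """Hypothetical reference butterfly: directly evaluate
    X[k] = sum_n x[n] * exp(-2*pi*i*n*k/p) for p = len(x)."""
    p = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / p) for n in range(p))
        for k in range(p)
    ]
```

An impulse of length 11 transforms to eleven values equal to 1, and a constant input of length 13 concentrates all of its energy in `X[0]`.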

Changed

  • rocFFT now automatically allocates a work buffer if the plan requires one but none is provided.
  • An explicit rocfft_status_invalid_work_buffer error is now returned when a work buffer of insufficient size is provided.
  • Updated online documentation.
  • Updated the Debian package name version to use '_' as a separator.
  • Adjusted accuracy test tolerances and how they are compared.
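The two work-buffer changes amount to a small decision rule. The sketch below models it in plain Python with a hypothetical `resolve_work_buffer` function and string statuses. In the real C API the error is reported as `rocfft_status_invalid_work_buffer` and the required size comes from the plan, but nothing here calls rocFFT.

```python
def resolve_work_buffer(required_size, provided_size=None):
    """Hypothetical model of the work-buffer behavior in these notes.

    Returns (status, bytes_the_library_allocates).
    """
    if required_size == 0:
        # The plan needs no work buffer at all.
        return ("success", 0)
    if provided_size is None:
        # Needed but not provided: the library allocates it automatically.
        return ("success", required_size)
    if provided_size < required_size:
        # Provided but too small: explicit invalid-work-buffer error.
        return ("invalid_work_buffer", 0)
    # Caller's buffer is large enough; nothing extra is allocated.
    return ("success", 0)
```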

Fixed

  • Fixed 4x4x8192 accuracy failure.

Known Issues

  • None

rocFFT-1.0.5 for ROCm 3.7.0

15 Aug 04:20
50fea91

New Features

  • Optimized 2D C2C middle sizes for powers of 2 by fusing two kernels
  • Changed package dependency to hip-rocclr
  • Fixed a build issue with C++17
  • Improved test infrastructure
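A 2D C2C transform separates into 1D transforms along rows followed by 1D transforms along columns, which is why it is commonly run as two kernel passes. The sketch below shows that row-column decomposition on small lists with a naive O(n^2) DFT and checks it against the 2D definition. All function names are hypothetical, and the sketch does not reproduce rocFFT's fused power-of-2 kernel.

```python
import cmath


def dft(x):
    """Naive forward 1D DFT (hypothetical helper)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]


def dft2d_row_column(a):
    """2D DFT of a rows x cols list of lists via two passes of 1D DFTs."""
    # Pass 1: transform every row.
    rows = [dft(row) for row in a]
    # Pass 2: transform every column of the intermediate result.
    cols = [dft(list(col)) for col in zip(*rows)]
    # Transpose back so the result is indexed [row][col].
    return [list(r) for r in zip(*cols)]


def dft2d_direct(a):
    """Reference 2D DFT evaluated directly from the definition."""
    m, n = len(a), len(a[0])
    return [[sum(a[r][c] * cmath.exp(-2j * cmath.pi * (r * u / m + c * v / n))
                 for r in range(m) for c in range(n))
             for v in range(n)]
            for u in range(m)]
```

A fused implementation keeps the intermediate `rows` result close to the compute units instead of writing it back to global memory between the two passes.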

Known Issues

  • None

rocFFT-1.0.4 for ROCm 3.6.0

10 Jul 23:15
50fea91

New Features

  • Fixed a non-unit stride issue for 1D middle sizes
  • Updated client package installation path
  • Improved internal device memory usage check
  • Improved logging
  • Improved test infrastructure
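With non-unit strides, consecutive elements of a 1D transform are not adjacent in memory. The sketch below shows the usual addressing rule, offset = batch * distance + index * stride, using a hypothetical `gather_strided` helper. rocFFT lets callers describe strides and batch distances in the plan description, but this code only illustrates the indexing, not the library's API.

```python
def gather_strided(buffer, length, stride, distance, batch):
    """Hypothetical helper: collect `batch` logical 1D signals of `length`
    elements from a flat buffer, where element i of signal b lives at
    b * distance + i * stride."""
    return [
        [buffer[b * distance + i * stride] for i in range(length)]
        for b in range(batch)
    ]
```

For example, with stride 2 and distance 1, two signals stored interleaved in `list(range(6))` are separated into `[0, 2, 4]` and `[1, 3, 5]`.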

Known Issues

  • None

rocFFT-1.0.4 for ROCm 3.6.0

10 Jul 22:58
50fea91

New Features

  • Bidiagonalization of general matrices
  • Optimizations to LU factorization

Known Issues

  • None