Releases: ROCm/rocFFT

rocFFT 1.0.32 for ROCm 6.4.0

11 Apr 13:35
058ba87

Changed

  • Building with the address sanitizer option sets xnack+ on relevant GPU
    architectures and adds address-sanitizer support to runtime-compiled
    kernels.
  • The AMDGPU_TARGETS build variable is deprecated. Use GPU_TARGETS instead.

Removed

  • Removed ahead-of-time compiled kernels for the gfx906, gfx940, and gfx941 architectures. These architectures
    still work as before, but their kernels are now compiled at runtime.
  • Removed consumer GPU architectures from the precompiled kernel cache that ships with
    rocFFT. rocFFT continues to ship with a cache of precompiled RTC kernels for data-center
    and workstation architectures. As before, user-level caches can be enabled by setting the
    environment variable ROCFFT_RTC_CACHE_PATH to a writeable file location.
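
A minimal sketch of enabling a user-level cache from a launcher script. Only the ROCFFT_RTC_CACHE_PATH variable name comes from the notes above; the helper name env_with_rtc_cache and the cache file location are hypothetical.

```python
import os
from pathlib import Path


def env_with_rtc_cache(cache_file, base_env=None):
    """Return a copy of the environment with ROCFFT_RTC_CACHE_PATH set.

    cache_file is a user-chosen (hypothetical) location. Its parent
    directory is created so that the file location is writeable.
    """
    env = dict(os.environ if base_env is None else base_env)
    cache_path = Path(cache_file).expanduser()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    env["ROCFFT_RTC_CACHE_PATH"] = str(cache_path)
    return env
```

The returned dictionary can be passed as the env argument of subprocess.run when launching an application that links against rocFFT.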

Optimized

  • Improved MPI transform performance by using all-to-all communication for global transpose operations.
    Point-to-point communications are still used when all-to-all is not possible.
  • Improved the performance of unit-strided, complex interleaved, forward and inverse, length (64,64,64) FFTs.

Resolved issues

  • Fixed incorrect results from 2-kernel 3D FFT plans that used non-default output strides. For more information, see the rocFFT GitHub issue.
  • Plan descriptions can now be reused with different strides for different plans. For more information, see the rocFFT GitHub issue.
  • Fixed client packages to depend on hipRAND instead of rocRAND.
  • Fixed potential integer overflows during large MPI transforms.

rocFFT 1.0.31 for ROCm 6.3.3

19 Feb 17:47
3806d68

rocFFT code for ROCm 6.3.3 did not change. The library was rebuilt for the updated ROCm 6.3.3 stack.

rocFFT 1.0.31 for ROCm 6.3.2

28 Jan 15:44
3806d68

rocFFT code for ROCm 6.3.2 did not change. The library was rebuilt for the updated ROCm 6.3.2 stack.

rocFFT 1.0.31 for ROCm 6.3.1

20 Dec 16:12
3806d68

rocFFT code for ROCm 6.3.1 did not change. The library was rebuilt for the updated ROCm 6.3.1 stack.

rocFFT 1.0.31 for ROCm 6.3.0

03 Dec 19:49
3806d68

Added

  • rocfft-test now includes a --smoketest option.

  • Support for the gfx1151, gfx1200, and gfx1201 architectures.

  • Implemented experimental APIs to allow computing FFTs on data
    distributed across multiple MPI ranks. These APIs can be enabled with the
    ROCFFT_MPI_ENABLE CMake option. This option defaults to OFF.

    When ROCFFT_MPI_ENABLE is ON:

    • rocfft_plan_description_set_comm can be called to provide an
      MPI communicator to a plan description, which can then be passed
      to rocfft_plan_create. Each rank calls
      rocfft_field_add_brick to specify the layout of data bricks on
      that rank.

    • An MPI library with ROCm acceleration enabled is required at
      build time and at runtime.
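
The per-rank brick layout described above can be pictured with a small sketch. It does not call rocFFT or MPI: slab_brick is a hypothetical helper, and both the slab decomposition along one axis and the half-open index bounds are assumptions made for illustration, not something these notes specify.

```python
def slab_brick(global_shape, rank, nranks, split_axis=0):
    """Return (lower, upper) index tuples for the slab owned by `rank`.

    The grid is split along `split_axis` into `nranks` contiguous slabs.
    Bounds are half-open: lower is inclusive, upper is exclusive
    (a convention chosen for this sketch).
    """
    if not 0 <= rank < nranks:
        raise ValueError("rank must be in [0, nranks)")
    n = global_shape[split_axis]
    base, extra = divmod(n, nranks)
    # The first `extra` ranks receive one additional plane each.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    lower = [0] * len(global_shape)
    upper = list(global_shape)
    lower[split_axis] = start
    upper[split_axis] = stop
    return tuple(lower), tuple(upper)
```

Each rank would compute only its own bounds, for example slab_brick((256, 256, 256), rank, nranks), and use them when describing its brick to the library.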

Changed

  • Compilation uses amdclang++ instead of hipcc.
  • CLI11 replaces Boost Program Options as the command line parser for clients and samples.
  • Building with the address sanitizer option sets xnack+ on relevant GPU
    architectures and adds address-sanitizer support to runtime-compiled
    kernels.

rocFFT 1.0.30 for ROCm 6.2.4

06 Nov 19:55
7b4fa44

Added

  • Support for the gfx1151 architecture.

Optimized

  • Implemented 1D kernels for factorizable sizes > 1024 and < 2048.
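
To get a feel for which lengths fall in this range, the sketch below lists sizes between two bounds whose prime factors all come from a small set. The set of primes used here is an assumption for illustration; these notes do not state which radices rocFFT treats as factorizable, and the function names are hypothetical.

```python
# Assumed set of small prime factors; not taken from the release notes.
ASSUMED_PRIMES = (2, 3, 5, 7, 11, 13)


def is_smooth(n, primes=ASSUMED_PRIMES):
    """Return True if n >= 1 and every prime factor of n is in `primes`."""
    if n < 1:
        return False
    for p in primes:
        while n % p == 0:
            n //= p
    return n == 1


def smooth_sizes_between(low, high, primes=ASSUMED_PRIMES):
    """List smooth sizes n with low < n < high (both bounds exclusive)."""
    return [n for n in range(low + 1, high) if is_smooth(n, primes)]
```

For example, smooth_sizes_between(1024, 2048) includes 1080 (2^3 * 3^3 * 5) but not 1025 (5^2 * 41).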

Resolved issues

  • Fixed plan creation failure on some even-length real-complex transforms that use Bluestein's algorithm.

rocFFT 1.0.29 for ROCm 6.2.2

27 Sep 16:01
65aaf84

rocFFT code for ROCm 6.2.2 did not change. The library was rebuilt for the updated ROCm 6.2.2 stack.

rocFFT 1.0.29 for ROCm 6.2.1

20 Sep 19:58
65aaf84

Optimized

  • Implemented 1D kernels for factorizable sizes < 1024.

rocFFT 1.0.28 for ROCm 6.2.0

02 Aug 16:15
7a8c475

Optimized

  • Implemented multi-device transform for 3D pencil decomposition. Contiguous dimensions on input and output bricks
    are transformed locally, with global transposes to make remaining dimensions contiguous.

Changed

  • Randomly generated accuracy tests are now disabled by default; these can be enabled using
    the --nrand option (which defaults to 0).

rocFFT 1.0.27 for ROCm 6.1.5

12 Mar 18:30
30044d1

rocFFT code for ROCm 6.1.5 did not change. The library was rebuilt for the updated ROCm 6.1.5 stack.