Releases: ROCm/rocFFT
rocFFT 1.0.32 for ROCm 6.4.0
Changed
- Building with the address sanitizer option sets xnack+ on relevant GPU architectures and adds address-sanitizer support to runtime-compiled kernels.
- The AMDGPU_TARGETS build variable is deprecated and should be replaced with GPU_TARGETS.
Removed
- Removed ahead-of-time compiled kernels for the gfx906, gfx940, and gfx941 architectures. These architectures are still supported, but their kernels are now compiled at runtime.
- Removed consumer GPU architectures from the precompiled kernel cache that ships with rocFFT. rocFFT continues to ship with a cache of precompiled RTC kernels for data-center and workstation architectures. As before, user-level caches can be enabled by setting the environment variable ROCFFT_RTC_CACHE_PATH to a writable file location.
Optimized
- Improved MPI transform performance by using all-to-all communication for global transpose operations. Point-to-point communication is still used when all-to-all is not possible.
- Improved the performance of unit-strided, complex-interleaved, forward and inverse FFTs of length (64,64,64).
Resolved issues
- Fixed incorrect results from 2-kernel 3D FFT plans that used non-default output strides. For more information, see the rocFFT GitHub issue.
- Plan descriptions can be reused with different strides for different plans. For more information, see the rocFFT GitHub issue.
- Fixed client packages to depend on hipRAND instead of rocRAND.
- Fixed potential integer overflows during large MPI transforms.
rocFFT 1.0.31 for ROCm 6.3.3
rocFFT code for ROCm 6.3.3 did not change. The library was rebuilt for the updated ROCm 6.3.3 stack.
rocFFT 1.0.31 for ROCm 6.3.2
rocFFT code for ROCm 6.3.2 did not change. The library was rebuilt for the updated ROCm 6.3.2 stack.
rocFFT 1.0.31 for ROCm 6.3.1
rocFFT code for ROCm 6.3.1 did not change. The library was rebuilt for the updated ROCm 6.3.1 stack.
rocFFT 1.0.31 for ROCm 6.3.0
Added
- rocfft-test now includes a --smoketest option.
- Support for the gfx1151, gfx1200, and gfx1201 architectures.
- Implemented experimental APIs to allow computing FFTs on data distributed across multiple MPI ranks. These APIs can be enabled with the ROCFFT_MPI_ENABLE CMake option, which defaults to OFF. When ROCFFT_MPI_ENABLE is ON:
  - rocfft_plan_description_set_comm can be called to provide an MPI communicator to a plan description, which can then be passed to rocfft_plan_create. Each rank calls rocfft_field_add_brick to specify the layout of data bricks on that rank.
  - An MPI library with ROCm acceleration enabled is required at build time and at runtime.
Changed
- Compilation uses amdclang++ instead of hipcc.
- CLI11 replaces Boost Program Options as the command line parser for clients and samples.
- Building with the address sanitizer option sets xnack+ on relevant GPU architectures and adds address-sanitizer support to runtime-compiled kernels.
rocFFT 1.0.30 for ROCm 6.2.4
Added
- Support for the gfx1151 architecture.
Optimized
- Implemented 1D kernels for factorizable sizes > 1024 and < 2048.
Resolved issues
- Fixed plan creation failure on some even-length real-complex transforms that use Bluestein's algorithm.
rocFFT 1.0.29 for ROCm 6.2.2
rocFFT code for ROCm 6.2.2 did not change. The library was rebuilt for the updated ROCm 6.2.2 stack.
rocFFT 1.0.29 for ROCm 6.2.1
Optimizations
- Implemented 1D kernels for factorizable sizes < 1024.
rocFFT 1.0.28 for ROCm 6.2.0
Optimizations
- Implemented multi-device transform for 3D pencil decomposition. Contiguous dimensions on input and output bricks
are transformed locally, with global transposes to make remaining dimensions contiguous.
Changes
- Randomly generated accuracy tests are now disabled by default; these can be enabled using
the --nrand option (which defaults to 0).
rocFFT 1.0.27 for ROCm 6.1.5
rocFFT code for ROCm 6.1.5 did not change. The library was rebuilt for the updated ROCm 6.1.5 stack.