Bluestein Algorithm #140

Closed. Wants to merge 76 commits.

Commits (76)

6a6672f
initial factorization changes
AD2605 Nov 27, 2023
a0b6218
passing vector to contain l2 events in global level FFTs
AD2605 Nov 28, 2023
d6dff45
passing pre-allocated vector to transpose_level to collect llc events
AD2605 Nov 28, 2023
74c8c80
passing dimension_struct instead of kernel_data_struct
AD2605 Nov 29, 2023
386c98f
further changes to descriptor
AD2605 Nov 30, 2023
e4c8d27
Merge remote-tracking branch 'origin/atharva/refactor' into atharva/b…
AD2605 Nov 30, 2023
6236bba
WIP
AD2605 Dec 7, 2023
dbc0919
correctly populating factors and sub batches vectors for backward fac…
AD2605 Dec 11, 2023
8de1139
update setting of specialization constants
AD2605 Dec 12, 2023
69f674e
added host naive_dft and generation of chirp signal
AD2605 Dec 12, 2023
0f7f7cb
further changes and resolved all warnings
AD2605 Dec 12, 2023
308351d
populate bluestein specific twiddles in device pointer
AD2605 Dec 12, 2023
8d506bd
fixed issue in setting dimension size
AD2605 Dec 14, 2023
b530624
added backward pass required for bluestein
AD2605 Dec 14, 2023
2d3c88a
added prime sized tests
AD2605 Dec 14, 2023
67c3c7f
fix compilation issues
AD2605 Dec 20, 2023
b23be66
add copy function in global dispatcher
AD2605 Dec 20, 2023
4a92d8c
enable taking conjugate before and after fft compute
AD2605 Dec 21, 2023
651edd1
fix compilation issues
AD2605 Dec 21, 2023
9fd1566
refactor repetitive conjugate snippet into a utility function
AD2605 Jan 3, 2024
c4e6549
ignore templated lambda C++20 warning (for now)
AD2605 Jan 3, 2024
3f982e7
Merge remote-tracking branch 'origin/main' into atharva/bluestein
AD2605 Jan 8, 2024
ea1995e
added event dependencies for copy before and after compute for prime …
AD2605 Jan 9, 2024
c56475f
fix bugs in prepare_implementation and build_w_spec_const
AD2605 Jan 9, 2024
7e6dd9e
add option to increment store modifier pointer
AD2605 Jan 9, 2024
78e1372
remove readability-magic-numbers from clang-tidy
AD2605 Jan 9, 2024
730fecc
further changes in global's calculate_twiddles_struct to accommodate b…
AD2605 Jan 11, 2024
1be4afc
fix compilation and warning
AD2605 Jan 12, 2024
4cf6ca7
updated apply_modifier in workitem impl after layout changes
AD2605 Jan 12, 2024
935477d
Revert "updated apply_modifier in workitem impl after layout changes"
AD2605 Jan 12, 2024
eae3341
bugfixes in calculate_twiddles and ensure coalesced accesses in worki…
AD2605 Jan 15, 2024
1861125
pass load modifier pointer and loc memory to launch_kernel and dispat…
AD2605 Jan 16, 2024
63dd473
add option for subgroup loads in apply_modifier
AD2605 Jan 16, 2024
9ea81fe
transpose load modifiers if required
AD2605 Jan 16, 2024
2d70170
setting of increment store pointer spec const and using less memory i…
AD2605 Jan 16, 2024
a07ec9c
updated doxygens and descriptions
AD2605 Jan 16, 2024
8a8371d
remove direction
AD2605 Jan 17, 2024
310945f
fix compilation errors
AD2605 Jan 17, 2024
c094e4d
correctly set conjugate spec constants for global impl, and workgroup…
AD2605 Jan 18, 2024
12e618f
set scale_factor as spec constant
AD2605 Jan 18, 2024
39d1632
remove scale_factor from friend declaration
AD2605 Jan 18, 2024
a152e0d
resolve warnings and add TODO
AD2605 Jan 18, 2024
80167b5
set scale factor only for the last kernel
AD2605 Jan 18, 2024
155d417
fix scaled tests
AD2605 Jan 18, 2024
b40a91f
change push_back to emplace_back
AD2605 Jan 19, 2024
c935e73
Merge remote-tracking branch 'origin/main' into atharva/spec_const_di…
AD2605 Jan 19, 2024
5976144
fix compilation failure with multiple subgroup sizes
AD2605 Jan 19, 2024
93ab8a4
add batch one to offset tests
AD2605 Jan 19, 2024
6fcb7d4
set strides, distance and offset on the basis of direction
AD2605 Jan 19, 2024
e4167ae
review comments 1
AD2605 Jan 19, 2024
d78ac6f
Merge remote-tracking branch 'origin/atharva/spec_const_dir_scale' in…
AD2605 Jan 21, 2024
5fe5eaa
renamed all take_conjugate_* to conjugate_*
AD2605 Jan 23, 2024
8ea2076
review comments 2
AD2605 Jan 23, 2024
a7034bb
Update description of compute direction
AD2605 Jan 23, 2024
3910c38
added get_spect_constant_scale
AD2605 Jan 23, 2024
681c587
Merge remote-tracking branch 'origin/atharva/spec_const_dir_scale' in…
AD2605 Jan 23, 2024
7c8b0c0
add dimension_struct back as a parameter to calculate_twiddles_struct
AD2605 Jan 23, 2024
e1050c1
make get_local_memory_usage a member function
AD2605 Jan 23, 2024
ff650b5
Merge remote-tracking branch 'origin/atharva/spec_const_dir_scale' in…
AD2605 Jan 23, 2024
a57ada8
doxygen fixes
t4c1 Feb 2, 2024
01cd4d4
adding back bluestein parts bit by bit
AD2605 Feb 2, 2024
32b5582
Merge remote-tracking branch 'origin/doxy_fix' into atharva/bluestein
AD2605 Feb 2, 2024
c6663d8
reimplement bluestein
AD2605 Feb 6, 2024
aed81cf
remove check for usm support
AD2605 Feb 6, 2024
fbe0b3d
Merge remote-tracking branch 'origin/main' into atharva/bluestein
AD2605 Feb 6, 2024
d96a158
refactoring changes
AD2605 Feb 7, 2024
4976174
doxygen fixes
AD2605 Feb 7, 2024
b135650
Merge remote-tracking branch 'origin/main' into atharva/bluestein
AD2605 Feb 13, 2024
ef62c2f
formatting changes
AD2605 Feb 13, 2024
fca1a80
fix compilation 1
AD2605 Feb 14, 2024
6f29039
warnings and compilation 2
AD2605 Feb 14, 2024
c885293
fix all compilation errors
AD2605 Feb 14, 2024
6be4c6e
removed local memory usage for store modifiers
AD2605 Feb 25, 2024
7a2afe8
added global2global copy kernel
AD2605 Feb 26, 2024
75f55d9
review comments 1
AD2605 Feb 27, 2024
b3a1cd7
added some comments
AD2605 Feb 28, 2024

.clang-tidy: 1 change (1 addition, 0 deletions)

@@ -14,6 +14,7 @@ Checks: >
performance-*,
-performance-avoid-endl,
readability-*,
-readability-magic-numbers,
Contributor:

What magic numbers are being complained about?

Contributor Author:

The 2.0 here. I do not particularly want to name it.

Contributor:

I think you should disable the check around that one line (stackoverflow).

This might alternatively be fixed by using log2 with ints, as suggested in another comment.
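
For reference, a rough sketch of both options mentioned above (the `2.0` usage in question is not shown in this diff; the function names are illustrative, and `std::bit_width` needs C++20):

```cpp
#include <bit>
#include <cmath>
#include <cstddef>

// Option 1: keep the constant but silence the check on that one line only.
// NOLINTNEXTLINE(readability-magic-numbers)
inline double float_log2(double x) { return std::log(x) / std::log(2.0); }

// Option 2: avoid the floating-point constant entirely with an integer log2
// (requires n > 0; std::bit_width is C++20, a shift loop works on older standards).
constexpr std::size_t int_log2(std::size_t n) { return std::bit_width(n) - 1; }
```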

-readability-function-cognitive-complexity,
-readability-identifier-length,
-readability-named-parameter,

README.md: 6 changes (2 additions, 4 deletions)

@@ -98,10 +98,8 @@ portFFT is still in early development. The supported configurations are:
* Arbitrary forward and backward offsets
* Arbitrary strides and distance where the problem size + auxiliary data fits in the registers of a single work-item.

-Any 1D arbitrarily large input size that fits in global memory is supported, with a restriction that large input sizes should not have large prime factors.
-The largest prime factor depend on the device and the values set by `PORTFFT_REGISTERS_PER_WI` and `PORTFFT_SUBGROUP_SIZES`.
-For instance with `PORTFFT_REGISTERS_PER_WI` set to `128` (resp. `256`) each work-item can hold a maximum of 27 (resp. 56) complex values, thus with `PORTFFT_SUBGROUP_SIZES` set to `32` the largest prime factor cannot exceed `27*32=864` (resp. `56*32=1792`).
-portFFT may allocate up to `2 * PORTFFT_MAX_CONCURRENT_KERNELS * input_size` scratch memory, depending on the configuration passed.
+Any 1D arbitrarily large input size that fits in global memory is supported.
+portFFT may allocate up to `2 * PORTFFT_MAX_CONCURRENT_KERNELS * input_size` scratch memory, in addition to memory allocated to hold precomputed values to be used during compute, depending on the configuration passed.

Any batch size is supported as long as the input and output data fits in global memory.

src/portfft/committed_descriptor_impl.hpp: 613 changes (375 additions, 238 deletions). Large diff not shown.

src/portfft/common/bluestein.hpp: 76 changes (76 additions, 0 deletions)

@@ -0,0 +1,76 @@
/***************************************************************************
*
* Copyright (C) Codeplay Software Ltd.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Codeplay's portFFT
*
**************************************************************************/

#ifndef PORTFFT_COMMON_BLUESTEIN_HPP
#define PORTFFT_COMMON_BLUESTEIN_HPP

#include "portfft/common/host_fft.hpp"
#include "portfft/defines.hpp"

#include <cmath>
#include <complex>
#include <cstddef>
#include <cstring>
#include <vector>
#include <sycl/sycl.hpp>

namespace portfft {
namespace detail {
/**
* Utility function to get the DFT of the chirp signal
* @tparam T Scalar Type
* @param ptr Host Pointer containing the load/store modifiers.
Contributor (suggested change):

- * @param ptr Host Pointer containing the load/store modifiers.
+ * @param store_modifiers Host pointer containing the load/store modifiers.

* @param committed_size original problem size
Contributor:

What is the original problem?

* @param dimension_size padded size
Contributor:

If padded size is the best description for dimension_size, why isn't the variable called padded_size?

Contributor:

Whichever name is better, this documentation should be improved.

*/
template <typename T>
void get_fft_chirp_signal(T* ptr, std::size_t committed_size, std::size_t dimension_size) {
using complex_t = std::complex<T>;
std::vector<complex_t> chirp_signal(dimension_size, 0);
std::vector<complex_t> chirp_fft(dimension_size, 0);
for (std::size_t i = 0; i < committed_size; i++) {
double theta = M_PI * static_cast<double>(i * i) / static_cast<double>(committed_size);
Contributor:

M_PI is not standard C++.
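
For reference, a minimal portable alternative (a sketch; assumes C++20 `<numbers>` is available, otherwise a locally defined constant works):

```cpp
#include <numbers>

// Standard, typed pi constant; can replace M_PI in the theta computations in this file.
constexpr double pi = std::numbers::pi_v<double>;

// Pre-C++20 fallback:
// constexpr double pi = 3.141592653589793238462643383279502884;
```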

chirp_signal[i] = complex_t(static_cast<T>(std::cos(theta)), static_cast<T>(std::sin(theta)));
}
std::size_t num_zeros = dimension_size - 2 * committed_size + 1;
for (std::size_t i = 0; i < committed_size; i++) {
chirp_signal[committed_size + num_zeros + i - 1] = chirp_signal[committed_size - i];
}
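// Worked example of the wrap-around above: with committed_size N = 3 and dimension_size M = 8,
// num_zeros = 8 - 2*3 + 1 = 3, so indices 5, 6, 7 receive chirp_signal[3] (still zero),
// chirp_signal[2] and chirp_signal[1], i.e. b[M - k] = w[k] for k = 1..N-1, which is the
// circular-convolution kernel required by Bluestein's algorithm.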
host_naive_dft(chirp_signal.data(), chirp_fft.data(), dimension_size);
std::memcpy(ptr, reinterpret_cast<T*>(chirp_fft.data()), 2 * dimension_size * sizeof(T));
}

/**
* Populates the input modifiers required for Bluestein's algorithm
* @tparam T Scalar Type
* @param ptr Host Pointer containing the load/store modifiers.
* @param committed_size committed problem length
* @param dimension_size padded dft length
*/
template <typename T>
void populate_bluestein_input_modifiers(T* ptr, std::size_t committed_size, std::size_t dimension_size) {
using complex_t = std::complex<T>;
std::vector<complex_t> scratch(dimension_size, 0);
for (std::size_t i = 0; i < committed_size; i++) {
double theta = -M_PI * static_cast<double>(i * i) / static_cast<double>(committed_size);
scratch[i] = complex_t(static_cast<T>(std::cos(theta)), static_cast<T>(std::sin(theta)));
}
std::memcpy(ptr, reinterpret_cast<T*>(scratch.data()), 2 * dimension_size * sizeof(T));
}
} // namespace detail
} // namespace portfft

#endif
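
For context, `host_naive_dft` comes from `portfft/common/host_fft.hpp` and is not part of this diff. A minimal sketch of an O(N^2) reference DFT with the call shape used above (the real implementation's signature and sign convention may differ; `T` is the scalar type as in the file):

```cpp
#include <cmath>
#include <complex>
#include <cstddef>

// Hypothetical reference DFT: out[k] = sum_n in[n] * exp(-2*pi*i*n*k / fft_size).
template <typename T>
void host_naive_dft(const std::complex<T>* in, std::complex<T>* out, std::size_t fft_size) {
  constexpr double pi = 3.141592653589793238462643383279502884;
  for (std::size_t k = 0; k < fft_size; k++) {
    std::complex<double> acc{0.0, 0.0};
    for (std::size_t n = 0; n < fft_size; n++) {
      double theta = -2.0 * pi * static_cast<double>(n * k) / static_cast<double>(fft_size);
      acc += std::complex<double>(in[n].real(), in[n].imag()) *
             std::complex<double>(std::cos(theta), std::sin(theta));
    }
    out[k] = std::complex<T>(static_cast<T>(acc.real()), static_cast<T>(acc.imag()));
  }
}
```

Taken together these match the standard Bluestein decomposition: the input is multiplied by w[n] = exp(-i*pi*n^2/N) (the factors from `populate_bluestein_input_modifiers`), convolved with the zero-padded conjugate chirp whose DFT `get_fft_chirp_signal` precomputes, and the result is multiplied by w[k] again to give the length-`committed_size` DFT.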