Skip to content
Closed
Show file tree
Hide file tree
Changes from 64 commits
Commits
Show all changes
76 commits
Select commit Hold shift + click to select a range
6a6672f
intial factorization changes
AD2605 Nov 27, 2023
a0b6218
passing vector to contain l2 events in global level FFTs
AD2605 Nov 28, 2023
d6dff45
passing pre-allocated vector to transpose_level to collect llc events
AD2605 Nov 28, 2023
74c8c80
passing dimension_struct instead of kernel_data_struct
AD2605 Nov 29, 2023
386c98f
further changes to descriptor
AD2605 Nov 30, 2023
e4c8d27
Merge remote-tracking branch 'origin/atharva/refactor' into atharva/b…
AD2605 Nov 30, 2023
6236bba
WIP
AD2605 Dec 7, 2023
dbc0919
correctly populating factors and sub batches vectors for backward fac…
AD2605 Dec 11, 2023
8de1139
update setting of specialization constants
AD2605 Dec 12, 2023
69f674e
added host naive_dft and generation of chirp signal
AD2605 Dec 12, 2023
0f7f7cb
further changes and resolved all warnings
AD2605 Dec 12, 2023
308351d
populate bluestein specific twiddles in device pointer
AD2605 Dec 12, 2023
8d506bd
fixed issue is setting dimension size
AD2605 Dec 14, 2023
b530624
added backward pass required for bluestein
AD2605 Dec 14, 2023
2d3c88a
added prime sized tests
AD2605 Dec 14, 2023
67c3c7f
fix compilation issues
AD2605 Dec 20, 2023
b23be66
add copy function in global dispatcher
AD2605 Dec 20, 2023
4a92d8c
enable taking conjugate before and after fft compute
AD2605 Dec 21, 2023
651edd1
fix compilation issues
AD2605 Dec 21, 2023
9fd1566
refactor repetative conjugate snippet into a utility function
AD2605 Jan 3, 2024
c4e6549
ignore templated lambda C++20 warning (for now)
AD2605 Jan 3, 2024
3f982e7
Merge remote-tracking branch 'origin/main' into atharva/bluestein
AD2605 Jan 8, 2024
ea1995e
added event dependencies for copy before and after compute for prime …
AD2605 Jan 9, 2024
c56475f
fix bugs in prepare_implementation and build_w_spec_const
AD2605 Jan 9, 2024
7e6dd9e
add option to increment store modifier pointer
AD2605 Jan 9, 2024
78e1372
remove readability-magic-numbers from clang-tidy
AD2605 Jan 9, 2024
730fecc
further changes in global's calculate_twiddles_struct to accomodate b…
AD2605 Jan 11, 2024
1be4afc
fix compilation and warning
AD2605 Jan 12, 2024
4cf6ca7
updated apply_modifier in workitem impl after layout changes
AD2605 Jan 12, 2024
935477d
Revert "updated apply_modifier in workitem impl after layout changes"
AD2605 Jan 12, 2024
eae3341
bugfixes in calculate_twiddles and ensure coalesced accesses in worki…
AD2605 Jan 15, 2024
1861125
pass load modifier pointer and loc memory to launch_kernel and dispat…
AD2605 Jan 16, 2024
63dd473
add option for subgroup loads in apply_modifier
AD2605 Jan 16, 2024
9ea81fe
transpose load modifiers if required
AD2605 Jan 16, 2024
2d70170
setting of increment store pointer spec const and using less memory i…
AD2605 Jan 16, 2024
a07ec9c
updated doxygens and descriptions
AD2605 Jan 16, 2024
8a8371d
remove direction
AD2605 Jan 17, 2024
310945f
fix compilation errors
AD2605 Jan 17, 2024
c094e4d
correctly set conjugate spec constants for global impl, and workgroup…
AD2605 Jan 18, 2024
12e618f
set scale_factor as spec constant
AD2605 Jan 18, 2024
39d1632
remove scale_factor from friend declaration
AD2605 Jan 18, 2024
a152e0d
resolve warnings and add TODO
AD2605 Jan 18, 2024
80167b5
set scale factor only for the last kernel
AD2605 Jan 18, 2024
155d417
fix scaled tests
AD2605 Jan 18, 2024
b40a91f
change push_back to emplace_back
AD2605 Jan 19, 2024
c935e73
Merge remote-tracking branch 'origin/main' into atharva/spec_const_di…
AD2605 Jan 19, 2024
5976144
fix compilation failure with multiple subgroup sizes
AD2605 Jan 19, 2024
93ab8a4
add batch one to offset tests
AD2605 Jan 19, 2024
6fcb7d4
set strides, distance and offset on the basis of direction
AD2605 Jan 19, 2024
e4167ae
review comments 1
AD2605 Jan 19, 2024
d78ac6f
Merge remote-tracking branch 'origin/atharva/spec_const_dir_scale' in…
AD2605 Jan 21, 2024
5fe5eaa
renamed all take_conjugate_* to conjugate_*
AD2605 Jan 23, 2024
8ea2076
review comments 2
AD2605 Jan 23, 2024
a7034bb
Update description of compute direction
AD2605 Jan 23, 2024
3910c38
added get_spect_constant_scale
AD2605 Jan 23, 2024
681c587
Merge remote-tracking branch 'origin/atharva/spec_const_dir_scale' in…
AD2605 Jan 23, 2024
7c8b0c0
add dimension_struct back as a parameter to calculate_twiddles_struct
AD2605 Jan 23, 2024
e1050c1
make get_local_memory_usage a member function
AD2605 Jan 23, 2024
ff650b5
Merge remote-tracking branch 'origin/atharva/spec_const_dir_scale' in…
AD2605 Jan 23, 2024
a57ada8
doxygen fixes
t4c1 Feb 2, 2024
01cd4d4
adding back bluestein parts bit by bit
AD2605 Feb 2, 2024
32b5582
Merge remote-tracking branch 'origin/doxy_fix' into atharva/bluestein
AD2605 Feb 2, 2024
c6663d8
reimplment bluestein
AD2605 Feb 6, 2024
aed81cf
remove check for usm support
AD2605 Feb 6, 2024
fbe0b3d
Merge remote-tracking branch 'origin/main' into atharva/bluestein
AD2605 Feb 6, 2024
d96a158
refactoring changes
AD2605 Feb 7, 2024
4976174
doxygen fixes
AD2605 Feb 7, 2024
b135650
Merge remote-tracking branch 'origin/main' into atharva/bluestein
AD2605 Feb 13, 2024
ef62c2f
formatting changes
AD2605 Feb 13, 2024
fca1a80
fix compilation 1
AD2605 Feb 14, 2024
6f29039
warnings and compilation 2
AD2605 Feb 14, 2024
c885293
fix all compilation errors
AD2605 Feb 14, 2024
6be4c6e
removed local memory usage for store modifiers
AD2605 Feb 25, 2024
7a2afe8
added global2global copy kernel
AD2605 Feb 26, 2024
75f55d9
review comments 1
AD2605 Feb 27, 2024
b3a1cd7
added some comments
AD2605 Feb 28, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .clang-tidy
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ Checks: >
performance-*,
-performance-avoid-endl,
readability-*,
-readability-magic-numbers,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What magic numbers are being complained about?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 2.0 here
I do not want to especially name it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should disable the check around the 1 line (stackoverflow).

This might alternatively be fixed by using the log2 with ints as suggested in another comment.

-readability-function-cognitive-complexity,
-readability-identifier-length,
-readability-named-parameter,
Expand Down
543 changes: 303 additions & 240 deletions src/portfft/committed_descriptor_impl.hpp

Large diffs are not rendered by default.

80 changes: 80 additions & 0 deletions src/portfft/common/bluestein.hpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,80 @@
/***************************************************************************
*
* Copyright (C) Codeplay Software Ltd.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Codeplay's portFFT
*
**************************************************************************/

#ifndef PORTFFT_COMMON_BLUESTEIN_HPP
#define PORTFFT_COMMON_BLUESTEIN_HPP

#include "portfft/common/host_fft.hpp"
#include "portfft/defines.hpp"

#include <cmath>
#include <complex>
#include <sycl/sycl.hpp>

namespace portfft {
namespace detail {
/**
* Utility function to get chirp signal and fft
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Utility function to get chirp signal and fft
* Utility function to get chirp signal and its dft transform

I am assuming this is what you maeant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Fixed now

* @tparam T Scalar Type
* @param ptr Host Pointer containing the load/store modifiers.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @param ptr Host Pointer containing the load/store modifiers.
* @param store_modifiers Host pointer containing the load/store modifiers.

* @param committed_size original problem size
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is original problem?

* @param dimension_size padded size
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If padded size is the best description for dimension_size, why isn't the variable called padded_size?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whichever name is better, this documentation should be improved.

*/
template <typename T>
void get_fft_chirp_signal(T* ptr, std::size_t committed_size, std::size_t dimension_size) {
using ctype = std::complex<T>;
ctype* chirp_signal = (ctype*)calloc(dimension_size, sizeof(ctype));
ctype* chirp_fft = (ctype*)malloc(dimension_size * sizeof(ctype));
for (std::size_t i = 0; i < committed_size; i++) {
double theta = M_PI * static_cast<double>(i * i) / static_cast<double>(committed_size);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

M_PI is not standard C++.

chirp_signal[i] = ctype(static_cast<T>(std::cos(theta)), static_cast<T>(std::sin(theta)));
}
std::size_t num_zeros = dimension_size - 2 * committed_size + 1;
for (std::size_t i = 0; i < committed_size; i++) {
chirp_signal[committed_size + num_zeros + i - 1] = chirp_signal[committed_size - i];
}
naive_dft(chirp_signal, chirp_fft, dimension_size);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we obtain this analytically?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not get your question, could you elaborate a bit ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The chirp signal is analytic (ie. you can express it mathematically). Is it possible to obtain a closed-form expression for the DFT of the chirp signal in the same way you can for square waves, triangular waves etc?

std::memcpy(ptr, reinterpret_cast<T*>(&chirp_fft[0]), 2 * dimension_size * sizeof(T));
free(chirp_signal);
free(chirp_fft);
}

/**
* Populates input modifiers required for bluestein
* @tparam T Scalar Type
* @param ptr Host Pointer containing the load/store modifiers.
* @param committed_size original problem size
* @param dimension_size padded size
*/
template <typename T>
void populate_bluestein_input_modifiers(T* ptr, std::size_t committed_size, std::size_t dimension_size) {
using ctype = std::complex<T>;
ctype* scratch = (ctype*)calloc(dimension_size, sizeof(ctype));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::vector

for (std::size_t i = 0; i < committed_size; i++) {
double theta = -M_PI * static_cast<double>(i * i) / static_cast<double>(committed_size);
scratch[i] = ctype(static_cast<T>(std::cos(theta)), static_cast<T>(std::sin(theta)));
}
std::memcpy(ptr, reinterpret_cast<T*>(&scratch[0]), 2 * dimension_size * sizeof(T));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calculate directly into output.

free(scratch);
}
} // namespace detail
} // namespace portfft

#endif
208 changes: 159 additions & 49 deletions src/portfft/common/global.hpp

Large diffs are not rendered by default.

53 changes: 53 additions & 0 deletions src/portfft/common/host_fft.hpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
/***************************************************************************
*
* Copyright (C) Codeplay Software Ltd.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Codeplay's portFFT
*
**************************************************************************/

#ifndef PORTFFT_COMMON_HOST_FFT_HPP
#define PORTFFT_COMMON_HOST_FFT_HPP

#include "portfft/defines.hpp"
#include <complex>

namespace portfft {
namespace detail {

/**
* Host Naive DFT. Works OOP only
* @tparam T Scalar Type
* @param input input pointer
* @param output output pointer
* @param fft_size fft size
*/
template <typename T>
void naive_dft(std::complex<T>* input, std::complex<T>* output, IdxGlobal fft_size) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How big a DFT do we expect to compute with this? Can we use portFFT's SYCL implementation here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any size FFTs are acceptable, any reason you ask that question ?. In case you are worried about the time to compute, I can replace this with a CT.
We cannot use our kernels here it would require jitting of a new kernel, at that point a host side CT ( which I assume this will become eventually) would be faster.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the scaling of naive DFTs, we should be careful that calling this function does not become more expensive than the actual DFT we are trying to compute.

I also don't especially like the notion that we'd need a host implementation of CT - we're trying to write a GPU implementation, not a CPU implementation!

using ctype = std::complex<T>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ctype sounds special in c++. Consider complex_t.

for (int i = 0; i < fft_size; i++) {
ctype temp = ctype(0, 0);
for (int j = 0; j < fft_size; j++) {
ctype multiplier = ctype(static_cast<T>(std::cos((-2 * M_PI * i * j) / static_cast<double>(fft_size))),
static_cast<T>(std::sin((-2 * M_PI * i * j) / static_cast<double>(fft_size))));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • M_PI is not C++.
  • cospi and sinpi.

temp += input[j] * multiplier;
}
output[i] = temp;
}
}
} // namespace detail
} // namespace portfft

#endif
8 changes: 4 additions & 4 deletions src/portfft/common/workgroup.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -60,15 +60,15 @@ namespace detail {
* @tparam SubgroupSize Size of the subgroup
* @tparam LocalT The type of the local view
* @tparam T Scalar type
* @param loc local accessor containing the input
* @param loc View of the local memory containing the input
* @param loc_twiddles Pointer to twiddles to be used by sub group FFTs
* @param wg_twiddles Pointer to precalculated twiddles which are to be used before second set of FFTs
* @param scaling_factor Scalar factor with which the result is to be scaled
* @param max_num_batches_in_local_mem Number of batches local memory is allocated for
* @param batch_num_in_local Id of the local memory batch to work on
* @param load_modifier_data Pointer to the load modifier data in global Memory
* @param store_modifier_data Pointer to the store modifier data in global Memory
* @param batch_num_in_kernel Absosulte batch from which batches loaded in local memory will be computed
* @param batch_num_in_kernel Absolute batch from which batches loaded in local memory will be computed
* @param dft_size Size of each DFT to calculate
* @param stride_within_dft Stride between elements of each DFT - also the number of the DFTs in the inner dimension
* @param ndfts_in_outer_dimension Number of DFTs in outer dimension
Expand Down Expand Up @@ -300,13 +300,13 @@ __attribute__((always_inline)) inline void dimension_dft(
* @tparam LocalT Local memory view type
* @tparam T Scalar type
*
* @param loc A view of a local accessor containing input
* @param loc View of the local memory containing the input
* @param loc_twiddles Pointer to twiddles to be used by sub group FFTs
* @param wg_twiddles Pointer to precalculated twiddles which are to be used before second set of FFTs
* @param scaling_factor Scalar factor with which the result is to be scaled
* @param max_num_batches_in_local_mem Number of batches local memory is allocated for
* @param batch_num_in_local Id of the local memory batch to work on
* @param batch_num_in_kernel Absosulte batch from which batches loaded in local memory will be computed
* @param batch_num_in_kernel Absolute batch from which batches loaded in local memory will be computed
* @param load_modifier_data Pointer to the load modifier data in global Memory
* @param store_modifier_data Pointer to the store modifier data in global Memory
* @param fft_size Problem Size
Expand Down
Loading