Bluestein Algorithm #140
Conversation
…tors, add transposes for backward factors
This reverts commit 4cf6ca7.
…to atharva/bluestein
This is an incomplete review, since I know this is a draft and you said that some things in the dispatch logic will change.
* @tparam T Scalar Type
* @param ptr Host Pointer containing the load/store modifiers.
* @param committed_size original problem size
* @param dimension_size padded size
If "padded size" is the best description for dimension_size, why isn't the variable called padded_size?
Whichever name is better, this documentation should be improved.
src/portfft/common/host_fft.hpp
Outdated
* @param fft_size fft size
*/
template <typename T>
void naive_dft(std::complex<T>* input, std::complex<T>* output, IdxGlobal fft_size) {
How big a DFT do we expect to compute with this? Can we use portFFT's SYCL implementation here?
FFTs of any size are acceptable; why do you ask? In case you are worried about the time to compute, I can replace this with a CT.
We cannot use our kernels here; it would require jitting a new kernel, at which point a host-side CT (which I assume this will become eventually) would be faster.
With the scaling of naive DFTs, we should be careful that calling this function does not become more expensive than the actual DFT we are trying to compute.
I also don't especially like the notion that we'd need a host implementation of CT - we're trying to write a GPU implementation, not a CPU implementation!
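For reference, the scaling concern is about the O(n^2) inner product a direct DFT performs. A minimal host-side sketch of the same idea (standalone, with std::int64_t standing in for portFFT's IdxGlobal) looks like:

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstdint>

// Minimal O(n^2) direct DFT sketch; accumulates in double for accuracy.
// std::int64_t is a stand-in here for portFFT's IdxGlobal type.
template <typename T>
void naive_dft_sketch(const std::complex<T>* input, std::complex<T>* output, std::int64_t fft_size) {
  const double pi = std::acos(-1.0);
  for (std::int64_t k = 0; k < fft_size; ++k) {
    std::complex<double> acc{0.0, 0.0};
    for (std::int64_t n = 0; n < fft_size; ++n) {
      const double angle = -2.0 * pi * static_cast<double>(k * n) / static_cast<double>(fft_size);
      acc += std::complex<double>(input[n]) * std::complex<double>(std::cos(angle), std::sin(angle));
    }
    output[k] = std::complex<T>(acc);
  }
}
```

The doubly nested loop is where the quadratic cost comes from, which is why the comparison against the size of the actual DFT being planned matters.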
src/portfft/common/host_fft.hpp
Outdated
ctype multiplier = ctype(static_cast<T>(std::cos((-2 * M_PI * i * j) / static_cast<double>(fft_size))),
static_cast<T>(std::sin((-2 * M_PI * i * j) / static_cast<double>(fft_size))));
M_PI is not C++. Consider cospi and sinpi.
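As an illustration of avoiding the POSIX-only M_PI macro on the host (in SYCL device code, sycl::cospi/sycl::sinpi would be the natural fit), a portable twiddle helper might look like this; the function name and signature are illustrative, not portFFT's API:

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Portable twiddle computation without relying on the POSIX M_PI macro.
// i, j, fft_size mirror the loop indices in the snippet under review.
template <typename T>
std::complex<T> twiddle(long long i, long long j, long long fft_size) {
  const double pi = std::acos(-1.0);  // pi computed portably
  const double angle = (-2.0 * pi * static_cast<double>(i * j)) / static_cast<double>(fft_size);
  return {static_cast<T>(std::cos(angle)), static_cast<T>(std::sin(angle))};
}
```

With C++20, std::numbers::pi would be an even cleaner replacement for the acos(-1.0) idiom.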
src/portfft/common/host_fft.hpp
Outdated
*/
template <typename T>
void naive_dft(std::complex<T>* input, std::complex<T>* output, IdxGlobal fft_size) {
using ctype = std::complex<T>;
ctype sounds special in C++. Consider complex_t.
@@ -322,7 +316,7 @@ class committed_descriptor_impl {
* vector of kernel ids, factors
*/
template <Idx SubgroupSize>
std::tuple<detail::level, kernel_ids_and_metadata_t> prepare_implementation(std::size_t kernel_num) {
std::tuple<detail::level, std::size_t, kernel_ids_and_metadata_t> prepare_implementation(std::size_t kernel_num) {
The tuple's new addition is undocumented. I'd still love this to be its own class.
return {detail::level::GLOBAL, param_vec};
if (!detail::factorize_input(fft_size, check_and_select_target_level)) {
param_vec.clear();
fft_size = static_cast<IdxGlobal>(std::pow(2, ceil(log(static_cast<double>(fft_size)) / log(2.0))));
I think using floating-point arithmetic on integer data is a mistake, especially when we have an integer log2 function here that you could adapt.
This operation could also be a named function like round_up_to_pow_2.
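For illustration, an integer-only version of the suggested helper (taking the name round_up_to_pow_2 from the comment above; this is a sketch, not portFFT code) avoids the pow/log/ceil round-trip through floating point entirely:

```cpp
#include <cassert>
#include <cstdint>

// Round n up to the next power of two using integer arithmetic only.
inline std::uint64_t round_up_to_pow_2(std::uint64_t n) {
  std::uint64_t result = 1;
  while (result < n) {
    result <<= 1;  // double until we reach or exceed n
  }
  return result;
}
```

Besides readability, this sidesteps any risk of floating-point rounding returning the wrong power for large inputs.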
detail::factorize_input(fft_size, check_and_select_target_level);
detail::factorize_input(fft_size, check_and_select_target_level);
Why is this called twice?
I have provided the flexibility of having a different number of forward factors (num_forward_factors, the number of factors during the forward FFT phase) and backward factors (num_backward_factors, the number of factors during the backward phase of Bluestein), and the following is why:
1. I will soon be removing the use of local memory for the load/store modifiers and replacing it with single-transaction vector loads from global memory. This will reduce the local memory requirements, and thus in turn reduce the number of factors, which in turn will reduce the trips to global memory and the number of transpositions. This is also the reason why I have not used local memory for load modifiers in the subgroup implementation.
2. However, I have still kept the option of using local memory, because some architectures do not support coalesced vector loads. In that case we would need to use local memory, and thus the number of factors would increase, as the factor size is driven by the local memory usage. In Bluestein, only the first kernel during the forward phase requires a load modifier (which would go into local memory in this case); the backward phase does not. It is therefore of interest to reduce the number of factors in the backward phase (fewer factors means fewer transpositions and fewer trips to global memory, and thus better performance), hence the number of backward factors should be smaller than the number of forward factors.
3. Keeping point 2 in mind, throughout the code I have written with the assumption that num_forward_factors != num_backward_factors, so whenever we make the change to use local memory only on some hardware, nothing would need to change except the factorization. For completeness, have a look at how factorization was being done when the de facto way was to use local memory for load modifiers here, focusing on the boolean at the end.
So to answer your question, it is called twice so as to provide the flexibility of having different numbers of forward and backward factors.
I think point 1 is the sort of big design discussion we should have as a team before much code is written.
I guess there is some state associated with check_and_select_target_level that means it does something different after each call?
This isn't immediately clear. Maybe a good start is explicitly defining the lambda captures?
Where possible, I find pure functions easier to follow.
I guess there is some state associated with check_and_select_target_level that means it does something different after each call?
This isn't immediately clear. Maybe a good start is explicitly defining the lambda captures?
Where possible, I find pure functions easier to follow.
There is no state associated with check_and_select_target_impl. Each call to that function is independent of the previous one. I very strongly prefer lambdas in this case, as this is the only place the function is required.
I think point 1 is the sort of big design discussion we should have as a team before much code is written.
I would say that this does not subtract or change any logic, or permanently change things. It assumes a general case, does not make any strong assumptions, and provides flexibility to go the other way if required, so I suppose it is fine?
I agree that factorization for global is a really confusing piece of code, which should be refactored. The confusing part here is that factorize_input calls check_and_select_target_level, which appends factors to param_vec, so calling it twice here actually makes sense.
But that has not been introduced in this PR, so it makes sense to leave refactoring for a separate task.
I don't like this because
- It really looks like a bug.
- It is hard to follow.
Would the following be accurate?
// Forward DFT within Bluestein implementation (pre convolution).
detail::factorize_input(fft_size, check_and_select_target_level);
// Backward DFT within Bluestein implementation (post convolution).
detail::factorize_input(fft_size, check_and_select_target_level);
| There is no state associated with check_and_select_target_impl

check_and_select_target_impl modifies param_vec. When I mentioned pure functions, I wasn't trying to say there is anything wrong with lambdas, just unexpected side effects.
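To illustrate the side-effect point being made here, a sketch of a factorization helper in the pure style (the name and signature are made up for this example, not portFFT's API) returns its factors instead of appending to a captured vector, so two calls cannot interact through hidden state:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical side-effect-free factorization: returns the prime factors
// of n rather than mutating a captured param_vec, so each call stands alone.
inline std::vector<std::int64_t> prime_factors(std::int64_t n) {
  std::vector<std::int64_t> factors;
  for (std::int64_t f = 2; f * f <= n; ++f) {
    while (n % f == 0) {
      factors.push_back(f);
      n /= f;
    }
  }
  if (n > 1) {
    factors.push_back(n);  // remaining prime cofactor
  }
  return factors;
}
```

With this shape, calling the function twice for the forward and backward phases would make the two result vectors explicit at the call site.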
/**
* Tuple of the level, an input kernel_bundle, and factors pertaining to each factor of the committed size
*/
using input_bundles_and_metadata_t =
std::tuple<detail::level, sycl::kernel_bundle<sycl::bundle_state::input>, std::vector<Idx>>;
When you create a comment describing a tuple's contents, it might be time to create a POD struct.
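As a sketch of that suggestion (field and type names are illustrative; a std::string stands in for the sycl::kernel_bundle so the example is self-contained):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical POD replacing the tuple: each field is named, so call sites
// no longer need std::get<N> or a doc comment to explain the contents.
// `level` and `kernel_bundle` stand in for detail::level and
// sycl::kernel_bundle<sycl::bundle_state::input>.
struct input_bundle_and_metadata {
  int level;
  std::string kernel_bundle;
  std::vector<int> factors;
};
```

Aggregate initialization keeps construction as terse as the tuple version while making reads self-documenting.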
bool is_compatible = true;
for (auto [level, ids, factors] : prepared_vec) {
std::size_t temp = 1;
temp could probably have a better name.
}

Idx num_backward_factors = static_cast<Idx>(prepared_vec.size()) - num_forward_factors;
bool is_prime = dimension_size == params.lengths[dimension_num] ? false : true;
bool is_prime = dimension_size != params.lengths[dimension_num];

What does is_prime mean? Surely we are more concerned with large prime factors than primes in general. Maybe something like "is_padded" would be more appropriate.
detail::complex_conjugate conjugate_on_load;
detail::complex_conjugate conjugate_on_store;
detail::elementwise_multiply multiply_on_load;
detail::elementwise_multiply multiply_on_store;
detail::apply_scale_factor scale_factor_applied;
These could just be declared in the loop, since you redefine them anyway.
try {
PORTFFT_LOG_TRACE("Building kernel bundle with subgroup size", SubgroupSize);
Did you mean to delete this log line?
}
if (is_compatible) {
return result;
if (!is_compatible) {
is_compatible isn't needed; just return nullopt immediately where it is set to false.
std::vector<sycl::event> compute_level(const typename committed_descriptor_impl<Scalar, Domain>::kernel_data_struct&,
const TIn&, Scalar*, const TIn&, Scalar*, const Scalar*, const Scalar*,
const Scalar*, const IdxGlobal*, IdxGlobal, IdxGlobal, Idx, IdxGlobal, IdxGlobal,
Idx, Idx, complex_storage, const std::vector<sycl::event>&, sycl::queue&);

template <typename Scalar, domain Domain, typename TOut>
sycl::event transpose_level(const typename committed_descriptor_impl<Scalar, Domain>::kernel_data_struct& kd_struct,
const Scalar* input, TOut output, const IdxGlobal* factors_triple, IdxGlobal committed_size,
Idx num_batches_in_l2, IdxGlobal n_transforms, IdxGlobal batch_start, Idx total_factors,
IdxGlobal output_offset, sycl::queue& queue, const std::vector<sycl::event>& events,
complex_storage storage);
sycl::event transpose_level(const typename committed_descriptor_impl<Scalar, Domain>::kernel_data_struct&,
const Scalar*, TOut, const IdxGlobal*, IdxGlobal, Idx, IdxGlobal, IdxGlobal, Idx, IdxGlobal,
sycl::queue&, const std::vector<sycl::event>&, complex_storage);

template <Idx, typename Scalar, domain Domain, typename TIn, typename TOut>
sycl::event global_impl_driver(const TIn&, const TIn&, TOut, TOut, committed_descriptor_impl<Scalar, Domain>&,
typename committed_descriptor_impl<Scalar, Domain>::dimension_struct&,
const kernels_vec<Scalar, Domain>&, const kernels_vec<Scalar, Domain>&, Idx, IdxGlobal,
IdxGlobal, std::size_t, std::size_t, IdxGlobal, IdxGlobal, IdxGlobal, complex_storage,
detail::elementwise_multiply, const Scalar*);
I find it more readable to have argument names here as well, especially as each function accepts many arguments of the same type.
friend std::vector<sycl::event> compute_level(
const typename committed_descriptor_impl<Scalar1, Domain1>::kernel_data_struct&, const TIn&, Scalar1*, const TIn&,
Scalar1*, const Scalar1*, const Scalar1*, const Scalar1*, const IdxGlobal*, IdxGlobal, IdxGlobal, Idx, IdxGlobal,
IdxGlobal, Idx, Idx, complex_storage, const std::vector<sycl::event>&, sycl::queue&);

template <typename Scalar1, domain Domain1, typename TOut>
friend sycl::event detail::transpose_level(
const typename committed_descriptor_impl<Scalar1, Domain1>::kernel_data_struct& kd_struct, const Scalar1* input,
TOut output, const IdxGlobal* factors_triple, IdxGlobal committed_size, Idx num_batches_in_l2,
IdxGlobal n_transforms, IdxGlobal batch_start, Idx total_factors, IdxGlobal output_offset, sycl::queue& queue,
const std::vector<sycl::event>& events, complex_storage storage);
const typename committed_descriptor_impl<Scalar1, Domain1>::kernel_data_struct&, const Scalar1*, TOut,
const IdxGlobal*, IdxGlobal, Idx, IdxGlobal, IdxGlobal, Idx, IdxGlobal, sycl::queue&,
const std::vector<sycl::event>&, complex_storage);

template <Idx, typename Scalar1, domain Domain1, typename TIn, typename TOut>
friend sycl::event global_impl_driver(const TIn&, const TIn&, TOut, TOut,
committed_descriptor_impl<Scalar1, Domain1>&,
typename committed_descriptor_impl<Scalar1, Domain1>::dimension_struct&,
const kernels_vec<Scalar1, Domain1>&, const kernels_vec<Scalar1, Domain1>&, Idx,
IdxGlobal, IdxGlobal, std::size_t, std::size_t, IdxGlobal, IdxGlobal, IdxGlobal,
complex_storage, detail::elementwise_multiply, const Scalar1*);
Same here.
@@ -215,17 +229,29 @@ class committed_descriptor_impl {
std::shared_ptr<IdxGlobal> factors_and_scan;
detail::level level;
std::size_t length;
std::size_t committed_length;
What is committed length and how does it differ from just length?
Committed length is the length that was committed for that dimension. The problem size needn't be the size that was committed, hence the two variables. Notice how prepare_implementation is now returning the problem size as well (here).
In which case does it differ from length?
Prime-sized inputs, where we need to pad (Bluestein), and in the future, in the case of Rader FFTs, where this will become committed_length - 1.
In that case I think the variable name could be improved. length is what the user commits, and that is what I think of when I read committed_length. How about impl_ct_length? Also add a comment explaining when this differs from length.
return {detail::level::GLOBAL, param_vec};
if (!detail::factorize_input(fft_size, check_and_select_target_level)) {
param_vec.clear();
fft_size = static_cast<IdxGlobal>(std::pow(2, ceil(log(static_cast<double>(fft_size)) / log(2.0))));
I dislike changing this variable in place: the fft size is not changing, but you are using the same variable for something else. Use a new variable, rounded_up_fft_size maybe?
* @return sycl::event waiting on the last transposes
*/
template <Idx SubgroupSize, typename Scalar, domain Domain, typename TIn, typename TOut>
sycl::event global_impl_driver(const TIn& input, const TIn& input_imag, TOut output, TOut output_imag,
I do not like putting driver in function names. It is such an overloaded term, which makes it less clear. There are GPU drivers, compiler driver ... Can you think of another name?
I thought about this, and I genuinely cannot come up with a better name; I feel the driver suffix is apt, because this function launches the compute and transpose kernels, manages dependencies, and increments pointers per level. It "drives" the global implementation.
How about just global_impl? I know this is a bit of nitpicking, so feel free to ignore if you disagree.
auto in_acc_or_usm = get_access(input, cgh);
auto out_acc_or_usm = get_access(output, cgh);
cgh.host_task([&]() {
for (std::size_t i = 0; i < num_copies; i++) {
events.push_back(queue.copy(&in_acc_or_usm[0] + i * src_stride + input_offset,
&out_acc_or_usm[0] + i * dst_stride + output_offset, num_elements_to_copy));
}
});
This is undefined behavior for buffers. queue.copy is only defined for USM pointers, NOT pointers to buffer memory. However, there are queue.copy overloads that accept accessors.
I did not know that, thanks for letting me know. I will look into it.
}

template <typename Scalar, domain Domain>
template <typename Dummy>
struct committed_descriptor_impl<Scalar, Domain>::calculate_twiddles_struct::inner<detail::level::GLOBAL, Dummy> {
static Scalar* execute(committed_descriptor_impl& desc, dimension_struct& /*dimension_data*/,
static Scalar* execute(committed_descriptor_impl& desc, dimension_struct& dimension_data,
Why did twiddle calculation need to change?
To calculate the twiddles for both forward and backward factors, I have refactored it a bit to avoid code duplication.
if (ptr != nullptr) {
return std::shared_ptr<T>(ptr, [captured_queue = queue](T* ptr) {
if (ptr != nullptr) {
sycl::free(ptr, captured_queue);
}
});
}
throw internal_error("Could not allocate usm memory of size: ", size * sizeof(T), " bytes");
if (ptr == nullptr) {
throw internal_error("Could not allocate usm memory of size: ", size * sizeof(T), " bytes");
}
return std::shared_ptr<T>(ptr, [captured_queue = queue](T* ptr) {
if (ptr != nullptr) {
sycl::free(ptr, captured_queue);
}
});
This would look better with less nesting.
if (conjugate_on_store == detail::complex_conjugate::APPLIED) {
conjugate_inplace(priv, fft_size);
}
Maybe you should always conjugate before the logging or after the logging. I realize that we had 1 before and 1 after already, but both before makes more sense to me.
The current implementation does not handle small prime values.
Adds support for sizes which have prime factors, for both INTERLEAVED_COMPLEX and SPLIT_COMPLEX layouts.
Checklist
Tick if relevant:
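For context on what the PR implements, Bluestein's algorithm rewrites a length-n DFT as a circular convolution that can be evaluated at any padded size m >= 2n - 1 (typically the next power of two). A compact host-side sketch follows; it uses a direct O(m^2) DFT for the convolution purely for illustration, whereas the PR evaluates the convolution with portFFT's own kernels:

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

using cd = std::complex<double>;

// Direct O(n^2) DFT, used here only to evaluate the convolution below.
std::vector<cd> direct_dft(const std::vector<cd>& x, bool inverse) {
  const std::size_t n = x.size();
  const double pi = std::acos(-1.0);
  const double sign = inverse ? 2.0 : -2.0;
  std::vector<cd> out(n);
  for (std::size_t k = 0; k < n; ++k) {
    cd acc{0.0, 0.0};
    for (std::size_t j = 0; j < n; ++j) {
      const double angle = sign * pi * double(k) * double(j) / double(n);
      acc += x[j] * cd(std::cos(angle), std::sin(angle));
    }
    out[k] = inverse ? acc / double(n) : acc;
  }
  return out;
}

// Bluestein: a length-n DFT via a length-m circular convolution, m >= 2n - 1.
std::vector<cd> bluestein_dft(const std::vector<cd>& x) {
  const std::size_t n = x.size();
  const double pi = std::acos(-1.0);
  std::size_t m = 1;
  while (m < 2 * n - 1) m <<= 1;  // padded power-of-two size
  std::vector<cd> w(n), a(m, cd{0.0, 0.0}), b(m, cd{0.0, 0.0});
  for (std::size_t j = 0; j < n; ++j) {
    const double angle = -pi * double(j) * double(j) / double(n);
    w[j] = cd(std::cos(angle), std::sin(angle));  // chirp w_j = exp(-i*pi*j^2/n)
    a[j] = x[j] * w[j];                           // modulated input, zero-padded to m
    b[j] = std::conj(w[j]);                       // convolution kernel ...
    if (j > 0) b[m - j] = std::conj(w[j]);        // ... with negative indices wrapped
  }
  // Circular convolution: forward DFTs, pointwise multiply, inverse DFT.
  std::vector<cd> fa = direct_dft(a, false);
  std::vector<cd> fb = direct_dft(b, false);
  for (std::size_t j = 0; j < m; ++j) fa[j] *= fb[j];
  std::vector<cd> conv = direct_dft(fa, true);
  std::vector<cd> out(n);
  for (std::size_t k = 0; k < n; ++k) out[k] = w[k] * conv[k];  // demodulate
  return out;
}
```

The padding to a power of two is exactly the fft_size round-up discussed in the review above, and the chirp values w are the load/store modifiers the kernels apply around the forward and backward (convolution) phases.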