Refactor to use thrust::reduce on any. #685

Status: Open — wants to merge 6 commits into base: main

Changes from 4 commits
27 changes: 26 additions & 1 deletion include/matx/core/operator_utils.h
@@ -32,6 +32,8 @@

#pragma once

#include <thrust/reduce.h>

#include "matx/core/iterator.h"
#include "matx/core/type_utils.h"
#include "matx/operators/collapse.h"
@@ -62,6 +64,30 @@ namespace matx {
return func(in, iter, bi, ei);
}

template <typename In, typename Out, typename Op, bool ConvertType = true>
__MATX_HOST__ __MATX_INLINE__ auto ReduceInputThrust(In &&in, Out &&out, Op &&op) {
typename detail::base_type_t<In> in_base = in;

auto begin = BeginOffset{in_base};
auto end = EndOffset{in_base};

if constexpr (in_base.Rank() < 2 && is_tensor_view_v<In>) {
using value_t = typename Out::value_type;
const auto &iter = matx::RandomOperatorIterator<decltype(in_base), ConvertType>{in_base};

if (in_base.IsContiguous()) {
// the conversion is already handled for us by RandomOperatorIterator
thrust::reduce(iter + *begin, iter + *end, op.Init(), op);
}
}

Author (@ZelboK):

@cliffburdick So this function is responsible for dealing with the fact that In may not necessarily be a tensor_t, but rather an operator of some sort. It gets the respective offsets and constructs an iterator for thrust to use, which, thankfully, seems to be perfectly compatible.

Consequently, the code has also become potentially simpler than its counterpart on main: Thrust is now responsible for deciding what to use in CUB, rather than MatX. I might be missing some context, though, on whether MatX needs to be responsible for choosing the CUB/Thrust function.

Author (@ZelboK):

In hindsight, I realize there might be issues with this approach. Does matxBinaryOp, for example, need to be utilized? I see that it has methods like PreRun and PostRun that make use of the Executor (which I have used here).

Collaborator (@cliffburdick):

Hi @ZelboK, since a tensor is an operator, the iterator wrapper can turn any MatX operator type into an iterator. However, we do the contiguous check to allow CUB/thrust an optimization if it's a flat pointer with contiguous strides.

Regarding your second comment, the ReduceInput function shouldn't/doesn't need to know whether it's CUB or thrust. That's something you pass in the lambda you give to the function. I'm saying this without actually trying it, but if you assume the iterator type is compatible between both libraries, then there may not be any changes to that code. This is how the example I pointed to previously worked (just search for ReduceInput in cub.h).

I'm not sure what your question about matxBinaryOp is, but that's a wrapper class for any binary type and should be completely separate from this. matxBinaryOp, like most of our types, is an operator, and you can pass it to thrust/cub and have the iterator pull from it.

Author (@ZelboK), Jul 30, 2024:

@cliffburdick I see. I should have clarified - I actually did try to follow the example you showed me (but I did it incorrectly). If you're curious, here's what I did:

auto output_ = cuda::std::get<0>(out);
using out_t = decltype(output_);
using value_t = typename out_t::value_type;
using input_t = typename detail::base_type_t<OpA>; // incorrect, this is not any.h's responsibility
using output_t = typename detail::base_type_t<out_t>; // incorrect
input_t in_base = a_;
output_t out_base = output_;
auto op = detail::reduceOpAny<value_t>();

auto fn = [&](input_t &&input,
              output_t &&output,
              BeginOffset<input_t> &&begin,
              EndOffset<input_t> &&end) {
  return thrust::reduce(input + *begin, input + *end, op.Init(), op);
};
auto rv = ReduceInput(fn, out_base, in_base);

which brought me to my question about matxBinaryOp. This led to errors like

has no member "type"
    typedef typename thrust::iterator_system<InputIterator>::type System;

where InputIterator was of type

matx::detail::matxBinaryOp<matx::detail::ConstVal<int, cuda::std::__4::array<matx::index_t, 2UL>>, matx::index_t, matx::detail::BinOp<int, matx::index_t, matx::detail::AddF<int, matx::index_t>>>

I should not be passing the base types, though. Let me push the fix. Thanks for the help.

auto collapsed = matx::lcollapse<remove_cvref_t<decltype(out)>::Rank()>(rcollapse<remove_cvref_t<decltype(in)>::Rank() - remove_cvref_t<decltype(out)>::Rank()>(in_base));
const auto &iter = matx::RandomOperatorIterator<decltype(collapsed), ConvertType>{collapsed};
return thrust::reduce(iter + *begin, iter + *end, op.Init(), op);
}

template <typename Func, typename OutputOp, typename InputOp, bool ConvertType = true>
__MATX_HOST__ __MATX_INLINE__ auto ReduceInput(Func &&func, OutputOp &&out, InputOp &&in) {
typename detail::base_type_t<InputOp> in_base = in;
@@ -83,7 +109,6 @@
}
}
}

// Collapse the right-most dimensions by the difference in ranks for the reduction dimension,
// then collapse the left size by the output rank to get the batch dimensions
auto collapsed = matx::lcollapse<remove_cvref_t<decltype(out)>::Rank()>(rcollapse<remove_cvref_t<decltype(in)>::Rank() - remove_cvref_t<decltype(out)>::Rank()>(in_base));
16 changes: 12 additions & 4 deletions include/matx/operators/any.h
@@ -32,6 +32,7 @@

#pragma once

#include <thrust/reduce.h>

#include "matx/core/type_utils.h"
#include "matx/operators/base_operator.h"
@@ -40,8 +41,6 @@

namespace matx {



namespace detail {
template<typename OpA, int ORank>
class AnyOp : public BaseOp<AnyOp<OpA, ORank>>
@@ -71,8 +70,17 @@
};

 template <typename Out, typename Executor>
-void Exec(Out &&out, Executor &&ex) const {
-  any_impl(cuda::std::get<0>(out), a_, ex);
+void Exec(Out &&out, Executor) const {
+  auto output_ = cuda::std::get<0>(out);
+  using out_t = decltype(output_);
+  using value_t = typename out_t::value_type;
+  using output_t = typename detail::base_type_t<out_t>;
+
+  output_t out_base = output_;
+  auto op = detail::reduceOpAny<value_t>();
+
+  auto rv = ReduceInputThrust(std::forward<OpA>(a_), std::forward<out_t>(output_), std::forward<decltype(op)>(op));
+  MATX_ASSERT_STR_EXP(rv, cudaSuccess, matxCudaError, "Error in any");
 }

Author (@ZelboK), on the ReduceInputThrust call:

We construct the op, then send the inputs and outputs in their original form to the new function ReduceInputThrust, which collapses them and dispatches to thrust accordingly.

static __MATX_INLINE__ constexpr __MATX_HOST__ __MATX_DEVICE__ int32_t Rank()
19 changes: 13 additions & 6 deletions include/matx/transforms/reduce.h
@@ -798,17 +798,24 @@ template <typename T> class reduceOpMax {
* Performs a reduction of two values of type T by returning 1 if either
* of the values are non-zero.
*/
-template <typename T> class reduceOpAny {
+template <typename T>
+class reduceOpAny {
 public:
+  using type = T; // This type is for Thrust
   using matx_reduce = bool;
   using matx_no_cub_reduce = bool; // Don't use CUB for this reduction type
-  __MATX_HOST__ __MATX_DEVICE__ __MATX_INLINE__ T Reduce(const T &v1, const T &v2)
-  {
+
+  __MATX_HOST__ __MATX_DEVICE__ __MATX_INLINE__ T operator()(const T &v1, const T &v2) const {
     return (v1 != 0) || (v2 != 0);
   }
-  __MATX_HOST__ __MATX_DEVICE__ __MATX_INLINE__ T operator()(T &v1, T &v2) { v1 = ((v1 != 0) || (v2 != 0)); return v1; }
-  __MATX_HOST__ __MATX_DEVICE__ __MATX_INLINE__ T Init() { return (T)(0); }
-  __MATX_DEVICE__ __MATX_INLINE__ void atomicReduce(T *addr, T val) { atomicAny(addr, val); }
+
+  __MATX_HOST__ __MATX_DEVICE__ __MATX_INLINE__ T Init() const {
+    return static_cast<T>(0);
+  }
+
+  __MATX_DEVICE__ __MATX_INLINE__ void atomicReduce(T *addr, T val) const {
+    atomicAny(addr, val);
+  }
 };

Author (@ZelboK), on operator():

needs to be const for Thrust.
