Commit 729bd5c

Merge branch 'develop' into task/rocm-6-ghaction
2 parents e3c739d + 5443545 commit 729bd5c

29 files changed, +1265 -352 lines changed

docs/sphinx/user_guide/feature/reduction.rst

Lines changed: 108 additions & 35 deletions
@@ -190,6 +190,9 @@ RAJA::expt::Reduce
 ..................
 ::
 
+  using VALOP_DOUBLE_SUM = RAJA::expt::ValOp<double, RAJA::operators::plus>;
+  using VALOP_DOUBLE_MIN = RAJA::expt::ValOp<double, RAJA::operators::minimum>;
+
   double* a = ...;
 
   double rs = 0.0;
@@ -198,9 +201,9 @@ RAJA::expt::Reduce
   RAJA::forall<EXEC_POL> ( Res, Seg,
     RAJA::expt::Reduce<RAJA::operators::plus>(&rs),
     RAJA::expt::Reduce<RAJA::operators::minimum>(&rm),
-    [=] (int i, double& _rs, double& _rm) {
+    [=] (int i, VALOP_DOUBLE_SUM& _rs, VALOP_DOUBLE_MIN& _rm) {
       _rs += a[i];
-      _rm = RAJA_MIN(a[i], _rm);
+      _rm.min(a[i]);
     }
   );
 
@@ -213,13 +216,14 @@ RAJA::expt::Reduce
   above. The reduction operation will include the existing value of
   the given target variable.
 * The kernel body lambda expression passed to ``RAJA::forall`` must have a
-  parameter corresponding to each ``RAJA::expt::Reduce`` argument, ``_rs`` and
-  ``_rm`` in the example code. These parameters refer to a local target for each
-  reduction operation. It is important to note that the parameters follow the
-  kernel iteration variable, ``i`` in this case, and appear in the same order
-  as the corresponding ``RAJA::expt::Reduce`` arguments to ``RAJA::forall``. The
-  parameter types must be references to the types used in the
-  ``RAJA::expt::Reduce`` arguments.
+  ``RAJA::expt::ValOp`` parameter corresponding to each ``RAJA::expt::Reduce``
+  argument, ``_rs`` and ``_rm`` in the example code. These parameters refer to a
+  local target for each reduction operation. Each ``ValOp`` needs to be templated
+  on the underlying data type (``double`` for ``_rs`` and ``_rm``), and the operator
+  being used. It is important to note that the parameters follow the kernel iteration
+  variable, ``i`` in this case, and appear in the same order as the corresponding
+  ``RAJA::expt::Reduce`` arguments to ``RAJA::forall``. The ``ValOp`` parameters must
+  be references to the objects instantiated by the ``RAJA::expt::Reduce`` arguments.
 * The local variables referred to by ``_rs`` and ``_rm`` are initialized with
   the *identity* of the reduction operation to be performed.
 * The local variables are updated in the user supplied lambda.
@@ -236,47 +240,109 @@ RAJA::expt::Reduce
   compatible with the ``EXEC_POL``. ``Seg`` is the iteration space
   object for ``RAJA::forall``.
 
-.. important:: The order and types of the local reduction variables in the
-               kernel body lambda expression must match exactly with the
-               corresponding ``RAJA::expt::Reduce`` arguments to the
-               ``RAJA::forall`` to ensure that the correct result is obtained.
+.. important:: * ``RAJA::expt::Reduce`` arguments must be passed to the forall.
+                 These arguments are templated on the reduction operator, and take
+                 a pointer to the target reduction variable that was declared outside
+                 of the forall.
+               * The local reduction arguments to the lambda expression must be
+                 ``RAJA::expt::ValOp`` references. Each ``ValOp`` reference
+                 corresponds to a ``RAJA::expt::Reduce`` argument within the forall.
+               * The ordering of the ``ValOp`` references must correspond to the
+                 ordering of the ``RAJA::expt::Reduce`` arguments to ensure that the
+                 correct result is obtained.
+               * Each ``ValOp`` reduction data type and RAJA operator need to match
+                 the data type referenced, and operator template argument in the
+                 corresponding ``RAJA::expt::Reduce`` argument.
 
 RAJA::expt::ValLoc
 ..................
 
As with the current RAJA reduction interface, the new interface supports *loc*
248261
reductions, which provide the ability to get a kernel/loop index at which the
249262
final reduction value was found. With this new interface, *loc* reductions
250-
are performed using ``ValLoc<T>`` types. Since they are strongly typed, they
251-
provide ``min()`` and ``max()`` operations that are equivalent to using
252-
``RAJA_MIN()`` or ``RAJA_MAX`` macros as demonstrated in the code example below.
253-
Users must use the ``getVal()`` and ``getLoc()`` methods to access the reduction
254-
results::
263+
are performed using ``ValLoc<T,I>`` types, where ``T`` is the underlying data type,
264+
and ``I`` is the index type. Users must use the ``getVal()`` and ``getLoc()``
265+
methods to access the reduction results after the kernel completes.
266+
267+
In the lambda expression, a ``ValLoc<T,I>`` must be wrapped in a
268+
``ValOp`` type, and passed to the lambda in the same order as the corresponding
269+
``RAJA::expt::Reduce`` arguments, e.g. ``ValOp<ValLoc<T,I>, Op>``. In the example
270+
below, ``VALOPLOC_DOUBLE_MIN`` represents a wrapped ``ValLoc`` usable within the
271+
lambda.
272+
273+
For convenience, an alias of ``RAJA::expt::ValLocOp<T,I,Op>`` is provided.
274+
Within the lambda, this ``ValLocOp`` object provides ``minloc``, and ``maxloc``
275+
functions. In the example below, ``VALOPLOC_DOUBLE_MAX`` represents a wrapped
276+
``ValLoc`` using the ``ValLocOp`` alias::
255277

   double* a = ...;
 
+  using VALOPLOC_DOUBLE_MIN = RAJA::expt::ValOp<RAJA::expt::ValLoc<double, RAJA::Index_type>,
+                                                RAJA::operators::minimum>;
+  using VALOPLOC_DOUBLE_MAX = RAJA::expt::ValLocOp<double, RAJA::Index_type,
+                                                   RAJA::operators::maximum>;
+
   using VL_DOUBLE = RAJA::expt::ValLoc<double>;
-  VL_DOUBLE rm_loc;
+  VL_DOUBLE rmin_loc;
+  VL_DOUBLE rmax_loc;
 
   RAJA::forall<EXEC_POL> ( Res, Seg,
-    RAJA::expt::Reduce<RAJA::operators::minimum>(&rm_loc),
-    [=] (int i, VL_DOUBLE& _rm_loc) {
-      _rm_loc = RAJA_MIN(VL_DOUBLE(a[i], i), _rm_loc);
-      //_rm_loc.min(VL_DOUBLE(a[i], i)); // Alternative to RAJA_MIN
+    RAJA::expt::Reduce<RAJA::operators::minimum>(&rmin_loc),
+    RAJA::expt::Reduce<RAJA::operators::maximum>(&rmax_loc),
+    [=] (int i, VALOPLOC_DOUBLE_MIN& _rmin_loc, VALOPLOC_DOUBLE_MAX& _rmax_loc) {
+      _rmin_loc.minloc(a[i], i);
+      _rmax_loc.maxloc(a[i], i);
     }
   );
 
-  std::cout << rm_loc.getVal() ...
-  std::cout << rm_loc.getLoc() ...
+  std::cout << rmin_loc.getVal() ...
+  std::cout << rmin_loc.getLoc() ...
+  std::cout << rmax_loc.getVal() ...
+  std::cout << rmax_loc.getLoc() ...
+
+Alternatively, *loc* reductions can be performed on separate reduction data and
+location variables, without a ``ValLoc`` object, as shown in the next example.
+To use this capability, a ``RAJA::expt::ReduceLoc`` argument, templated on the
+reduction operation, must be passed to ``RAJA::forall`` with pointers to the data
+and location variables. In the example below, the addresses of ``rm`` and ``loc``
+are passed to the ``ReduceLoc`` argument in the forall. After the kernel completes,
+the data and location can be read directly from those variables, without
+``getVal()`` or ``getLoc()``.
+::
+
+  double* a = ...;
+
+  using VALOPLOC_DOUBLE_MIN = RAJA::expt::ValLocOp<double, RAJA::Index_type,
+                                                   RAJA::operators::minimum>;
+
+  // No ValLoc needed from the user here.
+  double rm;
+  RAJA::Index_type loc;
+
+  RAJA::forall<EXEC_POL> ( Res, Seg,
+    RAJA::expt::ReduceLoc<RAJA::operators::minimum>(&rm, &loc), // --> 1 double & 1 index added
+    [=] (int i, VALOPLOC_DOUBLE_MIN& _rm_loc) {
+      _rm_loc.minloc(a[i], i);
+    }
+  );
+
+  // No getVal() or getLoc() required. Access results in their original form.
+  std::cout << rm ...
+  std::cout << loc ...
+
 
 Lambda Arguments
 ................
 
 This interface takes advantage of C++ parameter packs to allow users to pass
-any number of ``RAJA::expt::Reduce`` objects to the ``RAJA::forall`` method::
+any number of ``RAJA::expt::Reduce`` arguments to the ``RAJA::forall`` method::
 
   double* a = ...;
 
+  using VALOP_DOUBLE_SUM = RAJA::expt::ValOp<double, RAJA::operators::plus>;
+  using VALOP_DOUBLE_MIN = RAJA::expt::ValOp<double, RAJA::operators::minimum>;
+  using VALOPLOC_DOUBLE_MIN = RAJA::expt::ValLocOp<double, RAJA::Index_type, RAJA::operators::minimum>;
+
   using VL_DOUBLE = RAJA::expt::ValLoc<double>;
   VL_DOUBLE rm_loc;
   double rs;
@@ -287,10 +353,13 @@ any number of ``RAJA::expt::Reduce`` objects to the ``RAJA::forall`` method::
     RAJA::expt::Reduce<RAJA::operators::minimum>(&rm), // --> 1 double added
     RAJA::expt::Reduce<RAJA::operators::minimum>(&rm_loc), // --> 1 VL_DOUBLE added
     RAJA::expt::KernelName("MyFirstRAJAKernel"), // --> NO args added
-    [=] (int i, double& _rs, double& _rm, VL_DOUBLE& _rm_loc) {
+    [=] (int i,
+         VALOP_DOUBLE_SUM& _rs,
+         VALOP_DOUBLE_MIN& _rm,
+         VALOPLOC_DOUBLE_MIN& _rm_loc) {
       _rs += a[i];
-      _rm = RAJA_MIN(a[i], _rm);
-      _rm_loc.min(VL_DOUBLE(a[i], i));
+      _rm.min(a[i]);
+      _rm_loc.minloc(a[i], i);
     }
   );
 
@@ -300,11 +369,12 @@ any number of ``RAJA::expt::Reduce`` objects to the ``RAJA::forall`` method::
   std::cout << rm_loc.getLoc() ...
 
 Again, the lambda expression parameters are in the same order as
-the ``RAJA::expt::Reduce`` arguments to ``RAJA::forall``. Both the types and
-order of the parameters must match to get correct results and to compile
-successfully. Otherwise, a static assertion will be triggered::
+the ``RAJA::expt::Reduce`` arguments to ``RAJA::forall``. The ``ValOp`` underlying
+data types and operators, and order of the ``ValOp`` parameters must match
+the corresponding ``RAJA::expt::Reduce`` types to get correct results and to
+compile successfully. Otherwise, a static assertion will be triggered::
 
-  LAMBDA Not invocable w/ EXPECTED_ARGS.
+  LAMBDA Not invocable w/ EXPECTED_ARGS. Ordering and types must match between RAJA::expt::Reduce() and ValOp arguments.
 
 .. note:: This static assert is only enabled when passing an undecorated C++
           lambda. Meaning, this check will not happen when passing
@@ -329,19 +399,22 @@ The usage of the experimental reductions is similar to the forall example as il
 
   double* a = ...;
 
+  using VALOP_DOUBLE_SUM = RAJA::expt::ValOp<double, RAJA::operators::plus>;
+  using VALOP_DOUBLE_MIN = RAJA::expt::ValOp<double, RAJA::operators::minimum>;
+
   double rs = 0.0;
   double rm = 1e100;
 
   RAJA::launch<EXEC_POL> ( Res,
     RAJA::expt::Reduce<RAJA::operators::plus>(&rs),
     RAJA::expt::Reduce<RAJA::operators::minimum>(&rm),
     "LaunchReductionKernel",
-    [=] RAJA_HOST_DEVICE (int i, double& _rs, double& _rm) {
+    [=] RAJA_HOST_DEVICE (int i, VALOP_DOUBLE_SUM& _rs, VALOP_DOUBLE_MIN& _rm) {
 
       RAJA::loop<loop_pol>(ctx, Seg, [&] (int i) {
 
         _rs += a[i];
-        _rm = RAJA_MIN(a[i], _rm);
+        _rm.min(a[i]);
 
       }
     );
