@@ -190,6 +190,9 @@ RAJA::expt::Reduce
190
190
..................
191
191
::
192
192
193
+ using VALOP_DOUBLE_SUM = RAJA::expt::ValOp<double, RAJA::operators::plus>;
194
+ using VALOP_DOUBLE_MIN = RAJA::expt::ValOp<double, RAJA::operators::minimum>;
195
+
193
196
double* a = ...;
194
197
195
198
double rs = 0.0;
@@ -198,9 +201,9 @@ RAJA::expt::Reduce
198
201
RAJA::forall<EXEC_POL> ( Res, Seg,
199
202
RAJA::expt::Reduce<RAJA::operators::plus>(&rs),
200
203
RAJA::expt::Reduce<RAJA::operators::minimum>(&rm),
201
- [=] (int i, double & _rs, double & _rm) {
204
+ [=] (int i, VALOP_DOUBLE_SUM & _rs, VALOP_DOUBLE_MIN & _rm) {
202
205
_rs += a[i];
203
- _rm = RAJA_MIN (a[i], _rm );
206
+ _rm.min (a[i]);
204
207
}
205
208
);
206
209
@@ -213,13 +216,14 @@ RAJA::expt::Reduce
213
216
above. The reduction operation will include the existing value of
214
217
the given target variable.
215
218
* The kernel body lambda expression passed to ``RAJA::forall `` must have a
216
- parameter corresponding to each ``RAJA::expt::Reduce `` argument, ``_rs `` and
217
- ``_rm `` in the example code. These parameters refer to a local target for each
218
- reduction operation. It is important to note that the parameters follow the
219
- kernel iteration variable, ``i `` in this case, and appear in the same order
220
- as the corresponding ``RAJA::expt::Reduce `` arguments to ``RAJA::forall ``. The
221
- parameter types must be references to the types used in the
222
- ``RAJA::expt::Reduce `` arguments.
219
+ ``RAJA::expt::ValOp `` parameter corresponding to each ``RAJA::expt::Reduce ``
220
+ argument, ``_rs `` and ``_rm `` in the example code. These parameters refer to a
221
+ local target for each reduction operation. Each ``ValOp `` needs to be templated
222
+ on the underlying data type (``double `` for ``_rs `` and ``_rm ``), and the operator
223
+ being used. It is important to note that the parameters follow the kernel iteration
224
+ variable, ``i `` in this case, and appear in the same order as the corresponding
225
+ ``RAJA::expt::Reduce `` arguments to ``RAJA::forall ``. The ``ValOp `` parameters must
226
+ be references to the objects instantiated by the ``RAJA::expt::Reduce `` arguments.
223
227
* The local variables referred to by ``_rs `` and ``_rm `` are initialized with
224
228
the *identity * of the reduction operation to be performed.
225
229
* The local variables are updated in the user supplied lambda.
@@ -236,47 +240,109 @@ RAJA::expt::Reduce
236
240
compatible with the ``EXEC_POL ``. ``Seg `` is the iteration space
237
241
object for ``RAJA::forall ``.
238
242
239
- .. important :: The order and types of the local reduction variables in the
240
- kernel body lambda expression must match exactly with the
241
- corresponding ``RAJA::expt::Reduce `` arguments to the
242
- ``RAJA::forall `` to ensure that the correct result is obtained.
243
+ .. important :: * ``RAJA::expt::Reduce`` arguments must be passed to the forall.
244
+ These arguments are templated on the reduction operator, and take
245
+ a pointer to the target reduction variable that was declared outside
246
+ of the forall.
247
+ * The local reduction arguments to the lambda expression must be
248
+ ``RAJA::expt::ValOp `` references. Each ``ValOp `` reference
249
+ corresponds to a ``RAJA::expt::Reduce `` argument within the forall.
250
+ * The ordering of the ``ValOp `` references must correspond to the
251
+ ordering of the ``RAJA::expt::Reduce `` arguments to ensure that the
252
+ correct result is obtained.
253
+ * Each ``ValOp`` data type and operator must match the data type of
254
+ the target variable and the operator template argument of the
255
+ corresponding ``RAJA::expt::Reduce`` argument.
243
256
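The reduction semantics described above (local targets initialized to the operator identity, updated in the lambda, then combined with the existing value of the target variable) can be sketched without RAJA. The block below is a sequential illustration only. The names ``MockValOp``, ``MockPlusOp``, ``MockMinOp``, and ``mock_forall`` are hypothetical stand-ins introduced here; they are not part of RAJA and do not reflect its implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

// Hypothetical operator types (NOT RAJA::operators): each supplies an
// identity value and a way to combine two partial results.
struct MockPlusOp {
  static double identity() { return 0.0; }
  static double combine(double a, double b) { return a + b; }
};

struct MockMinOp {
  static double identity() { return std::numeric_limits<double>::max(); }
  static double combine(double a, double b) { return std::min(a, b); }
};

// Hypothetical stand-in for a ValOp-like wrapper: holds a local value
// that starts at the identity of its operator.
template <typename T, typename Op>
struct MockValOp {
  T val = Op::identity();
  MockValOp& operator+=(T v) { val += v; return *this; }
  MockValOp& min(T v) { val = std::min(val, v); return *this; }
};

using MOCK_SUM = MockValOp<double, MockPlusOp>;
using MOCK_MIN = MockValOp<double, MockMinOp>;

// Sequential sketch of a forall with one sum and one min reduction.
// The lambda receives references to the local targets, in the same
// order as the target pointers passed to this function.
template <typename Body>
void mock_forall(int n, double* sum_target, double* min_target, Body body) {
  MOCK_SUM local_sum;
  MOCK_MIN local_min;
  for (int i = 0; i < n; ++i) {
    body(i, local_sum, local_min);
  }
  // The final result includes the existing value of each target.
  *sum_target = MockPlusOp::combine(*sum_target, local_sum.val);
  *min_target = MockMinOp::combine(*min_target, local_min.val);
}
```

In this sketch, a target that starts at ``10.0`` ends up holding ``10.0`` plus the sum of the array, which mirrors the statement that the reduction includes the existing value of the target variable.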
244
257
RAJA::expt::ValLoc
245
258
..................
246
259
247
260
As with the current RAJA reduction interface, the new interface supports *loc *
248
261
reductions, which provide the ability to get a kernel/loop index at which the
249
262
final reduction value was found. With this new interface, *loc * reductions
250
- are performed using ``ValLoc<T> `` types. Since they are strongly typed, they
251
- provide ``min() `` and ``max() `` operations that are equivalent to using
252
- ``RAJA_MIN() `` or ``RAJA_MAX `` macros as demonstrated in the code example below.
253
- Users must use the ``getVal() `` and ``getLoc() `` methods to access the reduction
254
- results::
263
+ are performed using ``ValLoc<T,I> `` types, where ``T `` is the underlying data type,
264
+ and ``I `` is the index type. Users must use the ``getVal() `` and ``getLoc() ``
265
+ methods to access the reduction results after the kernel completes.
266
+
267
+ In the lambda expression, a ``ValLoc<T,I> `` must be wrapped in a
268
+ ``ValOp `` type, and passed to the lambda in the same order as the corresponding
269
+ ``RAJA::expt::Reduce `` arguments, e.g. ``ValOp<ValLoc<T,I>, Op> ``. In the example
270
+ below, ``VALOPLOC_DOUBLE_MIN `` represents a wrapped ``ValLoc `` usable within the
271
+ lambda.
272
+
273
+ For convenience, the alias ``RAJA::expt::ValLocOp<T,I,Op>`` is provided.
274
+ Within the lambda, this ``ValLocOp`` object provides ``minloc`` and ``maxloc``
275
+ functions. In the example below, ``VALOPLOC_DOUBLE_MAX `` represents a wrapped
276
+ ``ValLoc `` using the ``ValLocOp `` alias::
255
277
256
278
double* a = ...;
257
279
280
+ using VALOPLOC_DOUBLE_MIN = RAJA::expt::ValOp<RAJA::expt::ValLoc<double, RAJA::Index_type>,
281
+ RAJA::operators::minimum>;
282
+ using VALOPLOC_DOUBLE_MAX = RAJA::expt::ValLocOp<double, RAJA::Index_type,
283
+ RAJA::operators::maximum>;
284
+
258
285
using VL_DOUBLE = RAJA::expt::ValLoc<double>;
259
- VL_DOUBLE rm_loc;
286
+ VL_DOUBLE rmin_loc;
287
+ VL_DOUBLE rmax_loc;
260
288
261
289
RAJA::forall<EXEC_POL> ( Res, Seg,
262
- RAJA::expt::Reduce<RAJA::operators::minimum>(&rm_loc),
263
- [=] (int i, VL_DOUBLE& _rm_loc) {
264
- _rm_loc = RAJA_MIN(VL_DOUBLE(a[i], i), _rm_loc);
265
- //_rm_loc.min(VL_DOUBLE(a[i], i)); // Alternative to RAJA_MIN
290
+ RAJA::expt::Reduce<RAJA::operators::minimum>(&rmin_loc),
291
+ RAJA::expt::Reduce<RAJA::operators::maximum>(&rmax_loc),
292
+ [=] (int i, VALOPLOC_DOUBLE_MIN& _rmin_loc, VALOPLOC_DOUBLE_MAX& _rmax_loc) {
293
+ _rmin_loc.minloc(a[i], i);
294
+ _rmax_loc.maxloc(a[i], i);
266
295
}
267
296
);
268
297
269
- std::cout << rm_loc.getVal() ...
270
- std::cout << rm_loc.getLoc() ...
298
+ std::cout << rmin_loc.getVal() ...
299
+ std::cout << rmin_loc.getLoc() ...
300
+ std::cout << rmax_loc.getVal() ...
301
+ std::cout << rmax_loc.getLoc() ...
302
+
303
+ Alternatively, *loc* reductions can be performed on separate reduction data and
304
+ location variables without a ``ValLoc`` object, as shown in the next example.
305
+ To use this capability, a ``RAJA::expt::ReduceLoc `` argument must be passed to the
306
+ ``RAJA::forall``, templated on the reduction operation, and passing in pointers to
307
+ the data and location. This is illustrated in the example below, with pointers to
308
+ ``rm `` and ``loc `` being passed into the ``ReduceLoc `` argument in the forall. The
309
+ data and location can be accessed outside of the forall directly without
310
+ ``getVal() `` or ``getLoc() `` functions.
311
+ ::
312
+
313
+ double* a = ...;
314
+
315
+ using VALOPLOC_DOUBLE_MIN = RAJA::expt::ValLocOp<double, RAJA::Index_type,
316
+ RAJA::operators::minimum>;
317
+
318
+ // No ValLoc needed from the user here.
319
+ double rm;
320
+ RAJA::Index_type loc;
321
+
322
+ RAJA::forall<EXEC_POL> ( Res, Seg,
323
+ RAJA::expt::ReduceLoc<RAJA::operators::minimum>(&rm, &loc), // --> 1 double & 1 index added
324
+ [=] (int i, VALOPLOC_DOUBLE_MIN& _rm_loc) {
325
+ _rm_loc.minloc(a[i], i);
326
+ }
327
+ );
328
+
329
+ // No getVal() or getLoc() required. Access results in their original form.
330
+ std::cout << rm ...
331
+ std::cout << loc ...
332
+
271
333
272
334
Lambda Arguments
273
335
................
274
336
275
337
This interface takes advantage of C++ parameter packs to allow users to pass
276
- any number of ``RAJA::expt::Reduce `` objects to the ``RAJA::forall `` method::
338
+ any number of ``RAJA::expt::Reduce `` arguments to the ``RAJA::forall `` method::
277
339
278
340
double* a = ...;
279
341
342
+ using VALOP_DOUBLE_SUM = RAJA::expt::ValOp<double, RAJA::operators::plus>;
343
+ using VALOP_DOUBLE_MIN = RAJA::expt::ValOp<double, RAJA::operators::minimum>;
344
+ using VALOPLOC_DOUBLE_MIN = RAJA::expt::ValLocOp<double, RAJA::Index_type, RAJA::operators::minimum>;
345
+
280
346
using VL_DOUBLE = RAJA::expt::ValLoc<double>;
281
347
VL_DOUBLE rm_loc;
282
348
double rs;
@@ -287,10 +353,13 @@ any number of ``RAJA::expt::Reduce`` objects to the ``RAJA::forall`` method::
287
353
RAJA::expt::Reduce<RAJA::operators::minimum>(&rm), // --> 1 double added
288
354
RAJA::expt::Reduce<RAJA::operators::minimum>(&rm_loc), // --> 1 VL_DOUBLE added
289
355
RAJA::expt::KernelName("MyFirstRAJAKernel"), // --> NO args added
290
- [=] (int i, double& _rs, double& _rm, VL_DOUBLE& _rm_loc) {
356
+ [=] (int i,
357
+ VALOP_DOUBLE_SUM& _rs,
358
+ VALOP_DOUBLE_MIN& _rm,
359
+ VALOPLOC_DOUBLE_MIN& _rm_loc) {
291
360
_rs += a[i];
292
- _rm = RAJA_MIN (a[i], _rm );
293
- _rm_loc.min(VL_DOUBLE( a[i], i) );
361
+ _rm.min (a[i]);
362
+ _rm_loc.minloc( a[i], i);
294
363
}
295
364
);
296
365
@@ -300,11 +369,12 @@ any number of ``RAJA::expt::Reduce`` objects to the ``RAJA::forall`` method::
300
369
std::cout << rm_loc.getLoc() ...
301
370
302
371
Again, the lambda expression parameters are in the same order as
303
- the ``RAJA::expt::Reduce `` arguments to ``RAJA::forall ``. Both the types and
304
- order of the parameters must match to get correct results and to compile
305
- successfully. Otherwise, a static assertion will be triggered::
372
+ the ``RAJA::expt::Reduce`` arguments to ``RAJA::forall``. The underlying data
373
+ type, operator, and position of each ``ValOp`` parameter must match the
374
+ corresponding ``RAJA::expt::Reduce`` argument for the code to compile and
375
+ produce correct results. Otherwise, a static assertion will be triggered::
306
376
307
- LAMBDA Not invocable w/ EXPECTED_ARGS.
377
+ LAMBDA Not invocable w/ EXPECTED_ARGS. Ordering and types must match between RAJA::expt::Reduce() and ValOp arguments.
308
378
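As a rough illustration of how a compile-time check can reject a lambda whose parameters do not line up with the expected argument list, the sketch below uses ``std::is_invocable`` from C++17. The names ``MockValOp``, ``MockPlus``, ``MockMin``, and ``matches_expected_args`` are hypothetical, and this is an assumption about the general technique, not RAJA's actual assertion code.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical tag types standing in for reduction operators.
struct MockPlus {};
struct MockMin {};

// Hypothetical stand-in for a ValOp-like wrapper (NOT RAJA's type).
template <typename T, typename Op>
struct MockValOp {
  T val{};
};

using MOCK_SUM = MockValOp<double, MockPlus>;
using MOCK_MIN = MockValOp<double, MockMin>;

// True only if Lambda can be called with (int, MOCK_SUM&, MOCK_MIN&),
// i.e. the parameter types and their order both match.
template <typename Lambda>
constexpr bool matches_expected_args() {
  return std::is_invocable<Lambda, int, MOCK_SUM&, MOCK_MIN&>::value;
}

// A library could then guard its entry point along these lines:
//   static_assert(matches_expected_args<Lambda>(),
//                 "LAMBDA Not invocable w/ EXPECTED_ARGS.");
```

Under these assumptions, a lambda declared with its two wrapper parameters swapped, or with plain ``double&`` parameters, makes ``matches_expected_args`` evaluate to ``false``, so the guarded ``static_assert`` would fire at compile time.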
309
379
.. note :: This static assert is only enabled when passing an undecorated C++
310
380
lambda. Meaning, this check will not happen when passing
@@ -329,19 +399,22 @@ The usage of the experiemental reductions is similar to the forall example as il
329
399
330
400
double* a = ...;
331
401
402
+ using VALOP_DOUBLE_SUM = RAJA::expt::ValOp<double, RAJA::operators::plus>;
403
+ using VALOP_DOUBLE_MIN = RAJA::expt::ValOp<double, RAJA::operators::minimum>;
404
+
332
405
double rs = 0.0;
333
406
double rm = 1e100;
334
407
335
408
RAJA::launch<EXEC_POL> ( Res,
336
409
RAJA::expt::Reduce<RAJA::operators::plus>(&rs),
337
410
RAJA::expt::Reduce<RAJA::operators::minimum>(&rm),
338
411
"LaunchReductionKernel",
339
- [=] RAJA_HOST_DEVICE (int i, double & _rs, double & _rm) {
412
+ [=] RAJA_HOST_DEVICE (int i, VALOP_DOUBLE_SUM & _rs, VALOP_DOUBLE_MIN & _rm) {
340
413
341
414
RAJA::loop<loop_pol>(ctx, Seg, [&] (int i) {
342
415
343
416
_rs += a[i];
344
- _rm = RAJA_MIN (a[i], _rm);
417
+ _rm.min (a[i]);
345
418
346
419
}
347
420
);