SCC re-vector at driver level and OpenMP offload #480
Documentation for this branch can be viewed at https://sites.ecmwf.int/docs/loki/480/index.html
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #480      +/-   ##
==========================================
- Coverage   96.17%   95.86%   -0.31%
==========================================
  Files         224      224
  Lines       40386    40558     +172
==========================================
+ Hits        38842    38882      +40
- Misses       1544     1676     +132
# Now generate the pre- and post pragmas (OpenACC or OpenMP)
pragma = None
pragma_post = None
if self.directive == 'openacc':
    if self.present_on_device:
        if self.assume_deviceptr:
            offload_args = inargs + outargs + inoutargs
            if offload_args:
                deviceptr = f' deviceptr({", ".join(offload_args)})'
            else:
                deviceptr = ''
            pragma = Pragma(keyword='acc', content=f'data{deviceptr}')
        else:
            offload_args = inargs + outargs + inoutargs
            if offload_args:
                present = f' present({", ".join(offload_args)})'
            else:
                present = ''
            pragma = Pragma(keyword='acc', content=f'data{present}')
    else:
        copyin = f'copyin({", ".join(inargs)})' if inargs else ''
        copy = f'copy({", ".join(inoutargs)})' if inoutargs else ''
        copyout = f'copyout({", ".join(outargs)})' if outargs else ''
        pragma = Pragma(keyword='acc', content=f'data {copyin} {copy} {copyout}')
    pragma_post = Pragma(keyword='acc', content='end data')
elif self.directive == 'omp-gpu':
    if self.present_on_device:
        ...  # TODO: OpenMP offload if self.present_on_device
    else:
        copyin = f'map(to: {", ".join(inargs)})' if inargs else ''
        copy = f'map(tofrom:{", ".join(inoutargs)})' if inoutargs else ''
        copyout = f'map(from: {", ".join(outargs)})' if outargs else ''
        pragma = Pragma(keyword='omp', content=f'target data {copyin} {copy} {copyout}')
    pragma_post = Pragma(keyword='omp', content='end target data')
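For illustration only, the clause strings built in the non-present branches can be reproduced outside Loki. The sketch below is not part of the PR: `build_data_clauses` is a hypothetical helper that returns only the pragma content string, and it joins the non-empty clauses so that an empty argument group does not leave extra spaces in the output.

```python
def build_data_clauses(directive, inargs=(), inoutargs=(), outargs=()):
    """Return the content string of a data-region pragma (hypothetical helper)."""
    # Clause spellings per programming model, following the diff hunk
    templates = {
        'openacc': ('data', 'copyin({})', 'copy({})', 'copyout({})'),
        'omp-gpu': ('target data', 'map(to: {})', 'map(tofrom: {})', 'map(from: {})'),
    }
    head, t_in, t_inout, t_out = templates[directive]
    clauses = [
        tmpl.format(', '.join(names))
        for tmpl, names in ((t_in, inargs), (t_inout, inoutargs), (t_out, outargs))
        if names  # skip empty groups so no stray spaces are emitted
    ]
    return ' '.join([head] + clauses)


if __name__ == '__main__':
    print(build_data_clauses('omp-gpu', inargs=['a', 'b'], outargs=['c']))
    print(build_data_clauses('openacc', inoutargs=['x']))
```

In Loki itself the returned string would presumably be passed as the `content` of a `Pragma` node, as the hunk does.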
I think it would be great to abstract the concept encoded here: we inject statements that perform data allocation and movement by
- specifying a list of variables that are in/out/inout/create,
- specifying the programming model,
- specifying the IR (PragmaRegion) to apply this to.
I would create an abstract interface that takes this information and applies it to a region, with implementations for OpenACC, OpenMP, FIELD_API, etc. That would make this easier to test standalone and then to re-use everywhere (offload trafo, pool allocator, global var offload, ...).
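One possible shape for such an interface is sketched below, purely as a discussion aid. All names (`DataRegionSpec`, `DataOffloadModel`, `OpenACCDataOffload`, `OpenMPDataOffload`) are hypothetical and do not exist in Loki, and a plain list of source lines stands in for the `PragmaRegion` IR node so that the sketch runs without Loki installed.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRegionSpec:
    """Variable names grouped by intent (hypothetical container)."""
    inargs: tuple = ()
    outargs: tuple = ()
    inoutargs: tuple = ()
    create: tuple = ()


class DataOffloadModel(ABC):
    """Hypothetical per-programming-model data offload interface."""
    begin_directive = ''   # e.g. '!$acc data'
    end_directive = ''     # e.g. '!$acc end data'

    @abstractmethod
    def clause(self, intent, names):
        """Return the clause string for one intent group."""

    def apply(self, body, spec):
        """Wrap `body` (a list of source lines) in a data region."""
        groups = (('in', spec.inargs), ('inout', spec.inoutargs),
                  ('out', spec.outargs), ('create', spec.create))
        clauses = [self.clause(intent, names) for intent, names in groups if names]
        return [' '.join([self.begin_directive] + clauses), *body, self.end_directive]


class OpenACCDataOffload(DataOffloadModel):
    begin_directive, end_directive = '!$acc data', '!$acc end data'
    _names = {'in': 'copyin', 'inout': 'copy', 'out': 'copyout', 'create': 'create'}

    def clause(self, intent, names):
        return f'{self._names[intent]}({", ".join(names)})'


class OpenMPDataOffload(DataOffloadModel):
    begin_directive, end_directive = '!$omp target data', '!$omp end target data'
    _names = {'in': 'to', 'inout': 'tofrom', 'out': 'from', 'create': 'alloc'}

    def clause(self, intent, names):
        return f'map({self._names[intent]}: {", ".join(names)})'


if __name__ == '__main__':
    spec = DataRegionSpec(inargs=('a',), outargs=('b',))
    print('\n'.join(OpenMPDataOffload().apply(['call kernel(a, b)'], spec)))
```

Because each implementation only maps intents to clause strings, it can be unit-tested without building a full routine; a FIELD_API variant would likely need to override `apply` as well, since it emits calls rather than pragmas.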
Yes, I think we have a rudimentary version of this for FIELD-API only already, although it would probably need extending/generalising:
https://github.com/ecmwf-ifs/loki/blob/main/loki/transformations/field_api.py#L33
I fully agree though, that having a common abstraction for this would be great.
if self.directive == 'openacc':
    with pragmas_attached(routine, ir.Loop):
        for loop in FindNodes(ir.Loop).visit(routine.body):
            for pragma in as_tuple(loop.pragma):
                if is_loki_pragma(pragma, starts_with='loop vector reduction'):
                    # Turn reduction pragmas into `!$acc` equivalent
                    pragma._update(keyword='acc')
                    continue

                if is_loki_pragma(pragma, starts_with='loop vector'):
                    # Turn general vector pragmas into `!$acc` and add private clause
                    private_arrs = ', '.join(v.name for v in private_arrays)
                    private_clause = '' if not private_arrays else f' private({private_arrs})'
                    pragma._update(keyword='acc', content=f'loop vector{private_clause}')

if self.directive == 'omp-gpu':
    with pragmas_attached(routine, ir.Loop):
        for loop in FindNodes(ir.Loop).visit(routine.body):
            for pragma in as_tuple(loop.pragma):
                # TODO: how to handle vector reductions?

                if is_loki_pragma(pragma, starts_with='loop vector'):
                    # TODO: need for privatizing variables/arrays?
                    pragma_new = ir.Pragma(keyword='omp', content='parallel do')
                    pragma_post = ir.Pragma(keyword='omp', content='end parallel do')
                    # pragma_new = ir.Pragma(keyword='omp', content='loop bind(parallel)')
                    # pragma_post = ir.Pragma(keyword='omp', content='end loop')

                    # Replace existing loki pragma and add post-pragma
                    loop_pragmas = tuple(p for p in as_tuple(loop.pragma) if p is not pragma)
                    loop._update(
                        pragma=loop_pragmas + (pragma_new,),
                        pragma_post=(pragma_post,) + as_tuple(loop.pragma_post)
                    )
Same comment, essentially: encode the parallelisation concepts used here as a generic interface that applies them, and then overload it with programming-model-specific implementations for OpenMP and OpenACC.
I'm undecided whether this should be captured together with the data offload interfaces in common programming-model-specific classes, or kept separate by function (i.e., parallelisation, data movement).
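As a rough illustration of the "separate by function" option, the sketch below covers only the vector-loop annotation. The names `VectorLoopModel`, `OpenACCVectorLoop`, `OpenMPVectorLoop`, `MODELS` and `annotate_loop` are hypothetical, and plain strings stand in for Loki's `Pragma` and `Loop` nodes. The directive strings are taken from the diff hunk; the OpenMP variant ignores `private` because privatisation is still an open TODO there.

```python
from abc import ABC, abstractmethod


class VectorLoopModel(ABC):
    """Hypothetical interface for annotating a vector loop."""

    @abstractmethod
    def vector_loop(self, private=()):
        """Return (pre, post) pragma strings; `post` may be None."""


class OpenACCVectorLoop(VectorLoopModel):
    def vector_loop(self, private=()):
        clause = f' private({", ".join(private)})' if private else ''
        return f'!$acc loop vector{clause}', None


class OpenMPVectorLoop(VectorLoopModel):
    def vector_loop(self, private=()):
        # Privatisation is an open TODO in the diff, so `private` is ignored here
        return '!$omp parallel do', '!$omp end parallel do'


# Map the `directive` option values used in the diff to implementations
MODELS = {'openacc': OpenACCVectorLoop, 'omp-gpu': OpenMPVectorLoop}


def annotate_loop(directive, loop_lines, private=()):
    """Surround the source lines of a loop with model-specific pragmas."""
    pre, post = MODELS[directive]().vector_loop(private)
    return [pre, *loop_lines] + ([post] if post else [])


if __name__ == '__main__':
    loop = ['do jl = 1, nlon', '  a(jl) = b(jl)', 'end do']
    print('\n'.join(annotate_loop('omp-gpu', loop)))
```

Keeping this class apart from a data-movement class lets either be swapped independently; merging both into one per-model class would trade that flexibility for a single lookup on `directive`.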
Extend SCC family pipelines to allow revectorisation at driver level and OpenMP offload
nams-loki-scc-omp
src/cloudsc_loki/cloudsc_loki_omp.config (which is currently set in src/cloudsc_loki/CMakeLists.txt to build SCC, SCC-HOIST (some minor problem to debug) and SCC-STACK