SCC re-vector at driver level and OpenMP offload #480

Draft
wants to merge 1 commit into
base: main

MichaelSt98 (Collaborator) commented Jan 24, 2025

Extend the SCC family pipelines to allow re-vectorisation at driver level and OpenMP offload.

  • You can test this via, e.g., the CLOUDSC branch nams-loki-scc-omp
    • New config file src/cloudsc_loki/cloudsc_loki_omp.config, which is currently set in src/cloudsc_loki/CMakeLists.txt to build SCC, SCC-HOIST (some minor problems still to debug) and SCC-STACK

Documentation for this branch can be viewed at https://sites.ecmwf.int/docs/loki/480/index.html

codecov bot commented Jan 24, 2025

Codecov Report

Attention: Patch coverage is 42.91667% with 137 lines in your changes missing coverage. Please review.

Project coverage is 95.86%. Comparing base (48c5cbf) to head (95dc680).

Files with missing lines Patch % Lines
loki/transformations/single_column/vector.py 20.72% 88 Missing ⚠️
loki/transformations/single_column/annotate.py 59.45% 30 Missing ⚠️
loki/transformations/data_offload/offload.py 67.74% 10 Missing ⚠️
loki/transformations/pool_allocator.py 20.00% 4 Missing ⚠️
loki/transformations/build_system/dependency.py 50.00% 3 Missing ⚠️
loki/transformations/single_column/hoist.py 83.33% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #480      +/-   ##
==========================================
- Coverage   96.17%   95.86%   -0.31%     
==========================================
  Files         224      224              
  Lines       40386    40558     +172     
==========================================
+ Hits        38842    38882      +40     
- Misses       1544     1676     +132     
Flag Coverage Δ
lint_rules 96.39% <ø> (ø)
loki 95.85% <42.91%> (-0.32%) ⬇️


Comment on lines +156 to +189
# Now generate the pre- and post pragmas (OpenACC or OpenMP)
pragma = None
pragma_post = None
if self.directive == 'openacc':
    if self.present_on_device:
        if self.assume_deviceptr:
            offload_args = inargs + outargs + inoutargs
            if offload_args:
                deviceptr = f' deviceptr({", ".join(offload_args)})'
            else:
                deviceptr = ''
            pragma = Pragma(keyword='acc', content=f'data{deviceptr}')
        else:
            offload_args = inargs + outargs + inoutargs
            if offload_args:
                present = f' present({", ".join(offload_args)})'
            else:
                present = ''
            pragma = Pragma(keyword='acc', content=f'data{present}')
    else:
        copyin = f'copyin({", ".join(inargs)})' if inargs else ''
        copy = f'copy({", ".join(inoutargs)})' if inoutargs else ''
        copyout = f'copyout({", ".join(outargs)})' if outargs else ''
        pragma = Pragma(keyword='acc', content=f'data {copyin} {copy} {copyout}')
    pragma_post = Pragma(keyword='acc', content='end data')
elif self.directive == 'omp-gpu':
    if self.present_on_device:
        ... # TODO: OpenMP offload if self.present_on_device
    else:
        copyin = f'map(to: {", ".join(inargs)})' if inargs else ''
        copy = f'map(tofrom:{", ".join(inoutargs)})' if inoutargs else ''
        copyout = f'map(from: {", ".join(outargs)})' if outargs else ''
        pragma = Pragma(keyword='omp', content=f'target data {copyin} {copy} {copyout}')
    pragma_post = Pragma(keyword='omp', content='end target data')
Collaborator

It would be great to abstract the concept encoded here: we inject statements that perform data allocation and movement by

  1. specifying a list of variables that are in/out/inout/create,
  2. specifying the programming model, and
  3. specifying the IR (PragmaRegion) to apply this to.

I would create an abstract interface that takes this information and applies it to a region, with implementations for OpenACC, OpenMP, FIELD_API, etc. That makes it easier to test standalone and to re-use everywhere (offload transformation, pool allocator, global variable offload, ...).
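
To make the suggestion concrete, here is a minimal, self-contained sketch of what such an interface might look like. It is not Loki code: `DataRegionSpec`, `DataOffloadModel`, `OpenACCData`, `OpenMPData` and `MODELS` are hypothetical names, and the sketch returns plain directive strings, whereas a real implementation would presumably build Loki `Pragma` nodes and attach them to the IR region. Only the non-present-on-device case from the diff above is covered.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRegionSpec:
    """Variable names grouped by intent (hypothetical container, not a Loki type)."""
    inargs: tuple = ()
    outargs: tuple = ()
    inoutargs: tuple = ()


def _join(head, clauses):
    # Drop empty clauses so no stray spaces end up in the directive
    return ' '.join([head] + [c for c in clauses if c])


class DataOffloadModel(ABC):
    """Hypothetical interface: one subclass per programming model."""

    @abstractmethod
    def region_pragmas(self, spec):
        """Return (opening, closing) directive strings for a data region."""


class OpenACCData(DataOffloadModel):
    def region_pragmas(self, spec):
        def clause(name, names):
            return f'{name}({", ".join(names)})' if names else ''
        clauses = [clause('copyin', spec.inargs),
                   clause('copy', spec.inoutargs),
                   clause('copyout', spec.outargs)]
        return '!$acc ' + _join('data', clauses), '!$acc end data'


class OpenMPData(DataOffloadModel):
    def region_pragmas(self, spec):
        def clause(kind, names):
            return f'map({kind}: {", ".join(names)})' if names else ''
        clauses = [clause('to', spec.inargs),
                   clause('tofrom', spec.inoutargs),
                   clause('from', spec.outargs)]
        return '!$omp ' + _join('target data', clauses), '!$omp end target data'


# Keys mirror the `directive` values used in the diff above
MODELS = {'openacc': OpenACCData, 'omp-gpu': OpenMPData}

spec = DataRegionSpec(inargs=('a', 'b'), inoutargs=('c',))
for name, model in MODELS.items():
    pre, post = model().region_pragmas(spec)
    print(name, '|', pre, '|', post)
```

With this shape, each programming model could be unit-tested in isolation, and a transformation would only need to select the model by its `directive` string.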

Collaborator

Yes, I think we have a rudimentary version of this for FIELD-API only already, although it would probably need extending/generalising:
https://github.com/ecmwf-ifs/loki/blob/main/loki/transformations/field_api.py#L33

I fully agree, though, that having a common abstraction for this would be great.

Comment on lines +72 to +105
if self.directive == 'openacc':
    with pragmas_attached(routine, ir.Loop):
        for loop in FindNodes(ir.Loop).visit(routine.body):
            for pragma in as_tuple(loop.pragma):
                if is_loki_pragma(pragma, starts_with='loop vector reduction'):
                    # Turn reduction pragmas into `!$acc` equivalent
                    pragma._update(keyword='acc')
                    continue

                if is_loki_pragma(pragma, starts_with='loop vector'):
                    # Turn general vector pragmas into `!$acc` and add private clause
                    private_arrs = ', '.join(v.name for v in private_arrays)
                    private_clause = '' if not private_arrays else f' private({private_arrs})'
                    pragma._update(keyword='acc', content=f'loop vector{private_clause}')

if self.directive == 'omp-gpu':
    with pragmas_attached(routine, ir.Loop):
        for loop in FindNodes(ir.Loop).visit(routine.body):
            for pragma in as_tuple(loop.pragma):
                # TODO: how to handle vector reductions?

                if is_loki_pragma(pragma, starts_with='loop vector'):
                    # TODO: need for privatizing variables/arrays?
                    pragma_new = ir.Pragma(keyword='omp', content='parallel do')
                    pragma_post = ir.Pragma(keyword='omp', content='end parallel do')
                    # pragma_new = ir.Pragma(keyword='omp', content='loop bind(parallel)')
                    # pragma_post = ir.Pragma(keyword='omp', content='end loop')

                    # Replace existing loki pragma and add post-pragma
                    loop_pragmas = tuple(p for p in as_tuple(loop.pragma) if p is not pragma)
                    loop._update(
                        pragma=loop_pragmas + (pragma_new,),
                        pragma_post=(pragma_post,) + as_tuple(loop.pragma_post)
                    )
Collaborator

Essentially the same comment: encode the parallelisation concepts used here in a generic interface that applies them, and then provide programming-model-specific implementations for OpenMP and OpenACC.

I'm undecided whether this should be captured together with the data offload interfaces in common programming-model-specific classes, or kept separate by function (i.e., parallelisation, data movement).
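
As a rough illustration of the "separate by function" option, the parallelisation side could be sketched as below. The names `LoopAnnotator`, `OpenACCLoops`, `OpenMPLoops` and `ANNOTATORS` are hypothetical and not part of Loki, and the sketch works on plain directive strings rather than Loki IR nodes. The OpenMP variant deliberately ignores private variables, since the diff above leaves privatisation as an open TODO.

```python
from abc import ABC, abstractmethod


class LoopAnnotator(ABC):
    """Hypothetical interface mapping a generic 'vector loop' marker to directives."""

    @abstractmethod
    def vector_loop(self, private=()):
        """Return (pre, post) directive strings; post is None if no closing directive is needed."""


class OpenACCLoops(LoopAnnotator):
    def vector_loop(self, private=()):
        # Mirrors the diff: `loop vector` plus an optional private clause
        clause = f' private({", ".join(private)})' if private else ''
        return f'!$acc loop vector{clause}', None


class OpenMPLoops(LoopAnnotator):
    def vector_loop(self, private=()):
        # Mirrors the diff: `parallel do` with an explicit end directive.
        # `private` is intentionally unused; privatisation is an open question in the PR.
        return '!$omp parallel do', '!$omp end parallel do'


# Keys mirror the `directive` values used in the diff above
ANNOTATORS = {'openacc': OpenACCLoops, 'omp-gpu': OpenMPLoops}

for name, annotator in ANNOTATORS.items():
    print(name, annotator().vector_loop(private=('tmp',)))
```

Keeping this class separate from the data-movement interface would let a transformation mix the two freely; merging them into one class per programming model would instead keep everything model-specific in a single place.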

@MichaelSt98
Collaborator Author

Superseded by #483 and #485. PR and branch to be deleted once those two PRs are accepted and merged.
