[LFRic] [PSyAD] Missing halo exchange and immediate cleaning of halos when using setval_random with redundant calculations #2805
Comments
I believe the metadata is fine because the next part of the invokes are the forward kernels. I just had a look at some PSy layer code and remembered the actual problem so I will edit the description above. |
There must be something special going on here as, from a PSyclone point of view, there's nothing special about |
Is this a new issue or was it present in the 2.5.0 release do you know? (I could have broken something in the recent work I did on the halo-exchange logic.) |
I am not entirely certain I'm afraid, sorry! |
YGM on Teams. |
Hi all! @DrTVockerodtMO, I am not sure which version of PSyclone you are using. Is it the test environment or 2.5.0? I modified the test environment for Josh's testing of the operator changes. If you are using the test environment, which LFRic branch are you using? In any case, I will reinstall the PSyclone test environment from master, and I would suggest double-checking that everything works. |
@arporter, as far as I can see the current LFRic Core and Apps trunk are fine with 2.5.0. |
Thanks Iva. From discussion with Terry, the problem seems to be that this is the first time anyone has tried to run the PSyAD test harness in parallel. |
This is not the test environment, no. |
The metadata for matrix-vector is:
so it does perform GH_INC on a field on a continuous (or potentially continuous) function space. This will require annexed dofs to be clean on entry. This would normally trigger a halo exchange but, when using |
Which version of PSyclone are you using @DrTVockerodtMO? I would have expected to see comments in the PSy-layer identifying each of the builtins? The problem seems to be that some of the builtins are only doing redundant updates for the annexed dofs rather than the whole L1 halo. This then means the GH_INC access in the kernel is accessing uninitialised data. |
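To make the point about annexed dofs versus the whole L1 halo concrete, here is a minimal toy sketch in Python. It is not PSyclone or LFRic code: the array layout, the sizes and the names `setval_random_toy` and `kernel_reads` are all hypothetical, and NaN merely stands in for uninitialised memory. It only illustrates why a builtin that loops to the last annexed dof leaves data that a kernel looping into the L1 halo will then read.

```python
import math
import random

# Toy layout of one rank's local dof array (hypothetical sizes):
# [ owned | annexed | L1 halo ]
N_OWNED, N_ANNEXED, N_HALO1 = 4, 2, 3
LAST_OWNED = N_OWNED
LAST_ANNEXED = N_OWNED + N_ANNEXED
LAST_HALO1 = LAST_ANNEXED + N_HALO1


def new_field():
    """Return a field filled with NaN, standing in for uninitialised memory."""
    return [math.nan] * LAST_HALO1


def setval_random_toy(field, last_dof, seed=0):
    """Fill field[0:last_dof] with random values; later entries are untouched."""
    rng = random.Random(seed)
    for i in range(last_dof):
        field[i] = rng.random()


def kernel_reads(field, last_dof):
    """Mimic a kernel reading every dof up to last_dof.

    Returns True only if every value it reads is finite.
    """
    return all(math.isfinite(field[i]) for i in range(last_dof))


# Case 1: the builtin only updates up to the last annexed dof.
annexed_only = new_field()
setval_random_toy(annexed_only, LAST_ANNEXED)

# Case 2: the builtin redundantly updates the whole L1 halo.
whole_halo = new_field()
setval_random_toy(whole_halo, LAST_HALO1)

# A kernel that loops into the L1 halo sees bad data only in case 1.
annexed_only_ok = kernel_reads(annexed_only, LAST_HALO1)
whole_halo_ok = kernel_reads(whole_halo, LAST_HALO1)
print(annexed_only_ok, whole_halo_ok)
```

In this toy, the first case is the one that would need either a halo exchange or a wider loop bound before the kernel runs.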
I am using PSyclone 2.5.0. |
We are testing some LFRic branches that run the adjoint tests in parallel. We find that when using setval_random there is a missing halo exchange in the PSy layer code after the field values have been randomised. This causes some failures in adjoint tests that would otherwise pass if the halo is properly initialised.
With redundant calculations, the halo exchange would be missed because after the field is randomised, the halo is set to dirty but then immediately set to clean. This isn't present when using setval_random normally. However, there is also no halo exchange after setval_random is used before it enters the forward kernels, which when run in parallel causes floating invalid errors.
Ultimately, we need to perform a halo exchange between the use of setval_random and the forward kernel in order to get the tests to work properly, whilst still randomising the annexed dofs.
As an example, looking at atlt_vorticity_advection_alg_mod we have the following invoke:
which when compiled using LFRic parallel optimisations and the options described above yields
where I have omitted the inner product calculations for brevity. Running this as-is causes a floating invalid error in the forward code.
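The dirty-then-clean sequence described above can be sketched with a small toy model in Python. This is only an illustration under stated assumptions, not the generated PSy layer: `ToyField`, `randomise`, `forward_kernel` and the `mark_clean_immediately` switch are hypothetical names, the "neighbour" values are made up, and NaN stands in for an uninitialised halo.

```python
import math


class ToyField:
    """Hypothetical field: owned values, a one-deep halo and a dirty flag."""

    def __init__(self, n_owned, n_halo):
        self.owned = [0.0] * n_owned
        self.halo = [math.nan] * n_halo  # NaN models uninitialised halo data
        self.halo_dirty = True
        self.exchanges = 0

    def halo_exchange_if_dirty(self, neighbour_values):
        """Copy the neighbour's values into the halo, but only if it is dirty."""
        if self.halo_dirty:
            self.halo = list(neighbour_values)
            self.halo_dirty = False
            self.exchanges += 1


def randomise(field, values, mark_clean_immediately):
    """Stand-in for the randomising builtin: only owned values are written."""
    field.owned = list(values)
    field.halo_dirty = True
    if mark_clean_immediately:
        # The problematic pattern: the flag is cleared although the halo
        # values were never written.
        field.halo_dirty = False


def forward_kernel(field):
    """Stand-in for a forward kernel that reads owned and halo values."""
    return sum(field.owned) + sum(field.halo)


values = [1.0, 2.0, 3.0]
neighbour_values = [0.5, 0.25]

buggy = ToyField(3, 2)
randomise(buggy, values, mark_clean_immediately=True)
buggy.halo_exchange_if_dirty(neighbour_values)  # skipped: flag says clean
buggy_result = forward_kernel(buggy)

fixed = ToyField(3, 2)
randomise(fixed, values, mark_clean_immediately=False)
fixed.halo_exchange_if_dirty(neighbour_values)  # performed: flag is dirty
fixed_result = forward_kernel(fixed)

print(buggy_result, buggy.exchanges)
print(fixed_result, fixed.exchanges)
```

In the toy, the only difference between the two runs is whether the dirty flag survives until the exchange is attempted, which mirrors the request for a halo exchange between setval_random and the forward kernel.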