-
Notifications
You must be signed in to change notification settings - Fork 321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid repeated ESMF initialization in unittestTimeManagerMod.F90 #3017
base: b4b-dev
Are you sure you want to change the base?
Avoid repeated ESMF initialization in unittestTimeManagerMod.F90 #3017
Conversation
I'm marking this as a draft because now there are some failing unit tests. I'll work on those.... |
@billsacks I think we should rebase this to go onto b4b-dev. I think this also might be holding up work that @samsrabin wants to do, so it might be good to get this in as quickly as can be done. Although you are right we need to get all the tests to work. Do you have time to work on this? And/or do you need some help here? From looking at PFUnit documentation and code, I didn't see a way, to do only one Initialize/Finalize per test suite. The method you have here is fine for ESMF_Initialize. The finalize would be the hard part as the teardown would somehow need to know that this is the last test. Note, I add some ESMF_Initialization in my PR: #2844. I just have an ESMF_Initialize there, which does happen multiple times. So I'll add the same check you have here so it's only done once. Thanks for figuring this out. |
Oh I don't think this is holding up my work. The commit Bill has on this branch is all I need for |
Ahh awesome. Glad to hear it. |
I've retargeted this so that it'll merge to b4b-dev. Bill, could you rebase it onto the tip of b4b-dev and then force-push? |
ESMF was being repeatedly initialized in unittestTimeManagerMod.F90, which is an error. This led to unit tests failing silently. This commit fixes the issue by checking if ESMF is already initialized before calling ESMF_Initialize, and not calling ESMF_Finalize at the end of each unit test. Currently this skips any calls to ESMF_Finalize, as it is an error to re-call ESMF_Initialize after calling ESMF_Finalize. A better solution would be to only call ESMF_Initialize and ESMF_Finalize once per unit test executable. I am looking into whether that's possible. I'm not sure if it will cause any problems to not do the ESMF_Finalize. See ESCOMP#3015 (comment) for details. Resolves ESCOMP#3015
50c4eef
to
16b7ed5
Compare
I just rebased and force pushed. I'm going to look at the failing unit tests tonight and see if I can fix them... I'll keep you posted. |
Sorry for the delay on this: I got caught up in trying to get the unit tests working again on my Mac (leading to #3023) and then some other troubleshooting this week on an unrelated issue. But I'm getting back to this now. |
Here is information from Tom Clune with guidance on how to do initialization or finalization once per executable. I'll try to put something like this in place for ESMF Finalization. (I think our current mechanism for ESMF Initialization is sufficient, and in our case, probably better than doing it this way, since the way we do it now, we're only doing ESMF Initialization for tests that really need it, and it seems like that might be harder to do with an approach like the one Tom is using. But this approach seems good for ESMF finalization; I'll just make the finalization call conditional on whether ESMF has been initialized.)
And he added in a follow-up:
|
Without this change, the unit tests of clm_time_manager were aborting due to trying to destroy an ESMF clock that hadn't been initialized. Other pieces of this reset routine don't appear to be causing problems, but it seems most robust to just skip the whole thing if the time manager hasn't been initialized.
Down to one failure, in a |
This last issue is an interesting one, and one where the now-failing unit test is picking up a possibly problematic difference in behavior between the esmf_wrf_timemgr code we used to use and the real ESMF time manager code. In particular, there's a difference in the meaning of |
As far as I can tell, there aren't any issues in the production code due to this change, but I'm tearing my hair out trying to get all of the unit tests working with this behavior change. I'll keep working on it.... Also, I do think we should make some behaviors more consistent than they are now to avoid problems down the road... I'll follow up on that with a separate issue. |
There is a behavior difference between the real ESMF library and the ESMF WRF Timemgr we had been using previously, in how prevTime is defined: the ESMF WRF Timemgr defined prevTime using a -dtime offset from currTime; the real ESMF library instead defines prevTime by tracking the previous value of currTime. Therefore, now that we are using the real ESMF library, if a unit test relies on prevTime (as is the case, for example, with test_year_position_start_of_timestep in test_dynTimeInfo.pf), it is now important that any setting of the time be done in a way such that there is a call to ESMF_ClockAdvance, since that is the only way that prevTime will get set correctly. Thus, this commit updates the implementation of for_test_set_curr_date so that it involves a call to ESMF_ClockAdvance. Making this change in the implementation of for_test_set_curr_date also required some other changes in some unit test code: - I needed to remove the call to unittest_timemgr_set_curr_date at the end of unittest_timemgr_setup. This call now leads to an increment of the time step count, which was causing failures in some time manager tests that check the time step count. And it turns out that (at least with the other changes here), the call to unittest_timemgr_set_curr_date was unnecessary: it wasn't actually leading to any change in time from what was set in the initialization of the time manager. - I needed to change some set_date calls to use a year of 2 instead of 1. This is needed for calls setting the date to January 1 and the time to 0: It is no longer allowed to set the date to yr,mon,day,tod = 1,1,1,0, because the new method sets the date/time to one time step earlier and then advances the clock, and it is illegal to set the date/time to one time step before 1,1,1,0.
Okay, I have fixed the time-related unit tests in the commit I just pushed. I have also updated the top-level PR description. I may make a few other small changes, then will take this out of draft. |
'stop' calls were leading to pFUnit passes instead of fails. Need to either have shr_sys_abort (which raises a pFUnit exception) or stop with a non-zero integer return value (which causes the program to stop with an error code).
Okay, this is ready for review now. @samsrabin - I noticed that there had been some calls to 'stop' (possibly with an error message); these were leading to premature exits followed by pass results from unit tests. I fixed some of these in the most recent commit. I see at least one more - in |
I spent a few more minutes looking at this, and I think this is fine: I don't see any code paths that would lead to calling |
See #3026 |
@ekluzek @samsrabin @adrifoster - I have added some notes on pFUnit-based unit testing connected with this PR: https://github.com/billsacks/ctsm/blob/fix_pfunit_multiple_esmf_inits/doc/design/pfunit_testing.rst . I'm not tied to anything in there, so if you want to make changes, I welcome you to change it directly - no need to go back and forth on it via review. One thing I noticed today is that I think it would actually be okay to initialize ESMF multiple times: the problem just comes from trying to initialize it after finalizing. For now I'm leaving the ESMF_Initialize in a conditional on whether it has already been initialized, but if that starts to feel too cumbersome, I think it would be safe to take out that conditional: from examining the ESMF code, it looks like ESMF_Initialize returns without doing anything if ESMF has already been initialized. (I have addressed this a bit in the above design document.) |
Now I see why the unit tests were looking like they passed even though they were aborting early: It was because of the way I had originally written the call ESMF_Initialize(rc=rc)
if (rc /= ESMF_SUCCESS) then
stop 'Error in ESMF_Initialize'
end if This change (without the other changes in this PR) would result in a reported failure instead of pass: diff --git a/src/unit_test_shr/unittestTimeManagerMod.F90 b/src/unit_test_shr/unittestTimeManagerMod.F90
index 8e1cdb3f3..0db91fb35 100644
--- a/src/unit_test_shr/unittestTimeManagerMod.F90
+++ b/src/unit_test_shr/unittestTimeManagerMod.F90
@@ -91,7 +91,8 @@ subroutine unittest_timemgr_setup(dtime, use_gregorian_calendar)
call ESMF_Initialize(rc=rc)
if (rc /= ESMF_SUCCESS) then
- stop 'Error in ESMF_Initialize'
+ print *, 'Error in ESMF_Initialize'
+ stop 1
end if
if (l_use_gregorian_calendar) then |
which is to say: this was not really a problem in ESMF or pFUnit, but rather a problem with how I was doing the error handling. |
Here is some more information from Tom Clune... I don't think this changes anything, but I'm just noting it to keep others in the loop:
|
Thanks so much for this work, Bill. I'm going to try and review in the next couple of days. For now I want to add this, since #3023 was merged (I'll handle doing this):
|
Just FYI, I have changed ESMF to give a better error message in this init-final-init case (esmf-org/esmf#368), but as I noted above, it wouldn't have helped in the case that caused the problems leading to this PR. |
…te to NOT call ESMF_Finalize, see ESCOMP#3015 and ESCOMP#3017
Description of changes
ESMF was being repeatedly initialized in unittestTimeManagerMod.F90, which is an error. This led to unit tests failing silently. This PR fixes the issue by checking if ESMF is already initialized before calling ESMF_Initialize, and not calling ESMF_Finalize at the end of each unit test. Instead, this adds code to finalize ESMF at the end of a given pFUnit executable. (See #3015 (comment) for some additional details.)
This also fixes some time-related unit tests that were failing with the real ESMF library; these failures were hidden/silent before the above fix.
Specific notes
Contributors other than yourself, if any: @samsrabin
CTSM Issues Fixed (include github issue)
Are answers expected to change (and if so in what way)? No
Any User Interface Changes (namelist or namelist defaults changes)? No
Does this create a need to change or add documentation? Did you do so? No
Testing performed, if any: