-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix clm2 support #602
Fix clm2 support #602
Conversation
zppy/templates/ts.bash
Outdated
{%- if input_files.split(".")[0] == 'cam' or input_files.split(".")[0] == 'eam' or input_files.split(".")[0] == 'elm' or input_files.split(".")[0] == 'clm2' %} | ||
--prc_typ={{ input_files.split(".")[0][:3] }} | ||
{%- else %} | ||
--prc_typ={{ prc_typ }} | ||
{%- endif %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BunnyVon @chengzhuzhang Does this still accomplish what #421 was trying to do? For reference, that change was:
--prc_typ={{ input_files.split(".")[0] }}
to
--prc_typ={{ input_files.split(".")[0][:3] }}
@golaz Does this still accomplish what #424 was trying to do? For reference, that change was:
{%- if input_files.split(".")[0] == 'cam' or input_files.split(".")[0] == 'eam' or input_files.split(".")[0] == 'elm' or input_files.split(".")[0] == 'clm2' %}
--prc_typ={{ input_files.split(".")[0][:3] }}
{%- else %}
--prc_typ=sgs
{%- endif %}
to:
--prc_typ={{ prc_typ }}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so. I was able to run the bash files by changing --prc_typ=clm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chengzhuzhang I added a second commit with that change:
elif tmp in ("clm2",):
component = "lnd"
prc_typ = "clm"
elif tmp in ("elm",):
Is that correct? Or did we want it for elm
too?
@czender Hi Charlie, could you help to review if the logic here for get Lines 143 to 169 in 0883805
The needed change is to make prc_typ = clm for clm2 output.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I expect this will work fine.
Thank you, Charlie! |
The other commit is in fact necessary. Without it, 3 |
It appears the both-commits test was done without the #611 commit, (meaning the global time series plots don't look right), so I need to re-test with that. But it does look like we need both commits. |
@chengzhuzhang This is my testing summary so far. I'm hoping to get back to this later in the afternoon or tomorrow. Also, I think there was some confusion over my debugging/testing process. To clarify, when I say "test off latest main" I mean rebasing the code off the latest Furthermore, the issues I describe below specifically come up in the context of the complete-run test, so it is only possible to narrow scope/runtime so much. (The dry-run parameter is helpful, for instance, for checking pre-runtime things like dependencies). At this point, I am confident that the both-commits version is working. But it seems that what you were getting at in yesterday's meeting was that we really shouldn't need the 1st commit (the For reference, the respective changes: 1st commit --
=>
2nd commit --
=>
Both commits
With both the Dropping the first commit (
|
test output dir | branch | base commit | conda env | Ran pip install . && python tests/integration/utils.py ? |
lessons learned |
---|---|---|---|---|---|
/lcrc/group/e3sm/ac.forsyth2/zppy_test_complete_run_output/issue-421-post-600-1st-commit-only/v2.LR.historical_0201/post/scripts/ |
issue-421-fix-drop-1st-commit | Add center times (611) | zppy_dev_n600 | y | Note the unique_id is a misnomer. It should really be drop-1st-commit . grep -v "OK" *status shows e3sm_diags_atm_monthly_180x360_aave_mvm_model_vs_model_1852-1853_vs_1850-1851 , e3sm_diags_atm_monthly_180x360_aave_tc_analysis_model_vs_obs_1850-1851 , e3sm_diags_lnd_monthly_mvm_lnd_model_vs_model_1850-1851_vs_1850-1851 , tc_analysis_1852-1853 waiting, and their jobs were still in the squeue output marked as waiting for a dependency. There's also tc_analysis_1850-1851.status:RUNNING 548854 . So we have a couple problems:1) e3sm_diags_atm_monthly_180x360_aave_mvm_model_vs_model_1852-1853_vs_1850-1851 and e3sm_diags_lnd_monthly_mvm_lnd_model_vs_model_1850-1851_vs_1850-1851 are waiting on jobs that apparently don't exist.2) e3sm_diags_atm_monthly_180x360_aave_tc_analysis_model_vs_obs_1850-1851 and tc_analysis_1852-1853 are waiting on tc_analysis_1850-1851 , which is apparently silently failing. The .o file indicates /var/spool/slurmd/job548854/slurm_script: line 114: /lcrc/globalscratch/ac.forsyth2//tc-analysis_1850_1851/aew_v2.LR.historical_0201_1850_1851.txt: No such file or directory . This doesn't make any sense though because tc_analysis doesn't depend on the ts task, so dropping that commit should not have affected this.Going to try re-running. |
(Rerunning incomplete tasks) | issue-421-fix-drop-1st-commit | Add center times (611) | zppy_dev_n600 | y | Nothing actually re-ran because zppy won't re-run tasks that say "WAITING" or "RUNNING". Need to delete those status files. |
(Rerunning incomplete tasks) | issue-421-fix-drop-1st-commit | Add center times (611) | zppy_dev_n600 | y | It doesn't seem like it was a one-time issue. The bad status files still occur. Turning on dry-run to take a look at dependencies. |
(Rerunning incomplete tasks) | issue-421-fix-drop-1st-commit | Add center times (611) | zppy_dev_n600 | y | Dependency file paths are long. Changing print statement to print([os.path.basename(d) for d in dependencies]) |
(Rerunning incomplete tasks) | issue-421-fix-drop-1st-commit | Add center times (611) | zppy_dev_n600 | y | e3sm_diags_atm_monthly_180x360_aave_mvm_model_vs_model_1852-1853_vs_1850-1851 and e3sm_diags_lnd_monthly_mvm_lnd_model_vs_model_1850-1851_vs_1850-1851 have identical dependenices: ['tc_analysis_1850-1851.status', 'climo_atm_monthly_180x360_aave_1852-1853.status', 'climo_atm_monthly_diurnal_8xdaily_180x360_aave_1852-1853.status', 'tc_analysis_1852-1853.status', 'ts_atm_monthly_180x360_aave_1852-1853-0002.status', 'ts_rof_monthly_1852-1853-0002.status'] . It looks like everything else is blocked because TC Analysis 1850-1851 is failing. I'm not sure why tc_analysis is showing up as a dependecy for the land model-vs-model when it has sets = "lat_lon_land", only. In the tc_analysis .o file, I see NetCDF: HDF error /var/spool/slurmd/job552637/slurm_script: line 80: 331340 Killed GenerateCSMesh --res $res --alt --file ${result_dir}outCSne$res.g , but that isn't particularly helpful. |
@chengzhuzhang I just discovered the same TC analysis error testing a completely different PR (#612), so that leads me to believe it's a problem on my end, perhaps with my scratch space (since I know TC Analysis makes use of that). What I don't understand is why the both-commits version didn't run into that issue when running the complete-run test. |
@forsyth2 I reviewed code change in first commit, and second commit. If I understand correctly both should take identical effect. I don't know why keeping both commits works, while drop the first commit doesn't... |
I deleted the WAITING/RUNNING status files so I could re-run. I was re-running with dry-run so they haven't been reproduced yet. I suspect the error is actually unrelated to this PR, being as I ran into a similar issue elsewhere (#602 (comment)). That will require more debugging to see what's wrong on my end. |
@forsyth2 Got it. Thank you for clarifying. I agree there might be a separate issue, which is irrelevant to code change in the PR. |
For reference, code changes summary: #421
#424
=>
The
|
Confirmed the issue is #613, doing final checks now. |
Fix clm2 support. Follow-up fix to #421, addressing an accidental reversion in #424. (See #421 (comment) for details)