Script to help test the real configs on Hera #79

Open
wants to merge 46 commits into
base: develop

Conversation

zmoon
Member

@zmoon zmoon commented Feb 27, 2025

The script tests/run_config.py

  • sets up a temporary case directory for each selected config
    • the configs are identified by the names of the config/* subdirectories, e.g. cmaq_gfs_megan_nei2019_globtempo
  • stores settings as JSON (script args, etc.)
  • copies in the config files, modifying the time and the grid if requested
  • links in the input data
  • writes a Slurm job script (optionally submitting it as well)
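
The steps in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in tests/run_config.py: the function names, the settings keys, the `settings.json`, `data`, and `job.sh` file names, and the bare `srun nexus` command line are all assumptions. Only `nexus -r grid_spec.nc` and the use of the CPUs-per-task value as `OMP_NUM_THREADS` come from this PR.

```python
import json
import os
import shutil
import tempfile
from pathlib import Path


def render_job_script(n_tasks: int, cpus_per_task: int, regrid: bool) -> str:
    """Return the text of a minimal Slurm batch script (hypothetical layout)."""
    # `nexus -r grid_spec.nc` is the regrid invocation quoted in this PR;
    # the rest of the command line is an assumption for illustration.
    cmd = "srun nexus -r grid_spec.nc" if regrid else "srun nexus"
    lines = [
        "#!/bin/bash",
        f"#SBATCH --ntasks={n_tasks}",
        f"#SBATCH --cpus-per-task={cpus_per_task}",
        f"export OMP_NUM_THREADS={cpus_per_task}",
        cmd,
    ]
    return "\n".join(lines) + "\n"


def set_up_case(config_dir: Path, input_dir: Path, *, n_tasks: int,
                cpus_per_task: int, regrid: bool) -> Path:
    """Create a temporary case directory for one config and return its path."""
    case = Path(tempfile.mkdtemp(prefix=f"{config_dir.name}_"))

    # Store the settings as JSON so they can be collected after the run.
    settings = {
        "config": config_dir.name,
        "n_tasks": n_tasks,
        "cpus_per_task": cpus_per_task,
        "regrid": regrid,
    }
    (case / "settings.json").write_text(json.dumps(settings, indent=2))

    # Copy in the config files (time/grid modifications are omitted here).
    for path in sorted(config_dir.iterdir()):
        if path.is_file():
            shutil.copy(path, case / path.name)

    # Link in the input data rather than copying it.
    os.symlink(input_dir, case / "data", target_is_directory=True)

    # Write the job script; submission is left to the caller.
    (case / "job.sh").write_text(render_job_script(n_tasks, cpus_per_task, regrid))
    return case


if __name__ == "__main__":
    cfg = Path(tempfile.mkdtemp()) / "cmaq_gfs_megan_nei2019_globtempo"
    cfg.mkdir()
    (cfg / "example.rc").write_text("placeholder\n")  # stand-in config file
    case_dir = set_up_case(cfg, Path(tempfile.mkdtemp()),
                           n_tasks=4, cpus_per_task=2, regrid=True)
    print((case_dir / "job.sh").read_text())
```

Keeping the script rendering separate from the directory setup makes it possible to exercise the case creation on a machine without Slurm.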

The script tests/collect_data.py collects data from those runs, including the stored settings and info from the Slurm job stdout and stderr files, and writes it as a newline-delimited JSON file, which can be loaded in pandas with pd.read_json("data.ndjson", lines=True).
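
For readers unfamiliar with the format, a newline-delimited JSON file holds one JSON object per line. The sketch below writes and reads such a file with the standard library only. The record keys (`n`, `c`, `r`, `time`, `mem`) are assumptions borrowed from the data table legend in this description; the actual field names written by tests/collect_data.py may differ.

```python
import json
import tempfile
from pathlib import Path


def write_ndjson(records, path):
    """Write one JSON object per line."""
    with open(path, "w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")


def read_ndjson(path):
    """Read a newline-delimited JSON file into a list of dicts."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    # Two rows taken from the data table; the key names are illustrative.
    records = [
        {"n": 40, "c": 4, "r": True, "time": 1.98, "mem": 29.66},
        {"n": 1, "c": 1, "r": False, "time": 17.38, "mem": 12.61},
    ]
    path = Path(tempfile.mkdtemp()) / "data.ndjson"
    write_ndjson(records, path)
    print(read_ndjson(path))
```

A file written this way has the one-object-per-line layout that `pd.read_json(..., lines=True)` expects.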

In the future we could use these capabilities to develop some e2e regression tests and maybe connect a self-hosted runner on Hera to GitHub Actions for automated testing.


The figures below summarize data for a set of runs (two time steps of cmaq_gfs_megan_nei2019_globtempo, with no change to the grid). Regrid T means NEXUS writes the output via ESMF, regridded to the FV3 grid spec (nexus -r grid_spec.nc), whereas regrid F means HEMCO writes the output on its native grid.

image

image

Data table:
  • n: number of tasks (MPI)
  • c: number of CPUs per task (also used as OMP_NUM_THREADS)
  • r: regrid (T or F)
  • time: run time in minutes (only successful runs included)
  • mem: max node-wise memory usage during the job in GB
n c r time mem
40 4 T 1.98 29.66
36 4 T 2.02 26.73
36 4 T 2.02 26.75
40 4 T 2.03 29.66
32 4 T 2.10 23.88
32 4 T 2.12 23.90
24 4 T 2.15 24.18
28 4 T 2.15 29.99
28 4 T 2.17 29.94
24 4 T 2.18 24.16
32 2 T 2.20 47.66
32 2 T 2.22 47.61
36 2 T 2.22 53.30
36 2 T 2.23 53.30
28 2 T 2.25 41.88
40 2 T 2.25 59.11
16 4 T 2.28 24.63
16 4 T 2.28 24.65
40 2 T 2.28 59.07
24 2 T 2.28 36.19
24 2 T 2.32 36.15
28 2 T 2.32 41.92
20 4 T 2.37 30.43
12 4 T 2.37 18.96
12 4 T 2.38 18.97
20 4 T 2.42 30.41
20 1 T 2.62 60.43
16 2 T 2.63 37.49
24 1 T 2.63 71.77
20 1 T 2.63 60.40
24 1 T 2.65 71.93
24 1 T 2.65 72.04
16 2 T 2.67 49.02
24 1 T 2.68 70.59
20 2 T 2.70 60.61
20 2 T 2.70 60.55
28 1 T 2.72 78.17
32 1 T 2.72 90.81
32 1 T 2.75 88.89
28 1 T 2.75 83.52
32 1 T 2.75 94.08
12 2 T 2.77 37.67
16 1 T 2.80 49.04
12 2 T 2.80 37.68
16 1 T 2.82 49.06
16 1 T 2.82 48.96
36 1 T 2.87 100.01
32 1 T 2.88 93.80
16 1 T 2.90 49.04
40 1 T 2.90 112.84
40 1 T 2.92 110.34
8 4 T 2.93 26.31
8 4 T 2.95 26.32
40 1 T 3.00 90.78
8 2 T 3.15 26.28
12 1 T 3.15 37.68
12 1 T 3.15 37.65
12 1 T 3.17 37.67
8 2 T 3.18 26.25
12 1 T 3.22 37.71
4 12 T 3.52 7.89
4 12 T 3.65 8.98
4 16 T 3.67 8.99
4 20 T 3.68 8.98
4 20 T 3.68 9.01
4 16 T 3.72 8.99
8 1 T 3.75 26.31
8 1 T 3.77 26.26
8 1 T 3.77 26.24
4 8 T 3.82 16.23
4 8 T 3.82 16.32
8 1 T 3.82 26.25
4 4 T 4.02 16.26
4 4 T 4.05 16.24
4 28 T 4.07 5.23
3 16 T 4.35 11.25
3 16 T 4.35 11.20
4 32 T 4.37 5.30
4 32 T 4.38 5.29
4 40 T 4.38 5.24
3 20 T 4.47 10.96
3 20 T 4.47 11.09
4 24 T 4.47 5.23
4 36 T 4.50 5.23
4 40 T 4.52 5.24
3 8 T 4.60 15.82
3 8 T 4.60 15.86
4 28 T 4.67 5.21
4 2 T 4.70 16.36
4 36 T 4.72 5.23
4 2 T 4.72 16.39
4 24 T 4.78 5.20
3 28 T 4.82 6.29
3 4 T 4.93 15.81
3 4 T 4.98 15.82
3 24 T 5.03 6.28
3 24 T 5.07 6.30
3 32 T 5.10 6.45
3 36 T 5.10 6.47
3 12 T 5.20 14.58
3 40 T 5.22 6.33
3 40 T 5.25 6.29
3 28 T 5.27 6.26
3 12 T 5.30 14.57
3 32 T 5.52 6.45
3 36 T 5.58 6.30
4 1 T 5.60 16.35
4 1 T 5.63 16.36
2 16 T 5.72 15.47
2 16 T 5.73 15.47
2 12 T 5.80 15.47
3 2 T 5.80 15.87
2 12 T 5.83 15.47
2 8 T 5.83 15.46
2 20 T 5.83 15.33
2 8 T 5.92 15.44
3 2 T 5.92 15.77
2 20 T 6.00 15.48
2 24 T 6.17 8.41
2 28 T 6.28 8.46
2 40 T 6.45 8.49
2 4 T 6.48 15.43
2 32 T 6.50 8.56
2 32 T 6.50 8.61
2 4 T 6.53 15.43
2 36 T 6.60 8.39
2 28 T 6.63 8.41
2 40 T 6.63 8.47
2 36 T 6.85 8.31
2 24 T 6.95 8.42
3 1 T 7.15 15.80
3 1 T 7.18 15.85
2 2 T 7.57 15.44
2 2 T 7.62 15.35
2 1 T 9.53 15.17
2 1 T 9.57 15.41
1 16 F 9.62 12.62
1 12 F 9.63 12.63
2 16 F 9.73 23.31
3 16 F 9.75 23.93
2 12 F 9.87 23.92
1 8 F 9.92 12.62
1 16 F 10.03 12.65
1 8 F 10.08 12.64
1 24 F 10.10 12.64
2 8 F 10.15 23.94
2 16 F 10.15 23.95
1 16 T 10.23 14.65
1 16 T 10.25 13.26
1 12 F 10.28 12.65
1 20 T 10.28 14.52
1 12 T 10.32 13.23
1 20 T 10.35 14.54
1 24 F 10.40 12.65
1 12 T 10.57 14.66
1 32 F 10.60 12.62
1 8 T 10.82 14.64
1 28 T 10.87 14.58
1 32 F 11.05 12.69
1 40 F 11.12 12.69
1 24 T 11.13 14.57
2 32 F 11.15 12.72
1 4 F 11.25 12.64
1 40 F 11.33 12.68
1 8 T 11.33 14.66
1 4 F 11.35 12.62
2 4 F 11.48 23.92
1 28 T 11.52 14.53
2 4 F 11.55 23.07
1 24 T 11.57 14.57
1 40 T 11.67 14.59
1 36 T 11.77 14.59
3 8 F 11.83 33.75
1 32 T 11.83 13.29
1 36 T 11.92 14.57
1 40 T 12.00 14.60
1 4 T 12.35 14.51
1 32 T 12.55 14.71
1 4 T 12.62 14.61
3 4 F 13.42 35.08
2 2 F 13.50 23.88
2 2 F 13.55 23.45
1 2 F 13.60 12.61
3 12 F 13.77 35.12
3 4 F 13.82 35.13
1 2 F 14.10 12.60
3 12 F 14.12 35.16
1 2 T 14.20 14.64
1 2 T 14.60 14.45
3 2 F 15.72 35.08
4 2 F 16.05 45.91
3 2 F 16.48 34.74
1 1 F 17.38 12.61
2 1 F 17.52 22.88
1 1 F 17.87 12.59
1 1 T 18.10 14.26
1 1 T 18.47 14.60
3 1 F 19.32 34.72
4 1 F 20.13 45.91
3 1 F 20.18 35.09
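
As a rough way to read the table, the sketch below parses rows in the layout above and reports the fastest run and the speedup over the serial case (n=1, c=1) for each regrid setting. The sample rows are copied from the table; the `Run` type and the function names are made up for this illustration and are not part of the PR.

```python
from typing import List, NamedTuple


class Run(NamedTuple):
    n: int          # number of MPI tasks
    c: int          # CPUs per task
    regrid: bool    # True for "T", False for "F"
    time: float     # run time in minutes
    mem: float      # max node-wise memory usage in GB


# A handful of rows copied from the data table above.
SAMPLE = """\
40 4 T 1.98 29.66
36 4 T 2.02 26.73
12 4 T 2.37 18.96
4 12 T 3.52 7.89
2 1 T 9.53 15.17
1 16 F 9.62 12.62
1 1 F 17.38 12.61
1 1 T 18.10 14.26
"""


def parse_rows(text: str) -> List[Run]:
    """Parse whitespace-separated 'n c r time mem' rows, skipping other lines."""
    runs = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5 or parts[2] not in ("T", "F"):
            continue
        n, c, r, time, mem = parts
        runs.append(Run(int(n), int(c), r == "T", float(time), float(mem)))
    return runs


def fastest(runs: List[Run], regrid: bool) -> Run:
    """Return the run with the shortest time for the given regrid setting."""
    return min((r for r in runs if r.regrid == regrid), key=lambda r: r.time)


def speedup(runs: List[Run], regrid: bool) -> float:
    """Best serial (n=1, c=1) time divided by the fastest time."""
    serial = min(r.time for r in runs
                 if r.regrid == regrid and r.n == 1 and r.c == 1)
    return serial / fastest(runs, regrid).time


if __name__ == "__main__":
    runs = parse_rows(SAMPLE)
    for regrid in (True, False):
        best = fastest(runs, regrid)
        print(regrid, best, round(speedup(runs, regrid), 2))
```

On the full table, the fastest regrid T run (40 tasks with 4 CPUs each, 1.98 min) is roughly nine times faster than the fastest serial regrid T run (18.10 min).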

zmoon added 30 commits February 20, 2025 11:48
note that /scratch1/RDARCH/rda-arl-gpu/Barry.Baker/emissions/nexus/FENGSHA/ has links to them too
also could be interesting for space-separated (e.g. "gfs megan") to include _all_ the matches
might be a good idea to require at least one delimiter for this matching mode, to ensure that it is intentional
but don't fail, so it is still easy to test the tmp case creation outside of Hera
and nproc isn't useful
was getting OOM-ed running the default grid without this

haven't been able to find what the workflow uses for --mem(-per-cpu)
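
The multi-match idea floated in the commit notes above (a space-separated query such as "gfs megan" selecting all matching configs, and only when at least one delimiter is present) could look something like the sketch below. This is one possible reading of those notes, not the behavior of tests/run_config.py: the function name and the token-based matching rule are assumptions, and apart from cmaq_gfs_megan_nei2019_globtempo the config names are made-up placeholders.

```python
from typing import List, Sequence


def select_configs(query: str, names: Sequence[str]) -> List[str]:
    """Select config names by exact name or by space-separated terms.

    An exact name selects just that config. Otherwise the query must
    contain at least one space, so that multi-match mode is intentional,
    and every term must appear as an underscore-separated token of the name.
    """
    if query in names:
        return [query]
    terms = query.split()
    if len(terms) < 2:
        raise ValueError(
            f"{query!r} is not a config name; use two or more "
            "space-separated terms to select all matching configs"
        )
    return [name for name in names
            if all(term in name.split("_") for term in terms)]


if __name__ == "__main__":
    names = [
        "cmaq_gfs_megan_nei2019_globtempo",  # from this PR
        "placeholder_gfs_megan_a",           # hypothetical name
        "placeholder_other_b",               # hypothetical name
    ]
    print(select_configs("gfs megan", names))
```

Raising on a single unmatched term, rather than silently treating it as a pattern, keeps a typo in a config name from selecting an unexpected set of cases.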
@zmoon zmoon requested a review from Copilot February 27, 2025 19:31

@Copilot Copilot AI left a comment

PR Overview

This pull request adds two scripts: one to run config cases on Hera (tests/run_config.py) and one to collect their output data (tests/collect_data.py).

  • tests/run_config.py sets up temporary directories, updates config files based on command‐line arguments, creates a Slurm job script, and optionally submits it.
  • tests/collect_data.py collects settings, extracts runtime and memory usage info from Slurm outputs, and writes a newline‐delimited JSON file.

Reviewed Changes

| File | Description |
| --- | --- |
| tests/run_config.py | Implements test run setup, config updates, and job creation/submission for Hera. |
| tests/collect_data.py | Implements collection and parsing of run outputs into a summary JSON file. |

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

zmoon and others added 2 commits February 27, 2025 14:36
@zmoon zmoon marked this pull request as ready for review February 27, 2025 20:03
@drnimbusrain
Member

@zmoon Awesome! Do you want this tested on Hera in the UFS-SRW-App workflow? Also, how did you get Copilot enabled to act as a reviewer of this PR? I can't see the option in other noaa-oar-arl repos.

@bbakernoaa
Member

@drnimbusrain you can request Copilot to review PRs under the normal "request reviewer" in the top right

I don't think these changes are for the workflow. This is a simple script for us to be able to perform tests with the configuration files used for our various supported emission scenarios (which include the operational system). This is basically the RT piece for NEXUS.

@drnimbusrain
Member

Great, thanks Barry! Unfortunately I cannot see Copilot under reviewers for other repositories, just here.

@bbakernoaa
Member

bbakernoaa commented Feb 28, 2025 via email

You have to enable it under settings for each repository

@drnimbusrain
Member

Still can't access, no worries...

3 participants