Allow e3sm_to_cmip to use ilamb parameter names #672

forsyth2 · 2025-02-04T02:39:32Z

Summary

Objectives:

Allow e3sm_to_cmip to use parameter names consistent with ILAMB. I.e., in addition to ts_grid and ts_subsection, allow users to set ts_{component}_grid and ts_{component}_subsection as well.

Issue resolution:

Addresses Parameter setting for new `e3sm_to_cmip` task #667 (reply in thread)

Select one: This pull request is...

a bug fix: increment the patch version
a small improvement: increment the minor version
a new feature: increment the minor version
an incompatible (non-backwards compatible) API change: increment the major version

Big Change

To merge, I will use "Create a merge commit". That is, this change is large enough to require multiple units of work (i.e., it should be multiple commits).

1. Does this do what we want it to do?

Required:

Product Management: I have confirmed with the stakeholders that the objectives above are correct and complete.
Testing: I have added or modified at least one "min-case" configuration file to test this change. Every objective above is represented in at least one cfg.
Testing: I have considered likely and/or severe edge cases and have included them in testing.

If applicable:

Testing: this pull request adds at least one new possible parameter to the cfg. I have tested using this parameter with and without any other parameter that may interact with it.

2. Are the implementation details accurate & efficient?

Required:

Logic: I have visually inspected the entire pull request myself.
Logic: I have left GitHub comments highlighting important pieces of code logic. I have had these code blocks reviewed by at least one other team member.

3. Is this well documented?

Required:

Documentation: by looking at the docs, a new user could easily understand the functionality introduced by this pull request.

4. Is this code clean?

Required:

Readability: The code is as simple as possible and well-commented, such that a new team member could understand what's happening.
Pre-commit checks: All the pre-commits checks have passed.

If applicable:

Software architecture: I have discussed relevant trade-offs in design decisions with at least one other team member. It is unlikely that this pull request will increase tech debt.

forsyth2

@chengzhuzhang This is ready for review. Unit tests pass. The primary file to review is zppy/e3sm_to_cmip.py.

@xylar @tomvothecoder I think it would be good to get a second ok from at least one of you. This change allows users to set ts_atm_grid and ts_land_grid rather than just setting ts_grid differently in each e3sm_to_cmip subtask. It is a similar change for ts_subsection.

Benefits of "guessing" parameters:

Allows users much more flexibility. They have multiple options to get to their desired result.
Can be more intuitive. For example, the ilamb task requires both ts_atm_grid and ts_land_grid whereas the e3sm_to_cmip task handled this by setting a single parameter ts_grid differently in its atm subtask and its land subtask. @chengzhuzhang pointed out this is not very intuitive to users.

Drawbacks:

This sort of algorithmic guesswork adds a decent amount of code, in particular in tests. Unit tests expand trying to keep up with the number of parameter combinations. This can increase tech debt.
Pollutes the parameter space. I.e., there are now more parameters for users to be aware of and for developers to keep track of.

forsyth2 · 2025-02-04T03:06:57Z

zppy/e3sm_to_cmip.py

+            raise ParameterNotProvidedError(parameter)
+
+
+def check_and_define_parameters(c: Dict[str, Any], sub: str) -> None:


The reasons for the different function are:

ts_grid is used in the bash file whereas ts_subsection is used in the python file.

We will only try to guess ts_subsection if the guess parameter is on. The reason for this is somewhat historical -- when I did the parameter refactoring, I created parameters to turn on/off guessing for file paths or zppy cfg sections.

class ParameterGuessType(Enum): PATH_GUESS = 1 SECTION_GUESS = 2

grid was always used in the bash files using exactly what the user set it as, or else it was computed from the mapping file. This PR appears to be the first time we're allowing a grid parameter to be set by an alternative set of parameters.

def set_grid(c: Dict[str, Any]) -> None: # Grid name (if not explicitly defined) # 'native' if no remapping # or extracted from mapping filename if c["grid"] == "": if c["mapping_file"] == "": c["grid"] = "native" elif c["mapping_file"] == "glb": c["grid"] = "glb" else: tmp = os.path.basename(c["mapping_file"]) # FIXME: W605 invalid escape sequence '\.' tmp = re.sub("\.[^.]*\.nc$", "", tmp) # noqa: W605 tmp = tmp.split("_") if tmp[0] == "map": c["grid"] = f"{tmp[-2]}_{tmp[-1]}" else: raise ValueError( f"Cannot extract target grid name from mapping file {c['mapping_file']}" ) # If grid is defined, just use that

xylar · 2025-02-04T07:36:25Z

@forsyth2, I don't feel well enough versed on the analysis affected by these changes to have an opinion. I'd rather let @tomvothecoder and @chengzhuzhang provide feedback.

chengzhuzhang · 2025-02-04T18:20:32Z

I was thinking it only take a few lines of code change to also support ts_atm/lnd_grid. I don't fully understand of the guessing parameter part, and I feel the complication perhaps is not necessary, and is very hard to test.

tomvothecoder · 2025-02-04T18:35:18Z

Benefits of "guessing" parameters:

* Allows users much more flexibility. They have multiple options to get to their desired result.

* Can be more intuitive. For example, the `ilamb` task requires both `ts_atm_grid` and `ts_land_grid` whereas the `e3sm_to_cmip` task handled this by setting a single parameter `ts_grid` differently in its `atm` subtask and its `land` subtask. @chengzhuzhang pointed out this is not very intuitive to users.

Drawbacks:

* This sort of algorithmic guesswork adds a decent amount of code, in particular in tests. Unit tests expand trying to keep up with the number of parameter combinations. This can increase tech debt.

* Pollutes the parameter space. I.e., there are now more parameters for users to be aware of and for developers to keep track of.

I don't have a strong opinion for this specific case in zppy. I would probably ask users what their opinion is on the current design and whether adding a guessing/inference feature is helpful/confusing.

I usually lean towards being explicit as possible with API design and configurations because too much flexibility can introduce significant overhead for both the developer and the user (even if they think it is helpful to be flexible). However, if inferring/guessing can be implemented relatively simply with default fallback behavior(s) (if it fails) and it makes the lives of users easier, then this feature might be reasonable.

tomvothecoder

My quick code review.

tomvothecoder · 2025-02-04T18:25:49Z

zppy/defaults/default.ini

+# Use for e3sm_to_cmip and/or ilamb tasks.
+# Name of the grid used by the relevant `[ts]` atm subtask
+ts_atm_grid = string(default="180x360_aave")
+# Use for e3sm_to_cmip and/or ilamb tasks.
+# Name of the `[ts]` atm subtask to depend on
+ts_atm_subsection = string(default="")
+# Use for e3sm_to_cmip task (but NOT the ilamb task) -- you can either set this, or
+# both ts_atm_grid and ts_land_grid
+# Name of the grid used by the relevant `[ts]` task
+ts_grid = string(default="180x360_aave")
+# Use for e3sm_to_cmip and/or ilamb tasks.
+# Name of the grid used by the relevant `[ts]` land subtask
+ts_land_grid = string(default="180x360_aave")
+# Use for e3sm_to_cmip and/or ilamb tasks.
+# Name of the `[ts]` land subtask to depend on
+ts_land_subsection = string(default="")


From what I understand, you're moving these parameters up a level in the configuration hierarchy so that they can be reused across each e3sm_to_cmip subtask, rather than configuring ts_grid differently for each subtask. The intention is to eliminate redundant code/configuration.

Is this correct? If so, it sounds reasonable to me.

tomvothecoder · 2025-02-04T18:38:46Z

Also I would call it inferring instead of guessing if this is actually implemented.

I was curious and asked ChatGPT:

The difference between guessing and inferring lies in the use of evidence and reasoning:

Guessing is making a statement or prediction without sufficient evidence or logical reasoning. It is often based on chance, intuition, or incomplete knowledge. For example, if someone asks you to guess a number between 1 and 100, you might pick 42 without any clues.

Inferring is drawing a conclusion based on available evidence and logical reasoning. It involves analyzing facts, patterns, or context to arrive at a probable answer. For example, if you see wet ground and dark clouds, you might infer that it recently rained.

In short, guessing is random or uncertain, while inferring is reasoned and based on observable clues.

forsyth2 · 2025-02-04T19:09:58Z

I don't fully understand of the guessing parameter part

@chengzhuzhang This feature has been part of main since #628. I found it useful to include as the developer because I needed to know what the minimal set of parameters was -- i.e., what parameters does a task actually need to run? And are these passed in directly or "guessed" (or rather "inferred" as @tomvothecoder points out)? If they're guessed, do we have the parameters needed to guess? This is very similar to the variable derivation in e3sm_diags or global_time_series: ok we really need parameter x, but can we compute it or assume it based on the parameters we do have?

As you noted, users prefer flexibility, so these guessing parameters are defaulted to True:

# These two parameters enable zppy to guess path or section parameters.
# This allows users to set fewer parameters, but with the risk of zppy choosing incorrect values for them.
# Set to False for more transparency in path or section defintions.
guess_path_parameters = boolean(default=True)
guess_section_parameters = boolean(default=True)

There are two "types" of guesses I defined -- file path guesses (e.g., we don't have a file path, but it's probably xyz) and section name guesses (e.g., we don't know what subtask is a dependency for this task, but it's probably the ts subtask matching the name of this subtask)

What this PR does is expand guessing functionality to ts_subsection and ts_grid (which as I note in the PR comments, doesn't fall into either of those two "types" of guesses.

I would probably ask users what their opinion is on the current design and whether adding a guessing/inference feature is helpful/confusing.

@chengzhuzhang believes users generally prefer more flexibility. I should also note, as mentioned earlier in this comment, we already do a very large amount of guesswork. This PR is just adding more.

It's a little too late considering we're in the RC testing period, but perhaps sending out a survey to zppy users asking their opinions on features might be a good idea to consider in the future.

I usually lean towards being explicit as possible with API design [...] (even if they think it is helpful to be flexible).

This is my personal feeling. It's easier for users to run into issues from a missing parameter if they're setting different parameters to achieve the same result in different cfg's (e.g., I'm imagining a user being confused because one cfg sets ts_grid but another sets ts_land_grid and ts_atm_grid).

That said, I do think flexibility can reduce the learning curve -- e.g., users can get going faster with whatever parameters they have or what's intuitive to them. (What's intuitive to one person might not be to another!)

if inferring/guessing can be implemented relatively simply with default fallback behavior(s) (if it fails) and it makes the lives of users easier, then this feature might be reasonable.

@tomvothecoder As mentioned above, the feature has existed since #628 -- this PR is just extending it even more. I guess the design decision at this point, is do we want to be extending it more?

The intention is to eliminate redundant code/configuration. Is this correct? If so, it sounds reasonable to me.

This is close to correct. The primary purpose of this shift is 4 of the variables can now be set at the top level and apply to both e3sm_to_cmip and ilamb (rather than simply the subtasks of e3sm_to_cmip).

I would call it inferring instead of guessing if this is actually implemented.

#628 already did the initial implementation using "guess". There's still time to change the parameter names to use "infer" instead of "guess" before the v3.0.0 release though. I can update the parameter names.

chengzhuzhang · 2025-03-05T18:46:23Z

I tested two different settings with e3sm_to_cmip task:

Supported in recent zppy rc

[e3sm_to_cmip]
active = True
frequency = "monthly"
walltime = "00:50:00"
ts_num_years = 15
years = "2000:2014:15"

  [[ atm_monthly_180x360_aave ]]
  # for ilamb
  input_files = "eam.h0"
  ts_grid = "180x360_traave"

  [[ land_monthly ]]
  # for ilamb
  input_files = "elm.h0"
  ts_grid = "180x360_esmfaave"

use two separate parameters for ts_xxx_grid, expected to be also supported in addition to case 1.

[e3sm_to_cmip]
active = True
frequency = "monthly"
walltime = "00:50:00"
ts_atm_years = 15
years = "2000:2014:15"

  [[ atm_monthly_180x360_aave ]]
  # for ilamb
  input_files = "eam.h0"
  ts_atm_grid = "180x360_traave"

  [[ land_monthly ]]
  # for ilamb
  input_files = "elm.h0"
  ts_lnd_grid = "180x360_esmfaave"

With both cases, I ran into status file missing error pasted at the bottom, I think somehow the grid name was not pass into the expected status file name (ex. ts__1985-2014-0030.status).

ts_atm_monthly_180x360_aave_1985-2014-0030
...Submitted batch job 700819
   environment_commands=source /lcrc/soft/climate/e3sm-unified/test_e3sm_unified_1.11.0rc10_chrysalis.sh
ts_land_monthly_1985-2014-0030
...Submitted batch job 700820
   environment_commands=source /lcrc/soft/climate/e3sm-unified/test_e3sm_unified_1.11.0rc10_chrysalis.sh
e3sm_to_cmip_atm_monthly_180x360_aave_1985-2014-0030
...skipping because of dependency status file missing
   /lcrc/group/e3sm2/ac.zhang40/E3SMv3/v3.LR.historical_eurc10_e2c_v3/post/scripts/ts__1985-2014-0030.status
   environment_commands=source /lcrc/soft/climate/e3sm-unified/test_e3sm_unified_1.11.0rc10_chrysalis.sh
e3sm_to_cmip_land_monthly_1985-2014-0030
...skipping because of dependency status file missing
   /lcrc/group/e3sm2/ac.zhang40/E3SMv3/v3.LR.historical_eurc10_e2c_v3/post/scripts/ts__1985-2014-0030.status
   environment_commands=source /lcrc/soft/climate/e3sm-unified/test_e3sm_unified_1.11.0rc10_chrysalis.sh
ilamb_1985-2014
...skipping because of dependency status file missing
   /lcrc/group/e3sm2/ac.zhang40/E3SMv3/v3.LR.historical_eurc10_e2c_v3/post/scripts/e3sm_to_cmip_land_monthly_1985-2014-0030.status
   environment_commands=source /lcrc/soft/climate/e3sm-unified/test_e3sm_unified_1.11.0rc10_chrysalis.sh
URL: https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.zhang40/E3SMv3_eurc10_e2c_v3/v3.LR.historical_0051/ilamb

forsyth2 · 2025-03-05T22:15:25Z

@chengzhuzhang

You're setting ts_lnd_grid = "180x360_esmfaave", not ts_land_grid. The parameter names were chosen to match ILAMB, as in https://github.com/E3SM-Project/zppy/blob/main/zppy/defaults/default.ini:

# Name of the grid used by the relevant `[ts]` land subtask
ts_land_grid = string(default="180x360_aave")
# Name of the `[ts]` land subtask to depend on
ts_land_subsection = string(default="")

That doesn't explain the failure with atm though. For that, I see you're not supplying the ts_subsection name explicitly (the name of the dependent ts subtask), which is distinct from the grid name. However, that shouldn't pose a problem. I'm going to have to debug this more thoroughly.

Analysis so far

            check_and_define_parameters(c, sub)
            add_dependencies(
                dependencies,
                script_dir,
                "ts",
                c["ts_subsection"],
                c["yr_start"],
                c["yr_end"],
                c["ts_num_years"],
            )

def check_and_define_parameters(c: Dict[str, Any], sub: str) -> None:
    parameter = "ts_subsection"
    if (parameter not in c.keys()) or (c[parameter] == ""):
        guess_type_parameter: str = get_guess_type_parameter(
            ParameterGuessType.SECTION_GUESS
        )
        if c[guess_type_parameter]:
            if "component" in c.keys():
                if (c["component"] == "atm") and ("ts_atm_subsection" in c.keys()):
                    c[parameter] = c["ts_atm_subsection"]
                elif (c["component"] == "lnd") and ("ts_land_subsection" in c.keys()):
                    c[parameter] = c["ts_land_subsection"]
                else:
                    c[parameter] = sub
            else:
                c[parameter] = sub
        else:
            raise ParameterNotProvidedError(parameter)

We don't have c["ts_subsection"] defined, so zppy attempts to guess/infer it. You also have not specified component-specific subsection names, so neither of those blocks will run. That leaves us with setting c["ts_subsection"] = sub.

sub is defined earlier as:

            sub: str = define_or_guess(
                c, "subsection", "grid", ParameterGuessType.SECTION_GUESS
            )

I believe here that would mean sub gets defined as atm_monthly_180x360_aave and land_monthly, ultimately resulting in dependencies on ts_atm_monthly_180x360_aave_1985-2014-0030 and ts_land_monthly_1985-2014-0030 respectively.

So, it appears something is wrong in this logic, that the subsection name ends up getting guessed to be empty string.

(#682 has matching logic using the better-worded function names)

forsyth2 · 2025-03-05T22:33:04Z

@chengzhuzhang Do you have the full cfg you ran? I can't seem to reproduce the issue in stand-alone unit testing (45dc2c6).

chengzhuzhang · 2025-03-05T22:43:41Z

@chengzhuzhang Do you have the full cfg you ran? I can't seem to reproduce the issue in stand-alone unit testing (45dc2c6).

Here is my cfg for testing: /home/ac.zhang40/test_zppy/post.v3.LR.historical.e3sm_to_cmip.cfg

forsyth2 · 2025-03-05T23:24:12Z

@chengzhuzhang Try with the latest code; there was a missing check for empty strings (31e2f15)

chengzhuzhang · 2025-03-06T18:02:34Z

@chengzhuzhang Try with the latest code; there was a missing check for empty strings (31e2f15)

@forsyth2 Yes, with this update, my .cfg is running okay. I added more one commit 3384fc4 to expand the default atm variable list, by adding all 2d variables. it is potentially useful to users needs to cmorize some variables, or to have more ilamb diagnostics.

forsyth2 · 2025-03-06T18:17:19Z

Great, thanks! Can I merge this PR then?

(I anticipate merge conflicts with #682. Since that PR is mainly refactoring, I'll 1. merge this PR keeping the distinct commits, 2. address merge conflicts on that PR, 3. merge that PR, keeping the distinct commits).

chengzhuzhang · 2025-03-06T18:20:08Z

Yes, go ahead. Thanks!

forsyth2 · 2025-03-06T19:17:28Z

@chengzhuzhang FYI I added a commit (44992bb) mostly with comments but also adding that empty string check to the ts_grid parameter too.

We didn't see issues with that because ts_{atm/land}_grid, unlike ts_{atm/land}_subsection, doesn't default to an empty string. But I figured it would still be good to add the check. I added unit tests for the situation of them being explicitly set to empty strings, so I think this is fine to merge now.

Allow e3sm_to_cmip to use ilamb parameter names

1cf4fa2

forsyth2 commented Feb 4, 2025

View reviewed changes

tomvothecoder reviewed Feb 4, 2025

View reviewed changes

This was referenced Feb 4, 2025

[Bug]: Another parameter checking error: zppy.utils.ParameterNotProvidedError: climo_land_subsection #669

Closed

[Doc]: Explain parameter inference #680

Closed

chengzhuzhang self-requested a review March 5, 2025 18:35

Test unspecified ts_subsection

45dc2c6

Check for empty component subsection names

31e2f15

add cmor supported 2d atm variables as default

3384fc4

Self-code review

44992bb

forsyth2 merged commit 66c1a2e into main Mar 6, 2025
5 checks passed

forsyth2 deleted the discussion-667-parameter-compatibility branch March 6, 2025 19:41

forsyth2 mentioned this pull request Mar 6, 2025

Improve parameter inference #682

Merged

17 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow e3sm_to_cmip to use ilamb parameter names #672

Allow e3sm_to_cmip to use ilamb parameter names #672

forsyth2 commented Feb 4, 2025 •

edited

Loading

forsyth2 left a comment

forsyth2 Feb 4, 2025

xylar commented Feb 4, 2025

chengzhuzhang commented Feb 4, 2025

tomvothecoder commented Feb 4, 2025

tomvothecoder left a comment

tomvothecoder Feb 4, 2025

tomvothecoder commented Feb 4, 2025

forsyth2 commented Feb 4, 2025

chengzhuzhang commented Mar 5, 2025

forsyth2 commented Mar 5, 2025

forsyth2 commented Mar 5, 2025

chengzhuzhang commented Mar 5, 2025

forsyth2 commented Mar 5, 2025

chengzhuzhang commented Mar 6, 2025

forsyth2 commented Mar 6, 2025

chengzhuzhang commented Mar 6, 2025

forsyth2 commented Mar 6, 2025

		raise ParameterNotProvidedError(parameter)


		def check_and_define_parameters(c: Dict[str, Any], sub: str) -> None:

Allow e3sm_to_cmip to use ilamb parameter names #672

Allow e3sm_to_cmip to use ilamb parameter names #672

Conversation

forsyth2 commented Feb 4, 2025 • edited Loading

Summary

Big Change

1. Does this do what we want it to do?

2. Are the implementation details accurate & efficient?

3. Is this well documented?

4. Is this code clean?

forsyth2 left a comment

Choose a reason for hiding this comment

forsyth2 Feb 4, 2025

Choose a reason for hiding this comment

xylar commented Feb 4, 2025

chengzhuzhang commented Feb 4, 2025

tomvothecoder commented Feb 4, 2025

tomvothecoder left a comment

Choose a reason for hiding this comment

tomvothecoder Feb 4, 2025

Choose a reason for hiding this comment

tomvothecoder commented Feb 4, 2025

forsyth2 commented Feb 4, 2025

chengzhuzhang commented Mar 5, 2025

forsyth2 commented Mar 5, 2025

forsyth2 commented Mar 5, 2025

chengzhuzhang commented Mar 5, 2025

forsyth2 commented Mar 5, 2025

chengzhuzhang commented Mar 6, 2025

forsyth2 commented Mar 6, 2025

chengzhuzhang commented Mar 6, 2025

forsyth2 commented Mar 6, 2025

forsyth2 commented Feb 4, 2025 •

edited

Loading