Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JP-3490: Allow run to check CRDS or user parameters for default overrides #192

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

melanieclarke
Copy link
Collaborator

@melanieclarke melanieclarke commented Oct 4, 2024

Resolves JP-3490
Resolves RCAL-nnnn

Closes #

This PR proposes a way to resolve the long-standing issue that calling pipeline or step via run retrieves different parameter defaults than calling it via the class method call. The idea is to keep run and call as is, but allow run to check CRDS for parameter overrides when called.

To allow users to set parameters as attributes and not override them with defaults, we can track initialization status for all parameters.

  • Override setattr to set initialized = True whenever a parameter is set.
  • In init, set initialized = False for all parameters set from software defaults in the spec

Inside run, check for CRDS defaults and use them.

  • Use the provided input data to retrieve any param files from CRDS, and build a config in the same way call does.
  • Set the parameters to the CRDS config defaults, for any that have not already been initialized.

This would be mostly invisible to users: both call and run would work in the same way they currently do, except that recommended defaults are retrieved from CRDS when run is called.

Additionally, since we are updating a config inside run, we can also allow it to accept kwargs in the same way call does, so users could directly set parameters for steps when calling run. Since run and call can use the same method to retrieve and merge CRDS and user parameters, this is straightforward to support.

If this PR is accepted, the JWST documentation will need to be updated, in readthedocs and in JDox, to note the change to behavior for run.

The romancal documentation will also need updating, here:
https://roman-pipeline.readthedocs.io/en/latest/roman/stpipe/call_via_run.html

Tasks

  • update or add relevant tests
  • update relevant docstrings and / or docs/ page
  • Does this PR change any API used downstream? (if not, label with no-changelog-entry-needed)
    • write news fragment(s) in changes/: echo "changed something" > changes/<PR#>.<changetype>.rst (see below for change types)
    • run regression tests with this branch installed ("git+https://github.com/<fork>/stpipe@<branch>")
news fragment change types...
  • changes/<PR#>.feature.rst: new feature
  • changes/<PR#>.bugfix.rst: fixes an issue
  • changes/<PR#>.doc.rst: documentation change
  • changes/<PR#>.removal.rst: deprecation or removal of public API
  • changes/<PR#>.misc.rst: infrastructure or miscellaneous change

Copy link

codecov bot commented Oct 4, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.39%. Comparing base (c55d0d2) to head (8ea3bde).
Report is 1 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main     #192       +/-   ##
===========================================
+ Coverage   75.26%   95.39%   +20.12%     
===========================================
  Files          34       37        +3     
  Lines        2689     3277      +588     
===========================================
+ Hits         2024     3126     +1102     
+ Misses        665      151      -514     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@melanieclarke
Copy link
Collaborator Author

melanieclarke commented Oct 9, 2024

JWST regtests here:
https://github.com/spacetelescope/RegressionTests/actions/runs/11260804666
Roman regtests here:
https://github.com/spacetelescope/RegressionTests/actions/runs/11261960698

Both look fine - expected failures for jwst on GA, no failures for romancal.

@melanieclarke melanieclarke marked this pull request as ready for review October 10, 2024 14:38
@melanieclarke melanieclarke requested a review from a team as a code owner October 10, 2024 14:38
@braingram
Copy link
Collaborator

The romancal documentation would also need updated for this change: https://roman-pipeline.readthedocs.io/en/latest/roman/stpipe/call_via_run.html

There are uses of run in romancal unit tests where the assumption is that the default in-code parameters will be used. With this PR how would one call run without fetching CRDS parameters?

Given the scope of the change I think making this opt-in and deprecating the old behavior makes sense providing advice for a user on how to replace run with some other call that doesn't fetch parameters.

I gave this PR a brief "test drive" and found one configuration where the parameters from crds are overwriting user-supplied parameters.

from jwst.resample import ResampleStep
pipeline = ResampleStep(weight_type="ivm")
pipeline.run("asn_with_nircam_images.json")

on stpipe main I see the expected log message:

  weight_type: ivm

with this PR I'm seeing:

  weight_type: exptime

Due to an overwrite from jwst_nircam_pars-resamplestep_0001.asdf.
Would you add a unit test that replicates this failure and fixes the issue?

@melanieclarke
Copy link
Collaborator Author

Thanks for testing, @braingram.

To disable CRDS pars for run, it currently works the same way call does - if you set the STPIPE_DISABLE_CRDS_STEPPARS environment variable to true, it does not check CRDS. The command line interface additionally allows a disable-crds-steppars argument that will directly disable it as well. I could add an explicit argument for run too, if we want it to behave differently than call.

For input parameters set on instantiation, what you saw is currently expected behavior (expected by me, anyway). To override parameters that might be in CRDS, with the current code, you would do:

from jwst.resample import ResampleStep
pipeline = ResampleStep()
pipeline.run("asn_with_nircam_images.json", weight_type="ivm")

or

import os
from jwst.resample import ResampleStep
os.environ['STPIPE_DISABLE_CRDS_STEPPARS'] = 'True'
pipeline = ResampleStep(weight_type="ivm")
pipeline.run("asn_with_nircam_images.json")

I wrote it this way because there are a number of attributes set on instantiation by default in cases where steps are created by pipelines that should not disable CRDS overrides later, but I will take another look and see if I can distinguish this case from that one.

@melanieclarke
Copy link
Collaborator Author

melanieclarke commented Oct 10, 2024

Two changes pushed, @braingram -

Add handling for an explicit disable for CRDS checks in the run call. Set disable_crds_steppars to True if CRDS checks are not desired, False if they're definitely desired. Default is None, as for call, which checks for the environment variable.

Add initialization handling for parameters explicitly passed on instantiation, for either Step or Pipe, so they are not overridden by CRDS checks. Your example for ResampleStep should now work as expected.

Unit tests added for both changes.

@braingram
Copy link
Collaborator

Two changes pushed, @braingram -

Add handling for an explicit disable for CRDS checks in the run call. Set disable_crds_steppars to True if CRDS checks are not desired, False if they're definitely desired. Default is None, as for call, which checks for the environment variable.

Add initialization handling for parameters explicitly passed on instantiation, for either Step or Pipe, so they are not overridden by CRDS checks. Your example for ResampleStep should now work as expected.

Unit tests added for both changes.

Thanks! The tests look good and the ResampleStep example now works as expected.

Having the option to disable the fetch on run is helpful. I'll add a comment inline as run is now opening the input file an additional time even when crds steppars are disabled (but doing nothing with the opened file).

My vote is still for making this opt-in given the scope of the change. Is it possible with the config tracking in this PR to warn if run is called without crds parameters being fetched?

Also, is there any reason to keep call with this PR? Isn't it now a less-convenient version of run? Given the documentation updates that are needed for this PR I'd say we deprecate call as part of this change so the documentation can be updated to say call is no longer preferred.

src/stpipe/step.py Outdated Show resolved Hide resolved
@braingram
Copy link
Collaborator

One additional question. Are you aware of examples "in the wild" where run is used? All the jwst notebooks I can find as part of the spacetelescope repo use call.

@melanieclarke
Copy link
Collaborator Author

melanieclarke commented Oct 11, 2024

My vote is still for making this opt-in given the scope of the change. Is it possible with the config tracking in this PR to warn if run is called without crds parameters being fetched?

I'm reluctant to make the CRDS checks off by default because that risks the same situation we currently have, where the users don't know that they have to do something special to get the currently recommended default parameters. But you are right that this is a significant change, so perhaps we can have a short transition period for a build, where CRDS checks are available but off by default, with a warning. I'll look into where/how to warn.

Also, is there any reason to keep call with this PR? Isn't it now a less-convenient version of run? Given the documentation updates that are needed for this PR I'd say we deprecate call as part of this change so the documentation can be updated to say call is no longer preferred.

Only that it is in very common use now, since we have been recommending it in place of run for years. If we do want to remove it, we should have a much longer transition period before we do. If we don't turn on CRDS checks by default in run now, we should wait to deprecate call until we do turn them on.

One additional question. Are you aware of examples "in the wild" where run is used? All the jwst notebooks I can find as part of the spacetelescope repo use call.

I think there are still lots out there in less officially vetted places. Here's a (deprecated) one I wrote, before I was aware of the distinction between run and call: caveat example

@melanieclarke
Copy link
Collaborator Author

Changes pushed for disabling CRDS checks by default, with a warning.

@braingram
Copy link
Collaborator

Thanks for working on this. It has highlighted some issues with the current Step API. I am in favor of rethinking this API but given the extent of the documentation and usage in both pipelines I am concerned that changing run to fetch parameters will add to the confusion.

@melanieclarke
Copy link
Collaborator Author

Thanks, I understand your perspective, and I'm also in favor of evaluating the Step API and making more deliberate design choices for the future. But judging from conversations I have had with users in INS, as well as my own experience in learning to work with the pipeline, the current state of run is what causes confusion - the documentation is at odds with user expectations. Perhaps we could check with INS whether they would prefer to make this change now, while we continue to work on improvements for the future?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants