CDAT Migration FY24 ‐ General Guide
This is a general guide on how to refactor diagnostic sets. It will cover how to get started, how to refactor code (generally), and how to perform regression testing.
- GitHub Project Tracker
  - This project tracker is used to map out progress and milestones.
- Root Development Branch: cdat-migration-fy24
  - This branch stores all of the developmental work for this task. We will merge this branch progressively into `main` when sets are refactored and pass regression testing.
- Remaining Explicit Imports
- Check out the CDAT migration development branch
  - `git checkout cdat-migration-fy24`
- Create a branch stemming from `cdat-migration-fy24`
  - `git checkout -b refactor/<ISSUE#>-<SET-NAME>`
  - Example: `git checkout -b refactor/658-lat-lon-set`
- Create the development conda environment for your branch
  - `mamba env create -f conda-env/dev.yml -n e3sm_diags_dev_<ISSUE#>`
  - `mamba activate e3sm_diags_dev_<ISSUE#>`
- Install the local development version of `e3sm_diags` for your branch
  - `make install` (in root of repo)
  - This ensures supplementary files are installed (e.g., `.cfg` files).
  - WARNING: If you make any changes to supplementary files AND/OR run `e3sm_diags` via CLI, you must repeat this command for those changes to be reflected in your environment.
- Set up the testing directory
  - `auxiliary_tools/cdat_regression_testing/<GH_ISSUE-SET_NAME>/`
- Copy the test script to the testing directory and set it up
  - `auxiliary_tools/cdat_regression_testing/template_run_script.py`
  - Note: This script was already executed on the `main` branch to produce baseline results to compare your branch against.
- Copy the regression testing notebook(s) you need to the testing directory and set them up
  - `auxiliary_tools/cdat_regression_testing/template_cdat_regression_test_netcdf.ipynb`
  - If your set also produces `.json` metrics files, then copy `auxiliary_tools/cdat_regression_testing/template_cdat_regression_test_json.ipynb`.
  - The sets that dump `.json` include: `lat_lon`, `lat_lon_land`, `lat_lon_river`, `area_mean_time_series`, `arm_diags`, `enso_diags`, and `qbo`.
- Create a draft pull request early using `cdat-migration-fy24` as the base branch
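The naming conventions and template choices in the steps above can be sketched as a small Python helper. This is only an illustration: `branch_name` and `templates_for` are hypothetical names that do not exist in `e3sm_diags`, while the template paths and the list of `.json`-producing sets are copied from this guide.

```python
from pathlib import PurePosixPath

# Directory that holds the templates, as named in this guide.
TEMPLATE_DIR = PurePosixPath("auxiliary_tools/cdat_regression_testing")

# Sets that dump `.json` metrics files, as listed in this guide.
JSON_SETS = frozenset({
    "lat_lon", "lat_lon_land", "lat_lon_river",
    "area_mean_time_series", "arm_diags", "enso_diags", "qbo",
})


def branch_name(issue: int, set_name: str) -> str:
    """Build a branch name of the form refactor/<ISSUE#>-<SET-NAME> (hypothetical helper)."""
    return f"refactor/{issue}-{set_name}"


def templates_for(set_key: str) -> list[str]:
    """List the template files to copy into the testing directory (hypothetical helper)."""
    templates = [
        TEMPLATE_DIR / "template_run_script.py",
        TEMPLATE_DIR / "template_cdat_regression_test_netcdf.ipynb",
    ]
    # Only sets that produce `.json` metrics need the JSON notebook as well.
    if set_key in JSON_SETS:
        templates.append(TEMPLATE_DIR / "template_cdat_regression_test_json.ipynb")
    return [str(path) for path in templates]


print(branch_name(658, "lat-lon-set"))
for path in templates_for("lat_lon"):
    print(path)
```

Copying the files themselves is left to `cp` or `shutil.copy`; the sketch only decides which templates a given set needs.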
There are three steps in this process: 1. Refactor CDAT logic, 2. Clean up and refactor some more, 3. Regression testing.
Objective: Refactor CDAT logic with Xarray/xCDAT and successfully produce metrics `.json` and `.png` files
- Read the description of your diagnostic set (here)
  - The core components of a set consist of a driver, plotter, viewer, and some utilities.
  - We'll figure out how to refactor the viewer at a later time in #628.
- Plan your approach to refactoring
  - Find all imports and code related to CDAT (e.g., operations on `cdms2.TransientVariable`).
  - Make an outline for what you need to refactor (example).
  - If possible, write failing unit tests beforehand to cover edge cases (test-driven development).
- Refactor code
  - Focus on replacing CDAT logic with Xarray/xCDAT.
  - Try to reuse as much already-refactored code as possible, such as the general Xarray-based classes and functions (e.g., `dataset_xr.py`, `metrics.py`, `io.py`, `regrid.py`).
  - Refer to PR #677 for pointers.
- Run your test script(s) to get feedback
  - `auxiliary_tools/cdat_regression_testing/template_run_script.py`
  - Read the stack trace, understand how the new code is behaving, and fix any issues.
  - NOTE: Make sure to run `python -m pip install .` if running via `python <script_name>.py` to get the latest code changes in your environment. You don't need to do this if you're running with VS Code's Python interactive console and debugger because imports from the local package directory will take precedence.
- Repeat steps 3 and 4 until the diagnostic set can produce metrics `.json` and `.png` files
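For the planning step ("find all imports and code related to CDAT"), one option is to scan a module's source with Python's standard `ast` module. The sketch below is an assumption-laden illustration, not a tool shipped with `e3sm_diags`: `find_cdat_imports` is a hypothetical helper, and the `CDAT_MODULES` set is a guess at the relevant top-level packages that you may need to extend.

```python
import ast

# Top-level CDAT packages to look for. This list is an assumption; extend it as needed.
CDAT_MODULES = {"cdms2", "cdutil", "genutil", "MV2", "cdtime"}


def find_cdat_imports(source: str) -> list[tuple[int, str]]:
    """Return (line number, module name) for each absolute CDAT import in the source text."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            # Compare only the top-level package, so `genutil.statistics` matches `genutil`.
            if name.split(".")[0] in CDAT_MODULES:
                found.append((node.lineno, name))
    return sorted(found)


example = """\
import cdms2
import numpy as np
from cdutil import averager
from genutil.statistics import rms
"""
print(find_cdat_imports(example))
```

To scan a real file, pass its text instead of `example`, for instance `find_cdat_imports(pathlib.Path(path).read_text())`. The scan only finds imports; usages such as operations on `cdms2.TransientVariable` still need to be traced by hand or with a text search.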
Objective: Implement readable, maintainable, and testable code
- Refactor sub-optimal or hard-to-understand code
  - Excessive for loops, repeated logic
  - Break up large functions into smaller, maintainable functions
- Write/fix unit tests for refactored code, if possible
  - `/tests/e3sm_diags` stores unit test `.py` files
  - The `pytest` command runs the unit tests and generates a code coverage report (`tests_coverage_report/`)
- ALTERNATIVE: Write a `TODO: ...` statement
  - Get back to refactoring at a later time
  - If you are not confident in rewriting cleaner code for an implementation, skip it for now.
  - Additional refactoring can be risky because there is minimal unit test coverage (easy to unknowingly introduce incorrect behaviors, side effects, etc.)
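As a sketch of what "small, testable functions" can look like, the example below pairs two tiny helpers with pytest-style tests. The helpers (`cosine_latitude_weights`, `weighted_mean`) are hypothetical and written in plain Python for illustration only; they are not taken from `e3sm_diags`, where the equivalent logic would normally use Xarray/xCDAT and the tests would live under `/tests/e3sm_diags`.

```python
import math


def cosine_latitude_weights(lats_deg):
    """Return cos(latitude) weights normalized to sum to 1 (hypothetical helper)."""
    raw = [math.cos(math.radians(lat)) for lat in lats_deg]
    total = sum(raw)
    return [weight / total for weight in raw]


def weighted_mean(values, weights):
    """Return the weighted mean of values; weights need not be normalized."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# pytest collects functions whose names start with `test_`.
def test_weights_sum_to_one():
    weights = cosine_latitude_weights([-60.0, 0.0, 60.0])
    assert math.isclose(sum(weights), 1.0)


def test_equator_is_weighted_more_than_high_latitudes():
    weights = cosine_latitude_weights([-60.0, 0.0, 60.0])
    assert weights[1] > weights[0]
    assert math.isclose(weights[0], weights[2])


def test_weighted_mean_of_constant_field_is_that_constant():
    weights = cosine_latitude_weights([-60.0, 0.0, 60.0])
    assert math.isclose(weighted_mean([2.5, 2.5, 2.5], weights), 2.5)


def test_mismatched_lengths_raise():
    try:
        weighted_mean([1.0, 2.0], [1.0])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```

Each test checks one behavior and names it in the function name, so a failure points directly at the broken assumption. In a real test module, `pytest.raises(ValueError)` would be the usual way to write the last test.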
Objective: Perform regression testing to ensure that the metrics a diagnostic set produces with your branch's refactored code are reasonably close to those produced on `main`
- Run the regression testing notebook for your branch
- Debug branch code as needed
- Run test script
  - Public web-server directory: https://portal.nersc.gov/project/e3sm/cdat-migration-fy24/
  - `main` results viewer: https://portal.nersc.gov/project/e3sm/cdat-migration-fy24/main/viewer/
- Repeat steps 1-3 until the results are reasonably close (maximum relative difference within a tolerance of 1e-5)
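The closeness criterion above (maximum relative difference of at most 1e-5) can be illustrated with a standard-library sketch that compares two nested metrics dictionaries, such as those loaded from `.json` files with `json.load`. This is an assumption about how such a check could be written, not the code in the regression testing notebooks; `max_relative_diff` is a hypothetical helper and the sample metric values are made up.

```python
import math

RTOL = 1e-5  # tolerance from this guide: maximum relative difference


def max_relative_diff(baseline, branch):
    """Return the largest |branch - baseline| / |baseline| over matching leaves."""
    if isinstance(baseline, dict) and isinstance(branch, dict):
        if baseline.keys() != branch.keys():
            raise KeyError(f"mismatched keys: {sorted(baseline.keys() ^ branch.keys())}")
        return max((max_relative_diff(baseline[k], branch[k]) for k in baseline), default=0.0)
    if baseline == branch:
        return 0.0  # identical leaves, including 0 == 0, None == None, equal strings
    numeric = (int, float)
    if not isinstance(baseline, numeric) or not isinstance(branch, numeric):
        return math.inf  # differing non-numeric leaves (e.g., a value vs. None)
    if baseline == 0:
        return math.inf  # relative difference is undefined against a zero baseline
    return abs(branch - baseline) / abs(baseline)


# Made-up sample metrics, standing in for the contents of two `.json` files.
baseline = {"test": {"min": 210.0, "max": 310.0, "mean": 287.5}, "unit": "K"}
branch = {"test": {"min": 210.0, "max": 310.0, "mean": 287.5005}, "unit": "K"}

diff = max_relative_diff(baseline, branch)
print(f"max relative difference: {diff:.2e}, within tolerance: {diff <= RTOL}")
```

Treating a zero baseline with a nonzero branch value as an infinite difference is a deliberately strict choice for this sketch; an absolute tolerance could be added if near-zero metrics are expected.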
- Follow this guide for getting started with VS Code with the recommended extensions. Also grab Remote SSH to use VS Code with remote machines via SSH.
- Open the `e3sm_diags.code-workspace` file
  - Automatically configures VS Code extensions, debugger, etc.
- Create your mamba development environment
  - Should be done already if you followed "Getting Started"
  - VS Code -> Select Interpreter -> mamba environment
VS Code offers the ability to debug by stepping through the code stack in real time.
By installing the Python extension, you will automatically have access to this feature. I use this feature extensively to develop and debug `e3sm_diags` using test scripts.
- Make sure you open VS Code with `e3sm_diags.code-workspace`
- Set up breakpoints in your test script and the local `e3sm_diags` code
- Hit F5 to run the Debugger
- Step through code, use Debug Console to check values of variables