CDAT Migration FY24 - General Guide

Overview

This is a general guide on how to refactor diagnostic sets. It will cover how to get started, how to refactor code (generally), and how to perform regression testing.

GitHub Project Tracker
- This project tracker is used to map out progress and milestones.
Root Development Branch: cdat-migration-fy24
- This branch stores all of the developmental work for this task. We will merge this branch progressively into main when sets are refactored and pass regression testing.
Remaining Explicit Imports

Getting Started

Check out the CDAT migration development branch
- git checkout cdat-migration-fy24
Create a branch stemming from cdat-migration-fy24
- git checkout -b refactor/<ISSUE#>-<SET-NAME>
- Example: git checkout -b refactor/658-lat-lon-set
Create the development conda environment for your branch
- mamba env create -f conda-env/dev.yml -n e3sm_diags_dev_<ISSUE#>
- mamba activate e3sm_diags_dev_<ISSUE#>
Install the local development version of e3sm_diags for your branch
- make install (in root of repo)
- This ensures supplementary files are installed (e.g., .cfg files).
- WARNING: If you change any supplementary files AND/OR run e3sm_diags via the CLI, you must repeat this command for those changes to be reflected in your environment.
Set up the testing directory
- auxiliary_tools/cdat_regression_testing/<GH_ISSUE-SET_NAME>/
Copy the test script to the testing directory and set it up
- auxiliary_tools/cdat_regression_testing/template_run_script.py
- Note, this script was already executed on the main branch to produce baseline results to compare your branch against.
Copy the regression testing notebook(s) you need to the testing directory and set them up
- auxiliary_tools/cdat_regression_testing/template_cdat_regression_test_netcdf.ipynb
- If your set also produces .json metrics files, then copy auxiliary_tools/cdat_regression_testing/template_cdat_regression_test_json.ipynb.
- The sets that dump .json include: lat_lon, lat_lon_land, lat_lon_river, area_mean_time_series, arm_diags, enso_diags, and qbo.
Create a draft pull request early using cdat-migration-fy24 as the base branch
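
The testing-directory steps above can also be scripted. The following is a minimal sketch, not part of e3sm_diags: the helper name setup_test_dir is hypothetical, and the "-" separator in the directory name is an assumption based on the <GH_ISSUE-SET_NAME> pattern. The template file names are taken from the steps above.

```python
import shutil
from pathlib import Path

# Paths from the steps above, relative to the repository root.
TEST_ROOT = Path("auxiliary_tools/cdat_regression_testing")
TEMPLATES = [
    "template_run_script.py",
    "template_cdat_regression_test_netcdf.ipynb",
    "template_cdat_regression_test_json.ipynb",  # only for sets that dump .json
]


def setup_test_dir(
    repo_root: Path, issue: str, set_name: str, with_json: bool = False
) -> Path:
    """Create the per-set testing directory and copy the templates into it.

    Hypothetical helper; the directory name format is an assumption.
    """
    src_dir = repo_root / TEST_ROOT
    dest_dir = src_dir / f"{issue}-{set_name}"
    dest_dir.mkdir(parents=True, exist_ok=True)

    for name in TEMPLATES:
        # Skip the JSON notebook unless the set produces .json metrics.
        if name.endswith("_json.ipynb") and not with_json:
            continue
        target = dest_dir / name
        if not target.exists():  # never overwrite work in progress
            shutil.copy2(src_dir / name, target)
    return dest_dir
```

For example, setup_test_dir(Path("."), "658", "lat-lon-set", with_json=True) run from the repository root would create the directory for the lat_lon set and copy all three templates. You still need to edit the copied script and notebooks by hand.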

Refactoring a Diagnostic Set

There are three steps in this process: 1. Refactor CDAT logic, 2. Clean up and refactor some more, 3. Regression testing.

1. Refactor CDAT Logic

Objective: Refactor CDAT logic with Xarray/xCDAT and successfully produce metrics .json and .png files

Read the description of your diagnostic set (here)
- The core components of a set consist of a driver, plotter, viewer, and some utilities.
- We'll figure out how to refactor the viewer at a later time in #628.
Plan your approach to refactoring
- Find all imports and code related to CDAT (e.g., operations on cdms2.TransientVariable).
- Make an outline for what you need to refactor (example).
- If possible, write failing unit tests beforehand to cover edge cases (test-driven development).
Refactor code
- Focus on replacing CDAT logic with Xarray/xCDAT.
- Reuse as much already-refactored code as possible, such as the general Xarray-based classes and functions (e.g., dataset_xr.py, metrics.py, io.py, regrid.py).
- Refer to (PR #677) for pointers.
Run your test script(s) to get feedback
- auxiliary_tools/cdat_regression_testing/template_run_script.py
- Read the stack trace, understand how new code is behaving, fix any issues
- NOTE: Make sure to run python -m pip install . if you are running the script via python <script_name>.py, so that your environment picks up the latest code changes. You don't need to do this if you're running with VS Code's Python interactive console and debugger because imports from the local package directory will take precedence.
Repeat steps 3 and 4 until the diagnostic set can produce metrics (.json and .png)
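
One way to build the refactoring outline from the planning step is to scan a set's source files for CDAT imports. The sketch below uses only the standard library. The names find_cdat_imports and scan_tree are hypothetical, and the CDAT_MODULES list is an assumption that you should adjust to the packages your set actually uses.

```python
import ast
from pathlib import Path

# Assumed list of CDAT-related top-level modules; adjust it for your set.
CDAT_MODULES = {"cdms2", "cdutil", "cdtime", "genutil", "MV2", "regrid2", "vcs"}


def find_cdat_imports(source: str) -> list[tuple[int, str]]:
    """Return (line number, module name) for each CDAT import in the source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]  # absolute "from X import ..." only
        else:
            continue
        for name in names:
            if name.split(".")[0] in CDAT_MODULES:
                hits.append((node.lineno, name))
    return sorted(hits)


def scan_tree(root: Path) -> dict[str, list[tuple[int, str]]]:
    """Scan every .py file under root and keep only files with CDAT imports."""
    report = {}
    for path in sorted(root.rglob("*.py")):
        hits = find_cdat_imports(path.read_text(encoding="utf-8"))
        if hits:
            report[str(path)] = hits
    return report
```

Calling scan_tree(Path("e3sm_diags")) from the repository root would list the files and line numbers that still import CDAT packages (the e3sm_diags package directory is assumed here). This only finds imports; code that operates on objects such as cdms2.TransientVariable still has to be traced by reading the functions that use them.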

2. Clean up and refactor some more

Objective: Implement readable, maintainable, and testable code

Refactor sub-optimal or hard-to-understand code
- Excessive for loops, repeated logic
Break up large functions into smaller, maintainable functions
Write/fix unit tests for refactored code, if possible
- /tests/e3sm_diags stores the unit test .py files
- The pytest command runs the unit tests and generates a code coverage report (tests_coverage_report/)
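
As an illustration of the points above, nested for loops can often be pulled into a small function that is easy to unit test. This is a sketch only: iter_combinations and its test are hypothetical and do not exist in e3sm_diags, and the variable, season, and region values are sample inputs.

```python
from itertools import product
from typing import Iterator, Sequence


def iter_combinations(
    variables: Sequence[str], seasons: Sequence[str], regions: Sequence[str]
) -> Iterator[tuple[str, str, str]]:
    """Yield every (variable, season, region) combination.

    Replaces three nested for loops; variables vary slowest, regions fastest.
    """
    yield from product(variables, seasons, regions)


# pytest collects functions named test_*; a test like this would live
# in a .py file under the unit test directory.
def test_iter_combinations_covers_every_pairing():
    result = list(iter_combinations(["PRECT", "TREFHT"], ["ANN", "JJA"], ["global"]))
    assert result == [
        ("PRECT", "ANN", "global"),
        ("PRECT", "JJA", "global"),
        ("TREFHT", "ANN", "global"),
        ("TREFHT", "JJA", "global"),
    ]
```

Keeping the iteration order in a pure function means the test needs no input files, which helps when unit test coverage is minimal.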

ALTERNATIVE: Write a TODO: ... statement

Get back to refactoring at a later time
If you are not confident in rewriting cleaner code for an implementation, skip it for now.
Additional refactoring can be risky because there is minimal unit test coverage (easy to unknowingly introduce incorrect behaviors, side-effects, etc.)

3. Regression testing

Objective: Ensure that the metrics a diagnostic set produces with your branch's refactored code are reasonably close to those produced on main

Run the regression testing notebook for your branch
Debug branch code as needed
Run test script
- Public web-server directory: https://portal.nersc.gov/project/e3sm/cdat-migration-fy24/
- main results viewer: https://portal.nersc.gov/project/e3sm/cdat-migration-fy24/main/viewer/
Repeat steps 1-3 until the results are reasonably close (tolerance: 1e-5 maximum relative difference)
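
The template notebooks are the actual tool for this comparison. To make the tolerance concrete, here is a minimal standard-library sketch of a relative-difference check on two metrics .json files. It assumes the files hold nested dictionaries with numeric leaves and defines relative difference as |dev - main| / |main|. Both are assumptions, and all function names are hypothetical.

```python
import json
import math
from pathlib import Path

RTOL = 1e-5  # tolerance quoted in this guide (max relative difference)


def load_metrics(path: Path) -> dict:
    """Read one metrics .json file into a dictionary."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def flatten(metrics: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into {"a.b.c": value} so leaves can be compared."""
    flat = {}
    for key, value in metrics.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def is_number(value) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def relative_diff(dev: float, main: float) -> float:
    """Return |dev - main| / |main|, with explicit handling of NaN and zero."""
    if math.isnan(dev) or math.isnan(main):
        return 0.0 if math.isnan(dev) and math.isnan(main) else math.inf
    if main == 0:
        return 0.0 if dev == 0 else math.inf
    return abs(dev - main) / abs(main)


def compare_metrics(main: dict, dev: dict, rtol: float = RTOL) -> dict:
    """Return {metric name: relative difference} for every metric outside rtol."""
    flat_main, flat_dev = flatten(main), flatten(dev)
    failures = {}
    for name in sorted(flat_main.keys() | flat_dev.keys()):
        if name not in flat_main or name not in flat_dev:
            failures[name] = math.inf  # metric missing on one side
            continue
        a, b = flat_main[name], flat_dev[name]
        if is_number(a) and is_number(b):
            diff = relative_diff(b, a)
            if diff > rtol:
                failures[name] = diff
        elif a != b:  # non-numeric leaves (e.g., units) must match exactly
            failures[name] = math.inf
    return failures
```

An empty result from compare_metrics(load_metrics(main_path), load_metrics(dev_path)) means every metric is within tolerance. Treating a zero baseline with a non-zero branch value as a failure is a design choice of this sketch; the notebooks may handle that case differently.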

General Tips

Set up VS Code

Follow this guide to get started with VS Code and the recommended extensions. Also install the Remote - SSH extension to use VS Code with remote machines over SSH.
Open the e3sm_diags.code-workspace file
- Automatically configures VS Code extensions, debugger, etc.
Create your mamba development environment
- Should be done already if you followed "Getting Started"
VS Code -> Select Interpreter -> mamba environment
- https://code.visualstudio.com/docs/python/environments#_working-with-python-interpreters

Efficient Development and Debugging with VS Code

VS Code lets you debug by stepping through the call stack in real time. Installing the Python extension gives you access to this feature automatically. I use this feature extensively to develop and debug e3sm_diags using test scripts.

Make sure you open VS Code with e3sm_diags.code-workspace
Set breakpoints in your test script and the local e3sm_diags code
Hit F5 to run the Debugger
Step through code, use Debug Console to check values of variables

VS Code debugger
