CDAT Migration FY24 ‐ General Guide

Tom Vo edited this page Apr 9, 2024 · 58 revisions

Overview

This is a general guide on how to refactor diagnostic sets. It will cover how to get started, how to refactor code (generally), and how to perform regression testing.

Getting Started

  1. Check out the CDAT migration development branch
    • git checkout cdat-migration-fy24
  2. Create a branch stemming from cdat-migration-fy24
    • git checkout -b refactor/<ISSUE#>-<SET-NAME>
    • Example: git checkout -b refactor/658-lat-lon-set
  3. Create the development conda environment for your branch
    • mamba env create -f conda-env/dev.yml -n e3sm_diags_dev_<ISSUE#>
    • mamba activate e3sm_diags_dev_<ISSUE#>
  4. Install the local development version of e3sm_diags for your branch
    • make install (in root of repo)
    • This ensures supplementary files are installed (e.g., .cfg files).
    • WARNING: If you change any supplementary files and/or run e3sm_diags via the CLI, you must repeat this command for those changes to be reflected in your environment.
  5. Set up a testing directory
    • auxiliary_tools/cdat_regression_testing/<GH_ISSUE-SET_NAME>/
  6. Copy the test script to the testing directory and set it up
  7. Copy the regression testing notebook(s) you need to the testing directory and set them up
  8. Create a draft pull request early using cdat-migration-fy24 as the base branch

Refactoring a Diagnostic Set

There are three steps in this process: 1. Refactor CDAT logic, 2. Clean up and refactor some more, 3. Regression testing.

1. Refactor CDAT Logic

Objective: Replace CDAT logic with Xarray/xCDAT and successfully produce the metrics .json and .png files

  1. Read the description of your diagnostic set (here)
    • The core components of a set consist of a driver, plotter, viewer, and some utilities.
    • We'll figure out how to refactor the viewer at a later time in #628.
  2. Plan your approach to refactoring
    • Find all imports and code related to CDAT (e.g., operations on cdms2.TransientVariable).
    • Make an outline for what you need to refactor (example).
    • If possible, write failing unit tests beforehand to cover edge cases (test-driven development).
  3. Refactor code
    • Focus on replacing CDAT logic with Xarray/xCDAT.
    • Reuse as much already-refactored code as possible, such as the general Xarray-based classes and functions (e.g., dataset_xr.py, metrics.py, io.py, regrid.py).
    • Refer to (PR #677) for pointers.
  4. Run your test script(s) to get feedback
    • auxiliary_tools/cdat_regression_testing/template_run_script.py
    • Read the stack trace, understand how the new code behaves, and fix any issues.
    • NOTE: If you run the script via python <script_name>.py, first run python -m pip install . so that your environment has the latest code changes. You don't need to do this if you're running with VS Code's Python interactive console and debugger because imports from the local package directory will take precedence.
  5. Repeat steps 3 and 4 until the diagnostic set can produce the metrics .json and .png files
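
When planning the refactor (step 2), it can help to list every CDAT import in the files that make up a set. The following is a minimal sketch using only the Python standard library. The module names in CDAT_MODULES and the function name find_cdat_imports are assumptions for illustration and are not part of e3sm_diags; extend the set with any other CDAT packages your diagnostic set uses.

```python
import ast

# Assumed list of CDAT-related top-level packages; adjust for your set.
CDAT_MODULES = {"cdms2", "cdutil", "cdtime", "genutil", "MV2"}


def find_cdat_imports(source: str) -> list[tuple[int, str]]:
    """Return (line number, module name) for each CDAT import in `source`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Only absolute imports; relative imports cannot refer to CDAT.
            names = [node.module]
        else:
            continue
        for name in names:
            # Compare only the top-level package (e.g., "cdms2" in "cdms2.tvariable").
            if name.split(".")[0] in CDAT_MODULES:
                hits.append((node.lineno, name))
    return sorted(hits)


# Hypothetical file contents used only to demonstrate the scan.
example = "import cdms2\nimport numpy as np\nfrom genutil import statistics\n"
for lineno, module in find_cdat_imports(example):
    print(f"line {lineno}: {module}")
```

To scan a real file, pass its contents instead of the example string, for instance pathlib.Path(<file>).read_text(). This only finds imports; usages such as operations on cdms2.TransientVariable still need to be traced by hand.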

2. Clean up and refactor some more

Objective: Implement readable, maintainable, and testable code

  1. Refactor sub-optimal or hard-to-understand code
    • Examples: excessive for loops, repeated logic
  2. Break up large functions into smaller, maintainable functions
  3. Write/fix unit tests for refactored code, if possible
    • /tests/e3sm_diags stores the unit test .py files
    • The pytest command runs the unit tests and generates a code coverage report (tests_coverage_report/)

ALTERNATIVE: Write a TODO: ... statement

  • Get back to refactoring at a later time
  • If you are not confident in rewriting cleaner code for an implementation, skip it for now.
  • Additional refactoring can be risky because there is minimal unit test coverage (it is easy to unknowingly introduce incorrect behaviors, side effects, etc.)
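
As an illustration of the clean-up steps above (pulling repeated logic into a small function and covering it with a unit test), here is a minimal sketch. The functions summarize and build_metrics and the metric names are hypothetical and do not come from e3sm_diags; they stand in for whatever repeated computation your diagnostic set contains.

```python
from statistics import fmean


def summarize(values: list[float]) -> dict[str, float]:
    """Compute the summary statistics that would otherwise be repeated inline."""
    return {"min": min(values), "max": max(values), "mean": fmean(values)}


def build_metrics(test: list[float], ref: list[float]) -> dict[str, dict[str, float]]:
    """Assemble metrics for the test data, reference data, and their difference."""
    diff = [t - r for t, r in zip(test, ref)]
    datasets = (("test", test), ("ref", ref), ("diff", diff))
    return {name: summarize(data) for name, data in datasets}


# pytest collects functions whose names start with "test_".
def test_build_metrics_covers_all_datasets():
    result = build_metrics([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
    assert set(result) == {"test", "ref", "diff"}
    assert result["diff"] == {"min": 0.0, "max": 2.0, "mean": 1.0}
```

Because the small helper has no side effects, it can be tested without running the whole diagnostic set, which lowers the risk noted above.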

3. Regression testing

Objective: Regression testing is performed to ensure that a diagnostic set's metrics produced by your branch's refactored code are reasonably close to those produced by main

  1. Run the regression testing notebook for your branch
  2. Debug branch code as needed
  3. Run test script
  4. Repeat steps 1-3 until the results are within tolerance (a maximum relative difference of 1e-5)
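
To make the tolerance concrete, the sketch below computes the maximum relative difference between two sets of metrics, using the values from main as the reference. It is only an illustration: the regression testing notebooks remain the actual check, and the function name max_relative_diff, the flat dictionary layout, and the metric names are assumptions rather than the format of the real metrics .json files.

```python
import math

RTOL = 1e-5  # Tolerance stated in this guide.


def max_relative_diff(main: dict[str, float], dev: dict[str, float]) -> float:
    """Largest |dev - main| / |main| across all metrics (main is the reference)."""
    if main.keys() != dev.keys():
        raise ValueError("main and dev must contain the same metric names")
    worst = 0.0
    for key, ref in main.items():
        new = dev[key]
        if math.isnan(ref) or math.isnan(new):
            # NaN matches only NaN; a one-sided NaN counts as a failure.
            diff = 0.0 if (math.isnan(ref) and math.isnan(new)) else math.inf
        elif ref == 0.0:
            # Relative difference is undefined at zero, so require an exact match.
            diff = 0.0 if new == 0.0 else math.inf
        else:
            diff = abs(new - ref) / abs(ref)
        worst = max(worst, diff)
    return worst


# Hypothetical metric values used only to demonstrate the comparison.
main_metrics = {"rmse": 1.25, "corr": 0.98, "min": 0.0}
dev_metrics = {"rmse": 1.250001, "corr": 0.98, "min": 0.0}
print(max_relative_diff(main_metrics, dev_metrics) <= RTOL)
```

Treating a zero reference value as requiring an exact match is a design choice made for this sketch; a combined absolute and relative tolerance is another common option.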

General Tips

Set up VS Code

  1. Follow this guide for getting started with VS Code with the recommended extensions. Also install the Remote SSH extension to use VS Code with remote machines via SSH.
  2. Open the e3sm_diags.code-workspace file
    • Automatically configures VS Code extensions, debugger, etc.
  3. Create your mamba development environment
    • Should be done already if you followed "Getting Started"
  4. In VS Code, run "Python: Select Interpreter" and choose your mamba environment

Efficient Development and Debugging with VS Code

VS Code offers the ability to debug by stepping through the code stack in real-time. By installing the Python extension, you will automatically have access to this feature. I use this feature extensively to develop and debug e3sm_diags using test scripts.

  1. Make sure you open VS Code with e3sm_diags.code-workspace
  2. Set breakpoints in your test script and the local e3sm_diags code
  3. Hit F5 to run the Debugger
  4. Step through the code and use the Debug Console to check the values of variables

[Screenshot: VS Code debugger]