Initial commit of modified hopper workflow files. #6

Draft: wants to merge 7 commits into base: develop

drnimbusrain (Member) commented Mar 21, 2023

This draft PR is a work in progress. It contains the initial files needed to create a working conda environment and generate a workflow for the UFS-SRW-App [develop] branch for an ATMAQ build on Hopper.

As in the previously closed PR #5, to build the UFS-SRW-App on Hopper (using our develop branches of UWM, UFS_UTILS, and AQM-utils), first run:

module reset
module load hpc-stack

After following the UFS-SRW-App build procedure on Hopper, the following commands create the conda environment and generate a workflow:

cd ush
module use ../modulefiles
module load wflow_hopper
conda env create -f environment_hopper_wflow.yml
conda activate regional_workflow
module reset
python3 generate_FV3LAM_wflow.py

At the bottom of the environment YAML file, environment_hopper_wflow.yml, be sure to change the prefix path (prefix: /groups/ESS/pcampbe8/anaconda3/envs/regional_workflow) to the local directory where you store conda environments. Once the environment has been created, you only need to reactivate it with conda activate regional_workflow to generate new workflows.
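As a minimal sketch of the prefix change described above, the snippet below rewrites the top-level `prefix:` line of an environment file's text. The helper name `set_env_prefix`, the shortened example YAML, and the placeholder path `/path/to/your/envs` are illustrative assumptions, not part of this PR; the real environment_hopper_wflow.yml has more entries.

```python
def set_env_prefix(yaml_text: str, new_prefix: str) -> str:
    """Return yaml_text with its top-level `prefix:` line pointing at new_prefix."""
    lines = yaml_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        # Only an unindented key counts as the conda prefix entry.
        if line.startswith("prefix:"):
            lines[i] = f"prefix: {new_prefix}"
            replaced = True
    if not replaced:
        # No prefix entry present: append one at the end of the file.
        lines.append(f"prefix: {new_prefix}")
    return "\n".join(lines) + "\n"


# Shortened stand-in for the environment file (assumed contents).
example = (
    "name: regional_workflow\n"
    "dependencies:\n"
    "  - jinja2\n"
    "prefix: /groups/ESS/pcampbe8/anaconda3/envs/regional_workflow\n"
)
# /path/to/your/envs is a placeholder, not a real Hopper path.
updated = set_env_prefix(example, "/path/to/your/envs/regional_workflow")
print(updated)
```

Editing the file by hand works just as well; the sketch only shows that the `prefix:` line is the one entry that must change per user.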

There are some conflicts between the activated conda environment and the loaded modules, hence the module reset before generating the workflow. The subsequent workflow launch, e.g., cd /scratch/pcampbe8/expt_dirs/aqm_community_aqmna13 && ./launch_FV3LAM_wflow.sh called_from_cron="TRUE", runs into a similar conflict between reloading modules and the conda environment. I also added the rocoto bin directory to the PATH in my .bashrc so that the necessary rocotorun commands can be found: export PATH="/opt/sw/other/apps/rocoto/bin/:$PATH". Further modifications are needed in this draft to avoid these errors.
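A quick way to confirm that the rocoto commands are reachable after a module reset is to look them up on the search path. The sketch below uses the standard library's `shutil.which`; the helper name `missing_commands` is hypothetical, and the rocoto directory is the one quoted above.

```python
import os
import shutil


def missing_commands(commands, search_path=None):
    """Return the commands that cannot be found on search_path (default: $PATH)."""
    return [cmd for cmd in commands if shutil.which(cmd, path=search_path) is None]


# The rocoto bin directory quoted above; adjust if rocoto lives elsewhere.
rocoto_bin = "/opt/sw/other/apps/rocoto/bin/"
search_path = rocoto_bin + os.pathsep + os.environ.get("PATH", "")

missing = missing_commands(["rocotorun", "rocotostat"], search_path)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All rocoto commands found")
```

Running this once in the login shell and once from the cron-launched environment would show whether the two see the same PATH.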

@ytangnoaa @zmoon @bbakernoaa I would appreciate it if you could test this method on your end.

DESCRIPTION OF CHANGES:

This draft PR is a work in progress. It contains the initial files needed to create a working conda environment and generate a workflow for the UFS-SRW-App [develop] branch for an ATMAQ build on Hopper.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • [X] New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

TESTS CONDUCTED:

  • hera.intel
  • orion.intel
  • cheyenne.intel
  • cheyenne.gnu
  • gaea.intel
  • jet.intel
  • wcoss2.intel
  • NOAA Cloud (indicate which platform)
  • Jenkins
  • fundamental test suite
  • comprehensive tests (specify which if a subset was used)
  • [ ]

DEPENDENCIES:

DOCUMENTATION:

ISSUE:

CHECKLIST

  • My code follows the style guidelines in the Contributor's Guide
  • I have performed a self-review of my own code using the Code Reviewer's Guide
  • I have commented my code, particularly in hard-to-understand areas
  • My changes need updates to the documentation. I have made corresponding changes to the documentation
  • My changes do not require updates to the documentation (explain).
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • Any dependent changes have been merged and published
  • [X] Hopper

LABELS (optional):

A Code Manager needs to add the following labels to this PR:

  • [X] Work In Progress
  • bug
  • [X] enhancement
  • documentation
  • release
  • high priority
  • run_ci
  • run_we2e_fundamental_tests
  • run_we2e_comprehensive_tests
  • Needs Cheyenne test
  • Needs Jet test
  • Needs Hera test
  • Needs Orion test
  • help wanted

CONTRIBUTORS (optional):

drnimbusrain added the "help wanted" (Extra attention is needed) label Mar 21, 2023
drnimbusrain marked this pull request as draft March 21, 2023 17:51
ush/machine/hopper.yaml: 4 review threads, all resolved (3 marked outdated)
drnimbusrain (Member, Author) commented Mar 24, 2023

@ytangnoaa @zmoon @bbakernoaa With a new workaround I added for Hopper, plus changes to the machine file for the default Hopper sbatch partition and queue/account, the steps above followed by the command below appear to generate the workflow, launch its tasks on Hopper, and add the launch to your crontab:

./launch_FV3LAM_wflow.sh called_from_cron="TRUE"
rocotostat -w FV3LAM_wflow.xml -d FV3LAM_wflow.db 
       CYCLE                    TASK                       JOBID               STATE         EXIT STATUS     TRIES      DURATION
================================================================================================================================
202302170000               make_grid                      666355              QUEUED                   -         0           0.0
202302170000               make_orog                           -                   -                   -         -             -
202302170000          make_sfc_climo                           -                   -                   -         -             -
202302170000           nexus_gfs_sfc                      666356              QUEUED                   -         0           0.0
202302170000       nexus_emission_00                           -                   -                   -         -             -
202302170000       nexus_emission_01                           -                   -                   -         -             -
202302170000       nexus_emission_02                           -                   -                   -         -             -
202302170000        nexus_post_split                           -                   -                   -         -             -
202302170000           fire_emission                      666357              QUEUED                   -         0           0.0
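To track progress without reading the table by eye, the status output can be tallied by state. The sketch below assumes the seven-column plain-text layout shown above (CYCLE, TASK, JOBID, STATE, EXIT STATUS, TRIES, DURATION) and a 12-digit cycle; other rocoto versions may print a different layout. The function name `count_task_states` is hypothetical, and the sample is a subset of the rows above.

```python
from collections import Counter


def count_task_states(rocotostat_output: str) -> Counter:
    """Count tasks per STATE column in plain-text rocotostat output."""
    states = Counter()
    for line in rocotostat_output.splitlines():
        fields = line.split()
        # Data rows start with a 12-digit cycle (YYYYMMDDHHMM) and have 7 fields;
        # the header row and the ==== separator do not match and are skipped.
        if len(fields) == 7 and fields[0].isdigit() and len(fields[0]) == 12:
            states[fields[3]] += 1
    return states


sample = """\
       CYCLE                    TASK                       JOBID               STATE         EXIT STATUS     TRIES      DURATION
================================================================================================================================
202302170000               make_grid                      666355              QUEUED                   -         0           0.0
202302170000               make_orog                           -                   -                   -         -             -
202302170000           nexus_gfs_sfc                      666356              QUEUED                   -         0           0.0
202302170000           fire_emission                      666357              QUEUED                   -         0           0.0
"""
counts = count_task_states(sample)
for state, n in counts.items():
    print(state, n)
```

A state of "-" means the task has not been submitted yet, typically because its dependencies are not satisfied.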

drnimbusrain (Member, Author) commented Mar 24, 2023

However, the launch_FV3LAM_wflow.sh workaround on Hopper (resetting modules) does not really work once the tasks run, because the tasks still need the correct modules loaded:

Loading modules for task "get_extrn_ics" ...
Currently Loaded Modules:

  1) use.own     4) gnu9/9.3.0         7) hwloc/2.1.0
  2) autotools   5) ucx/1.8.0          8) openmpi4/4.0.4
  3) prun/2.0    6) libfabric/1.10.1   9) hosts/hopper
...
ModuleNotFoundError: No module named 'jinja2'

We need a better way to load the correct modules. For example, on Hera the same task reports:

Loading modules for task "get_extrn_ics" ...

Currently Loaded Modules:

  1) hpss/hpss           3) miniconda_regional_workflow
  2) miniconda3/4.12.0   4) get_extrn_ics.local

The regional_workflow environment needs to load successfully during launch_FV3LAM_wflow.sh, as it does on Hera (e.g., via "miniconda_regional_workflow").

ytangnoaa (Collaborator) replied:

Patrick, jinja2 is a Python package, and it is currently included in the python/3.9.9-jh module. However, that Python installation is still missing "f90nml". Building another miniconda version of Python may help.
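To see which of the required packages a given interpreter lacks, a check along these lines can be run with that interpreter. It is a minimal sketch using the standard library's `importlib.util.find_spec`; the helper name `missing_packages` is hypothetical, and the package list contains only the two packages named in this thread.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level package names the running interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# jinja2 and f90nml are the two packages named in this thread.
required = ["jinja2", "f90nml"]
print("Interpreter:", sys.executable)
print("Missing packages:", missing_packages(required) or "none")
```

Running it under each candidate Python (the module-provided one and the conda environment's) shows directly which one can support the workflow tasks.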

drnimbusrain (Member, Author) commented Mar 24, 2023 via email

bbakernoaa (Member) commented Mar 25, 2023 via email

ytangnoaa (Collaborator) replied, quoting an earlier email reply:

Thanks Youhua. Maybe I'm wrong, but I've created and activated the 'regional_workflow' environment (from the included Hopper environment YAML file), which contains jinja2 and the other packages needed to generate the workflow successfully. I think the question is how to get these loaded when launching the workflow as well. Can you help test?


The regional_workflow environment's Python has "jinja2". However, when the tasks are launched via rocoto, the modules actually loaded are:

  1) ucx/1.8.0          6) prun/2.0          11) rocoto/1.3.5
  2) libfabric/1.10.1   7) hosts/hopper      12) miniconda3/22.11.1-gy
  3) hwloc/2.1.0        8) gnu10/10.3.0-ya   13) wflow_hopper
  4) use.own            9) zlib/1.2.11-2y
  5) autotools         10) ruby/3.1.0-4e

miniconda3/22.11.1-gy does not provide the "jinja2" package.
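One way to confirm which Python a task actually picks up is to compare the active conda environment with the interpreter's install prefix. The sketch below relies on `CONDA_DEFAULT_ENV`, which `conda activate` sets to the environment name for named environments; the helper name `conda_env_status` is hypothetical.

```python
import os
import sys


def conda_env_status(expected_env, environ=None, prefix=None):
    """Report whether the expected conda environment is the one actually in use."""
    environ = os.environ if environ is None else environ
    prefix = sys.prefix if prefix is None else prefix
    active = environ.get("CONDA_DEFAULT_ENV")  # set by `conda activate`
    return {
        "active_env": active,
        "python_prefix": prefix,
        # True only if conda reports the env active AND this interpreter lives in it.
        "matches": active == expected_env
        and os.path.basename(os.path.normpath(prefix)) == expected_env,
    }


print(conda_env_status("regional_workflow"))
```

If a task reports `matches: False` while the interactive shell reports `True`, the task is running the base miniconda3 Python rather than the regional_workflow environment, which would explain the missing jinja2.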

drnimbusrain (Member, Author) commented Apr 14, 2023 via email

drnimbusrain (Member, Author) commented Apr 14, 2023 via email
