This is an internal helper utility program that automates the workflow on the HPC cluster. The cluster itself can be found in this repository.
In this document, it is assumed that the HPC cluster is already up and running and that you have logged in to the Slurm login node.
`bp` comes pre-installed on the Slurm login node. You can verify that it's installed by running:

```bash
bp --version
```
If you want to customize the program, make changes to it and install it like any other pip package:

```bash
pip install <path-to-custom-bp>
```
All of the available commands are described below.
The first command should be run before any other command. It copies the dvm-dos-tem model from a GCP bucket and sets up the file system. It doesn't take any arguments.

```bash
bp init
```
Splits the given input set into columns for faster processing. It takes the following arguments:
- `-i/--input-path`: Relative or absolute path to the input files. Required.
- `-b/--batches`: Path to store the split batches. Note that the given value will be concatenated with `/mnt/exacloud/$USER`. Required.
- `-sp/--slurm-partition`: Name of the Slurm partition. Optional, by default `spot`.
- `-p`: Number of pre-run years to run. Optional, by default `0`.
- `-e`: Number of equilibrium years to run. Optional, by default `0`.
- `-s`: Number of spin-up years to run. Optional, by default `0`.
- `-t`: Number of transient years to run. Optional, by default `0`.
- `-n`: Number of scenario years to run. Optional, by default `0`.
- `-l/--log-level`: Level of logging. Optional, by default `disabled`.
For example, if the following command is run:

```bash
bp batch split -i /mnt/exacloud/dvmdostem-input/my-big-input-dataset -b first-run -p 100 -e 1000 -s 85 -t 115 -n 85 --log-level warn
```

you should be able to see your batch folders in `/mnt/exacloud/$USER/first-run`, where `$USER` is the username of the currently logged-in user.
You can check `slurm_runner.sh` to see the details of the job.
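As a quick sanity check, you can list the batch folders that were created (assuming the default layout described above):

```bash
# List the batch folders created by `bp batch split`.
# $USER expands to the name of the currently logged-in user.
ls /mnt/exacloud/$USER/first-run
```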
Submits all of the jobs in the given batch folder to Slurm. It takes one argument:

- `-b/--batches`: Path that stores the job folders.
Assuming `bp batch split` was run with `-b first-run`, running `bp batch run -b first-run` submits all the jobs in that folder to the Slurm controller.
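Once submitted, the jobs can be monitored with standard Slurm tooling (this is ordinary Slurm usage, not part of `bp`); for example:

```bash
# Show the current user's pending and running jobs in the Slurm queue.
squeue -u "$USER"
```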
Combines the results of all batches. It should be run after all jobs are finished. It takes one argument:
- `-b/--batches`: Path that stores the job folders.
Assuming `bp batch merge -b first-run` is run, it looks for the `/mnt/exacloud/$USER/first-run` folder, gathers the results, and puts them into an `all-merged` folder inside the batch folder (i.e. `/mnt/exacloud/$USER/first-run/all-merged`).
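To confirm the merge completed, you can list the merged results (assuming the layout above):

```bash
# The merged results should appear under the batch folder.
ls /mnt/exacloud/$USER/first-run/all-merged
```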
Plots the status of a run by checking individual cell statuses, and writes the cells that have not succeeded to a text file for further reference. It takes one argument:

- `-b/--batches`: Path that stores the job folders.
When `bp map -b first-run` is run, it creates `run_status_visualization.png` and `failed_cell_coords.txt` in `/mnt/exacloud/$USER/first-run`.
These files can be copied to a local environment or a bucket using the `gcloud` or `gsutil` tools.
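For example, to copy the status plot to a GCS bucket with `gsutil` (the bucket name below is a placeholder, not a real bucket):

```bash
# Copy the visualization to a bucket; replace gs://<your-bucket> accordingly.
gsutil cp /mnt/exacloud/$USER/first-run/run_status_visualization.png gs://<your-bucket>/
```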
Compares the NetCDF files in the two given directories. It takes two positional arguments:
- todo
Extracts a single cell from the given input set. It takes the following arguments:
- todo
Slices the given big input set into 10 smaller pieces by spawning a processing node in the cluster. It works with input sets that have more than 500,000 cells. It takes the following arguments:
- todo
It is pretty easy to start working on the project:
```bash
git clone https://github.com/whrc/batch-processing.git
cd batch-processing/
pip install -r requirements.txt
pre-commit install
```
You are good to go!
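If you also want local changes to take effect without reinstalling after every edit, an editable install is one option (this assumes the repository is packaged as a standard pip package, as the installation section above suggests):

```bash
# Install the package in editable mode so code changes apply immediately.
pip install -e .

# Optionally, run all pre-commit hooks once against the whole repository.
pre-commit run --all-files
```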