
Running Gen 3 workflow at Cambridge


This page describes the process of running the Gen 3 DRP workflow on the CSD3 system at Cambridge. It should also be applicable to other similar systems (i.e. HPC systems that use Slurm and Singularity).

Container Image

The LSST software stack and Parsl are deployed via a container image. The image used is lsstdesc/desc-drp-stack:weekly, available on DockerHub. It can be downloaded and converted to sandbox format as follows:

singularity pull desc-drp-stack_weekly.sif docker://lsstdesc/desc-drp-stack:weekly
singularity build --sandbox sandbox-weekly/ desc-drp-stack_weekly.sif

It is not essential to convert the image to sandbox format; the workflow can also be run using the SIF file directly. However, the sandbox format is convenient because it makes it easy to apply minor changes to the software inside the container when required.
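
For example, minor changes can be made by opening a writable shell in the sandbox (a minimal sketch, using the sandbox directory built above):

# Open an interactive shell with write access to the sandbox image
singularity shell --writable sandbox-weekly/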

Job Script

The top level Slurm job script for running the workflow is shown below:

#!/bin/bash
#SBATCH -A IRIS-IP005-CPU
#SBATCH -p cclake
#SBATCH --nodes=32
#SBATCH --ntasks=32
#SBATCH --time=12:00:00

singularity exec -B /usr/local/software/slurm,/usr/lib64/libmunge.so.2,/usr/lib64/libmunge.so.2.0.0,/var/run/munge,/usr/lib64/liblua.so,/usr/lib64/liblua-5.1.so /home/ir-perr3/rds/rds-iris-ip005/jamesp/parsl/sandbox-weekly/ /home/ir-perr3/parsl/gen3_workflow/container_inner.sh

The parameters passed to SBATCH will vary for different systems. Passing --nodes=32 and --ntasks=32 causes Slurm to allocate 32 nodes for the workflow. One of these nodes runs the Parsl workflow manager itself; the other 31 run the workflow's tasks, with Parsl managing the resource allocation.

The bind mounts passed to Singularity are required in order for Slurm to work inside the container. The details of these may vary depending on the machine's configuration and the version of Slurm installed.
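
Assuming the job script above is saved as, say, run_workflow.sh (a hypothetical name), it is submitted and monitored with the usual Slurm commands:

# Submit the workflow job and check its state in the queue
sbatch run_workflow.sh
squeue -u $USER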

The container_inner.sh script

Inside the container, the container_inner.sh script is executed. This script contains the following:

#!/bin/bash

export PATH=/usr/local/software/slurm/current/bin:$PATH
echo "slurm:x:8900:8900::/usr/local/software/slurm:/bin/false" >> /etc/passwd
echo "slurm:x:8900:" >> /etc/group

cd /home/ir-perr3/parsl/gen3_workflow
source /opt/lsst/software/stack/loadLSST.bash
setup lsst_distrib
setup -r . -j

bps submit bps_drp_baseline_csd3.yaml

The first block of commands sets up the PATH and adds a user and a group so that Slurm commands can be run within the container. Again, the details are likely to vary with the machine's configuration and the Slurm version. The second block sets up the LSST software stack and the gen3_workflow software.
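
One way to verify this setup (a sketch; it may also require the passwd/group entries added above) is to run a simple Slurm command inside the container, using the same bind mounts as the job script:

# Should print the partition summary if the Slurm binaries and the
# munge socket are correctly bound into the container.
singularity exec -B /usr/local/software/slurm,/usr/lib64/libmunge.so.2,/usr/lib64/libmunge.so.2.0.0,/var/run/munge,/usr/lib64/liblua.so,/usr/lib64/liblua-5.1.so sandbox-weekly/ /usr/local/software/slurm/current/bin/sinfo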

The final command actually starts the workflow running. bps_drp_baseline_csd3.yaml is a copy of the example YAML file, modified slightly for CSD3. The most important modifications are to set the repo location and the Parsl config file to be used. In this case the Parsl config file is set as follows:

parslConfig: desc.gen3_workflow.config.workQueue_csd3_config

With this configuration, a commandPrepend to launch a Singularity container for each command is not required; this is handled at a lower level and will be discussed later.

Parsl Config File

The Parsl config file for CSD3 is shown below:

"""Parsl config for WorkQueueExecutor using a SlurmProvider"""
import parsl
from parsl.executors import WorkQueueExecutor, ThreadPoolExecutor
from parsl.providers import SlurmProvider, LocalProvider
from parsl.launchers import SrunLauncher

PROVIDER_OPTIONS = dict(nodes_per_block=31,
                        init_blocks=0,
                        min_blocks=0,
                        max_blocks=1,
                        parallelism=0,
                        launcher=SrunLauncher(),
                        cmd_timeout=300)

# Not used in this LocalProvider-based setup; these directives would be
# passed as scheduler_options if a SlurmProvider were used instead.
SCHEDULER_OPTIONS = ("#SBATCH -A IRIS-IP005-CPU")

provider = LocalProvider(**PROVIDER_OPTIONS)

executors = [WorkQueueExecutor(label='work_queue', port=9000, shared_fs=True,
                               provider=provider, autolabel=False,
                               worker_executable='singularity exec /home/ir-perr3/rds/rds-iris-ip005/jamesp/parsl/sandbox-weekly/ /home/ir-perr3/parsl/gen3_workflow/python/desc/gen3_workflow/csd3_wrapper.sh work_queue_worker'),
             ThreadPoolExecutor(max_threads=1, label='submit-node')]
config = parsl.config.Config(strategy='simple',
                             garbage_collect=False,
                             app_cache=True,
                             executors=executors,
                             retries=1)
DFK = parsl.load(config)

The executor is set up to use the resource allocation requested by the Slurm job, with Parsl managing the assignment of tasks to those resources. The worker_executable launches each Work Queue worker inside a Singularity container, running csd3_wrapper.sh to set up the environment as required; the tasks executed by a worker then inherit that environment.
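
In effect, each worker on a compute node is started with a command of the following form (a sketch; the trailing work_queue_worker arguments are supplied by Parsl at run time):

singularity exec /home/ir-perr3/rds/rds-iris-ip005/jamesp/parsl/sandbox-weekly/ \
    /home/ir-perr3/parsl/gen3_workflow/python/desc/gen3_workflow/csd3_wrapper.sh \
    work_queue_worker <arguments supplied by Parsl>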

csd3_wrapper.sh

The wrapper script runs inside the container started for each worker. It simply sets up the LSST stack environment and then launches the executable that was passed in (in this case work_queue_worker). This time it is not necessary to set up the environment for Slurm, since no Slurm commands are required at this level.

#!/bin/bash

source /opt/lsst/software/stack/loadLSST.bash
setup lsst_distrib
cd /home/ir-perr3/parsl/gen3_workflow
setup -r . -j

# Run the command passed in as arguments, preserving their quoting
"$@"
exit $?
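
The wrapper can be tested by hand by passing it a trivial command; for example (a sketch, using the paths above):

singularity exec /home/ir-perr3/rds/rds-iris-ip005/jamesp/parsl/sandbox-weekly/ \
    /home/ir-perr3/parsl/gen3_workflow/python/desc/gen3_workflow/csd3_wrapper.sh \
    python -c "import lsst.daf.butler; print('stack environment OK')"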

Resuming the workflow

The maximum time allowed for a normal Slurm job on CSD3 is 12 hours, so for workflows that take longer than this it is necessary to resume them in a new Slurm job after the original job is terminated. A second top level job script is used to implement this:

#!/bin/bash
#SBATCH -A IRIS-IP005-CPU
#SBATCH -p cclake
#SBATCH --nodes=32
#SBATCH --ntasks=32
#SBATCH --time=12:00:00

singularity exec -B /usr/local/software/slurm,/usr/lib64/libmunge.so.2,/usr/lib64/libmunge.so.2.0.0,/var/run/munge,/usr/lib64/liblua.so,/usr/lib64/liblua-5.1.so /home/ir-perr3/rds/rds-iris-ip005/jamesp/parsl/sandbox-weekly/ /home/ir-perr3/parsl/gen3_workflow/container_inner_resume.sh

It is very similar to the original script except that the script run within the container is container_inner_resume.sh.

container_inner_resume.sh

Again this script is similar to the one used for launching a new workflow (container_inner.sh), but instead of submitting the workflow using the bps command, it runs a Python script named resume_workflow.py:

#!/bin/bash

export PATH=/usr/local/software/slurm/current/bin:$PATH
echo "slurm:x:8900:8900::/usr/local/software/slurm:/bin/false" >> /etc/passwd
echo "slurm:x:8900:" >> /etc/group

cd /home/ir-perr3/parsl/gen3_workflow
source /opt/lsst/software/stack/loadLSST.bash
setup lsst_distrib
setup -r . -j

python resume_workflow.py

resume_workflow.py

This Python script is used to restore a workflow that was previously started:

#!/usr/bin/env python

from desc.gen3_workflow import ParslGraph
graph = ParslGraph.restore('/home/ir-perr3/parsl/gen3_workflow/submit/shared/drp_test/20210514T093524Z/parsl_graph_config.pickle')
print("Parsl graph status:")
graph.status()
print("Resuming workflow...")
graph.run(block=True)

The name of the pickle file containing the workflow graph to resume is currently hardcoded into this script and must be edited whenever a different workflow needs to be resumed.
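
The pickle files for previously submitted workflows can be listed from the submit directory tree, e.g.:

# List the workflow graphs available for resuming
find /home/ir-perr3/parsl/gen3_workflow/submit -name parsl_graph_config.pickle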


Gen3 Computing Resource Usage at CSD3

(posted by Jim, 2021-07-23)

Similar to my posting on using Cori-Haswell at NERSC for DC2 tract 3828 Y1 processing, I've made plots showing the cpu time and memory usage for James' Gen 3 runs at CSD3 on the same data. Note that James' run used the older, single-band deblender, whereas the runs at NERSC used the Scarlet deblender. Based on the isr task plots, it appears that James' run used a version of the DRP pipeline that also did not include the Brighter-Fatter correction.

Visit-level processing

(plots of cpu time and memory usage for the visit-level tasks)

Coadd/Multiband Processing

(plots of cpu time and memory usage for the coadd/multiband tasks)

Here's a plot of the number of concurrent processes as a function of time for each of the task types. These data were obtained from the Parsl monitoring.db file for the small test data set, which had data for just 5 visits per band and only the CCDs covering a single patch.

These results are similar to those from a much earlier run I did on these same data using Parsl, before the advent of the ctrl_bps plugin.

Here are plots of concurrent processes for the full DC2 Y1 tract 3828 processing. These data appear to have been processed in two runs, submitted for 12 hours each.