From 011710f0cf6e2e7514338f6cb4acd535b6329c81 Mon Sep 17 00:00:00 2001 From: "Michael R. Crusoe" <1330696+mr-c@users.noreply.github.com> Date: Thu, 29 Jun 2023 08:07:45 +0200 Subject: [PATCH 01/13] Create toil-cwl-runner Basic structure --- docs/computing/running/toil-cwl-runner | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) create mode 100644 docs/computing/running/toil-cwl-runner diff --git a/docs/computing/running/toil-cwl-runner b/docs/computing/running/toil-cwl-runner new file mode 100644 index 0000000000..741da3a019 --- /dev/null +++ b/docs/computing/running/toil-cwl-runner @@ -0,0 +1,23 @@ +# Running CWL workflows on Puhti with `toil-cwl-runner` + +## Strengths of the Common Workflow Language standards + +## Strengths of `toil-cwl-runner` + +## Disadvantages for using CWL + +## Disadvantages for using `toil-cwl-runner` + +## Installing `toil-cwl-runner` + +(and nodejs) + +## Defining CWL workflows + +(link to existing docs) + +## Running CWL workflows with `toil-cwl-runner` + +(use sbatch to submit toil-cwl-runner job to submit further jobs) + +## Monitoring a running workflow From 642af2ee3d73972deda5a4d84012829c1b310e30 Mon Sep 17 00:00:00 2001 From: "Michael R. 
Crusoe" <1330696+mr-c@users.noreply.github.com> Date: Thu, 29 Jun 2023 08:56:23 +0200 Subject: [PATCH 02/13] add motivation --- docs/computing/running/toil-cwl-runner | 23 ----------------- docs/computing/running/toil-cwl-runner.md | 31 +++++++++++++++++++++++ 2 files changed, 31 insertions(+), 23 deletions(-) delete mode 100644 docs/computing/running/toil-cwl-runner create mode 100644 docs/computing/running/toil-cwl-runner.md diff --git a/docs/computing/running/toil-cwl-runner b/docs/computing/running/toil-cwl-runner deleted file mode 100644 index 741da3a019..0000000000 --- a/docs/computing/running/toil-cwl-runner +++ /dev/null @@ -1,23 +0,0 @@ -# Running CWL workflows on Puhti with `toil-cwl-runner` - -## Strengths of the Common Workflow Language standards - -## Strengths of `toil-cwl-runner` - -## Disadvantages for using CWL - -## Disadvantages for using `toil-cwl-runner` - -## Installing `toil-cwl-runner` - -(and nodejs) - -## Defining CWL workflows - -(link to existing docs) - -## Running CWL workflows with `toil-cwl-runner` - -(use sbatch to submit toil-cwl-runner job to submit further jobs) - -## Monitoring a running workflow diff --git a/docs/computing/running/toil-cwl-runner.md b/docs/computing/running/toil-cwl-runner.md new file mode 100644 index 0000000000..0cf95a1a52 --- /dev/null +++ b/docs/computing/running/toil-cwl-runner.md @@ -0,0 +1,31 @@ +# Running CWL workflows on Puhti with `toil-cwl-runner` + +[Common Workflow Language](https://www.commonwl.org/) is a popular set of open standards implemented by several workflow runners and platforms. +The CWL standards are targeted at creating portable workflows made of command line programs. The steps can be written in any compiled or interpreted language. +Sub-workflows, optional steps, scatter-gather, and implicit parallelism are just some of the features. + +The [Toil workflow system](https://toil.ucsc-cgl.org/) supports running CWL on a variety of schedulers and systems. 
+ +This page describes how to run CWL workflows on Puhti using `toil-cwl-runner`, including the usage of `apptainer` to execute any provided Docker-format containers. + +## Strengths of the Common Workflow Language standards + +## Strengths of `toil-cwl-runner` + +## Disadvantages for using CWL + +## Disadvantages for using `toil-cwl-runner` + +## Installing `toil-cwl-runner` + +(and nodejs) + +## Defining CWL workflows + +(link to existing docs) + +## Running CWL workflows with `toil-cwl-runner` + +(use sbatch to submit toil-cwl-runner job to submit further jobs) + +## Monitoring a running workflow From e983bc6418119ad642f13dbb2e79b0562c81d154 Mon Sep 17 00:00:00 2001 From: "Michael R. Crusoe" <1330696+mr-c@users.noreply.github.com> Date: Thu, 29 Jun 2023 09:20:26 +0200 Subject: [PATCH 03/13] pros and cons --- docs/computing/running/toil-cwl-runner.md | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/docs/computing/running/toil-cwl-runner.md b/docs/computing/running/toil-cwl-runner.md index 0cf95a1a52..5903dcbc52 100644 --- a/docs/computing/running/toil-cwl-runner.md +++ b/docs/computing/running/toil-cwl-runner.md @@ -10,11 +10,22 @@ This page describes how run CWL worklflows on Puhti using `toil-cwl-runner`, inc ## Strengths of the Common Workflow Language standards +- Open standard (free to read, free to contribute to) governed by a [not-for-profit charity which is legally obligated to work in the public interest]([https://sfconservancy.org/](https://sfconservancy.org/news/2018/apr/11/cwl-new-member-project/)). +- [Multiple implementations](https://www.commonwl.org/implementations/) of the CWL standards +- Used in many [different fields of research](https://www.commonwl.org/gallery/) +- YAML based syntax with [special support in many IDEs](https://www.commonwl.org/tools/#editors) +- Supports, but does not require, software containers. Can also work with conda packages, `module load` environments, and locally available software. 
+- CWL's model works hard to keep site-specific details out of the workflow definition, enabling portability between laptops, clusters, and cloud systems. + ## Strengths of `toil-cwl-runner` +- Supports sending jobs to Slurm, translating CWL resource requirements to Slurm resources specifications. +- Can also run on other batch systems: Grid Engine, Torque, LSF, HTCondor. +- Launches and monitors Slurm jobs for you. Also constructs the apptainer commands. ## Disadvantages for using CWL ## Disadvantages for using `toil-cwl-runner` +- Just a workflow runner. Won't manage your data, or keep track of previous workflow runs. ## Installing `toil-cwl-runner` @@ -22,7 +33,9 @@ This page describes how run CWL worklflows on Puhti using `toil-cwl-runner`, inc ## Defining CWL workflows -(link to existing docs) +Learning resources +- [Novice CWL tutorial](https://carpentries-incubator.github.io/cwl-novice-tutorial/), includes detailed setup instructions for local editing and running on Microsoft Windows, macOS, and Linux +- ## Running CWL workflows with `toil-cwl-runner` From 260c015951d3b1d98f0fe2ef42943272f77f2117 Mon Sep 17 00:00:00 2001 From: "Michael R. Crusoe" <1330696+mr-c@users.noreply.github.com> Date: Thu, 29 Jun 2023 09:27:06 +0200 Subject: [PATCH 04/13] Logo --- docs/computing/running/toil-cwl-runner.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/computing/running/toil-cwl-runner.md b/docs/computing/running/toil-cwl-runner.md index 5903dcbc52..c9338b4362 100644 --- a/docs/computing/running/toil-cwl-runner.md +++ b/docs/computing/running/toil-cwl-runner.md @@ -1,6 +1,8 @@ # Running CWL workflows on Puhti with `toil-cwl-runner` -[Common Workflow Language](https://www.commonwl.org/) is a popular set of open standards implemented by several workflow runners and platforms. + + +The [Common Workflow Language](https://www.commonwl.org/) is a popular set of open standards implemented by several workflow runners and platforms. 
The CWL standards are targeted at creating portable workflows made of command line programs. The steps can be written in any compiled or interpreted language. Sub-workflows, optional steps, scatter-gather, and implicit parallelism are just some of the features. From a2288a0c403f9b9d944fbbab69c6de09dad25094 Mon Sep 17 00:00:00 2001 From: Teemu Kataja Date: Thu, 29 Jun 2023 10:28:03 +0300 Subject: [PATCH 05/13] add toil installation and running instructions --- docs/computing/running/toil-cwl-runner.md | 63 ++++++++++++++++++++++- 1 file changed, 61 insertions(+), 2 deletions(-) diff --git a/docs/computing/running/toil-cwl-runner.md b/docs/computing/running/toil-cwl-runner.md index c9338b4362..4da2fc92fa 100644 --- a/docs/computing/running/toil-cwl-runner.md +++ b/docs/computing/running/toil-cwl-runner.md @@ -31,7 +31,20 @@ This page describes how run CWL worklflows on Puhti using `toil-cwl-runner`, inc ## Installing `toil-cwl-runner` -(and nodejs) +Install `toil` with `CWL` plugin. +``` +pip install -U setuptools wheel +pip install toil[cwl] +toil-cwl-runner --version +``` + +Install `nodejs` which provides helpful tools for debugging `toil` internals. +``` +cd /projappl/project_nnnnnnn +wget https://nodejs.org/dist/v18.16.1/node-v18.16.1-linux-x64.tar.xz +tar -xf node-v18.16.1-linux-x64.tar.xz +export PATH=$PATH:/projappl/project_nnnnnnn/node-v18.16.1-linux-x64/bin +``` ## Defining CWL workflows @@ -41,6 +54,52 @@ Learning resources ## Running CWL workflows with `toil-cwl-runner` -(use sbatch to submit toil-cwl-runner job to submit further jobs) +!!! Note + Singularity containers can't be run in the **login node** or in an **interactive session** due to network constraints. + +When you have defined a workflow with `CWL`, you can send it to the cluster using `sbatch`, and then `toil` will start new jobs for each item in the workflow description. + +### Preliminary Steps +Create working directories for `toil`. 
+``` +mkdir /projappl/project_nnnnnnn/ +mkdir /projappl/project_nnnnnnn/ +mkdir /projappl/project_nnnnnnn/ +``` + +### Creating the sbatch file +The `sbatch` file `workflow.sh` will reference the `CWL` file `workflow.cwl` where you have described your workflow steps. + +`workflow.sh` +```bash +#!/bin/sh +#SBATCH --job-name= +#SBATCH --account= +#SBATCH --time=01:00:00 +#SBATCH --mem-per-cpu=1G +#SBATCH --nodes=1 +#SBATCH --cpus-per-task=2 +#SBATCH --partition= + +WORKDIR=/projappl/project_nnnnnnn +SCRATCH=/scratch/project_nnnnnnn +export TMPDIR=$WORKDIR/tmp +export TOIL_SLURM_ARGS="--account=project_nnnnnnn --partition=small" +export CWL_SINGULARITY_CACHE=$WORKDIR/singularity +unset XDG_RUNTIME_DIR + +TOIL_SLURM_ARGS="--account= --partition=" toil-cwl-runner \ + --jobStore $WORKDIR/ \ + --workDir $SCRATCH/ \ + --tmpdir-prefix $TMPDIR/ \ + --batchSystem slurm \ + $WORKDIR/workflow.cwl \ + --message "message for job" +``` + +Send your workflow to the cluster. +``` +sbatch workflow.sh +``` ## Monitoring a running workflow From 8f33e7ffb50ed803829613cbfd62408c3d4425b2 Mon Sep 17 00:00:00 2001 From: Teemu Kataja Date: Thu, 29 Jun 2023 10:30:22 +0300 Subject: [PATCH 06/13] add python venv --- docs/computing/running/toil-cwl-runner.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/docs/computing/running/toil-cwl-runner.md b/docs/computing/running/toil-cwl-runner.md index 4da2fc92fa..526ee5443a 100644 --- a/docs/computing/running/toil-cwl-runner.md +++ b/docs/computing/running/toil-cwl-runner.md @@ -33,8 +33,13 @@ This page describes how run CWL worklflows on Puhti using `toil-cwl-runner`, inc Install `toil` with `CWL` plugin. 
``` +cd /projappl/ +python -m venv venv +source venv/bin/activate + pip install -U setuptools wheel pip install toil[cwl] + toil-cwl-runner --version ``` @@ -81,6 +86,8 @@ The `sbatch` file `workflow.sh` will reference the `CWL` file `workflow.cwl` whe #SBATCH --cpus-per-task=2 #SBATCH --partition= +source /projappl//venv/bin/activate + WORKDIR=/projappl/project_nnnnnnn SCRATCH=/scratch/project_nnnnnnn export TMPDIR=$WORKDIR/tmp From eb09d09b5210d2d8d4566e329a18442b129c0ac0 Mon Sep 17 00:00:00 2001 From: Teemu Kataja Date: Thu, 29 Jun 2023 10:37:00 +0300 Subject: [PATCH 07/13] add note about SBATCH headers --- docs/computing/running/toil-cwl-runner.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/docs/computing/running/toil-cwl-runner.md b/docs/computing/running/toil-cwl-runner.md index 526ee5443a..ca98787e68 100644 --- a/docs/computing/running/toil-cwl-runner.md +++ b/docs/computing/running/toil-cwl-runner.md @@ -75,6 +75,9 @@ mkdir /projappl/project_nnnnnnn/ ### Creating the sbatch file The `sbatch` file `workflow.sh` will reference the `CWL` file `workflow.cwl` where you have described your workflow steps. +!!! Note + See [batch documentation](./creating-job-scripts-puhti.md) on how to fill out the `#SBATCH` values. 
+ `workflow.sh` ```bash #!/bin/sh From a1d628e2ee26bd80978ee1dda05a958dcb053710 Mon Sep 17 00:00:00 2001 From: Teemu Kataja Date: Thu, 29 Jun 2023 10:52:47 +0300 Subject: [PATCH 08/13] set real partition name as example --- docs/computing/running/toil-cwl-runner.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/computing/running/toil-cwl-runner.md b/docs/computing/running/toil-cwl-runner.md index ca98787e68..69e623f80a 100644 --- a/docs/computing/running/toil-cwl-runner.md +++ b/docs/computing/running/toil-cwl-runner.md @@ -87,7 +87,7 @@ The `sbatch` file `workflow.sh` will reference the `CWL` file `workflow.cwl` whe #SBATCH --mem-per-cpu=1G #SBATCH --nodes=1 #SBATCH --cpus-per-task=2 -#SBATCH --partition= +#SBATCH --partition=small source /projappl//venv/bin/activate @@ -98,7 +98,7 @@ export TOIL_SLURM_ARGS="--account=project_nnnnnnn --partition=small" export CWL_SINGULARITY_CACHE=$WORKDIR/singularity unset XDG_RUNTIME_DIR -TOIL_SLURM_ARGS="--account= --partition=" toil-cwl-runner \ +TOIL_SLURM_ARGS="--account= --partition=small" toil-cwl-runner \ --jobStore $WORKDIR/ \ --workDir $SCRATCH/ \ --tmpdir-prefix $TMPDIR/ \ From ff0b5d220bad9c937cdbc721b6bbd10f55963228 Mon Sep 17 00:00:00 2001 From: Teemu Kataja Date: Thu, 29 Jun 2023 11:05:18 +0300 Subject: [PATCH 09/13] add links --- docs/computing/running/throughput.md | 7 +++++-- docs/support/tutorials/index.md | 1 + 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/docs/computing/running/throughput.md b/docs/computing/running/throughput.md index e2f06942e0..60be7aecde 100644 --- a/docs/computing/running/throughput.md +++ b/docs/computing/running/throughput.md @@ -58,8 +58,8 @@ graph TD E -->|Serial| F(GNU Parallel
Array jobs
HyperQueue) E -->|Parallel| G(Single- or multinode subtasks?) G -->|Single| H(Dependencies between subtasks?) - G -->|Multi| I(FireWorks) - H -->|Yes| J(Snakemake
Nextflow
FireWorks) + G -->|Multi| I(FireWorks
Toil-CWL-Runner) + H -->|Yes| J(Snakemake
Nextflow
FireWorks
Toil-CWL-Runner) H -->|No| K(HyperQueue) ``` @@ -157,6 +157,8 @@ graph TD with `xargs`, see [xargsjob.sh] for example. * [FireWorks] is a flexible tool for defining, managing and executing workflows with multiple steps and complex dependencies +* [Toil-CWL-Runner] is an open source workflow manager using the Common + Workflow Language open standards * [HyperQueue] is a tool for efficient sub-node task scheduling * [Nextflow workflows using HyperQueue as an executor] can be leveraged to run large workflows involving thousands of processes efficiently @@ -199,6 +201,7 @@ workflows. [HyperQueue]: ../../apps/hyperqueue.md [GNU Parallel]: ../../support/tutorials/many.md [FireWorks]: fireworks.md +[Toil-CWL-Runner]: toil-cwl-runner.md [contact CSC Service Desk]: ../../support/contact.md [Nextflow]: ../../support/tutorials/nextflow-puhti.md [Snakemake]: https://snakemake.readthedocs.io/en/stable/ diff --git a/docs/support/tutorials/index.md b/docs/support/tutorials/index.md index 18cdc45c01..a801503e01 100644 --- a/docs/support/tutorials/index.md +++ b/docs/support/tutorials/index.md @@ -24,6 +24,7 @@ * [General high-throughput guidelines](../../computing/running/throughput.md) * [Running Nextflow workflows using HyperQueue](nextflow-hq.md) * [FireWorks workflow manager](../../computing/running/fireworks.md) +* [Toil CWL workflow manager](../../computing/running/toil-cwl-runner.md) * [How to run many short jobs with GNU Parallel](many.md) * [HyperQueue meta-scheduler](../../apps/hyperqueue.md) From 914dc1117f5c5a69aa6860035425a88a5a6bed8a Mon Sep 17 00:00:00 2001 From: "Michael R. 
Crusoe" <1330696+mr-c@users.noreply.github.com> Date: Tue, 11 Jul 2023 10:39:19 +0200 Subject: [PATCH 10/13] Apply suggestions from code review Co-authored-by: Joonas Somero <50655931+joonas-somero@users.noreply.github.com> --- docs/computing/running/toil-cwl-runner.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/computing/running/toil-cwl-runner.md b/docs/computing/running/toil-cwl-runner.md index 69e623f80a..0cad2d3a71 100644 --- a/docs/computing/running/toil-cwl-runner.md +++ b/docs/computing/running/toil-cwl-runner.md @@ -1,6 +1,6 @@ # Running CWL workflows on Puhti with `toil-cwl-runner` - +![CWL Logo](https://raw.githubusercontent.com/common-workflow-language/cwl-website/main/content/assets/img/CWL-Logo-HD-cropped2.png){ width=50% } The [Common Workflow Language](https://www.commonwl.org/) is a popular set of open standards implemented by several workflow runners and platforms. The CWL standards are targeted at creating portable workflows made of command line programs. The steps can be written in any compiled or interpreted language. @@ -12,7 +12,7 @@ This page describes how run CWL worklflows on Puhti using `toil-cwl-runner`, inc ## Strengths of the Common Workflow Language standards -- Open standard (free to read, free to contribute to) governed by a [not-for-profit charity which is legally obligated to work in the public interest]([https://sfconservancy.org/](https://sfconservancy.org/news/2018/apr/11/cwl-new-member-project/)). +- Open standard (free to read, free to contribute to) governed by a [not-for-profit charity which is legally obligated to work in the public interest](https://sfconservancy.org/news/2018/apr/11/cwl-new-member-project/). 
- [Multiple implementations](https://www.commonwl.org/implementations/) of the CWL standards - Used in many [different fields of research](https://www.commonwl.org/gallery/) - YAML based syntax with [special support in many IDEs](https://www.commonwl.org/tools/#editors) @@ -54,6 +54,7 @@ export PATH=$PATH:/projappl/project_nnnnnnn/node-v18.16.1-linux-x64/bin ## Defining CWL workflows Learning resources + - [Novice CWL tutorial](https://carpentries-incubator.github.io/cwl-novice-tutorial/), includes detailed setup instructions for local editing and running on Microsoft Windows, macOS, and Linux - From a5b169aa086cbf32986003caf3d9b2d59660d6bc Mon Sep 17 00:00:00 2001 From: "Michael R. Crusoe" Date: Tue, 11 Jul 2023 11:05:02 +0200 Subject: [PATCH 11/13] Monitoring a running toil-cwl-runner --- docs/computing/running/toil-cwl-runner.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/docs/computing/running/toil-cwl-runner.md b/docs/computing/running/toil-cwl-runner.md index 0cad2d3a71..9e7809bc58 100644 --- a/docs/computing/running/toil-cwl-runner.md +++ b/docs/computing/running/toil-cwl-runner.md @@ -21,10 +21,10 @@ This page describes how run CWL worklflows on Puhti using `toil-cwl-runner`, inc ## Strengths of `toil-cwl-runner` - Supports sending jobs to Slurm, translating CWL resource requirements to Slurm resources specifications. +- Even when using Slurm, (sub-)tasks do not have to have identical resource requirements. - Can also run on other batch systems: Grid Engine, Torque, LSF, HTCondor. -- Launches and monitors Slurm jobs for you. Also constructs the apptainer commands. - -## Disadvantages for using CWL +- Launches and monitors Slurm jobs for you. Also constructs the `apptainer` commands (or some other software container engine as appropriate: `docker`, `podman`, `singularity`, `udocker`). +- No database needs to be setup. ## Disadvantages for using `toil-cwl-runner` - Just a workflow runner. 
Won't manage your data, or keep track of previous workflow runs. @@ -114,3 +114,6 @@ sbatch workflow.sh ``` ## Monitoring a running workflow + +Check the output logs from the main Toil job or +run `toil status $WORKDIR/`. From c2daeecb55cffff64b44827b05d2c6c0f23a665e Mon Sep 17 00:00:00 2001 From: "Michael R. Crusoe" <1330696+mr-c@users.noreply.github.com> Date: Tue, 11 Jul 2023 15:01:40 +0200 Subject: [PATCH 12/13] python3 to install --- docs/computing/running/toil-cwl-runner.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/computing/running/toil-cwl-runner.md b/docs/computing/running/toil-cwl-runner.md index 9e7809bc58..1f2b9bd9b3 100644 --- a/docs/computing/running/toil-cwl-runner.md +++ b/docs/computing/running/toil-cwl-runner.md @@ -34,7 +34,7 @@ This page describes how run CWL worklflows on Puhti using `toil-cwl-runner`, inc Install `toil` with `CWL` plugin. ``` cd /projappl/ -python -m venv venv +python3 -m venv venv source venv/bin/activate pip install -U setuptools wheel From fb992810fea58f3428ce9b46b39ab7ba2e9a33b1 Mon Sep 17 00:00:00 2001 From: "Michael R. Crusoe" <1330696+mr-c@users.noreply.github.com> Date: Fri, 14 Jul 2023 15:13:51 +0200 Subject: [PATCH 13/13] Update docs/computing/running/toil-cwl-runner.md Co-authored-by: Joonas Somero <50655931+joonas-somero@users.noreply.github.com> --- docs/computing/running/toil-cwl-runner.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/computing/running/toil-cwl-runner.md b/docs/computing/running/toil-cwl-runner.md index 1f2b9bd9b3..72271de363 100644 --- a/docs/computing/running/toil-cwl-runner.md +++ b/docs/computing/running/toil-cwl-runner.md @@ -33,8 +33,10 @@ This page describes how run CWL worklflows on Puhti using `toil-cwl-runner`, inc Install `toil` with `CWL` plugin. ``` +module load python-data + cd /projappl/ -python3 -m venv venv +python -m venv venv source venv/bin/activate pip install -U setuptools wheel