Conversation

@tefirman tefirman commented Nov 13, 2025

Description

  • Added new "WDL Workflows" article covering WDL fundamentals
  • Converted Cromwell page to "WDL Execution Engines" with expanded coverage of Cromwell/Sprocket/miniWDL and a comparison of which one is best for which use-case
  • Overhauled the "Using Workflows" article with WDL vs Nextflow comparison
  • Added WDL background section to "PROOF How-To" with links to WILDS WDL Library
  • Updated cross-references in SciComp articles (Compute Environments, Parallel Computing, Batch Computing pathway, Software Overview)
  • Reordered left sidebar menu for better navigation

Related Issue

Testing

  • Built site locally, looks good, tested most links.

@tefirman tefirman marked this pull request as ready for review November 21, 2025 16:02
@tefirman tefirman requested a review from a team as a code owner November 21, 2025 16:02
@emjbishop emjbishop left a comment

Overall this is an amazing amount of great work and is a wonderful overhaul of the sciwiki info related to WDL. At times it feels almost too comprehensive, to the point of being redundant or creating confusion.

I do think wdl_workflows.md needs some major rethinking ("WDL workflows" is a big topic) - is this meant to be a sales pitch, a tutorial, or a reference? I would consider the purpose of this page and its target audience and modify it to suit just that one purpose with those people in mind. Lots of linking out may be necessary.

> Quick note: the Cromwell server is referred to as a PROOF server in these instructions. PROOF handles setting up the Cromwell server for you.
> Quick note: Throughout this guide, "PROOF server" and "Cromwell server" refer to the same thing - PROOF just handles all the Cromwell setup for you.
## Using PROOF

Under this section it says to use the Marconi network, is that still true? It should be Research Staff now, right? Line 51

Comment on lines +20 to +49
## Why Use Workflows?

### Without Workflows

A typical bioinformatics analysis might involve:
1. Quality control on raw sequencing reads
2. Alignment to a reference genome
3. Variant calling
4. Variant annotation
5. Statistical analysis

Managing this manually requires:
- Writing custom shell scripts for each step
- Manually tracking which samples completed which steps
- Re-running entire analyses when one step fails
- Coordinating resource requests for different tools
- Ensuring consistent software versions across runs
- Managing intermediate file storage

This approach is error-prone, time-consuming, and difficult to reproduce.

### With Workflows

Workflow systems handle all of this automatically:
- Define your analysis once in a workflow language
- The workflow manager handles job scheduling, retries, and data management
- Easily scale from one sample to thousands
- Share workflows with collaborators who can reproduce your exact analysis
- Update workflows incrementally as your analysis evolves


I think "why workflows" is addressed in the intro paragraph and this extra text can be taken out.

Comment on lines +54 to +93
### WDL (Workflow Description Language)

**Best for:**
- Users who want a simple, human-readable workflow language
- Workflows that need to run on multiple platforms (local, HPC, cloud)
- Integration with Fred Hutch infrastructure via PROOF
- Projects that would benefit from growing bioinformatics task libraries

**Key features:**
- Open-source language developed by the Broad Institute
- Multiple execution engine options (Sprocket, miniWDL, Cromwell)
- Strong focus on reproducibility via containerization
- [WILDS WDL Library](/datascience/wilds_wdl/) provides tested, reusable components
- [PROOF](/datascience/proof/) platform for easy execution on Fred Hutch cluster

**Learn more:**
- [WDL Workflows Guide](/datascience/wdl_workflows/) - Language fundamentals
- [WDL Execution Engines](/datascience/wdl_execution_engines/) - How to run WDL workflows
- [WILDS WDL Library](/datascience/wilds_wdl/) - Ready-to-use modules and workflows
- [PROOF](/datascience/proof/) - User-friendly workflow submission at Fred Hutch

### Nextflow

**Best for:**
- Users comfortable with Groovy/Java-like syntax
- Workflows requiring complex logic or custom operations
- Integration with nf-core community workflows
- Projects needing fine-grained control over execution

**Key features:**
- Mature ecosystem with extensive community support
- Native support for containers, conda, and modules
- Powerful DSL (Domain Specific Language) for complex workflows
- Large collection of pre-built workflows via [nf-core](https://nf-co.re/pipelines)
- Active Fred Hutch community

**Learn more:**
- [Nextflow at Fred Hutch](/compdemos/nextflow/) - Getting started guide
- [Nextflow Catalog](/datascience/nextflow_catalog) - Fred Hutch curated workflows
- [nf-core](https://nf-co.re/pipelines) - Community workflow catalog

Suggested change
### WDL (Workflow Description Language)
**Best for:**
- Users who want a simple, human-readable workflow language
- Workflows that need to run on multiple platforms (local, HPC, cloud)
- Integration with Fred Hutch infrastructure via PROOF
- Projects that would benefit from growing bioinformatics task libraries
**Key features:**
- Open-source language developed by the Broad Institute
- Multiple execution engine options (Sprocket, miniWDL, Cromwell)
- Strong focus on reproducibility via containerization
- [WILDS WDL Library](/datascience/wilds_wdl/) provides tested, reusable components
- [PROOF](/datascience/proof/) platform for easy execution on Fred Hutch cluster
**Learn more:**
- [WDL Workflows Guide](/datascience/wdl_workflows/) - Language fundamentals
- [WDL Execution Engines](/datascience/wdl_execution_engines/) - How to run WDL workflows
- [WILDS WDL Library](/datascience/wilds_wdl/) - Ready-to-use modules and workflows
- [PROOF](/datascience/proof/) - User-friendly workflow submission at Fred Hutch
### Nextflow
**Best for:**
- Users comfortable with Groovy/Java-like syntax
- Workflows requiring complex logic or custom operations
- Integration with nf-core community workflows
- Projects needing fine-grained control over execution
**Key features:**
- Mature ecosystem with extensive community support
- Native support for containers, conda, and modules
- Powerful DSL (Domain Specific Language) for complex workflows
- Large collection of pre-built workflows via [nf-core](https://nf-co.re/pipelines)
- Active Fred Hutch community
**Learn more:**
- [Nextflow at Fred Hutch](/compdemos/nextflow/) - Getting started guide
- [Nextflow Catalog](/datascience/nextflow_catalog) - Fred Hutch curated workflows
- [nf-core](https://nf-co.re/pipelines) - Community workflow catalog
### WDL (Workflow Description Language)
**Best for:**
- Users who want a simple, human-readable workflow language
- Integration with Fred Hutch infrastructure via PROOF
**Key features:**
- Open-source language developed by the Broad Institute
- [WILDS WDL Library](/datascience/wilds_wdl/) provides tested, reusable WDLs built by and for Fred Hutch scientists
- [PROOF](/datascience/proof/) platform for easy execution on Fred Hutch cluster
**Learn more:**
- [WDL Workflows Guide](/datascience/wdl_workflows/) - Language fundamentals
- [WDL Execution Engines](/datascience/wdl_execution_engines/) - How to run WDL workflows
- [WILDS WDL Library](/datascience/wilds_wdl/) - Ready-to-use modules and workflows
- [GATK Workflows](https://github.com/gatk-workflows)
- [PROOF](/datascience/proof/) - User-friendly workflow submission at Fred Hutch
### Nextflow
**Best for:**
- Users comfortable with Groovy/Java-like syntax
- Integration with [nf-core](https://nf-co.re/pipelines) community workflows
- Projects needing fine-grained control over execution
**Key features:**
- Mature ecosystem with extensive community support
- Native support for containers, conda, and modules
- Large collection of pre-built workflows via [nf-core](https://nf-co.re/pipelines)
- Active Fred Hutch community
**Learn more:**
- [Nextflow at Fred Hutch](/compdemos/nextflow/) - Getting started guide
- [Nextflow Catalog](/datascience/nextflow_catalog) - Fred Hutch curated workflows
- [nf-core](https://nf-co.re/pipelines) - Community workflow catalog

I would take out lines that are implied or don't help distinguish the two from each other. I would also try to note that with WDL we can take requests for WDL Library additions, and add the GATK workflows link.


| Consideration | WDL | Nextflow |
|--------------|-----|----------|
| **Learning curve** | Gentle - simple, declarative syntax | Moderate - requires Groovy knowledge |

Suggested change
| **Learning curve** | Gentle - simple, declarative syntax | Moderate - requires Groovy knowledge |
| **Learning curve** | Gentle - simple, declarative syntax | Moderate - Groovy/Java-like syntax |

Trying to keep these in the same style. Also, I don't think being a coder in Groovy is actually a prerequisite

| **Execution options** | Multiple engines (Sprocket, miniWDL, Cromwell) | Nextflow runtime |
| **Local testing** | Easy with Sprocket/miniWDL | Easy with Nextflow |
| **Pre-built workflows** | WILDS WDL Library, GATK workflows | nf-core (500+ workflows) |
| **Language style** | Declarative (what to do) | Imperative (how to do it) |

Suggested change
| **Language style** | Declarative (what to do) | Imperative (how to do it) |

I don't think this line adds much to someone's decision making

Comment on lines +226 to +241
## Choosing the Right Engine

### Decision Guide

**For Fred Hutch researchers running on Gizmo or AWS Batch:**
- Use **PROOF** (Cromwell backend) for workflow submission with all features managed for you
- Use **Sprocket or miniWDL** locally to test before scaling up

**For local workflow development and testing:**
- Use **Sprocket** for easy installation, clear error messages, and actively maintained modern execution
- Use **miniWDL** as an alternative if Sprocket doesn't meet your needs
- Consider testing locally before submitting to PROOF/Cromwell

**For production pipelines:**
- Use **Cromwell** for call caching, server mode, and robust HPC integration
- Leverage PROOF if you want infrastructure managed for you

Suggested change
## Choosing the Right Engine
### Decision Guide
**For Fred Hutch researchers running on Gizmo or AWS Batch:**
- Use **PROOF** (Cromwell backend) for workflow submission with all features managed for you
- Use **Sprocket or miniWDL** locally to test before scaling up
**For local workflow development and testing:**
- Use **Sprocket** for easy installation, clear error messages, and actively maintained modern execution
- Use **miniWDL** as an alternative if Sprocket doesn't meet your needs
- Consider testing locally before submitting to PROOF/Cromwell
**For production pipelines:**
- Use **Cromwell** for call caching, server mode, and robust HPC integration
- Leverage PROOF if you want infrastructure managed for you
## Summary
**For Fred Hutch researchers running on Gizmo or AWS Batch:**
- Use **PROOF** (Cromwell backend). Must use WDL version 1.0.
- Consider testing locally before submitting to PROOF/Cromwell
**For local workflow development and testing:**
- Use **Sprocket** for actively maintained modern execution
- Use **miniWDL** as an alternative if Sprocket doesn't meet your needs

I think this can be simplified

Comment on lines +243 to +251
### Workflow Portability

A key benefit of WDL is that workflows written for one engine generally work on others. The [WILDS WDL Library](/datascience/wilds_wdl/) ensures portability by testing all components on Cromwell, miniWDL, and Sprocket.

**Tips for portability:**
- Use WDL version 1.0 (required for PROOF)
- Specify Docker containers (not institution-specific modules)
- Use standard WDL features, avoiding engine-specific extensions
- Test on multiple engines if sharing workflows broadly
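As a concrete illustration of the portability tips quoted above, a portable task pins the WDL version and uses a Docker image rather than an institution-specific module (the task name and image tag here are illustrative, not taken from the PR):

```wdl
version 1.0  # pinned spec version; required for PROOF

task count_lines {
  input {
    File infile
  }
  command <<<
    wc -l < ~{infile}
  >>>
  output {
    Int n = read_int(stdout())
  }
  runtime {
    # A public container instead of `module load`, so any engine
    # (Cromwell, miniWDL, Sprocket) resolves the same software.
    docker: "ubuntu:22.04"
  }
}
```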

Suggested change
### Workflow Portability
A key benefit of WDL is that workflows written for one engine generally work on others. The [WILDS WDL Library](/datascience/wilds_wdl/) ensures portability by testing all components on Cromwell, miniWDL, and Sprocket.
**Tips for portability:**
- Use WDL version 1.0 (required for PROOF)
- Specify Docker containers (not institution-specific modules)
- Use standard WDL features, avoiding engine-specific extensions
- Test on multiple engines if sharing workflows broadly
**All WDLs in the [WILDS WDL Library](/datascience/wilds_wdl/) have been tested on Cromwell, miniWDL, and Sprocket.**

I think this is the only important point here


- [OpenWDL Community](https://openwdl.org/)
- [WDL Slack Workspace](https://join.slack.com/t/openwdl/shared_invite/zt-ctmj4mhf-cFBNxIiZYs6SY9HgM9UAVw)
- Engine-specific GitHub repositories (linked above)

Suggested change
- Engine-specific GitHub repositories (linked above)


The Workflow Description Language (WDL) is an open-source language for describing data processing workflows with human-readable syntax. WDL makes it straightforward to define analysis tasks, chain them together in workflows, and parallelize their execution across different computing environments.
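To make that description concrete, here is a minimal sketch of a WDL 1.0 task and workflow (hypothetical names, not taken from the PR) showing a task definition, a scatter for parallel execution, and a containerized runtime:

```wdl
version 1.0

# A single task: inputs, a command, declared outputs, and a container runtime.
task say_hello {
  input {
    String name
  }
  command <<<
    echo "Hello, ~{name}!"
  >>>
  output {
    String greeting = read_string(stdout())
  }
  runtime {
    docker: "ubuntu:22.04"
  }
}

# A workflow that runs the task across an array of inputs in parallel.
workflow hello_workflow {
  input {
    Array[String] names
  }
  scatter (n in names) {
    call say_hello { input: name = n }
  }
  output {
    Array[String] greetings = say_hello.greeting
  }
}
```

The same file can then be handed to any WDL 1.0 engine (e.g. `miniwdl run`), which is the portability point the page goes on to make.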

## Why Use WDL?

Are we selling it or assuming they have chosen to try WDL? If the point of this page is a gentle introduction, I would cut this section out and go straight to WDL Fundamentals


The same workflow file runs identically across all these environments without modification.

## WDL Fundamentals

I wonder if it would be better to just point to the WDL course and maybe have this page be a high level summary of that (possibly with some tips as below).

This page feels like a tutorial but not a comprehensive one. I would decide if this page is a tutorial or a reference.


Development

Successfully merging this pull request may close these issues.

cromwell page doesn't mention PROOF server

3 participants