Refreshing Fred Hutch WDL Content #1276
base: main
…ge to an Execution Engines page
… to environments article
Overall this is an amazing amount of great work and is a wonderful overhaul of the sciwiki info related to WDL. At times it feels almost too comprehensive to the point of being redundant or creating confusion.
I do think wdl_workflows.md needs some major rethinking ("WDL workflows" is a big topic). Is this meant to be a sales pitch, a tutorial, or a reference? I would consider the purpose of this page and its target audience and modify it to suit just that one purpose with those people in mind. Lots of linking out may be necessary.
| > Quick note: the Cromwell server is referred to as a PROOF server in these instructions. PROOF handles setting up the Cromwell server for you. | ||
| > Quick note: Throughout this guide, "PROOF server" and "Cromwell server" refer to the same thing - PROOF just handles all the Cromwell setup for you. | ||
| ## Using PROOF |
Under this section it says to use the Marconi network. Is that still true? It should be Research Staff now, right? (Line 51)
| ## Why Use Workflows? | ||
| ### Without Workflows | ||
| A typical bioinformatics analysis might involve: | ||
| 1. Quality control on raw sequencing reads | ||
| 2. Alignment to a reference genome | ||
| 3. Variant calling | ||
| 4. Variant annotation | ||
| 5. Statistical analysis | ||
| Managing this manually requires: | ||
| - Writing custom shell scripts for each step | ||
| - Manually tracking which samples completed which steps | ||
| - Re-running entire analyses when one step fails | ||
| - Coordinating resource requests for different tools | ||
| - Ensuring consistent software versions across runs | ||
| - Managing intermediate file storage | ||
| This approach is error-prone, time-consuming, and difficult to reproduce. | ||
| ### With Workflows | ||
| Workflow systems handle all of this automatically: | ||
| - Define your analysis once in a workflow language | ||
| - The workflow manager handles job scheduling, retries, and data management | ||
| - Easily scale from one sample to thousands | ||
| - Share workflows with collaborators who can reproduce your exact analysis | ||
| - Update workflows incrementally as your analysis evolves | ||
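As a rough illustration of what "handles job scheduling, retries, and data management" can mean, here is a toy Python sketch of step tracking with retries. It is not how Cromwell, Sprocket, or miniWDL are implemented: the `run_pipeline` helper, the step names, and the retry count are all hypothetical, and real engines also schedule jobs and stage files, which this omits.

```python
from typing import Callable, Dict, List, Set, Tuple

# A step is a (name, zero-argument function) pair; both are illustrative.
Step = Tuple[str, Callable[[], None]]


def run_pipeline(steps: List[Step], done: Set[str], max_retries: int = 2) -> Dict[str, int]:
    """Run steps in order, skipping finished ones and retrying failures.

    `done` records completed step names, so calling this again resumes
    after the last successful step instead of redoing the whole analysis.
    Returns the number of attempts made for each step run in this call.
    """
    attempts: Dict[str, int] = {}
    for name, func in steps:
        if name in done:
            continue  # completed in an earlier run
        for attempt in range(1, max_retries + 2):
            attempts[name] = attempt
            try:
                func()
            except RuntimeError:
                if attempt > max_retries:
                    raise  # out of retries; `done` still holds earlier steps
            else:
                done.add(name)
                break
    return attempts


if __name__ == "__main__":
    finished: Set[str] = set()
    state = {"align_calls": 0}

    def align() -> None:
        # Fails once, then succeeds, to simulate a transient error.
        state["align_calls"] += 1
        if state["align_calls"] == 1:
            raise RuntimeError("simulated transient failure")

    print(run_pipeline([("qc", lambda: None), ("align", align)], finished))
    # prints {'qc': 1, 'align': 2}
```

The point of the sketch is only that completed work is recorded per step, so a failure in alignment does not force quality control to run again.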
I think "why workflows" is addressed in the intro paragraph and this extra text can be taken out.
| ### WDL (Workflow Description Language) | ||
| **Best for:** | ||
| - Users who want a simple, human-readable workflow language | ||
| - Workflows that need to run on multiple platforms (local, HPC, cloud) | ||
| - Integration with Fred Hutch infrastructure via PROOF | ||
| - Projects that would benefit from growing bioinformatics task libraries | ||
| **Key features:** | ||
| - Open-source language developed by the Broad Institute | ||
| - Multiple execution engine options (Sprocket, miniWDL, Cromwell) | ||
| - Strong focus on reproducibility via containerization | ||
| - [WILDS WDL Library](/datascience/wilds_wdl/) provides tested, reusable components | ||
| - [PROOF](/datascience/proof/) platform for easy execution on Fred Hutch cluster | ||
| **Learn more:** | ||
| - [WDL Workflows Guide](/datascience/wdl_workflows/) - Language fundamentals | ||
| - [WDL Execution Engines](/datascience/wdl_execution_engines/) - How to run WDL workflows | ||
| - [WILDS WDL Library](/datascience/wilds_wdl/) - Ready-to-use modules and workflows | ||
| - [PROOF](/datascience/proof/) - User-friendly workflow submission at Fred Hutch | ||
| ### Nextflow | ||
| **Best for:** | ||
| - Users comfortable with Groovy/Java-like syntax | ||
| - Workflows requiring complex logic or custom operations | ||
| - Integration with nf-core community workflows | ||
| - Projects needing fine-grained control over execution | ||
| **Key features:** | ||
| - Mature ecosystem with extensive community support | ||
| - Native support for containers, conda, and modules | ||
| - Powerful DSL (Domain Specific Language) for complex workflows | ||
| - Large collection of pre-built workflows via [nf-core](https://nf-co.re/pipelines) | ||
| - Active Fred Hutch community | ||
| **Learn more:** | ||
| - [Nextflow at Fred Hutch](/compdemos/nextflow/) - Getting started guide | ||
| - [Nextflow Catalog](/datascience/nextflow_catalog) - Fred Hutch curated workflows | ||
| - [nf-core](https://nf-co.re/pipelines) - Community workflow catalog |
Suggested change:
| ### WDL (Workflow Description Language) | |
| **Best for:** | |
| - Users who want a simple, human-readable workflow language | |
| - Integration with Fred Hutch infrastructure via PROOF | |
| **Key features:** | |
| - Open-source language developed by the Broad Institute | |
| - [WILDS WDL Library](/datascience/wilds_wdl/) provides tested, reusable WDLs built by and for Fred Hutch scientists | |
| - [PROOF](/datascience/proof/) platform for easy execution on Fred Hutch cluster | |
| **Learn more:** | |
| - [WDL Workflows Guide](/datascience/wdl_workflows/) - Language fundamentals | |
| - [WDL Execution Engines](/datascience/wdl_execution_engines/) - How to run WDL workflows | |
| - [WILDS WDL Library](/datascience/wilds_wdl/) - Ready-to-use modules and workflows | |
| - [GATK Workflows](https://github.com/gatk-workflows) | |
| - [PROOF](/datascience/proof/) - User-friendly workflow submission at Fred Hutch | |
| ### Nextflow | |
| **Best for:** | |
| - Users comfortable with Groovy/Java-like syntax | |
| - Integration with [nf-core](https://nf-co.re/pipelines) community workflows | |
| - Projects needing fine-grained control over execution | |
| **Key features:** | |
| - Mature ecosystem with extensive community support | |
| - Native support for containers, conda, and modules | |
| - Large collection of pre-built workflows via [nf-core](https://nf-co.re/pipelines) | |
| - Active Fred Hutch community | |
| **Learn more:** | |
| - [Nextflow at Fred Hutch](/compdemos/nextflow/) - Getting started guide | |
| - [Nextflow Catalog](/datascience/nextflow_catalog) - Fred Hutch curated workflows | |
| - [nf-core](https://nf-co.re/pipelines) - Community workflow catalog |
I would take out lines that are implied or don't help distinguish the two from each other. I would also try to note that with WDL we can take requests for WDL Library additions. Also add the GATK library.
| | Consideration | WDL | Nextflow | | ||
| |--------------|-----|----------| | ||
| | **Learning curve** | Gentle - simple, declarative syntax | Moderate - requires Groovy knowledge | |
| | **Learning curve** | Gentle - simple, declarative syntax | Moderate - requires Groovy knowledge | | |
| | **Learning curve** | Gentle - simple, declarative syntax | Moderate - Groovy/Java-like syntax | |
Trying to keep these in the same style. Also, I don't think being a coder in Groovy is actually a prerequisite
| | **Execution options** | Multiple engines (Sprocket, miniWDL, Cromwell) | Nextflow runtime | | ||
| | **Local testing** | Easy with Sprocket/miniWDL | Easy with Nextflow | | ||
| | **Pre-built workflows** | WILDS WDL Library, GATK workflows | nf-core (500+ workflows) | | ||
| | **Language style** | Declarative (what to do) | Imperative (how to do it) | |
| | **Language style** | Declarative (what to do) | Imperative (how to do it) | |
I don't think this line adds much to someone's decision making.
| ## Choosing the Right Engine | ||
| ### Decision Guide | ||
| **For Fred Hutch researchers running on Gizmo or AWS Batch:** | ||
| - Use **PROOF** (Cromwell backend) for workflow submission with all features managed for you | ||
| - Use **Sprocket or miniWDL** locally to test before scaling up | ||
| **For local workflow development and testing:** | ||
| - Use **Sprocket** for easy installation, clear error messages, and actively maintained modern execution | ||
| - Use **miniWDL** as an alternative if Sprocket doesn't meet your needs | ||
| - Consider testing locally before submitting to PROOF/Cromwell | ||
| **For production pipelines:** | ||
| - Use **Cromwell** for call caching, server mode, and robust HPC integration | ||
| - Leverage PROOF if you want infrastructure managed for you |
Suggested change:
| ## Summary | |
| **For Fred Hutch researchers running on Gizmo or AWS Batch:** | |
| - Use **PROOF** (Cromwell backend). Must use WDL version 1.0. | |
| - Consider testing locally before submitting to PROOF/Cromwell | |
| **For local workflow development and testing:** | |
| - Use **Sprocket** for actively maintained modern execution | |
| - Use **miniWDL** as an alternative if Sprocket doesn't meet your needs |
I think this can be simplified
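To show how little the simplified summary has to say, here is a hedged Python sketch of it as a two-branch rule. The function name `recommend_engine` and the target labels are made up for illustration; they are not part of PROOF, Cromwell, Sprocket, or miniWDL.

```python
def recommend_engine(target: str) -> str:
    """Map where a workflow will run to the engine named in the summary.

    `target` is a hypothetical label: "gizmo", "aws_batch", or "local".
    """
    if target in {"gizmo", "aws_batch"}:
        # Fred Hutch cluster or AWS Batch: submit through PROOF.
        return "PROOF (Cromwell backend, WDL 1.0)"
    if target == "local":
        # Local development and testing.
        return "Sprocket (miniWDL as an alternative)"
    raise ValueError(f"unknown target: {target!r}")


if __name__ == "__main__":
    for label in ("gizmo", "aws_batch", "local"):
        print(label, "->", recommend_engine(label))
```

If the summary really fits in two branches like this, that supports keeping the page section to those few lines.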
| ### Workflow Portability | ||
| A key benefit of WDL is that workflows written for one engine generally work on others. The [WILDS WDL Library](/datascience/wilds_wdl/) ensures portability by testing all components on Cromwell, miniWDL, and Sprocket. | ||
| **Tips for portability:** | ||
| - Use WDL version 1.0 (required for PROOF) | ||
| - Specify Docker containers (not institution-specific modules) | ||
| - Use standard WDL features, avoiding engine-specific extensions | ||
| - Test on multiple engines if sharing workflows broadly |
Suggested change:
| **All WDLs in the [WILDS WDL Library](/datascience/wilds_wdl/) have been tested on Cromwell, miniWDL, and Sprocket.** |
I think this is the only important point here
| - [OpenWDL Community](https://openwdl.org/) | ||
| - [WDL Slack Workspace](https://join.slack.com/t/openwdl/shared_invite/zt-ctmj4mhf-cFBNxIiZYs6SY9HgM9UAVw) | ||
| - Engine-specific GitHub repositories (linked above) |
| - Engine-specific GitHub repositories (linked above) |
| The Workflow Description Language (WDL) is an open-source language for describing data processing workflows with human-readable syntax. WDL makes it straightforward to define analysis tasks, chain them together in workflows, and parallelize their execution across different computing environments. | ||
| ## Why Use WDL? |
Are we selling it or assuming they have chosen to try WDL? If the point of this page is a gentle introduction, I would cut this section out and go straight to WDL Fundamentals.
| The same workflow file runs identically across all these environments without modification. | ||
| ## WDL Fundamentals |
I wonder if it would be better to just point to the WDL course and maybe have this page be a high level summary of that (possibly with some tips as below).
This page feels like a tutorial but not a comprehensive one. I would decide if this page is a tutorial or a reference.