diff --git a/CHANGELOG.md b/CHANGELOG.md
index 380317b..0ce9855 100755
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -8,7 +8,12 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
 ---
 
 ## [Unreleased]
+
+---
+
+## [5.0.0-rc.1] - 2023-01-24
 ### Changed
+- Update `README.md` for release `5.0.0-rc.1`
 - Move param checking to `methods.config` using `schema.config`
 - Parameterize Docker registry
 - Use `ghcr.io/uclahs-cds` as default registry
diff --git a/README.md b/README.md
index c7fe69b..600682a 100644
--- a/README.md
+++ b/README.md
@@ -14,7 +14,7 @@
 * [License](#license)
 
 ## Overview:
-The call-sSV pipeline calls somatic structural variants utilizing [Delly](https://github.com/dellytools/delly). This pipeline requires at least one tumor sample and a matched normal sample.
+The call-sSV pipeline calls somatic structural variants utilizing [DELLY](https://github.com/dellytools/delly) and [Manta](https://github.com/Illumina/manta). This pipeline requires at least one tumor sample and a matched normal sample.
 This pipeline is developed using Nextflow, docker and can run either on a single node linux machine or a multi-node HPC Slurm cluster.
 
 ## How to Run:
@@ -41,7 +41,7 @@ Pipelines should be run **WITH A SINGLE SAMPLE AT A TIME**. Otherwise resource a
 nextflow run path/to/main.nf -config path/to/sample-specific.config
 ```
 
-* For example, `path/to/main.nf` could be: `/hot/software/pipeline/pipeline-call-sSV/Nextflow/release/3.0.0/main.nf`
+* For example, `path/to/main.nf` could be: `/hot/software/pipeline/pipeline-call-sSV/Nextflow/release/5.0.0/main.nf`
 * `path/to/sample-specific.config` is the path to where you saved your project-specific copy of [template.config](config/template.config)
 
 To submit to UCLAHS-CDS's Azure cloud, use the submission script [here](https://github.com/uclahs-cds/tool-submit-nf) with the command below:
@@ -56,21 +56,23 @@ python path/to/submit_nextflow_pipeline.py \
 ```
 In the above command, the partition type can be changed based on the size of the dataset. At this point, node F16 is generally recommended for larger datasets like A-full and node F2 for smaller datasets like A-mini.
 
+\* Manta SV calling does not work on an F2 node due to incompatible resources. To test the pipeline tasks that do not involve Manta, set `algorithm = ['delly']` in the sample-specific [config](config/template.config) file.
+
 > **Note**: Because this pipeline uses an image stored in the GitHub Container Registry, you must follow the steps listed in the [Docker Introduction](https://confluence.mednet.ucla.edu/display/BOUTROSLAB/Docker+Introduction#DockerIntroduction-GitHubContainerRegistryGitHubContainerRegistry|Setup) on Confluence to set up a PAT for your GitHub account and log into the registry on the cluster before running this pipeline.
 
 ## Flow Diagram:
-![](https://github.com/uclahs-cds/pipeline-call-sSV/blob/yupan-documentation/call-sSV-workflow.svg)
+![call-sSV flow diagram](call-sSV-workflow.svg?raw=true)
 
 ## Pipeline Steps:
-### Call Somatic Structural Variants:
+### Call Somatic Structural Variants - DELLY workflow:
 #### 1. Calling Single Sample Somatic Structural Variants
 ```script
 delly call --genome hg38.fa --exclude hg38.excl --map-qual 20 --min-clique-size 5 --mad-cutoff 15 --outfile t1.bcf tumor1.bam normal1.bam
 ```
-This step requires an aligned and sorted tumor sample BAM file and a matched normal sample as an input for variant calling with Delly.
+This step requires an aligned and sorted tumor sample BAM file and a matched normal sample as an input for variant calling with DELLY.
 The stringent filters (`--map-qual 20` `--min-clique-size 5` `--mad-cutoff 15`) are added, which can drastically reduce the runtime, especially when the input BAMs are big. In the pipeline, these filters are specified in the NextFlow input parameters [config file](config/template.config). If need be, these stringent filters can be adjusted in the config file.
 
 #### 2. Query the generated bcfs to get the sample names, which will be used in step 3.
@@ -88,6 +90,14 @@ This step applies somatic filtering against the `.bcf` file generated in Step 1.
 
 Note: cohort based false positive filtering is compuationally heavy and not implemented in this pipeline.
 
+### Call Somatic Structural Variants - Manta workflow:
+
+#### 1. Calling Single Sample Somatic Structural Variants
+```script
+configManta.py --normalBam "${normal_bam}" --tumorBam "${tumor_bam}" --referenceFasta "${reference_fasta}" --runDir MantaWorkflow
+MantaWorkflow/runWorkflow.py
+```
+This step requires an aligned and sorted tumor sample BAM file and a matched normal sample as an input for variant calling with Manta.
 
 ## Inputs
 
@@ -95,14 +105,14 @@ Note: cohort based false positive filtering is compuationally heavy and not impl
 
 The input CSV should have each of the input fields listed below as separate columns, using the same order and comma as column separator. An example of the input CSV can be found [here](input/call-sSV-input.csv).
 
-| Field | Type | Description |
+| Input | Type | Required | Description |
 |--- | --- | --- |
-|normal_bam | string | Absolute path to the normal sample `.bam` file |
-|tumor_bam | string | Absolute path to the tumor sample `.bam` file. |
+| normal_bam | string | yes | Absolute path to the normal sample `.bam` file |
+| tumor_bam | string | yes | Absolute path to the tumor sample `.bam` file |
 
 ## Nextflow Config File Parameters
 
-| Input Parameter | Required | Type | Description |
+| Field | Required | Type | Description |
 | ------- | --------- | ------ | -------------|
 | dataset_id | yes | string | Boutros Lab dataset id |
 | blcds_registered_dataset | yes | boolean | Affirms if dataset should be registered in the Boutros Lab Data registry. Default value is `false`. |
@@ -123,13 +133,15 @@ An example of the NextFlow Input Parameters Config file can be found [here](conf
 
 ## Outputs
 
-| Output | Output type | Description |
-| ---- | ----- | -------- |
-| .bcf | final | Binary VCF output format with somatic structural variants if found. |
-| .bcf.csi | final | CSI-format index for BCF files |
-| report.html, timeline.html and trace.txt | log | A Nextflow report, timeline and trace files. |
-| \*.log.command.* | log | Process and sample specific logging files created by nextflow. |
-| *.sha512 | checksum | Generates SHA-512 hash to validate file integrity. |
+| Output | Description |
+| ---- | -------- |
+| .bcf | Binary VCF output format from DELLY with somatic structural variants if found. |
+| .bcf.csi | CSI-format index for BCF files from DELLY. |
+| .vcf.gz | Bgzip-compressed VCF output format from Manta with somatic structural variants if found. |
+| .vcf.gz.tbi | Tabix (TBI) index for the compressed VCF files from Manta. |
+| report.html, timeline.html and trace.txt | A Nextflow report, timeline and trace files. |
+| \*.log.command.* | Process- and sample-specific logging files created by Nextflow. |
+| *.sha512 | Generates SHA-512 hash to validate file integrity. |
 
 ## Testing and Validation
 
@@ -138,51 +150,37 @@ An example of the NextFlow Input Parameters Config file can be found [here](conf
 
 | Data Set | Run Configuration | Output Dir | Normal Sample | Tumor Sample |
 | ------ | ------ | ------- | ------ | ------- |
-| A-mini | /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/3.0.0/mmoootor-upgrade-delly-0.9.1-to-1.0.3/config/A-mini-hg38.config | /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/3.0.0/mmoootor-upgrade-delly-0.9.1-to-1.0.3/A-mini/call-sSV-2.0.0/S2.T-0/DELLY-1.0.3/output/ | /hot/resource/SMC-HET/normal/bams/A-mini/0/output/HG002.N-0.bam | /hot/resource/SMC-HET/tumours/A-mini/bams/0/output/S2.T-0.bam |
-| A-full | /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/3.0.0/mmoootor-upgrade-delly-0.9.1-to-1.0.3/config/A-full-F72-hg19.config | /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/3.0.0/mmoootor-upgrade-delly-0.9.1-to-1.0.3/A-full-F72/call-sSV-2.0.0/T5.T.sorted_py/DELLY-1.0.3/output/ | /hot/resource/SMC-HET/normal/bams/HG002.N.bam | /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/input/data/T5.T.sorted_py.bam |
-| ILHNLNEV000001-T001-P01-F | /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/3.0.0/mmoootor-upgrade-delly-0.9.1-to-1.0.3/config/ILHNLNEV000001-T001-P01-F-F32-hg38.config | /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/3.0.0/mmoootor-upgrade-delly-0.9.1-to-1.0.3/ILHNLNEV000001-T001-P01-F-F32/call-sSV-2.0.0/ILHNLNEV000001-T001-P01-F_realigned_recalibrated_reheadered/DELLY-1.0.3/output/ | /hot/user/rhughwhite/ILHNLNEV/call-gSNP/output/2020-12-22/ILHNLNEV000001-T001-P01-F/gSNP/2021-01-05_22.01.08/ILHNLNEV000001/SAMtools-1.10_Picard-2.23.3/recalibrated_reheadered_bam_and_bai/ILHNLNEV000001-N001-B01-F_realigned_recalibrated_reheadered.bam | /hot/user/rhughwhite/ILHNLNEV/call-gSNP/output/2020-12-22/ILHNLNEV000001-T001-P01-F/gSNP/2021-01-05_22.01.08/ILHNLNEV000001/SAMtools-1.10_Picard-2.23.3/recalibrated_reheadered_bam_and_bai/ILHNLNEV000001-T001-P01-F_realigned_recalibrated_reheadered.bam |
-| ILHNLNEV000005-T002-L01-F | /hot/user/rhughwhite/ILHNLNEV/call-sSV/inputs_configs/2021-09-10/ILHNLNEV000005-T002-L01-F/nextflow.config | /hot/user/rhughwhite/ILHNLNEV/call-sSV/output/ILHNLNEV000005-T002-L01-F_testing/call-sSV-20210930-180357/ | /hot/user/rhughwhite/ILHNLNEV/call-gSNP/output/2020-12-22/ILHNLNEV000005-T002-L01-F/gSNP/2021-01-08_17.01.47/ILHNLNEV000005/SAMtools-1.10_Picard-2.23.3/recalibrated_reheadered_bam_and_bai/ILHNLNEV000005-N001-B01-F_realigned_recalibrated_reheadered.bam | /hot/user/rhughwhite/ILHNLNEV/call-gSNP/output/2020-12-22/ILHNLNEV000005-T002-L01-F/gSNP/2021-01-08_17.01.47/ILHNLNEV000005/SAMtools-1.10_Picard-2.23.3/recalibrated_reheadered_bam_and_bai/ILHNLNEV000005-T002-L01-F_realigned_recalibrated_reheadered.bam |
+| A-mini TWGSAMIN000001-T003-S03-F | /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmootor-release-5-0-0-rc-1/config/TWGSAMIN000001-T003-S03-F_F16.config | /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmootor-release-5-0-0-rc-1/TWGSAMIN000001-T003-S03-F | /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/input/data/TWGSAMIN000001-N003-S03-F.bam | /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/input/data/TWGSAMIN000001-T003-S03-F.bam |
+| ILHNLNEV000009 | /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmootor-release-5-0-0-rc-1/config/ILHNLNEV000009-T002-L01-F_F32.config | /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmootor-release-5-0-0-rc-1/ILHNLNEV000009-T002-L01-F | /hot/project/disease/HeadNeckTumor/HNSC-000084-LNMEvolution/pipelines/call-gSNP/2020-12-22/ILHNLNEV000009-T002-L01-F//gSNP/2021-01-22_11.01.06/ILHNLNEV000009/SAMtools-1.10_Picard-2.23.3/recalibrated_reheadered_bam_and_bai/ILHNLNEV000009-N001-B01-F_realigned_recalibrated_reheadered.bam | /hot/project/disease/HeadNeckTumor/HNSC-000084-LNMEvolution/pipelines/call-gSNP/2020-12-22/ILHNLNEV000009-T002-L01-F//gSNP/2021-01-22_11.01.06/ILHNLNEV000009/SAMtools-1.10_Picard-2.23.3/recalibrated_reheadered_bam_and_bai/ILHNLNEV000009-T002-L01-F_realigned_recalibrated_reheadered.bam |
+| DTB-266_WCDT | /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmootor-release-5-0-0-rc-1/config/DTB-266_WCDT_F72.config | /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmootor-release-5-0-0-rc-1/DTB-266_WCDT | /hot/data/unregistered/Quigley-Gebo-PRAD-SVMW/processed/output_call-gSNP/call-gSNP-DSL2-0.0.1/DTB-266/GATK-4.1.9.0/output/DTB-266_DNA_N_realigned_recalibrated_merged.bam | /hot/data/unregistered/Quigley-Gebo-PRAD-SVMW/processed/output_call-gSNP/call-gSNP-DSL2-0.0.1/DTB-266/GATK-4.1.9.0/output/DTB-266_DNA_T_realigned_recalibrated_merged.bam |
 
 ## Performance Validation
 
 Testing was performed primarily in the Boutros Lab SLURM Development cluster. Metrics below will be updated where relevant with additional testing and tuning outputs.
 
-## with Delly v1.0.3 and newer versions
-|Test Case | Test Date | Node Type | Duration | CPU Hours | Virtual Memory Usage (RAM)-peak rss|
-|----- | -------| --------| ----------| ---------| --------|
-|A-mini(with stringent filters) | 2022-07-01 | F2 | 20m 32s | 18m | 1.8 GB |
-|A-full(with stringent filters) | 2022-07-10 | F16 | 17h 53m 49s | 17h 54m | 15.1 GB |
-|A-full(with stringent filters) | 2021-09-20 | F32 | 20h 14m 1s | 20h 12m | 15.1 GB |
-|A-full(with stringent filters) | 2022-07-10 | F72 | 18h 16m 15s | 18h 18m | 15.1 GB |
-|ILHNLNEV000001-T001-P01-F (with stringent filters) | 2022-07-10 | F16 | 8h 46m 31s | 8h 48m | 4.4 GB |
-|ILHNLNEV000001-T001-P01-F (with stringent filters) | 2022-07-10 | F32 | 9h 38m 29s | 9h 36m | 4.4 GB |
-
-## with Delly v0.9.1 and older versions
-|Test Case | Test Date | Node Type | Duration | CPU Hours | Virtual Memory Usage (RAM)-peak rss|
+| Test Case | Test Date | Node Type | Duration | CPU Hours | Peak RSS (RAM) |
 |----- | -------| --------| ----------| ---------| --------|
-|A-mini(with default filters) | 2021-09-20 | F2 | 16m 24s | 16m 1s | 1.7 GB |
-|A-mini(with stringent filters) | 2021-10-14 | F2 | 15m 13s | 15m | 1.7 GB |
-|A-full(with default filters) | 2021-09-20 | F72 | 19h 54m 5s | 19h 53m 56s | 15.1 GB |
-|ILHNLNEV000001-T001-P01-F (with default filters) | 2021-09-20 | F72 | 22h 30m 16s | 22h 30m 6s | 4.5 GB |
-|ILHNLNEV000001-T001-P01-F (with stringent filters) | 2021-10-14 | F32 | 8h 37m 34s | 8h 37m 29s | 4.4 GB |
-|ILHNLNEV000005-T002-L01-F (with default filters) | 2021-09-20 | F72 | 6d 22h 10m 42s | 11'797.8h | 11.733 GB |
+| TWGSAMIN000001-T003-S03-F | 2023-01-19 | F16 | 41m 35s | 0.7 | 1.8 GB |
+| ILHNLNEV000009-T002-L01-F | 2023-01-20 | F32 | 1d 23h 10m 46s | 63.3 | 12.1 GB |
+| DTB-266_WCDT | 2023-01-19 | F72 | 22h 55m 17s | 45.1 | 13.2 GB |
 |ILHNLNEV000005-T002-L01-F (with stringent filters. See [#10](https://github.com/uclahs-cds/pipeline-call-sSV/issues/10) [2f72de1](https://github.com/uclahs-cds/pipeline-call-sSV/commit/2f72de1ba190623e4344f144a12cc315fda1dd18)) | 2021-10-02 | F72 | 1d 10h 55m 13s | 2'478.4h | 11.590 GB |
 
 ## References
-* [Delly Structural Variant Calling](https://github.com/dellytools/delly)
+* [DELLY Structural Variant Caller](https://github.com/dellytools/delly)
+* [Manta Structural Variant Caller](https://github.com/Illumina/manta)
 
 ## License
 
 Authors: Yu Pan (YuPan@mednet.ucla.edu), Ghouse Mohammed (GMohammed@mednet.ucla.edu), Mohammed Faizal Eeman Mootor (MMootor@mednet.ucla.edu)
 
-Call-sSV is licensed under the GNU General Public License version 2. See the file LICENSE for the terms of the GNU GPL license.
+`call-sSV` is licensed under the GNU General Public License version 2. See the file LICENSE for the terms of the GNU GPL license.
 
-Call-sSV takes BAM files and utilizes Delly to call somatic structural variants.
+`call-sSV` takes BAM files and utilizes DELLY and Manta to call somatic structural variants.
 
-Copyright (C) 2021-2022 University of California Los Angeles ("Boutros Lab") All rights reserved.
+Copyright (C) 2021-2023 University of California Los Angeles ("Boutros Lab") All rights reserved.
 
 This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
diff --git a/call-sSV-workflow.svg b/call-sSV-workflow.svg
index 38b052f..6f83702 100644
--- a/call-sSV-workflow.svg
+++ b/call-sSV-workflow.svg
@@ -1 +1 @@
-
\ No newline at end of file
+
\ No newline at end of file
diff --git a/metadata.yaml b/metadata.yaml
index 6c49d8e..f8f785b 100644
--- a/metadata.yaml
+++ b/metadata.yaml
@@ -5,4 +5,4 @@ maintainers: "Boutros Lab Infrastructure