Release 5.0.0-rc.1 (#94)
* Update README.md with Manta

* add Manta to call-sSV-workflow.svg

* Update SVG file in README.md

* Update nextflow.config

* Update performance validation in README

* Update CHANGELOG.md

* Add Manta to metadata.yaml

* Update nextflow.config

* Update README.md

* Update README.md

* update nextflow.config

* Update README and CHANGELOG

* fix BAM file path in README

Co-authored-by: Mootor <mmootor@ip-0A125238.rhxrlfvjyzbupc03cc22jkch3c.xx.internal.cloudapp.net>
Faizal-Eeman and Mootor authored Jan 25, 2023
1 parent 11398bd commit 23243d7
Showing 5 changed files with 48 additions and 45 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -8,7 +8,12 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
---

## [Unreleased]

---

## [5.0.0-rc.1] - 2023-01-24
### Changed
- Update `README.md` for release `5.0.0-rc.1`
- Move param checking to `methods.config` using `schema.config`
- Parameterize Docker registry
- Use `ghcr.io/uclahs-cds` as default registry
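With the Docker registry parameterized, each container image reference is assembled from a registry prefix plus a tool name and tag. The Python sketch below only illustrates that idea; the function name `image_uri` and the `<registry>/<tool>:<version>` naming scheme are hypothetical and are not taken from the pipeline's config.

```python
def image_uri(tool: str, version: str, registry: str = "ghcr.io/uclahs-cds") -> str:
    """Join a registry prefix, tool name and version tag into one image reference.

    Hypothetical helper: the naming scheme is an assumption for illustration.
    """
    # Strip a trailing slash so "ghcr.io/x" and "ghcr.io/x/" behave the same.
    return f"{registry.rstrip('/')}/{tool}:{version}"


# Default registry versus an overridden (for example, mirrored) registry.
print(image_uri("manta", "1.6.0"))
print(image_uri("manta", "1.6.0", registry="registry.example.org/mirror/"))
```

Keeping the registry as a single default argument means switching registries touches one value rather than every image string.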
82 changes: 40 additions & 42 deletions README.md
@@ -14,7 +14,7 @@
* [License](#license)

## Overview:
The call-sSV pipeline calls somatic structural variants utilizing [Delly](https://github.com/dellytools/delly). This pipeline requires at least one tumor sample and a matched normal sample.
The call-sSV pipeline calls somatic structural variants utilizing [DELLY](https://github.com/dellytools/delly) and [Manta](https://github.com/Illumina/manta). This pipeline requires at least one tumor sample and a matched normal sample.
This pipeline is developed using Nextflow and Docker, and can run either on a single-node Linux machine or on a multi-node HPC Slurm cluster.

## How to Run:
@@ -41,7 +41,7 @@ Pipelines should be run **WITH A SINGLE SAMPLE AT A TIME**. Otherwise resource a
nextflow run path/to/main.nf -config path/to/sample-specific.config
```

* For example, `path/to/main.nf` could be: `/hot/software/pipeline/pipeline-call-sSV/Nextflow/release/3.0.0/main.nf`
* For example, `path/to/main.nf` could be: `/hot/software/pipeline/pipeline-call-sSV/Nextflow/release/5.0.0/main.nf`
* `path/to/sample-specific.config` is the path to where you saved your project-specific copy of [template.config](config/template.config)

To submit to UCLAHS-CDS's Azure cloud, use the submission script [here](https://github.com/uclahs-cds/tool-submit-nf) with the command below:
@@ -56,21 +56,23 @@ python path/to/submit_nextflow_pipeline.py \
```
In the above command, the partition type can be changed based on the size of the dataset. Currently, node F16 is generally recommended for larger datasets like A-full, and node F2 for smaller datasets like A-mini.

\* Manta SV calling does not work on an F2 node due to incompatible resource requirements. To test the parts of the pipeline that do not involve Manta on an F2 node, set `algorithm = ['delly']` in the sample-specific [config](config/template.config) file.
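The F2 restriction above can be caught before submission with a small pre-flight check. This is a hypothetical sketch, not part of the pipeline: it assumes the only supported `algorithm` values are `delly` and `manta`, and encodes just the one rule stated above (no Manta on F2).

```python
SUPPORTED_ALGORITHMS = {"delly", "manta"}


def check_algorithms(algorithm: list[str], node_type: str) -> list[str]:
    """Hypothetical pre-flight check for the `algorithm` config value.

    Returns the normalized list of callers to run, or raises ValueError for
    unknown callers or for Manta requested on an F2 node.
    """
    requested = [a.lower() for a in algorithm]
    unknown = sorted(set(requested) - SUPPORTED_ALGORITHMS)
    if unknown:
        raise ValueError(f"unsupported algorithm(s): {unknown}")
    if "manta" in requested and node_type.upper() == "F2":
        raise ValueError("Manta cannot run on an F2 node; set algorithm = ['delly']")
    return requested


print(check_algorithms(["delly"], "F2"))
try:
    check_algorithms(["delly", "manta"], "F2")
except ValueError as err:
    print(err)
```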

> **Note**: Because this pipeline uses an image stored in the GitHub Container Registry, you must follow the steps listed in the [Docker Introduction](https://confluence.mednet.ucla.edu/display/BOUTROSLAB/Docker+Introduction#DockerIntroduction-GitHubContainerRegistryGitHubContainerRegistry|Setup) on Confluence to set up a PAT for your GitHub account and log into the registry on the cluster before running this pipeline.
## Flow Diagram:

![](https://github.com/uclahs-cds/pipeline-call-sSV/blob/yupan-documentation/call-sSV-workflow.svg)
![call-sSV flow diagram](call-sSV-workflow.svg?raw=true)

## Pipeline Steps:

### Call Somatic Structural Variants:
### Call Somatic Structural Variants - DELLY workflow:

#### 1. Calling Single Sample Somatic Structural Variants
```script
delly call --genome hg38.fa --exclude hg38.excl --map-qual 20 --min-clique-size 5 --mad-cutoff 15 --outfile t1.bcf tumor1.bam normal1.bam
```
This step requires an aligned and sorted tumor sample BAM file and a matched normal sample as an input for variant calling with Delly.
This step requires an aligned and sorted tumor sample BAM file and a matched normal sample as an input for variant calling with DELLY.
Stringent filters (`--map-qual 20`, `--min-clique-size 5`, `--mad-cutoff 15`) are applied, which can drastically reduce runtime, especially for large input BAMs. In the pipeline, these filters are specified in the Nextflow input parameters [config file](config/template.config) and can be adjusted there if needed.
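Because the filter values come from the config file, the `delly call` command is effectively assembled from parameters. A minimal Python sketch of that assembly, using only the flags shown in the command above; the function and keyword names are hypothetical and are not the pipeline's actual config keys.

```python
def delly_call_args(genome: str, exclude: str, tumor_bam: str, normal_bam: str,
                    outfile: str, map_qual: int = 20, min_clique_size: int = 5,
                    mad_cutoff: int = 15) -> list[str]:
    """Build the step 1 `delly call` argument list.

    Defaults mirror the stringent filters quoted above; the parameter names
    are illustrative only.
    """
    return [
        "delly", "call",
        "--genome", genome,
        "--exclude", exclude,
        "--map-qual", str(map_qual),
        "--min-clique-size", str(min_clique_size),
        "--mad-cutoff", str(mad_cutoff),
        "--outfile", outfile,
        tumor_bam, normal_bam,  # tumor first, matched normal second
    ]


args = delly_call_args("hg38.fa", "hg38.excl", "tumor1.bam", "normal1.bam", "t1.bcf")
print(" ".join(args))
```

Returning a list rather than a single string lets it be handed to `subprocess.run(args, check=True)` without shell quoting issues; relaxing a filter is then a matter of passing a different keyword value.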

#### 2. Query the generated BCFs to get the sample names, which will be used in step 3.
@@ -88,21 +90,29 @@ This step applies somatic filtering against the `.bcf` file generated in Step 1.

Note: cohort-based false-positive filtering is computationally heavy and is not implemented in this pipeline.
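Steps 2 and 3 are linked through the sample names stored in the BCF header. DELLY's somatic filter (`delly filter -f somatic -s samples.tsv ...`) reads a tab-separated file that pairs each sample name with `tumor` or `control`. The sketch below assumes the names arrive one per line (as printed by `bcftools query -l t1.bcf`) and that the tumor sample's name is already known; `somatic_samples_tsv` is a hypothetical helper, and the sample names in the example are illustrative.

```python
def somatic_samples_tsv(query_output: str, tumor_sample: str) -> str:
    """Turn one-name-per-line sample output into a DELLY samples.tsv body.

    The named tumor sample is labelled `tumor`; every other sample is
    labelled `control`.
    """
    names = [line.strip() for line in query_output.splitlines() if line.strip()]
    if tumor_sample not in names:
        raise ValueError(f"{tumor_sample!r} not found in BCF samples {names}")
    rows = [f"{name}\t{'tumor' if name == tumor_sample else 'control'}" for name in names]
    return "\n".join(rows) + "\n"


print(somatic_samples_tsv("S2.T-0\nHG002.N-0\n", tumor_sample="S2.T-0"), end="")
```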

### Call Somatic Structural Variants - Manta workflow:

#### 1. Calling Single Sample Somatic Structural Variants
```script
configManta.py --normalBam "${normal_bam}" --tumorBam "${tumor_bam}" --referenceFasta "${reference_fasta}" --runDir MantaWorkflow
MantaWorkflow/runWorkflow.py
```
This step requires an aligned and sorted tumor sample BAM file and a matched normal sample as an input for variant calling with Manta.
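Manta runs in two phases: `configManta.py` creates the run directory and writes `runWorkflow.py` into it, and that script then performs the calling. A sketch that builds both argument lists from the flags shown above, with ordinary strings standing in for the `${normal_bam}`, `${tumor_bam}` and `${reference_fasta}` variables; `manta_commands` is a hypothetical name.

```python
def manta_commands(normal_bam: str, tumor_bam: str, reference_fasta: str,
                   run_dir: str = "MantaWorkflow") -> tuple[list[str], list[str]]:
    """Return the configure and execute commands for a tumor/normal Manta run."""
    configure = [
        "configManta.py",
        "--normalBam", normal_bam,
        "--tumorBam", tumor_bam,
        "--referenceFasta", reference_fasta,
        "--runDir", run_dir,
    ]
    # configManta.py writes runWorkflow.py into the run directory it creates.
    execute = [f"{run_dir}/runWorkflow.py"]
    return configure, execute


configure, execute = manta_commands("normal1.bam", "tumor1.bam", "hg38.fa")
print(" ".join(configure))
print(" ".join(execute))
```

Each list could be passed in order to `subprocess.run(cmd, check=True)`, so the workflow phase only starts if configuration succeeded.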

## Inputs

### Input CSV

The input CSV should have each of the input fields listed below as a separate column, in the same order, with a comma as the column separator. An example of the input CSV can be found [here](input/call-sSV-input.csv).

| Field | Type | Description |
| Input | Type | Required | Description |
|--- | --- | --- |
|normal_bam | string | Absolute path to the normal sample `.bam` file |
|tumor_bam | string | Absolute path to the tumor sample `.bam` file. |
| normal_bam | string | yes | Absolute path to the normal sample `.bam` file |
| tumor_bam | string | yes | Absolute path to the tumor sample `.bam` file |
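The input table can be checked mechanically before a run. A minimal Python sketch, assuming the CSV has a header row naming the columns (as in the linked example file); it checks column order and that each path is absolute and ends in `.bam`, but does not check that the files exist. `read_input_csv` is a hypothetical helper, not the pipeline's own validation.

```python
import csv
import io
import os
from typing import TextIO

REQUIRED_COLUMNS = ("normal_bam", "tumor_bam")


def read_input_csv(handle: TextIO) -> list[dict[str, str]]:
    """Parse the input CSV and check each row against the field table above."""
    reader = csv.DictReader(handle)
    header = tuple(reader.fieldnames or ())
    # The columns must appear in the documented order.
    if header[: len(REQUIRED_COLUMNS)] != REQUIRED_COLUMNS:
        raise ValueError(f"expected columns {REQUIRED_COLUMNS}, got {header}")
    rows = []
    for lineno, row in enumerate(reader, start=2):  # line 1 is the header
        for col in REQUIRED_COLUMNS:
            path = (row.get(col) or "").strip()
            # Both BAMs must be absolute paths ending in .bam.
            if not os.path.isabs(path) or not path.endswith(".bam"):
                raise ValueError(f"line {lineno}: {col} must be an absolute .bam path, got {path!r}")
            row[col] = path
        rows.append(row)
    return rows


example = io.StringIO("normal_bam,tumor_bam\n/data/normal1.bam,/data/tumor1.bam\n")
for sample in read_input_csv(example):
    print(sample["normal_bam"], sample["tumor_bam"])
```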

## Nextflow Config File Parameters

| Input Parameter | Required | Type | Description |
| Field | Required | Type | Description |
| ------- | --------- | ------ | -------------|
| dataset_id | yes | string | Boutros Lab dataset id |
| blcds_registered_dataset | yes | boolean | Affirms if dataset should be registered in the Boutros Lab Data registry. Default value is `false`. |
@@ -123,13 +133,15 @@ An example of the NextFlow Input Parameters Config file can be found [here](conf

## Outputs

| Output | Output type | Description |
| ---- | ----- | -------- |
| .bcf | final | Binary VCF output format with somatic structural variants if found. |
| .bcf.csi | final | CSI-format index for BCF files |
| report.html, timeline.html and trace.txt | log | A Nextflow report, timeline and trace files. |
| \*.log.command.* | log | Process and sample specific logging files created by nextflow. |
| *.sha512 | checksum | Generates SHA-512 hash to validate file integrity. |
| Output | Description |
| ---- | -------- |
| .bcf | Binary VCF (BCF) output from DELLY with somatic structural variants, if found. |
| .bcf.csi | CSI-format index for BCF files from DELLY. |
| .vcf.gz | Compressed VCF output from Manta with somatic structural variants, if found. |
| .vcf.gz.tbi | TBI-format (tabix) index for compressed VCF files from Manta. |
| report.html, timeline.html and trace.txt | Nextflow report, timeline, and trace files. |
| \*.log.command.* | Process- and sample-specific logging files created by Nextflow. |
| *.sha512 | SHA-512 checksums for validating file integrity. |
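The `.sha512` files let a downstream user confirm that an output was not corrupted in transfer. A sketch of that check, assuming each checksum file sits next to its data file and holds a `sha512sum`-style line (hex digest first, then the file name); the exact format written by the pipeline is not stated above, and the helper names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-512 digest of a file, read in chunks so large outputs need not fit in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(data_file: Path) -> bool:
    """Compare data_file with the digest in <data_file>.sha512 (sha512sum-style line assumed)."""
    checksum_file = data_file.with_name(data_file.name + ".sha512")
    expected = checksum_file.read_text().split()[0].lower()
    return sha512_of(data_file) == expected


# Self-contained demo on a throwaway file, not a real pipeline output.
with tempfile.TemporaryDirectory() as tmp:
    vcf = Path(tmp) / "example.vcf.gz"
    vcf.write_bytes(b"placeholder bytes\n")
    vcf.with_name(vcf.name + ".sha512").write_text(f"{sha512_of(vcf)}  {vcf.name}\n")
    print(verify_checksum(vcf))  # True
```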


## Testing and Validation
@@ -138,51 +150,37 @@ An example of the NextFlow Input Parameters Config file can be found [here](conf

| Data Set | Run Configuration | Output Dir | Normal Sample | Tumor Sample |
| ------ | ------ | ------- | ------ | ------- |
| A-mini | /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/3.0.0/mmoootor-upgrade-delly-0.9.1-to-1.0.3/config/A-mini-hg38.config | /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/3.0.0/mmoootor-upgrade-delly-0.9.1-to-1.0.3/A-mini/call-sSV-2.0.0/S2.T-0/DELLY-1.0.3/output/ | /hot/resource/SMC-HET/normal/bams/A-mini/0/output/HG002.N-0.bam | /hot/resource/SMC-HET/tumours/A-mini/bams/0/output/S2.T-0.bam |
| A-full | /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/3.0.0/mmoootor-upgrade-delly-0.9.1-to-1.0.3/config/A-full-F72-hg19.config | /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/3.0.0/mmoootor-upgrade-delly-0.9.1-to-1.0.3/A-full-F72/call-sSV-2.0.0/T5.T.sorted_py/DELLY-1.0.3/output/ | /hot/resource/SMC-HET/normal/bams/HG002.N.bam | /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/input/data/T5.T.sorted_py.bam |
| ILHNLNEV000001-T001-P01-F | /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/3.0.0/mmoootor-upgrade-delly-0.9.1-to-1.0.3/config/ILHNLNEV000001-T001-P01-F-F32-hg38.config | /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/3.0.0/mmoootor-upgrade-delly-0.9.1-to-1.0.3/ILHNLNEV000001-T001-P01-F-F32/call-sSV-2.0.0/ILHNLNEV000001-T001-P01-F_realigned_recalibrated_reheadered/DELLY-1.0.3/output/ | /hot/user/rhughwhite/ILHNLNEV/call-gSNP/output/2020-12-22/ILHNLNEV000001-T001-P01-F/gSNP/2021-01-05_22.01.08/ILHNLNEV000001/SAMtools-1.10_Picard-2.23.3/recalibrated_reheadered_bam_and_bai/ILHNLNEV000001-N001-B01-F_realigned_recalibrated_reheadered.bam | /hot/user/rhughwhite/ILHNLNEV/call-gSNP/output/2020-12-22/ILHNLNEV000001-T001-P01-F/gSNP/2021-01-05_22.01.08/ILHNLNEV000001/SAMtools-1.10_Picard-2.23.3/recalibrated_reheadered_bam_and_bai/ILHNLNEV000001-T001-P01-F_realigned_recalibrated_reheadered.bam |
| ILHNLNEV000005-T002-L01-F | /hot/user/rhughwhite/ILHNLNEV/call-sSV/inputs_configs/2021-09-10/ILHNLNEV000005-T002-L01-F/nextflow.config | /hot/user/rhughwhite/ILHNLNEV/call-sSV/output/ILHNLNEV000005-T002-L01-F_testing/call-sSV-20210930-180357/ | /hot/user/rhughwhite/ILHNLNEV/call-gSNP/output/2020-12-22/ILHNLNEV000005-T002-L01-F/gSNP/2021-01-08_17.01.47/ILHNLNEV000005/SAMtools-1.10_Picard-2.23.3/recalibrated_reheadered_bam_and_bai/ILHNLNEV000005-N001-B01-F_realigned_recalibrated_reheadered.bam | /hot/user/rhughwhite/ILHNLNEV/call-gSNP/output/2020-12-22/ILHNLNEV000005-T002-L01-F/gSNP/2021-01-08_17.01.47/ILHNLNEV000005/SAMtools-1.10_Picard-2.23.3/recalibrated_reheadered_bam_and_bai/ILHNLNEV000005-T002-L01-F_realigned_recalibrated_reheadered.bam |
| A-mini TWGSAMIN000001-T003-S03-F | /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmootor-release-5-0-0-rc-1/config/TWGSAMIN000001-T003-S03-F_F16.config | /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmootor-release-5-0-0-rc-1/TWGSAMIN000001-T003-S03-F | /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/input/data/TWGSAMIN000001-N003-S03-F.bam | /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/input/data/TWGSAMIN000001-T003-S03-F.bam |
| ILHNLNEV000009 | /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmootor-release-5-0-0-rc-1/config/ILHNLNEV000009-T002-L01-F_F32.config | /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmootor-release-5-0-0-rc-1/ILHNLNEV000009-T002-L01-F | /hot/project/disease/HeadNeckTumor/HNSC-000084-LNMEvolution/pipelines/call-gSNP/2020-12-22/ILHNLNEV000009-T002-L01-F//gSNP/2021-01-22_11.01.06/ILHNLNEV000009/SAMtools-1.10_Picard-2.23.3/recalibrated_reheadered_bam_and_bai/ILHNLNEV000009-N001-B01-F_realigned_recalibrated_reheadered.bam | /hot/project/disease/HeadNeckTumor/HNSC-000084-LNMEvolution/pipelines/call-gSNP/2020-12-22/ILHNLNEV000009-T002-L01-F//gSNP/2021-01-22_11.01.06/ILHNLNEV000009/SAMtools-1.10_Picard-2.23.3/recalibrated_reheadered_bam_and_bai/ILHNLNEV000009-T002-L01-F_realigned_recalibrated_reheadered.bam |
| DTB-266_WCDT | /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmootor-release-5-0-0-rc-1/config/DTB-266_WCDT_F72.config | /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmootor-release-5-0-0-rc-1/DTB-266_WCDT | /hot/data/unregistered/Quigley-Gebo-PRAD-SVMW/processed/output_call-gSNP/call-gSNP-DSL2-0.0.1/DTB-266/GATK-4.1.9.0/output/DTB-266_DNA_N_realigned_recalibrated_merged.bam | /hot/data/unregistered/Quigley-Gebo-PRAD-SVMW/processed/output_call-gSNP/call-gSNP-DSL2-0.0.1/DTB-266/GATK-4.1.9.0/output/DTB-266_DNA_T_realigned_recalibrated_merged.bam |

## Performance Validation

Testing was performed primarily in the Boutros Lab SLURM Development cluster. Metrics below will be updated where relevant with additional testing and tuning outputs.

## with Delly v1.0.3 and newer versions
|Test Case | Test Date | Node Type | Duration | CPU Hours | Virtual Memory Usage (RAM)-peak rss|
|----- | -------| --------| ----------| ---------| --------|
|A-mini(with stringent filters) | 2022-07-01 | F2 | 20m 32s | 18m | 1.8 GB |
|A-full(with stringent filters) | 2022-07-10 | F16 | 17h 53m 49s | 17h 54m | 15.1 GB |
|A-full(with stringent filters) | 2021-09-20 | F32 | 20h 14m 1s | 20h 12m | 15.1 GB |
|A-full(with stringent filters) | 2022-07-10 | F72 | 18h 16m 15s | 18h 18m | 15.1 GB |
|ILHNLNEV000001-T001-P01-F (with stringent filters) | 2022-07-10 | F16 | 8h 46m 31s | 8h 48m | 4.4 GB |
|ILHNLNEV000001-T001-P01-F (with stringent filters) | 2022-07-10 | F32 | 9h 38m 29s | 9h 36m | 4.4 GB |

## with Delly v0.9.1 and older versions
|Test Case | Test Date | Node Type | Duration | CPU Hours | Virtual Memory Usage (RAM)-peak rss|
| Test Case | Test Date | Node Type | Duration | CPU Hours | Peak RSS (RAM) |
|----- | -------| --------| ----------| ---------| --------|
|A-mini(with default filters) | 2021-09-20 | F2 | 16m 24s | 16m 1s | 1.7 GB |
|A-mini(with stringent filters) | 2021-10-14 | F2 | 15m 13s | 15m | 1.7 GB |
|A-full(with default filters) | 2021-09-20 | F72 | 19h 54m 5s | 19h 53m 56s | 15.1 GB |
|ILHNLNEV000001-T001-P01-F (with default filters) | 2021-09-20 | F72 | 22h 30m 16s | 22h 30m 6s | 4.5 GB |
|ILHNLNEV000001-T001-P01-F (with stringent filters) | 2021-10-14 | F32 | 8h 37m 34s | 8h 37m 29s | 4.4 GB |
|ILHNLNEV000005-T002-L01-F (with default filters) | 2021-09-20 | F72 | 6d 22h 10m 42s | 11'797.8h | 11.733 GB |
| TWGSAMIN000001-T003-S03-F | 2023-01-19 | F16 | 41m 35s | 0.7 | 1.8 GB |
| ILHNLNEV000009-T002-L01-F | 2023-01-20 | F32 | 1d 23h 10m 46s | 63.3 | 12.1 GB |
| DTB-266_WCDT | 2023-01-19 | F72 | 22h 55m 17s | 45.1 | 13.2 GB |
|ILHNLNEV000005-T002-L01-F (with stringent filters. See [#10](https://github.com/uclahs-cds/pipeline-call-sSV/issues/10) [2f72de1](https://github.com/uclahs-cds/pipeline-call-sSV/commit/2f72de1ba190623e4344f144a12cc315fda1dd18)) | 2021-10-02 | F72 | 1d 10h 55m 13s | 2'478.4h | 11.590 GB |
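The Duration column mixes day, hour, minute, and second units, which makes rows hard to compare at a glance. A minimal Python sketch for normalizing those strings to seconds; `duration_seconds` is a hypothetical helper, and it deliberately accepts only whole-number `d`/`h`/`m`/`s` tokens as used in the tables above (anything else, such as `500ms`, is rejected rather than misread).

```python
import re

_UNIT_SECONDS = {"d": 86_400, "h": 3_600, "m": 60, "s": 1}
_TOKEN = re.compile(r"(\d+)([dhms])")


def duration_seconds(text: str) -> int:
    """Convert a Duration cell such as '1d 23h 10m 46s' into seconds."""
    compact = text.replace(" ", "")
    # Reject anything that is not a run of <integer><unit> tokens.
    if not re.fullmatch(r"(?:\d+[dhms])+", compact):
        raise ValueError(f"unrecognised duration: {text!r}")
    return sum(int(value) * _UNIT_SECONDS[unit] for value, unit in _TOKEN.findall(compact))


print(duration_seconds("20m 32s"))                          # 1232
print(round(duration_seconds("1d 23h 10m 46s") / 3600, 2))  # 47.18
```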


## References

* [Delly Structural Variant Calling](https://github.com/dellytools/delly)
* [DELLY Structural Variant Caller](https://github.com/dellytools/delly)
* [Manta Structural Variant Caller](https://github.com/Illumina/manta)


## License

Authors: Yu Pan ([email protected]), Ghouse Mohammed ([email protected]), Mohammed Faizal Eeman Mootor ([email protected])

Call-sSV is licensed under the GNU General Public License version 2. See the file LICENSE for the terms of the GNU GPL license.
`call-sSV` is licensed under the GNU General Public License version 2. See the file LICENSE for the terms of the GNU GPL license.

Call-sSV takes BAM files and utilizes Delly to call somatic structural variants.
`call-sSV` takes BAM files and utilizes DELLY and Manta to call somatic structural variants.

Copyright (C) 2021-2022 University of California Los Angeles ("Boutros Lab") All rights reserved.
Copyright (C) 2021-2023 University of California Los Angeles ("Boutros Lab") All rights reserved.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

2 changes: 1 addition & 1 deletion call-sSV-workflow.svg
2 changes: 1 addition & 1 deletion metadata.yaml
@@ -5,4 +5,4 @@ maintainers: "Boutros Lab Infrastructure <[email protected]
languages: ["Nextflow", "Docker"]
dependencies: ["Java", "Nextflow", "Docker"]
references: "https://uclahs.box.com/s/qfzr99sc8ntmfddn30ii62wx4273utoz"
tools: ["Delly:v1.1.3", "BCFtools:v1.15.1", "PipeVal:v3.0.0"]
tools: ["Delly:v1.1.3", "Manta:v1.6.0", "BCFtools:v1.15.1", "PipeVal:v3.0.0"]
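The `tools` entry pins each dependency as a `Name:vX.Y.Z` string. A sketch of turning that list into a name-to-version mapping, assuming the YAML has already been loaded into a Python list (YAML parsing itself is not shown); `parse_tools` is a hypothetical helper.

```python
def parse_tools(entries: list[str]) -> dict[str, str]:
    """Split metadata.yaml `tools` entries of the form 'Name:vX.Y.Z' into {name: version}."""
    tools = {}
    for entry in entries:
        name, sep, version = entry.partition(":")
        if not sep or not name or not version:
            raise ValueError(f"expected 'Name:version', got {entry!r}")
        # Drop a leading 'v' so versions compare as plain X.Y.Z strings.
        tools[name] = version[1:] if version.startswith("v") else version
    return tools


print(parse_tools(["Delly:v1.1.3", "Manta:v1.6.0", "BCFtools:v1.15.1", "PipeVal:v3.0.0"]))
```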
2 changes: 1 addition & 1 deletion nextflow.config
@@ -5,5 +5,5 @@ manifest {
name = "call-sSV"
author = "Yu Pan, Ghouse Mohammed, Mohammed Faizal Eeman Mootor"
description = "A pipeline to call somatic SVs utilizing Delly"
version = "4.0.0"
version = "5.0.0-rc.1"
}
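Since the project follows Semantic Versioning, `5.0.0-rc.1` is a pre-release that ranks above `4.0.0` but below the eventual `5.0.0`. A sketch of a sort key implementing SemVer 2.0.0 precedence (numeric pre-release identifiers compare numerically and rank below alphanumeric ones; build metadata is ignored). It assumes well-formed version strings and is not part of the pipeline.

```python
def _prerelease_key(ident: str) -> tuple[int, int, str]:
    # Numeric identifiers compare numerically and rank below alphanumeric ones.
    return (0, int(ident), "") if ident.isdecimal() else (1, 0, ident)


def semver_key(version: str) -> tuple:
    """Sort key following SemVer 2.0.0 precedence; build metadata after '+' is ignored."""
    core = version.partition("+")[0]
    core, dash, pre = core.partition("-")
    major, minor, patch = (int(part) for part in core.split("."))
    if not dash:
        # A normal release ranks above any pre-release of the same version.
        return (major, minor, patch, 1, ())
    return (major, minor, patch, 0, tuple(_prerelease_key(p) for p in pre.split(".")))


versions = ["5.0.0", "4.0.0", "5.0.0-rc.1", "5.0.0-rc.10", "5.0.0-rc.2"]
print(sorted(versions, key=semver_key))
```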
