IntGenomicsLab/lr_somatic

Introduction

IntGenomicsLab/lr_somatic is a bioinformatics pipeline for processing and analyzing somatic DNA sequencing data from the long-read sequencing technologies of Oxford Nanopore and PacBio. It supports both canonical DNA bases and modified base calling, including specialized applications such as Fiber-seq.

This end-to-end pipeline handles the entire workflow, from raw read processing and alignment to somatic variant calling, covering single nucleotide variants, indels, structural variants, copy number alterations, and modified bases.

It can be run in either matched tumour-normal or tumour-only mode, offering flexibility depending on the user's study design.

Developed using Nextflow DSL2, it offers high portability and scalability across diverse computing environments. By leveraging Docker or Singularity containers, installation is streamlined and results are highly reproducible. Each process runs in an isolated container, simplifying dependency management and updates. Where applicable, pipeline components are sourced from nf-core/modules, promoting reuse, interoperability, and consistency within the broader Nextflow and nf-core ecosystems.

Pipeline summary

1) Pre-processing:

a. Raw read QC (cramino)

b. Alignment to the reference genome (minimap2)

c. Post-alignment QC (cramino, samtools idxstats, samtools flagstat, samtools stats)

d. Steps specific to modified base calling (Modkit, Fibertools)

2i) Matched mode: small variant calling:

a. Calling Germline SNPs (Clair3)

b. Phasing and Haplotagging the SNPs in the normal and tumour BAM (LongPhase)

c. Calling somatic SNVs (ClairS)

2ii) Tumour only mode: small variant calling:

a. Calling Germline SNPs and somatic SNVs (ClairS-TO)

b. Phasing and Haplotagging germline SNPs in tumour BAM (LongPhase)

3) Large variant calling:

a. Somatic structural variant calling (Severus)

b. Copy number alteration calling (long-read version of ASCAT)
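The branching between matched and tumour-only mode in step 2 can be restated as a small lookup. This is only an illustrative sketch of the summary above, not code from the pipeline; the function name `small_variant_steps` is hypothetical.

```python
def small_variant_steps(has_normal):
    """Return the small variant calling steps listed in the summary above.

    Illustrative only: the tool order mirrors the README summary,
    it is not taken from the pipeline's Nextflow code.
    """
    if has_normal:
        # 2i) Matched mode
        return [
            ("Clair3", "germline SNP calling"),
            ("LongPhase", "phasing and haplotagging of normal and tumour BAM"),
            ("ClairS", "somatic SNV calling"),
        ]
    # 2ii) Tumour-only mode
    return [
        ("ClairS-TO", "germline SNP and somatic SNV calling"),
        ("LongPhase", "phasing and haplotagging of tumour BAM"),
    ]


for has_normal in (True, False):
    label = "matched" if has_normal else "tumour-only"
    tools = ", ".join(tool for tool, _ in small_variant_steps(has_normal))
    print(f"{label}: {tools}")
```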

Usage

Note

If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

First prepare a samplesheet with your input data that looks as follows:

sample,bam_tumor,bam_normal,platform,sex,fiber
sample1,tumour.bam,normal.bam,ont,female,n
sample2,tumour.bam,,ont,female,y
sample3,tumour.bam,,pb,male,n
sample4,tumour.bam,normal.bam,pb,male,y

Each row represents a sample. The bam files should always be unaligned bam files. All fields except for bam_normal are required. If bam_normal is empty, the pipeline will run in tumour only mode. platform should be either ont or pb for Oxford Nanopore Sequencing or PacBio sequencing, respectively. sex refers to the biological sex of the sample and should be either female or male. Finally, fiber specifies whether your sample is Fiber-seq data or not and should have either y for Yes or n for No.
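Before launching a run, it may help to check a samplesheet against the rules described above. The following is a minimal sketch using only the Python standard library; the function name `check_samplesheet` and the derived `mode` field are hypothetical and are not part of the pipeline, which may perform its own validation.

```python
import csv
import io

# Column names and allowed values as described in the text above.
COLUMNS = ("sample", "bam_tumor", "bam_normal", "platform", "sex", "fiber")
REQUIRED = tuple(col for col in COLUMNS if col != "bam_normal")
ALLOWED = {
    "platform": {"ont", "pb"},
    "sex": {"female", "male"},
    "fiber": {"y", "n"},
}


def check_samplesheet(handle):
    """Return (rows, errors); each row gains a derived 'mode' key."""
    rows, errors = [], []
    reader = csv.DictReader(handle)
    for raw in reader:
        line_no = reader.line_num  # line of the file that was just read
        row = {col: (raw.get(col) or "").strip() for col in COLUMNS}
        for col in REQUIRED:
            if not row[col]:
                errors.append(f"line {line_no}: '{col}' is required")
        for col, allowed in ALLOWED.items():
            if row[col] and row[col] not in allowed:
                errors.append(
                    f"line {line_no}: '{col}' must be one of {sorted(allowed)}"
                )
        # An empty bam_normal means the sample runs in tumour-only mode.
        row["mode"] = "matched" if row["bam_normal"] else "tumour-only"
        rows.append(row)
    return rows, errors


example = io.StringIO(
    "sample,bam_tumor,bam_normal,platform,sex,fiber\n"
    "sample1,tumour.bam,normal.bam,ont,female,n\n"
    "sample2,tumour.bam,,ont,female,y\n"
)
rows, errors = check_samplesheet(example)
for row in rows:
    print(row["sample"], row["mode"])
print("errors:", errors)
```

For a file on disk, the same function can be called with `open("samplesheet.csv", newline="")` in place of the in-memory example.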

Now, you can run the pipeline using:

nextflow run IntGenomicsLab/lr_somatic \
   -profile <docker/singularity/.../institute> \
   --input samplesheet.csv \
   --outdir <OUTDIR>

Warning

Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
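As a sketch of the params-file route mentioned above, the snippet below writes the two parameters shown in the example command (`input` and `outdir`) to a JSON file and assembles the matching command line. Nextflow's -params-file option accepts JSON or YAML; the helper names `write_params_file` and `build_command` and the file name `params.json` are assumptions made for illustration.

```python
import json
import shlex
import tempfile
from pathlib import Path


def write_params_file(path, input_csv, outdir):
    """Write pipeline parameters as JSON for use with -params-file."""
    params = {"input": input_csv, "outdir": outdir}
    Path(path).write_text(json.dumps(params, indent=2) + "\n")
    return params


def build_command(params_file, profile):
    """Assemble the nextflow command line as a shell-quoted string."""
    args = [
        "nextflow", "run", "IntGenomicsLab/lr_somatic",
        "-profile", profile,
        "-params-file", str(params_file),
    ]
    return " ".join(shlex.quote(arg) for arg in args)


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    params_path = Path(tmp) / "params.json"
    write_params_file(params_path, "samplesheet.csv", "results")
    print(params_path.read_text())

print(build_command("params.json", "docker"))
```

Keeping parameters in a file rather than on the command line makes a run easier to repeat and to record alongside its results.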

Credits

IntGenomicsLab/lr_somatic was originally written by Luuk Harbers, Alexandra Pančíková, Robert Forsyth, Marios Eftychiou, Ruben Cools, and Jonas Demeulemeester.

Pipeline output

This pipeline produces a series of different output files. The main output is an aligned and phased tumour bam file. This bam file can be used by any typical downstream tool that uses bam files as input. Furthermore, we have sample-specific QC outputs from cramino (fastq), cramino (bam), mosdepth, samtools (stats/flagstat/idxstats), and optionally fibertools. Finally, we have a MultiQC report that combines the output from mosdepth and samtools into one HTML report.

Besides QC and the aligned and phased bam file, we have output from (structural) variant and copy number callers, of which some are optional. The output from these variant callers can be found in their respective folders. For small and structural variant callers (clairS, clairS-TO, and severus) these will contain, among others, vcf files with called variants. For ascat these contain files with final copy number information and plots of the copy number profiles.

Example output directory structure:

results
│
├── multiqc
│
├── sample1
│   ├── bamfiles
│   ├── qc
│   │   ├── tumour
│   │   └── normal
│   ├── variants
│   │   ├── severus
│   │   └── clairs
│   └── ascat
│
└── sample2
    ├── bamfiles
    ├── qc
    │   ├── tumour
    │   └── normal
    ├── variants
    │   ├── severus
    │   └── clairs
    └── ascat
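Given the layout above, VCF files can be gathered per sample and caller with a short script. This is a minimal sketch that assumes the `<sample>/variants/<caller>/` structure shown in the example tree; the function name `find_vcfs` is hypothetical, and the actual file names inside each caller folder are not specified here, so the code simply matches any `*.vcf` or `*.vcf.gz` file.

```python
import tempfile
from pathlib import Path


def find_vcfs(results_dir):
    """Map sample -> caller -> list of VCF paths found under variants/."""
    found = {}
    for variants_dir in sorted(Path(results_dir).glob("*/variants")):
        sample = variants_dir.parent.name
        callers = sorted(p for p in variants_dir.iterdir() if p.is_dir())
        for caller_dir in callers:
            vcfs = sorted(caller_dir.rglob("*.vcf"))
            vcfs += sorted(caller_dir.rglob("*.vcf.gz"))
            found.setdefault(sample, {})[caller_dir.name] = vcfs
    return found


# Demo on a throwaway tree; the file names are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for caller in ("severus", "clairs"):
        caller_dir = root / "sample1" / "variants" / caller
        caller_dir.mkdir(parents=True)
        (caller_dir / "placeholder.vcf.gz").touch()
    for sample, by_caller in find_vcfs(root).items():
        for caller, vcfs in by_caller.items():
            print(sample, caller, [p.name for p in vcfs])
```

Folders without a `variants` subdirectory, such as `multiqc`, are skipped by the glob pattern.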

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

Citations

If you use IntGenomicsLab/lr_somatic for your analysis, please cite it using the following doi: 10.5281/zenodo.XXXXXX

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

