IntGenomicsLab/lr_somatic


Introduction

IntGenomicsLab/lr_somatic is a bioinformatics pipeline for processing and analyzing somatic DNA sequencing data from the long-read sequencing technologies of Oxford Nanopore and PacBio. It handles both canonical DNA bases and modified base calling, including specialized applications such as Fiber-seq.

This end-to-end pipeline covers the entire workflow, from raw read processing and alignment to somatic variant calling of single nucleotide variants, indels, structural variants, copy number alterations, and modified bases.

It can be run in either matched tumour-normal or tumour-only mode, depending on the user's study design.

Developed using Nextflow DSL2, it offers high portability and scalability across diverse computing environments. By leveraging Docker or Singularity containers, installation is streamlined and results are highly reproducible. Each process runs in an isolated container, simplifying dependency management and updates. Where applicable, pipeline components are sourced from nf-core/modules, promoting reuse, interoperability, and consistency within the broader Nextflow and nf-core ecosystems.

Pipeline summary

1) Pre-processing:

a. Raw read QC (cramino)

b. Alignment to the reference genome (minimap2)

c. Post-alignment QC (cramino, samtools idxstats, samtools flagstat, samtools stats)

d. Steps specific to modified base calling (Modkit, Fibertools)

2i) Matched mode: small variant calling:

a. Calling Germline SNPs (Clair3)

b. Phasing and Haplotagging the SNPs in the normal and tumour BAM (LongPhase)

c. Calling somatic SNVs (ClairS)

2ii) Tumour only mode: small variant calling:

a. Calling Germline SNPs and somatic SNVs (ClairS-TO)

b. Phasing and Haplotagging germline SNPs in tumour BAM (LongPhase)

3) Large variant calling:

a. Somatic structural variant calling (Severus)

b. Copy number alteration calling (long-read version of ASCAT)
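The branching between matched and tumour-only mode in steps 2 and 3 can be summarised in a few lines of Python. This is only an illustration of the summary above, not the pipeline's Nextflow code: the function name `variant_calling_steps` is hypothetical, and it ignores that some callers are optional.

```python
def variant_calling_steps(has_normal: bool) -> list[str]:
    """List the tools from steps 2 and 3 of the summary, in order.

    Hypothetical helper for illustration only; the real workflow is
    defined in Nextflow and some callers may be switched off.
    """
    if has_normal:
        # 2i) Matched mode: germline SNPs, phasing/haplotagging, somatic SNVs
        small = ["Clair3", "LongPhase", "ClairS"]
    else:
        # 2ii) Tumour-only mode: ClairS-TO covers both germline and somatic calls
        small = ["ClairS-TO", "LongPhase"]
    # 3) Large variant calling is listed for both modes
    large = ["Severus", "ASCAT"]
    return small + large


print(variant_calling_steps(has_normal=True))
print(variant_calling_steps(has_normal=False))
```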

Usage

Note

If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

First prepare a samplesheet with your input data that looks as follows:

sample,bam_tumor,bam_normal,platform,sex,fiber
sample1,tumour.bam,normal.bam,ont,female,n
sample2,tumour.bam,,ont,female,y
sample3,tumour.bam,,pb,male,n
sample4,tumour.bam,normal.bam,pb,male,y

Each row represents a sample. The bam files should always be unaligned bam files. All fields except for bam_normal are required. If bam_normal is empty, the pipeline will run in tumour only mode. platform should be either ont or pb for Oxford Nanopore Sequencing or PacBio sequencing, respectively. sex refers to the biological sex of the sample and should be either female or male. Finally, fiber specifies whether your sample is Fiber-seq data or not and should have either y for Yes or n for No.
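Before launching a run, it can help to check a samplesheet against the rules above. The following is a minimal sketch using only the Python standard library; the function name `check_samplesheet` and the added `mode` key are assumptions made for illustration, and the pipeline's own validation may differ.

```python
import csv
import io

# Fields the text above describes as required (bam_normal may be empty).
REQUIRED = ("sample", "bam_tumor", "platform", "sex", "fiber")
# Allowed values as described above.
ALLOWED = {
    "platform": {"ont", "pb"},
    "sex": {"female", "male"},
    "fiber": {"y", "n"},
}


def check_samplesheet(text: str) -> list[dict]:
    """Parse samplesheet CSV text and raise ValueError on the first bad row."""
    rows = []
    reader = csv.DictReader(io.StringIO(text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for field in REQUIRED:
            if not (row.get(field) or "").strip():
                raise ValueError(f"line {line_no}: missing required field '{field}'")
        for field, allowed in ALLOWED.items():
            if row[field] not in allowed:
                raise ValueError(
                    f"line {line_no}: {field}='{row[field]}' not in {sorted(allowed)}"
                )
        # An empty bam_normal means the sample runs in tumour-only mode.
        has_normal = bool((row.get("bam_normal") or "").strip())
        row["mode"] = "matched" if has_normal else "tumour-only"
        rows.append(row)
    return rows


EXAMPLE = """sample,bam_tumor,bam_normal,platform,sex,fiber
sample1,tumour.bam,normal.bam,ont,female,n
sample2,tumour.bam,,ont,female,y
"""

modes = [row["mode"] for row in check_samplesheet(EXAMPLE)]
print(modes)  # ['matched', 'tumour-only']
```

The check is deliberately strict: values such as `ONT` or `yes` are rejected rather than normalised, so that typos surface before any compute time is spent.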

Now, you can run the pipeline using:

nextflow run IntGenomicsLab/lr_somatic \
   -profile <docker/singularity/.../institute> \
   --input samplesheet.csv \
   --outdir <OUTDIR>

Warning

Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
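As an illustration of the params-file route, the sketch below writes the two parameters from the command above to a JSON file and assembles the matching command line. Nextflow's `-params-file` option accepts JSON or YAML, with keys named after the parameters without the leading `--`. The helper name `write_params_and_command`, the file name `params.json`, the `docker` profile, and the `results` output directory are placeholders chosen for this example.

```python
import json
import tempfile
from pathlib import Path


def write_params_and_command(params: dict, params_path: Path, profile: str) -> list[str]:
    """Write pipeline parameters to a JSON file and return the nextflow command."""
    Path(params_path).write_text(json.dumps(params, indent=2) + "\n")
    return [
        "nextflow", "run", "IntGenomicsLab/lr_somatic",
        "-profile", profile,
        "-params-file", str(params_path),
    ]


# Demo in a throwaway directory; in practice the file would sit next to
# the samplesheet.
with tempfile.TemporaryDirectory() as tmp:
    params_file = Path(tmp) / "params.json"
    command = write_params_and_command(
        {"input": "samplesheet.csv", "outdir": "results"},  # placeholder values
        params_file,
        profile="docker",
    )
    written = json.loads(params_file.read_text())

print(" ".join(command))
```

Keeping parameters in a file like this makes a run easier to repeat and review than a long list of CLI flags.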

Credits

IntGenomicsLab/lr_somatic was originally written by Luuk Harbers, Alexandra Pančíková, Robert Forsyth, Marios Eftychiou, Ruben Cools, and Jonas Demeulemeester.

Pipeline output

This pipeline produces several kinds of output files. The main output is an aligned and phased tumour bam file, which can be used by any typical downstream tool that takes bam files as input. In addition, there are sample-specific QC outputs from cramino (fastq), cramino (bam), mosdepth, samtools (stats/flagstat/idxstats), and optionally fibertools. Finally, a multiqc report combines the output from mosdepth and samtools into one html report.

Besides QC and the aligned and phased bam file, we have output from (structural) variant and copy number callers, of which some are optional. The output from these variant callers can be found in their respective folders. For small and structural variant callers (clairS, clairS-TO, and severus) these will contain, among others, vcf files with called variants. For ascat these contain files with final copy number information and plots of the copy number profiles.

Example output directory structure:

results
│
├── multiqc
│
├── sample1
│   ├── bamfiles
│   ├── qc
│   │   ├── tumour
│   │   └── normal
│   ├── variants
│   │   ├── severus
│   │   └── clairs
│   └── ascat
│
└── sample2
    ├── bamfiles
    ├── qc
    │   ├── tumour
    │   └── normal
    ├── variants
    │   ├── severus
    │   └── clairs
    └── ascat
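Given the layout above, the variant calls for every sample can be collected with a short script. This is a sketch under assumptions: the text only says that the caller folders contain vcf files, so the `.vcf` and `.vcf.gz` suffixes, the helper name `list_vcfs`, and the demo file names are all hypothetical.

```python
import tempfile
from pathlib import Path

# Assumed VCF file extensions; adjust to what the callers actually write.
VCF_SUFFIXES = (".vcf", ".vcf.gz")


def list_vcfs(results_dir) -> dict:
    """Map (sample, caller) to the sorted VCF file names under variants/."""
    found = {}
    for caller_dir in sorted(Path(results_dir).glob("*/variants/*")):
        if not caller_dir.is_dir():
            continue
        sample = caller_dir.parent.parent.name
        found[(sample, caller_dir.name)] = sorted(
            p.name
            for p in caller_dir.rglob("*")
            if p.is_file() and p.name.endswith(VCF_SUFFIXES)
        )
    return found


# Demo on a throwaway tree that mimics the layout above (made-up file names).
with tempfile.TemporaryDirectory() as tmp:
    for caller in ("severus", "clairs"):
        caller_dir = Path(tmp, "sample1", "variants", caller)
        caller_dir.mkdir(parents=True)
        (caller_dir / f"{caller}_example.vcf.gz").touch()
    demo = list_vcfs(tmp)

print(demo)
```

On a real run, calling `list_vcfs("results")` would return one entry per sample and caller folder, with an empty list where a caller produced no files matching the assumed suffixes.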

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

Citations

If you use IntGenomicsLab/lr_somatic for your analysis, please cite it using the following doi: 10.5281/zenodo.XXXXXX

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
