tetris

TETRIS is a Nextflow pipeline for DNA mapping and variant calling. It provides a flexible and scalable workflow for processing sequencing data, from raw reads to variant calls.

It trims reads with fastp, aligns them with BWA-MEM, optionally marks duplicates with GATK MarkDuplicates, and calls variants with BCFtools. QC statistics are computed with FastQC, Samtools, and mosdepth, then aggregated into a single report by MultiQC.

The pipeline is split into two subworkflows: mapping and variant calling. Each subworkflow can be run on its own, or both can be run together as a complete pipeline.
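For orientation, the stages described above map roughly onto the manual commands sketched below. This is a dry run that only prints commands. The file names, the sample name, and every tool option are illustrative assumptions, and the options the pipeline actually passes to each tool are defined in its modules and are not documented here.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints a manual equivalent of the two subworkflows.
# All file names and tool options below are assumptions for illustration.
set -euo pipefail

ref="reference.fasta"   # hypothetical reference genome
sample="sampleA"        # hypothetical sample name

# Print each command rather than executing it.
run() { echo "$*"; }

# Mapping subworkflow: trim, align, sort, mark duplicates, index.
mapping() {
  run fastp -i "${sample}_R1.fastq.gz" -I "${sample}_R2.fastq.gz" \
            -o "${sample}_trim_R1.fastq.gz" -O "${sample}_trim_R2.fastq.gz"
  run bwa index "$ref"
  run "bwa mem $ref ${sample}_trim_R1.fastq.gz ${sample}_trim_R2.fastq.gz | samtools sort -o ${sample}.bam -"
  run gatk MarkDuplicates -I "${sample}.bam" -O "${sample}.md.bam" -M "${sample}.metrics.txt"
  run samtools index "${sample}.md.bam"
}

# Variant calling subworkflow: pileup followed by calling.
calling() {
  run "bcftools mpileup -f $ref ${sample}.md.bam | bcftools call -mv -Oz -o ${sample}.vcf.gz"
}

mapping
calling
```

Keeping the two stages in separate functions mirrors the mapping-only and calling-only entry points.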

Usage

To run the pipeline with default parameters:

# Run the complete pipeline
nextflow run main.nf --input samplesheet.csv --reference reference.fasta

# Run only the mapping steps
nextflow run main.nf --input samplesheet.csv --reference reference.fasta --mapping_only

# Run only the variant calling steps, starting from BAMs
nextflow run main.nf --input samplesheet.csv --reference reference.fasta --calling_only

Input

The pipeline requires two main inputs:

A samplesheet CSV file with the following columns:

name: Sample name
seqid: Sequencing ID
seq_type: Read type, either 'paired' or 'single'
fastq_1: Path to forward reads
fastq_2: Path to reverse reads (optional for single-end data)

or when starting at the variant calling subworkflow:

name: Sample name
bam: Path to BAM file
bai: Path to BAM index file

A reference genome in FASTA format
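As an illustration of the two samplesheet layouts listed above, the sketch below writes one example of each and checks that every row has as many fields as its header. The sample names, sequencing IDs, and file paths are hypothetical placeholders. The header row and the column order are assumed to follow the column lists above, which this README does not confirm.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Samplesheet for the complete pipeline or mapping only.
# Names, IDs, and paths are hypothetical placeholders.
cat > samplesheet.csv <<'EOF'
name,seqid,seq_type,fastq_1,fastq_2
sampleA,run001,paired,/data/sampleA_R1.fastq.gz,/data/sampleA_R2.fastq.gz
sampleB,run001,paired,/data/sampleB_R1.fastq.gz,/data/sampleB_R2.fastq.gz
EOF

# Samplesheet for starting at variant calling from existing BAMs.
cat > samplesheet_bam.csv <<'EOF'
name,bam,bai
sampleA,/data/sampleA.bam,/data/sampleA.bam.bai
sampleB,/data/sampleB.bam,/data/sampleB.bam.bai
EOF

# Succeeds only if every row has the same number of fields as the header.
check_columns() {
  awk -F',' 'BEGIN { bad = 0 } NR == 1 { n = NF } NF != n { bad = 1 } END { exit bad }' "$1"
}

check_columns samplesheet.csv && echo "samplesheet.csv: OK"
check_columns samplesheet_bam.csv && echo "samplesheet_bam.csv: OK"
```

Only paired-end rows are shown, since single-end compatibility is still listed as a TODO item.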

Parameters

Parameter	Description	Default
--input	Input samplesheet CSV file	(required)
--reference	Reference genome FASTA file	(required)
--outdir	Output directory	results
--split_fastq	Number of reads to split FASTQ files by	0 (no splitting)
--skip_fastp	Skip fastp processing	false
--skip_markdup	Skip duplicate read marking	false
--markdup_tool	Tool for marking duplicates	gatk
--grouped_call	Run mpileup and call on all BAMs together	false
--mapping_only	Perform only read mapping (no variant calling)	false
--regions	BED file of genomic regions to call variants in	null
--constrain_alleles	Constrain alleles (requires indexed target positions)	false
--targets_index	Indexed set of target positions and alleles	null
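The parameters above are passed on the command line, and the table notes that --constrain_alleles requires indexed target positions. Below is a minimal sketch of a wrapper that assembles a command and enforces that one dependency before anything is launched. The helper name build_cmd and the example file names are hypothetical, and the command is only printed, not executed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# build_cmd REGIONS CONSTRAIN TARGETS_INDEX
# Hypothetical helper: prints a nextflow command line built from the
# parameters in the table. Pass "" to leave an option out.
build_cmd() {
  local regions="$1" constrain="$2" targets_index="$3"
  local -a cmd=(nextflow run main.nf --input samplesheet.csv --reference reference.fasta)

  if [ -n "$regions" ]; then
    cmd+=(--regions "$regions")
  fi

  if [ "$constrain" = "true" ]; then
    # Per the table, constraining alleles needs an indexed targets file.
    if [ -z "$targets_index" ]; then
      echo "error: --constrain_alleles requires --targets_index" >&2
      return 1
    fi
    cmd+=(--constrain_alleles --targets_index "$targets_index")
  fi

  echo "${cmd[*]}"
}

# Example: restrict calling to a BED file, without constraining alleles.
build_cmd "targets.bed" "false" ""
```

Printing the command first makes it easy to review the flags before pasting it into a terminal or job script.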

Output

Pipeline outputs (BAMs, VCFs, the MultiQC report, and pipeline reports) are saved in the specified output directory (default: results).

TODO:

add in example data (current example data is too large for GitHub)
ensure compatibility with single-end read data
add more optional samplesheet info to include in read group header
add a second duplicate read marking tool option
