TETRIS is a Nextflow pipeline for DNA mapping and variant calling. It provides a flexible and scalable workflow for processing sequencing data, from raw reads to variant calls.
It trims reads with fastp, aligns them with BWA-MEM, optionally marks duplicates with GATK MarkDuplicates, and calls variants with BCFtools. In addition, QC statistics are computed with FastQC, Samtools, and mosdepth, and aggregated into a single report by MultiQC.
The pipeline is split into two subworkflows, mapping and variant calling. The individual subworkflows can be run depending on your use case, or you can run both as a complete pipeline.
To run the pipeline with default parameters:
# Run the complete pipeline
nextflow run main.nf --input samplesheet.csv --reference reference.fasta
# Run only the mapping steps
nextflow run main.nf --input samplesheet.csv --reference reference.fasta --mapping_only
# Run only the variant calling steps, starting from BAMs
nextflow run main.nf --input samplesheet.csv --reference reference.fasta --calling_only

The pipeline requires two main inputs:
- A samplesheet CSV file with the following columns:
  - name: sample name
  - seqid: sequencing ID
  - seq_type: read data type, 'paired' or 'single'
  - fastq_1: path to forward reads
  - fastq_2: path to reverse reads (optional for single-end data)

  or, when starting at the variant calling subworkflow:
  - name: sample name
  - bam: path to BAM file
  - bai: path to BAM index file
- A reference genome in FASTA format
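
As a rough illustration of the two samplesheet layouts described above, the sketch below writes one CSV for the full pipeline and one for starting at variant calling. The sample names, sequencing IDs, and file paths are hypothetical placeholders, and the presence of a header row and the column order shown are assumptions based on the column list, not confirmed pipeline requirements.

```shell
# Samplesheet for the mapping subworkflow.
# Assumption: a header row with the columns in the order listed above.
# All sample names and paths below are hypothetical.
cat > samplesheet.csv <<'EOF'
name,seqid,seq_type,fastq_1,fastq_2
sampleA,run01,paired,/data/sampleA_R1.fastq.gz,/data/sampleA_R2.fastq.gz
sampleB,run01,paired,/data/sampleB_R1.fastq.gz,/data/sampleB_R2.fastq.gz
EOF

# Samplesheet for starting at the variant calling subworkflow from BAMs.
cat > samplesheet_bam.csv <<'EOF'
name,bam,bai
sampleA,/data/sampleA.bam,/data/sampleA.bam.bai
EOF

# Quick sanity check: show the header row of each file.
head -n 1 samplesheet.csv samplesheet_bam.csv
```

Either file would then be passed to the pipeline through `--input`.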
| Parameter | Description | Default |
|---|---|---|
| --input | Input samplesheet CSV file | (required) |
| --reference | Reference genome FASTA file | (required) |
| --outdir | Output directory | results |
| --split_fastq | Number of reads to split FASTQ files by | 0 (no splitting) |
| --skip_fastp | Skip fastp processing | false |
| --skip_markdup | Skip duplicate read marking | false |
| --markdup_tool | Tool for marking duplicates | gatk |
| --grouped_call | Run mpileup and call on all BAMs together | false |
| --mapping_only | Perform only read mapping (no variant calling) | false |
| --regions | BED file of genomic regions in which to call variants | null |
| --constrain_alleles | Constrain alleles (requires indexed target positions) | false |
| --targets_index | Indexed set of target positions and alleles | null |
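
To show how several of these parameters might be combined, here is a minimal sketch that assembles a command and prints it for review before running. The flags come from the table above; `targets.bed` and `results_regions` are hypothetical names, and passing a boolean flag such as `--skip_markdup` without a value relies on Nextflow's usual behavior of treating it as true.

```shell
# Assemble a run that restricts variant calling to a BED file of regions,
# skips duplicate marking, and writes to a custom output directory.
# File and directory names are hypothetical placeholders.
cmd="nextflow run main.nf \
--input samplesheet.csv \
--reference reference.fasta \
--outdir results_regions \
--regions targets.bed \
--skip_markdup"

# Print the command for review; run it with: eval "$cmd"
echo "$cmd"
```

Printing the command first is only a convenience for checking the flags; the same arguments can be typed directly on the command line.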
The pipeline outputs (BAMs, VCFs, and the MultiQC and pipeline reports) are saved in the specified output directory (default: results).
- add example data (the current example data is too large for GitHub)
- ensure compatibility with single-end read data
- add more optional samplesheet fields to include in the read group header
- add a second duplicate read marking tool option

