This page lists all the pipelines and tools developed at IARC (mostly Nextflow pipelines, whose names end in -nf). It also includes some useful resources such as courses, data notes, and tips and tricks. Finally, near the bottom of the page you will find explanations on how to install and use Nextflow pipelines.
Table of Contents:
1. Pipelines and tools
2. Courses and data notes
3. Tips and tricks
4. Coming soon (development branches only)
5. Nextflow, Docker and Singularity installation and use
6. Outdated and unmaintained pipelines and tools
Name | Latest version | Maintained | Description | Tools used |
---|---|---|---|---|
alignment-nf | v1.3 - March 2021 | ✔️ Yes | Performs BAM realignment or fastq alignment, with/without local indel realignment and base quality score recalibration | bwa, samblaster, sambamba, samtools, AdapterRemoval, GATK, k8 javascript execution shell, bwa-postalt.js |
BQSR-nf | v1.1 - Apr 2020 | ✔️ Yes | Performs base quality score recalibration of bam files using GATK | samtools, samblaster, sambamba, GATK |
abra-nf | v3.0 - Apr 2020 | ✔️ Yes | Runs ABRA (Assembly Based ReAligner) | ABRA, bedtools, bwa, sambamba, samtools |
gatk4-DataPreProcessing-nf | Nov 2018 | ? | Performs bwa alignment and pre-processing (mark duplicates and recalibration) following GATK4 best practices - compatible with hg38 | bwa, picard, GATK4, sambamba, qualimap |
PostAlignment-nf | Aug 2018 | ? | Performs post-alignment processing of BAM files | samtools, sambamba, bwa-postalt.js |
****************** | *********** | *********** | ************************* | ************************ |
marathon-wgs | June 2018 | ? | Studies intratumor heterogeneity with Canopy | bwa, platypus, strelka2, vt, annovar, R, Falcon, Canopy |
ITH-nf | Sept 2018 | ? | Performs intra-tumoral heterogeneity (ITH) analysis | Strelka2, Platypus, Bcftools, Tabix, Falcon, Canopy |
Name | Latest version | Maintained | Description | Tools used |
---|---|---|---|---|
RNAseq-nf | v2.4 - Dec 2020 | ✔️ Yes | Performs RNAseq mapping, quality control, and read counting - see also RNAseq_analysis_scripts for post-processing | fastqc, RSeQC, multiQC, STAR, htseq, cutadapt, Python version > 2.7, trim_galore, hisat2, GATK, samtools |
RNAseq-transcript-nf | v2.2 - June 2020 | ✔️ Yes | Performs transcript identification and quantification from a series of BAM files | StringTie |
RNAseq-fusion-nf | v1.1 - Aug 2020 | ✔️ Yes | Performs fusion gene discovery from RNAseq data using STAR-Fusion | STAR-Fusion |
gene-fusions-nf | v1 - Oct 2020 - updated Nov 2021 | ✔️ Yes | Performs fusion gene discovery from RNAseq data using Arriba | Arriba |
quantiseq-nf | v1.1 - July 2020 | ✔️ Yes | Quantify immune cell content from RNA-seq data | quanTIseq |
Name | Latest version | Maintained | Description | Tools used |
---|---|---|---|---|
SComatic-nf | April 2024 | ✔️ Yes | Performs variant calling from single-cell RNAseq data | SComatic, annovar |
numbat-nf | April 2024 | ✔️ Yes | Performs copy number variant calling from single-cell RNAseq data | numbat, SigProfilerExtractor |
Name | Latest version | Maintained | Description | Tools used |
---|---|---|---|---|
NGSCheckMate | v1.1a - July 2021 | ✔️ Yes | Runs NGSCheckMate on BAM files to identify data files from the same individual (e.g. to check tumor/normal pairs) | NGSCheckMate |
conpair-nf | June 2018 | ? | Runs conpair (concordance and contamination estimator) | conpair, Python 2.7, numpy 1.7.0 or higher, scipy 0.14.0 or higher, GATK 2.3 or higher |
damage-estimator-nf | June 2017 | ? | Runs "Damage Estimator" | Damage Estimator, samtools, R with GGPLOT2 package |
QC3 | May 2016 | No | Runs QC on DNA-seq data (raw data, aligned data and variant calls); forked from slzhao | samtools |
fastqc-nf | v1.1 - July 2020 | ✔️ Yes | Runs fastqc and multiqc on DNA seq data (fastq data) | FastQC, Multiqc |
qualimap-nf | v1.1 - Nov 2019 | ✔️ Yes | Performs quality control on bam files (WES, WGS and target alignment data) | samtools, Qualimap, Multiqc |
mpileup-nf | Jan 2018 | ? | Computes bam coverage with samtools mpileup (bed parallelization) | samtools,annovar |
bamsurgeon-nf | Mar 2019 | ? | Runs bamsurgeon (tool to add mutations to bam files) with step of variant simulation | Python 2.7, bamsurgeon, R software (tested with R version 3.2.3) |
Name | Latest version | Maintained | Description | Tools used |
---|---|---|---|---|
needlestack | v1.1 - May 2019 | ✔️ Yes | Performs multi-sample somatic variant calling | perl, bedtools, samtools and R software |
target-seq | Aug 2019 | ? | Whole pipeline to perform multi-sample somatic variant calling using Needlestack on targeted sequencing data | abra2, QC3, needlestack, annovar and R software |
strelka2-nf | v1.2a - Dec 2020 | ✔️ Yes | Runs Strelka 2 (germline and somatic variant caller) | Strelka2 |
strelka-nf | Jun 2017 | No | Runs Strelka (germline and somatic variant caller) | Strelka |
mutect-nf | v2.3 - July 2021 | ✔️ Yes | Runs Mutect on tumor/matched-normal BAM pairs | Mutect and its dependencies (Java 1.7 and Maven 3.0+), bedtools |
gatk4-HaplotypeCaller-nf | Dec 2019 | ? | Runs variant calling in GVCF mode on bam files following GATK best practices | GATK |
gatk4-GenotypeGVCFs-nf | Apr 2019 | ? | Runs joint genotyping on gvcf files following GATK best practices | GATK |
GVCF_pipeline-nf | Nov 2016 | ? | Performs bam realignment and recalibration + variant calling in GVCF mode following GATK best practices | bwa, samblaster, sambamba, GATK |
platypus-nf | v1.0 - Apr 2018 | ? | Runs Platypus (germline variant caller) | Platypus |
TCGA_platypus-nf | Aug 2018 | ? | Converts TCGA Platypus VCF files into a format suitable for annotation with annovar | vt, VCFTools |
vcf_normalization-nf | v1.1 - May 2020 | ✔️ Yes | Decomposes and normalizes variant calls (vcf files) | bcftools,samtools/htslib |
TCGA_germline-nf | May 2017 | ? | Extract germline variants from TCGA data for annotation with annovar (vcf files) | R software |
gama_annot-nf | Aug 2020 | ✔️ Yes | Filter and annotate a batch of VCF files (annovar + strand + context) | annovar, R |
table_annovar-nf | v1.1.1 - Feb 2021 | ✔️ Yes | Annotate variants with annovar (vcf files) | annovar |
RF-mut-f | Nov 2021 | ✔️ Yes | Random forest implementation to filter germline mutations from tumor-only samples | annovar |
****************** | *********** | *********** | ************************* | ************************ |
MutSig | Oct 2021 | ✔️ Yes | Pipeline to perform mutational signatures analysis of WGS data using SigProfilerExtractor | SigProfilerExtractor |
MutSpec | v2.0 - May 2017 | ? | Suite of tools for analyzing and interpreting mutational signatures | annovar |
****************** | *********** | *********** | ************************* | ************************ |
purple-nf | v1.1 - Nov 2021 | ✔️ Yes | Pipeline to perform copy number calling from tumor/normal or tumor-only sequencing data using PURPLE | PURPLE |
facets-nf | v2.0 - Oct 2020 | ✔️ Yes | Performs cellular fraction and allele-specific copy number estimation from tumor/normal sequencing data using facets | facets, R |
CODEX-nf | Mar 2017 | ? | Performs copy number variant calling from whole exome sequencing data using CODEX | R with package Codex, Rscript |
svaba-nf | v1.0 - August 2020 | ✔️ Yes | Performs structural variant calling using SvABA | SvABA , R |
sv_somatic_cns-nf | v1.0 - Nov 2021 | ✔️ Yes | Pipeline using multiple SV callers for consensus structural variant calling from tumor/normal sequencing data | Delly, SvABA, Manta, SURVIVOR, bcftools, Samtools |
ssvht | v1 - Oct 2022 | ✔️ Yes | 🔴 NEW set of scripts to assist the calling of somatic structural variants from short reads using a random forest classifier |
Name | Latest version | Maintained | Description | Tools used |
---|---|---|---|---|
WSIPreprocessing | December 2023 | ✔️ Yes | Preprocessing pipeline for WSIs (Tiling, color normalization) | Python, openslide |
Name | Latest version | Maintained | Description | Tools used |
---|---|---|---|---|
TumorSegmentationCFlowAD | December 2023 | ✔️ Yes | Tumour segmentation with an anomaly detection model | Python, PyTorch |
Name | Latest version | Maintained | Description | Tools used |
---|---|---|---|---|
PathonetLNEN | December 2023 | ✔️ Yes | Detection and classification of cells as positive or negative for an immunomarker developed for PHH3 and Ki-67 in lung carcinoma. | Python, TensorFlow |
Name | Latest version | Maintained | Description | Tools used |
---|---|---|---|---|
LNENBarlowTwins | December 2023 | ✔️ Yes | Extraction of H&E tile features with Barlow Twins, a self-supervised deep learning model. | Python, PyTorch |
Name | Latest version | Maintained | Description | Tools used |
---|---|---|---|---|
SpatialPCAForWSIs | December 2023 | ✔️ Yes | Spatially aware principal component analysis to obtain a low-dimensional representation of the tiles encoding vectors. | R |
Name | Latest version | Maintained | Description | Tools used |
---|---|---|---|---|
template-nf | May 2020 | ✔️ Yes | Empty template for nextflow pipelines | NA |
data_test | Aug 2020 | ✔️ Yes | Small data files to test IARC nextflow pipelines | NA |
bam2cram-nf | v1.0 - Nov 2020 | ✔️ Yes | Pipeline to convert bam files to cram files | samtools |
hla-neo-nf | v1.1 - June 2021 | ✔️ Yes | Pipeline to predict neoantigens from WGS of T/N pairs | xHLA, VEP, pVACtools |
PRSice | Nov 2020 |  | Pipeline to compute polygenic risk scores | PRSice-2 |
methylkey | May 2021 | ✔️ Yes | Pipeline for 450k and 850k array analysis (bisulfite data analysis using Minfi, Methylumi, Comet, Bumphunter and DMRcate packages) | R software |
wsearch-nf | July 2022 | ✔️ Yes | 🔴 NEW pipeline: Microbiome analysis with usearch, vsearch and phyloseq | |
AmpliconArchitect-nf | v1.0 - Oct 2021 | ✔️ Yes | Discovers ecDNA in cancer genomes using AmpliconArchitect | AmpliconArchitect |
addreplacerg-nf | Jan 2017 | ? | Adds and replaces read group tags in BAM files | samtools |
bametrics-nf | Mar 2017 | ? | Computes average metrics from reads that overlap a given set of positions | NA |
Gviz_multiAlignments | Aug 2017 | ? | Generates views of multiple BAM alignments using the Gviz Bioconductor package | Gviz |
nf_coverage_demo | v2.3 - July 2020 | ✔️ Yes | Plots mean coverage over a series of BAM files | bedtools, R software |
LiftOver-nf | Nov 2017 | ? | Converts BED/VCF between hg19 and hg38 | picard |
MinION_pipes | Jan 2020 | ? | Analyze MinION sequencing data for the reconstruction of viral genomes | Guppy V3.1.5+, Porechop V0.2.4, Nanofilt V2.2.0, Filtlong V0.2.0, SPAdes V3.10.1, CAP3 02/10/15, BLAST V2.9.0+, MUSCLE V3.8.1551, Nanopolish V0.11.0, Minimap2 V2.15, Samtools version 1.9 |
DraftPolisher | Jan 2020 | ? | Fast polishing of draft sequences (draft genome assembly) | MUSCLE, Python3 |
Imputation-nf | v1.1 - July 2021 | ✔️ Yes | Pipeline to perform genotype imputation on a dataset | LiftOver, Plink, Admixture, Perl, Term::ReadKey, Bcftools, Eagle, Minimac4 and samtools |
PVAmpliconFinder | Aug 2020 | ✔️ Yes | Identify and classify known and potentially new Papillomaviridae sequences from amplicon deep-sequencing with degenerate papillomavirus primers. | Python and Perl (+ FastQC, MultiQC, Trim Galore, VSEARCH, Blast, RAxML-EPA, PaPaRa, CAP3, KRONA) |
integration_analysis_scripts | Mar 2020 | ✔️ Yes | Performs unsupervised analyses (clustering) from transformed expression data (e.g., log fpkm) and methylation beta values | R software with iClusterPlus, gplots and lattice R packages |
mpileup2readcounts | Apr 2018 | ? | Get the readcounts at a locus by piping samtools mpileup output - forked from gatoravi | samtools |
Methylation_analysis_scripts | v1.0 - June 2020 - updated Nov 2021 | ✔️ Yes | Perform Illumina EPIC 850K array pre-processing and QC from idat files | R software |
DRMetrics | Oct 2020 | ✔️ Yes | Evaluate the quality of projections obtained after using dimensionality reduction techniques | R software |
acnviewer-singularity | Jul 2019 | ? | Build a singularity image of aCNViewer (tool for visualization of absolute copy number and copy neutral variations) | Singularity |
polysolver-singularity | Dec 2019 | ? | Build a singularity image of Polysolver (tool for HLA typing based on whole exome seq) | Singularity |
scanMyWorkDir | May 2018 | ? | Non-destructive and informative scan of a nextflow work folder | NA |
Name | Description | Tools used |
---|---|---|
nextflow-course-2018 | Nextflow course | NA |
SBG-CGC_course2018 | Analyzing TCGA data in SBG-CGC | NA |
Medical Genomics Course | Medical Genomics course held at INSA Lyon - updated Fall 2022 | NA |
intro-cancer-genomics | Introduction to cancer genomics | NA |
mesomics_data_note | Repository with code and datasets used in the mesomics data note manuscript | NA |
Name | Description | Tools used |
---|---|---|
BAM-tricks | Tips and tricks for BAM files | samtools, freebayes, bedtools, biobambam2, Picard, rbamtools |
VCF-tricks | Tips and tricks for VCF files | samtools, bcftools, vcflib, vcftools, R scripts |
R-tricks | Tips and tricks for R | NA |
EGA-tricks | Tips and tricks to use the European Genome-phenome Archive from the European Bioinformatics Institute | EGA client |
GDC-tricks | Tips and tricks to use the GDC data portal | NA |
awesomeTCGA | Curated list of resources to access TCGA data | NA |
LSF-Tricks | Tips and tricks for the LSF HPC scheduler | NA |
Name | Description | Tools used |
---|---|---|
DPclust-nf | Method for subclonal reconstruction using SNVs and/or CNAs from whole genome or whole exome sequencing data | dpclust , R |
ITH_pipeline | Study intra-tumoral heterogeneity (ITH) through subclonality reconstruction | HATCHet , DeCiFer, ClonEvol |
Nextflow_DSL2 | Repository with modules for nextflow DSL2 | NA |
variantflag | Merge and annotate variants from different callers | |
EPIDRIVER2020 | Scripts for EPIDRIVER Project |
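- Install a Java runtime (JRE) if you don't already have one (Java 7 or higher was sufficient for early Nextflow releases; recent releases require a newer version, see the Nextflow installation documentation).
- Install Nextflow:
  ```shell
  curl -fsSL get.nextflow.io | bash
  ```
  Then move the `nextflow` executable to a directory in your `$PATH` (here `/usr/local/bin`, for example):
  ```shell
  sudo mv nextflow /usr/local/bin
  ```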
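After installation, you may want to confirm that both `java` and `nextflow` are reachable from your `$PATH`. The sketch below uses a hypothetical helper, `check_tool` (not part of Nextflow), built only on the `command -v` shell builtin, so it runs safely even on a machine where the tools are still missing.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Nextflow): report whether a command is on $PATH.
# Returns 0 and prints its location if found, returns 1 and warns on stderr otherwise.
check_tool() {
    local tool=$1
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found at $(command -v "$tool")"
        return 0
    fi
    echo "$tool: not found in \$PATH" >&2
    return 1
}

# Check the prerequisites from the installation steps; keep going if one is missing.
for tool in java nextflow; do
    check_tool "$tool" || true
done
```

A missing tool only produces a warning, so the loop always reports on every prerequisite instead of stopping at the first one.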
To avoid installing all the dependencies each time you use a pipeline, you can instead install Docker and let Nextflow handle them. Installing Docker is system-specific (but quite easy in most cases): follow the Docker documentation (Docker CE is sufficient). Also follow the post-installation steps to manage Docker as a non-root user (on Linux); otherwise you will need to change the `sudo` option in the Nextflow `docker` configuration scope, as described in the Nextflow documentation.

To run a Nextflow pipeline with Docker, simply add the `-with-docker` option to the `nextflow run` command.
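If you do not set up Docker for a non-root user, a minimal sketch of the corresponding configuration follows. It writes a `nextflow.config` file in the current directory (Nextflow reads this file automatically when a pipeline is launched from that directory) using the `enabled` and `sudo` options of the `docker` scope. It assumes no such file exists yet; if one does, the script leaves it untouched and you should add the scope by hand.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal Nextflow configuration that runs Docker through sudo.
# Assumption: there is no nextflow.config in this directory yet (an existing one is kept).
config=nextflow.config
if [ -e "$config" ]; then
    echo "$config already exists; add the docker scope to it by hand" >&2
else
    cat > "$config" <<'EOF'
docker {
    enabled = true   // equivalent to passing -with-docker on the command line
    sudo    = true   // run the docker command through sudo
}
EOF
    echo "wrote $config"
fi
```

Keeping these settings in a configuration file means you no longer need to repeat `-with-docker` on every `nextflow run` command launched from that directory.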
To avoid installing all the dependencies each time you use a pipeline, you can also install Singularity and let Nextflow handle them (see the Singularity documentation).

If you want to use the same Singularity container, with exactly the same versions of the pipeline and tools, on several datasets over time, you may want to pull the container and archive it:
```shell
singularity pull shub://IARCbioinfo/pipeline-nf:v2.2
```
where `pipeline-nf` should be replaced by the name of the pipeline you want to use (example: `RNAseq-nf`) and `2.2` by the version of the pipeline you want to use (example: `2.4`). This creates a Singularity container file, `pipeline-nf_v2.2.sif` (example: `RNAseq-nf_v2.4.sif`), that you can then pass to the `nextflow run` command (see usage).

For example:
```shell
singularity pull shub://IARCbioinfo/RNAseq-nf:v2.4
```
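The pull-and-archive workflow above can be captured in a few small helpers. This is a sketch under stated assumptions: `sif_for`, `pull_cmd` and `checksum_sif` are hypothetical names, the `.sif` file name simply follows the `<pipeline>_v<version>.sif` pattern described above, and `sha256sum` is assumed to be available (GNU coreutils on Linux). The helpers only print the pull command, so you can inspect it before running it.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for archiving IARCbioinfo Singularity containers.

# Print the .sif file name expected from `singularity pull`,
# following the <pipeline>_v<version>.sif pattern.
sif_for() {
    local pipeline=$1 version=$2
    printf '%s_v%s.sif\n' "$pipeline" "$version"
}

# Print (do not run) the pull command for a pipeline and version.
pull_cmd() {
    local pipeline=$1 version=$2
    printf 'singularity pull shub://IARCbioinfo/%s:v%s\n' "$pipeline" "$version"
}

# Store a SHA-256 checksum next to an archived image, so that a later
# run can check the archived file has not changed.
checksum_sif() {
    local sif=$1
    sha256sum "$sif" > "$sif.sha256"
}

pull_cmd RNAseq-nf 2.4
sif_for RNAseq-nf 2.4
```

Later, `sha256sum -c RNAseq-nf_v2.4.sif.sha256` verifies that the archived image is still the one you originally pulled.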
Usage (replace `pipeline_name` with the name of the pipeline, `X` with the version to run, and the `xxx` placeholders with your own folders and parameter file; `-w` sets the Nextflow work directory):
```shell
nextflow run iarcbioinfo/pipeline_name -r X --input_folder xxx --output_folder xxx -params-file xxx.yml -w /scratch/work

# OR USING SINGULARITY
nextflow run iarcbioinfo/pipeline_name -r X -profile singularity --input_folder xxx --output_folder xxx -params-file xxx.yml -w /scratch/work

# OR USING SINGULARITY WITH A SPECIFIC CONTAINER
nextflow run iarcbioinfo/pipeline_name -r X -with-singularity XXX.sif --input_folder xxx --output_folder xxx -params-file xxx.yml -w /scratch/work
```
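The `-params-file` option takes a YAML (or JSON) file whose keys are the pipeline parameters otherwise passed as `--name value`. Below is a minimal sketch, assuming a pipeline that accepts `input_folder` and `output_folder` (the parameters used in the commands above; check each pipeline's `--help` for its own list). The folder paths are hypothetical placeholders, and `build_run_cmd` is a hypothetical helper that only prints the command for the chosen execution mode.

```shell
#!/usr/bin/env bash
# Sketch: keep run parameters in a YAML file instead of on the command line.
# The paths below are hypothetical placeholders; adapt them to your data.
cat > params.yml <<'EOF'
input_folder: /path/to/input
output_folder: /path/to/results
EOF

# Hypothetical helper: print the `nextflow run` command for a pipeline, a revision
# and an execution mode (default, docker, singularity, or a specific .sif file).
build_run_cmd() {
    local pipeline=$1 revision=$2 mode=$3 sif=${4:-}
    local cmd="nextflow run iarcbioinfo/$pipeline -r $revision"
    case $mode in
        default)     ;;
        docker)      cmd="$cmd -with-docker" ;;
        singularity) cmd="$cmd -profile singularity" ;;
        sif)         cmd="$cmd -with-singularity $sif" ;;
        *)           echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    printf '%s -params-file params.yml -w /scratch/work\n' "$cmd"
}

# Example, assuming the release to run is tagged v2.4.
build_run_cmd RNAseq-nf v2.4 singularity
```

Printing rather than executing the command keeps the sketch safe to try on a machine without Nextflow; once it looks right, you can run the printed line directly.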
You can update the Nextflow software and the pipeline itself with:
```shell
nextflow -self-update
nextflow pull iarcbioinfo/pipeline_name
```
You can also update the pipeline automatically each time you run it by adding the `-latest` option to the `nextflow run` command. Doing so, you will always run the latest version from GitHub.
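If you use several pipelines, the two update commands above can be applied to all of them in one go. A minimal sketch: `update_all` and the `DRY_RUN` variable are hypothetical (not Nextflow features), and the pipeline names are only examples taken from the tables above. With `DRY_RUN=1`, the commands are printed instead of executed.

```shell
#!/usr/bin/env bash
# Sketch: update Nextflow, then pull the latest version of each listed pipeline.
# Set DRY_RUN=1 to print the commands instead of running them.
update_all() {
    local run=""
    if [ "${DRY_RUN:-0}" = 1 ]; then
        run="echo"
    fi
    $run nextflow -self-update
    for pipeline in "$@"; do
        $run nextflow pull "iarcbioinfo/$pipeline"
    done
}

DRY_RUN=1 update_all RNAseq-nf alignment-nf
```

When `run` is empty, the unquoted `$run` expands to nothing, so the same lines execute the real `nextflow` commands.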
To display the help message of a pipeline:
```shell
nextflow run iarcbioinfo/pipeline_name --help
```
Name | Latest version | Maintained | Description | Tools used |
---|---|---|---|---|
GATK-Alignment-nf | June 2017 | No | Performs bwa alignment and pre-processing (realignment and recalibration) following first version of GATK best practices (less performant than alignment-nf ) | bwa, picard, GATK |