IARCbioinfo/IARC-nf

IARC bioinformatics pipelines and tools (updated on 29/07/2024)

This page lists all the pipelines and tools developed at IARC (mostly nextflow pipelines, which are suffixed with -nf). It also includes some useful resources such as courses, data notes and tips/tricks. At the bottom of the page you will also find explanations on how to use nextflow pipelines.

Table of Contents:

1. IARC pipelines/tools list

2. Courses and data notes

3. Tips & Tricks

4. Coming soon... (only dev branches yet)

5. Nextflow, Docker and Singularity installation and use

6. Outdated and unmaintained pipelines and tools

1. IARC pipelines/tools list

1a. Raw NGS data processing

| Name | Latest version | Maintained | Description | Tools used |
|---|---|---|---|---|
| alignment-nf | v1.3 - March 2021 | ✔️ Yes | Performs BAM realignment or fastq alignment, with/without local indel realignment and base quality score recalibration | bwa, samblaster, sambamba, samtools, AdapterRemoval, GATK, k8 javascript execution shell, bwa-postalt.js |
| BQSR-nf | v1.1 - Apr 2020 | ✔️ Yes | Performs base quality score recalibration of bam files using GATK | samtools, samblaster, sambamba, GATK |
| abra-nf | v3.0 - Apr 2020 | ✔️ Yes | Runs ABRA (Assembly Based ReAligner) | ABRA, bedtools, bwa, sambamba, samtools |
| gatk4-DataPreProcessing-nf | Nov 2018 | ? | Performs bwa alignment and pre-processing (mark duplicates and recalibration) following GATK4 best practices - compatible with hg38 | bwa, picard, GATK4, sambamba, qualimap |
| PostAlignment-nf | Aug 2018 | ? | Perform post alignment on bam files | samtools, sambamba, bwa-postalt.js |
| | | | | |
| marathon-wgs | June 2018 | ? | Studies intratumor heterogeneity with Canopy | bwa, platypus, strelka2, vt, annovar, R, Falcon, Canopy |
| ITH-nf | Sept 2018 | ? | Perform intra-tumoral heterogeneity (ITH) analysis | Strelka2, Platypus, Bcftools, Tabix, Falcon, Canopy |

| Name | Latest version | Maintained | Description | Tools used |
|---|---|---|---|---|
| RNAseq-nf | v2.4 - Dec 2020 | ✔️ Yes | Performs RNAseq mapping, quality control, and reads counting - See also RNAseq_analysis_scripts for post-processing | fastqc, RSeQC, multiQC, STAR, htseq, cutadapt, Python version > 2.7, trim_galore, hisat2, GATK, samtools |
| RNAseq-transcript-nf | v2.2 - June 2020 | ✔️ Yes | Performs transcript identification and quantification from a series of BAM files | StringTie |
| RNAseq-fusion-nf | v1.1 - Aug 2020 | ✔️ Yes | Perform fusion-genes discovery from RNAseq data using STAR-Fusion | STAR-Fusion |
| gene-fusions-nf | v1 - Oct 2020 - updated Nov 2021 | ✔️ Yes | Perform fusion-genes discovery from RNAseq data using Arriba | Arriba |
| quantiseq-nf | v1.1 - July 2020 | ✔️ Yes | Quantify immune cell content from RNA-seq data | quanTIseq |

| Name | Latest version | Maintained | Description | Tools used |
|---|---|---|---|---|
| SComatic-nf | April 2024 | ✔️ Yes | Performs variant calling from single-cell RNAseq data | SComatic, annovar |
| numbat-nf | April 2024 | ✔️ Yes | Performs variant calling from single-cell RNAseq data | numbat, SigProfilerExtractor |

| Name | Latest version | Maintained | Description | Tools used |
|---|---|---|---|---|
| NGSCheckMate | v1.1a - July 2021 | ✔️ Yes | Runs NGSCheckMate on BAM files to identify data files from the same individual (i.e. check N/T pairs) | NGSCheckMate |
| conpair-nf | June 2018 | ? | Runs conpair (concordance and contamination estimator) | conpair, Python 2.7, numpy 1.7.0 or higher, scipy 0.14.0 or higher, GATK 2.3 or higher |
| damage-estimator-nf | June 2017 | ? | Runs "Damage Estimator" | Damage Estimator, samtools, R with GGPLOT2 package |
| QC3 | May 2016 | No | Runs QC on DNA seq data (raw data, aligned data and variant calls) - forked from slzhao | samtools |
| fastqc-nf | v1.1 - July 2020 | ✔️ Yes | Runs fastqc and multiqc on DNA seq data (fastq data) | FastQC, Multiqc |
| qualimap-nf | v1.1 - Nov 2019 | ✔️ Yes | Performs quality control on bam files (WES, WGS and target alignment data) | samtools, Qualimap, Multiqc |
| mpileup-nf | Jan 2018 | ? | Computes bam coverage with samtools mpileup (bed parallelization) | samtools, annovar |
| bamsurgeon-nf | Mar 2019 | ? | Runs bamsurgeon (tool to add mutations to bam files) with step of variant simulation | Python 2.7, bamsurgeon, R software (tested with R version 3.2.3) |

| Name | Latest version | Maintained | Description | Tools used |
|---|---|---|---|---|
| needlestack | v1.1 - May 2019 | ✔️ Yes | Performs multi-sample somatic variant calling | perl, bedtools, samtools and R software |
| target-seq | Aug 2019 | ? | Whole pipeline to perform multi-sample somatic variant calling using Needlestack on targeted sequencing data | abra2, QC3, needlestack, annovar and R software |
| strelka2-nf | v1.2a - Dec 2020 | ✔️ Yes | Runs Strelka 2 (germline and somatic variant caller) | Strelka2 |
| strelka-nf | Jun 2017 | No | Runs Strelka (germline and somatic variant caller) | Strelka |
| mutect-nf | v2.3 - July 2021 | ✔️ Yes | Runs Mutect on tumor-matched normal bam pairs | Mutect and its dependencies (Java 1.7 and Maven 3.0+), bedtools |
| gatk4-HaplotypeCaller-nf | Dec 2019 | ? | Runs variant calling in GVCF mode on bam files following GATK best practices | GATK |
| gatk4-GenotypeGVCFs-nf | Apr 2019 | ? | Runs joint genotyping on gvcf files following GATK best practices | GATK |
| GVCF_pipeline-nf | Nov 2016 | ? | Performs bam realignment and recalibration + variant calling in GVCF mode following GATK best practices | bwa, samblaster, sambamba, GATK |
| platypus-nf | v1.0 - Apr 2018 | ? | Runs Platypus (germline variant caller) | Platypus |
| TCGA_platypus-nf | Aug 2018 | ? | Converts TCGA Platypus vcf in format for annotation with annovar | vt, VCFTools |
| vcf_normalization-nf | v1.1 - May 2020 | ✔️ Yes | Decomposes and normalizes variant calls (vcf files) | bcftools, samtools/htslib |
| TCGA_germline-nf | May 2017 | ? | Extract germline variants from TCGA data for annotation with annovar (vcf files) | R software |
| gama_annot-nf | Aug 2020 | ✔️ Yes | Filter and annotate batch of vcf files (annovar + strand + context) | annovar, R |
| table_annovar-nf | v1.1.1 - Feb 2021 | ✔️ Yes | Annotate variants with annovar (vcf files) | annovar |
| RF-mut-f | Nov 2021 | ✔️ Yes | Random forest implementation to filter germline mutations from tumor-only samples | annovar |
| | | | | |
| MutSig | Oct 2021 | ✔️ Yes | Pipeline to perform mutational signatures analysis of WGS data using SigProfilerExtractor | SigProfilerExtractor |
| MutSpec | v2.0 - May 2017 | ? | Suite of tools for analyzing and interpreting mutational signatures | annovar |
| | | | | |
| purple-nf | v1.1 - Nov 2021 | ✔️ Yes | Pipeline to perform copy number calling from tumor/normal or tumor-only sequencing data using PURPLE | PURPLE |
| facets-nf | v2.0 - Oct 2020 | ✔️ Yes | Performs fraction and copy number estimate from tumor/normal sequencing data using facets | facets, R |
| CODEX-nf | Mar 2017 | ? | Performs copy number variant calling from whole exome sequencing data using CODEX | R with package Codex, Rscript |
| svaba-nf | v1.0 - August 2020 | ✔️ Yes | Performs structural variant calling using SvABA | SvABA, R |
| sv_somatic_cns-nf | v1.0 - Nov 2021 | ✔️ Yes | Pipeline using multiple SV callers for consensus structural variant calling from tumor/normal sequencing data | Delly, SvABA, Manta, SURVIVOR, bcftools, Samtools |
| ssvht | v1 - Oct 2022 | ✔️ Yes | 🔴 NEW set of scripts to assist the calling of somatic structural variants from short reads using a random forest classifier | |

| Name | Latest version | Maintained | Description | Tools used |
|---|---|---|---|---|
| WSIPreprocessing | December 2023 | ✔️ Yes | Preprocessing pipeline for WSIs (Tiling, color normalization) | Python, openslide |
| TumorSegmentationCFlowAD | December 2023 | ✔️ Yes | Tumour segmentation with an anomaly detection model | Python, PyTorch |
| PathonetLNEN | December 2023 | ✔️ Yes | Detection and classification of cells as positive or negative for an immunomarker developed for PHH3 and Ki-67 in lung carcinoma. | Python, TensorFlow |
| LNENBarlowTwins | December 2023 | ✔️ Yes | Extractions of HE tiles features with Barlow Twins a self-supervised deep learning model. | Python, Pytorch |
| SpatialPCAForWSIs | December 2023 | ✔️ Yes | Spatially aware principal component analysis to obtain a low-dimensional representation of the tiles encoding vectors. | R |

| Name | Latest version | Maintained | Description | Tools used |
|---|---|---|---|---|
| template-nf | May 2020 | ✔️ Yes | Empty template for nextflow pipelines | NA |
| data_test | Aug 2020 | ✔️ Yes | Small data files to test IARC nextflow pipelines | NA |
| bam2cram-nf | v1.0 - Nov 2020 | ✔️ Yes | Pipeline to convert bam files to cram files | samtools |
| hla-neo-nf | v1.1 - June 2021 | ✔️ Yes | Pipeline to predict neoantigens from WGS of T/N pairs | xHLA, VEP, pVACtools |
| PRSice | Nov 2020 | | Pipeline to compute polygenic risk scores | PRSice-2 |
| methylkey | May 2021 | ✔️ Yes | Pipeline for 450k and 850k array analysis (bisulfite data analysis using Minfi, Methylumi, Comet, Bumphunter and DMRcate packages) | R software |
| wsearch-nf | July 2022 | ✔️ Yes | 🔴 NEW pipeline: Microbiome analysis with usearch, vsearch and phyloseq | |
| AmpliconArchitect-nf | v1.0 - Oct 2021 | ✔️ Yes | Discovers ecDNA in cancer genomes using AmpliconArchitect | AmpliconArchitect |
| addreplacerg-nf | Jan 2017 | ? | Adds and replaces read group tags in BAM files | samtools |
| bametrics-nf | Mar 2017 | ? | Computes average metrics from reads that overlap a given set of positions | NA |
| Gviz_multiAlignments | Aug 2017 | ? | Generates multiple BAM alignments views using Gviz bioconductor package | Gviz |
| nf_coverage_demo | v2.3 - July 2020 | ✔️ Yes | Plots mean coverage over a series of BAM files | bedtools, R software |
| LiftOver-nf | Nov 2017 | ? | Converts BED/VCF between hg19 and hg38 | picard |
| MinION_pipes | Jan 2020 | ? | Analyze MinION sequencing data for the reconstruction of viral genomes | Guppy V3.1.5+, Porechop V0.2.4, Nanofilt V2.2.0, Filtlong V0.2.0, SPAdes V3.10.1, CAP3 02/10/15, BLAST V2.9.0+, MUSCLE V3.8.1551, Nanopolish V0.11.0, Minimap2 V2.15, Samtools version 1.9 |
| DraftPolisher | Jan 2020 | ? | Fast polishing of draft sequences (draft genome assembly) | MUSCLE, Python3 |
| Imputation-nf | v1.1 - July 2021 | ✔️ Yes | Pipeline to perform dataset genotyping imputation | LiftOver, Plink, Admixture, Perl, Term::ReadKey, Bcftools, Eagle, Minimac4 and samtools |
| PVAmpliconFinder | Aug 2020 | ✔️ Yes | Identify and classify known and potentially new Papillomaviridae sequences from amplicon deep-sequencing with degenerated papillomavirus primers. | Python and Perl + FastQC, MultiQC, Trim Galore, VSEARCH, Blast, RaxML-EPA, PaPaRa, CAP3, KRONA |
| integration_analysis_scripts | Mar 2020 | ✔️ Yes | Performs unsupervised analyses (clustering) from transformed expression data (e.g., log fpkm) and methylation beta values | R software with iClusterPlus, gplots and lattice R packages |
| mpileup2readcounts | Apr 2018 | ? | Get the readcounts at a locus by piping samtools mpileup output - forked from gatoravi | samtools |
| Methylation_analysis_scripts | v1.0 - June 2020 - updated Nov 2021 | ✔️ Yes | Perform Illumina EPIC 850K array pre-processing and QC from idat files | R software |
| DRMetrics | Oct 2020 | ✔️ Yes | Evaluate the quality of projections obtained after using dimensionality reduction techniques | R software |
| acnviewer-singularity | Jul 2019 | ? | Build a singularity image of aCNViewer (tool for visualization of absolute copy number and copy neutral variations) | Singularity |
| polysolver-singularity | Dec 2019 | ? | Build a singularity image of Polysolver (tool for HLA typing based on whole exome seq) | Singularity |
| scanMyWorkDir | May 2018 | ? | Non-destructive and informative scan of a nextflow work folder | NA |

2. Courses and data notes

| Name | Description | Tools used |
|---|---|---|
| nextflow-course-2018 | Nextflow course | NA |
| SBG-CGC_course2018 | Analyzing TCGA data in SBG-CGC | NA |
| Medical Genomics Course | Medical Genomics course held at the INSA Lyon - updated Fall 2022 | NA |
| intro-cancer-genomics | Introduction to cancer genomics | NA |
| mesomics_data_note | Repository with code and datasets used in the mesomics data note manuscript | NA |

3. Tips & Tricks

| Name | Latest version | Maintained | Description | Tools used |
|---|---|---|---|---|
| BAM-tricks | | | Tips and tricks for BAM files | samtools, freebayes, bedtools, biobambam2, Picard, rbamtools |
| VCF-tricks | | | Tips and tricks for VCF files | samtools, bcftools, vcflib, vcftools, R scripts |
| R-tricks | | | Tips and tricks for R | NA |
| EGA-tricks | | | Tips and tricks to use the European Genome-Phenome Archive from the European Bioinformatics Institute | EGA client |
| GDC-tricks | | | Tips and tricks to use the GDC data portal | NA |
| awesomeTCGA | | | Curated list of resources to access TCGA data | NA |
| LSF-Tricks | | | Tips and tricks for LSF HPC scheduler | NA |

4. Coming soon... (only dev branches yet)

| Name | Description | Tools used |
|---|---|---|
| DPclust-nf | Method for subclonal reconstruction using SNVs and/or CNAs from whole genome or whole exome sequencing data | dpclust, R |
| ITH_pipeline | Study intra-tumoral heterogeneity (ITH) through subclonality reconstruction | HATCHet, DeCiFer, ClonEvol |
| Nextflow_DSL2 | Repository with modules for nextflow DSL2 | NA |
| variantflag | Merge and annotate variants from different callers | |
| EPIDRIVER2020 | Scripts for EPIDRIVER Project | |
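
5. Nextflow, Docker and Singularity installation and use

  1. Install a Java runtime (JRE) if you don't already have one (7 or higher is stated here; recent nextflow releases require a newer Java, so check the nextflow documentation for the current requirement).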

  2. Install nextflow.

    curl -fsSL get.nextflow.io | bash

    Then move it to a location in your $PATH (/usr/local/bin in this example):

    sudo mv nextflow /usr/local/bin
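
After the two installation steps above, it can be useful to confirm that both java and nextflow are reachable from your $PATH. The snippet below is only a minimal sketch: `check_tool` is a hypothetical helper name, not something shipped with nextflow.

```shell
# check_tool (hypothetical helper): report whether a command is reachable via $PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found in PATH"
        return 1
    fi
}

# "|| true" keeps a script going even when a tool is missing.
check_tool java || true
check_tool nextflow || true
```

If nextflow is reported as missing, check that the directory you moved it to is actually listed in $PATH.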

To avoid having to install all dependencies each time you use a pipeline, you can instead install Docker and let nextflow deal with it. Installing Docker is system specific (but quite easy in most cases); follow the Docker documentation (Docker CE is sufficient). Also follow the post-installation step to manage Docker as a non-root user (documented for Linux); otherwise you will need to change the sudo option in the nextflow docker config scope, as described in the nextflow documentation.

To run a nextflow pipeline with Docker, simply add the -with-docker option to the nextflow run command.
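
If Docker cannot be used as a non-root user on your machine, the sudo option mentioned above can be set in a small config file. The sketch below assumes the `docker.enabled` and `docker.sudo` settings of the nextflow docker config scope; the file name `docker.config` is a placeholder, and the final command is only printed, not executed. Check the nextflow documentation for your version before relying on it.

```shell
# Write a small nextflow config file that enables Docker and runs it through sudo.
# "docker.config" is a placeholder file name.
cat > docker.config <<'EOF'
docker {
    enabled = true
    sudo = true   // assumption: only needed when Docker is not usable as a non-root user
}
EOF

# Dry run: print the command that would use this config instead of executing it.
echo "nextflow -c docker.config run iarcbioinfo/pipeline_name"
```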

To avoid having to install all dependencies each time you use a pipeline, you can also install singularity and let nextflow deal with it.

See the singularity documentation for installation instructions.

If you want to use the same singularity container, with exactly the same versions of the pipeline and tools, on several datasets over time, you may want to pull the container and archive it somewhere:

singularity pull shub://IARCbioinfo/pipeline-nf:v2.2

where "pipeline-nf" should be replaced by the name of the pipeline you want to use (example: RNAseq-nf) and 2.2 by the version of the pipeline you want to use (example: 2.4). This will create a singularity container file, pipeline-nf_v2.2.sif (example: RNAseq-nf_v2.4.sif), that you can then use by specifying it in the nextflow command (see usage).

=> example:

singularity pull shub://IARCbioinfo/RNAseq-nf:v2.4
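
The naming rule described above (pipeline name, then `_v`, then the version, then `.sif`) can be sketched as a small shell helper. `sif_name` is a hypothetical function name used for illustration only, and the pull command is printed rather than executed.

```shell
# sif_name (hypothetical helper): build the container file name that the pull
# is described as producing, e.g. pipeline-nf_v2.2.sif.
sif_name() {
    printf '%s_v%s.sif\n' "$1" "$2"
}

pipeline=RNAseq-nf
version=2.4

# Dry run: show the pull command and the file it is expected to create.
echo "singularity pull shub://IARCbioinfo/${pipeline}:v${version}"
echo "expected container file: $(sif_name "$pipeline" "$version")"
```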

Usage:

nextflow run iarcbioinfo/pipeline_name -r X --input_folder xxx --output_folder xxx -params-file xxx.yml -w /scratch/work

OR USING SINGULARITY

nextflow run iarcbioinfo/pipeline_name -r X -profile singularity --input_folder xxx --output_folder xxx -params-file xxx.yml -w /scratch/work

OR USING SINGULARITY WITH SPECIFIC CONTAINER

nextflow run iarcbioinfo/pipeline_name -r X -with-singularity XXX.sif --input_folder xxx --output_folder xxx -params-file xxx.yml -w /scratch/work
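
The three variants above differ only in how the container is selected. As a rough sketch, the snippet below writes a parameters file and prints (without executing) the matching command for each mode. The helper name `build_run_cmd`, the file name `params.yml`, the folder paths and the revision `v2.4` are all placeholders, and passing input_folder/output_folder through the parameters file instead of the command line is an assumption made for brevity.

```shell
# Parameters file: keys mirror the --input_folder / --output_folder options.
# All paths are placeholders.
cat > params.yml <<'EOF'
input_folder: /data/bam
output_folder: /data/results
EOF

# build_run_cmd (hypothetical helper): print the run command for one of three modes:
#   default | singularity | path to an archived .sif file
build_run_cmd() {
    pipeline=$1; revision=$2; mode=$3
    cmd="nextflow run iarcbioinfo/${pipeline} -r ${revision}"
    case "$mode" in
        default)     ;;
        singularity) cmd="$cmd -profile singularity" ;;
        *.sif)       cmd="$cmd -with-singularity $mode" ;;
        *)           echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    echo "$cmd -params-file params.yml -w /scratch/work"
}

build_run_cmd RNAseq-nf v2.4 default
build_run_cmd RNAseq-nf v2.4 singularity
build_run_cmd RNAseq-nf v2.4 RNAseq-nf_v2.4.sif
```

Printing the command first makes it easy to check the options before launching a long run.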

You can update the nextflow software and the pipeline itself simply using:

nextflow -self-update
nextflow pull iarcbioinfo/pipeline_name

You can also update the pipeline automatically each time you run it by adding the -latest option to the nextflow run command. This way you always run the latest version from GitHub.

To display the help message of a pipeline:

nextflow run iarcbioinfo/pipeline_name --help

6. Outdated and unmaintained pipelines and tools

| Name | Latest version | Maintained | Description | Tools used |
|---|---|---|---|---|
| GATK-Alignment-nf | June 2017 | No | Performs bwa alignment and pre-processing (realignment and recalibration) following first version of GATK best practices (less performant than alignment-nf) | bwa, picard, GATK |
