This pipeline is designed to accurately quantify gene and transcript abundance from bulk RNA-seq data. Because it integrates both alignment-free and alignment-based methods, results from different quantifiers can be cross-validated against each other, increasing confidence in the robustness and reliability of the quantification.
As illustrated above, the pipeline consists of three stages:
In the first stage, the pipeline accepts raw input files in various formats (e.g., FASTQ, BAM/SAM) and converts them into standardized, cleaned FASTQ files that are ready for downstream quantification.
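To make "standardized FASTQ" concrete, here is a minimal sketch of a reader that parses four-line FASTQ records and rejects malformed ones. The helper `read_fastq` and the `FastqRecord` type are hypothetical, written only for this explanation; they are not the pipeline's preprocessing code and do not reproduce its cleaning steps.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str      # read ID: first whitespace-delimited token after '@'
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a 4-line FASTQ stream; raise ValueError on malformed input.

    Hypothetical helper for illustration only. Blank lines are not tolerated.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # clean end of file
        header = header.rstrip("\n")
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@"):
            raise ValueError(f"header must start with '@': {header!r}")
        if not plus.startswith("+"):
            # Also triggers when the file ends mid-record.
            raise ValueError(f"separator must start with '+': {plus!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:].partition(" ")[0], seq, qual)


# Usage: two well-formed records read from an in-memory string.
records = list(read_fastq(io.StringIO("@r1 extra\nACGT\n+\nIIII\n@r2\nGGN\n+\n!!#\n")))
for rec in records:
    print(rec.name, rec.sequence)
```

Validating structure before quantification is cheap and catches truncated or mis-converted files early, which is one reason pipelines normalize inputs to a single format first.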
In the second stage, the pipeline quantifies the abundance of both genes and transcripts. It supports three well-established, widely used quantifiers:
- Salmon: An alignment-free quantifier known for being very fast while achieving accuracy comparable to alignment-based methods.
- RSEM: An alignment-based quantifier with high accuracy; it has been used as a gold standard in many benchmarking studies.
- STAR: An alignment-based quantifier featuring splice-aware alignment. It is the tool used by the GDC mRNA quantification analysis pipeline.
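Because more than one quantifier runs on the same sample, their agreement can be checked numerically. The sketch below computes a Pearson correlation on log2(TPM + 1) over genes reported by both tools, assuming gene-level TPM values (e.g., from Salmon and RSEM) have already been loaded into dictionaries keyed by gene ID. The function name `log_tpm_pearson` and the log2(TPM + 1) transform are illustrative choices, not necessarily what the pipeline's correlation analysis computes.

```python
import math
from typing import Mapping


def log_tpm_pearson(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    """Pearson correlation of log2(TPM + 1) over genes present in both inputs.

    Hypothetical helper; the log transform damps the influence of a few
    very highly expressed genes on the correlation.
    """
    shared = sorted(a.keys() & b.keys())
    if len(shared) < 2:
        raise ValueError("need at least two shared genes")
    x = [math.log2(a[g] + 1.0) for g in shared]
    y = [math.log2(b[g] + 1.0) for g in shared]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


# Usage with made-up values (not real results):
salmon_tpm = {"GENE_A": 10.0, "GENE_B": 250.0, "GENE_C": 0.0, "GENE_D": 31.0}
rsem_tpm = {"GENE_A": 12.0, "GENE_B": 240.0, "GENE_C": 0.5, "GENE_D": 30.0}
print(round(log_tpm_pearson(salmon_tpm, rsem_tpm), 3))
```

A correlation close to 1 suggests the quantifiers broadly agree; a markedly lower value for one sample is a cue to inspect that sample's alignment and coverage statistics.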
In the final stage, the pipeline generates a comprehensive HTML report for each sample, detailing quantification results, alignment statistics, correlation analyses, gene body coverage plots, and more. When multiple samples are processed, it also produces a unified summary report and a master gene expression matrix covering all samples, which can be used directly in downstream analyses such as NetBID.
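The master gene expression matrix mentioned above is, in essence, a gene-by-sample table. A minimal sketch of assembling one, assuming each sample's gene-level values are already available as a dictionary; the function name `write_expression_matrix`, the tab-separated layout, the `gene_id` header, and filling missing genes with 0 are all assumptions for illustration, not the pipeline's documented output format.

```python
import csv
import io
from typing import Mapping, TextIO


def write_expression_matrix(
    samples: Mapping[str, Mapping[str, float]],
    out: TextIO,
    fill: float = 0.0,
) -> None:
    """Write a gene x sample matrix as TSV (hypothetical layout).

    Rows are the sorted union of gene IDs across samples; columns follow the
    order of `samples`. Genes absent from a sample are written as `fill`.
    """
    sample_ids = list(samples)
    genes = sorted(set().union(*(s.keys() for s in samples.values())))
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene_id", *sample_ids])
    for gene in genes:
        writer.writerow([gene, *(samples[s].get(gene, fill) for s in sample_ids)])


# Usage: two samples with partially overlapping genes (made-up values).
buf = io.StringIO()
write_expression_matrix({"S1": {"G2": 5.0, "G1": 1.0}, "S2": {"G1": 2.0}}, buf)
print(buf.getvalue())
```

Filling missing genes with 0 keeps the matrix rectangular, but it conflates "not detected" with "not reported"; a real pipeline would typically quantify every sample against the same annotation so that no gaps arise.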
A detailed tutorial to set up and run this pipeline can be found here: https://jyyulab.github.io/bulkRNAseq_quantification_pipeline/.
