Run Picard tools and collate multiple metrics files. Check the quality of your sequencing data.
Run picardmetrics like this:
for bam in data/project1/sample?/sample?.bam
do
# -k keeps the BAM file with marked duplicate reads
# -r runs RNA-seq Picard metrics
# -o specifies where to put the output files
picardmetrics run -k -r -o out/rnaseq "$bam"
done
# The final output file will be called "project1-all-metrics.tsv"
picardmetrics collate project1 out/rnaseq
picardmetrics runs up to 12 Picard tools on each BAM file and collates all of the output files into a single table with up to 90 different metrics. It also automatically creates the .refFlat and .rRNA.list files required for CollectRnaSeqMetrics.
See the picardmetrics manual for more details.
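Before plotting, it can help to confirm what the collated table actually contains. The following is a minimal sketch, not part of picardmetrics: the helper names list_columns and count_rows are hypothetical, and the code assumes the collated file is tab-separated with a single header row and one row per BAM file.

```shell
# Hypothetical helpers for inspecting a tab-separated metrics table.
# Assumption: the first line of the file is a header row.

# Print each column name on its own line, prefixed by its position.
list_columns() {
  awk -F'\t' 'NR == 1 { for (i = 1; i <= NF; i++) print i "\t" $i; exit }' "$1"
}

# Count the data rows (every line after the header).
count_rows() {
  awk 'NR > 1 { n++ } END { print n + 0 }' "$1"
}
```

For example, `list_columns project1-all-metrics.tsv` shows which metrics are available, and `count_rows project1-all-metrics.tsv` should match the number of BAM files you processed if the one-row-per-file assumption holds.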
Next, plot and explore the metrics in R:
library(ggplot2)
dat <- read.delim("project1-all-metrics.tsv", stringsAsFactors = FALSE)
ggplot(dat) +
geom_point(aes(PF_READS, PF_ALIGNED_BASES))
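If R is not at hand, a single metric can also be pulled out of the table from the shell. This is a rough sketch under assumptions: get_metric is a hypothetical helper, the file is assumed to be tab-separated with a header row, and the first column is assumed to identify the sample.

```shell
# Hypothetical helper: print the first column and one named column
# from a tab-separated file with a header row.
# Usage: get_metric FILE COLUMN_NAME
get_metric() {
  awk -F'\t' -v name="$2" '
    NR == 1 {
      # Find the position of the requested column in the header.
      for (i = 1; i <= NF; i++) if ($i == name) col = i
      if (!col) { print "column not found: " name > "/dev/stderr"; exit 1 }
      next
    }
    # For every data row, print the first field and the requested field.
    { print $1 "\t" $col }
  ' "$1"
}
```

For example, `get_metric project1-all-metrics.tsv PF_READS | sort -k2,2n` would list samples from fewest to most reads, provided the sample names contain no spaces.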
See two example BAM files in the data/ folder. The test/test.sh script illustrates the usage of picardmetrics and tests that it works correctly. See the outputs in the out/ folder. You can also download the reference files used to test picardmetrics.
Use Picard to assess the quality of your sequencing data. This example shows RNA-seq data from hundreds of glioblastoma cells and gliomasphere cell lines.
On the left, each point represents an RNA-seq sample. We see that samples with high mean mapping quality have the greatest number of detected genes. Further, the color reveals variation in the percent of reads per sample that are assigned to exons.
On the right, each bar represents an RNA-seq sample. Each sample is broken down into the percent of sequenced bases coming from different genomic regions. We see that many samples have few sequenced bases coming from coding regions relative to intergenic regions.
# Download the code.
git clone https://github.com/slowkow/picardmetrics
cd picardmetrics
# Download and install the dependencies.
make get-deps PREFIX=~/.local
# Install picardmetrics and the man page.
make install PREFIX=~/.local
# Edit the configuration file for your project.
vim ~/picardmetrics.conf
If you wish, you can manually install the dependencies:
- Picard
- samtools, which depends on htslib
- stats
- gtfToGenePred
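After a manual install, a quick check that the tools are reachable can save a failed run. The sketch below is an illustration only: missing_tools is a hypothetical helper, and the executable names in the usage comment (including java, which Picard needs) are assumptions about how the dependencies are named on your system.

```shell
# Hypothetical helper: print each named command that cannot be found
# on the PATH. Prints nothing if every command is available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Example (executable names are assumptions):
#   missing_tools java samtools gtfToGenePred stats picardmetrics
```

If you installed with PREFIX=~/.local and a tool is reported missing, check that the corresponding bin directory under that prefix is on your PATH.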
Please submit an issue to report bugs or ask questions.
Please contribute bug fixes or new features with a pull request to this repository.
RNA-SeQC is a Java program which computes a series of quality control metrics for RNA-seq data. The input can be one or more BAM files. The output consists of HTML reports and tab-delimited files of metrics data. This program can be valuable for comparing sequencing quality across different samples or experiments to evaluate different experimental parameters. It can also be run on individual samples as a means of quality control before continuing with downstream analysis.
The RSeQC package provides a number of useful modules that can comprehensively evaluate high-throughput sequence data, especially RNA-seq data. Some basic modules quickly inspect sequence quality, nucleotide composition bias, PCR bias and GC bias, while RNA-seq specific modules evaluate sequencing saturation, mapped reads distribution, coverage uniformity, strand specificity, etc.
The QoRTs software package is a fast, efficient, and portable multifunction toolkit designed to assist in the analysis, quality control, and data management of RNA-Seq datasets. Its primary function is to aid in the detection and identification of errors, biases, and artifacts produced by paired-end high-throughput RNA-Seq technology. In addition, it can produce count data designed for use with differential expression and differential exon usage tools, as well as individual-sample and/or group-summary genome track files suitable for use with the UCSC genome browser (or any compatible browser).