From 401ee9050ec61afc11b12acdebb921ba35bb0d35 Mon Sep 17 00:00:00 2001 From: hvaret Date: Wed, 10 Dec 2014 11:34:30 +0100 Subject: [PATCH] v1.0.0 First version on GitHub --- DESCRIPTION | 14 + NAMESPACE | 11 + NEWS | 18 + R/BCVPlot.R | 13 + R/MAPlot.R | 33 + R/MDSPlot.R | 25 + R/NAMESPACE.R | 9 + R/PCAPlot.R | 49 + R/SARTools-package.r | 6 + R/SERE.r | 28 + R/barplotNull.R | 23 + R/barplotTotal.R | 21 + R/checkParameters.DESeq2.r | 108 + R/checkParameters.edgeR.r | 87 + R/clusterPlot.R | 15 + R/countsBoxplots.R | 37 + R/densityPlot.R | 24 + R/descriptionPlots.r | 33 + R/diagSizeFactorsPlots.R | 34 + R/dispersionsPlot.R | 26 + R/exploreCounts.R | 26 + R/exportResults.DESeq2.R | 67 + R/exportResults.edgeR.R | 61 + R/loadCountData.R | 52 + R/loadTargetFile.R | 29 + R/majSequences.R | 32 + R/nDiffTotal.r | 26 + R/pairwiseScatterPlots.R | 25 + R/rawpHist.R | 19 + R/removeNull.R | 11 + R/run.DESeq2.r | 50 + R/run.edgeR.r | 69 + R/summarizeResults.DESeq2.r | 57 + R/summarizeResults.edgeR.r | 40 + R/tabIndepFiltering.R | 20 + R/tabSERE.R | 18 + R/welcome.R | 19 + R/writeReport.DESeq2.r | 42 + R/writeReport.edgeR.r | 37 + README.md | 23 +- inst/CITATION | 14 + inst/Thumbs.db | Bin 0 -> 9728 bytes inst/raw/KO1.htseq.out | 8222 ++++++++++++++++++++++++ inst/raw/KO2.htseq.out | 8222 ++++++++++++++++++++++++ inst/raw/WT1.htseq.out | 8222 ++++++++++++++++++++++++ inst/raw/WT2.htseq.out | 8222 ++++++++++++++++++++++++ inst/report_DESeq2.rmd | 324 + inst/report_edgeR.rmd | 286 + inst/target.txt | 1 + inst/template_script_DESeq2.r | 85 + inst/template_script_edgeR.r | 76 + man/BCVPlot.Rd | 20 + man/MAPlot.Rd | 22 + man/MDSPlot.Rd | 29 + man/PCAPlot.Rd | 27 + man/SARTools-package.Rd | 12 + man/SERE.Rd | 23 + man/barplotNull.Rd | 25 + man/barplotTotal.Rd | 25 + man/checkParameters.DESeq2.Rd | 52 + man/checkParameters.edgeR.Rd | 46 + man/clusterPlot.Rd | 22 + man/countsBoxplots.Rd | 25 + man/densityPlot.Rd | 25 + man/descriptionPlots.Rd | 25 + man/diagSizeFactorsPlots.Rd | 20 + 
man/dispersionsPlot.Rd | 20 + man/exploreCounts.Rd | 29 + man/exportResults.DESeq2.Rd | 27 + man/exportResults.edgeR.Rd | 29 + man/loadCountData.Rd | 33 + man/loadTargetFile.Rd | 29 + man/majSequences.Rd | 27 + man/nDiffTotal.Rd | 22 + man/pairwiseScatterPlots.Rd | 22 + man/rawpHist.Rd | 20 + man/removeNull.Rd | 20 + man/run.DESeq2.Rd | 42 + man/run.edgeR.Rd | 35 + man/summarizeResults.DESeq2.Rd | 33 + man/summarizeResults.edgeR.Rd | 29 + man/tabIndepFiltering.Rd | 20 + man/tabSERE.Rd | 20 + man/writeReport.DESeq2.Rd | 65 + man/writeReport.edgeR.Rd | 56 + template_script_DESeq2.r | 85 + template_script_edgeR.r | 76 + vignettes/Thumbs.db | Bin 0 -> 16896 bytes vignettes/batchPCA.png | Bin 0 -> 6227 bytes vignettes/batchcluster.png | Bin 0 -> 4216 bytes vignettes/diagSF.png | Bin 0 -> 25113 bytes vignettes/inversionPCA.png | Bin 0 -> 4882 bytes vignettes/inversioncluster.png | Bin 0 -> 3692 bytes vignettes/inversionpairwiseScatter.png | Bin 0 -> 22437 bytes vignettes/inversionrawpHist.png | Bin 0 -> 4044 bytes vignettes/library.bib | 109 + vignettes/outlierPCA.png | Bin 0 -> 5350 bytes vignettes/outlierbarplotNull.png | Bin 0 -> 3471 bytes vignettes/outlierbarplotTotal.png | Bin 0 -> 3265 bytes vignettes/tutorial.Rnw | 302 + 100 files changed, 36558 insertions(+), 1 deletion(-) create mode 100644 DESCRIPTION create mode 100644 NAMESPACE create mode 100644 NEWS create mode 100644 R/BCVPlot.R create mode 100644 R/MAPlot.R create mode 100644 R/MDSPlot.R create mode 100644 R/NAMESPACE.R create mode 100644 R/PCAPlot.R create mode 100644 R/SARTools-package.r create mode 100644 R/SERE.r create mode 100644 R/barplotNull.R create mode 100644 R/barplotTotal.R create mode 100644 R/checkParameters.DESeq2.r create mode 100644 R/checkParameters.edgeR.r create mode 100644 R/clusterPlot.R create mode 100644 R/countsBoxplots.R create mode 100644 R/densityPlot.R create mode 100644 R/descriptionPlots.r create mode 100644 R/diagSizeFactorsPlots.R create mode 100644 R/dispersionsPlot.R 
create mode 100644 R/exploreCounts.R create mode 100644 R/exportResults.DESeq2.R create mode 100644 R/exportResults.edgeR.R create mode 100644 R/loadCountData.R create mode 100644 R/loadTargetFile.R create mode 100644 R/majSequences.R create mode 100644 R/nDiffTotal.r create mode 100644 R/pairwiseScatterPlots.R create mode 100644 R/rawpHist.R create mode 100644 R/removeNull.R create mode 100644 R/run.DESeq2.r create mode 100644 R/run.edgeR.r create mode 100644 R/summarizeResults.DESeq2.r create mode 100644 R/summarizeResults.edgeR.r create mode 100644 R/tabIndepFiltering.R create mode 100644 R/tabSERE.R create mode 100644 R/welcome.R create mode 100644 R/writeReport.DESeq2.r create mode 100644 R/writeReport.edgeR.r create mode 100644 inst/CITATION create mode 100644 inst/Thumbs.db create mode 100644 inst/raw/KO1.htseq.out create mode 100644 inst/raw/KO2.htseq.out create mode 100644 inst/raw/WT1.htseq.out create mode 100644 inst/raw/WT2.htseq.out create mode 100644 inst/report_DESeq2.rmd create mode 100644 inst/report_edgeR.rmd create mode 100644 inst/target.txt create mode 100644 inst/template_script_DESeq2.r create mode 100644 inst/template_script_edgeR.r create mode 100644 man/BCVPlot.Rd create mode 100644 man/MAPlot.Rd create mode 100644 man/MDSPlot.Rd create mode 100644 man/PCAPlot.Rd create mode 100644 man/SARTools-package.Rd create mode 100644 man/SERE.Rd create mode 100644 man/barplotNull.Rd create mode 100644 man/barplotTotal.Rd create mode 100644 man/checkParameters.DESeq2.Rd create mode 100644 man/checkParameters.edgeR.Rd create mode 100644 man/clusterPlot.Rd create mode 100644 man/countsBoxplots.Rd create mode 100644 man/densityPlot.Rd create mode 100644 man/descriptionPlots.Rd create mode 100644 man/diagSizeFactorsPlots.Rd create mode 100644 man/dispersionsPlot.Rd create mode 100644 man/exploreCounts.Rd create mode 100644 man/exportResults.DESeq2.Rd create mode 100644 man/exportResults.edgeR.Rd create mode 100644 man/loadCountData.Rd create mode 100644 
man/loadTargetFile.Rd create mode 100644 man/majSequences.Rd create mode 100644 man/nDiffTotal.Rd create mode 100644 man/pairwiseScatterPlots.Rd create mode 100644 man/rawpHist.Rd create mode 100644 man/removeNull.Rd create mode 100644 man/run.DESeq2.Rd create mode 100644 man/run.edgeR.Rd create mode 100644 man/summarizeResults.DESeq2.Rd create mode 100644 man/summarizeResults.edgeR.Rd create mode 100644 man/tabIndepFiltering.Rd create mode 100644 man/tabSERE.Rd create mode 100644 man/writeReport.DESeq2.Rd create mode 100644 man/writeReport.edgeR.Rd create mode 100644 template_script_DESeq2.r create mode 100644 template_script_edgeR.r create mode 100644 vignettes/Thumbs.db create mode 100644 vignettes/batchPCA.png create mode 100644 vignettes/batchcluster.png create mode 100644 vignettes/diagSF.png create mode 100644 vignettes/inversionPCA.png create mode 100644 vignettes/inversioncluster.png create mode 100644 vignettes/inversionpairwiseScatter.png create mode 100644 vignettes/inversionrawpHist.png create mode 100644 vignettes/library.bib create mode 100644 vignettes/outlierPCA.png create mode 100644 vignettes/outlierbarplotNull.png create mode 100644 vignettes/outlierbarplotTotal.png create mode 100644 vignettes/tutorial.Rnw diff --git a/DESCRIPTION b/DESCRIPTION new file mode 100644 index 0000000..b8db77f --- /dev/null +++ b/DESCRIPTION @@ -0,0 +1,14 @@ +Package: SARTools +Type: Package +Title: Statistical Analysis of RNA-Seq Tools +Version: 1.0.0 +Date: 2014-12-10 +Author: Marie-Agnes Dillies and Hugo Varet +Maintainer: Hugo Varet +Depends: R (>= 3.1.0), DESeq2 (>= 1.6.0), edgeR (>= 3.8.0), xtable +Suggests: genefilter (>= 1.44.0), BiocStyle +Imports: knitr, GenomicRanges, S4Vectors, limma +VignetteBuilder: knitr +Encoding: latin1 +Description: SARTools provides R tools and an environment for the statistical analysis of RNA-Seq projects: load and clean data, produce figures, perform statistical analysis/testing with DESeq2 or edgeR, export results and create 
final report +License: GPL-2 \ No newline at end of file diff --git a/NAMESPACE b/NAMESPACE new file mode 100644 index 0000000..069af45 --- /dev/null +++ b/NAMESPACE @@ -0,0 +1,11 @@ +# Generated by roxygen2 (4.0.2): do not edit by hand + +exportPattern("^[a-zA-Z]") +import(DESeq2) +import(edgeR) +import(xtable) +importFrom(GenomicRanges,assay) +importFrom(GenomicRanges,colData) +importFrom(S4Vectors,mcols) +importFrom(knitr,knit2html) +importFrom(limma,plotMDS) diff --git a/NEWS b/NEWS new file mode 100644 index 0000000..0012887 --- /dev/null +++ b/NEWS @@ -0,0 +1,18 @@ +CHANGES IN VERSION 1.0.0 +------------------------ + o First version publicly available on GitHub + +CHANGES IN VERSION 0.99.6 +------------------------- + o Added batch argument to loadTargetFile() + o Simplified the R script templates with wrappers + +CHANGES IN VERSION 0.99.5 +------------------------- + o Now check the format/validity of the parameters at the beginning of the script + o Added functions checkParameters.DESeq2() and checkParameters.edgeR() + o Added small precisions in the vignette + +CHANGES IN VERSION 0.99.4 +------------------------- + o First version of the package diff --git a/R/BCVPlot.R b/R/BCVPlot.R new file mode 100644 index 0000000..7dfa836 --- /dev/null +++ b/R/BCVPlot.R @@ -0,0 +1,13 @@ +#' BCV plot (for edgeR dispersions) +#' +#' Biological Coefficient of Variation plot (for edgeR objects) +#' +#' @param dge a \code{DGEList} object +#' @return A file named BCV.png in the figures directory with a BCV plot produced by the \code{plotBCV()} function of the edgeR package +#' @author Marie-Agnes Dillies and Hugo Varet + +BCVPlot <- function(dge){ + png(filename="figures/BCV.png", width=400, height=400) + plotBCV(dge, las = 1, main = "BCV plot") + dev.off() +} diff --git a/R/MAPlot.R b/R/MAPlot.R new file mode 100644 index 0000000..c83bfc1 --- /dev/null +++ b/R/MAPlot.R @@ -0,0 +1,33 @@ +#' MA-plots +#' +#' MA-plot for each comparison: log2(FC) vs mean of normalized counts 
with one dot per feature (red dot for a differentially expressed feature, black dot otherwise) +#' +#' @param complete A \code{list} of \code{data.frame} containing features results (from \code{exportResults.DESeq2()} or \code{exportResults.edgeR()}) +#' @param alpha cut-off to apply on each adjusted p-value +#' @return A file named MAPlot.png in the figures directory containing one MA-plot per comparison +#' @author Marie-Agnes Dillies and Hugo Varet + +MAPlot <- function(complete, alpha=0.05){ + nrow <- ceiling(sqrt(length(complete))) + ncol <- ceiling(length(complete)/nrow) + png(filename="figures/MAPlot.png", width=400*max(ncol,nrow), height=400*min(ncol,nrow)) + par(mfrow=sort(c(nrow,ncol))) + for (name in names(complete)){ + complete.name <- complete[[name]] + complete.name <- complete.name[complete.name$baseMean>0,] + complete.name$padj <- ifelse(is.na(complete.name$padj),1,complete.name$padj) + log2FC <- complete.name$log2FoldChange + ylim = 1.1 * c(-1,1) * quantile(abs(log2FC[is.finite(log2FC)]), probs=0.99) + plot(complete.name$baseMean, pmax(ylim[1], pmin(ylim[2], log2FC)), + log = "x", cex=0.45, las = 1, ylim = ylim, + col = ifelse(complete.name[,"padj"] < alpha, "red", "black"), + pch = ifelse(log2FC<ylim[1], 6, ifelse(log2FC>ylim[2], 2, 20)), + xlab = "Mean of normalized counts", + ylab = expression(log[2]~fold~change), + main = paste0("MA-plot - ",gsub("_"," ",name))) + abline(h=0, lwd=1, col="lightgray") + } + dev.off() +} + + diff --git a/R/MDSPlot.R b/R/MDSPlot.R new file mode 100644 index 0000000..b176cc4 --- /dev/null +++ b/R/MDSPlot.R @@ -0,0 +1,25 @@ +#' MDS plot (for edgeR objects) +#' +#' Multi-Dimensional Scaling plot of samples based on the 500 most variant features (for edgeR analyses) +#' +#' @param dge a \code{DGEList} object +#' @param group vector of the condition from which each sample belongs +#' @param n number of features to keep among the most variant +#' @param gene.selection \code{"pairwise"} to choose the top features separately for each pairwise 
comparison between the samples or \code{"common"} to select the same features for all comparisons. Only used when \code{method="logFC"} +#' @param col colors to use (one per biological condition) +#' @return A file named MDS.png in the figures directory +#' @author Marie-Agnes Dillies and Hugo Varet + +MDSPlot <- function(dge, group, n=500, gene.selection=c("pairwise", "common"), + col=c("lightblue","orange","MediumVioletRed","SpringGreen")){ + png(filename="figures/MDS.png", width=400, height=400) + coord <- plotMDS(dge, top=n, method="logFC", gene.selection=gene.selection[1]) + abs=range(coord$x); abs=abs(abs[2]-abs[1])/25; + ord=range(coord$y); ord=abs(ord[2]-ord[1])/25; + plot(coord$x,coord$y, col=col[as.integer(group)], las=1, main="Multi-Dimensional Scaling plot", + xlab="Leading logFC dimension 1", ylab="Leading logFC dimension 2", cex=2, pch=16) + abline(h=0,v=0,lty=2,col="lightgray") + text(coord$x - ifelse(coord$x>0,abs,-abs), coord$y - ifelse(coord$y>0,ord,-ord), + colnames(dge$counts), col=col[as.integer(group)]) + dev.off() +} diff --git a/R/NAMESPACE.R b/R/NAMESPACE.R new file mode 100644 index 0000000..1650f0d --- /dev/null +++ b/R/NAMESPACE.R @@ -0,0 +1,9 @@ +#' @exportPattern ^[a-zA-Z] +#' @import DESeq2 +#' @import edgeR +#' @import xtable +#' @importFrom knitr knit2html +#' @importFrom GenomicRanges colData assay +#' @importFrom S4Vectors mcols +#' @importFrom limma plotMDS +NULL diff --git a/R/PCAPlot.R b/R/PCAPlot.R new file mode 100644 index 0000000..1af71f7 --- /dev/null +++ b/R/PCAPlot.R @@ -0,0 +1,49 @@ +#' PCA of samples (if use of DESeq2) +#' +#' Principal Component Analysis of samples based on the 500 most variant features on VST- or rlog-counts (if use of DESeq2) +#' +#' @param counts.trans a matrix a transformed counts (VST- or rlog-counts) +#' @param group factor vector of the condition from which each sample belongs +#' @param n number of features to keep among the most variant +#' @param col colors to use (one per biological 
condition) +#' @return A file named PCA.png in the figures directory with a pairwise plot of the first three principal components +#' @author Marie-Agnes Dillies and Hugo Varet + +PCAPlot <- function(counts.trans, group, n=500, col=c("lightblue","orange","MediumVioletRed","SpringGreen")){ + # PCA on the n (default 500) most variable features + rv = apply(counts.trans, 1, var, na.rm=TRUE) + select = order(rv, decreasing = TRUE)[1:n] + pca = prcomp(t(counts.trans[select, ])) + prp <- pca$sdev^2 * 100 / sum(pca$sdev^2) + prp <- round(prp[1:3],2) + + # create figure + png(filename="figures/PCA.png",width=400*2,height=400) + par(mfrow=c(1,2)) + # axes 1 and 2 + abs=range(pca$x[,1]); abs=abs(abs[2]-abs[1])/25; + ord=range(pca$x[,2]); ord=abs(ord[2]-ord[1])/25; + plot(pca$x[,1], pca$x[,2], + las = 1, cex = 2, pch = 16, col = col[as.integer(group)], + xlab = paste0("PC1 (",prp[1],"%)"), + ylab = paste0("PC2 (",prp[2],"%)"), + main = "Principal Component Analysis - Axes 1 and 2") + abline(h=0,v=0,lty=2,col="lightgray") + text(pca$x[,1] - ifelse(pca$x[,1]>0,abs,-abs), pca$x[,2] - ifelse(pca$x[,2]>0,ord,-ord), + colnames(counts.trans), col=col[as.integer(group)]) + + # axes 1 and 3 + abs=range(pca$x[,1]); abs=abs(abs[2]-abs[1])/25; + ord=range(pca$x[,3]); ord=abs(ord[2]-ord[1])/25; + plot(pca$x[,1], pca$x[,3], + las = 1, cex = 2, pch = 16, col = col[as.integer(group)], + xlab = paste0("PC1 (",prp[1],"%)"), + ylab = paste0("PC3 (",prp[3],"%)"), + main = "Principal Component Analysis - Axes 1 and 3") + abline(h=0,v=0,lty=2,col="lightgray") + text(pca$x[,1] - ifelse(pca$x[,1]>0,abs,-abs), pca$x[,3] - ifelse(pca$x[,3]>0,ord,-ord), + colnames(counts.trans), col=col[as.integer(group)]) + dev.off() + + return(invisible(pca$x)) +} diff --git a/R/SARTools-package.r b/R/SARTools-package.r new file mode 100644 index 0000000..9d4d9b1 --- /dev/null +++ b/R/SARTools-package.r @@ -0,0 +1,6 @@ +#' SARTools provides R tools and an environment for the statistical analysis of RNA-Seq projects: load and 
clean data, produce figures, perform statistical analysis/testing with DESeq2 or edgeR, export results and create final report +#' @title Statistical Analysis of RNA-Seq Tools +#' @author Marie-Agnes Dillies and Hugo Varet +#' @docType package +#' @name SARTools-package +NULL diff --git a/R/SERE.r b/R/SERE.r new file mode 100644 index 0000000..2ad0211 --- /dev/null +++ b/R/SERE.r @@ -0,0 +1,28 @@ +#' Pairwise SERE for two samples +#' +#' Compute the SERE coefficient for two samples +#' +#' @param observed \code{matrix} with two columns containing observed counts of two samples +#' @return The SERE coefficient for the two samples +#' @references Schulze, Kanwar, Golzenleuchter et al, SERE: Single-parameter quality control and sample comparison for RNA-Seq, BMC Genomics, 2012 +#' @author See paper published + +SERE <- function(observed){ + #calculate lambda and expected values + laneTotals <- colSums(observed) + total <- sum(laneTotals) + fullObserved <- observed[rowSums(observed)>0,] + fullLambda <- rowSums(fullObserved)/total + fullLhat <- fullLambda > 0 + fullExpected<- outer(fullLambda, laneTotals) + + #keep values + fullKeep <- which(fullExpected > 0) + + #calculate degrees of freedom (nrow*(ncol -1) >> number of parameters - calculated (just lamda is calculated >> thats why minus 1) + #calculate pearson and deviance for all values + oeFull <- (fullObserved[fullKeep] - fullExpected[fullKeep])^2/ fullExpected[fullKeep] # pearson chisq test + dfFull <- length(fullKeep) - sum(fullLhat!=0) + + sqrt(sum(oeFull)/dfFull) +} \ No newline at end of file diff --git a/R/barplotNull.R b/R/barplotNull.R new file mode 100644 index 0000000..dbe3ad8 --- /dev/null +++ b/R/barplotNull.R @@ -0,0 +1,23 @@ +#' Percentage of null counts per sample +#' +#' Bar plot of the percentage of null counts per sample +#' +#' @param counts \code{matrix} of counts +#' @param group factor vector of the condition from which each sample belongs +#' @param col colors of the bars (one color per 
biological condition) +#' @return A file named barplotNull.png in the figures directory +#' @author Marie-Agnes Dillies and Hugo Varet + +barplotNull <- function(counts, group, col=c("lightblue","orange","MediumVioletRed","SpringGreen")){ + png(filename="figures/barplotNull.png",width=400,height=400) + percentage <- apply(counts, 2, function(x){sum(x == 0)})*100/nrow(counts) + percentage.allNull <- (nrow(counts) - nrow(removeNull(counts)))*100/nrow(counts) + barplot(percentage, las = 2, + col = col[as.integer(group)], + ylab = "Proportion of null counts", + main = "Proportion of null counts per sample", + ylim = c(0,1.2*max(percentage))) + abline(h = percentage.allNull, lty = 2, lwd = 2) + legend("topright", levels(group), fill=col[1:nlevels(group)], bty="n") + dev.off() +} diff --git a/R/barplotTotal.R b/R/barplotTotal.R new file mode 100644 index 0000000..40a20c5 --- /dev/null +++ b/R/barplotTotal.R @@ -0,0 +1,21 @@ +#' Total number of reads per sample +#' +#' Bar plot of the total number of reads per sample +#' +#' @param counts \code{matrix} of counts +#' @param group factor vector of the condition from which each sample belongs +#' @param col colors of the bars (one color per biological condition) +#' @return A file named barplotTotal.png in the figures directory +#' @author Marie-Agnes Dillies and Hugo Varet + +barplotTotal <- function(counts, group, col=c("lightblue","orange","MediumVioletRed","SpringGreen")){ + png(filename="figures/barplotTotal.png",width=400,height=400) + barplot(colSums(counts), + main = "Total read count per sample", + ylab = "Total read count", + ylim = c(0, max(colSums(counts))*1.2), + col = col[as.integer(group)], + las = 2) + legend("topright", levels(group), fill=col[1:nlevels(group)], bty="n") + dev.off() +} diff --git a/R/checkParameters.DESeq2.r b/R/checkParameters.DESeq2.r new file mode 100644 index 0000000..0a1c8c3 --- /dev/null +++ b/R/checkParameters.DESeq2.r @@ -0,0 +1,108 @@ +#' Check the parameters (when using DESeq2) +#' 
+#' Check the format and the validity of the parameters which will be used for the analysis with DESeq2. +# For example, it is important that $alpha$ be a numeric of length 1 between 0 and 1. This function helps prevent +# trivial bugs when running the rest of the script. +#' +#' @param projectName name of the project +#' @param author author of the statistical analysis/report +#' @param targetFile path to the design/target file +#' @param rawDir path to the directory containing raw counts files +#' @param featuresToRemove names of the features to be removed +#' @param varInt factor of interest +#' @param condRef reference biological condition +#' @param batch blocking factor in the design +#' @param fitType mean-variance relationship: "parametric" (default) or "local" +#' @param cooksCutoff outlier detection threshold (NULL to let DESeq2 choose it) +#' @param independentFiltering TRUE/FALSE to perform independent filtering +#' @param alpha threshold of statistical significance +#' @param pAdjustMethod p-value adjustment method: "BH" (default) or "BY" for example +#' @param typeTrans transformation for PCA/clustering: "VST" or "rlog" +#' @param locfunc "median" (default) or "shorth" to estimate the size factors +#' @param colors vector of colors of each biological condition on the plots +#' @return A boolean indicating if there is a problem in the parameters +#' @author Hugo Varet + +checkParameters.DESeq2 <- function(projectName,author,targetFile,rawDir, + featuresToRemove,varInt,condRef,batch,fitType, + cooksCutoff,independentFiltering,alpha,pAdjustMethod, + typeTrans,locfunc,colors){ + problem <- FALSE + if (!is.character(projectName) | length(projectName)!=1){ + print("projectName must be a character vector of length 1") + problem <- TRUE + } + if (!is.character(author) | length(author)!=1){ + print("author must be a character vector of length 1") + problem <- TRUE + } + if (!is.character(targetFile) | length(targetFile)!=1 || !file.exists(targetFile)){ + 
print("targetFile must be a character vector of length 1 specifying an accessible file") + problem <- TRUE + } + if (!is.character(rawDir) | length(rawDir)!=1 || is.na(file.info(rawDir)[1,"isdir"]) | !file.info(rawDir)[1,"isdir"]){ + print("rawDir must be a character vector of length 1 specifying an accessible directory") + problem <- TRUE + } + if (!is.character(featuresToRemove)){ + print("featuresToRemove must be a character vector") + problem <- TRUE + } + if (!is.character(varInt) | length(varInt)!=1){ + print("varInt must be a character vector of length 1") + problem <- TRUE + } + if (!is.character(condRef) | length(condRef)!=1){ + print("condRef must be a character vector of length 1") + problem <- TRUE + } + if (!is.null(batch) && I(!is.character(batch) | length(batch)!=1)){ + print("batch must be NULL or a character vector of length 1") + problem <- TRUE + } + if (!is.character(fitType) | length(fitType)!=1 || !I(fitType %in% c("parametric","local"))){ + print("fitType must be equal to 'parametric' or 'local'") + problem <- TRUE + } + if (!is.null(cooksCutoff) && I(!is.numeric(cooksCutoff) | length(cooksCutoff)!=1 || cooksCutoff<=0)){ + print("cooksCutoff must be NULL or a numeric vector of length 1 with a positive value") + problem <- TRUE + } + if (!is.logical(independentFiltering) | length(independentFiltering)!=1){ + print("independentFiltering must be a boolean vector of length 1") + problem <- TRUE + } + if (!is.numeric(alpha) | length(alpha)!=1 || I(alpha<=0 | alpha>=1)){ + print("alpha must be a numeric vector of length 1 with a value between 0 and 1") + problem <- TRUE + } + if (!is.character(pAdjustMethod) | length(pAdjustMethod)!=1 || !I(pAdjustMethod %in% p.adjust.methods)){ + print(paste("pAdjustMethod must be a value in", paste(p.adjust.methods, collapse=", "))) + problem <- TRUE + } + if (!is.character(typeTrans) | length(typeTrans)!=1 || !I(typeTrans %in% c("VST","rlog"))){ + print("typeTrans must be equal to 'VST' or 'rlog'") + problem <- 
TRUE + } + if (!is.character(locfunc) | length(locfunc)!=1 || !I(locfunc %in% c("median","shorth"))){ + print("locfunc must be equal to 'median' or 'shorth'") + problem <- TRUE + } else{ + if (locfunc=="shorth" & !I("genefilter" %in% installed.packages()[,"Package"])){ + print("Package genefilter is needed if using locfunc='shorth'") + problem <- TRUE + } + } + areColors <- function(col){ + sapply(col, function(X){tryCatch(is.matrix(col2rgb(X)), error=function(e){FALSE})}) + } + if (!is.vector(colors) || !all(areColors(colors))){ + print("colors must be a vector of colors") + problem <- TRUE + } + + if (!problem){ + print("All the parameters are correct") + } + return(invisible(problem)) +} diff --git a/R/checkParameters.edgeR.r b/R/checkParameters.edgeR.r new file mode 100644 index 0000000..68b5fd7 --- /dev/null +++ b/R/checkParameters.edgeR.r @@ -0,0 +1,87 @@ +#' Check the parameters (when using edgeR) +#' +#' Check the format and the validity of the parameters which will be used for the analysis with edgeR. +# For example, it is important that $alpha$ be a numeric of length 1 between 0 and 1. This function helps prevent +# trivial bugs when running the rest of the script. 
+#' +#' @param projectName name of the project +#' @param author author of the statistical analysis/report +#' @param targetFile path to the design/target file +#' @param rawDir path to the directory containing raw counts files +#' @param featuresToRemove names of the features to be removed +#' @param varInt factor of interest +#' @param condRef reference biological condition +#' @param batch blocking factor in the design +#' @param alpha threshold of statistical significance +#' @param pAdjustMethod p-value adjustment method: "BH" (default) or "BY" for example +#' @param cpmCutoff counts-per-million cut-off to filter low counts +#' @param gene.selection selection of the features in MDSPlot +#' @param colors vector of colors of each biological condition on the plots +#' @return A boolean indicating if there is a problem in the parameters +#' @author Hugo Varet + +checkParameters.edgeR <- function(projectName,author,targetFile,rawDir, + featuresToRemove,varInt,condRef,batch,alpha, + pAdjustMethod,cpmCutoff,gene.selection,colors){ + problem <- FALSE + if (!is.character(projectName) | length(projectName)!=1){ + print("projectName must be a character vector of length 1") + problem <- TRUE + } + if (!is.character(author) | length(author)!=1){ + print("author must be a character vector of length 1") + problem <- TRUE + } + if (!is.character(targetFile) | length(targetFile)!=1 || !file.exists(targetFile)){ + print("targetFile must be a character vector of length 1 specifying an accessible file") + problem <- TRUE + } + if (!is.character(rawDir) | length(rawDir)!=1 || is.na(file.info(rawDir)[1,"isdir"]) | !file.info(rawDir)[1,"isdir"]){ + print("rawDir must be a character vector of length 1 specifying an accessible directory") + problem <- TRUE + } + if (!is.character(featuresToRemove)){ + print("featuresToRemove must be a character vector") + problem <- TRUE + } + if (!is.character(varInt) | length(varInt)!=1){ + print("varInt must be a character vector of length 1") + 
problem <- TRUE + } + if (!is.character(condRef) | length(condRef)!=1){ + print("condRef must be a character vector of length 1") + problem <- TRUE + } + if (!is.null(batch) && I(!is.character(batch) | length(batch)!=1)){ + print("batch must be NULL or a character vector of length 1") + problem <- TRUE + } + if (!is.numeric(alpha) | length(alpha)!=1 || I(alpha<=0 | alpha>=1)){ + print("alpha must be a numeric vector of length 1 with a value between 0 and 1") + problem <- TRUE + } + if (!is.character(pAdjustMethod) | length(pAdjustMethod)!=1 || !I(pAdjustMethod %in% p.adjust.methods)){ + print(paste("pAdjustMethod must be a value in", paste(p.adjust.methods, collapse=", "))) + problem <- TRUE + } + if (!is.numeric(cpmCutoff) | length(cpmCutoff)!=1 || cpmCutoff<0){ + print("cpmCutoff must be a numeric vector of length 1 with a value equal to or greater than 0") + problem <- TRUE + } + if (!is.character(gene.selection) | length(gene.selection)!=1 || !I(gene.selection %in% c("pairwise","common"))){ + print("gene.selection must be equal to 'pairwise' or 'common'") + problem <- TRUE + } + areColors <- function(col){ + sapply(col, function(X){tryCatch(is.matrix(col2rgb(X)), error=function(e){FALSE})}) + } + if (!is.vector(colors) || !all(areColors(colors))){ + print("colors must be a vector of colors") + problem <- TRUE + } + + if (!problem){ + print("All the parameters are correct") + } + return(invisible(problem)) +} diff --git a/R/clusterPlot.R b/R/clusterPlot.R new file mode 100644 index 0000000..cfb35a9 --- /dev/null +++ b/R/clusterPlot.R @@ -0,0 +1,15 @@ +#' Clustering of the samples +#' +#' Clustering of the samples based on VST- or rlog-counts (if use of DESeq2) or cpm-counts (if use of edgeR) +#' +#' @param counts.trans a matrix of transformed counts (VST- or rlog-counts if use of DESeq2 or cpm-counts if use of edgeR) +#' @param group factor vector of the condition to which each sample belongs +#' @return A file named cluster.png in the figures directory with 
the dendrogram of the clustering +#' @author Marie-Agnes Dillies and Hugo Varet + +clusterPlot <- function(counts.trans, group){ + hc <- hclust(dist(t(counts.trans)), method="ward.D") + png(filename="figures/cluster.png",width=400,height=400) + plot(hc, hang=-1, ylab="Height", las=2, xlab="Method: Euclidean distance - Ward criterion", main="Cluster dendrogram") + dev.off() +} diff --git a/R/countsBoxplots.R b/R/countsBoxplots.R new file mode 100644 index 0000000..32567ef --- /dev/null +++ b/R/countsBoxplots.R @@ -0,0 +1,37 @@ +#' Box-plots of (normalized) counts distribution per sample +#' +#' Box-plots of raw and normalized counts distributions per sample to assess the effect of the normalization +#' +#' @param object a \code{DESeqDataSet} object from DESeq2 or a \code{DGEList} object from edgeR +#' @param group factor vector of the condition from which each sample belongs +#' @param col colors of the boxplots (one per biological condition) +#' @return A file named countsBoxplots.png in the figures directory containing boxplots of the raw and normalized counts +#' @author Marie-Agnes Dillies and Hugo Varet + +countsBoxplots <- function(object, group, col = c("lightblue","orange","MediumVioletRed","SpringGreen")){ + if (class(object)=="DESeqDataSet"){ + counts <- counts(object) + counts <- removeNull(counts) + norm.counts <- counts(object, normalized=TRUE) + norm.counts <- removeNull(norm.counts) + } else{ + counts <- object$counts + counts <- removeNull(counts) + tmm <- object$samples$norm.factors + N <- colSums(object$counts) + f <- tmm * N/mean(tmm * N) + norm.counts <- scale(object$counts, center=FALSE, scale=f) + norm.counts <- removeNull(norm.counts) + } + png(filename="figures/countsBoxplots.png",width=2*400,height=400) + par(mfrow=c(1,2)) + # raw counts + boxplot(log2(counts+1), col = col[as.integer(group)], las = 2, + main = "Raw counts distribution", ylab = expression(log[2] ~ (raw ~ count + 1))) + legend("topright", levels(group), 
fill=col[1:nlevels(group)], bty="n") + # norm counts + boxplot(log2(norm.counts+1), col = col[as.integer(group)], las = 2, + main = "Normalized counts distribution", ylab = expression(log[2] ~ (norm ~ count + 1))) + legend("topright", levels(group), fill=col[1:nlevels(group)], bty="n") + dev.off() +} diff --git a/R/densityPlot.R b/R/densityPlot.R new file mode 100644 index 0000000..010fd23 --- /dev/null +++ b/R/densityPlot.R @@ -0,0 +1,24 @@ +#' Density plot of all samples +#' +#' Estimation the counts density for each sample +#' +#' @param counts \code{matrix} of counts +#' @param group factor vector of the condition from which each sample belongs +#' @param col colors of the curves (one per biological condition) +#' @return A file named densplot.png in the figures directory +#' @author Marie-Agnes Dillies and Hugo Varet + +densityPlot <- function(counts, group, col=c("lightblue","orange","MediumVioletRed","SpringGreen")){ + png(filename="figures/densplot.png",width=400,height=400) + counts <- removeNull(counts) + plot(density(log2(counts[,1]+1)), las = 1, lwd = 2, + main = "Density of counts distribution", + xlab = expression(log[2] ~ (raw ~ count + 1)), + ylim = c(0,max(apply(counts,2,function(x){max(density(log2(x+1))$y)}))*1.05), + col = col[as.integer(group)[1]]) + for (i in 2:ncol(counts)){ + lines(density(log2(counts[,i]+1)),col=col[as.integer(group)[i]],lwd=2) + } + legend("topright", levels(group), lty=1, col=col[1:nlevels(group)], lwd=2, bty="n") + dev.off() +} diff --git a/R/descriptionPlots.r b/R/descriptionPlots.r new file mode 100644 index 0000000..348a429 --- /dev/null +++ b/R/descriptionPlots.r @@ -0,0 +1,33 @@ +#' Description plots of the counts +#' +#' Description plots of the counts according to the biological condition +#' +#' @param counts \code{matrix} of counts +#' @param group factor vector of the condition from which each sample belongs +#' @param col colors for the plots (one per biological condition) +#' @return PNG files in the 
"figures" directory and the matrix of the most expressed sequences +#' @author Hugo Varet + +descriptionPlots <- function(counts, group, col=c("lightblue","orange","MediumVioletRed","SpringGreen")){ + # create the figures directory if does not exist + if (!I("figures" %in% dir())) dir.create("figures", showWarnings=FALSE) + + # total number of reads per sample + barplotTotal(counts=counts, group=group, col=col) + + # percentage of null counts per sample + barplotNull(counts=counts, group=group, col=col) + + # distribution of counts per sample + densityPlot(counts=counts, group=group, col=col) + + # features which catch the most important number of reads + majSequences <- majSequences(counts=counts, group=group, col=col) + + # SERE and pairwise scatter plots + cat("Matrix of SERE statistics:\n") + print(tabSERE(counts)) + pairwiseScatterPlots(counts=counts, group=group) + + return(majSequences) +} diff --git a/R/diagSizeFactorsPlots.R b/R/diagSizeFactorsPlots.R new file mode 100644 index 0000000..e3f2c69 --- /dev/null +++ b/R/diagSizeFactorsPlots.R @@ -0,0 +1,34 @@ +#' Assess the estimations of the size factors +#' +#' Plots to assess the estimations of the size factors +#' +#' @param dds a \code{DESeqDataSet} object +#' @return Two files in the figures directory: diagSizeFactorsHist.png containing one histogram per sample and diagSizeFactorsTC.png for a plot of the size factors vs the total number of reads +#' @author Marie-Agnes Dillies and Hugo Varet + +diagSizeFactorsPlots <- function(dds){ + # histograms + nrow <- ceiling(sqrt(ncol(counts(dds)))) + ncol <- ceiling(ncol(counts(dds))/nrow) + png(filename="figures/diagSizeFactorsHist.png", width=300*max(ncol,nrow), height=300*min(ncol,nrow)) + par(mfrow=sort(c(nrow,ncol))) + geomeans <- exp(rowMeans(log(counts(dds)))) + samples <- colnames(counts(dds)) + counts.trans <- log2(counts(dds)/geomeans) + xmin <- min(counts.trans[is.finite(counts.trans)],na.rm=TRUE) + xmax <- 
max(counts.trans[is.finite(counts.trans)],na.rm=TRUE) + for (j in 1:ncol(dds)){ + hist(log2(counts(dds)[,j]/geomeans), nclass=100, xlab="counts/geometric mean", las=1, xlim=c(xmin,xmax), + main=paste0("Size factors diagnostic - Sample ",samples[j]),col="skyblue") + abline(v = log2(sizeFactors(dds)[j]), col="red", lwd=1.5) + } + dev.off() + # total read counts vs size factors + png(filename="figures/diagSizeFactorsTC.png",width=400,height=400) + plot(sizeFactors(dds), colSums(counts(dds)), pch=19, las=1, xlab="Size factors", + ylab="Total number of reads",main="Diagnostic: size factors vs total number of reads") + abline(lm(colSums(counts(dds)) ~ sizeFactors(dds) + 0), lty=2, col="grey") + dev.off() +} + + diff --git a/R/dispersionsPlot.R b/R/dispersionsPlot.R new file mode 100644 index 0000000..e88b86c --- /dev/null +++ b/R/dispersionsPlot.R @@ -0,0 +1,26 @@ +#' Plots about DESeq2 dispersions +#' +#' A plot of the mean-dispersion relationship and a diagnostic of log normality of the dispersions (if use of DESeq2) +#' +#' @param dds a \code{DESeqDataSet} object +#' @return A file named dispersionsPlot.png in the figures directory containing the plot of the mean-dispersion relationship and a diagnostic of log normality of the dispersions +#' @author Marie-Agnes Dillies and Hugo Varet + +dispersionsPlot <- function(dds){ + disp <- mcols(dds)$dispGeneEst + disp <- disp[!is.na(disp)] + disp <- disp[disp>1e-8] + disp <- log(disp) + mean.disp <- mean(disp,na.rm=TRUE) + sd.disp <- sd(disp,na.rm=TRUE) + png(filename="figures/dispersionsPlot.png",width=800,height=400) + par(mfrow=c(1,2)) + # dispersions plot + plotDispEsts(dds, main="Dispersions", las=1, xlab="Mean of normalized counts", ylab="Dispersion") + # diagnostic of log normality + hist(disp, freq=FALSE, nclass=50, xlab="Feature dispersion estimate", las=1, + main = "log-normality dispersion diagnostic",col="skyblue") + fun <- function(x){dnorm(x,mean=mean.disp,sd=sd.disp)} + 
curve(fun,min(disp,na.rm=TRUE),max(disp,na.rm=TRUE),lwd=2,n=101,add=TRUE) + dev.off() +} diff --git a/R/exploreCounts.R b/R/exploreCounts.R new file mode 100644 index 0000000..5e847fd --- /dev/null +++ b/R/exploreCounts.R @@ -0,0 +1,26 @@ +#' Explore counts structure +#' +#' Explore counts structure: PCA (DESeq2) or MDS (edgeR) and clustering +#' +#' @param object a \code{DESeqDataSet} from DESeq2 or \code{DGEList} object from edgeR +#' @param group factor vector of the condition from which each sample belongs +#' @param typeTrans transformation method for PCA/clustering with DESeq2: \code{"VST"} or \code{"rlog"} +#' @param gene.selection selection of the features in MDSPlot (\code{"pairwise"} by default) +#' @param col colors used for the PCA/MDS (one per biological condition) +#' @return A list containing the dds object and the results object +#' @author Hugo Varet + +exploreCounts <- function(object, group, typeTrans="VST", gene.selection="pairwise", + col=c("lightblue","orange","MediumVioletRed","SpringGreen")){ + if (class(object)=="DESeqDataSet"){ + if (typeTrans == "VST") counts.trans <- assay(varianceStabilizingTransformation(object)) + else counts.trans <- assay(rlogTransformation(object)) + PCAPlot(counts.trans=counts.trans, group=group, col=col) + clusterPlot(counts.trans=counts.trans, group=group) + } else if (class(object)=="DGEList"){ + MDSPlot(dge=object, group=group, col=col, gene.selection=gene.selection) + clusterPlot(counts.trans=cpm(object, prior.count=2, log=TRUE), group=group) + } else{ + stop("The object is not a DESeqDataSet nor a DGEList") + } +} diff --git a/R/exportResults.DESeq2.R b/R/exportResults.DESeq2.R new file mode 100644 index 0000000..f0a9937 --- /dev/null +++ b/R/exportResults.DESeq2.R @@ -0,0 +1,67 @@ +#' Export results for DESeq2 analyses +#' +#' Export counts and DESeq2 results +#' +#' @param out.DESeq2 the result of \code{run.DESeq2()} +#' @param group factor vector of the condition from which each sample belongs +#' @param 
cooksCutoff Cook's distance threshold for detecting outliers (\code{Inf} +#' to disable the detection, \code{NULL} to keep DESeq2 threshold) +#' @param alpha threshold to apply to adjusted p-values +#' @return A list of \code{data.frame} containing counts, pvalues, FDR, log2FC... +#' @author Marie-Agnes Dillies and Hugo Varet + +exportResults.DESeq2 <- function(out.DESeq2, group, cooksCutoff=NULL, alpha=0.05){ + + dds <- out.DESeq2$dds + results <- out.DESeq2$results + + # raw and normalized counts + counts <- data.frame(Id=rownames(counts(dds)), counts(dds), round(counts(dds, normalized=TRUE))) + colnames(counts) <- c("Id", colnames(counts(dds)), paste0("norm.", colnames(counts(dds)))) + # baseMean with the feature Id + bm <- data.frame(Id=rownames(results[[1]]),baseMean=round(results[[1]][,"baseMean"],2)) + # merge counts and baseMean by Id + base <- merge(counts, bm, by="Id", all=TRUE) + tmp <- base[,paste("norm", colnames(counts(dds)), sep=".")] + for (cond in levels(group)){ + base[,cond] <- round(apply(as.data.frame(tmp[,group==cond]),1,mean),0) + } + + complete <- list() + for (name in names(results)){ + complete.name <- base + + # add columns from results + res.name <- data.frame(Id=rownames(results[[name]]),FoldChange=round(2^(results[[name]][,"log2FoldChange"]),3), + log2FoldChange=round(results[[name]][,"log2FoldChange"],3),pvalue=results[[name]][,"pvalue"], + padj=results[[name]][,"padj"]) + complete.name <- merge(complete.name, res.name, by="Id", all=TRUE) + # add columns from mcols(dds) + mcols.add <- data.frame(Id=rownames(counts(dds)),dispGeneEst=round(mcols(dds)$dispGeneEst,4), + dispFit=round(mcols(dds)$dispFit,4),dispMAP=round(mcols(dds)$dispMAP,4), + dispersion=round(mcols(dds)$dispersion,4),betaConv=mcols(dds)$betaConv, + maxCooks=round(mcols(dds)$maxCooks,4)) + if (is.null(cooksCutoff)){ + m <- nrow(attr(dds,"modelMatrix")) + p <- ncol(attr(dds,"modelMatrix")) + cooksCutoff <- qf(.99, p, m - p) + } + 
mcols.add$outlier <- ifelse(mcols(dds)$maxCooks > cooksCutoff,"Yes","No") + complete.name <- merge(complete.name, mcols.add, by="Id", all=TRUE) + complete[[name]] <- complete.name + + # select up- and down-regulated features + up.name <- complete.name[which(complete.name$padj <= alpha & complete.name$betaConv & complete.name$log2FoldChange>=0),] + up.name <- up.name[order(up.name$padj),] + down.name <- complete.name[which(complete.name$padj <= alpha & complete.name$betaConv & complete.name$log2FoldChange<=0),] + down.name <- down.name[order(down.name$padj),] + + # exports + name <- gsub("_","",name) + write.table(complete.name, file=paste0("tables/",name,".complete.txt"), sep="\t", row.names=FALSE, dec=".", quote=FALSE) + write.table(up.name, file=paste0("tables/", name,".up.txt"), row.names=FALSE, sep="\t", dec=".", quote=FALSE) + write.table(down.name, file=paste0("tables/", name,".down.txt"), row.names=FALSE, sep="\t", dec=".", quote=FALSE) + } + + return(complete) +} diff --git a/R/exportResults.edgeR.R b/R/exportResults.edgeR.R new file mode 100644 index 0000000..1e7429e --- /dev/null +++ b/R/exportResults.edgeR.R @@ -0,0 +1,61 @@ +#' Export results for edgeR analyses +#' +#' Export counts and edgeR results +#' +#' @param out.edgeR the result of \code{run.edgeR()} +#' @param group factor vector of the condition to which each sample belongs +#' @param counts non-filtered counts (used to keep them in the final table) +#' @param alpha threshold to apply to adjusted p-values +#' @return A list of \code{data.frame} containing counts, pvalues, FDR, log2FC... +#' @details \code{counts} are used as input just in order to export features with null counts too. 
+#' @author Marie-Agnes Dillies and Hugo Varet + +exportResults.edgeR <- function(out.edgeR, group, counts, alpha=0.05){ + + dge <- out.edgeR$dge + res <- out.edgeR$results + + # raw counts, normalized counts and baseMean + tmm <- dge$samples$norm.factors + N <- colSums(dge$counts) + f <- tmm * N/mean(tmm * N) + normCounts <- round(scale(dge$counts, center=FALSE, scale=f)) + base <- data.frame(Id=rownames(counts), counts) + norm.bm <- data.frame(Id=rownames(normCounts),normCounts) + names(norm.bm) <- c("Id", paste0("norm.",colnames(normCounts))) + norm.bm$baseMean <- round(apply(scale(dge$counts, center=FALSE, scale=f),1,mean),2) + for (cond in levels(group)){ + norm.bm[,cond] <- round(apply(as.data.frame(normCounts[,group==cond]),1,mean),0) + } + base <- merge(base,norm.bm,by="Id",all=TRUE) + + complete <- list() + for (name in names(res)){ + complete.name <- base + + # add columns from res + res.name <- data.frame(Id=rownames(res[[name]]),FC=round(2^(res[[name]][,"logFC"]),3), + log2FoldChange=round(res[[name]][,"logFC"],3),pvalue=res[[name]][,"PValue"], + padj=res[[name]][,"FDR"]) + complete.name <- merge(complete.name, res.name, by="Id", all=TRUE) + # add columns from dge + dge.add <- data.frame(Id=rownames(dge$counts),tagwise.dispersion=round(dge$tagwise.dispersion,4), + trended.dispersion=round(dge$trended.dispersion,4)) + complete.name <- merge(complete.name, dge.add, by="Id", all=TRUE) + complete[[name]] <- complete.name + + # select up- and down-regulated features + up.name <- complete.name[which(complete.name$padj <= alpha & complete.name$log2FoldChange>=0),] + up.name <- up.name[order(up.name$padj),] + down.name <- complete.name[which(complete.name$padj <= alpha & complete.name$log2FoldChange<=0),] + down.name <- down.name[order(down.name$padj),] + + # exports + name <- gsub("_","",name) + write.table(complete.name, file=paste0("tables/",name,".complete.txt"), sep="\t", row.names=FALSE, dec=".", quote=FALSE) + write.table(up.name, file=paste0("tables/", 
name,".up.txt"), row.names=FALSE, sep="\t", dec=".", quote=FALSE) + write.table(down.name, file=paste0("tables/", name,".down.txt"), row.names=FALSE, sep="\t", dec=".", quote=FALSE) + } + + return(complete) +} diff --git a/R/loadCountData.R b/R/loadCountData.R new file mode 100644 index 0000000..f9c05d4 --- /dev/null +++ b/R/loadCountData.R @@ -0,0 +1,52 @@ +#' Load count files +#' +#' Load one count file per sample thanks to the file names in the target file. +#' +#' @param target target \code{data.frame} of the project returned by \code{loadTargetFile()} +#' @param rawDir path to the directory containing the count files +#' @param header a logical value indicating whether the file contains the names of the variables as its first line +#' @param skip number of lines of the data file to skip before beginning to read data +#' @param featuresToRemove vector of feature Ids (or character string common to feature Ids) to remove from the counts +#' @return The \code{matrix} of raw counts with row names corresponding to the feature Ids and column names to the sample names as provided in the first column of the target. +#' @details If \code{featuresToRemove} is equal to \code{"rRNA"}, all the features containing the character string "rRNA" will be removed from the counts. 
+#' @author Marie-Agnes Dillies and Hugo Varet + +loadCountData <- function(target, rawDir="raw", header=FALSE, skip=0, + featuresToRemove=c("alignment_not_unique", "ambiguous", "no_feature", "not_aligned", "too_low_aQual")){ + + labels <- as.character(target[,1]) + files <- as.character(target[,2]) + + rawCounts <- read.table(paste(rawDir,files[1],sep="/"), sep="\t", quote="\"", header=header, skip=skip) + rawCounts <- rawCounts[,1:2] + colnames(rawCounts) <- c("Id", labels[1]) + cat("Loading files:\n") + cat(files[1],": ",length(rawCounts[,labels[1]])," rows and ",sum(rawCounts[,labels[1]]==0)," null count(s)\n",sep="") + + for (i in 2:length(files)){ + tmp <- read.table(paste(rawDir,files[i],sep="/"), sep="\t", header=header, skip=skip) + tmp <- tmp[,1:2] + colnames(tmp) <- c("Id", labels[i]) + rawCounts <- merge(rawCounts, tmp, by="Id", all=TRUE) + cat(files[i],": ",length(tmp[,labels[i]])," rows and ",sum(tmp[,labels[i]]==0)," null count(s)\n",sep="") + } + + rawCounts[is.na(rawCounts)] <- 0 + counts <- as.matrix(rawCounts[,-1]) + rownames(counts) <- rawCounts[,1] + + cat("\nFeatures removed:\n") + for (f in featuresToRemove){ + match <- grep(f, rownames(counts)) + if (length(match)>0){ + cat(rownames(counts)[match],sep="\n") + counts <- counts[-match,] + } + } + + cat("\nTop of the counts matrix:\n") + print(head(counts)) + cat("\nBottom of the counts matrix:\n") + print(tail(counts)) + return(counts) +} diff --git a/R/loadTargetFile.R b/R/loadTargetFile.R new file mode 100644 index 0000000..8ab1387 --- /dev/null +++ b/R/loadTargetFile.R @@ -0,0 +1,29 @@ +#' Load target file +#' +#' Load the target file containing sample information +#' +#' @param targetFile path to the target file +#' @param varInt variable on which sorting the target +#' @param condRef reference condition of \code{varInt} +#' @param batch batch effect to take into account +#' @return A \code{data.frame} containing the informations about the samples (name, file containing the counts and 
biological condition) +#' @details The \code{batch} parameter is used only to check if it is available in the target file before running the suite of the script. +#' @author Marie-Agnes Dillies and Hugo Varet + +loadTargetFile <- function(targetFile, varInt, condRef, batch){ + target <- read.table(targetFile, header=TRUE, sep="\t") + if (!I(varInt %in% names(target))) stop(paste("The factor of interest", varInt, "is not in the target file")) + if (!is.null(batch) && !I(batch %in% names(target))) stop(paste("The batch effect", batch, "is not in the target file")) + target[,varInt] <- as.factor(target[,varInt]) + if (!I(condRef %in% as.character(target[,varInt]))) stop(paste("The reference level", condRef, "is not a level of the factor of interest")) + target[,varInt] <- relevel(target[,varInt],ref=condRef) + target <- target[order(target[,varInt]),] + rownames(target) <- as.character(target[,1]) + # check if varInt contains replicates + if (min(table(target[,varInt]))<2) stop(paste("The factor of interest", varInt, "has a level without replicates")) + # warning message if batch is numeric + if (!is.null(batch) && is.numeric(target[,batch])) warning(paste("The", batch, "variable is numeric. 
Use factor() to convert it into a factor")) + cat("Target file:\n") + print(target) + return(target) +} diff --git a/R/majSequences.R b/R/majSequences.R new file mode 100644 index 0000000..3f7d096 --- /dev/null +++ b/R/majSequences.R @@ -0,0 +1,32 @@ +#' Most expressed sequences per sample +#' +#' Proportion of reads associated with the three most expressed sequences per sample +#' +#' @param counts \code{matrix} of counts +#' @param n number of most expressed sequences to return +#' @param group factor vector of the condition from which each sample belongs +#' @param col colors of the bars (one per biological condition) +#' @return A \code{matrix} with the percentage of reads of the three most expressed sequences and a file named majSeq.png in the figures directory +#' @author Marie-Agnes Dillies and Hugo Varet + +majSequences <- function(counts, n=3, group, col=c("lightblue","orange","MediumVioletRed","SpringGreen")){ + + seqnames <- apply(counts, 2, function(x){x <- sort(x, decreasing=TRUE); names(x)[1:n]}) + seqnames <- unique(unlist(as.character(seqnames))) + + sum <- apply(counts,2,sum) + counts <- counts[seqnames,] + sum <- matrix(sum,nrow(counts),ncol(counts),byrow=TRUE) + p <- round(100*counts/sum,digits=3) + + png(filename="figures/majSeq.png",width=400,height=400) + maj <- apply(p, 2, max) + seqname <- rownames(p)[apply(p, 2, which.max)] + x <- barplot(maj, col=col[as.integer(group)], main="Proportion of reads from most expressed sequence", + ylim=c(0, max(maj)*1.2), las=2, ylab="Proportion of reads") + legend("topright", levels(group), fill=col[1:nlevels(group)], bty="n") + for (i in 1:length(seqname)) text(x[i], maj[i]/2, seqname[i], cex=0.8, srt=90, adj=0) + dev.off() + + return(invisible(p)) +} diff --git a/R/nDiffTotal.r b/R/nDiffTotal.r new file mode 100644 index 0000000..4180237 --- /dev/null +++ b/R/nDiffTotal.r @@ -0,0 +1,26 @@ +#' Number of differentially expressed features per comparison +#' +#' Number of down- and up-regulated features per 
comparison +#' +#' @param complete list of \code{data.frame} containing features results (from \code{exportResults.DESeq2()} or \code{exportResults.edgeR()}) +#' @param alpha threshold to apply to the FDR +#' @return A matrix with the number of up, down and total of features per comparison +#' @author Marie-Agnes Dillies and Hugo Varet + +nDiffTotal <- function(complete, alpha=0.05){ + nDiffTotal <- matrix(NA,ncol=4,nrow=length(complete),dimnames=list(names(complete),c("Test vs Ref", "# down","# up","# total"))) + for (name in names(complete)){ + complete.name <- complete[[name]] + if (!is.null(complete.name$betaConv)){ + nDiffTotal[name,2:3]=c(nrow(complete.name[which(complete.name$padj <= alpha & complete.name$betaConv & complete.name$log2FoldChange<=0),]), + nrow(complete.name[which(complete.name$padj <= alpha & complete.name$betaConv & complete.name$log2FoldChange>=0),])) + } else{ + nDiffTotal[name,2:3]=c(nrow(complete.name[which(complete.name$padj <= alpha & complete.name$log2FoldChange<=0),]), + nrow(complete.name[which(complete.name$padj <= alpha & complete.name$log2FoldChange>=0),])) + } + } + nDiffTotal[,4] <- nDiffTotal[,2] + nDiffTotal[,3] + nDiffTotal[,1] <- gsub("_"," ",rownames(nDiffTotal)) + rownames(nDiffTotal) <- NULL + return(nDiffTotal) +} diff --git a/R/pairwiseScatterPlots.R b/R/pairwiseScatterPlots.R new file mode 100644 index 0000000..673abf6 --- /dev/null +++ b/R/pairwiseScatterPlots.R @@ -0,0 +1,25 @@ +#' Scatter plots for pairwise comparisons of log counts +#' +#' Scatter plots for pairwise comparisons of log counts +#' +#' @param counts \code{matrix} of raw counts +#' @param group factor vector of the condition to which each sample belongs +#' @return A file named pairwiseScatter.png in the figures directory containing a pairwise scatter plot with the SERE statistics in the lower panel +#' @author Marie-Agnes Dillies and Hugo Varet + +pairwiseScatterPlots <- function(counts, group){ + ncol <- ncol(counts) + 
png(filename="figures/pairwiseScatter.png",width=150*ncol,height=150*ncol) + # defining panel and lower.panel functions + panel <- function(x,y,...){points(x, y, pch=".");abline(a=0,b=1,lty=2);} + lower.panel <- function(x,y,...){ + horizontal <- (par("usr")[1] + par("usr")[2]) / 2; + vertical <- (par("usr")[3] + par("usr")[4]) / 2; + text(horizontal, vertical, round(SERE(2^cbind(x,y) - 1), digits=2), cex=ncol/2.5) + } + # use of the pairs function + pairs(log2(counts+1), panel=panel, lower.panel=lower.panel, + las=1, labels=paste(colnames(counts),group,sep="\n"), + main="Pairwise scatter plot",cex.labels=ncol/2,cex.main=ncol/4) + dev.off() +} diff --git a/R/rawpHist.R b/R/rawpHist.R new file mode 100644 index 0000000..5bdb664 --- /dev/null +++ b/R/rawpHist.R @@ -0,0 +1,19 @@ +#' Histograms of raw p-values +#' +#' Histogram of raw p-values for each comparison +#' +#' @param complete a list of \code{data.frames} created by \code{exportResults.DESeq2()} or \code{exportResults.edgeR()} +#' @return A file named rawpHist.png in the figures directory with one histogram of raw p-values per comparison +#' @author Marie-Agnes Dillies and Hugo Varet + +rawpHist <- function(complete){ + nrow <- ceiling(sqrt(length(complete))) + ncol <- ceiling(length(complete)/nrow) + png(filename="figures/rawpHist.png", width=400*max(ncol,nrow), height=400*min(ncol,nrow)) + par(mfrow=sort(c(nrow,ncol))) + for (name in names(complete)){ + hist(complete[[name]][,"pvalue"], nclass=50, xlab="Raw p-value", + col="skyblue", las=1, main=paste0("Distribution of raw p-values - ",gsub("_"," ",name))) + } + dev.off() +} diff --git a/R/removeNull.R b/R/removeNull.R new file mode 100644 index 0000000..d6fe84c --- /dev/null +++ b/R/removeNull.R @@ -0,0 +1,11 @@ +#' Remove features with null counts in all samples +#' +#' Remove features with null counts in all samples. These features do not contain any information and will not be used for the statistical analysis. 
+#' +#' @param counts \code{matrix} of raw counts +#' @return The \code{matrix} of counts without features with only null counts +#' @author Marie-Agnes Dillies and Hugo Varet + +removeNull <- function(counts){ + return(counts[rowSums(counts) > 0,]) +} diff --git a/R/run.DESeq2.r b/R/run.DESeq2.r new file mode 100644 index 0000000..fc81e08 --- /dev/null +++ b/R/run.DESeq2.r @@ -0,0 +1,50 @@ +#' Wrapper to run DESeq2 +#' +#' Wrapper to run DESeq2: create the \code{DESeqDataSet}, normalize data, estimate dispersions, statistical testing... +#' +#' @param counts \code{matrix} of raw counts +#' @param target target \code{data.frame} of the project +#' @param varInt name of the factor of interest (biological condition) +#' @param batch batch effect to take into account (\code{NULL} by default) +#' @param locfunc \code{"median"} (default) or \code{"shorth"} to estimate the size factors +#' @param fitType mean-variance relationship: "parametric" (default) or "local" +#' @param pAdjustMethod p-value adjustment method: \code{"BH"} (default) or \code{"BY"} for instance +#' @param cooksCutoff outliers detection threshold (\code{NULL} to let DESeq2 choosing it) +#' @param independentFiltering \code{TRUE} or \code{FALSE} to perform the independent filtering or not +#' @param alpha significance threshold to apply to the adjusted p-values +#' @param ... 
optional arguments to be passed to \code{nbinomWaldTest()} +#' @return A list containing the \code{dds} object (\code{DESeqDataSet} class), the \code{results} objects (\code{DESeqResults} class) and the vector of size factors +#' @author Hugo Varet + +run.DESeq2 <- function(counts, target, varInt, batch=NULL, + locfunc="median", fitType="parametric", pAdjustMethod="BH", + cooksCutoff=NULL, independentFiltering=TRUE, alpha=0.05, ...){ + # building dds object + dds <- DESeqDataSetFromMatrix(countData=counts, colData=target, + design=formula(paste("~", ifelse(!is.null(batch), paste(batch,"+"), ""), varInt))) + cat("Design of the statistical model:\n") + cat(paste(as.character(design(dds)),collapse=" "),"\n") + + # normalization + dds <- estimateSizeFactors(dds,locfunc=eval(as.name(locfunc))) + cat("\nNormalization factors:\n") + print(sizeFactors(dds)) + + # estimating dispersions + dds <- estimateDispersions(dds, fitType=fitType) + + # statistical testing: perform all the comparisons between the levels of varInt + dds <- nbinomWaldTest(dds, ...) + results <- list() + for (comp in combn(nlevels(colData(dds)[,varInt]), 2, simplify=FALSE)){ + levelRef <- levels(colData(dds)[,varInt])[comp[1]] + levelTest <- levels(colData(dds)[,varInt])[comp[2]] + results[[paste0(levelTest,"_vs_",levelRef)]] <- results(dds, contrast=c(varInt, levelTest, levelRef), + pAdjustMethod=pAdjustMethod, + cooksCutoff=ifelse(!is.null(cooksCutoff),cooksCutoff,TRUE), + independentFiltering=independentFiltering, alpha=alpha) + cat(paste("Comparison", levelTest, "vs", levelRef, "done\n")) + } + + return(list(dds=dds,results=results,sf=sizeFactors(dds))) +} diff --git a/R/run.edgeR.r b/R/run.edgeR.r new file mode 100644 index 0000000..335b129 --- /dev/null +++ b/R/run.edgeR.r @@ -0,0 +1,69 @@ +#' Wrapper to run edgeR +#' +#' Wrapper to run edgeR: create the \code{dge} object, normalize data, estimate dispersions, statistical testing... 
+#' +#' @param counts \code{matrix} of counts +#' @param target target \code{data.frame} of the project +#' @param varInt name of the factor of interest (biological condition) +#' @param condRef reference biological condition +#' @param batch batch effect to take into account (\code{NULL} by default) +#' @param cpmCutoff counts-per-million cut-off to filter low counts +#' @param pAdjustMethod p-value adjustment method: \code{"BH"} (default) or \code{"BY"} +#' @param ... optional arguments to be passed to \code{glmFit()} +#' @return A list containing the \code{dge} object and the \code{results} object +#' @author Hugo Varet + +run.edgeR <- function(counts, target, varInt, condRef, batch=NULL, cpmCutoff=1, pAdjustMethod="BH", ...){ + + # filtering: select features which contain at least + # minReplicates (smallest number of replicates) with + # at least cpmCutoff counts per million + minReplicates <- min(table(target[,varInt])) + fcounts <- counts[rowSums(cpm(counts) >= cpmCutoff) >= minReplicates,] + cat("Number of features discarded by the filtering:\n") + cat(nrow(counts)-nrow(fcounts),"\n") + + # building dge object + design <- formula(paste("~", ifelse(!is.null(batch), paste(batch,"+"), ""), varInt)) + dge <- DGEList(counts=fcounts, remove.zeros=TRUE) + dge$design <- model.matrix(design, data=target) + cat("\nDesign of the statistical model:\n") + cat(paste(as.character(design),collapse=" "),"\n") + + # normalization + dge <- calcNormFactors(dge) + cat("\nNormalization factors:\n") + print(dge$samples$norm.factors) + + # estimating dispersions + dge <- estimateGLMCommonDisp(dge, dge$design) + dge <- estimateGLMTrendedDisp(dge, dge$design) + dge <- estimateGLMTagwiseDisp(dge, dge$design) + + # statistical testing: perform all the comparisons between the levels of varInt + fit <- glmFit(dge, dge$design, ...) 
+ cat(paste("Coefficients of the model:",paste(colnames(fit$design),collapse=" ")),"\n") + colsToTest <- grep(varInt,colnames(fit$design)) + namesToTest <- paste0(gsub(varInt,"",colnames(fit$design)[colsToTest]),"_vs_",condRef) + results <- list() + # testing coefficients individually (tests against the reference level) + for (i in 1:length(colsToTest)){ + cat(paste0("Comparison ",gsub("_"," ",namesToTest[i]),": testing coefficient ",colnames(fit$design)[colsToTest[i]]),"\n") + lrt <- glmLRT(fit, coef=colsToTest[i]) + results[[namesToTest[i]]] <- topTags(lrt,n=nrow(dge$counts),adjust.method=pAdjustMethod,sort.by="none")$table + } + # defining contrasts for the other comparisons (if applicable) + if (length(colsToTest)>=2){ + colnames <- gsub(varInt,"",colnames(fit$design)) + for (comp in combn(length(colsToTest),2,simplify=FALSE)){ + contrast <- numeric(ncol(dge$design)) + contrast[colsToTest[comp[1:2]]] <- c(-1,1) + namecomp <- paste0(colnames[colsToTest[comp[2]]],"_vs_",colnames[colsToTest[comp[1]]]) + cat(paste0("Comparison ",gsub("_"," ",namecomp),": testing contrast (",paste(contrast,collapse=", "),")"),"\n") + lrt <- glmLRT(fit, contrast=contrast) + results[[namecomp]] <- topTags(lrt,n=nrow(dge$counts),adjust.method=pAdjustMethod,sort.by="none")$table + } + } + + return(list(dge=dge,results=results)) +} diff --git a/R/summarizeResults.DESeq2.r b/R/summarizeResults.DESeq2.r new file mode 100644 index 0000000..931d84f --- /dev/null +++ b/R/summarizeResults.DESeq2.r @@ -0,0 +1,57 @@ +#' Summarize DESeq2 analysis +#' +#' Summarize DESeq2 analysis: diagnostic plots, dispersions plot, summary of the independent filtering, export results... 
+#' +#' @param out.DESeq2 the result of \code{run.DESeq2()} +#' @param group factor vector of the condition from which each sample belongs +#' @param independentFiltering \code{TRUE} or \code{FALSE} to perform the independent filtering or not +#' @param cooksCutoff Cook's distance threshold for detecting outliers (\code{Inf} +#' to disable the detection, \code{NULL} to keep DESeq2 threshold) +#' @param alpha significance threshold to apply to the adjusted p-values +#' @param col colors for the plots +#' @return A list containing: (i) a list of \code{data.frames} from \code{exportResults.DESeq2()}, (ii) the table summarizing the independent filtering procedure and (iii) a table summarizing the number of differentially expressed features +#' @author Hugo Varet + +summarizeResults.DESeq2 <- function(out.DESeq2, group, independentFiltering=TRUE, cooksCutoff=NULL, + alpha=0.05, col=c("lightblue","orange","MediumVioletRed","SpringGreen")){ + # create the figures/tables directory if does not exist + if (!I("figures" %in% dir())) dir.create("figures", showWarnings=FALSE) + if (!I("tables" %in% dir())) dir.create("tables", showWarnings=FALSE) + + dds <- out.DESeq2$dds + results <- out.DESeq2$results + + # diagnostic of the size factors + diagSizeFactorsPlots(dds=dds) + + # boxplots before and after normalisation + countsBoxplots(dds, group=group, col=col) + + # dispersions plot + dispersionsPlot(dds=dds) + + # results of the independent filtering + if (independentFiltering){ + tabIndepFiltering <- tabIndepFiltering(results) + cat("Number of features discarded by the independent filtering:\n") + print(tabIndepFiltering, quote=FALSE) + } else{ + tabIndepFiltering <- NULL + } + + # exporting results of the differential analysis + complete <- exportResults.DESeq2(out.DESeq2, group=group, cooksCutoff=cooksCutoff, alpha=alpha) + + # small table with number of differentially expressed features + nDiffTotal <- nDiffTotal(complete=complete, alpha=alpha) + cat("\nNumber of features 
down/up and total:\n") + print(nDiffTotal, quote=FALSE) + + # histograms of raw p-values + rawpHist(complete=complete) + + # MA-plots + MAPlot(complete=complete, alpha=alpha) + + return(list(complete=complete, tabIndepFiltering=tabIndepFiltering, nDiffTotal=nDiffTotal)) +} diff --git a/R/summarizeResults.edgeR.r b/R/summarizeResults.edgeR.r new file mode 100644 index 0000000..e7a252e --- /dev/null +++ b/R/summarizeResults.edgeR.r @@ -0,0 +1,40 @@ +#' Summarize edgeR analysis +#' +#' Summarize edgeR analysis: diagnostic plots, dispersions plot, export results... +#' +#' @param out.edgeR the result of \code{run.edgeR()} +#' @param group factor vector of the condition to which each sample belongs +#' @param counts matrix of raw counts +#' @param alpha significance threshold to apply to the adjusted p-values +#' @param col colors for the plots +#' @return A list containing: (i) a list of \code{data.frames} from \code{exportResults.edgeR()} and (ii) a table summarizing the number of differentially expressed features +#' @author Hugo Varet + +summarizeResults.edgeR <- function(out.edgeR, group, counts, alpha=0.05, + col=c("lightblue","orange","MediumVioletRed","SpringGreen")){ + # create the figures/tables directories if they do not exist + if (!I("figures" %in% dir())) dir.create("figures", showWarnings=FALSE) + if (!I("tables" %in% dir())) dir.create("tables", showWarnings=FALSE) + + # boxplots before and after normalisation + countsBoxplots(out.edgeR$dge, group=group, col=col) + + # dispersions + BCVPlot(dge=out.edgeR$dge) + + # exporting results of the differential analysis + complete <- exportResults.edgeR(out.edgeR=out.edgeR, group=group, counts=counts, alpha=alpha) + + # small table with number of differentially expressed features + nDiffTotal <- nDiffTotal(complete=complete, alpha=alpha) + cat("Number of features down/up and total:\n") + print(nDiffTotal, quote=FALSE) + + # histograms of raw p-values + rawpHist(complete=complete) 
+ + # MA-plots + MAPlot(complete=complete, alpha=alpha) + + return(list(complete=complete, nDiffTotal=nDiffTotal)) +} diff --git a/R/tabIndepFiltering.R b/R/tabIndepFiltering.R new file mode 100644 index 0000000..9d554f2 --- /dev/null +++ b/R/tabIndepFiltering.R @@ -0,0 +1,20 @@ +#' Table of the number of features discarded by the independent filtering (if use of DESeq2) +#' +#' Compute the number of features discarded by the independent filtering for each comparison (if use of DESeq2) +#' +#' @param results list of results of \code{results(dds,...)} with chosen parameters +#' @return A \code{matrix} with the threshold and the number of features discarded for each comparison +#' @author Marie-Agnes Dillies and Hugo Varet + +tabIndepFiltering <- function(results){ + out <- matrix(NA,ncol=3,nrow=length(names(results)),dimnames=list(names(results),c("Test vs Ref","Threshold","# discarded"))) + for (name in names(results)){ + threshold <- attr(results[[name]],"filterThreshold") + out[name,2] <- round(threshold,2) + use <- results[[name]]$baseMean > threshold + out[name,3] <- ifelse(is.na(table(use)["FALSE"]),0,table(use)["FALSE"]) + } + out[,1] <- gsub("_"," ",rownames(out)) + rownames(out) <- NULL + return(out) +} diff --git a/R/tabSERE.R b/R/tabSERE.R new file mode 100644 index 0000000..0c846df --- /dev/null +++ b/R/tabSERE.R @@ -0,0 +1,18 @@ +#' SERE statistics for several samples +#' +#' Compute the SERE statistic for each pair of samples +#' +#' @param counts \code{matrix} of raw counts +#' @return The \code{matrix} of SERE values +#' @author Marie-Agnes Dillies and Hugo Varet + +tabSERE <- function(counts){ + sere <- matrix(NA, ncol=ncol(counts), nrow=ncol(counts)) + for (i in 1:ncol(counts)){ + for (j in 1:ncol(counts)){ + sere[i,j] <- SERE(counts[,c(i,j)]) + } + } + colnames(sere) <- rownames(sere) <- colnames(counts) + return(invisible(round(sere, digits=3))) +} diff --git a/R/welcome.R b/R/welcome.R new file mode 100644 index 0000000..467cf66 --- /dev/null 
+++ b/R/welcome.R @@ -0,0 +1,19 @@ +# ========================================================================== +# package initialization +# ========================================================================== +.onAttach = function(libname, pkgname) { + msg <- c("----------------------------------------------", + "Welcome to SARTools.", + "R template scripts are available at the end of the vignette.") + # checking DESeq2 version + if (packageVersion("DESeq2") < "1.6.0" | packageVersion("DESeq2") >= "1.7.0"){ + msg <- c(msg,"warning: SARTools has been developped with DESeq2 1.6.X, your version of DESeq2 might be incompatible with the workflow.") + } + # checking edgeR version + if (packageVersion("edgeR") < "3.8.0" | packageVersion("edgeR") >= "3.9.0"){ + msg <- c(msg,"warning: SARTools has been developped with edgeR 3.8.X, your version of edgeR might be incompatible with the workflow.") + } + msg <- c(msg,"----------------------------------------------") + msg <- strwrap(msg, exdent=4, indent=4) + packageStartupMessage(paste(msg, collapse="\n"), appendLF=TRUE) +} diff --git a/R/writeReport.DESeq2.r b/R/writeReport.DESeq2.r new file mode 100644 index 0000000..a04e9c3 --- /dev/null +++ b/R/writeReport.DESeq2.r @@ -0,0 +1,42 @@ +#' Write HTML report for DESeq2 analyses +#' +#' Write HTML report from graphs and tables created during the analysis with DESeq2 +#' +#' @param target target \code{data.frame} of the project returned by \code{loadTargetFile()} +#' @param counts \code{matrix} of counts returned by \code{loadCountData()} +#' @param out.DESeq2 the result of \code{run.DESeq2()} +#' @param summaryResults the result of \code{summarizeResults.DESeq2()} +#' @param majSequences the result of \code{descriptionPlots()} +#' @param workDir working directory +#' @param projectName name of the project +#' @param author name of the author of the analysis +#' @param targetFile path to the target file +#' @param rawDir path to the directory containing the counts files 
+#' @param featuresToRemove vector of features to remove from the counts matrix +#' @param varInt factor of interest (biological condition) +#' @param condRef reference condition for the factor of interest +#' @param batch variable to take as a batch effect +#' @param fitType mean-variance relationship: \code{"parametric"} (default) or \code{"local"} +#' @param cooksCutoff outliers detection threshold +#' @param independentFiltering \code{TRUE} or \code{FALSE} to perform the independent filtering or not +#' @param alpha threshold of statistical significance +#' @param pAdjustMethod p-value adjustment method: \code{"BH"} or \code{"BY"} for instance +#' @param typeTrans transformation for PCA/clustering: \code{"VST"} or \code{"rlog"} +#' @param locfunc \code{"median"} (default) or \code{"shorth"} to estimate the size factors +#' @param colors vector of colors of each biological condition on the plots +#' @details This function generates the HTML report for a statistical analysis with DESeq2. It uses the tables and graphs created during the workflow as well as the parameters defined at the beginning of the script. 
+#' @author Hugo Varet + +writeReport.DESeq2 <- function(target, counts, out.DESeq2, summaryResults, majSequences, + workDir, projectName, author, targetFile, rawDir, + featuresToRemove, varInt, condRef, batch, fitType, + cooksCutoff, independentFiltering, alpha, pAdjustMethod, + typeTrans, locfunc, colors){ + knit2html(input=system.file("report_DESeq2.rmd", package="SARTools"), + output=paste0(projectName, "_report.html"), + quiet=TRUE, title="Statistical report") + # delete unwanted directory/file + unlink("cache",force=TRUE,recursive=TRUE) + unlink("report_DESeq2.md",force=TRUE) + cat("HTML report created\n") +} diff --git a/R/writeReport.edgeR.r b/R/writeReport.edgeR.r new file mode 100644 index 0000000..7aec7c1 --- /dev/null +++ b/R/writeReport.edgeR.r @@ -0,0 +1,37 @@ +#' Write HTML report for edgeR analyses +#' +#' Write HTML report from graphs and tables created during the analysis with edgeR +#' +#' @param target target \code{data.frame} of the project returned by \code{loadTargetFile()} +#' @param counts \code{matrix} of counts returned by \code{loadCountData()} +#' @param out.edgeR the result of \code{run.edgeR()} +#' @param summaryResults the result of \code{summarizeResults.edgeR()} +#' @param majSequences the result of \code{descriptionPlots()} +#' @param workDir path to the working directory +#' @param projectName name of the project +#' @param author name of the author of the analysis +#' @param targetFile path to the target file +#' @param rawDir path to the directory containing the counts files +#' @param featuresToRemove vector of features to remove from the counts matrix +#' @param varInt factor of interest (biological condition) +#' @param condRef reference condition for the factor of interest +#' @param batch variable to take as a batch effect +#' @param alpha threshold of statistical significance +#' @param pAdjustMethod p-value adjustment method: \code{"BH"} (default) or \code{"BY"} +#' @param colors vector of colors of each biological
condition on the plots +#' @param gene.selection selection of the features in \code{MDSPlot()} (\code{"pairwise"} by default) +#' @details This function generates the HTML report for a statistical analysis with edgeR. It uses the tables and graphs created during the workflow as well as the parameters defined at the beginning of the script. +#' @author Hugo Varet + +writeReport.edgeR <- function(target,counts,out.edgeR,summaryResults,majSequences, + workDir,projectName,author,targetFile,rawDir, + featuresToRemove,varInt,condRef,batch, + alpha,pAdjustMethod,colors,gene.selection){ + knit2html(input=system.file("report_edgeR.rmd", package="SARTools"), + output=paste0(projectName, "_report.html"), + quiet=TRUE, title="Statistical report") + # delete unwanted directory/file + unlink("cache",force=TRUE,recursive=TRUE) + unlink(paste0("report_edgeR.md"),force=TRUE) + cat("HTML report created\n") +} diff --git a/README.md b/README.md index 0a452fc..c1e94ca 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,25 @@ SARTools ======== -SARTools will be available soon... +SARTools is a R package dedicated to the differential analysis of RNA-seq data. It provides tools to generate descriptive and diagnostic graphs, to run the differential analysis with one of the well known DESeq2 or edgeR packages and to export the results into easily readable tab-delimited files. It also facilitates the generation of a HTML report which displays all the figures produced, explains the statistical methods and gives the results of the differential analysis. Note that SARTools does not intend to replace DESeq2 or edgeR: it simply provides an environment to go with them. For more details about the methodology behind DESeq2 or edgeR, the user should read their documentations and papers. + +SARTools is distributed with two R script templates which use functions of the package. 
For a more fluid analysis and to avoid possible bugs when creating the final HTML report, the user is encouraged to use them rather than writing a new script. + +How to install SARTools? +------------------------ + +In addition to the SARTools package itself, the workflow requires the installation of several packages: DESeq2, edgeR, genefilter, xtable and knitr (all available online, see the dedicated webpages). The current version of SARTools has been developed under R 3.1.1 and with DESeq2 1.6.1, edgeR 3.8.2, genefilter 1.48.1 and knitr 1.7. As a DESeq2 or edgeR update might make the workflow unusable due to modifications to the statistical models, care is recommended when updating these packages. + +To install the SARTools package from GitHub, open an R session and: +- install DESeq2, edgeR, genefilter if not installed yet (see the dedicated webpages for Bioconductor packages) +- load the devtools R package: `library(devtools)` (after `install.packages("devtools")` if not installed yet) +- run `install_github("hvaret/SARTools")` + +How to use SARTools? +-------------------- + +A PDF vignette (tutorial.pdf) is available within the package and provides extensive information on the use of SARTools. To open it, run `vignette("tutorial",package="SARTools")` + +About SARTools +-------------- +The SARTools package has been developed at PF2 - Institut Pasteur by M.-A. Dillies and H. Varet (hugo.varet@pasteur.fr). Please cite H. Varet, J.-Y. Coppee and M.-A. Dillies, _SARTools: a DESeq2- and edgeR-based R pipeline for comprehensive differential analysis of RNA-seq data_, 2014 (submitted) when using this tool for any published analysis.
diff --git a/inst/CITATION b/inst/CITATION new file mode 100644 index 0000000..bc57f15 --- /dev/null +++ b/inst/CITATION @@ -0,0 +1,14 @@ +citEntry(entry = "article", + title = "SARTools: a DESeq2- and edgeR-based R pipeline for comprehensive differential analysis of RNA-seq data", + author = personList( as.person("Hugo Varet"), + as.person("Jean-Yves Coppée"), + as.person("Marie-Agnès Dillies")), + year = 2014, + journal = "Bioinformatics", + doi = "(submitted)", + url = "", + textVersion = + paste("Hugo Varet, Jean-Yves Coppée and Marie-Agnès Dillies (2014):", + "SARTools: a DESeq2- and edgeR-based R pipeline for comprehensive differential analysis of RNA-seq data.", + "Bioinformatics (submitted)" )) + diff --git a/inst/Thumbs.db b/inst/Thumbs.db new file mode 100644 index 0000000000000000000000000000000000000000..37426081a6354661a0162a304db58cbbac6eaa2a GIT binary patch literal 9728
[binary data omitted]
Statistical report of project `r projectName`: +#
pairwise comparison(s) of conditions
+#
with DESeq2
+ +-------------------------------------------------------------------------------------------------------------------------- + +Author: `r author` + +Date: `r Sys.Date()` + +The SARTools R package which generated this report has been developed at PF2 - Institut Pasteur by M.-A. Dillies and H. Varet (hugo.varet@pasteur.fr). Please cite H. Varet, J.-Y. Coppee and M.-A. Dillies, _SARTools: a DESeq2- and edgeR-based R pipeline for comprehensive differential analysis of RNA-seq data_, 2014 (submitted) when using this tool for any published analysis. + +-------------------------------------------------------------------------------------------------------------------------- + +## Table of contents + +1. Introduction +2. Description of raw data +3. Variability within the experiment: data exploration +4. Normalization +5. Differential analysis +6. R session information and parameters +7. Bibliography + +-------------------------------------------------------------------------------------------------------------------------- + +## 1 Introduction + +The analyses reported in this document are part of the `r projectName` project. The aim is to find features that are differentially expressed between `r paste(paste(levels(target[,varInt])[-nlevels(target[,varInt])],collapse=", "),levels(target[,varInt])[nlevels(target[,varInt])],sep=" and ")`. The statistical analysis process includes data normalization, graphical exploration of raw and normalized data, test for differential expression for each feature between the conditions, raw p-value adjustment and export of lists of features having a significant differential expression between the conditions.
`r ifelse(!is.null(batch),paste0("In this analysis, the ",batch, " effect will be taken into account in the statistical models."),"")` + +The analysis is performed using the R software [R Core Team, 2014], Bioconductor [Gentleman, 2004] packages including DESeq2 [Anders, 2010 and Love, 2014] and the SARTools package developed at PF2 - Institut Pasteur. Normalization and differential analysis are carried out according to the DESeq2 model and package. This report comes with additional tab-delimited text files that contain lists of differentially expressed features. + +For more details about the DESeq2 methodology, please refer to its related publications [Anders, 2010 and Love, 2014]. + +-------------------------------------------------------------------------------------------------------------------------- + +## 2 Description of raw data + +The count data files and associated biological conditions are listed in the following table. + +```{r , cache=TRUE, echo=FALSE, results="asis"} +print(xtable(target,caption="Table 1: Data files and associated biological conditions."), type="html", include.rownames=FALSE, html.table.attributes = "align='center'") +``` + +After loading the data we first have a look at the raw data table itself. The data table contains one row per annotated feature and one column per sequenced sample. Row names of this table are feature IDs (unique identifiers). The table contains raw count values representing the number of reads that map onto the features. For this project, there are `r nrow(counts)` features in the count data table. + +```{r , cache=TRUE, echo=FALSE, results="asis"} +print(xtable(head(counts),caption="Table 2: Partial view of the count data table."), type="html", html.table.attributes = "align='center'") +``` + +Looking at the summary of the count table provides a basic description of these raw counts (min and max values, median, etc). 
+ +```{r , cache=TRUE, echo=FALSE, results="asis"} +fun_summary=function(x){ + out=c(quantile(x,c(0,0.25,0.5),type=1),mean(x),quantile(x,c(0.75,1),type=1)) + names(out)=c("Min.","1st Qu.","Median","Mean","3rd Qu.","Max.") + return(round(out,0)) +} +print(xtable(apply(counts,2,fun_summary),caption="Table 3: Summary of the raw counts.",digits=0), type="html", html.table.attributes = "align='center'") +nbNull <- nrow(counts) - nrow(removeNull(counts)) # needed in one of the next paragraphs +``` + +Figure 1 shows the total number of mapped reads for each sample. Reads that map to multiple locations on the transcriptome are counted more than once, as long as they map to fewer than 50 different loci. We expect total read counts to be similar within conditions; they may differ across conditions. Total counts sometimes vary widely between replicates. This may happen for several reasons, including: +- different rRNA contamination levels between samples (even between biological replicates); +- slight differences between library concentrations, since they may be difficult to measure with high precision. + +
+
+ Barplot total counts +
Figure 1: Number of mapped reads per sample. Colors refer to the biological condition of the sample.
+
+
+ +Figure 2 shows the proportion of features with no read count in each sample. We expect this proportion to be similar within conditions. Features with null read counts in the `r ncol(counts)` samples are left in the data but are not taken into account for the analysis with DESeq2. Here, `r nbNull` features (`r round(100*nbNull/nrow(counts),2)`%) are in this situation (dashed line). Results for those features (fold-change and p-values) are set to NA in the results files. + +
+
+ Barplot null counts +
Figure 2: Proportion of features with null read counts in each sample.
+
+
+ +Figure 3 shows the distribution of read counts for each sample. For the sake of readability, $\text{log}_2(\text{counts}+1)$ values are used instead of raw counts. Again we expect replicates to have similar distributions. In addition, this figure shows whether read counts are predominantly low, medium or high. This depends on the organisms as well as the biological conditions under consideration. + +
+
+ Estimated densities of raw counts +
Figure 3: Density distribution of read counts.
+
+
+ +It may happen that one or a few features capture a high proportion of reads (up to 20% or more). This phenomenon should not influence the normalization process. The DESeq2 normalization has proved to be robust to this situation [Dillies, 2012]. Anyway, we expect these high count features to be the same across replicates. They are not necessarily the same across conditions. Figure 4 and table 4 illustrate the possible presence of such high count features in the data set. + +
+
+ Most represented sequences +
Figure 4: Percentage of reads associated with the sequence having the highest count (provided in each box on the graph) for each sample.
+
+
+ +```{r , cache=TRUE, echo=FALSE, results="asis"} +print(xtable(majSequences,caption="Table 4: Percentage of reads associated with the sequences having the highest counts."), type="html", html.table.attributes = "align='center'") +``` + +We may wish to assess the similarity between samples across conditions. A pairwise scatter plot is produced (figure 5) to show how replicates and samples from different biological conditions are similar or different ($\text{log}_2(\text{counts}+1)$ are used instead of raw count values). Moreover, as the Pearson correlation has been shown not to be relevant to measure the similarity between replicates, the SERE statistic has been proposed as a similarity index between RNA-Seq samples [Schulze, 2012]. It measures whether the variability between samples is random Poisson variability or higher. Pairwise SERE values are printed in the lower triangle of the pairwise scatter plot. The value of the SERE statistic is: +- 0 when samples are identical (no variability at all: this may happen in the case of a sample duplication); +- 1 for technical replicates (technical variability follows a Poisson distribution); +- greater than 1 for biological replicates and samples from different biological conditions (biological variability is higher than technical one, data are over-dispersed with respect to Poisson). The higher the SERE value, the lower the similarity. It is expected to be lower between biological replicates than between samples of different biological conditions. Hence, the SERE statistic can be used to detect inversions between samples. + +
+
+ Pairwise scatter plot +
Figure 5: Pairwise comparison of samples.
+
+
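The three reference values of the SERE statistic listed above can be illustrated with a small base-R sketch. It assumes the Pearson-type formulation described by Schulze et al. (2012); the function name `sere_sketch` and the toy counts are hypothetical and are not taken from the package's own `SERE()` function.

```r
# Sketch of a SERE-type statistic for a count matrix (features x samples).
# Assumption: Pearson-type formulation; NOT the package's SERE() source code.
sere_sketch <- function(observed) {
  observed <- as.matrix(observed)
  # drop features without any read in the compared samples
  observed <- observed[rowSums(observed) > 0, , drop = FALSE]
  lib_sizes <- colSums(observed)
  # expected counts if all samples shared the same feature proportions
  expected <- outer(rowSums(observed) / sum(lib_sizes), lib_sizes)
  # Pearson-type statistic divided by its degrees of freedom
  chi2 <- sum((observed - expected)^2 / expected)
  df <- nrow(observed) * (ncol(observed) - 1)
  sqrt(chi2 / df)
}

# hypothetical toy counts
duplicated_sample <- cbind(a = c(10, 200, 35, 0), b = c(10, 200, 35, 0))
swapped_sample    <- cbind(a = c(10, 200, 35),    b = c(200, 10, 35))

sere_sketch(duplicated_sample)  # identical samples: 0 up to rounding
sere_sketch(swapped_sample)     # strongly different samples: well above 1
```

Under these assumptions, a value close to 0 flags a probable sample duplication, while values well above 1 reflect over-dispersion with respect to Poisson variability.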
+ + +-------------------------------------------------------------------------------------------------------------------------- + +## 3 Variability within the experiment: data exploration + +The main variability within the experiment is expected to come from biological differences between the samples. This can be checked in two ways. The first one is to perform a hierarchical clustering of the whole sample set. This is performed after a transformation of the count data which can be either a Variance Stabilizing Transformation (VST) or a regularized log transformation (rlog) [Anders, 2010 and Love, 2014]. + +A VST is a transformation of the data that makes them homoscedastic, meaning that the variance is then independent of the mean. It is performed in two steps: (i) a mean-variance relationship is estimated from the data with the same function that is used to normalize count data and (ii) from this relationship, a transformation of the data is performed in order to get a dataset in which the variance is independent of the mean. Homoscedasticity is a prerequisite for the use of some data analysis methods, such as hierarchical clustering or Principal Component Analysis (PCA). The regularized log transformation is based on a GLM (Generalized Linear Model) on the counts and has the same goal as a VST but is more robust when the size factors vary widely. + +Figure 6 shows the dendrogram obtained from `r typeTrans`-transformed data. A Euclidean distance is computed between samples, and the dendrogram is built upon the Ward criterion. We expect this dendrogram to group replicates and separate biological conditions. + +
+
+ Clustering +
Figure 6: Sample clustering based on normalized data.
+
+
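As a rough illustration of the clustering step, the sketch below applies a Euclidean distance and a Ward linkage to a small matrix of already-transformed values. The data are hypothetical, and the choice of `method = "ward.D"` is an assumption: the exact call used by the package's `clusterPlot()` is not shown here.

```r
# hypothetical transformed values (features in rows, samples in columns)
x <- cbind(WT1 = c(1.0, 2.0, 3.0, 4.0, 5.0),
           WT2 = c(1.1, 2.1, 2.9, 4.2, 5.1),
           KO1 = c(5.0, 4.0, 3.0, 2.0, 1.0),
           KO2 = c(5.2, 3.9, 3.1, 2.1, 0.9))

# distances are computed between samples, hence the transposition
d <- dist(t(x), method = "euclidean")

# Ward linkage (assumption: "ward.D"; "ward.D2" is the other Ward variant in hclust)
hc <- hclust(d, method = "ward.D")

# cutting the tree in two should recover the two conditions
groups <- cutree(hc, k = 2)
groups
```

With such data, replicates of the same condition end up in the same branch, which is the behaviour expected from the dendrogram in the report.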
+ +Another way of visualizing the experiment variability is to look at the first principal components of the PCA, as shown in figure 7. In this figure, the first principal component (PC1) is expected to separate samples from the different biological conditions, meaning that the biological variability is the main source of variance in the data. + +
+
+ Principal component analysis +
Figure 7: First two components of a Principal Component Analysis, with percentages of variance associated with each axis.
+
+
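The percentages of variance reported on the axes can be reproduced in a few lines. This is a minimal sketch using `prcomp()` on hypothetical transformed values; whether the package's `PCAPlot()` relies on `prcomp()` or on another routine is an assumption.

```r
# hypothetical transformed values (features in rows, samples in columns)
x <- cbind(WT1 = c(1.0, 2.0, 3.0, 4.0, 5.0),
           WT2 = c(1.1, 2.1, 2.9, 4.2, 5.1),
           KO1 = c(5.0, 4.0, 3.0, 2.0, 1.0),
           KO2 = c(5.2, 3.9, 3.1, 2.1, 0.9))

# samples must be in rows for prcomp(); features are centered but not scaled
pca <- prcomp(t(x))

# percentage of variance carried by each principal component
pct_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)

# coordinates of the samples on the first axis
pc1 <- pca$x[, "PC1"]
round(pct_var, 1)
pc1
```

Here PC1 carries almost all of the variance and the two conditions fall on opposite sides of the axis, which is the pattern described above for a well-behaved experiment.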
+ +```{r , cache=TRUE, echo=FALSE, results="asis"} +if (!is.null(batch)){ + cat("For the statistical analysis, we need to take into account the effect of the ",batch," parameter. Statistical models and tests will thus be adjusted on it.\n") +} +``` + +-------------------------------------------------------------------------------------------------------------------------- + +## 4 Normalization + +Normalization aims at correcting systematic technical biases in the data, in order to make read counts comparable across samples. The normalization proposed by DESeq2 relies on the hypothesis that most features are not differentially expressed. It computes a scaling factor for each sample. Normalized read counts are obtained by dividing raw read counts by the scaling factor associated with the sample they belong to. Scaling factors around 1 mean (almost) no normalization is performed. Scaling factors lower than 1 will produce normalized counts higher than raw ones, and the other way around. Two options are available to compute scaling factors: locfunc="median" (default) or locfunc="shorth". Here, the normalization was performed with locfunc="`r locfunc`". + +```{r , cache=TRUE, echo=FALSE, results="asis"} +print(xtable(t(matrix(out.DESeq2$sf, dimnames=list(target$label,"Size factor"))),caption="Table 5: Normalization factors."), type="html", html.table.attributes = "align='center'") +``` + +The histograms (figure 8) can help to validate the choice of the normalization parameter ("median" or "shorth"). Under the hypothesis that most features are not differentially expressed, each size factor is expected to be close to the mode of the distribution of the counts divided by their geometric means across samples. + +
+
+ Diagnostic of size factors +
Figure 8: Diagnostic of the estimation of the size factors.
+
+
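The size factor computation can be sketched in base R as a median of ratios to per-feature geometric means, which corresponds to the `locfunc="median"` option. The counts below are hypothetical and the code is an illustration of the principle, not the DESeq2 implementation itself.

```r
# hypothetical raw counts: sample s2 is sequenced 2x deeper than s1, s3 3x deeper
counts <- matrix(c( 10,  20,  30,
                   100, 200, 300,
                     5,  10,  15,
                    50, 100, 150),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("f", 1:4), c("s1", "s2", "s3")))

# log of the geometric mean of each feature across samples
log_geo_means <- rowMeans(log(counts))

# size factor of a sample: median ratio of its counts to the geometric means
size_factors <- apply(counts, 2, function(sample_counts) {
  keep <- is.finite(log_geo_means) & sample_counts > 0
  exp(median(log(sample_counts[keep]) - log_geo_means[keep]))
})

# normalized counts: raw counts divided by the size factor of their sample
norm_counts <- sweep(counts, 2, size_factors, "/")
size_factors
norm_counts
```

In this toy case the samples differ only by sequencing depth, so the normalized counts are identical across samples. Size factors below 1 increase the counts and size factors above 1 decrease them.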
+ +Figure 9 shows that the DESeq2 scaling factors and the total count normalization factors may not behave similarly. + +
+
+ Size factors vs total counts +
Figure 9: Plot of the estimated size factors and the total number of reads per sample.
+
+
+ +Boxplots are often used as a qualitative measure of the quality of the normalization process, as they show how distributions are globally affected during this process. We expect normalization to stabilize distributions across samples. Figure 10 shows boxplots of raw (left) and normalized (right) data respectively. + +
+
+ Boxplots of raw and normalized counts +
Figure 10: Boxplots of raw (left) and normalized (right) read counts.
+
+
+ +-------------------------------------------------------------------------------------------------------------------------- + +## 5 Differential analysis + +### 5.1 Modelisation + +DESeq2 aims at fitting one generalized linear model per feature. For this project, the design used is counts `r paste(as.character(design(out.DESeq2$dds)),collapse=" ")` and the goal is to estimate the models' coefficients which can be interpreted as $\log_2(\texttt{FC})$. These coefficients will then be tested to get p-values and adjusted p-values. + +### 5.2 Outlier detection + +Model outliers are features for which at least one sample seems unrelated to the experimental or study design. For every feature and for every sample, the Cook's distance [Cook, 1977] reflects how the sample matches the model. A large value of the Cook's distance indicates an outlier count and p-values are not computed for the corresponding feature. `r ifelse(!is.null(cooksCutoff) && is.infinite(cooksCutoff),"For this project, the detection of model outliers has been turned off by setting the cut-off to infinity.","")` + +### 5.3 Dispersions estimation + +The DESeq2 model assumes that the count data follow a negative binomial distribution which is a robust alternative to the Poisson distribution when data are over-dispersed (the variance is higher than the mean). The first step of the statistical procedure is to estimate the dispersion of the data. Its purpose is to determine the shape of the mean-variance relationship. The default is to apply a GLM (Generalized Linear Model) based method (fitType="parametric"), which can handle complex designs but may not converge in some cases. The alternative is to use fitType="local" as described in the original paper [Anders, 2010]. The parameter used for this project is fitType="`r fitType`". Then, DESeq2 imposes a Cox Reid-adjusted profile likelihood maximization [Cox, 1987 and McCarthy, 2012] and uses the maximum _a posteriori_ (MAP) of the dispersion [Wu, 2013]. + +
+
+ Dispersions estimations +
Figure 11: Dispersion estimates (left) and diagnostic of log-normality (right).
+
+
+ +The left panel of figure 11 shows the result of the dispersion estimation step. The x- and y-axes represent the mean count value and the estimated dispersion respectively. Black dots represent empirical dispersion estimates for each feature (from the observed counts). The red dots show the mean-variance relationship function (fitted dispersion value) as estimated by the model. The blue dots are the final estimates from the maximum _a posteriori_ and are used to perform the statistical test. Blue circles (if any) point out dispersion outliers. These are features with a very high empirical variance (computed from observed counts). These high dispersion values fall far from the model estimation. For these features, the statistical test is based on the empirical variance in order to be more conservative than with the MAP dispersion. These features have a low chance of being declared significant. The right panel makes it possible to check the hypothesis of log-normality of the dispersions. + +### 5.4 Statistical test for differential expression + +Once the dispersion estimation and the model fitting have been done, DESeq2 can perform the statistical testing. Figure 12 shows the distributions of raw p-values computed by the statistical test for the comparison(s) done. This distribution is expected to be a mixture of a uniform distribution on $[0,1]$ and a peak around 0 corresponding to the differentially expressed features. + +
+
+ Histogram(s) of raw p-values +
Figure 12: Distribution(s) of raw p-values.
+
+
+ +### 5.5 Independent filtering + +DESeq2 can perform an independent filtering to increase the detection power of differentially expressed features at the same experiment-wide type I error. Since features with very low counts are unlikely to show significant differences, typically because of their high dispersion, it defines a threshold on the mean of the normalized counts irrespective of the biological condition. This procedure is independent because the information about the variables in the design formula is not used [Love, 2014]. + +```{r , cache=TRUE, echo=FALSE, results="asis"} +if (independentFiltering){ + cat("Table 6 reports the thresholds used for each comparison and the number of features discarded by the independent filtering. Adjusted p-values of discarded features are then set to NA.\n") + print(xtable(summaryResults$tabIndepFiltering,caption="Table 6: Number of features discarded by the independent filtering for each comparison."),type="html",include.rownames=FALSE, html.table.attributes = "align='center'") +} else{ + cat("For this project, no independent filtering has been performed.\\\\") +} +``` + +### 5.6 Final results + +A p-value adjustment is performed to take into account multiple testing and to control the false discovery rate at a chosen level $\alpha$. For this analysis, a `r pAdjustMethod` p-value adjustment was performed [Benjamini, 1995 and 2001] and the false discovery rate was controlled at the level `r alpha`. + +```{r , cache=TRUE, echo=FALSE, results="asis"} +print(xtable(summaryResults$nDiffTotal,caption=paste0(ifelse(independentFiltering,"Table 7: ","Table 6: "),"Number of up-, down- and total number of differentially expressed features for each comparison.")),type="html",include.rownames=FALSE, html.table.attributes = "align='center'") +``` + +Figure 13 represents the MA-plot of the data for the comparisons done, where differentially expressed features are highlighted in red. 
An MA-plot represents the log ratio of differential expression as a function of the mean intensity for each feature. + +
+
+ MA-plot(s) +
Figure 13: MA-plot(s) of each comparison. Red dots represent significantly differentially expressed features.
+
+
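
The p-value adjustment described in section 5.6 can be illustrated with the base R function `p.adjust()`. This is only a minimal sketch on made-up raw p-values: the names `rawp` and `alpha_example` are hypothetical and are not objects created by SARTools or DESeq2.

```r
# Illustrative raw p-values (made up for this sketch, not taken from any analysis)
rawp <- c(0.001, 0.008, 0.039, 0.041, 0.20, 0.74)
alpha_example <- 0.05  # hypothetical significance level

# Benjamini-Hochberg ("BH") and Benjamini-Yekutieli ("BY") adjustments (stats package)
padj_BH <- p.adjust(rawp, method = "BH")
padj_BY <- p.adjust(rawp, method = "BY")

# Features declared significant once the adjusted p-value is compared to the level
signif_BH <- padj_BH <= alpha_example
print(data.frame(rawp, padj_BH, padj_BY, signif_BH))
```

In this toy vector four raw p-values fall below 0.05, but only the two smallest remain significant after the BH adjustment. The BY adjustment is never less conservative than BH.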
+ +Full results as well as lists of differentially expressed features are provided in the following text files which can be easily read in a spreadsheet. For each comparison: +- TestVsRef.complete.txt contains results for all the features; +- TestVsRef.up.txt contains results for significantly up-regulated features. Features are ordered from the most significant adjusted p-value to the less significant one; +- TestVsRef.down.txt contains results for significantly down-regulated features. Features are ordered from the most significant adjusted p-value to the less significant one. + +These files contain the following columns: +- Id: unique feature identifier; +- sampleName: raw counts per sample; +- norm.sampleName: rounded normalized counts per sample; +- baseMean: base mean over all samples; +- `r paste(paste(levels(target[,varInt])[-nlevels(target[,varInt])],collapse=", "),levels(target[,varInt])[nlevels(target[,varInt])],sep=" and ")`: means (rounded) of normalized counts of the biological conditions; +- FoldChange: fold change of expression, calculated as $2^{\log_2(\text{FC})}$; +- log2FoldChange: $\log_2(\text{FC})$ as estimated by the GLM model. It reflects the differential expression between Test and Ref and can be interpreted as $\log_2(\frac{\text{Test}}{\text{Ref}})$. If this value is: + + around 0: the feature expression is similar in both conditions; + + positive: the feature is up-regulated ($\text{Test} > \text{Ref}$); + + negative: the feature is down-regulated ($\text{Test} < \text{Ref}$); +- pvalue: raw p-value from the statistical test; +- padj: adjusted p-value on which the cut-off $\alpha$ is applied; +- dispGeneEst: dispersion parameter estimated from feature counts (i.e. black dots on figure 11); +- dispFit: dispersion parameter estimated from the model (i.e. red dots on figure 11); +- dispMAP: dispersion parameter estimated from the Maximum _A Posteriori_ model; +- dispersion: final dispersion parameter used to perform the test (i.e. 
blue dots and circles on figure 11); +- betaConv: convergence of the coefficients of the model (TRUE or FALSE); +- maxCooks: maximum Cook's distance of the feature; +- outlier: indicates if the feature has been detected as a count outlier (it does not make sense if an infinite threshold is given for the Cook's distance). + +-------------------------------------------------------------------------------------------------------------------------- + +## 6 R session information and parameters + +The versions of the R software and Bioconductor packages used for this analysis are listed below. It is important to save them if one wants to re-perform the analysis in the same conditions. + +```{r , cache=TRUE, echo=FALSE, results="asis"} +print(sessionInfo()) +``` + +Parameter values used for this analysis are: + +- workDir: `r workDir` +- projectName: `r projectName` +- author: `r author` +- targetFile: `r targetFile` +- rawDir: `r rawDir` +- featuresToRemove: `r ifelse(is.null(featuresToRemove),"NULL",paste(featuresToRemove,collapse=", "))` +- varInt: `r varInt` +- condRef: `r condRef` +- batch: `r ifelse(is.null(batch),"NULL",batch)` +- fitType: `r fitType` +- cooksCutoff: `r ifelse(is.null(cooksCutoff),"NULL",cooksCutoff)` +- independentFiltering: `r independentFiltering` +- alpha: `r alpha` +- pAdjustMethod: `r pAdjustMethod` +- typeTrans: `r typeTrans` +- locfunc: `r locfunc` +- colors: `r colors` + +-------------------------------------------------------------------------------------------------------------------------- + +## 7 Bibliography + +- R Core Team, R: A Language and Environment for Statistical Computing, _R Foundation for Statistical Computing_, 2014 +- Gentleman, Carey, Bates et al, Bioconductor: Open software development for computational biology and bioinformatics, _Genome Biology_, 2004 +- Anders and Huber, Differential expression analysis for sequence count data, _Genome Biology_, 2010 +- Love, Huber and Anders, Moderated estimation of fold change and 
dispersion for RNA-Seq data with DESeq2, _bioRxiv_, 2014 +- Dillies, Rau, Aubert et al, A comprehensive evaluation of normalization methods for Illumina RNA-seq data analysis, _Briefings in Bioinformatics_, 2012 +- Schulze, Kanwar, Golzenleuchter et al, SERE: Single-parameter quality control and sample comparison for RNA-Seq, _BMC Genomics_, 2012 +- Cook, Detection of Influential Observation in Linear Regression, _Technometrics_, 1977 +- Cox and Reid, Parameter orthogonality and approximate conditional inference, _Journal of the Royal Statistical Society_, 1987 +- McCarthy, Chen and Smyth, Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation, _Nucleic Acids Research_, 2012 +- Wu, Wang and Wu, A new shrinkage estimator for dispersion improves differential expression detection in RNA-seq data, _Biostatistics_, 2013 +- Benjamini and Hochberg, Controlling the False Discovery Rate : A Practical and Powerful Approach to Multiple Testing, _Journal of the Royal Statistical Society_, 1995 +- Benjamini and Yekutieli, The control of the false discovery rate in multiple testing under dependency, _The Annals of Statistics_, 2001 diff --git a/inst/report_edgeR.rmd b/inst/report_edgeR.rmd new file mode 100644 index 0000000..fd5a22b --- /dev/null +++ b/inst/report_edgeR.rmd @@ -0,0 +1,286 @@ +#
Statistical report of project `r projectName`:
+#
pairwise comparison(s) of conditions
+#
with edgeR
+ +-------------------------------------------------------------------------------------------------------------------------- + +Author: `r author` + +Date: `r Sys.Date()` + +The SARTools R package which generated this report has been developed at PF2 - Institut Pasteur by M.-A. Dillies and H. Varet (hugo.varet@pasteur.fr). Please cite H. Varet, J.-Y. Coppee and M.-A. Dillies, _SARTools: a DESeq2- and edgeR-based R pipeline for comprehensive differential analysis of RNA-seq data_, 2014 (submitted) when using this tool for any published analysis. + +-------------------------------------------------------------------------------------------------------------------------- + +## Table of contents + +1. Introduction +2. Description of raw data +3. Filtering low counts +4. Variability within the experiment: data exploration +5. Normalization +6. Differential analysis +7. R session information and parameters +8. Bibliography + +-------------------------------------------------------------------------------------------------------------------------- + +## 1 Introduction + +The analyses reported in this document are part of the `r projectName` project. The aim is to find features that are differentially expressed between `r paste(paste(levels(target[,varInt])[-nlevels(target[,varInt])],collapse=", "),levels(target[,varInt])[nlevels(target[,varInt])],sep=" and ")`. The statistical analysis process includes data normalization, graphical exploration of raw and normalized data, test for differential expression for each feature between the conditions, raw p-value adjustment and export of lists of features having a significant differential expression between the conditions. 
`r ifelse(!is.null(batch),paste0("In this analysis, the ",batch, " effect will be taken into account in the statistical models."),"")` + +The analysis is performed using the R software [R Core Team, 2014], Bioconductor [Gentleman, 2004] packages including edgeR [Robinson, 2010] and the SARTools package developed at PF2 - Institut Pasteur. Normalization and differential analysis are carried out according to the edgeR model and package. This report comes with additional tab-delimited text files that contain lists of differentially expressed features. + +For more details about the edgeR methodology, please refer to its related publications [Robinson, 2007, 2008, 2010 and McCarthy, 2012]. + +-------------------------------------------------------------------------------------------------------------------------- + +## 2 Description of raw data + +The count data files and associated biological conditions are listed in the following table. + +```{r , cache=TRUE, echo=FALSE, results="asis"} +print(xtable(target,caption="Table 1: Data files and associated biological conditions."), type="html", include.rownames=FALSE, html.table.attributes = "align='center'") +``` + +After loading the data we first have a look at the raw data table itself. The data table contains one row per annotated feature and one column per sequenced sample. Row names of this table are feature IDs (unique identifiers). The table contains raw count values representing the number of reads that map onto the features. For this project, there are `r nrow(counts)` features in the count data table. + +```{r , cache=TRUE, echo=FALSE, results="asis"} +print(xtable(head(counts),caption="Table 2: Partial view of the count data table."), type="html", html.table.attributes = "align='center'") +``` + +Looking at the summary of the count table provides a basic description of these raw counts (min and max values, median, etc). 
+ +```{r , cache=TRUE, echo=FALSE, results="asis"} +fun_summary=function(x){ + out=c(quantile(x,c(0,0.25,0.5),type=1),mean(x),quantile(x,c(0.75,1),type=1)) + names(out)=c("Min.","1st Qu.","Median","Mean","3rd Qu.","Max.") + return(round(out,0)) +} +print(xtable(apply(counts,2,fun_summary),caption="Table 3: Summary of the raw counts.",digits=0), type="html", html.table.attributes = "align='center'") +nbNull <- nrow(counts) - nrow(removeNull(counts)) +percentNull <- nbNull/nrow(counts) +``` + +Figure 1 shows the total number of mapped reads for each sample. Reads that map to multiple locations on the transcriptome are counted more than once, as long as they map to fewer than 50 different loci. We expect total read counts to be similar within conditions; they may differ across conditions. Total counts sometimes vary widely between replicates. This may happen for several reasons, including: +- different rRNA contamination levels between samples (even between biological replicates); +- slight differences between library concentrations, since they may be difficult to measure with high precision. + +
+
+ Barplot total counts +
Figure 1: Number of mapped reads per sample. Colors refer to the biological condition of the sample.
+
+
+ +Figure 2 shows the proportion of features with no read count in each sample. We expect this proportion to be similar within conditions. Features with null read counts in the `r ncol(counts)` samples will not be taken into account for the analysis with edgeR. Here, `r nbNull` features (`r round(100*percentNull,2)`%) are in this situation (dashed line). + +
+
+ Barplot null counts +
Figure 2: Proportion of features with null read counts in each sample.
+
+
+ +Figure 3 shows the distribution of read counts for each sample. For the sake of readability, $\text{log}_2(\text{counts}+1)$ are used instead of raw counts. Again we expect replicates to have similar distributions. In addition, this figure shows whether read counts are predominantly low, medium or high. This depends on the organisms as well as the biological conditions under consideration. + +
+
+ Estimated densities of raw counts +
Figure 3: Density distribution of read counts.
+
+
+ +It may happen that one or a few features capture a high proportion of reads (up to 20% or more). This phenomenon should not influence the normalization process. The edgeR normalization has proved to be robust to this situation [Dillies, 2012]. In any case, we expect these high count features to be the same across replicates. They are not necessarily the same across conditions. Figure 4 and table 4 illustrate the possible presence of such high count features in the data set. + +
+
+ Most represented sequences +
Figure 4: Percentage of reads associated with the sequence having the highest count (provided in each box on the graph) for each sample.
+
+
+ +```{r , cache=TRUE, echo=FALSE, results="asis"} +print(xtable(majSequences,caption="Table 4: Percentage of reads associated with the sequences having the highest counts."), type="html", html.table.attributes = "align='center'") +``` + +We may wish to assess the similarity between samples across conditions. A pairwise scatter plot is produced (figure 5) to show how replicates and samples from different biological conditions are similar or different ($\text{log}_2(\text{counts}+1)$ are used instead of raw count values). Moreover, as the Pearson correlation has been shown not to be relevant to measure the similarity between replicates, the SERE statistic has been proposed as a similarity index between RNA-Seq samples [Schulze, 2012]. It measures whether the variability between samples is random Poisson variability or higher. Pairwise SERE values are printed in the lower triangle of the pairwise scatter plot. The value of the SERE statistic is: +- 0 when samples are identical (no variability at all: this may happen in the case of a sample duplication); +- 1 for technical replicates (technical variability follows a Poisson distribution); +- greater than 1 for biological replicates and samples from different biological conditions (biological variability is higher than technical one, data are over-dispersed with respect to Poisson). The higher the SERE value, the lower the similarity. It is expected to be lower between biological replicates than between samples of different biological conditions. Hence, the SERE statistic can be used to detect inversions between samples. + +
+
+ Pairwise scatter plot +
Figure 5: Pairwise comparison of samples.
+
+
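
The behaviour of the SERE statistic described above (0 for duplicated samples, about 1 for Poisson technical replicates) can be reproduced with a simplified sketch. The function `sere_sketch()` below is a hypothetical helper written for illustration only: it takes the square root of a Pearson chi-square statistic divided by its degrees of freedom, and it is not claimed to be identical to the `SERE()` function shipped with SARTools. All counts are simulated.

```r
# Simplified SERE-like statistic (hypothetical helper, for illustration only):
# square root of the Pearson chi-square statistic over its degrees of freedom,
# under the hypothesis that all samples share the same feature proportions.
sere_sketch <- function(observed) {
  observed <- as.matrix(observed)
  observed <- observed[rowSums(observed) > 0, , drop = FALSE]  # drop all-zero features
  libSizes <- colSums(observed)
  # expected counts if every sample had the same feature proportions
  expected <- outer(rowSums(observed) / sum(libSizes), libSizes)
  df <- nrow(observed) * (ncol(observed) - 1)
  sqrt(sum((observed - expected)^2 / expected) / df)
}

set.seed(1)
lambda <- rexp(2000, rate = 1/50)        # made-up mean counts per feature
s1 <- rpois(2000, lambda)
dup  <- cbind(s1, s1)                    # duplicated sample
tech <- cbind(s1, rpois(2000, lambda))   # simulated Poisson "technical replicate"

sere_dup  <- sere_sketch(dup)    # essentially 0
sere_tech <- sere_sketch(tech)   # close to 1
```

Adding extra variability between the two columns (for instance negative binomial instead of Poisson noise) would be expected to push the value above 1, in line with the interpretation given for biological replicates.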
+ + +-------------------------------------------------------------------------------------------------------------------------- + +## 3 Filtering low counts + +edgeR recommends filtering out features with null or low counts because they do not supply much information. For this project, `r nrow(counts) - nrow(out.edgeR$dge$counts)` features (`r round(100*(nrow(counts)-nrow(out.edgeR$dge$counts))/nrow(counts),2)`%) have been removed from the analysis because they did not satisfy the following condition: having at least `r cpmCutoff` counts-per-million in at least `r min(table(target[,varInt]))` samples. + +-------------------------------------------------------------------------------------------------------------------------- + +## 4 Variability within the experiment: data exploration + +The main variability within the experiment is expected to come from biological differences between the samples. This can be checked in two ways. The first one is to perform a hierarchical clustering of the whole sample set. This is performed after a transformation of the count data as moderated log-counts-per-million. Figure 6 shows the dendrogram obtained from CPM data. A Euclidean distance is computed between samples, and the dendrogram is built upon the Ward criterion. We expect this dendrogram to group replicates and separate biological conditions. + +
+
+ Clustering +
Figure 6: Sample clustering based on normalized data.
+
+
+ +Another way of visualizing the experiment variability is to look at the first two dimensions of a multidimensional scaling plot, as shown on figure 7. On this figure, the first dimension is expected to separate samples from the different biological conditions, meaning that the biological variability is the main source of variance in the data. + +
+
+ Multidimensional scaling plot +
Figure 7: Multidimensional scaling plot of the samples.
+
+
+ +```{r , cache=TRUE, echo=FALSE, results="asis"} +if (!is.null(batch)){ + cat("For the statistical analysis, we need to take into account the effect of the ",batch," parameter. Statistical models and tests will thus be adjusted on it.\n") +} +``` + +-------------------------------------------------------------------------------------------------------------------------- + +## 5 Normalization + +Normalization aims at correcting systematic technical biases in the data, in order to make read counts comparable across samples. The normalization proposed by edgeR is called Trimmed Mean of M-values (TMM) and relies on the hypothesis that most features are not differentially expressed. + +It computes a factor for each sample. These normalization factors apply to the total number of counts and cannot be used to normalize read counts in a direct manner. Indeed, normalization factors are used to normalize total counts. These in turn are used to normalize read counts according to a total count normalization: if $N_j$ is the total number of reads of the sample $j$ and $f_j$ its normalization factor, $N'_j=f_j \times N_j$ is the normalized total number of reads. Then, let $s_j=N'_j/\bar{N'}$ with $\bar{N'}$ the mean of the $N'_j$ s. Finally, the normalized counts of the sample $j$ are defined as $x'_{ij}=x_{ij}/s_j$ where $i$ is the gene index. + +```{r , cache=TRUE, echo=FALSE, results="asis"} +print(xtable(t(matrix(out.edgeR$dge$samples$norm.factors, dimnames=list(target$label,"TMM"))),caption="Table 5: Normalization factors."), type="html", html.table.attributes = "align='center'") +``` + +Boxplots are often used to assess the quality of the normalization process, as they show how distributions are globally affected during this process. We expect normalization to stabilize distributions across samples. Figure 8 shows boxplots of raw (left) and normalized (right) data respectively. + +
+
+ Boxplots of raw and normalized counts +
Figure 8: Boxplots of raw (left) and normalized (right) read counts.
+
+
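
The total count normalization described in section 5 can be followed step by step on a toy example. This is a minimal sketch under stated assumptions: the count matrix `x` and the normalization factors `f` are made up (in a real analysis the factors come from edgeR), and the variable names simply mirror the symbols $N_j$, $f_j$, $N'_j$, $s_j$ and $x'_{ij}$ used in the text.

```r
# Made-up count matrix: 4 features (rows) x 3 samples (columns)
x <- matrix(c(10, 20, 30, 40,
              20, 40, 60, 80,
              15, 30, 45, 60),
            nrow = 4, dimnames = list(paste0("gene", 1:4), c("S1", "S2", "S3")))
f <- c(S1 = 1.0, S2 = 0.9, S3 = 1.1)  # hypothetical normalization factors f_j

N      <- colSums(x)              # N_j: total number of reads of sample j
Nprime <- f * N                   # N'_j = f_j * N_j
s      <- Nprime / mean(Nprime)   # s_j = N'_j / mean of the N'_j
xnorm  <- sweep(x, 2, s, "/")     # x'_ij = x_ij / s_j (column-wise division)

print(round(s, 3))
print(round(xnorm, 1))
```

By construction the scaling factors `s` average to 1, so normalized counts stay on the same overall scale as the raw counts.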
+ +-------------------------------------------------------------------------------------------------------------------------- + +## 6 Differential analysis + +### 6.1 Modelization + +edgeR aims at fitting one linear model per feature. For this project, the design used is `r paste(as.character(paste("~", ifelse(!is.null(batch), paste(batch,"+"), ""), varInt)),collapse=" ")` and the goal is to estimate the models' coefficients which can be interpreted as $\log_2(\texttt{FC})$. These coefficients will then be tested to get p-values and adjusted p-values. + +### 6.2 Dispersions estimation + +The edgeR model assumes that the count data follow a negative binomial distribution which is a robust alternative to the Poisson law when data are over-dispersed (the variance is higher than the mean). The first step of the statistical procedure is to estimate the dispersion of the data. + +
+
+ Dispersions estimations +
Figure 9: Dispersion estimates.
+
+
+ +Figure 9 shows the result of the dispersion estimation step. The x- and y-axes represent the mean count value and the estimated dispersion respectively. Black dots represent empirical dispersion estimates for each feature (from the observed count values). The blue curve shows the relationship between the means of the counts and the dispersions modeled with splines. The red segment represents the common dispersion. + +### 6.3 Statistical test for differential expression + +Once the dispersion estimation and the model fitting have been done, edgeR can perform the statistical testing. Figure 10 shows the distributions of raw p-values computed by the statistical test for the comparison(s) done. This distribution is expected to be a mixture of a uniform distribution on $[0,1]$ and a peak around 0 corresponding to the differentially expressed features. + +
+
+ Histogram(s) of raw p-values +
Figure 10: Distribution(s) of raw p-values.
+
+
+ +### 6.4 Final results + +A p-value adjustment is performed to take into account multiple testing and to control the false discovery rate at a chosen level $\alpha$. For this analysis, a `r pAdjustMethod` p-value adjustment was performed [Benjamini, 1995 and 2001] and the false discovery rate was controlled at the level `r alpha`. + +```{r , cache=TRUE, echo=FALSE, results="asis"} +print(xtable(summaryResults$nDiffTotal,caption="Table 6: Number of up-, down- and total number of differentially expressed features for each comparison."),type="html",include.rownames=FALSE, html.table.attributes = "align='center'") +``` + +Figure 11 represents the MA-plot of the data for the comparisons done, where differentially expressed features are highlighted in red. An MA-plot represents the log ratio of differential expression as a function of the mean intensity for each feature. + +
+
+ MA-plot(s) +
Figure 11: MA-plot(s) of each comparison. Red dots represent significantly differentially expressed features.
+
+
+ +Full results as well as lists of differentially expressed features are provided in the following text files which can be easily read in a spreadsheet. For each comparison: +- TestVsRef.complete.txt contains results for all the features; +- TestVsRef.up.txt contains results for up-regulated features. Features are ordered from the most significant adjusted p-value to the less significant one; +- TestVsRef.down.txt contains results for down-regulated features. Features are ordered from the most significant adjusted p-value to the less significant one. + +These files contain the following columns: +- Id: unique feature identifier; +- sampleName: raw counts per sample; +- norm.sampleName: rounded normalized counts per sample; +- baseMean: base mean over all samples; +- `r paste(paste(levels(target[,varInt])[-nlevels(target[,varInt])],collapse=", "),levels(target[,varInt])[nlevels(target[,varInt])],sep=" and ")`: means (rounded) of normalized counts of the biological conditions; +- FoldChange: fold change of expression, calculated as $2^{\log_2(\text{FC})}$; +- log2FoldChange: $\log_2(\text{FC})$ as estimated by the GLM model. It reflects the differential expression between Test and Ref and can be interpreted as $\log_2(\frac{\text{Test}}{\text{Ref}})$. If this value is: + + around 0: the feature expression is similar in both conditions; + + positive: the feature is up-regulated ($\text{Test} > \text{Ref}$); + + negative: the feature is down-regulated ($\text{Test} < \text{Ref}$); +- pvalue: raw p-value from the statistical test; +- padj: adjusted p-value on which the cut-off $\alpha$ is applied; +- tagwise.dispersion: dispersion parameter estimated from feature counts (i.e. black dots on figure 9); +- trended.dispersion: dispersion parameter estimated with splines (i.e. blue curve on figure 9). 
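
The relationship between the columns listed above can be sketched on a small made-up table. The data frame `res` below only mimics a few columns of a `TestVsRef.complete.txt` file; its values, and the name `alpha_example`, are hypothetical and are not produced by SARTools.

```r
# Made-up rows mimicking a few columns of a complete results file
res <- data.frame(Id = c("geneA", "geneB", "geneC", "geneD"),
                  log2FoldChange = c(2, -1, 0.1, 3),
                  padj = c(0.001, 0.03, 0.80, 0.20),
                  stringsAsFactors = FALSE)
alpha_example <- 0.05  # hypothetical significance level

# FoldChange is obtained by back-transforming the log2 fold change
res$FoldChange <- 2^res$log2FoldChange

# Up- and down-regulated features: significant adjusted p-value and sign of the log2 fold change
isSignif <- res$padj <= alpha_example
up   <- res[isSignif & res$log2FoldChange > 0, ]
down <- res[isSignif & res$log2FoldChange < 0, ]

# Order from the most to the least significant adjusted p-value
up   <- up[order(up$padj), ]
down <- down[order(down$padj), ]
```

Here `geneD` has a large fold change but is not reported as up-regulated because its adjusted p-value exceeds the chosen level.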
+ +-------------------------------------------------------------------------------------------------------------------------- + +## 7 R session information and parameters + +The versions of the R software and Bioconductor packages used for this analysis are listed below. It is important to save them if one wants to re-perform the analysis in the same conditions. + +```{r , cache=TRUE, echo=FALSE, results="asis"} +print(sessionInfo()) +``` + +Parameter values used for this analysis are: + +- workDir: `r workDir` +- projectName: `r projectName` +- author: `r author` +- targetFile: `r targetFile` +- rawDir: `r rawDir` +- featuresToRemove: `r ifelse(is.null(featuresToRemove),"NULL",paste(featuresToRemove,collapse=", "))` +- varInt: `r varInt` +- condRef: `r condRef` +- batch: `r ifelse(is.null(batch),"NULL",batch)` +- alpha: `r alpha` +- pAdjustMethod: `r pAdjustMethod` +- cpmCutoff: `r cpmCutoff` +- gene.selection: `r gene.selection` +- colors: `r colors` + +-------------------------------------------------------------------------------------------------------------------------- + +## 8 Bibliography + +- R Core Team, R: A Language and Environment for Statistical Computing, _R Foundation for Statistical Computing_, 2014 +- Gentleman, Carey, Bates et al, Bioconductor: Open software development for computational biology and bioinformatics, _Genome Biology_, 2004 +- Robinson and Smyth, Moderated statistical tests for assessing differences in tag abundance, _Bioinformatics_, 2007 +- Robinson and Smyth, Small-sample estimation of negative binomial dispersion, with applications to SAGE data, _Biostatistics_, 2008 +- Robinson, McCarthy and Smyth, edgeR: a Bioconductor package for differential expression analysis of digital gene expression data, _Bioinformatics_, 2010 +- Dillies, Rau, Aubert et al, A comprehensive evaluation of normalization methods for Illumina RNA-seq data analysis, _Briefings in Bioinformatics_, 2012 +- Schulze, Kanwar, Golzenleuchter et al, SERE: 
Single-parameter quality control and sample comparison for RNA-Seq, _BMC Genomics_, 2012 +- McCarthy, Chen and Smyth, Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation, _Nucleic Acids Research_, 2012 +- Wu, Wang and Wu, A new shrinkage estimator for dispersion improves differential expression detection in RNA-seq data, _Biostatistics_, 2013 +- Benjamini and Hochberg, Controlling the False Discovery Rate : A Practical and Powerful Approach to Multiple Testing, _Journal of the Royal Statistical Society_, 1995 +- Benjamini and Yekutieli, The control of the false discovery rate in multiple testing under dependency, _The Annals of Statistics_, 2001 diff --git a/inst/target.txt b/inst/target.txt new file mode 100644 index 0000000..b20ca7c --- /dev/null +++ b/inst/target.txt @@ -0,0 +1 @@ +label files group WT1 WT1.htseq.out WT WT2 WT2.htseq.out WT KO1 KO1.htseq.out KO KO2 KO2.htseq.out KO \ No newline at end of file diff --git a/inst/template_script_DESeq2.r b/inst/template_script_DESeq2.r new file mode 100644 index 0000000..ef31d26 --- /dev/null +++ b/inst/template_script_DESeq2.r @@ -0,0 +1,85 @@ +################################################################################ +### R script to compare several conditions with the SARTools and DESeq2 packages +### Hugo Varet +### December 10th, 2014 +### designed to be executed with SARTools 1.0.0 +################################################################################ + +################################################################################ +### parameters: to be modified by the user ### +################################################################################ +rm(list=ls()) # remove all the objects from the R session + +workDir <- "C:/path/to/your/working/directory/" # working directory for the R session + +projectName <- "projectName" # name of the project +author <- "Your name" # author of the statistical analysis/report + +targetFile 
<- "target.txt" # path to the design/target file +rawDir <- "raw" # path to the directory containing raw counts files +featuresToRemove <- c("alignment_not_unique", # names of the features to be removed + "ambiguous", "no_feature", # (specific HTSeq-count information and rRNA for example) + "not_aligned", "too_low_aQual") + +varInt <- "group" # factor of interest +condRef <- "WT" # reference biological condition +batch <- NULL # blocking factor: NULL (default) or "batch" for example + +fitType <- "parametric" # mean-variance relationship: "parametric" (default) or "local" +cooksCutoff <- NULL # outliers detection threshold (NULL to let DESeq2 choosing it) +independentFiltering <- TRUE # TRUE/FALSE to perform independent filtering (default is TRUE) +alpha <- 0.05 # threshold of statistical significance +pAdjustMethod <- "BH" # p-value adjustment method: "BH" (default) or "BY" + +typeTrans <- "VST" # transformation for PCA/clustering: "VST" or "rlog" +locfunc <- "median" # "median" (default) or "shorth" to estimate the size factors + +colors <- c("dodgerblue","firebrick1", # vector of colors of each biological condition on the plots + "MediumVioletRed","SpringGreen") + +################################################################################ +### running script ### +################################################################################ +setwd(workDir) +library(SARTools) +if (locfunc=="shorth") library(genefilter) + +# checking parameters +checkParameters.DESeq2(projectName=projectName,author=author,targetFile=targetFile, + rawDir=rawDir,featuresToRemove=featuresToRemove,varInt=varInt, + condRef=condRef,batch=batch,fitType=fitType,cooksCutoff=cooksCutoff, + independentFiltering=independentFiltering,alpha=alpha,pAdjustMethod=pAdjustMethod, + typeTrans=typeTrans,locfunc=locfunc,colors=colors) + +# loading target file +target <- loadTargetFile(targetFile=targetFile, varInt=varInt, condRef=condRef, batch=batch) + +# loading counts +counts <- 
loadCountData(target=target, rawDir=rawDir, featuresToRemove=featuresToRemove) + +# description plots +majSequences <- descriptionPlots(counts=counts, group=target[,varInt], col=colors) + +# analysis with DESeq2 +out.DESeq2 <- run.DESeq2(counts=counts, target=target, varInt=varInt, batch=batch, + locfunc=locfunc, fitType=fitType, pAdjustMethod=pAdjustMethod, + cooksCutoff=cooksCutoff, independentFiltering=independentFiltering, alpha=alpha) + +# PCA + clustering +exploreCounts(object=out.DESeq2$dds, group=target[,varInt], typeTrans=typeTrans, col=colors) + +# summary of the analysis (boxplots, dispersions, diag size factors, export table, nDiffTotal, histograms, MA plot) +summaryResults <- summarizeResults.DESeq2(out.DESeq2, group=target[,varInt], col=colors, + independentFiltering=independentFiltering, + cooksCutoff=cooksCutoff, alpha=alpha) + +# save image of the R session +save.image(file=paste0(projectName, ".RData")) + +# generating HTML report +writeReport.DESeq2(target=target, counts=counts, out.DESeq2=out.DESeq2, summaryResults=summaryResults, + majSequences=majSequences, workDir=workDir, projectName=projectName, author=author, + targetFile=targetFile, rawDir=rawDir, featuresToRemove=featuresToRemove, varInt=varInt, + condRef=condRef, batch=batch, fitType=fitType, cooksCutoff=cooksCutoff, + independentFiltering=independentFiltering, alpha=alpha, pAdjustMethod=pAdjustMethod, + typeTrans=typeTrans, locfunc=locfunc, colors=colors) diff --git a/inst/template_script_edgeR.r b/inst/template_script_edgeR.r new file mode 100644 index 0000000..ee6e634 --- /dev/null +++ b/inst/template_script_edgeR.r @@ -0,0 +1,76 @@ +################################################################################ +### R script to compare several conditions with the SARTools and edgeR packages +### Hugo Varet +### December 10th, 2014 +### designed to be executed with SARTools 1.0.0 +################################################################################ + 
+################################################################################ +### parameters: to be modified by the user ### +################################################################################ +rm(list=ls()) # remove all the objects from the R session + +workDir <- "C:/path/to/your/working/directory/" # working directory for the R session + +projectName <- "projectName" # name of the project +author <- "Your name" # author of the statistical analysis/report + +targetFile <- "target.txt" # path to the design/target file +rawDir <- "raw" # path to the directory containing raw counts files +featuresToRemove <- c("alignment_not_unique", # names of the features to be removed + "ambiguous", "no_feature", # (specific HTSeq-count information and rRNA for example) + "not_aligned", "too_low_aQual") + +varInt <- "group" # factor of interest +condRef <- "WT" # reference biological condition +batch <- NULL # blocking factor: NULL (default) or "batch" for example + +alpha <- 0.05 # threshold of statistical significance +pAdjustMethod <- "BH" # p-value adjustment method: "BH" (default) or "BY" + +cpmCutoff <- 1 # counts-per-million cut-off to filter low counts +gene.selection <- "pairwise" # selection of the features in MDSPlot + +colors <- c("dodgerblue","firebrick1", # vector of colors of each biological condition on the plots + "MediumVioletRed","SpringGreen") + +################################################################################ +### running script ### +################################################################################ +setwd(workDir) +library(SARTools) + +# checking parameters +checkParameters.edgeR(projectName=projectName,author=author,targetFile=targetFile, + rawDir=rawDir,featuresToRemove=featuresToRemove,varInt=varInt, + condRef=condRef,batch=batch,alpha=alpha,pAdjustMethod=pAdjustMethod, + cpmCutoff=cpmCutoff,gene.selection=gene.selection,colors=colors) + +# loading target file +target <- 
loadTargetFile(targetFile=targetFile, varInt=varInt, condRef=condRef, batch=batch) + +# loading counts +counts <- loadCountData(target=target, rawDir=rawDir, featuresToRemove=featuresToRemove) + +# description plots +majSequences <- descriptionPlots(counts=counts, group=target[,varInt], col=colors) + +# edgeR analysis +out.edgeR <- run.edgeR(counts=counts, target=target, varInt=varInt, condRef=condRef, + batch=batch, cpmCutoff=cpmCutoff, pAdjustMethod=pAdjustMethod) + +# MDS + clustering +exploreCounts(object=out.edgeR$dge, group=target[,varInt], gene.selection=gene.selection, col=colors) + +# summary of the analysis (boxplots, dispersions, export table, nDiffTotal, histograms, MA plot) +summaryResults <- summarizeResults.edgeR(out.edgeR, group=target[,varInt], counts=counts, alpha=alpha, col=colors) + +# save image of the R session +save.image(file=paste0(projectName, ".RData")) + +# generating HTML report +writeReport.edgeR(target=target, counts=counts, out.edgeR=out.edgeR, summaryResults=summaryResults, + majSequences=majSequences, workDir=workDir, projectName=projectName, author=author, + targetFile=targetFile, rawDir=rawDir, featuresToRemove=featuresToRemove, varInt=varInt, + condRef=condRef, batch=batch, alpha=alpha, pAdjustMethod=pAdjustMethod, colors=colors, + gene.selection=gene.selection) diff --git a/man/BCVPlot.Rd b/man/BCVPlot.Rd new file mode 100644 index 0000000..6b98726 --- /dev/null +++ b/man/BCVPlot.Rd @@ -0,0 +1,20 @@ +% Generated by roxygen2 (4.0.2): do not edit by hand +\name{BCVPlot} +\alias{BCVPlot} +\title{BCV plot (for edgeR dispersions)} +\usage{ +BCVPlot(dge) +} +\arguments{ +\item{dge}{a \code{DGEList} object} +} +\value{ +A file named BCV.png in the figures directory with a BCV plot produced by the \code{plotBCV()} function of the edgeR package +} +\description{ +Biological Coefficient of Variation plot (for edgeR objects) +} +\author{ +Marie-Agnes Dillies and Hugo Varet +} + diff --git a/man/MAPlot.Rd b/man/MAPlot.Rd new file mode 
100644 index 0000000..71e8c88 --- /dev/null +++ b/man/MAPlot.Rd @@ -0,0 +1,22 @@ +% Generated by roxygen2 (4.0.2): do not edit by hand +\name{MAPlot} +\alias{MAPlot} +\title{MA-plots} +\usage{ +MAPlot(complete, alpha = 0.05) +} +\arguments{ +\item{complete}{A \code{list} of \code{data.frame} containing features results (from \code{exportResults.DESeq2()} or \code{exportResults.edgeR()})} + +\item{alpha}{cut-off to apply on each adjusted p-value} +} +\value{ +A file named MAPlot.png in the figures directory containing one MA-plot per comparison +} +\description{ +MA-plot for each comparison: log2(FC) vs mean of normalized counts with one dot per feature (red dot for a differentially expressed feature, black dot otherwise) +} +\author{ +Marie-Agnes Dillies and Hugo Varet +} + diff --git a/man/MDSPlot.Rd b/man/MDSPlot.Rd new file mode 100644 index 0000000..69debe8 --- /dev/null +++ b/man/MDSPlot.Rd @@ -0,0 +1,29 @@ +% Generated by roxygen2 (4.0.2): do not edit by hand +\name{MDSPlot} +\alias{MDSPlot} +\title{MDS plot (for edgeR objects)} +\usage{ +MDSPlot(dge, group, n = 500, gene.selection = c("pairwise", "common"), + col = c("lightblue", "orange", "MediumVioletRed", "SpringGreen")) +} +\arguments{ +\item{dge}{a \code{DGEList} object} + +\item{group}{vector of the condition from which each sample belongs} + +\item{n}{number of features to keep among the most variant} + +\item{gene.selection}{\code{"pairwise"} to choose the top features separately for each pairwise comparison between the samples or \code{"common"} to select the same features for all comparisons. 
Only used when \code{method="logFC"}} + +\item{col}{colors to use (one per biological condition)} +} +\value{ +A file named MDS.png in the figures directory +} +\description{ +Multi-Dimensional Scaling plot of samples based on the 500 most variant features (for edgeR analyses) +} +\author{ +Marie-Agnes Dillies and Hugo Varet +} + diff --git a/man/PCAPlot.Rd b/man/PCAPlot.Rd new file mode 100644 index 0000000..2ccfd5f --- /dev/null +++ b/man/PCAPlot.Rd @@ -0,0 +1,27 @@ +% Generated by roxygen2 (4.0.2): do not edit by hand +\name{PCAPlot} +\alias{PCAPlot} +\title{PCA of samples (if use of DESeq2)} +\usage{ +PCAPlot(counts.trans, group, n = 500, col = c("lightblue", "orange", + "MediumVioletRed", "SpringGreen")) +} +\arguments{ +\item{counts.trans}{a matrix of transformed counts (VST- or rlog-counts)} + +\item{group}{factor vector of the condition from which each sample belongs} + +\item{n}{number of features to keep among the most variant} + +\item{col}{colors to use (one per biological condition)} +} +\value{ +A file named PCA.png in the figures directory with a pairwise plot of the first three principal components +} +\description{ +Principal Component Analysis of samples based on the 500 most variant features on VST- or rlog-counts (if use of DESeq2) +} +\author{ +Marie-Agnes Dillies and Hugo Varet +} + diff --git a/man/SARTools-package.Rd b/man/SARTools-package.Rd new file mode 100644 index 0000000..e47feb2 --- /dev/null +++ b/man/SARTools-package.Rd @@ -0,0 +1,12 @@ +% Generated by roxygen2 (4.0.2): do not edit by hand +\docType{package} +\name{SARTools-package} +\alias{SARTools-package} +\title{Statistical Analysis of RNA-Seq Tools} +\description{ +SARTools provides R tools and an environment for the statistical analysis of RNA-Seq projects: load and clean data, produce figures, perform statistical analysis/testing with DESeq2 or edgeR, export results and create final report +} +\author{ +Marie-Agnes Dillies and Hugo Varet +} + diff --git a/man/SERE.Rd 
b/man/SERE.Rd new file mode 100644 index 0000000..f42323b --- /dev/null +++ b/man/SERE.Rd @@ -0,0 +1,23 @@ +% Generated by roxygen2 (4.0.2): do not edit by hand +\name{SERE} +\alias{SERE} +\title{Pairwise SERE for two samples} +\usage{ +SERE(observed) +} +\arguments{ +\item{observed}{\code{matrix} with two columns containing observed counts of two samples} +} +\value{ +The SERE coefficient for the two samples +} +\description{ +Compute the SERE coefficient for two samples +} +\author{ +See paper published +} +\references{ +Schulze, Kanwar, Golzenleuchter et al, SERE: Single-parameter quality control and sample comparison for RNA-Seq, BMC Genomics, 2012 +} + diff --git a/man/barplotNull.Rd b/man/barplotNull.Rd new file mode 100644 index 0000000..06157d8 --- /dev/null +++ b/man/barplotNull.Rd @@ -0,0 +1,25 @@ +% Generated by roxygen2 (4.0.2): do not edit by hand +\name{barplotNull} +\alias{barplotNull} +\title{Percentage of null counts per sample} +\usage{ +barplotNull(counts, group, col = c("lightblue", "orange", "MediumVioletRed", + "SpringGreen")) +} +\arguments{ +\item{counts}{\code{matrix} of counts} + +\item{group}{factor vector of the condition from which each sample belongs} + +\item{col}{colors of the bars (one color per biological condition)} +} +\value{ +A file named barplotNull.png in the figures directory +} +\description{ +Bar plot of the percentage of null counts per sample +} +\author{ +Marie-Agnes Dillies and Hugo Varet +} + diff --git a/man/barplotTotal.Rd b/man/barplotTotal.Rd new file mode 100644 index 0000000..e3ad7c5 --- /dev/null +++ b/man/barplotTotal.Rd @@ -0,0 +1,25 @@ +% Generated by roxygen2 (4.0.2): do not edit by hand +\name{barplotTotal} +\alias{barplotTotal} +\title{Total number of reads per sample} +\usage{ +barplotTotal(counts, group, col = c("lightblue", "orange", "MediumVioletRed", + "SpringGreen")) +} +\arguments{ +\item{counts}{\code{matrix} of counts} + +\item{group}{factor vector of the condition from which each sample belongs} 
+ +\item{col}{colors of the bars (one color per biological condition)} +} +\value{ +A file named barplotTotal.png in the figures directory +} +\description{ +Bar plot of the total number of reads per sample +} +\author{ +Marie-Agnes Dillies and Hugo Varet +} + diff --git a/man/checkParameters.DESeq2.Rd b/man/checkParameters.DESeq2.Rd new file mode 100644 index 0000000..cf7acc9 --- /dev/null +++ b/man/checkParameters.DESeq2.Rd @@ -0,0 +1,52 @@ +% Generated by roxygen2 (4.0.2): do not edit by hand +\name{checkParameters.DESeq2} +\alias{checkParameters.DESeq2} +\title{Check the parameters (when using DESeq2)} +\usage{ +checkParameters.DESeq2(projectName, author, targetFile, rawDir, + featuresToRemove, varInt, condRef, batch, fitType, cooksCutoff, + independentFiltering, alpha, pAdjustMethod, typeTrans, locfunc, colors) +} +\arguments{ +\item{projectName}{name of the project} + +\item{author}{author of the statistical analysis/report} + +\item{targetFile}{path to the design/target file} + +\item{rawDir}{path to the directory containing raw counts files} + +\item{featuresToRemove}{names of the features to be removed} + +\item{varInt}{factor of interest} + +\item{condRef}{reference biological condition} + +\item{batch}{blocking factor in the design} + +\item{fitType}{mean-variance relationship: "parametric" (default) or "local"} + +\item{cooksCutoff}{outliers detection threshold (NULL to let DESeq2 choose it)} + +\item{independentFiltering}{TRUE/FALSE to perform independent filtering} + +\item{alpha}{threshold of statistical significance} + +\item{pAdjustMethod}{p-value adjustment method: "BH" (default) or "BY" for example} + +\item{typeTrans}{transformation for PCA/clustering: "VST" or "rlog"} + +\item{locfunc}{"median" (default) or "shorth" to estimate the size factors} + +\item{colors}{vector of colors of each biological condition on the plots} +} +\value{ +A boolean indicating if there is a problem in the parameters +} +\description{ +Check the format and the 
validity of the parameters which will be used for the analysis with DESeq2. +} +\author{ +Hugo Varet +} + diff --git a/man/checkParameters.edgeR.Rd b/man/checkParameters.edgeR.Rd new file mode 100644 index 0000000..c992abb --- /dev/null +++ b/man/checkParameters.edgeR.Rd @@ -0,0 +1,46 @@ +% Generated by roxygen2 (4.0.2): do not edit by hand +\name{checkParameters.edgeR} +\alias{checkParameters.edgeR} +\title{Check the parameters (when using edgeR)} +\usage{ +checkParameters.edgeR(projectName, author, targetFile, rawDir, featuresToRemove, + varInt, condRef, batch, alpha, pAdjustMethod, cpmCutoff, gene.selection, + colors) +} +\arguments{ +\item{projectName}{name of the project} + +\item{author}{author of the statistical analysis/report} + +\item{targetFile}{path to the design/target file} + +\item{rawDir}{path to the directory containing raw counts files} + +\item{featuresToRemove}{names of the features to be removed} + +\item{varInt}{factor of interest} + +\item{condRef}{reference biological condition} + +\item{batch}{blocking factor in the design} + +\item{alpha}{threshold of statistical significance} + +\item{pAdjustMethod}{p-value adjustment method: "BH" (default) or "BY" for example} + +\item{cpmCutoff}{counts-per-million cut-off to filter low counts} + +\item{gene.selection}{selection of the features in MDSPlot} + +\item{colors}{vector of colors of each biological condition on the plots} +} +\value{ +A boolean indicating if there is a problem in the parameters +} +\description{ +Check the format and the validity of the parameters which will be used for the analysis with edgeR. 
+} +\author{ +Hugo Varet +} + diff --git a/man/clusterPlot.Rd b/man/clusterPlot.Rd new file mode 100644 index 0000000..99cbc03 --- /dev/null +++ b/man/clusterPlot.Rd @@ -0,0 +1,22 @@ +% Generated by roxygen2 (4.0.2): do not edit by hand +\name{clusterPlot} +\alias{clusterPlot} +\title{Clustering of the samples} +\usage{ +clusterPlot(counts.trans, group) +} +\arguments{ +\item{counts.trans}{a matrix of transformed counts (VST- or rlog-counts if use of DESeq2 or cpm-counts if use of edgeR)} + +\item{group}{factor vector of the condition from which each sample belongs} +} +\value{ +A file named cluster.png in the figures directory with the dendrogram of the clustering +} +\description{ +Clustering of the samples based on VST- or rlog-counts (if use of DESeq2) or cpm-counts (if use of edgeR) +} +\author{ +Marie-Agnes Dillies and Hugo Varet +} + diff --git a/man/countsBoxplots.Rd b/man/countsBoxplots.Rd new file mode 100644 index 0000000..525ceb7 --- /dev/null +++ b/man/countsBoxplots.Rd @@ -0,0 +1,25 @@ +% Generated by roxygen2 (4.0.2): do not edit by hand +\name{countsBoxplots} +\alias{countsBoxplots} +\title{Box-plots of (normalized) counts distribution per sample} +\usage{ +countsBoxplots(object, group, col = c("lightblue", "orange", + "MediumVioletRed", "SpringGreen")) +} +\arguments{ +\item{object}{a \code{DESeqDataSet} object from DESeq2 or a \code{DGEList} object from edgeR} + +\item{group}{factor vector of the condition from which each sample belongs} + +\item{col}{colors of the boxplots (one per biological condition)} +} +\value{ +A file named countsBoxplots.png in the figures directory containing boxplots of the raw and normalized counts +} +\description{ +Box-plots of raw and normalized counts distributions per sample to assess the effect of the normalization +} +\author{ +Marie-Agnes Dillies and Hugo Varet +} + diff --git a/man/densityPlot.Rd b/man/densityPlot.Rd new file mode 100644 index 0000000..ee5afcd --- /dev/null +++ b/man/densityPlot.Rd @@ -0,0 +1,25 
@@ +% Generated by roxygen2 (4.0.2): do not edit by hand +\name{densityPlot} +\alias{densityPlot} +\title{Density plot of all samples} +\usage{ +densityPlot(counts, group, col = c("lightblue", "orange", "MediumVioletRed", + "SpringGreen")) +} +\arguments{ +\item{counts}{\code{matrix} of counts} + +\item{group}{factor vector of the condition from which each sample belongs} + +\item{col}{colors of the curves (one per biological condition)} +} +\value{ +A file named densplot.png in the figures directory +} +\description{ +Estimation of the counts density for each sample +} +\author{ +Marie-Agnes Dillies and Hugo Varet +} + diff --git a/man/descriptionPlots.Rd b/man/descriptionPlots.Rd new file mode 100644 index 0000000..aa288fa --- /dev/null +++ b/man/descriptionPlots.Rd @@ -0,0 +1,25 @@ +% Generated by roxygen2 (4.0.2): do not edit by hand +\name{descriptionPlots} +\alias{descriptionPlots} +\title{Description plots of the counts} +\usage{ +descriptionPlots(counts, group, col = c("lightblue", "orange", + "MediumVioletRed", "SpringGreen")) +} +\arguments{ +\item{counts}{\code{matrix} of counts} + +\item{group}{factor vector of the condition from which each sample belongs} + +\item{col}{colors for the plots (one per biological condition)} +} +\value{ +PNG files in the "figures" directory and the matrix of the most expressed sequences +} +\description{ +Description plots of the counts according to the biological condition +} +\author{ +Hugo Varet +} + diff --git a/man/diagSizeFactorsPlots.Rd b/man/diagSizeFactorsPlots.Rd new file mode 100644 index 0000000..e5b0e7d --- /dev/null +++ b/man/diagSizeFactorsPlots.Rd @@ -0,0 +1,20 @@ +% Generated by roxygen2 (4.0.2): do not edit by hand +\name{diagSizeFactorsPlots} +\alias{diagSizeFactorsPlots} +\title{Assess the estimations of the size factors} +\usage{ +diagSizeFactorsPlots(dds) +} +\arguments{ +\item{dds}{a \code{DESeqDataSet} object} +} +\value{ +Two files in the figures directory: diagSizeFactorsHist.png containing one 
histogram per sample and diagSizeFactorsTC.png for a plot of the size factors vs the total number of reads +} +\description{ +Plots to assess the estimations of the size factors +} +\author{ +Marie-Agnes Dillies and Hugo Varet +} + diff --git a/man/dispersionsPlot.Rd b/man/dispersionsPlot.Rd new file mode 100644 index 0000000..1d48da9 --- /dev/null +++ b/man/dispersionsPlot.Rd @@ -0,0 +1,20 @@ +% Generated by roxygen2 (4.0.2): do not edit by hand +\name{dispersionsPlot} +\alias{dispersionsPlot} +\title{Plots about DESeq2 dispersions} +\usage{ +dispersionsPlot(dds) +} +\arguments{ +\item{dds}{a \code{DESeqDataSet} object} +} +\value{ +A file named dispersionsPlot.png in the figures directory containing the plot of the mean-dispersion relationship and a diagnostic of log normality of the dispersions +} +\description{ +A plot of the mean-dispersion relationship and a diagnostic of log normality of the dispersions (if use of DESeq2) +} +\author{ +Marie-Agnes Dillies and Hugo Varet +} + diff --git a/man/exploreCounts.Rd b/man/exploreCounts.Rd new file mode 100644 index 0000000..0c5af5f --- /dev/null +++ b/man/exploreCounts.Rd @@ -0,0 +1,29 @@ +% Generated by roxygen2 (4.0.2): do not edit by hand +\name{exploreCounts} +\alias{exploreCounts} +\title{Explore counts structure} +\usage{ +exploreCounts(object, group, typeTrans = "VST", gene.selection = "pairwise", + col = c("lightblue", "orange", "MediumVioletRed", "SpringGreen")) +} +\arguments{ +\item{object}{a \code{DESeqDataSet} from DESeq2 or \code{DGEList} object from edgeR} + +\item{group}{factor vector of the condition from which each sample belongs} + +\item{typeTrans}{transformation method for PCA/clustering with DESeq2: \code{"VST"} or \code{"rlog"}} + +\item{gene.selection}{selection of the features in MDSPlot (\code{"pairwise"} by default)} + +\item{col}{colors used for the PCA/MDS (one per biological condition)} +} +\value{ +A list containing the dds object and the results object +} +\description{ +Explore 
counts structure: PCA (DESeq2) or MDS (edgeR) and clustering +} +\author{ +Hugo Varet +} + diff --git a/man/exportResults.DESeq2.Rd b/man/exportResults.DESeq2.Rd new file mode 100644 index 0000000..b2ce4c4 --- /dev/null +++ b/man/exportResults.DESeq2.Rd @@ -0,0 +1,27 @@ +% Generated by roxygen2 (4.0.2): do not edit by hand +\name{exportResults.DESeq2} +\alias{exportResults.DESeq2} +\title{Export results for DESeq2 analyses} +\usage{ +exportResults.DESeq2(out.DESeq2, group, cooksCutoff = NULL, alpha = 0.05) +} +\arguments{ +\item{out.DESeq2}{the result of \code{run.DESeq2()}} + +\item{group}{factor vector of the condition from which each sample belongs} + +\item{cooksCutoff}{Cook's distance threshold for detecting outliers (\code{Inf} +to disable the detection, \code{NULL} to keep DESeq2 threshold)} + +\item{alpha}{threshold to apply to adjusted p-values} +} +\value{ +A list of \code{data.frame} containing counts, pvalues, FDR, log2FC... +} +\description{ +Export counts and DESeq2 results +} +\author{ +Marie-Agnes Dillies and Hugo Varet +} + diff --git a/man/exportResults.edgeR.Rd b/man/exportResults.edgeR.Rd new file mode 100644 index 0000000..71d4e0a --- /dev/null +++ b/man/exportResults.edgeR.Rd @@ -0,0 +1,29 @@ +% Generated by roxygen2 (4.0.2): do not edit by hand +\name{exportResults.edgeR} +\alias{exportResults.edgeR} +\title{Export results for edgeR analyses} +\usage{ +exportResults.edgeR(out.edgeR, group, counts, alpha = 0.05) +} +\arguments{ +\item{out.edgeR}{the result of \code{run.edgeR()}} + +\item{group}{factor vector of the condition from which each sample belongs} + +\item{counts}{non-filtered counts (used to keep them in the final table)} + +\item{alpha}{threshold to apply to adjusted p-values} +} +\value{ +A list of \code{data.frame} containing counts, pvalues, FDR, log2FC... +} +\description{ +Export counts and edgeR results +} +\details{ +\code{counts} are used as input just in order to export features with null counts too. 
+} +\author{ +Marie-Agnes Dillies and Hugo Varet +} + diff --git a/man/loadCountData.Rd b/man/loadCountData.Rd new file mode 100644 index 0000000..ffebee9 --- /dev/null +++ b/man/loadCountData.Rd @@ -0,0 +1,33 @@ +% Generated by roxygen2 (4.0.2): do not edit by hand +\name{loadCountData} +\alias{loadCountData} +\title{Load count files} +\usage{ +loadCountData(target, rawDir = "raw", header = FALSE, skip = 0, + featuresToRemove = c("alignment_not_unique", "ambiguous", "no_feature", + "not_aligned", "too_low_aQual")) +} +\arguments{ +\item{target}{target \code{data.frame} of the project returned by \code{loadTargetFile()}} + +\item{rawDir}{path to the directory containing the count files} + +\item{header}{a logical value indicating whether the file contains the names of the variables as its first line} + +\item{skip}{number of lines of the data file to skip before beginning to read data} + +\item{featuresToRemove}{vector of feature Ids (or character string common to feature Ids) to remove from the counts} +} +\value{ +The \code{matrix} of raw counts with row names corresponding to the feature Ids and column names to the sample names as provided in the first column of the target. +} +\description{ +Load one count file per sample thanks to the file names in the target file. +} +\details{ +If \code{featuresToRemove} is equal to \code{"rRNA"}, all the features containing the character string "rRNA" will be removed from the counts. 
+} +\author{ +Marie-Agnes Dillies and Hugo Varet +} + diff --git a/man/loadTargetFile.Rd b/man/loadTargetFile.Rd new file mode 100644 index 0000000..4d54530 --- /dev/null +++ b/man/loadTargetFile.Rd @@ -0,0 +1,29 @@ +% Generated by roxygen2 (4.0.2): do not edit by hand +\name{loadTargetFile} +\alias{loadTargetFile} +\title{Load target file} +\usage{ +loadTargetFile(targetFile, varInt, condRef, batch) +} +\arguments{ +\item{targetFile}{path to the target file} + +\item{varInt}{variable on which sorting the target} + +\item{condRef}{reference condition of \code{varInt}} + +\item{batch}{batch effect to take into account} +} +\value{ +A \code{data.frame} containing the information about the samples (name, file containing the counts and biological condition) +} +\description{ +Load the target file containing sample information +} +\details{ +The \code{batch} parameter is used only to check if it is available in the target file before running the rest of the script. +} +\author{ +Marie-Agnes Dillies and Hugo Varet +} + diff --git a/man/majSequences.Rd b/man/majSequences.Rd new file mode 100644 index 0000000..b443973 --- /dev/null +++ b/man/majSequences.Rd @@ -0,0 +1,27 @@ +% Generated by roxygen2 (4.0.2): do not edit by hand +\name{majSequences} +\alias{majSequences} +\title{Most expressed sequences per sample} +\usage{ +majSequences(counts, n = 3, group, col = c("lightblue", "orange", + "MediumVioletRed", "SpringGreen")) +} +\arguments{ +\item{counts}{\code{matrix} of counts} + +\item{n}{number of most expressed sequences to return} + +\item{group}{factor vector of the condition from which each sample belongs} + +\item{col}{colors of the bars (one per biological condition)} +} +\value{ +A \code{matrix} with the percentage of reads of the three most expressed sequences and a file named majSeq.png in the figures directory +} +\description{ +Proportion of reads associated with the three most expressed sequences per sample +} +\author{ +Marie-Agnes Dillies and Hugo Varet 
+} + diff --git a/man/nDiffTotal.Rd b/man/nDiffTotal.Rd new file mode 100644 index 0000000..0917ebf --- /dev/null +++ b/man/nDiffTotal.Rd @@ -0,0 +1,22 @@ +% Generated by roxygen2 (4.0.2): do not edit by hand +\name{nDiffTotal} +\alias{nDiffTotal} +\title{Number of differentially expressed features per comparison} +\usage{ +nDiffTotal(complete, alpha = 0.05) +} +\arguments{ +\item{complete}{list of \code{data.frame} containing features results (from \code{exportResults.DESeq2()} or \code{exportResults.edgeR()})} + +\item{alpha}{threshold to apply to the FDR} +} +\value{ +A matrix with the number of up, down and total of features per comparison +} +\description{ +Number of down- and up-regulated features per comparison +} +\author{ +Marie-Agnes Dillies and Hugo Varet +} + diff --git a/man/pairwiseScatterPlots.Rd b/man/pairwiseScatterPlots.Rd new file mode 100644 index 0000000..290194d --- /dev/null +++ b/man/pairwiseScatterPlots.Rd @@ -0,0 +1,22 @@ +% Generated by roxygen2 (4.0.2): do not edit by hand +\name{pairwiseScatterPlots} +\alias{pairwiseScatterPlots} +\title{Scatter plots for pairwise comparisons of log counts} +\usage{ +pairwiseScatterPlots(counts, group) +} +\arguments{ +\item{counts}{\code{matrix} of raw counts} + +\item{group}{factor vector of the condition from which each sample belongs} +} +\value{ +A file named pairwiseScatter.png in the figures directory containing a pairwise scatter plot with the SERE statistics in the lower panel +} +\description{ +Scatter plots for pairwise comparisons of log counts +} +\author{ +Marie-Agnes Dillies and Hugo Varet +} + diff --git a/man/rawpHist.Rd b/man/rawpHist.Rd new file mode 100644 index 0000000..a17b16b --- /dev/null +++ b/man/rawpHist.Rd @@ -0,0 +1,20 @@ +% Generated by roxygen2 (4.0.2): do not edit by hand +\name{rawpHist} +\alias{rawpHist} +\title{Histograms of raw p-values} +\usage{ +rawpHist(complete) +} +\arguments{ +\item{complete}{a list of \code{data.frames} created by 
\code{summarizeResults.DESeq2()} or \code{summarizeResults.edgeR()}} +} +\value{ +A file named rawpHist.png in the figures directory with one histogram of raw p-values per comparison +} +\description{ +Histogram of raw p-values for each comparison +} +\author{ +Marie-Agnes Dillies and Hugo Varet +} + diff --git a/man/removeNull.Rd b/man/removeNull.Rd new file mode 100644 index 0000000..8b506e0 --- /dev/null +++ b/man/removeNull.Rd @@ -0,0 +1,20 @@ +% Generated by roxygen2 (4.0.2): do not edit by hand +\name{removeNull} +\alias{removeNull} +\title{Remove features with null counts in all samples} +\usage{ +removeNull(counts) +} +\arguments{ +\item{counts}{\code{matrix} of raw counts} +} +\value{ +The \code{matrix} of counts without features with only null counts +} +\description{ +Remove features with null counts in all samples. These features do not contain any information and will not be used for the statistical analysis. +} +\author{ +Marie-Agnes Dillies and Hugo Varet +} + diff --git a/man/run.DESeq2.Rd b/man/run.DESeq2.Rd new file mode 100644 index 0000000..c888c4d --- /dev/null +++ b/man/run.DESeq2.Rd @@ -0,0 +1,42 @@ +% Generated by roxygen2 (4.0.2): do not edit by hand +\name{run.DESeq2} +\alias{run.DESeq2} +\title{Wrapper to run DESeq2} +\usage{ +run.DESeq2(counts, target, varInt, batch = NULL, locfunc = "median", + fitType = "parametric", pAdjustMethod = "BH", cooksCutoff = NULL, + independentFiltering = TRUE, alpha = 0.05, ...) 
+} +\arguments{ +\item{counts}{\code{matrix} of raw counts} + +\item{target}{target \code{data.frame} of the project} + +\item{varInt}{name of the factor of interest (biological condition)} + +\item{batch}{batch effect to take into account (\code{NULL} by default)} + +\item{locfunc}{\code{"median"} (default) or \code{"shorth"} to estimate the size factors} + +\item{fitType}{mean-variance relationship: "parametric" (default) or "local"} + +\item{pAdjustMethod}{p-value adjustment method: \code{"BH"} (default) or \code{"BY"} for instance} + +\item{cooksCutoff}{outliers detection threshold (\code{NULL} to let DESeq2 choose it)} + +\item{independentFiltering}{\code{TRUE} or \code{FALSE} to perform the independent filtering or not} + +\item{alpha}{significance threshold to apply to the adjusted p-values} + +\item{...}{optional arguments to be passed to \code{nbinomWaldTest()}} +} +\value{ +A list containing the \code{dds} object (\code{DESeqDataSet} class), the \code{results} objects (\code{DESeqResults} class) and the vector of size factors +} +\description{ +Wrapper to run DESeq2: create the \code{DESeqDataSet}, normalize data, estimate dispersions, statistical testing... +} +\author{ +Hugo Varet +} + diff --git a/man/run.edgeR.Rd b/man/run.edgeR.Rd new file mode 100644 index 0000000..fd869e1 --- /dev/null +++ b/man/run.edgeR.Rd @@ -0,0 +1,35 @@ +% Generated by roxygen2 (4.0.2): do not edit by hand +\name{run.edgeR} +\alias{run.edgeR} +\title{Wrapper to run edgeR} +\usage{ +run.edgeR(counts, target, varInt, condRef, batch = NULL, cpmCutoff = 1, + pAdjustMethod = "BH", ...) 
+} +\arguments{ +\item{counts}{\code{matrix} of counts} + +\item{target}{target \code{data.frame} of the project} + +\item{varInt}{name of the factor of interest (biological condition)} + +\item{condRef}{reference biological condition} + +\item{batch}{batch effect to take into account (\code{NULL} by default)} + +\item{cpmCutoff}{counts-per-million cut-off to filter low counts} + +\item{pAdjustMethod}{p-value adjustment method: \code{"BH"} (default) or \code{"BY"}} + +\item{...}{optional arguments to be passed to \code{glmFit()}} +} +\value{ +A list containing the \code{dge} object and the \code{results} object +} +\description{ +Wrapper to run edgeR: create the \code{dge} object, normalize data, estimate dispersions, statistical testing... +} +\author{ +Hugo Varet +} + diff --git a/man/summarizeResults.DESeq2.Rd b/man/summarizeResults.DESeq2.Rd new file mode 100644 index 0000000..c2ea559 --- /dev/null +++ b/man/summarizeResults.DESeq2.Rd @@ -0,0 +1,33 @@ +% Generated by roxygen2 (4.0.2): do not edit by hand +\name{summarizeResults.DESeq2} +\alias{summarizeResults.DESeq2} +\title{Summarize DESeq2 analysis} +\usage{ +summarizeResults.DESeq2(out.DESeq2, group, independentFiltering = TRUE, + cooksCutoff = NULL, alpha = 0.05, col = c("lightblue", "orange", + "MediumVioletRed", "SpringGreen")) +} +\arguments{ +\item{out.DESeq2}{the result of \code{run.DESeq2()}} + +\item{group}{factor vector of the condition from which each sample belongs} + +\item{independentFiltering}{\code{TRUE} or \code{FALSE} to perform the independent filtering or not} + +\item{cooksCutoff}{Cook's distance threshold for detecting outliers (\code{Inf} +to disable the detection, \code{NULL} to keep DESeq2 threshold)} + +\item{alpha}{significance threshold to apply to the adjusted p-values} + +\item{col}{colors for the plots} +} +\value{ +A list containing: (i) a list of \code{data.frames} from \code{exportResults.DESeq2()}, (ii) the table summarizing the independent filtering procedure and (iii) a 
table summarizing the number of differentially expressed features +} +\description{ +Summarize DESeq2 analysis: diagnostic plots, dispersions plot, summary of the independent filtering, export results... +} +\author{ +Hugo Varet +} + diff --git a/man/summarizeResults.edgeR.Rd b/man/summarizeResults.edgeR.Rd new file mode 100644 index 0000000..976ad5e --- /dev/null +++ b/man/summarizeResults.edgeR.Rd @@ -0,0 +1,29 @@ +% Generated by roxygen2 (4.0.2): do not edit by hand +\name{summarizeResults.edgeR} +\alias{summarizeResults.edgeR} +\title{Summarize edgeR analysis} +\usage{ +summarizeResults.edgeR(out.edgeR, group, counts, alpha = 0.05, + col = c("lightblue", "orange", "MediumVioletRed", "SpringGreen")) +} +\arguments{ +\item{out.edgeR}{the result of \code{run.edgeR()}} + +\item{group}{factor vector of the condition from which each sample belongs} + +\item{counts}{matrix of raw counts} + +\item{alpha}{significance threshold to apply to the adjusted p-values} + +\item{col}{colors for the plots} +} +\value{ +A list containing: (i) a list of \code{data.frames} from \code{exportResults.edgeR()} and (ii) a table summarizing the number of differentially expressed features +} +\description{ +Summarize edgeR analysis: diagnostic plots, dispersions plot, summary of the independent filtering, export results... 
+} +\author{ +Hugo Varet +} + diff --git a/man/tabIndepFiltering.Rd b/man/tabIndepFiltering.Rd new file mode 100644 index 0000000..ce6d80e --- /dev/null +++ b/man/tabIndepFiltering.Rd @@ -0,0 +1,20 @@ +% Generated by roxygen2 (4.0.2): do not edit by hand +\name{tabIndepFiltering} +\alias{tabIndepFiltering} +\title{Table of the number of features discarded by the independent filtering (if use of DESeq2)} +\usage{ +tabIndepFiltering(results) +} +\arguments{ +\item{results}{list of results of \code{results(dds,...)} with chosen parameters} +} +\value{ +A \code{matrix} with the threshold and the number of features discarded for each comparison +} +\description{ +Compute the number of features discarded by the independent filtering for each comparison (if use of DESeq2) +} +\author{ +Marie-Agnes Dillies and Hugo Varet +} + diff --git a/man/tabSERE.Rd b/man/tabSERE.Rd new file mode 100644 index 0000000..5d49feb --- /dev/null +++ b/man/tabSERE.Rd @@ -0,0 +1,20 @@ +% Generated by roxygen2 (4.0.2): do not edit by hand +\name{tabSERE} +\alias{tabSERE} +\title{SERE statistics for several samples} +\usage{ +tabSERE(counts) +} +\arguments{ +\item{counts}{\code{matrix} of raw counts} +} +\value{ +The \code{matrix} of SERE values +} +\description{ +Compute the SERE statistic for each pair of samples +} +\author{ +Marie-Agnes Dillies and Hugo Varet +} + diff --git a/man/writeReport.DESeq2.Rd b/man/writeReport.DESeq2.Rd new file mode 100644 index 0000000..4aa9532 --- /dev/null +++ b/man/writeReport.DESeq2.Rd @@ -0,0 +1,65 @@ +% Generated by roxygen2 (4.0.2): do not edit by hand +\name{writeReport.DESeq2} +\alias{writeReport.DESeq2} +\title{Write HTML report for DESeq2 analyses} +\usage{ +writeReport.DESeq2(target, counts, out.DESeq2, summaryResults, majSequences, + workDir, projectName, author, targetFile, rawDir, featuresToRemove, varInt, + condRef, batch, fitType, cooksCutoff, independentFiltering, alpha, + pAdjustMethod, typeTrans, locfunc, colors) +} +\arguments{ 
+\item{target}{target \code{data.frame} of the project returned by \code{loadTargetFile()}} + +\item{counts}{\code{matrix} of counts returned by \code{loadCountData()}} + +\item{out.DESeq2}{the result of \code{run.DESeq2()}} + +\item{summaryResults}{the result of \code{summarizeResults.DESeq2()}} + +\item{majSequences}{the result of \code{descriptionPlots()}} + +\item{workDir}{working directory} + +\item{projectName}{name of the project} + +\item{author}{name of the author of the analysis} + +\item{targetFile}{path to the target file} + +\item{rawDir}{path to the directory containing the counts files} + +\item{featuresToRemove}{vector of features to remove from the counts matrix} + +\item{varInt}{factor of interest (biological condition)} + +\item{condRef}{reference condition for the factor of interest} + +\item{batch}{variable to take as a batch effect} + +\item{fitType}{mean-variance relationship: \code{"parametric"} (default) or \code{"local"}} + +\item{cooksCutoff}{outliers detection threshold} + +\item{independentFiltering}{\code{TRUE} or \code{FALSE} to perform the independent filtering or not} + +\item{alpha}{threshold of statistical significance} + +\item{pAdjustMethod}{p-value adjustment method: \code{"BH"} or \code{"BY"} for instance} + +\item{typeTrans}{transformation for PCA/clustering: \code{"VST"} or \code{"rlog"}} + +\item{locfunc}{\code{"median"} (default) or \code{"shorth"} to estimate the size factors} + +\item{colors}{vector of colors of each biological condition on the plots} +} +\description{ +Write HTML report from graphs and tables created during the analysis with DESeq2 +} +\details{ +This function generates the HTML report for a statistical analysis with DESeq2. It uses the tables and graphs created during the workflow as well as the parameters defined at the beginning of the script. 
+} +\author{ +Hugo Varet +} + diff --git a/man/writeReport.edgeR.Rd b/man/writeReport.edgeR.Rd new file mode 100644 index 0000000..0f2ee5a --- /dev/null +++ b/man/writeReport.edgeR.Rd @@ -0,0 +1,56 @@ +% Generated by roxygen2 (4.0.2): do not edit by hand +\name{writeReport.edgeR} +\alias{writeReport.edgeR} +\title{Write HTML report for edgeR analyses} +\usage{ +writeReport.edgeR(target, counts, out.edgeR, summaryResults, majSequences, + workDir, projectName, author, targetFile, rawDir, featuresToRemove, varInt, + condRef, batch, alpha, pAdjustMethod, colors, gene.selection) +} +\arguments{ +\item{target}{target \code{data.frame} of the project returned by \code{loadTargetFile()}} + +\item{counts}{\code{matrix} of counts returned by \code{loadCountData()}} + +\item{out.edgeR}{the result of \code{run.edgeR()}} + +\item{summaryResults}{the result of \code{summarizeResults.edgeR()}} + +\item{majSequences}{the result of \code{descriptionPlots()}} + +\item{workDir}{path to the working directory} + +\item{projectName}{name of the project} + +\item{author}{name of the author of the analysis} + +\item{targetFile}{path to the target file} + +\item{rawDir}{path to the directory containing the counts files} + +\item{featuresToRemove}{vector of features to remove from the counts matrix} + +\item{varInt}{factor of interest (biological condition)} + +\item{condRef}{reference condition for the factor of interest} + +\item{batch}{variable to take as a batch effect} + +\item{alpha}{threshold of statistical significance} + +\item{pAdjustMethod}{p-value adjustment method: \code{"BH"} (default) or \code{"BY"}} + +\item{colors}{vector of colors of each biological condition on the plots} + +\item{gene.selection}{selection of the features in \code{MDSPlot()} (\code{"pairwise"} by default)} +} +\description{ +Write HTML report from graphs and tables created during the analysis with edgeR +} +\details{ +This function generates the HTML report for a statistical analysis with edgeR.
It uses the tables and graphs created during the workflow as well as the parameters defined at the beginning of the script. +} +\author{ +Hugo Varet +} + diff --git a/template_script_DESeq2.r b/template_script_DESeq2.r new file mode 100644 index 0000000..ef31d26 --- /dev/null +++ b/template_script_DESeq2.r @@ -0,0 +1,85 @@ +################################################################################ +### R script to compare several conditions with the SARTools and DESeq2 packages +### Hugo Varet +### December 10th, 2014 +### designed to be executed with SARTools 1.0.0 +################################################################################ + +################################################################################ +### parameters: to be modified by the user ### +################################################################################ +rm(list=ls()) # remove all the objects from the R session + +workDir <- "C:/path/to/your/working/directory/" # working directory for the R session + +projectName <- "projectName" # name of the project +author <- "Your name" # author of the statistical analysis/report + +targetFile <- "target.txt" # path to the design/target file +rawDir <- "raw" # path to the directory containing raw counts files +featuresToRemove <- c("alignment_not_unique", # names of the features to be removed + "ambiguous", "no_feature", # (specific HTSeq-count information and rRNA for example) + "not_aligned", "too_low_aQual") + +varInt <- "group" # factor of interest +condRef <- "WT" # reference biological condition +batch <- NULL # blocking factor: NULL (default) or "batch" for example + +fitType <- "parametric" # mean-variance relationship: "parametric" (default) or "local" +cooksCutoff <- NULL # outliers detection threshold (NULL to let DESeq2 choose it) +independentFiltering <- TRUE # TRUE/FALSE to perform independent filtering (default is TRUE) +alpha <- 0.05 # threshold of statistical significance +pAdjustMethod <- "BH" #
p-value adjustment method: "BH" (default) or "BY" + +typeTrans <- "VST" # transformation for PCA/clustering: "VST" or "rlog" +locfunc <- "median" # "median" (default) or "shorth" to estimate the size factors + +colors <- c("dodgerblue","firebrick1", # vector of colors of each biological condition on the plots + "MediumVioletRed","SpringGreen") + +################################################################################ +### running script ### +################################################################################ +setwd(workDir) +library(SARTools) +if (locfunc=="shorth") library(genefilter) + +# checking parameters +checkParameters.DESeq2(projectName=projectName,author=author,targetFile=targetFile, + rawDir=rawDir,featuresToRemove=featuresToRemove,varInt=varInt, + condRef=condRef,batch=batch,fitType=fitType,cooksCutoff=cooksCutoff, + independentFiltering=independentFiltering,alpha=alpha,pAdjustMethod=pAdjustMethod, + typeTrans=typeTrans,locfunc=locfunc,colors=colors) + +# loading target file +target <- loadTargetFile(targetFile=targetFile, varInt=varInt, condRef=condRef, batch=batch) + +# loading counts +counts <- loadCountData(target=target, rawDir=rawDir, featuresToRemove=featuresToRemove) + +# description plots +majSequences <- descriptionPlots(counts=counts, group=target[,varInt], col=colors) + +# analysis with DESeq2 +out.DESeq2 <- run.DESeq2(counts=counts, target=target, varInt=varInt, batch=batch, + locfunc=locfunc, fitType=fitType, pAdjustMethod=pAdjustMethod, + cooksCutoff=cooksCutoff, independentFiltering=independentFiltering, alpha=alpha) + +# PCA + clustering +exploreCounts(object=out.DESeq2$dds, group=target[,varInt], typeTrans=typeTrans, col=colors) + +# summary of the analysis (boxplots, dispersions, diag size factors, export table, nDiffTotal, histograms, MA plot) +summaryResults <- summarizeResults.DESeq2(out.DESeq2, group=target[,varInt], col=colors, + independentFiltering=independentFiltering, + cooksCutoff=cooksCutoff, 
alpha=alpha) + +# save image of the R session +save.image(file=paste0(projectName, ".RData")) + +# generating HTML report +writeReport.DESeq2(target=target, counts=counts, out.DESeq2=out.DESeq2, summaryResults=summaryResults, + majSequences=majSequences, workDir=workDir, projectName=projectName, author=author, + targetFile=targetFile, rawDir=rawDir, featuresToRemove=featuresToRemove, varInt=varInt, + condRef=condRef, batch=batch, fitType=fitType, cooksCutoff=cooksCutoff, + independentFiltering=independentFiltering, alpha=alpha, pAdjustMethod=pAdjustMethod, + typeTrans=typeTrans, locfunc=locfunc, colors=colors) diff --git a/template_script_edgeR.r b/template_script_edgeR.r new file mode 100644 index 0000000..ee6e634 --- /dev/null +++ b/template_script_edgeR.r @@ -0,0 +1,76 @@ +################################################################################ +### R script to compare several conditions with the SARTools and edgeR packages +### Hugo Varet +### December 10th, 2014 +### designed to be executed with SARTools 1.0.0 +################################################################################ + +################################################################################ +### parameters: to be modified by the user ### +################################################################################ +rm(list=ls()) # remove all the objects from the R session + +workDir <- "C:/path/to/your/working/directory/" # working directory for the R session + +projectName <- "projectName" # name of the project +author <- "Your name" # author of the statistical analysis/report + +targetFile <- "target.txt" # path to the design/target file +rawDir <- "raw" # path to the directory containing raw counts files +featuresToRemove <- c("alignment_not_unique", # names of the features to be removed + "ambiguous", "no_feature", # (specific HTSeq-count information and rRNA for example) + "not_aligned", "too_low_aQual") + +varInt <- "group" # factor of interest +condRef <- 
"WT" # reference biological condition +batch <- NULL # blocking factor: NULL (default) or "batch" for example + +alpha <- 0.05 # threshold of statistical significance +pAdjustMethod <- "BH" # p-value adjustment method: "BH" (default) or "BY" + +cpmCutoff <- 1 # counts-per-million cut-off to filter low counts +gene.selection <- "pairwise" # selection of the features in MDSPlot + +colors <- c("dodgerblue","firebrick1", # vector of colors of each biological condition on the plots + "MediumVioletRed","SpringGreen") + +################################################################################ +### running script ### +################################################################################ +setwd(workDir) +library(SARTools) + +# checking parameters +checkParameters.edgeR(projectName=projectName,author=author,targetFile=targetFile, + rawDir=rawDir,featuresToRemove=featuresToRemove,varInt=varInt, + condRef=condRef,batch=batch,alpha=alpha,pAdjustMethod=pAdjustMethod, + cpmCutoff=cpmCutoff,gene.selection=gene.selection,colors=colors) + +# loading target file +target <- loadTargetFile(targetFile=targetFile, varInt=varInt, condRef=condRef, batch=batch) + +# loading counts +counts <- loadCountData(target=target, rawDir=rawDir, featuresToRemove=featuresToRemove) + +# description plots +majSequences <- descriptionPlots(counts=counts, group=target[,varInt], col=colors) + +# edgeR analysis +out.edgeR <- run.edgeR(counts=counts, target=target, varInt=varInt, condRef=condRef, + batch=batch, cpmCutoff=cpmCutoff, pAdjustMethod=pAdjustMethod) + +# MDS + clustering +exploreCounts(object=out.edgeR$dge, group=target[,varInt], gene.selection=gene.selection, col=colors) + +# summary of the analysis (boxplots, dispersions, export table, nDiffTotal, histograms, MA plot) +summaryResults <- summarizeResults.edgeR(out.edgeR, group=target[,varInt], counts=counts, alpha=alpha, col=colors) + +# save image of the R session +save.image(file=paste0(projectName, ".RData")) + +# generating 
HTML report +writeReport.edgeR(target=target, counts=counts, out.edgeR=out.edgeR, summaryResults=summaryResults, + majSequences=majSequences, workDir=workDir, projectName=projectName, author=author, + targetFile=targetFile, rawDir=rawDir, featuresToRemove=featuresToRemove, varInt=varInt, + condRef=condRef, batch=batch, alpha=alpha, pAdjustMethod=pAdjustMethod, colors=colors, + gene.selection=gene.selection) diff --git a/vignettes/Thumbs.db b/vignettes/Thumbs.db new file mode 100644 index 0000000000000000000000000000000000000000..d3c47bfdebb7c1a1af7ce454087f6cbd85cfbe46 GIT binary patch literal 16896 zcmeHt2UJwumgWTnL=Yq^sgxiH3M!IAiAWL=L7*fHNKOJ0gn|-OkSrh|A|R4PlH^Q@ zB#9-lkQ^mL6%U&j2gJ@{Xh`){Nq zJUaU*j^+MGz3YG!APrc8kOu6)b6Y?Tgf(ytJW7FgtNHbmvvmf~5y?-A6|Kz~2jiv%&`Zs-i?{?v}s^@c7@+f%*ICDo` zOC6w~I96`3kjDX4fST&Wi4#=R;EtM_nueC)BrRB&=;`Sgn3-5un30RAbiIkp#Wc{rJ-d!d6JQjjhT(_KV8V*0OpfGDg{7A z!4FU}Q&2HekXw)E6bfpvwg1v!@S2k91lY=xbf@US3YBNT(Mm-{3AP{X0QfWzd=8vo zre?WtO@)S4&x)4cgY9~7T;@rETcyqH`u!L|No&s#x>FpSXU}m7T@)6%Bzi+iT1Hk* zUiG$`x`w9Kod*VnMzDv*CN__4?d%;KoxGlU`}q1j^MCR3RcP4j@QC z%F52k&HJ4HrL4T7vZ}hKw(fgNYg>EAkIt@v!J*+1^ysg#*}3_J#otTIE34S;9o+8T zKK|hFxL?Qp{G3iI9<`dKxuFcpV#CQ(^GBFQEErlj)=TX2u`vFI?o5{ zs6bygK;1t}rzzLws`j&8~CqG>yWm>IAbv z7W+4?rIiCT_CV=<$+T}`Jl@}w6WO;S6lLr`YR%E7`Rc@KMY$U zTGn?}lT&)5CMU&SFST6k+%lzz2|}_*y|;gTd9o5{@4NJ_`N}r~#&Ll+VpZ(HvOMoC%%L>n;|upGx#L@( zD`}n0rs-vW-E;+m8gKf3xR-IdKmK9-xg}=B=rfZS+gvzL$q9J^OH!G+tNuYbc7Ke2 z-oM#@4$LX-XK2#n{|@%w>YwBP|FiM`hmGh%R~A8US8&R|%wdliGCXIe{M?8m3(NU^ zPnoB2<1xpxJSGDM>j(zI2a$N2V1zrfPyr*S4e`i@yGohdz8raZ|~oKj;I|xtk>H zw{MEwL|)ALa<|*9P_5`o#^krO=e(%=Ewi-15h}qvtxHC**SYu2g88G~DF2CvG#+E*f)qOqxw;oc^SJr6-0%DvD*jt3jj`%=r+ST>A-*~W4#lrAhkx-b)AWPo?X zw9Hu~=3Oi*f~(YFChH{LjpW)$gXCSVjUtQuw?g2SAE%Lf`uKSbv_$f&ZDSUV-5DO4 z8L>{N%NCotsmTf-Yh_AS%!1~`?K2N+A>y+#XSL(O5M1CfsIK4f4vcECiei$I6v57-JZ3X*Jyn&^IX z+vLmVgLeYdzLtAciSR%*f1`xY#!h3lO*wgHLcKXX)|$4eZ@L1|eMT*P2~OJf?(daj zV))thJpu{{DNYe0g*qYs;Uw`_C1JFjlL*U0SJX89XxWhNF=mgU3Xy|YP{TX0ruVB& 
zwM*!~G`1JpJ&Z2AW1iSd%j3JpzDr#AJ~rSSy29qNggI9ds>qCGnQwuziNno>K5&JN z*(IZl**L~F;`L%D-4UO+9zMcr)5r7}*=?cj^>zNzDk{lKR?WDE5}kd5d@z{gt5X>u z+}N~}G@>XocbZEm)#&Fg_sQwa#5`o864EI$gjGs)0eR@y#{OpHBP!V9$HLiwu<1 z9Jt)ml5H8gpM0Hwn8~;rQ+ddV*=$#m`DSwZ?IY?)-IpG#_gN1UEHe(PQa?K!zOD1g zw^vG=xKgEQcdIaO*%>GK9l0&GV!6A>M0zg9$e*|}ST&Z4S$CCBo>pBf8P_1)F-jx@ zkDii&{y|tMx;AJGx~*SdQ5#QjE@~V zl3F-kV-(%vnx1NYhbIQ#hd8y9KTE0&EmDB%RBn&roUE1e!n5cJmxOmFh*X~<#1>+U zPKkVa{*ORX5!QgWG~6cbk1lqgtxEdd6NTztxn0SrM%44OL2q5NOiLFXwCg8m=R103 zYF%-A{5pRll)ot}>>~0RM0e;QMj(t;#h@bw6nNbgm&&hTsOq_27G z&kS}v^!t=Vl_B~@q&9|j^S9W}XEI>hA+Xct`fI$y(eZSlsF8@x!=LuD_pgf0xSu4b zV>4d)oZAGu_Ra9S zNd6t{3K?Jsv@nm&iS+b{&OwuSW|c(=yaP=a{Bj3PhAK_#4XZ0m-7ivGzN7W&(>=ly zMCpH!cnI{E9&}YzsnB|Um6ZLo_tVK_;WxEwqd|sUq#-0T95%~qqcZ@9ex?_3Tkk2W zYtOyAB@n74%UiHWgy|1SIinkgwuBpM5siX<#<6qG{oiK9<{Uc4Q2T*#zvoMFSBG=f z^GQ4PQgFmvuW5Vw#TvBtGhc1tZXTP>mb1$dN;HV;I*%BZVk%%TOhStnzM$jKEsm0= z`$n@6n#`X^_r zUlo3Ny0+|LgNbBU8(bS4BCLF>kB!VYy;~AihmP}!_Q>t&SWk{nHm2Rz&0^B+s_{Qs zHMBB1^~Coso~t77jvK3ygBVU`c8l;@nU@TFHLoT`m=Ly0er^F-8Z+jUGWvyewq;+H zRG6>8$pACq%&z^PkwH0eJq}A$6ek5ucy9Hm~STJM3#>!0dwYCeOi5LXT4TSu*e(okXC;>jcdA zYI&bl5vs-z>q&bSp`P;XVF&5cCP%m0N5I?SFx!cI!#`5Rr&Rv zqn*LF&-`4xf@_WJ0X1nX-IVMmSaXv~ELIzQL^L)E#f+0RO zlb|?wzGi5^Ikg~H%+--brZXl__~SXABeWvMa8~bu$ME`7Rqk6)C)KWSEIAt2ks3eY zgT@rnybQ|+GvoFIldr!XPCfTcxQW`Yi}Z8aSd!p6M>&JA{Le0%fb}2_y@nr zl#%i1v9^g%&9tc?N=7W_Bn|ET%ri9B{<&0;qQ57ZuQk~*w7+~YX71!ts9O9bHhK9E zq59WP7WPWEHbI7V1JVR^M=lpv(%-+k*MEmg^Z9w+Z=P2*A<@s!&F&G`@wwZm7M)pb zqLeER^L^D`2=d7A8@GnpQjyGT5OP*Wb)GXE*C_n`h0btMyp1nn$?fPS>v;h;+0OGM z_77hlgvr16y+eeuYR+3JYtR%4yCH*DxIm>otqQ9hxa=Ltopx*JYfV*zts}qa6JL>q zM_7}}ZRzNFV?`$He*4&4fA*=bb`5o5UDYnvNkc^gMkD&EooU$!(GL9KksX18a>$u=h&crMStQkpKQx?h>1_WldlU1? 
znK#ib&e9mkX8n&I#8uohE5W>+CIbECapd5pvQd8&Z;tjk)nFepf2B=M@K z$4kt<6U-=J*r)-B=jDnHW*Rv6q?s)XC^sjA8XT`RRxLr*u zuqeZLT&m~3AC~&KIxJr=<{4wb@+H1MPPMZv9YeMa#5Nhc(IiV5_ad+3af` z_BuDyWp}%aCvyKlU%6_V;LJZGE_+03EO_tE{JK8XJvpE;=G|pwzAid(*TQOHd>97# zy-8u{N=YTYc*@zK;a!j9TW(KX0lwVQijwEkr(J}NW1_G8dhn)NhG7jK95tK~0p)B^ znlT;4mFP=X(!Ws#7HH2xW3sVRDFM$e2K#1lJHE6qZ?Vk~`w;uq z!!$2CPen^QbM?{4sT^O9gp8cNrmm5(t^qM^1;3(=x%;JUEY!F4egd+TIiDhSQfrrs zm?(*{WMIJvt4n;TOL{J<`wL+Dim%xx4MBbkyN}N4kT|eVC3oK-MZ@ zb?098H+81cq~D9YS|3jF;zdk0pGrri%w7i+tLtYQ=zW(A%qLF6Rq!dZ(_x)CZ6!i0 zGX0XbUFVqgrI+Lcj=q;=$(QD+44R>)27ioHTYo<>au+; z_=f0D_Q*uOJiXHJlyI|9Vq6A-33kA*5Obk8Q&LtsqPe`8sB8saoWX05T9Og1}K&uwx`h_q4$U!{!8H zHu<$43c~-)jj334L?u( zSh?YBF}P_vP>Gb*w14GQYuc6T8tgq5HsXiQQ!2-k>R7*54N(h z-EYE`Pf-(X{lz4jq(*7E*{xYID|CD6VDVBe{sBoXP?*$^Nd|sW`z1GDtKN&o;z6gI z4B(sw6`781F29!vbHV9yI{N3N7v$ZlDnIc;A>wrH`8NjGAG=>N%I$I>uazb7=vIzv zcP|%z5|3I7kL)qgV2qpUSGpfTW{!J}h75LE_}z!L@1HWZwkzl=f0Dyq!IvLU{+ z&N<*zPmc#?!M8Pj+m1s#?s=e0j`*6M5#vz=Lkn46(hw9Ulb1aF^s~_ic=axam6lcZ?UXvCd8^NtU$X4EaX?Psb6-y>iM;5uiy&*nILngP^7cF*O85*wB)UL zQZP)%qb0|UaZY6Eb5TovMw{<%1rI61&aM0J z6+Q9mdKXa^HHCG23bT9Iz^kYv(narh@vvTCfTZ+1h!OwpDa0Cjc&0=O?>t+N;t1;8 z$XwYh@Kxyh87j-rE$5njG|wK5b)U~oosFb&(S8no;WQ$PYv+G-Y>Zu1Gs=t>+ zFvBH(UV(GYmoTi#YmY0e*K2x~(Rlu{YS-+$F8Pgp^U$p^8=vsKY<29)f^Sm%_oK(U z_Zd}qRrX5wu`(UguMV4HJ?sNMO*_3_zFA#Up;PEFDEJ{TK<|j(;V_$za(GihAc>w= z-P4tM=f`+Ri&DM)SQcm4@4U-unzDD+`lqY)zrKqxDQ)@g28NL7HRv#tDKoBXHKaUg_P7 zGNi})x1TS(Va*f>VLKDJmWQ9uStW7dqq7KR11^yJrJ;h_cmvxvfcpHp!MxJz{o*DI zv!{zf*@M;Tk@=myy}>W7cVED^hjG`NMB(P;*yLbu2vge(;*5q;S4qOVV&;u!N`-(nC(0|{U@doMyFo*bAhyBn=A~L5>VWSiS?7slWN+X2?Jp zj1cJ$cEuZ=PdRpfJT888}?@CIdOl{c2C= z_VY<6JZGizNYW0Z5yZAgGYL1Lx<-OLU1b;3RE4j98mNh>DcJ&Vk%%+g5B4CPYp53=j*T3wJH|lMmjn zBJ5EI0#kfsp!hT>POb>CPB6N}zYNL>XJ4R8Qol|Hm{xr&XEMsR2+%`;iGoRnQV^YU~{v__$pKQ34B2Sru;z@{$a2A<(Ia zWy94v4Z?eIkc=x5m#wE3XavPa5VRxhk3iSHkO5JVyl^lsZF0c+3OuA9L}Hdm_Ykcao0Fhsc%YGbYJRoJSOF4S zfU$2g6nhm6n4y%Y$iM!YgZ}6K{{arT-#8i-FM1(=?$eXbx7LM{gP}&W%MP@eZV^Ao 
zz{|WAs8cAK#Bv3)!CKMDVCfnQk7k7b%#stj>PxZz#47?EjBOjdz7%`34g=h^$#g|tvlI+7%mfm964dqno3rL8CK$IF>&V3%rXEveX?mmZP z>Ag|RB0E1MZ092k{>LboFjhE$Pna_-`aV9fHYFZv7Z}Nt?0c%MJo+Q+hlSvXO}K=f zg8b(hGnm$DzU0OivANq=;nqAyLokNMQTp=QTl(%7i?2FeC$LdGOSz}%=XAsGCKcE1 zN8Xt%HMN!yza~i)!kY1JA-KceaG3$y@KoBm?2@N|Iz z`S6SV?v>o#k=^-lc| zawmz#{wwD1ALg5qf+T9wX%E4)LLjmc;?-}-fcWk;?|#Ums0v!hOGx||=&T&Jn#jm^ zfebXiohBGUR&(4B@J+uVYp#%kJreF%OrX`KLq#K(pJ16t3}6rEKpXvXseKwJ>la1_ zXdWQ228b_-$fdcXcZi}2#M1at|GtuqT3%$YhF<)p=`Q_~f|D_u4$1b3=Vrtx_9MM^ zOQH~eq8n(6VLaT|$LnNSI>{t+ChH%N9`F2k9EVkWZW}8MP~a$Qycp7R&9%qm@_b#D z!6gCCb7DK)%5Wfxs|>k-)SG4_(0b2pQCEjmr1?3rNDbVxZz|_1bYbJkQdMBL`WdxA zNk#GIv?r*TG>R(E=2D}rbzk4HeoqtG<;o)c&Q1rv)05g`Q(^q8{&duiMX#D4OGx6@ z2+`8ey_>MFgzwXWGoxfo%(g6-xn*0%*!!&W?p0~Bw~aj24-(liQ7t`| z$!A%ZKRj`DBr%~EFt{fAkqXEu-Pzxw8IGn&zHtKMrSFHvPfXl((A;61*mccPPF8dH zz4&wwfq{0OYxDe7G=i#(rD-IlmvFbeXROb~@6F#cKPUXK=wqqvgWnoo6Zs#pMlsT! zrx>e4aQew~@W~C`aBkG!*Tm%am+1c%{F=6V&F4gEXV?z^b9Ifkz&AvLF88xtQBjLh zy#QhFwSeg@i}ag`sqvEq@WNpv=WoA50@u(c-1M`v2|5~Cv9XmTyIAge#YR2c!f1P< z@SQht<*Wd2ppNqXEJIaYC??*{`ohd~Y(&6_hrLR*@$Y|`ph1b}cv9BumL~w)) zm5~8u_uiPX>S;neAzsm9f5x^A-HK$yw-Fqn30d7Y;#W2t`l@f~h3U7Oo$hX7$$K(` zw(`x!7mtWdGcC)@gcaS;&bnT-?c~|);db$IFaLN>seV^qE0dJ(BNmvVFPpy8T5>Kj zrqCCiTAUF!<42Q%p>6XaiI9dH@Sl@?f?SHLGB7c>V$VDof9Vx2({3skmrP^fyV+SB z);dkSID7+}XN*th7US3^^*djDo6M{elB=&b?Y4AbwCeh~efuZ9{=00Zw~<9$i1(J5 z-q@5$c^=Jo$;P+E>ZR@1hI$D6!w~Xh6G!ii z1hcbA!x(`pe`uq~YV1Zub-d&$SL??NJzX7%_c*RnmH-qhPYv^>E@>A`pyG_hr%u1R2b!j#ypm8r^3wr1_=ScPGUE(#A*w z6X_fp9E@z3ebNcgAhj2)%LYI~TV`{(%C&Z&oe=KUXF>?$T_F)i$M zVeu|R_LpGBir6U{-`7#{&+iZy1?>gS{a7(ij)AMqn0s5r-XTp4%v1s@=gMMSh!YDb z+sZ3SJL$vqV1{Z9v?8bc0>}X6MKFqDfM}5i%~41?=m>rbN8*S5E3X{{-igSoDz`-2 zbY%tiP|mbp3B9l8d_=Ooc?etmMu`q=6h zZa1=H=&zWZR-ZMNC~kQB$m@GuUTVlqsP7^zy3v6Q{D~d})7K;odS!WGNDRL_N$O#M zOYy#;|1<&*fj-FkWmi?6Gcb-Y8Q0!&%S-YLUFGgI(JK1=-oZzMOY4R(m84cwA_Kv!2<(^57}ww^14F~iC6q`<-@SVYL%5O`945r2 z<_rchV6}E_rCu4m7?}_=S^^Loy>?34L8Y__Rg>1+ScaoI zMz6R7ST6khfN6^|eI+JsD$~k4egmMW? 
zrkzYB1JN+vyZ2!FK?8>DePa(@M&8N^=#2l|j<(x5Qq@mdr{S}kVzOXkS}|A~1%G%* zJN7bEbeRw{x2FkCv3Pq+a5{d_GL2^jZ97+S&F|5ZIqZ54p1OY%4YunLJtLm!<%kKp z8N(~>oGA9BZBUkW<2v@6)FSdvies^hNw`^useTS-;2EdQOi;eo1BJaVi*GpNvZ>xM zO?t0V`=yYqLR+n}xIFdYXu`Qo%x}x^7V8|*ua2IT_^ix9@!b00?DxYP@2;80euxW3 zoj3dlbd z)H1TM<8g!U>0YV;ag;V_o!H*Twts$PlI?*tm_vU<&8Bzr-vf`#3tzSqO~5f zKU+{+%-x)^Z3IVLlQP*q$IvQC%0s5 zI9DxxZK1uEu4u@dkCykqDr_Z)`@wD6gG#GtGfhiOeHJeCy4pWeR99;i? zKSHc8-i=T^PN=9Rur^)SDkz>#16dM#b`Y+I6*e;7)qFbXd#{3SFVb0|_q>MOMAr#@ z@$W%~o|+HRI)|RCj0_T-?F;?x>9BNk+)0<<7KlHQ)U(Qbr~dAjD|^iCzbI%Fqo8*^ zR&N-Uc{mmzOc2R)J@ZcoSBqA5GPl%_(+2Bcr?RNCKhziaI}8)Ok)3>;!cjrOtLgGXPbU>LuoqLaeBJyBj! zF7NU}*`^}9U}H(ydL1|;_Um(YVsZU68ir_jY?dLZgR5RZ&wKrDl$b-bX-~7Da%EnU z&YQjstn%ue%{D#vFvM0;tK2s!Oj+v|9XGu9KyDJ7UW?^d;?UuDmSCCe2&7csm-mT@ z*Wtd)De!LZgnD3h;$`NC3txK!8reps(cUE8tw)OmgNLQ_@~@U68?In7Zobe7YfCms z$GBObv?H7|F4GE`GX%hr?qB^#X!QTR|BZ;~;6lCL#ltP#=QJ;b(!>m(Ij9Gd63tn$!WY?!alv}x7gskQO9 zwGAV?Vk!NJBgSrNzaBn4pYTS*DX~Fs&aUCe#xD})oDPM%KbGjwO_iIc+AbE-=CplQ zq9ngU22O15)d)_0BvdY23@dVfb~GB$%C{Ib)jrFB;qdXkb(1OaR!t`ru$2DpcR5@) z&veDVwaTbk+nYr+PIR;7v3+z`-uTBaX#bc@pFT8fAI7lC-cv5gao>}obKx-+-hfs6 zGU~UEzz;GoWhaPM6EQUM7DDLMA)4h1?nS!)YVSs&38AnNB!zdcgbrTyqe_7HqGtD# znP)B&fw_9=6ga7E!@#g3`8I9>*H7*CD8zuAC=;Qf+d`Oid^PSj$@V~UYF2l@6Y+}V zkb|9``-JD5&Bq!=N=23I;Wwx(RK#x6Q+9~Hb9)h@X)fulWhT>{O7-me#TL;RURCz);X_z(Z)C(dD|~*6--Sc z6-Z*L?G;xjz`yJubF@VO5$g zHR!G!MUAjSNv^|ZJ%4i?x_8ZfabT^eTFpzszuTv1p)e6iL$r*Wy$aXIPKU!KZNK29 z6tDJoIG!1Dlekc05dZAv$-VA+-^7|FmgnP|Kl;>sui{HgB9^|J2u}anUgP?}-2M4X z?4ro2q6<+AHs`gg&u??evQPD!bZJd>7l9Km!u-bdJ4P`7S8RIv5C5INtteXjWPAx% zVj=c<1np8X5X4of%#WAQ#*TzDCKQ-isGN`y^DJ^&Eqo{@a-!^OJI@}j5J_W0;^>I2 zT_Oqng6uVe?#9@z{hbPXTD!Lh+*@1u`vtUg1-X7ppvM=2SiS^CO-_PAgJ`{DA9N-I zTtk3YlwkB>-f;b?5pw+O*1=&PWZi{{aHtV;ys8ieR~6vnZ?6&f;bS+F?%2x*BTW1d zl9WY<8ssl;v>b6$fo@0Ke8!(@96OqSPZn0%ud&yzKnnUwYX3$CP}%>gVXz!Ov6d=j zR!nbR7;eQ)sIqdpbn~oaT3?^10mX2vTVlDI)$5Kq59Y-J>Ha2&M2l4p>JFU}_`hti z>Q|9Z5x_r!og!9)K^!8YNhZOdE~Zypxp|?Y!y&`}9>$;U~*KsRAk<0HFxM(#=-_%HA}QF&mnryUHy 
zv+IoZNd}`H9H0%IDpm#8w;nfCXa$96eH!*dwob<~`w7gQ;A6|USyp@9($ut~s7QmY zfbC0m&-qcaSAtPY4H~rX_ac^dW2x|B#v`1@!vx4IQR9Jmy}NkDsO<;WqZ%Lm37@5i zMcD7il_B?8=lr%+1c&!Y7wfvOn6PoGAG|MCwQ0un-s})M`_M*~5?efvJq)bz*zBoZ zZ^qbMp>fmI(q9ny^|4r?k1^^WQfvTMN9hg44{m+(Pt&FGhMD**{lOA$gAv@*QU|PS zj+x+@dtcVu?!L`64+-(dqSaf4kvA02V4g;xyLCygDma=kegzn9vNeS z#Gu{`_uqr4fic~9(pK^sB7JOQ;aOW)$HfNB7C7Yfa@V^U)e`XKvF$qb#YbXzY6-w5VnXWu1urg|UywSVqz^ zlQm*O_OjK`P?l`Tdq(}A_j#Z9`FzfN?&scn&bhzk`~98oIYT;OZnSx$_(l{8wb}Ui zF-sJRhX{Um|KbHtbTZH01{Xf>BA_{hkjJ4fz4t0)$XR@WlpG06Bb%k+ zumc^YqDpX(po2XeJw2BoSK$n;({ZbcZ4>Lk4fzybZt@ULr!D9u{}?)*j(_>{Wq!Gl zo`VN=iHcPItZKBqnWgaAxyNoq8f(`dfvL*xDjo^#*m&rRt&;t-<;=Go_Iqx5c-NUz z*Khqquv9P~f61|RUipxP&v1X5*k^gAyV&No5LWM|l=R%zhT9qu?j#Xtod@-#0x~9u zyDXF9Ob`y8&fsjQ!nZ`W&aa%jKcIE~zTWGhw5B{7xp~mPnLRWTe&xR^e{iSn1K57rVEQvca(;$Wx9vTROQ zp6op^d)BjaFUL!5M8lO+;dP_xlJB^)-LK8+8f}$LJeBFr9WEiy@{7${&h+K09@Qzi ztsb$s(Dw9GJjK$}5TAYtwtWA9gO$H~6_FRKd_zdNf)AusC=+Qcc}f{= zF{P=QdaP&tZO(8LJZ^u2M5o>SgYjA?4oR zCP%?VRwAa^ArS8^!Lq4F@IU7*xi?Y9^>pN_b?F?C{F-=*1m zKC5PIU!EDFPS3XI++@3?rd)tg|7aHvISxohrYIXLhTB&ipj&n_wu3l~) zD|ixNPqWUOz_1akZ~+W-0G7eJJoAmQ_Oq2;rSZRU`2RBhH|_r$^BL-bFEPb)iV*d# z-mm%B=H#t5dnh*^4*A5n4!vR^C74mculB`cL|J#QNG#(REJMj9Dzvu!2x96Y+HQaU zh44bOi)^xFyi=^($frkBvfTHbg3!4MJpH7mLl!wwUQ6_RiY&J2ok{r@%J_3dc_c;H zuLYVt#UAbD&X@W`5Hd4{o$wN1`hPVUEFz0Vc(;hMJPyMzW~bx@%xU& z1DbDZmbaZ^?_5|~toF>`06rkV5@ROVvde#FbY>4{+<9+O^D9MpA+BNfnYjgnRd1(B zxvp7^V-~H`ZUOb-7W;)R4WwsGG9AI-k9&|sQzYxQ3~lDSpssFmJo_=}Mp*NbKXaT! 
z9#y7D;}b=_WufH*xTZ$03nB_4tQI{RX2Oq;$mZpA>Qr)P$=sX{^cIHqi}_%hWhfs^ zR?z3~4Q{bjq%d&p`2pcUb))a&25vSJ30sh{azV}L#?9E8^2zW}bDXJkeN*Sh>wR;{ z8a$z~VH@}182;mkI3njMOTk4FBsS` zxF|*+jH`5gDhPT}PI=%{)O=xvIu0@jyLANC9Hop$kQXmI8Z{1f4W%-0pN5yU7pL%| zUCVYe${&tAd*PeWC%`)HNw+2DwipdOH?&wuUDzE~V=o`iIb>_FQfF40_H*iffu)Za zksaqrcP8fEWEeTJE!{r-EO2qyqKjkjuWCyFbv&c3!Bpn_=nUc$U-s$4j1{!NEOuEPsyf zKmA<-`Xm6gd!{d)jLv)|Cn1eyc|<7~2Y6xnSKyMq$r^d|iNTE6UHD)}mu^=>c%Us{ z(x%HTRlG$FuN5QF0$rdJdfTexgc_2K)6Kr6tTm(i?nAQz)nDNC8_U&s8{~&%c8i=K z+ba6OFHc%nVAj89CG#Kq8UXM-fsF<~iT9WmrX%8>9v}Yt0z%>pP6rWyNmX2;J0fyuH5~ z!v}3|OWgE0d-A3a-46gk6S_t=#Ne!|1x5S+dN!W-fGP>`xQMb|*i9HRcYgL9n*#TM z+q_&D?VB%@gwPlwAD3xZfp{kiZ~^JNMkez8>9ZD;xpVHrWPmpPe_l$4a8sP}cI^9J zr(!I3K(q>Fc5uRnrof+ABagkxwcu-(44h|<25@x>7_rUBE*lH@^kEIdAG>9&jzp$n z4qy@bMu=J5wgdkgGAdTu!@;ZO5F-CPsy7w3PuvNL#9F#9CPxDWIDZwGh5|M(-g8>Fe3J@y0hw0e36C{ z2mm3n-B}L!VOHE-z%7Z!HWDbv>~Z(x_{Gx)m(RwpHTy%P*u05>$6hCAIBu#g!qqyn zC`8+u=2UYqRCz$9GqHX!bVf|qnuCMH!HWwtr=d*r30P7xw986fPlsu>AhcmeEnxEd zdNs{vHET=v2XkXT@sRS3=CpU*_{viQMV1(2r#FO_UMX>a+2_#^DIH|Tf6!GvH_PmY z=DGOHBtt_LysCVV2R^nk z_If#E?Z*-%@@S@7{V^DY6@)H`ezeqMH+(u@2TB?r-IpaU!_CF%s9cd<9`Yfz0w95W zF2Xp~7zN3;ga`{?@O{_>ru%|Rj*=0KdLznm&)dw+zYJv!Gr@ltn`$dd$u|THi$SxH z#F(=wp0_~dm3Y?%g@8()>zdGBUJrpr(kX|kpjEK44BNF^( zCZ<#lTy*D!c%}t$qgro$;3^q}%is;f!%#mAMLT-fZv$M>J_2`5 z4>#-mNCh4;)M0r0frqS^-yGoVq)>Owm3rPj0sw%H2k#>wrVy>qUO$HKMIDl2O5*|8 z=VACiAvb*DJ%=}*lBkshINT7W0EZ>(N$$W3U<`$D4*Cn#N+dAY_!#`MptjZs5aQe} z$e?iUrs=dffYz&vt8>{$fqhB;i;K0jAjf+^@%wnGPp?bnI?4Ytk&k@AhWfoK!FU;a zlT1Z(4Y!5&bqr&YMe3_AWF|wXtuV zKD^BWzE`#BDp~a~^fs$ohLZnz$!+GFpt(`Y*49QNjoC#>-Gk?gV$;UC>mEb|pz5gOr}EK@*uDZ43oiGrAFf5oqI_%^>8A8RyD%TUIj zl_141h4ph%Qxe;*=I|$X59Yd!Hv}Bqbg(1QC}Isxh~XNJI2w>kk#pKc&e1(Dp=ABMV?y0j&DP?5J7Or3VZ%wlO zKmxVW63vGxPy1sD!46soZ?i`hu7dr*W<-1MJ1&?-MQ~bY2uOw^iQT{jERIBYyn`E< z)k?=uFCo*v#S00r4)&@?i?~(*#;4u^00Oj0vD_qwMa8>4;_g|Agd{-G-zi%1?Nymt z9WF0lvBSCN_14uJQLo{438J*Ojt`)K04wY9e^hsFRkVL!7a;vh(hT7xvf)QG`+abu%cNakCXd9K-YtWX 
zdCKP@=`^5=u@4e(2i;@Hs0UppOI0Lr9WV(2bE5=U_RN{*F-c5ijz;U&{DxJ1mej=W zIOo+`jx4u)AqZEuER`m6nXL_R1)5{yTw}n6lFI{o|F$|m_nSlVSPgS29_Gu|WV3*& zr#RRrM_5VWbUN2iB71m`-_t!fIp(3@qUOo*5_SWcMXO>DDkqTF-!m{QMmkRKPRP7J zWent`T_9`Q>hsJbjjn+RvpCzlU7)^TT;|^@zqSeNy!Pg-50Y}7TxK&csISntJ8N~7 zsZfJ9S5;n^XH_;>TqhA^5tFB;ddoJYbsIKbS+Mi1?f&q_+)tsn)$xeayKQ@;?j$XYgXqIHw7^wOfQXG|3avHt6Z+V zRN=o#j-tyDqO-61tZ8hAfijQB;MBj4CN0^K$c7@Uu%Y#B2dK5@jv(T`evQDr7RxoBG)T;11NN-$XuC)tO+{Z*BbUQDk?+Ay;44v``V159+Mg=dTVE` ze=6P;5ZQ-Iu2FfcCm_!sey-Arv9>oQwV8DUXuqpR`~1s9Okkr*K5|iNbJr0P+gtOT zmM=Vgjr_)+tJu_^1E8*M(*BgwRe`fA?Z9NJ7ygT6<0?HrG|PGRa``C?C>Zf*+~X`A zKv04<5=on>I7K9*5D;=fevtsgeGfg~_edXefqN&-K5#ay9$H zeYXqu6%~H8h6IGwgpP_qTmzxBOM%tvf3YfiX&p^Y2xiY`G8GQQ* zGdT6iyQ{0qSa#MnVrl42^aJVf=ZCG9C|5@<&_kkMxMijG!D{Md)ue^XJyOY;-!5nm~Rcr}0mg&S#q^D~AI%3v=5Yb@9692O(i2kFBN z!2L}A!0VvTzm6h*A4vQ@hx~ICndpv>?a>+2rD_Z%pD&gQ(d>12&OgrlVsy4?%z~{> zKH*0$Cl6}eOI!QB>SQeQ$DM8LJN$|d6Nch+9=qQjqSc}E^>SHfkGwrMiYRVn%=`2V zArSJK|M`50j@*1%ag)09Y_`8RoPD6l0j>{vVr%wzCpvto0=i;1?_Hc1JJpWM`ITvY zBUgO`Vlp)PbyrU5U37D8ar~xME#3Q(8w^ER~dYt>O6WxD13i=&(?vPirZ55ZoN5ptXrpLOa_Pz_vSpJwR z$rM%Zn6$0t>5bL@H^=#eRWZb?O2A0NVRxy%!HU%0zTaNDCR*H3Wj zogMNwjPONul%)N%YuLPDD=hfl;Yjijc} z>$VPg+U8V)<7WFra^wd5mLPw(5XXW=)&AOc*J2kN>O#+K0QFHe`k*{+%rL6Yo z7Q(wiwZGORKhgZae=(SmkeC;j?dmCC2YE#G9M@*QMPoc4E0tT^@O04RrzuJh7-y@b zEmMoyYSMmAJGXHGTrNlHmt?&oD)iT#L=*?h`Z>dd zm{aQD?dG0MZWUkUvZo1L8T~`5)+*p|G3og|p4pSU%8p&dX~8N*W#TGmjB4>jd>Giw tV)pSWqyH1w|MjGednOOs$H@6`){|o2jE5yT;9mliv4Qz9>XEb8{|9zgS0Vra literal 0 HcmV?d00001 diff --git a/vignettes/batchcluster.png b/vignettes/batchcluster.png new file mode 100644 index 0000000000000000000000000000000000000000..7cf6765534c1a5de51a6e772c9c0d440e05a1886 GIT binary patch literal 4216 zcmd5=XH-+!77igJ5kab;(n6PBM35kwk$`lNq6TC{lu#oBNN7Po2M8i?1%Uyi8bpd9 z#ULOEN@$^|D2NH*KtKUQorDQ3FJa!S^?tqeW`4c9&OK-Cy}xgt@2qpry?fncNBeVv z{73mgAdsMyCDI85;sLlfl$Yaay4tC|X6FD?410b@AL^g%UW>Ww-1wd2wAQU!*!e(=z94}|!1UXhuzn3t7Lp!GNJHe+rDxV!w*s}M8j8zO;p^Q{Gk`5t zJp4f4QK;pq+qo37p2S<dk5ll 
z2ZQBNiM6{VP@NW18^trNGhM|uGJZ-NEoS8SVT3ADL<^x@5yX=wgN^cqVPbN52&9%r zwu~7ew?Lo^Eo98ae-~IV4EJ;l*Gke-s{Wi>vb|>W&YM+8 zDhe}+$WgVkUlziCc7J@R&EvicP5W_Mceoy%QH*X^RRrEWQ1&&tH!O*7eXgb&b}XtL zL4Bfkq*Z(?svDte{*QNj0zor&k3{dI^E_6qyaDs8f9B})LKR&|fX7!~X-CoW$Et8K zCjD~HP8o&qjJI!nY)oTJTx)09tX2oGj4Wa#my9bH36GD@*VN~GztGO?Q08EvHW_n!(D= z8{I=|ubiR{UN{SzL2Ak_VD2eG^qr6nw-F$P%~xhXIz8Gxnn&UB4IvymI1lqs^q%@D zSj>1g^L^&uK2H&d?MH-|8^hux8;9Gx({jp!L1H$XX$>4Vsnh8Q`&8dNiy}&wXs@3L zoeSgy%PkC)HmNGv2>Y}bmwy)cG5On8<>{tmWgt^w)5kJWE@Z#Reu$<*e4M$G@H0FF zEO8&vp&$yqgaq@XSi(I2w;&**P*aX=Fux&oK=iz9cZQ&3vz?}8mu=w=O8bSKrW10+ znlm^eBQ4V?UDHVFfX}Y#lJ#BTU{ftUVJHGQe23CiinOJFY5QVWCn@qzB>ac?UZ}{< zBLcN>zfx)Jg9qHqzo%z}q?{q(Q|)wF$NZK>%6<@MFWbl}&l=XroAnOYh9n-l!G%z}B6HY|`Q`rNLi_r+(^fegvZw!d z5=lqZE_Oem314sE<<>hmW|t#Y8p|I9 zft4{A=5PB3YY1FIO}RuyqMIf9M_;UR!qZS(c-uYs-+Jn^FUr{+I8V3ccmMjw(!K(3 ztDLgWS>nW@)d|6a$mberfdwZ#n#zNI^K+Ns^bFFqkDXAQKrvzO*3S4R#N@qo zDmDR%m_3DG5GNA_=D3nEUR;OFvRMap)&5Ua(QAm!DfTZ2uP9mV*DI4yr9|lUCs29v ztu1HpJUIigiT^oq_Bc=Jyuvh75Ayz+b^18;7X(zvy8R@&=aVSjVpt0AfA?)(DOs}T z`(cU=Nc4nxj!xQmJBHRew(77%;!T#qY-d)1Lw2)Hf`nb=d(Je&VMi4`hH5ZgJpP8a z`T;}>G5Lolo*O#GAOA2X z6Vc+d4yY`Fp|mn};hx)hC24lkUU+zfGq*`2SXR+=UQW zA@`~6H<7y?Z7jD0oT*3P@*OpmzTMS_hvnpA@&>Zw+s+&;lm1;oyEC3YlH7qh=`sck z?ep3Yw>;}!^6MJK`so!qrPpSP78Es0GL2V?bnjC`RJSg=c6dbhUk&M)}qT zSaMzp^M1U3%P(dgf4g`4E~qOly|cU0DtTcE1@8M^D1|1 zaL*8_6XrhRA6Q0-yZq28$iqC*aT)jCX0+^UN-1z2g_Dx=OOID-yk6THh=Hi!%FDa3Iju{Y zRKG4RUi=!%V^lx0<8y_Ia=l5Yx*PNEJr15Fc#- ztl>qW!V>buOIGe~w8hp~iJ-YB)a%JYjg>wj%c(0&SDt0rrsdQ9u|f$WIYHHJNOD`Q zG$&8llh^9ue7L7ra&XNi-`bV}d+MC?@3Wc?s>B3apGfTbF}}O$*J~3ZU?HjKwS#ld z8**?%sV^a3Jz4i{40~u=m$e;DGeeo*zH9W^yk~FrMC)m+>#S9}v+LBaN7L?qenhT)nAomcUYny~@|>#vB8R(|dFEpi@m|mJ=0PuBb$?B`Uqm`mzA< zv0UhQtk&#o@#N@PW*o!~A0)F*N~^*(C0K@_rBJ5MS+zBRYkgm$8S_`i=Hn%tMrWMd zd(}h7GM6z`4H2FK_ixF!5wGFr*O@)Qz=od=)&Xgph#xLMc&mI3d`?}a8=c8Bd=|6u zK5~t8t#kca^Y@Xx;9a+kit`0CHEyk0=x8PX)D$q-H#Nz# z8k2S2`Vm!O!Rb!Y&vw!vZXb_lHB@iL3^6As)%s#xcNN?UbG(0cq-&Bl@UF_*P-s}` 
zf0Uy0qH!PQbW;0lgFC3>m?=rW`Rck0m-?r!*iPVv&|%@R!cqE&0b{#Yv0L3!Z>4!? z*x4l=%qWv;wx}Ps?V>fdL`7X7w=}zd)eR4uRC2%0229|g9W^^AE(Y(U4o^JlYk^Wb z(#ujn^{tn;o)ekV*!)SS#kmsiut;oYszRqLH6(ksOOFQm zm>n?7_e$cj*5cgY@{16o`F^Fky1M~M5wUV!B-DkQSuvwQdCmt$u9?cfNX8Sp?pMhs gVU>0e@MAV_dPPb4qd&SS+&>6b=Jv>%v!1vA1;eVTz5oCK literal 0 HcmV?d00001 diff --git a/vignettes/diagSF.png b/vignettes/diagSF.png new file mode 100644 index 0000000000000000000000000000000000000000..72ae1fff80ad7e5483a7b199b8f23690f4fb9a8e GIT binary patch literal 25113 zcmeIa2{_d4+c%DiO5tu(5$>{7wvL^Eh|mzS zjC~ifuVZK4>pLU2?%(fu{>StD-{12d&-*^dF%E~B?{!`0@;N`}d7alad~d1T*t`4a zZWF6}& zj)I*x-=91i_uw^BQ0>jPx8`-Dy|YV(LdOM>N2LdsNoWj5N=Py-5L9mGJ|*yrMvg@k z2K|0|DR>9;=iz_NpEk8P9e?dO(MbF1H>Yve%_kXFKXqAHyBGH@PlsO~vbh&Omt9yP zS&M3QUbjfJE$Y(EnxK)hfrMGm%-4`DY8C0c{q|IC=jn_#Mq;T?7sEHrR0~IqDn=ro zYaky3zkgiy5ME7MsuSUsRd1r>M#FU`LV(5#k)(6(;LCxb{FSt}la3`EIXAC(G*ML?(!eZvP#QKEW9h}n*Va-kVbpfErTXw^-Kk!A)&2ffl1NBgv?v7 zFQg1%wDhDgyMor3-xHR}w0q9U3((TY)!3&ZG@L)=eP4P|?6H!N<%X>dMO&_pySC z+>)}3!>OI!ko&M-wW;?w6AD#$4UpcNBt9h#0 zs$r17HoXVR*z-O`Jnrn*> ztL>HU#Y<(YJ;Zf?;)tG*meDZ<0X6P3(tCNb!I)pG&0G2yjaS{e&BzV)O`XCE%(6{S zAE>HzTDS)y+RQXQh#Lw%QXpGM>&G+d7u!4KiB?6<1rqL~Kik@F$J*MsClJGPff)w| zI}-2rI%+v2By6KO)5oc@X0m&aC>3k&gb&T7gdFe_7K@hE7dY@xk$BzQy!6cie^F#Ki`_dqIXAlNhYCTY@^Ns( z3T+TEj`m^-@qQHs7V{Iq+Gcl{cnO~s9}LeAHMOIw-S4`K-Nmt*-(X}@&24D7CSv$P zVu#n__K%XBRt9C)K+wx-U|m`lMh3PioORDxzTRz?nDP1asROFj=fj$1Eo!0w!cHaR z=@C;7S>(cGbs};*Ejx#YKOCOgrQ)o72uuaP$Z2&@BSL`7ClOiZ=Qwevg{(Nj=1J4y z5$o!lxA47pf$eWi@-Yy9~714>`-jPAj?v~fUg!)A? 
zM+$|aNj3Cn<*KOuNC%A=JGW7X^oc2ZwLORDo&xi~P?d|H1V~w)Z@Yrf+}Hu-oEQWld(ropUM&mAGDB`oUMg%Ltr%Hi{YbiGaPvm{a;6UQSj% zcmTN<%x=^CyM?xi!7d*J&g$rN`wpiy{f8BS`>Ow4({%EhPx9Z}rSVVZ&W=cyz8`!G zGLtv+Qx0gVKObzha^ALk*|xy?AJ4$II!xY)3mAIO^H)?udt~#PXX{N<#2k|bMTy+A zE9F!x;&p;~?f}0RckbN|+|V2Jn;SYG12>#yKfBfEA0>aAfC@Q#h9|s?Uv0kr=7tG5 zV2~qBN4C14{@6~SMd>ds@cF#x=RVuN@cGen9R2$zO63rmJL&&(Q&Zubw++AJYtI@M}qYz5o5~y^gv`XM=0x=ogm;=Hi!RN#;f#`&Xwx*zaHMAmphgkUgKN* z`bpK1D*skskO-js0LIVU;@1x{t<>U zu8-mK0(DjA-zNZsa9!LEBAZ_rJ?u&c$fmCP|D7qLIzcW3v3+KBJ7Boj$O{lZzI_l% ziS65gcOixgr||TJho+Q#MtTtki1gk6JV^I{&^!};ZzjPeU+?^8SEB$@YQ+#fgTCH-Qs3MvEj+S$_{I5tKzguLnVycCZmV zU>o0=#EDX1(mblvF6&vS+18DCJWx{Cb{{cI>j!Pq)qirlJ+(F-I8r^|#A@OYU@c{r zb?1VCwUD2xoOU=!=C7=_!$3K6Uquf9Nvo}TC@?i!O)?^RKBUs(-U6?=OU&|@fKna( zk89USQ}&b7t%|XNGPJ|+zlrSCao4IQ>hvMUH@EcsaJTZBU9tKjp53~y% zTNXn1WfwJI9*k(JWk)1-8Ba>Jjc18D>It4d&4AFin`AwJ@=}fX2h-e3*NhMdMOkNx zr9LM%z$w);Vg2^FE1qvby3G_Lw!WG+8__P5gi~`E&m$44vm<*sa<&=5GX;dc9~pD&!+ey6j(jK`VPctC%!_QW;X)#xX_SI7;)+^;XcKl zDl|4z;XNz{BN(psU;6gT=ue95T93THKP_5xg*)8=2Z=4e7}LN-~HkOAe<$!9tZ3pvm3_NvKIZcW&L(h zD@U^awt2V9wR4-eY3les%Ajw5TGc5~rcJa`g=Wv72)#C1;y79vem^E593`ZwZ)quI z2!M5Ua=K(c(PyBV)fwzDT*GuYU#b?%4J~%zk2OM6Cl5hkS|;i|OzxIrzKUi(@smcA z_-zNpPxqHvwPSJP&KdfG-Dde#GVismvz(-5IxiFn$k3V^QNZc@&*JGdmb?$zG=B?F zo{5San`?2cjk3f$-+ZVT!f8n^yFWDbt-QDO_JG?UpTq%z8DtZQ$Xwkh_mM^W7ff-ySs2BSjE9i5KxT*&~(yrc;f z=X`rD@f6m^Qp2{za@ulE=%XqLZ_j_O0fHa z-FP8E>A*G!(LvJ%xScF^6ck3vw|e?UDQkAKvGOYP`i@*4%F_bTa)~JfdW{n)LsatNX|2A%g z&XDDYw%iU@PAz~9PgOIejsG$T@o$rOcj2D0H|7dJRqG(MbSBHj|xu6PUHS zTclP2-3BCZrV3@MYaiJ5#|H%Jc?7Ue>{kMm{8Xoh;{!LBC&2nLw zP-~o?IOnYt@oarow4HOtSi}Uo{-H9jsO;mQtN7~Ss`mE7lXkj*c8rcws6POt7}0Q;(8)kWZn-`EVenhN+@M2@=}5yHetbiE1GcYGIgoLjc-V{cn9R`Sp1a!ljoE zY#8JlB}vJR{9z7~RL2gHf4v0%sMB|8KLps-y{qAe@bQx`nX=0gMaUgIkx}Ao9PK{IUVLw0Hs?T?Ey-8{5gP)};cYc#C!a zI|=pC2#1?OoBQ$i?dO8h+RY)V10ZYEV{Z$q^7^bXDDl;-Wmh$aX_oPE^-&YoU*?ck zS{Vr>kM$h1moTP5vbYDjZi^}dpd5KVg+h4WGYNk4-dXbp5D@)Tlo`w<^ z5I@@fT9mZK&`j9|L=l%7tsl)y+&YX}Jue z>)TYej_!iD7)rjL9YI-f_Xiu;p%? 
z4XL}w>hFhPx?69ut9l!9N&{4LFA^VS{W}qZ4iG;iM_dp8g9(^HR&7SAAt$-qP-8PS(;E z*3B&yBKn3^d;)`Njd0+s5B8h*&Wi>h$b!n7`4EeheMBW$XS2Z~LBxsO%!j zaV!GX{L>%vm;;A(wgh9Y4)?5IUibhDX6C%0xg?D9T>6}TA|qH_*5k*I)#v^DYhOuy zwW|k%mT-BCZ9c()vQoGKCy)KX3<2xiq;CJU6uT{QM&Gc`Os$E+tX`XbT#|x6=~l2x zwx59KwCbd+&zGp9j%4_~J^IT8Ru03R1_r4ZMBH?wQ~vWM#=n;H@UOvu?d@klsj}15ZQR9vSZjzJZ@bP)9?h;c5ZCE6IGL*NNiuG? z(rwXqos4SN$-FAU)ladz`9M^8+0bde$~7s;ZnMDYY^$P6-H~FV$LftLfCC` zR(=11`&u8vD5vA(8EKKFpV`so1+uA;1O6MfxPWtAi!)}r$(?`Uwq?L<*;ZF|`HuBp zA4re%wdnhN#gl9uHENh5R5%|TtSX)75wh?czfzwsepSTqivrZ5P=~Hi;H`T-CFYrH z4U#73(4+kQZe@TCp45Qt9M{Kb+)!Pa>(XoPT=Mr^GH%@I1$t7~b@1Vx$cuLhFNXSl z`-K1{{_2s}+^q{ezF}bxjx}~|XR+GwwRE69`VilW)iZ;{Fl&yt0?ie+VFj-lTvW7U z#6i*0Uy_qEUE?F$9d`2&y7|N3hOSSP1aAO)zSap7ikpmSzyu7SSSg-3=n69X=NxVK zjO93Yh2BKQ$>j3IaW)ec`^~jIRv(i1SCGCdDA$RR@yLUzsRcT+YfFjTOWw9??%>q& z5rnP_xS5ru%CH0^L*To045^&>FtEEQTeN9!*kk%jL;N7I*DvOkPRCUlGrJEC?tI%q z&XrwTcB&khCKsG^qX!C6Xd zu9Xp+{;EEtZ`tW0yaGBUchb{hpK51sz2r0dIy>nT)SCP={8ZHNM5&qhP(ZhoNB)SJ z`00#MHe8J(xXlnL%pt6@kdF>r-2Qry>XD;8BK#@lVWaN$9t{%l=Q<^~*+xQ!)OT-O z9X61WdVO<@94pM}_P!mC!>mtDf&4M%S+KmUk7#RW#Kn$!UbuVGV^mU7GTurJXUV9M z-QMPWH~Ui?_8NNVR69q(Bex;-FMKZ~Hdydj0A@9Q{u}4G?NU9-O)T}V1!Yo-W*ZVF zQSuysqXrp3BvgOQ%(EqNGLI`Ehpi2kZ5G^Nezilwe^J> z+b4a3ZXQLFbMH!Ow=7QTTpZT`wP*hu?9iv}zmlnz$;t|GjR>9EbKF%sCbMo_kVPQ;)g(^%;k z(c`|LYPkf5+MV-SmkD>^NoUl?fPko)x>Br zEAMr4AxS|!(}G=G9c~~IT|R1n;8^>K%P@E+|1dz$RXFu8fHsWi3~_dcaiiAIN`9P{ zrdTndZ73;cTMw5Y>4|xe?YTD0(e>J5U>tP}u~_8$JaE0VOv}J}-g&ezZr)kw z?C%za<8$g);fR)%^Ds7zM*HbSzbJ{! zj%1WzwZ(8&o`zP8vs+5(lwCI205SnA}nlUexl-4`oEdm+RFqveDUA`VWTuKxd z%cz~XO!7TeH|#r7GQ&}aw#ms0>hd|h=PwS5;>0fB%UJIqB>0^R-cK-A`PRMI6H+N& zNet`O&+R92ghtvWyktvEs`XJaRx}=E!yl&He!?#!~YvrY>?v2B17GNQOhDU)+eH{FGr(#DsW+=b}V zNvE7#^G?z~h@xih)p=H5$Qpp^(y87 z_w@!sqo>!7j&+r3)0v1! 
zhSPK%32KMRmdms2aSxc|5V6H>OlIWzEvcQ7pSyPIyU)YN5kKe9;uEVppk-Ts6N(qn zP61K%)a-J^t(CB6tEdF1hZNLcQ5(yNhpkl>{ZJ2#_AoM43dXf1p>**onnM^S24!o2 zu}>MdL2=TlIO=Mb&^$D#PtF55dssC?lHI^{U@jJ=$jc!$ zHy&#sz^%~hSFt`Q{F1q272}&&f!1Qkvq=PPVpDoyP^PLPy7H%rZm6sVGFQ0YLNLq8 zJqD%McByJmdJ#jihfW8|ilz(tcY5{=d**fIiL`{Zw8AL|QkX;OUHX<2r6U=ilfaEq z?H1?CsA!>=pYJkI<`q9WwEnTBB8Cr@sGGAoako3?mfMpaI;cW)j@Uc*i5NGEcU|i9 z;nYAAhlC0itIYjSaJ$tSUUdH7?{m4_%tl8Dad@WZ>MIM1j~XkT%g#@sf}cqrw7P)@ z{Ag%*fWr`$Dbe=XpZ)+bqxr76kJe&oi>{)0%~L36dbL?ACA}jhgQ1iLbj;A=T<-|m z_-9Dw!;s8!G7PwDUS1m_j=N+6Rv&Zy{2Y$GTPRi>_OltDV*N2mtE#Lz%PjA1YLg^e zvY4P#fOV6GPtEjvk1A`Kp@lbeoR=_xj+5@>RYr22$EjzzenWVJi#DD}x54kj zsMXOFe$fKkl=F_wo?FCmVkOGtELN`|39>Q^%Xhn;L7Nw<5a`L(x#1N)$Mqy@N_^xC zFO3;A%d!u(vmP~J(R>fUMyr~U?`IapYc7yo?sq>ZtRd|=&^?=NQg?CIo!}lAwYG4~ zowyPfApM5h)8_uV_jGi7*Y+UtENBkADuw(6=zF1BpAl0_#t(R_`%1%#3n@t@%&NeB zU6l`6Vlhw|;Fs$(=`q_fT*Vo%&@s>JWkaQ|2(Zb(AXrtTRy)JG9HIma8!0J4eFc|+ zij+q(6gkE$)rG2hu1?^i-R1^|IHfM&$1seK-%$%CX!+ay9SvnsOwvpK4P>MB@j$Qz zn4?V6I6?}GzHlbp!@Q*Rgn=+hWYW9bk_wD?9|@MEnp$XVE1h=Ig z^;%!bD1<5ik~I%k)0o|jWF+5XTUFa z&Is1lWVd}deCA0H0Qt;F?4T-nrq?$zW;n3OZR)XAFtKR4v{YT(yJ5WojR{! 
zsxv3{!I5sm9k}R1Ei|X@>Y%&$Htk%Q(3Kz1CUgo!L~bUoaKsdw$I20Ps#&LBI=gTe z*LzCnL%y)AMo9#5qg?0U?(Ig3TB*^VBN+-a-j&RziGT*z{a;0mP-aBfOO z6R)T%*rwIrJvx~?uKbJNY@oQQB#!ithMkIXUHqDkmt80|!!4I)bR$NJ*G9G6JPTYM zjHHAY4F57*c8zS2u!78rR@fogwtBdwF1?J))Q?r{D4f=17uDI02@D-3dg}=Z&1iJW zgy$Sv?RjlzQ#|(>+7uI?W1?0n@0=BJ>8S;AAdfuouxPoNPZ8CP_oy9`T^(e^>m3`l z?vnFPqta&%VEB+SbG+Mhi5)RAI)AE97$cbHFKUB%0x=|pxWHEHK3YfcwZzlDTfV&H zWTl?-Ue)=p;Z)Gn6LDLiKl)?#3pi(gAki^#pt6f>G*G3U?Sa+B z>-0tr1$Lmg7o=Ce)pj3i5USeTx{G(^P7SRfOHL#htVuM9OeoXzQi+3N|L3!OoZ<5G z?(2<8TAuTr1gl_N-pqV$RiMXd1*UgzIcp;o*f#dOwuxX{v(VD&Y2ZJz0=V}lwZsQ_ zaG$JC59bcQ;5b}}3vqRcCFCtFg#b_G3ZDYs6nQn^=<;h9tdg=in9#jg>YGTO`o>Id zck4IHsIczVIXe2XYO|Kh8F!h}#z};8-$<$s8CTYR)0Bl4wbx6UZ&4iIvV$4bUS7Lyqpc}`X8`?Ta6wOr*-;fQtDdqgL{RaN7 zJhFr}aIMQnHUG&MnAH|x;iTrzug1ulq$3~Vq8nqHxU{9R6w?B# z^HI5W)X^i!J;gCeS4JbhEx=Ljo5a^g5FuWukC4^k_?|JF@~B3W{^-+#lwu~sYA8W- zWPrdOF3>FQX`N&1zRpJ+Q60o!zF<#2Wd9)n>KXtZ!>`n;$w5*U7fUcBR-IEg3rgp! zB0bRz7!gpCJM%$Dp-(oSiX)r_Dt$Kp79l{(2-=Z#43vs6xqa!Ba_A8T7pPW*jz7;* zl8lMUMc)#T)fr!|9q6+M-x;bR z=jK%h3Ky+|Gu#f6Z_|}vqeV!hAK{+}`!Wqzp(Pw5lWCzY-mm^5xCS>&WfCA)BBfB| zQh68YsS0uc)bfIkNU3eY>WiCI1sIgAQkb8w3%-J|(|Hc^#RSnep0F0Q+L0KIH3kV?|r%MIF-?zYy3xwMbp%+g3v;k~O zc2k3J`)TM9ep=-+1>;bS!6ow;L^3$-=~6HO7bU(<+^Osns_XBB=Uxy6QXo*^Kv?=NN%h|#pFx~uhRD?&U`VO)jJDH4)n2hX|qgH?Gi$bUa0qXY11xH z0sumSTn3>3%P#*iA=RREfggeFGKV26wMHp!P$4@#uy^7PTITD2vp1ytXUN_=`KheS z0(b*bo-)BNHUTvqq&!0`Mfo0(T7dFk4+tr5_;*kM<&V9BAY`wKN}k*YwjwMjuB2ck zU;)~8ddd{BX~66dcgc(BLY@LA`#di%Io<=$K(`Wt?&m;rgClvR1KwjLp&M7O%4{HK zfjb4EKs6shSInE!_S~8*B zL}_V0X9kzABUL>q-;!Ijh^sq5@^GZ)Mt)~gc`7>Y`hcOTHEx0Nr7H9Q zEkpv4h;)Z+!mc&lA2<24)gVcabI~NBux)X+wB~$Pmfjun*pW7e;IPH*(9_@!3~R2d zg|gvzjgf+}c)+(er=`OLt7nqpo*ji|l8d$%*KkkOO$qgjs?t07a$qFIq|vyFn5Amf zj>4r76sK>)u7fq7KqW`UV_x$hxh(TwBIpE92~fQCfE`)K!4xi?lW>t8C8Yii0aUqZw?-5cp~tanu|7a^kBS z!Ni4fABEfC8{TQj1UltVUi74(LqR@Q-)Zxqjmy+w>TJv>IVSb*XM)!IscH)a^(c0S z*~P?MfAd$+&Cz&gf3@jdkS>M=k;xOn^a*|;S~_f~)|_~RlOLiUoQ7+^rf?oI zaT57L+D;1g9<;$p-LklWypQf2-wQJ}7tEN#4RDPIyzf991%(a`umR&v(0)fmCAVc~ 
z>4w?Q(pDIZS@;=PHPwbbv~tpcK?AlykIA_`c^xH_KW@_u2UYbuQJA@TiWYy_v9M!; z(Yf_)E$R@&x*owApkoDQW5h3?1KON$iKvGLb%?@7jhdE?l#HzmvgS4}#@fN^hdYII zB8D}TltsRS$3QJj`33PN1QcFS98(1S4hWB;*b7Q*nPTMt4b$qV2GI&DCPBg0w|^t% zK;qsBsFhL=K)>DtQltbw$OFK}^}u|LY7|5Q=^$r9Gi})a3*QD!e4qpgC>@|!i4cHR z_kzp@(lFI}S1IC(!T|fKa6)qaQvn|XE~eCuo3cT(L83hd34%fqzlj3z#Z?iAp@3L! zbo4R?u8;s==}kYRH2{1;{-a8;VK7k;dqB;3$kafs-)y+RL6sWd>s{)H!yQYna`OF< zX3)Hg+kZ{(eTIsQ7pfFJ|2BR7YT9%0)4$%_7+A7MZe=?NWi6_afWTYLxnV1?D&$NP ztnG|mzT>vPL_tC0N=&dH`N2S7L7NrHWIJZ-2fzGHvG-Gus|B^UxM_=E zPeJylM37|kAE4s-g*W~Y-YOg$@Kw3Y7Ok4zio)s-ZW>AX&~HG*QsD7*8X@DisTeg% zAZb{xgL+_7oWBsMpauL)eM}Vg;yyJvHr@gh1Z?+jgK}cvR8ZNP-Ctr;V&KwYQ$=bz z{Y^Y-oNys1PFmbv{{~`S2sB^&+@ncx?37s`Uo)SlK-&AwEXD0qb$QG|Xi;De)_|gX zCg$JJ;jKCqa2xRpWBd(+R4Dzz>PCE=T}*qvAoc4_3VyD`BAgYxJB8m7Lr zB|N_H-A8CGe7|mrg$r(ZdwN$%z-^gVY2ezg8!Z%MPD@RN`Q7XgF>Mg<2Uf^JJ5N|c zi2Ch4$P*LQRc+HuTk`>A-h)4!#>akV#b5HvYs!Gu>Q=yiHt8;Ra++o5G{e4=`AxB> zf5f?cBG@kfis=nZQUVMJTkc$_rQW&?v%&7x_3&Iv(cgvr83|xXQ2-Lwb{gxUy7M1| z9dz3JSFmp-n&5#ef2A&D>W0^#{J>HM%}23ESNSrpa_HY;n73*RHYgt1Ln<1N0V4j* zcdzW;fDlt^0s*;&;i$#}MtU{XjY^z+3}t*^PEi=J9d!b(_ODRx`i)lB`KqxHJSfkW z_eQl|q0F--q8Cwo89oIJ2~p$|pA3J=!WHGzib4)C#!x`BhKz5xd`Q6fHh_74^mB}g~& zSNK(-0q}hqDlRyDti9Ttx6q`FU?B+%OM&jz6GJi(*ohkURv}*UaX8G$yDZ_>vmQn zdSPIc!-2BxnX0!C#V`24XEB}7e6OstDS~jrz_15w`y}HhDo=n#lPlkN!*r&CzdSpT+;;p=d2DEP3;et3wec?1;oL9I%cCv} zMHdi~Z{3<+PEVDO?tHLQA@sh-_;VcVldw)nQ&d2LH}dGB8uH-J1A=@BI8%KS1Se37^IMHvyCJfm~1YZuVX|$lymcZI^%Xb%8(pBVh=zf2BY;;|M+N1{m zMYd|&6fHODpI6naMH^mDr?fAjX%ep=oSYelTQ$G-@&dT zN#}X|wOovzS9&(SJa%c4^pka$B5?D|r7>omjKq4|&)E$?&?ig@SG7Fez?Y`~uoF6R z1Sy~p+GTF;#3P(W1T(RRGpG*MtNlrEm7lyb%UTzqDAb3c9C zDLGc)X6HGKh!Yco7q^l)y-5PioQhWyqr}T&3SO9{Q<{Q1GSz8>eY$piccm3Jxv{Uu zJT)hHuG6B?k7HJ)-pg@!PuzD&T5aK;9pmzs-Rh=+sc|1Bn6(<(%K{tFe) zI^UB-o@q}HENuv`KaQ;%o+^i3lat)>-ia<#znHP>_0zUM<%&`d@yd(o!B(CUDm5Z> zn?utj_wmqVs#ns`yeH9>w7fn^`#$P6Jq=9?D@%F6)m=2>Z_d)tm=t=QapYE{F-dp> z{#n`Rj;HA-)ac|ScRslBVY`>3 z;t?8}Onte-|MI0ibI^I+2+C3~o^_+Sw10=TupBcl;bJuJ%|9A%_)=fRE$aNZ+x)o8 
zDFI2ET^8P|?r(OI)Q*GKE9pkX6zYqvb^GLp&j~V$PnS==UOxFpIfq%CCiQjC;)zpT zk2)vMW%oDoVC(~CXg27DJ|p%y-()H=a9@m!;zuwx{ke8x2<4@5II>4ibP-5!i<9Rz zCr?Ssm5r4ePrp*05<2QO5_|Pi;65<+-dN1WN@fSkd7WAJ$^g-*qMw<+MO}#>xBJ2V z0zY~b5Dg3jufvE>)RjWCebRSzUHj*1m9+f3YW)HJNuU>(JQn(&fjhmq8KFPV{44*w z%v#HbpUdY<@d>sfhY_SS-w4JnOGrtZI?(SwSQ)8$-CI9DWMqy$_5tVRfhVgDQibPl zzFc{?vfBZ>8yEa%?CJKG4(Fu&Kh*TJ=_iAFI(Ca^W^PrXWvu?p6aRuzmpw^mW<7rb=V4&DH#r~Ro4;~K540&K0CJYpgXEF0B zXc9R4*CYE_E?zk1ap!pSu!KbJ80J24wu*35VSch5K8N^*w7cJ$rdYFhSSuuCX?ke7 z@#NW;V^QoRrW1tSvd>IBWfg~BeZxHW^WBGWja4;M#~B1GzWF$;Z*|&0>}kLP4T~y$ z)V2I6^^45>(d({7Z>JU=&n#KwaPLt35KiKbaF#j3YZWhD4T}?b^E5_nW$f(I zGM4>vU8_mFyp%ephWsF&t?TV`oqjpP+AQM3yPlKEo~zGG`)1PKCU~BqO2bXqy;ypY zE$Uiahnd(&r{{^E{!i!GINM}#3ZIY+aG4uFA&VTR*j}6%T>#)5`uIzj#c#x7x6pRTp!k*Z-qNIS($z?d-lh?h0gps zA+e$3=5q_(VcE&P(a-LjT@!jAXVdzIiG8Q-wh>NZoopK=gSh#ptUGHSG_QWos;$WA zWCj_lX4T=1Y)~&K5$0} zSD#ZXC)TOTBuSiq`%veiUF_E`g4!e(GPxM<&^@_kjP}!O*1D$Fh7pYKzo_4o*xV=Z zJv|kse&O<-NNMCb!_ZEEfN1SRyN44 z)=4M6JL#1geDcR9Ckcfnaen_Uex565B4bxlb@EzB&Y`0!7;kiDTb=N{IVR7$d$8iL z!Wi4-gNU3TGc&Pz(|Sk;xoTv{`IN{)6y2hCtjkJBK+Zc+%bCdsVXc541{N7$F`f>=t__k2ho;ycfrs&rzM= zWQ{!7{i^2TXjBomxf3$v4H~yf@9_rVte{!?L|M4DO|7voCmO9%#Ea3)IO=d`U|C@I z?RU>ZI!#K%k3=u-eCy$3<&+PuJ*wxbWOK!0nq4&Inrqdc(ehPs9ECI!bwUe-=2w%Dhn=VulM%`2jXe;c!`YMDh zV6;Er^l+nA_crnQ;)%Sf4ANYjU0n#HU^|ZRa|cTx$u}5&C9v@KEJnKGLlz5R$v^_h}CXXE%53h;|Q9V*CatPhF zPcgY?P^=2g7{?hS%*Hjt5vo0fjF4F3lWNuo`Q@ zdccQYhQ zk-l~dMxAzY_HnR24g>0W;zh8uk+=!c3=^&uhY@(B6mrU6(-qD@uM zBh$N`<$7vjvPKhnUo+&3u3<3AvTY97u~rUZUuvrwKVF4@NyhD9^_Pa}Yqh(aslj51 zO#a@fQc+rV{01Xm=elrcN~_&=VoWN#Eo1bF!@LAOGuYde+D4nPyecD0SsgpG( ze6m;*DX>#V>KOA%=cf>p1^QTm8!3MYK6j6luQHP_fP^J$6V0v_)+K-6b1lXG6k6c( z(j&JuB|HMTx&Rf6(Op98wtpz$LO;OcHQdN|y!8__md@5+=f=x7x!LBr>D8_E>oaxT zSYow{v`}9zP|IIXpD%F89$n{IlZk|5q(G$M7ALbC(46(;eGaw4d>MiV#{kvF4|`i$ zJ-i+IX(4-bET(I&f}CCVAVhz4%`Z};&>F?(ot@GI6hj49&HkuU$s588MPH28k0Ymj zS`tPpx4U74%Nuee_IYs3wV}1%72JCiBjAaziAFD))AY1fd$QRZoR2K#_!(Z3*UnF% 
z%R<60757E<4qvoXh(&`c#O>p8Y1dVu}W@sAAF!o?BXj3p$~Mg%Up%8s=bBYbFA&4&a9Z)C^Q_yN&R zqGysq-~;OjsX{58?L1Yp9zInQ*js}Z_Q6I%!y#@60&^SH6M2*tXvX5zt5AB->y!YemQm|YCok60`+J$f7`VH4G zg)f~IwOAh-T^MHah#ZRis(}o7W`G`2WPOF1sIT7(vmVBlpcUXtPIe(Vg4z7xxej%s z0uQibf_K*()v^KI52$0Rkkt}Op6#~1+%(%G$DZcy9PVvrOyeayF^tPqFgpDDq0sp3 zqlvamfxQaYBy9!Fb=P76e%N&}pBH?}b{`xkD|&4fXUloa33eLS4VPr)4Ze*;o)~k0f(*ew&F+;zr*JbwlEobR`dr=DK#3Wbu))U8(*z^!Ir6 z$qJ>OVHXm>6eSMrI(Ky$I`n{%XLDt+8 zWEuR)FVCw!qnw1iz8jLBm1$9rJbDpC81WyboY1|g5b1|d0#U>)cm2KT}w!{E|3q-W*m<70Zo z_hK&)M)pTBr5bDKX~v#sj}Q`1cRKRiNHnYMoxjr;MJ6r+-H1yEnlWR&=Q}t5f@oFl z0($He>jb;4!@Y*_q$v1!+sh=al|?PqrkmJ4X>)};7GJT)!r-)Wf{_e;w=w&0D_TpV zEA`Hw27Oj>!m>A+n-HN^^6Qf*#r;oSx?e7`L;M`stE_dc1@BcF%8ho;n&}Z5O>MQ^ z-+pMkO}G5sYnpYz3#K_z?m8=LI{-fyJ-?f>=)tP$DP3D5yp}%Hp&hCtsOnaYc|Lr3 z&vR}V0n?Q5bt2T5aL7~l-Li)ku5ZP5?McVW$)ylRtlfPW`i^ngLYso7W62qO@)gII zGw&U+Vzq~an(u$wSrx$ptFpqj<8@PCzm5XTb4SGKgF}{X>1B`}NMjezma{R6b%wOX zJiDafFWZ8%ztU!#?E0)SER0<7Jhkh4j&*Ck*Oh*5bltaw_`UZEJdZplvZGxiW1Hy| z4yCP0=si*(&phpoCNWh?E88z~$`*NQh!eEQ3uOY+!Uu&<=q1e>_#{rkj2(&AH`>{J zM0fCG)YFRedTPeaT5>F0GD+X&OCFX3Y*rp**S22nmlf;Edv@#|X@))J6XH?f*PhcT zEg#Y@%@Kqq>AG5P|bzjIQrw!s?UZyb&%1q&WI?Yogl}M!?oJyk$Ib`Gf63xga`VjVV z4B^V630j&S^umdOgWdhZ#HF)RZP|S*(GF}!`C$2>R-Yu)b8`C-Sv7L^$`-t}tW7-V zt-9%${DYIPn@fnw;&Ks3o)6HiR?VccRs_L6+r5Ug5v2wCE`2mfV$`5wFmEXT&^H@sNO{GkV?OQ0vEd%7>i}4 zshn$GEs79X=xadaVWWrc<<4+pLgHi`}uv|KW3iKbKUoS-PhrJeXskuf3`hmAtov( z%EQAWhB$l1j)#Z$4*1>sh7U+i6`~WsLnP?z#p^sgyQH{3-aa~ZgNH}zCF0Bv_BTf7 z>8C;_&+`1_N%W2tq`G=){UII1OKsw3nu||~pOHS1f5WV_W{NLbNK*n|ziNK4-*m#m z#f4w*g4E3Qvit8;(h6TjJrz;LBT^~KBFIz$RRlU!8HV6X`|lzt=>N?6B4P=^{D-s| zgKyUY1Gb_niO2i83ie#|5AR`r`?k_^rRJ@_qqe?3-9@orzv9@^;LJMlYTYxLNSs1Y+rz}5Ae5ZY-P&zsm?a~3An9OPD&)>n?iPNKntvn8qhlT6t?=C9ZF(*0qvD_ar9CODMD!BdSl3B?4${SCeq{^7Gi* z-ph3{WY4m-i_i+{3b`I@J04;2s<}XjgeI0fmVt86qKbD)kCK0PY7SjD9qL+Xf3&f6 zr=rr(yN>woeeCnnYARgooL7G5NXWjiXTg2}VNJ~>m$25*as3X9?=|oCzPR6ZY8OEz zRw19XShoe0hFo*lfIcyQZ*<@seX~WCuk_!qtw-;1IAuz=Ng-1FsnOyL)5dseiFxJ> z=3)XiD0eECxVKNfl+ 
z%r9~Qg|qJ^7c*NH`=`v9pKL3F$E3A}#b6Er_8Wi9SXq}1Hgb0&2Rq}W`{|`&StX-` zPnsQT?K$Sg;~URPSIx>t&FV;tOk<%1=YlJe9nCs(wSFnL>i9^rvU5?7t*l1k&S57a z2=8K3&FTid3S}=kE2JFpEd6e~rnMid7O@A}`^QE&l%s!cKQgS=@2!u4gv#u~lL_-~ zpHTw7bQ>R*61B-@-7J|uePSadvi`~9p^z2A#B1F97A*0=12NL{+Ws#!f}QNyw__iQf+7#$ul9OYk#qSyoU&i?mqT;-zxo8d;XAia9i-Yf8`3f zyw@jRuQ&W-Y~$clnA^sX|I@kfo1?enSKoXCBO`^~bwAT}F2=4h`8F3S`AI*dP58fr z+Z2S;HO6_$CHfrrYe$n@KmR!H`)J^kk<;`3dUFTY+}c@(U9?HZ_p%Ls7FV(f`{!bK zVaKf{2b#t`BJYzIYB+Ooin<#Dor5&N^#dIPn!ao%AANr@=KIEwvK)SB-6<=+EDABy zbpNV*#+~rm+i53fo2L_(?B?Y%njCY~17d_&9?)R%g*Mr%xW{ce(TJ%7#ONKPR z?n54NyJ@h!AKBR{kDApGdxjyv=G{bZBbHQv26WD>ZACcCzBcN8ev=AJ1W6V%xTkvw z=!Z{Tz}&Iq?gSsV#k`sej(6fzTC<3SO?uYw&>JU^wIy&g*-GOnV)iPn;I=reO(w4m z{--^u%(1~yMGryOGxD3c_vXP^lGRu=*~R`6N;KOms)Hh+3S*8DD?iTbHs-x(_90Gu zE~;c-Qe)90U0`?-WW5Q}B14O3-XUt+z~htgFl)8=`8)QsWQXsx$bXI9vrL&?^<7(JtZn<4=#6OiU}0T?B-+wwJ82`2XIe)^ zh}PU(rPzEIhr3pAHw!n!AA0lS9wb@{T4)@N4(?X^b@PvzMYouxW^-uGtlS8ZuqZ4I z36L5p*QggWQ-6iU7&= z;Sk!K`DfzW8pYE)IbQSIh2#SksB+^h1+;51!9qZ4uf~}UlY>6C}#GJ z4#*xri-0P40u5i6aB<4j2c$U$*O6bLi{9Dn>k_Uquz(9}Qn$P`H?2ELxZ3_kJ6OTh z0+bb&=>BCA%D16w*5`#FH+J^u(FzZ7u1b^~?7(?v{~B|!oisERuEy@mqq_6a;2`Z> zb-*UfQnuIMCV)A@w8ApYzc>9d2>~X+O}X*<%Zjf)?fCd5NIOBW&OiBg+_)g-OC!jU zHfU;hZ#VWTb0a|)CP^cF62t@Ja>Z@Xsj`B!@E4;WbDF~%&cy+@?;57NQ2eZN)RFD> zX(v^a?gMphVzhuNgJ_uzX`oJF7Y%#I9(fD|!|r<}(uUyk0vmCLfND{wTGpTlA8<)l z428KDT0a~m&I>6giK2oOP&X`To}avv5q5U4lj&*czdw+N;Fp0n{Su`1$2B_y096A+)3$xS%&N#~;_ahP4aq(o<$`O#J6Qx-OYvTIL8nUY zpn@BKR#;@g2|VKGmJdmj$|}NX`iW`j;}7z<4|!y^ej<$@po7!|)AS|_s)DMVuMZ3A zU-ALTm0MD3?cDlMEUp}KRv3WxenLn_B&qOnHrPv3iqw0mQ-6k~24Gkp-Hyp?Zw9Ev z*g+hGCc9&Z}ys*4k5%FRU^&Crp{U}Rr@qWo-WpC(RniWPZ z5%>x=%Uu`0n4Ce15#Kh>4W2N`)TILCF-4&co_v!-j3(Hg50MVyNIgENLQdya;$z1M zC@s|m3E7vU&d`4x_-CCUf1QYH@Xt;=6-Z4g)zL-pR0`zEdOkn zNhrbLd`+->x3`AP_iArw73vmn(J*je-xAtyCQUfk))Rt zT7R0E6o!gEFOKY+(V$;STY=zJe z2|+`ZYZQ}%q8@}3Sf=}70w5ka9@L+uV^1QK5%abI!-~7-JGA}x!2y1yTV)89>BAQ@ z%$zvCnH=m`hzLmlY}!CcpSi~@9Z=X3J~N$b;mfkJddB39o-y^TXGq*x^HR~kJk>M0 
zEHN*FLRD?1rF-H|hc^D!pDKHm_I~Ym1z1mCebk5rcV^-G{ zqJf-7@qP*1$9(jv?Z^X%DK3X|?FKeo?*uCP;xT;#vS0emzuCoY6<^H~_@`O+tmz}s zoNu*#7XwG8Be2Xlqv+_)ZH{=;9A_nFvv?XPBc1m&C=KPkQrX8Dy7iFX%mdeD*m8Zz)S=qTnS*rE@;y5y zohHSnMPr1N3RZ4ttzz0A!{{Ta7? zD>RRi!O4t|=3PdAD3!~N6$lvx{{A0b`tEJ@l~Q@&58Nr&60EXw_18dHi$(hmZD{Ii z48>!98OY2OLp4W)i){zM*Q0L$HTAm47ZW37*0~=x$HVxMIT2G3=_kz0sfIP|kQm5T zy(sC2#g9=BX6Jg^)@*^G6OS1_)$rSd1h+%7jnj9T+r*BKtiBv!7qEB67UMbYfHCbdcNmZ}f z;_sZDO@(&@=%`fR1pn<-H#2u53DV&jR^O=NUrRZg!XnQNmmEC=HnX}S(xMDL=Yg&N z?I2H6+rhS87o)=}aH(dFmRhdY5UM!fT{|vMz@jbRiR67C;KrE}cOe;`_NMe%_l)Ji z+;*A-GR;KWaV{+G)DX5uG*i7Py|1-M9o6R&C~ji{v)*bP{@ABzCEUDC8@Yzja#?o? z#B$|TIF+q8XBTaiM`_Xt>U93PY{N|o>)wGUI)R%yx5lbSZ;Uco@$-c9yyF zbbOV1>Co;jCDP-b;-dUOg%l_!zyf@XWTQ(tV^rhl$nxEcUgQeZJ)9zQUiOVne&*w2 z%nw(@kx}DjbywGx-P)cGGj2c76h>&!Op{<4zSWkU4*8(~eAPUV^E9-c3_avL8uOeh z3lY0?9BBX$>MVqGtL*SGnu3IWfky2$W=am|04ho&D3YpForqK!-t%K-jJ0=NJ=+z? z$m98s?eo1RrJ%y_j|x~M*LN zIYV z5#a3$Yhm(>W7M7LyP_8vB>98@a-LuR$%m?Sev>Lx`cDVTbiypGlfQGhK`WrIHv5$K zN?m)e-|kCSo%@K7N)lB{64Ms3B+P>Ca9bg_AOcnbU{x@mGy$K4cx3|fR+c>QpV1%r z^h@Jmr)iT*FMT6l=x>c)7><*@^((%S(8q!R2Vj6jR>(t~$?IMle4(45sSCPF4E)Wj z+F^~KFC(NP)_z~G``0JsOFkc#@}(SRT2Glh5^Uv6rj1AFOZSZEt>g~9OuH;S*1YW} zz;9pH?T{l#@x++Rm{MA*h^~m6S70rmjo-??$BlMUx|Ix4K+|GJ? 
z+h`qvRUVHvJq){g3QbX8Pn&XZsyUF9@z7FEQ1?EQ>`+6Kq0$R&vPoaPuq zrQe)RpU<)u=(4*s^z#s(zF{2d&4U3~XIHEp@sf-=%C5ios1U9jGhjqlh{YXQf-#}CC%15@nYxixc|+B`eXWSkw`q2 z)K91AkjAuvz}>X<>JW8Z zbO&({#P6Oj#RD~Gy}jjIWk@OlEPy#((bExY3xlQ79P$vY2L+lszjiQX{E?8<^qsXw zYN0s^y9juFDaft=8AOHke6S^kj(RR!oPeZO3T+K>_hBleUlCTRy-?w5zS>b+_Ew*_ z8@mR6A4=&P*AK^GX&;&R7`O;97SZ2n329964eQS7(=&p3Dr^yGv&XlHw{@A>TvSq6 zm;|2mnWP|IZSv{=B{+f2Y-Wwuuf3({zc-1+e@Syl8Itf>^C`_8JvT;9c+q4dJ#L|3 z{Ax_xyp&qnN9=o_jKki!R*OTH2}|TF9<|Qg6DG!&LDt;wTGMPJHExktRHtBko24$g zS(`%RSBD<9D+6;1#_~M{ro%yVR}d$#ZL+e4oZA~xEoW2N&SW6hc6nOnLH%O!>a&PZ zC1C3gE5iY zNsroZZ7r#R2fP&Xmp(;e&T1i~5@nksMD-2(-fcC1jL^GkHGL~2KDV72nhb&D$Y8S% z-!CXR-@6~@RkQ$aj4znDxlJ-w?D6h;4HZ^1{`S27)y0j|GK71-rPB4WmgSzbRxeTZ za-X9?MDkwS-xu2VRj(Q+|Lwy2$nHR@rsE5mFCQNE*%IbmBLz8KE&$HnPr#2~P=a+x z6Q#%B(w7_%Lb79`LXv|vVS!s)wh;`JxA-L^!18Vp+3PV%BJF)x zVVo{8Ip%)002m$ZY56VIX*G&9c}(kB#Ie(PaXT=$i6ogH$_1~2E9E;RHvKGM-v1vU z3pVnw{8yDK!xlz!_<&say4pBxlO|Z+kXpo#aOGs0!PpT6vCiv3W?*Bd!?bmsmnbiB zp$DO&is1(DSSjfg=LXZ+p>%xII3{oH=@ioNja%ZKuvEgT^I+$ZNNCZMma+t^bA#kq z%?{L#SUp_I_$fk=N6+IphHK~@R_$p{sUj~O4#kMEnNOf;cRahZaCwMDbM_%Y+;s(T zVg9%}p>g#!h&JE2prfgv4>GqsncwZ%-|`)42(4D{H(xH83est zJ0t7cXjH_y_Ktg#kabaLoa@mo1w(a;3b-T>g;HRWkLO{74Z{)^hk6qi`lt^kDmjPCzg`_eSiA zew<(;v+RiaZo;6Qg`f6*7AikvU#F@=u=m4jz=cN?M^kwWtSRge)JGS=Q5-3os>$3Q zzgj8N1?uWFIyZSbR1Jyhl~JbpGFv8+TElToDH0+do=8vwAMJ594^zMfE3FOOu_ci; zg610Kl-G^I%bF#7KOb%e1~>ua-sHV+C27?@jw;$Y({IqBaVz%E35y?(W@K1t*ZMRyEPD&vy3ON2}1jG?R@ol^1g~pjR3AL`&*`jh43kuC0H^XCA!K zb-Qis=bkK&)myVE&SlRVj5w=40iUq&fLY;@v?5u4liq;EX~q%Ge37Z~CB^Sj6*dWw`>XE)k@+1=yaKJW_(ivXH>T`n-7CX{;hcVUQ=`Tuah?ABIwUGM6mU2u)_#?zqJ<5eMj zv1$8tGRjX!ketJoBtt8VpXt#lFOrW>N)SsD*p;48pw_0FtJgShc^A$rG2L7~4Y{36 U7vw>u&A&zqQyY_VBk#-q0J3diasU7T literal 0 HcmV?d00001 diff --git a/vignettes/inversionpairwiseScatter.png b/vignettes/inversionpairwiseScatter.png new file mode 100644 index 0000000000000000000000000000000000000000..04f43d16eeaa34952eb6320847e0831837b339af GIT binary patch literal 22437 
zcmbrm2UJs0*DXqi&?R&M1BB2NLa)-K1PmZmnnDz!Akw>nl+Xkrgcgb-MT&qZAWZ=Y zAiW4EC{21%ib#=uPtfoG|9i(9Z`|>oBRL@_`<%V@+HK7_*D>~*ksbprH!TSX34=aD z+mwWa97BB5kOOD>u}mP~19J^&t^<6MkSLRoph=X07nTH#CP4$o8sL;NiL&z9TNzuU zj2%YLK#<^kDjPw8&Y)p@=6)EIo* zy>hpI&G6@Zp7Ir>D(q`dA8(+BFTf)y|hPmuvW#H@VtKX-PyIJ~~R5bUTyd zk%x9C(+)`+Q(RRToiGWl`ab* z4gO+!m++et9|W#>a&lVJI?&;L7;+E0UOvV2bkuDFGZix(=<@O55-eM7_P?i_AHQi57O$Yree^&b@%ePXS(2;xleuK?tL%3+g}9Fj2teN`|924$hzb9 zCXG)mRbnptuBZ&l9jlz||9pzNJiK%s5C2mjS(5xn^`P;5@UgcH$|3`FuPf{I`B(`z z8s><_>2LGR)6FN%qvHeoraPK<)Ny@Ze9;^9<9udK-e9ULn~EjdiSLXPBVIjngO!N^ z&ZHE_mS38R67iCr>G0n8Fh20*NxtzeMTLezZNu&V^;PS4)6`bHLK>uf)@R!CnPMz2JjE412<@PQXnv;N7cm zI6>hEqJM})SQ+rE`zFMyJ}kc(4e#2Fcp`T_vhvO@QA)mD%8^8R=F$}vDnd%JBJ&HrImiV z@djU8Q2nH-&_bc`N3s2jzBo&vl2n|PI78ztwYQ*ms|W<{%XggDF!*-r!!%|6a$H+; z_$@8d_K6bEFxeCJM+F&h{#x6Y1rytCRzf*1j7(cUwKWTrTXF{xi|i zjH@zGyZW?1vbP{A9}}4|a3LWRnWKl+O4hgje>(WzBgk$Zl(4oR+<@+tz521>R%)Ts~%#W+IagRI8RZY*)cbcow z$H;@|W3W{oCt^2PqRDQ-;wi~*!BSGLkUD!HV}>PLLh&Fhx_3A6|+$$^T6Ym3@ zm}PP=Qcxgd?j3f&Aw{Ns2g*26l9SycP!$_EX!K$#!%3yUy2L{_%W?$04msXKiBuY_ zyKo_dp(F##_?&S_`@TH`MTvbTM+Jj)+M^@60<>1p{x^u>#BSA9Yi1Oi30es!HMs!N zYcU7WcY2{d5CHTzhZVQsa(p9~%bPIU3%W6~U7F=F zr!bTMUSFrczIhdrH5ukrOW~j;3$)a{Wffw3zgYs&mK)+q5;Y(0F@{8N(8pZr)jn$> zEma$m69gPOuP=VP9yvn2B4T;%W!CHFXI|6=?6Gw0QI?}|Kjd&vN|pOCb~>*@ZX}wz z|L{W>$!Sg)1ZH{@o)p?w*xN{cLU!IPknYsA=7!l_GLWL3a5+$<`#54OEyR|D7A~T% zv`7gDB^|s--6FbJwf#Pu8Sng`OAFj3c0wxCE7pf&1z4UX0i3z^&gvfI)CST~e_dwA zl2QILA@v97uJY;V<&~}@r{4vlj9<9KP z-Oaf4K8&l5bNKg*y*y~9Dn8RdgoHlisk!#&ZzrE_l)JpGn;XApA)Pz^l&QZbod=Yt zi1qv!{4IEMCwoq=CO#qpMXsuGxs{R>Z@uJ3~qJyKO#uMhWQan3vX4#x!QT zdw1Qgy!(V_mvYNuHWRDmUo~a;#`GI^kX8ZwZ0z1WX z`|jd%Z$dXNs?b^HTUsmh|8HUs#sDZxDHZzQ8!$Cvv z9LY+$W5TE-`+`jJc3;RUt(%6WfT{T@f|7b+FwtMd7N5Hy97R23Vi2_W z5~ljjy}5yUbo5uqHQ_zu4T(XsIWyH49QMMrBb5B@c?^>nD0aI^1FneYWnbtL?TToC zy|1fKKPF#tHx8+*Woa7-HeNA8cWn1b<%!fB!yEZK`h;(nteQ{^w(~CNWw!l?D+(>*YJYj@Fu| z4>doHd!Y%ouMd2};3&=KEtJxix(2z`n?Z-7CGMVL=MLiYHc}ge#UN@$&$If+gx`nc z5LzP&Fm`(j5d**b0+;%OM@La%Fxi+AFop9AgUzu&ucVL7 
zS|bvdiWZOjj^{=_O|@Q8Z#`3)eE(==cF@zH4iCPCzvew96=f4fU40ed6tu%s^i*>6zn}0g2chkqc4ZQRD&1>Fx$H@Wzue;8Oq!~ z>ZOI>n0)KYVI-Y2$-mzGN_q@Zao?f(aTLHkhmr-e%Lg|pp;p2cv$E0g9Z#^JDzz?p zX`|0{2q0+SOW2*cqt5GkpIwou7emunP6zkVBY60VoX3PYD$%?&GWs1p8|d;RGQ-b% zR`2l9@Zqny{qaE%9&~++HVsmluh9>|){(psX?~c=!^5T=`yHEoLr7VFV7*VJVrMpR z?74;v%B{DRN;=^(#=B-z-^wzk@=H|O0tYr=E;26Zaiiwk;%%=JS!>~y7n5W^pZa@$ zcOHBGL7Knqy`WW?h|3|5r|0f1MC(_{OAy)XF=!~|u;bH$(NXu)_1oX$4#$69FBC$Z*G1lM%y!XW9Nxgv6Knq{o1>A5%2 zhux6&ImkGolg1R+@!ojoX?9vd1e%cla4hc_*B|7%&m;3ZFDg%+jjk}_iv4pX3lDle zS@1eKe{z2DsCsHHLo2mnjtJTP zJ9@9e=5>?kqa!ID!)F9d48#OZ5Lc)jYgj8>wK^uxJI+^Cn&m4f-S}x<*>)9JMcU3& ziiKZmHim~mA5gW+Qm{ql5C06nOO%Fg=b$o}Skw>kFmuY8-;h>Br}VL?4F8;>Re)=D zR*pg&tE8w9+qqVg*^B_VVJN3dbuY&5Dzl+=>ctzy3Kk-&P7mU=MN4ATTIZ+hv0CGp zhtNQaSpqHUc_-Dc9EW@SZCe*QF~~VcGAK?Iy>O?uEfY6#isDyJT;vu938owfbsQ{=Y;pT`|CxKYkIh>Hi*h%pKwtIk>T_&r(Rd5y&pA=~V;Qgx}fV46MZX2?f#WMN{y zQD}nHUA#6p;Q<*4ngMy2bN=JIZID(-=rcI9g8HLfI2^S_H0Vxmy^_x3#=?MrsFo|* zOnTSFFoLGW7eE~z0VWq{>12YhLvL#`gJEa4zLlK3x`=@M?oZpd0vlT8k569!QE%+; z4h)exhHW>8f5WQC+vuyjm;&rjlcG&2y=l-1u84%=pp<^7BO%I49qDJh+(>>0F10B4*1^#lSP!y%B?+Wp%YSG#6XbgjXv`nqr-`K>{coytX zx$(t!zm8YRX+5u5CM+~>(2jTyjNFHWY7AW#I@B`~af`i1%U@+Wexf>5%%0F7@`?2UNbcAi{L07)kk`S(;C1&}Lq)mQkc;Um;n6d92Y1DvsAs`= zmZ-hn3|a15BtI6pG;zUL_vPel)Y8dj_tR(1x|N^76cC(TaT_(bJMG- z_Y9ZqE9fefSq_B5c*P^=CF|f+<(5LgvRggAd*>e3h2MCyIqL;8K?x7u6*PWgsZfD) z3M(};<;vv6I;*m8&@K`qy%e}`nII>2F6nie!%GUP`5B}l@+ix?!yW{UN}AyN3bD40 z^7T$Lib>nMR~<%$nOH9?-qL z!yj{rbbQyzzGi|g?O%T5#=X;%7w-M6DSVx2qqx=r5=Qz}tcsnMzVixxkW#V%{oYN> zpKRpVe#Yl*jHB;S$(tKh6kmCg`89kFgaBK=t*VYLHexFi+W-*TjHxG~fhp0;+!|(Yp z7we#5**!}2$&Zx2vlY)=*zRm$IDgrfd2OA9YTg7%Gdw%ug8n+Vty%q!oD)ke#gB{| zF2zTJNHxw*lzjhDs#16Fcrw$&q=P2w*6$??{WQoL6B`y^@XAj^Y#6y$_UPTouY2mE z{G$SeUd{v6*B?ih066?U3n#YZs5zwO6o+H+vKT50kwZCXpkZw!XYD2d-!iX|N8Ht7 z{#$Z;fIA_ac*a4Cm3W4^B$DfYzNEmh!lZH|h}OyEmN{r#LoRy&L%a*W zbMJ_{dEy5|ZAFXHR@k~j5dr*x4vBTIO0~wsZ0+tF(r@CZ^PqwCp2U;_3yfK_c%lz; zwPBrU)0Xlj8pM@@%#vX6!G&o3QkoYbzM5!MaUQ7li?%P~qh7-ks4AZ;?C5uN2ArKf 
z<^Z8cTL|>NA_w{+&yyE4v+~afp?8x%gpRRCmigSSfHRKJ>)rWP95jC6i{`dQnlB2* zn#$ezknc-z#-L8zqn8$yyS4)uDg7US}k$z!;K#)0@Drpp^FUK5Hg0~GWCaJvs3}AxbkQ!xfA~hi* zP6YMPvu<6D&bM7Km!H{G?;_C)09ZV1A|vLr+|0VuOS#gvOChBZJV^z2 z+A@I((OL=#s^c%rrBFZAYGrr^dN`mf2l)}|^)`Z@E6|R>SSZ`4hzyU%zA1)d$ksQ0 zirnZ+zAz z>cs~g9l)>Lm)1Qld);hujT=rRyHl#g4AiZI2k8h+&mL^3jN0Q@ngebGUv>;RTxlzm z{G!} zg!8K=OO;S>OWHzQyt?)Mh-A!7{%}~q6l@eZwldpvvdxs;ytdcpep(|t^Hu%fIdc8J z(db7j^^Y^L+hM7Cs;q72wR=9c2NHKU?N`%Tb3s3zPENn$uV|#M+gsTAtX|hR(&%q+ zflf$|&{llUx=ADMo=61tIVgIEO9vaJd;yx)oVFQ zMj5|3a`=1i-ChYrr-{*xtAe!;)xiBHdCr%hG7(<%9G(Se@!|6rZ!=|m&9ha3h|jOv z%gRZ7=V%NWXsl3s)_lD8U^**1>bbY<%*LK)-S!D9iGF8tjl!FYIS*ik7byF=aNcZL z**I^f^Ptm%uP6CSVgf$~v!%Ip4ms5kW>f?n=?e+I zbFZS2LTp5q>+>dY6>=(P_pBm3RKY>y1 z%uaj*ZOrhX*Nd%xERrAcZn^5W?M8=lbsRFluO~MN?5JK9pL@qdt>aAwV!7VaGS*vZ zGL)oK+f3IKw|&BacAJ|{KK*p~OZG(RD)=VN`A98l<+;1wi5Rm(VRWah(3MtWHIyE0 zfwS~&6C0O|0)g)0X^EqsMG@s}zM7~B%V}Z5`rpWN$}Ezs*d*U z^4Rz-^;*!Q=12RVPNug*jso~NG?rjcSC+*vSM+@0zJo5>6~^<5T`VE)D&oL%pf4!= z=UAvOLV^fT^Rf>vzf;(F|0Qn8c})m#wm*(qiD`T6$C6n-(H02QdT z^C0+mcO#_6-}4cjZgUq#fvIqPh$FOY$)u7|QM?&W(>qxduSwfzrrScog0`Tz1hlcR7B;_4GvpbmJATFi?Ji#(l4EkD4GNun@!uW^}Z6GxmIi z`CUmn2%~z&&I5o#puMi42llgSbjhiaw-4{^4x=s@i$J!t@Z`N-@#e|GWosZ6$DNJ> z)=$d5oNJ!ezpo~dqv0j)XBD4upE3dVUhxt*Eh>~D9n_=5`O?vbQ~5nd>P5uq%q^yH z{Oh-qjs&-=_v-h{Yzv^-F!t069?svP@Br**jA6g^EzkY0Fvl1jYYno@yob!lv73D- ziG!Hq`jOg}<-G@cn+KUe1RwL0;4F`22Ra782hY2tw<#Aa2rVB-S!Q1+i3r`dNo<_Z zYV5)+c|wl@V2oiV`+N5Z>k+qqwqG%wn57-J`$eEnr;AH`a(fGc0og{vem3jm@sdlf zQ3aJpa)E;Rz($u6?jibZhvq>#J(^^kxc@p-w&gnPzWdG=06gmo$-Jzy5pa+Um~^uNx;i z6MptA?;)1wj1dGHoIAlQMjBfO0Ean)*Ke%+;jVrG?zVAfCjoXgFTZr}RGC@*EC zC+qcOE|anC@1lEZQ>%A9T-dM5+DHpK?l_=RDQH5h%*jBgO11Y_fuiEFGe0TTlf6eg z;qOAsHW2)bLm28zrFl^}za&CAHjYbHxM+e8BgJ1K{X^kb0_zq8wcOrFpY(OMP8Dw= zB>py+okOzOHC0LR8Ze)FYjXVbXg!%1^~ds#${^Ts$VNQ{B?95%xc=nLD-`NpEh`KE&^wB*W$kUVgnT!B5 zg@}DB4e@}xLD)^jI4XPnD_akBMaU}+q{{=_G{bD{RI?x#`mk{b+T;+`>Aho6qv<+= zpG=xQ;#a@+anft9MO5&>>u1JNoi(6{=g$>Bs4AKL5HL2PW}WcZmbSWfcgw)E2?PeN 
zA(y^Cq#&TDTo2ldq+^KavrN5Mo=AF$$XP@T46K7~QQHP9k5qe@7x(0(H&5@L&rl1ckZ!tEV~jHm2^vy=^pFwEHL3TfBTMjCFt$NM@0&o!C^lgU5gB0qmk*N#YRR&H$e4nT}Y8qrB0TQ3YE^f#R@Ep zSNav{E$@2XI~N2O=E;wE38|tO#;Wcx=MjSs3Kv2jYby)|)M!DorqkujoN_rpA96d| zyhw#OpAMyGR1a{luL`+i*8EbPs|ZC^UNLQ93BH?tL;Szo_IGfPmlKeRQTc=-uKQUY z=SYLji$&+RCN#3P!w{R_e0ZZhn4Dp(pU-tF9(v79T%l1yq0j_uJe5CE@w{*N)9zJB z>y;(FIUC*rOVe|cVW30&fL;ijQSA-686L@H8kf*s!j!|>ZSEMkMNHh3gEG>BI82ur zkyf^%-UPAmf9NSemjJ_`cKX%76P%G1roUUD=|>Iu^=XO5QQqv8ZeEJH_H$S~>SIs( zs$Bh!Ma+Jn_s-}QG|V+q*C*sXRV53Dag`Pdh>G6=WJs6pli1U;qaC;D0nn%#gtBsU zw`fem2m~>F;yFm{0<3Zj$eYj&d9d(VBaA4!x(M%bhtLH_bi=y7WP2D{kUHKI`x4Ku zWr%fRg{#PaCYnq$f#Ejb1pS~PLL@-6{r1eC;y0mz-xwGk;a$$3L4zO7FLI}wYh*er zTX`Za0rvVDc)d}wJR<2DsbgucjmR06=8eb5?ZVQarE`f?o$R1)MDeFHD!Ld{JDobZ zjwmL}GxQonn4ft`4QZ8ne*mKl`38kIhGkr{)T5Jmt5uQjA#0UcLRI+|Zgv$!C-YgK zXx<1WQ`rh%B8&$hKe0*Y3*^VrK&982k*Oa5{OQ>;*EKsFInOvGYA9v!=$e)48JR}7MS zae0G_(oE>A); z6J?rGhgA+LidM{T@r<>x!5#(h5TNv<2<;X>{9Kn8vWSAaCOkJnil03O=uS9<24)8U ze3;njLF0Y^qeSndzCovj9(;$6JkO1gx|OOA;OvDb;sqB5jQgBA!qEgCA{oQsr3m?* zwE45?Xw1;C)5-#}Lr0N!pYZnawVOqeq#0R*r(^m?oz`ZYJL4o_B3Xmwm(BnFusLu6 zt#z{{bUAadT&?Ewr@Ow@w)i(5Dc9kUOqkF4ffRk8#J-?r?E_AG({nA=!2ks2XGT;j zXP%loD-Z~GK7>Oa!+dOweG$xQ&c9nxGW<8-kRaH^`mT8L#U=Y4P5C_385dHG4jN)vgSmK*{RLAO$mOn2vipi0N&nht zVOV#VIt1W)W9psjF@Bn7IT|JR2-OOy zdic8m;H9B=e8CQ z6aBjm7Fa(X)mN;4TZJDKYRfZxN-BclDI=MsO4+8lENXo6pO)zo3J%pqWpX8AexD|t zt!rnN!BOA(Mi)wXR5_8~AL9kKt7$9cEkkG`si zSaF;CZkxUvYb5Y#`y3MAbWx09wiMcd8Qn0WQ$!G_+}=Hc?a!~289=_;QIUc@b1oxn zJ689BDWdKk2CZoV9+K`O)Bm=Qn0m#fCOtCK@!QNh1RyF)pC4rx2sGRVZ27bX2+NO} z_ioHO;|~pCuJSh#{QCtkAB+%}5_9@o%Rb`I=)mqYQAk$?E;#tK_8+iq^*mP9pX(!n zHWtZ@_|w)c$WEU}Eu-l-+#to;_cOg0LzWj2F!0nZ-(lm8=%QWb=Ikz-$rk!^jwuvW zawU-!(X6NSByI^-?U7>31-Y^Pb-#RXF3>JGBWrX1<7Ijl(48la0oT5d4My|vW1Tq* z7Vgr?C(&30EOi#XwTu0zP|A|B*28_utY5+yy4l58RFQ8LDTyeP9 z{vq&F_Uq{q2oC=7CMT>n<`k+i(P?^(Ca`ShJWgMpxwh*jw=Hz;xbyWk(|d2 z%nd+j-9vU87&Q%dz*E^fi%&;`eyKjqX3xUDHtBRDyx;v8?CSMJQ(NO>hO;gj7y3-E=b(`=XsKIi3}A 
z#mmhI4tq~Q7kt7Uwj1dt^vtWPKkrBj_xulu0q4A`(grI#XD>)H!JyTJaKwqNT1#Z7 zSYhBKHtJ^8=iT8kLPO6Y&&;hbr+p{r8bX%!(MvSpe9Pck@5hE87iAmY6y#KYbk@4g z94Bvge_F17Jfat3_=XaN97|AiA5PBooxIs}`Ta38A(C=OMQ~-&?l@K)Y%au%mP)`l z877n(&{_%#X&6{(AOlXe5JgD+wf}#*zT_*X|+|`lUNVX{`(_ooKY~VqULAP8TNEg+oLHwYP4MOnYxz||+(i#9Mjtn6_R@?In8B`wW6wIop7k^>S*URaD! ztulL2@Fd5Z(y+PN?Kz#!;pa~krzRz9lKN5CbI{k=H2$>T(xPILt`1J~#4d~}rg3Y+ zQR?H5GegwRz4w!Hc%d{y)wQu#5+KmEB+aiEA^wV`&=czmy-2PJp9w7f((d6;`)a17 zN;wF*^#SI%x^wkq!TDp$G|iV0+ZVh9>CeZS4%MwBmP$>B>|YM2^R0c>g%Ru;%^@|X zWF}?HgyT;=!W=K56bt=~DPwVUfkkcA4rM7m*JOp?Z>Og=GdEv#GEt-M3T*grtfhMY z`!F8c6E0*SeQ%E&Ql-o@>Zk|=WQ?m>C-uA$M_sp$`E>TM4LThT1O{Bt9R|4gG)96( zhYip(KXfle2tkg8|A4WVphm#!<)R&flef1DKr()`egNDcK9BPBp9=%10A{3ws z5{K&%6u6!l>mfj^+0tZLM<%B_(S9qkvw{o@czYhp6Hx~_GR=GpV!G^u{rKc2D|A1g z#v{cZ4uOKH;&5*bmp@J)KYX3!@@K{K7blpmY>+R-UL4Rkzp2q9DnW;}?KUAeP_!>+ zZ3dFhF}AHyvl*a!%(db7FHk{01LN|Y(j-4-aZ?(7 zat>y8Z-5ok(dm_(MJoEbh z*cApg7_}nI`;^gJv!BINEP8}Y$YFa|)ftMSB(HQMl#K5+HvNf*Uk2n^3ck>oDnL8| z41=>+3LszNxG$a|la^4PM2tmL3))&p;^Akx*52xqY!C7RKvcCO2G-aDB@`P`!u=cZ zs=&5`AD#A;7bTHc^M;}&OPRW)!fA;4$#4j@AWEkpU#ukRU!dGMf$^+We_CCx%2xXN zVkU+3=h!~b?=v6x=@DMK9$JveKu62KXXfuBtvK%28u(8}E$_eYDLd?F%|}SuQyH4BQciRA&^3t+?hT z1sd_?Cy?p$H_C-x>g@#^MD*%UXjCF!fM!>D6wWMhMZx|P-vbHP|9n)2zZQG9+`#=jcqKVWWBEcG?~KVZ`2)&EdB_#03qE;{(BIm!A4rGEkPQ?(Iq$lQ(j#>1~MB{ z{|><#k(ZY`+}G%l(6=rw7Ya%OIzNW@yN{}eya2LbUP6f>dR`Ns%$d+x`0m3oF*z{{ zNKQOwR!@X(j%Y25H0#v6xDTPo$SjZ$1vyLRH4dRNmiWR7hFdemfXqt46@MAU6>XBdyP$>2nMU_`l84|H>t^M}O;1@AvRAokk$oe{(eh(3Qthcm78@ z_qDig-gr~%%??Kwx&E1UKl@HHEa%Mhoh^o3=dTl|0FNIk%;AX-L z9$K0e!pZR&VBECE$*XDqm5Ci;*&IJuH791dvOvd)tpyeZAk^js(LJa61b#5z*ob=C?qA?v>wIWPcxFsaDV zWXcQWiNCII>g{0k7hY{`{!6+4-|!-uP^^J(WI02gy_oU1N?4e}aCdqY zN#_$cEjWb9mB7HXTAr|ug)V*o6c*yjH;*!`yMefjT}W$_*SE3!`EMcO`+^z&m$N=^ zVpzu(XG#6_Jky!Af#%>{N?zIHZ*(%w6_&_cjW|c?WpVyZ4hJ=h&j>9dVcm=s#zwrh%fvJZdRCnK|0pVxgcw11Q@$34TOkEmc%)+r2#`Wf(=L!->B_YZ@|tL z#z0PR!k(aV?+rDHb^QcyK#Qg&=6EO?QHvVd`yh-uCx{>(Nb*2!!L7)lo;tU?e~#BDYVBttEQO;{T6&LlC^Z3e 
zJMQpX@yEfjU+09a(W}3>bRYJfp6~tF`|p{4c2xOKt-Th+!^d)s`{n57AtmRj397j8 zGeTv5?Lco-i6#2h8P$Px1sm~YPsZ&Jo!{DBAEs=<09s}w6p_Sh#!3yYycS z_rVRl->(udE!b=wK-)wUq20S2fh8_5da$F6jtjIL4cT6#-IciGxfOj(RZl?dqwD+U-IU*<5kw zKkpk`G;q|n|MhCL^UOhgz6*?=$7*bQANd!O5hq?jCm=t1+SGZmMK*!1Mtm^EIzTT% zO$?NmS;`vUUcS#k>T7FJ-gGi?GVit@32phE;Iw)w9=iP^3cf^-aoXpC{WLnacA?zP zxcIi#^;J!IZ~`xwJl52-VjOzg_5M_lu`-NaKYXdyz8w9)sS;9-F4FJ`HhV?40%K({ z9%s_!TT`0B;Iaf`K;ye|H8tKF%_WUF9Jj%_076=+S9gvZ`;pGK{AC~9TmOX7O;u7Q zM-jG954&iG6D74SeA@rDzk^h2X1_=M4C_+;+!{0i=S{V8Mg^hUK* zeR2C_R15RRbi*V0*kWK65=1b^vpa^P2;h-Fy|-{`+fJavmfZ4f`3CA1}I9 zcBCSoE3S-s$J5rAlVVI2yAUVDVAN_Q`^)U)mcwG$OLuX)~`L(F);g6#7r&WcwvSHiln+stN$-vS^)IA!DRhpInGLCp?=6-3(*zyPU zv$F4TEf`=gt*`wFUamuOez`jN`O}Ds>7eE^IrEaD6&2wvz%({zf{MU(EAQrdw(5}q zo(Hg>=O5i1mQAbYAuS_9_D*2>PD=%De~Fu+6aF+H=FVs|af zjAAxmH2Ut}T~1VP=7d+YQe()U>Y&06wd=6{W@hjC83P&#n`j208v@`#cm&n%iGSefHKxP0IqSl9^?5VmaWAC1+ zmuKu_3LXyVoIcDAp!3go!;o0MLay5;bb4k0NiESM4DfuDN#Tq*tyyi|XY|+*f9Yig z$N%C7Su@}O8uYxQ#02O#=I6HxmeX&FT-)d} zrdfNxIofl-pj@<7_f2Aeeex!dS&J9YkYd86Id3>Zdkz1twq6`CpA-Aj+uyRVzxbWJ zl%K*OfQbGt43R)8Hl8bs6bVy2ZS8oTCws=gPQ-zN$1hN6U&hW{QWs{8lp=1!6oCGG za5*ynZRW`$p_fKLNkNXd_X4Nmpi__X%Jh4xTNOJG+PTNiL_#7s8#Jur1VvnW+zsf5 zC$l$JI>*i^cpvb3kvUa~-v2OS`D2dTQ7)PEdsWvV_aYSRKQThM8^Kxcxig`rf@%mk zQw)zjqxJnA3BTe^EU<-qf00Zv%x9+5?bkb4=g(gw=gmuiXS4yX{hDC_ewmc0n*jod z+)Ov3bN|V+6-ZTw@82+AQ!Y2}9|Qn?UPV};oMkDY6fd>^V;99|%sUVt#oy#SbFeNs z_I!oyHEOOLMC%#)3I~#E{V!1e#V#sRxkkg+sQ?)aaMfpwEuG9NBo$IjnQ^AS{C8FL z)L=_k1W_&ndW;cSJi49w*PIrG+*|+S26I6%k7id%`NGYU&k$KL%$)TfaQkm4Bqt9)JI_tYMZNYNF9vg?m;)?;n}0MobC<@e$PKs zkcYcyaWvU*Fi>FzFhs%YZI`3_-PAMC0{o=Y5=w)|;!9)BG$ugrgTo>03B)E2V<3k1 zTy5-VM?h_I)=DX?xE###EpPCDLnnRe$tyEXFxiZh|C_^|-U*qQr&1Ag(n)sB3Ks@i z_H%S#NMm3Wo5`azAOpM>1$RR=3QJGbp5Y0}KVOjLZx5Eml*w;f5~BqZ!@#Qzrn`DckqhQES1 z7?PMZCi0Hlfag+80L|(QW*}~iT%|^_A&&{f?<-*!dt`|70eGfVpBK7>#S=Riuwsa+ zOYCoMDe#c5bc}5<;t%U>({o?L5*%T61{`hw0VqE;^rXQOPC%9+?=tWh@C(-KdK{FQ zL^KaP+FJnVO=Y5h+=D7OfVllq`G+8Ywzxn{JkKU4V1sCA-OkWiy!@Ysc>fR724%y3 
z&ihY+ArfmzK;Q?oFCf9RJThZG{f7|SC&3HL3>lzw(=lOLunP^bH6?tVAN#e9{wq1y zCBilREJy}?X6nRz9`u%iXd;lxUO|60!2oZ$gu7nq!v579iN8^|38gp?LonhsI@k!Z z{}}P|y^hHy5qMnw9;SxN@F+2W91d%`{C8y;n~D$qb62s@IzU|Fqy2}{bYGIqvgDck zgIY-#&Fs^}vq_rh^&ABFQV2BbPi+>ZY4cxqYtzInM*M>>fWO#63)Yk!@tX&Y^-T@6 zIGi_-4IsmF7Y(!@_YB-Ha@R-fbPjlyA9w`)k8gGgwiL&=paJ&jfW2=DIsU4ytfDY6 zcswGvRRl@Onqnp6cUot;?vy!LQJb_`k+ask_7FtVaK`!M5HQKq59p}mRS%0Kagdbw z43M!o{l53kdpabNN$r`6!ZVTExjzwjo;Z0ru#IB(B~Gld1EJ@8 z%iz8I!-jLe(A`Bls#Y39zlD%AS&^Xos3==FWY3+@6#-0*0DotOKpFdKkAdS4A{4lb zILiBRT7cr|?@ow2f9(5=!N16ZFMx^@6paG%i_HH)F0T#P`^cve)E%gbQ6bJFH}@A6 zy^g$D5d|ml_lFek!dThqJD+B4Hb3id*j}vuy4ut6d(>0v*Uu(5$%|321*@_BV*n6xzEoDOA_m(KuC#%ALgYnVcB znMwzT03Fm?Eve?8Qmv50P?>Y%6|jg9!b$gefUZs$CVaJEorLL1z8L;8F|Qfq2Iw=r z>FE^}Co37|Z%p>JaW(fCj2mcjmawcBmf!Fl2rH$&5y@eczI}HgG1Z^U$@aOZ4C;7(D`0Tr3RWdxbQ&zg^JD-YGa8dX0&o}v^O zy?*OmPnrF6SwDIA&f5o5C*4p{1#-4OhGjt8sg>nXYaTu=aNg3m!_m=( zj;QN)km0;Rn<-f!PA)f^)X6G^9i^{Vlmq4}_y;P>0-$C&!;&J$%m5{?@D&$$|ziWA^x2hfKTSM;wAefwQHpA!ax-u(P^O?eS#C=O{G{)^9Pt3cnioP<*nvU zIEZT(na+JHlB4UjCC(^0_dDI~)pi z=HklZz2zQyzD z@uHmI%2l;1d?svo?DZG0iF!dDUgp!-ddj#mVD(_5?<|sRKPCp`rs|AYKHmw?OwUQun$@VMoS zFS>sxC@T#49!SINTy+BLNT0DnGWfw4lSx{vZG9K(jx~p-(P#Ao*Z!||&OIK=tPkLC zC=EW5L27eN;}#7@p=6RWrblIv+!E`4si~yg7c;h1NknE?#1LDoavN-9tVPIer7*9i z#gI#)$YprX^PsK$^zP^VDft=npR{MN{;Ulw5} z<{Ws41~D(03h9fzJ#;q$>j3AzfZ$1bBgGzMP4aCW-+S7$)sb3ySOGY;7WOJscHpCD zSr%;&L48?slRvDaGpSj%vUdea5D)|`fQ$Qgly>tP4ah`ZX`*&RWXFTd!!WweUd*`P z2r;b;u8#ir;$sUc;Ov@SC2etEjPW!Q!c@jh)q~761UYU?#m;w0|1e&YH3f44Z~L$} z(5WS&uTKt9rjTfuvS0-`VCJX{y={3@x2tR1np!5Vzg3S!X#hNUZF% zg#Uvwblorx-fT+E+X**0Dlt{N!We$L3_&|fKt1vJpRjAtQ@rUl>-#_SwJ-Fdo2IOR zdvOu((*Op*N7xCL3Ltp^8Ua%7^QeseG`U$pzheaP;fUwAX=E6b;3_Djk_q8_z(YMR zGo_PUqA!-ZX{Mb5UII16HvW879S&(hUYE0mBR%W$o+1kda7xE%Jp%_kCwybV+6%#) z1H6bZEB=YxDe_(ApC0GSyXm%yIUT>Tu7`K<(7({beuS4WUbGmVt^6oY{MrI&EVnDVEq5g)=T2faqC?oA}#S=#EX zQcT?na>xMmyQPq)Ue!+z*USvQYQzOA0sSi3ZYbxe00in&T~3`~n)x?L(FeJzr((;m zdl3*x*D7cSn`A34^l&!fLXUWN_DHCGm>!~^Cg$+Muxw33EsqA;%`Ng|1u|;(PY(2r 
z-S7w0qFBM@2PIP+k}byn8PCsQ?w6_pdZD9g$*su((c(BD7ToBQc{g;O7CS$HWcC2H zhR?g5f?k|E^nR!pluY>S0l7?T0)Fr<`rplaeS6O3$HwpZWE4FGF=Q>He&w;W>rq*> z$+VUBC6fyc&y8m<%cN^f4_GqN%hseE=?4_>i9avTn(p-IfU|dp*PftF&Kn#P-eP0n zu-F;Q&SX|qnTyKs;^PvhMwHh5Iu>YZQUN^>VD$P39rY0n= zuYacPK*?06>Z4cR1(q#A75Hoc zM#^)+;2Lh}`xLZOj=--mjW~|Ds~S7K=u%$SZojr-b^@&sJCb3qJUb}Y^v1V(?+(-T z{B0tNSHj+mUB=8T(B!N8E6MKuJ#S|V-j9x+ka-`|sKXIgQeoF#DHah6Qn$_0ug^as zOR>dcLVm~$o>~{%bdOHnw0XagjSMPuvZ6WBXzfTHSf}lO-Y=%acf<)9DpYHeTeR(n zb#43A9GTb_N}bsu$0Q%cf$Xx}kUi-6>|1>_X_*0u2M-;iVQ`YigSjq`&i( z46=E`&r&*r9-1My@NQDVdUONkrpLc~fXuM#NrS=oa>=+FXk}r4&`5~P%g7O zFq<0pV(H?+V~+Zr+OFYw^mGw7YiNkT6hIO0IVcxyh(GV^LI<9hXvW=Wjq)Wd_c%%) z(km^w_S(zGuf9LxBX`b{<$%^anA35sHN8o1kCyqj7+1nJ@8?U6l}|vi9qq)pz{K70 zn}+z)bpy{(cjUhvsqNy^HJ7$InstW7=w%JSQeiWsEXYq{Vrx~B!QphAXMWDw$Oc&w z#Z`hWE15#(mSW41ow_VYcLV~0XwJNBpX`9__+|bUO|avsp!`Qegofl5E#r%tfRLb} z63z-Af_<5p3r1~Yu+(#fJ8k>C^~7ZCU@2Dr^VWr+VezxV35Q9^{xhNb0y45z;^QeG zbmdh>RY;J)Jf@s~C^MXU1x~zg$Ea#KzwvX>GI!d35#N6DQC)eNxdx zQtJD8LuH%I?BxjeJHY!wLQnKbVNw{_k#wVG%8Dk9F=t)`g93@~MY2ye`CILWv-mv| zZl{nJ{xW=Nh#6|141GR)5nu~wK(erZezlLU!r4En_N%J)Ysf-rwe7Z-}1OZtNGK0j?6&;QYAZdm1_ztks~kf(QRgp$;Ldtg;J ze|ZvPNz!)oHl7iA=!)5u1bACV!Q7qY0u-n=y5rAFvu1wUp^AT>gwhV_?YkRnDc4$` zSI}>%U71a6ge*g>uCv`8~4saBtcmAvHrczH#s^dUqUeUtTA1W>R-Nc5mB(r5 zjH|N&okgBjZ{&Esmb-NdX@N$iG03<}&KveUd39gBhb3}0UrfrKRBOQkKOg%xm8wAh jhqE&Oeg^;Rzr66Yko_U;A0^+w$L*ROK4wy2d?xZgc4%sq literal 0 HcmV?d00001 diff --git a/vignettes/inversionrawpHist.png b/vignettes/inversionrawpHist.png new file mode 100644 index 0000000000000000000000000000000000000000..410df97ac2a8b689e39b04e57a4103ccd5808a13 GIT binary patch literal 4044 zcmbVP2UJtp)()2-L0UkHQsjmzh=QOXp+yWBhTbv?N>Ct3N9ocM6oFCTf-)vm#{q24BS@*89&)GZO(f*j| zcDd~^7);a}hjoU*_@J#9!OvqFPj`IdiBd;fS4*CR!O$=m5r*bH5R6FUDQFlP4I`qr z1`i=Lk%;E;5QK&p#DBT+D4qwjW#J77BGPC?8aK6n9zw&Q?{gp`Y-qTa2>tmNm1n2uZ&l) zoa|t}oFby{Q2v^4^?qb=LH|u8YFOe&)8Y4Z@r@Tmnqm#^#n$jYv&%B`G<{UuvRky) z`)Eu^U&BFv8s%9{_sL%Y!1*jlv#QshH>kW(4tN#Pgszkvjk0-hiK&)N3|tyH{t@z( 
ztJXZ%c=F?z`6OKm?XI3wa}Lqkf|AfdK6C^WHa=9j4-b*YZsg-Cpu+^#jP$c9^!TmW%~6)YVwO3MokJX zMOrgC^GA#m(p(J7>x~OWf-4U-zFt|nO)*z$QieTSP}OXW3wpxQyaSveKK;knFm#nM zrkVv$3z^U=-$OktrpU5>O@FB1V$3}3b~mzSGBbqGDK>4gR%ja-LXndz`?d?wRe9l( zIo`uIsvr7QDb8($FcyIA>P0%5uAwoRqr*m6Y`EKu;N^_s`#oAoDx1&++nNBP4IVqq z?Q9+mJ%+btXU<8~{p{aYb3tx2uTl^wr-cIVRlaA=tSC^>`2WkKzskE;6>PGdq&(p7 zvKFcdkR|>tMfvzO7e+TwMYSKayuc=-O`?m;gx|ToAMazt95^Gzsd?wwAss~-rYXYM z;#{g`^b(0?L+H^lM_TY|DGuB=~XP=_X z)p~@_cJP6cSrEFvM;Q_#DU{GnD8+e)4nYBhr7#>dr&W41_Jj++KQ<8yQUVp^Oe=1N z>YdVuELi>0ACOYuuZauoXq~`=V1mXKvG8~DQ=W#=2im1uQa@+Ji-&WIgoSKLp=k(J z^1nk4;GfuJ1IqT;38@}jfuGq2w*cXk9yRC-9itcT@pA=cCGJt$6hP5^rMl%m?)d^qthWCSvzAZL`Ng}#_Nha z(CeH^r?Jw~2a+KFMVU8o2bP8z_w_)J194KpzcuGCgc8(W$|Nu`pQ*e0ev`Xib=)@0 zWH^`KDV6`03ejD)>}pa&M=pQ%(VX2y?yAilUpnwFYU*1lN2yB#c>ERKUS-#h8 zk)&Hf#Eo|awTfN%3{gdIJiQ=5tjV2sm3Bs}6RYL8F_nRWLtk^P%UdE8n5Gqh!uG!R zC9PcvtLXj574NOcFspDGIfEN3m;`DT4!sg*kxHX2w1P zWa^Appq0Z%-qNxP;hFn7P3_d%!mL_P)g zWntzB#W8|mq=E4bL|Z?~fv&;lMql=s+x^9ld7!W4NQ!%8vJCGNL`1%?kWW=b-aVcv z?(_1xJ!nnty21C5f9B(0!SH=SP_nE27ckjC7<}-ndu?t^Dcb6J7%8I8C|mHCg~d{r z(!3OjgF8XLhFqzk(#T|q1|wK@z%SdR!aVV*Dpt>l0JavF9_5KY#?nnT5C_*;x3zB+ z=UIZXDiLovA)0w@$BK?3rq5jJCybhOpZ(w~(bbM9ZaJBsphBJaZrHdeH)R6~CI1~{ zRC})Lgl3 z;w^c%tPh@seu7#ti|j4yDIWh8LqPyI{`pw;z&x>=D7vDF!j zZY#7#`bpemSP{D!Q%< z^D4`aB6$`kUM{KSBUUbXYsm%AShjTqQXZx{3Q4V0+N&XdzuJF-`^?Fnj>!=7i61cJ zouwpcYtYM55^+2m4&Z-RSK5*Py=bq*fG<|S$rKAW&TG9MgQE(p+Z(8o)Xs$tZ$)BF zZDZoRK-RqQdWVJx?w(zFLaT+6LlZVKdeYeBUh;OOoJ>EuQ%|noi+i5}qCNQUM#n!T zx!Tw!$QI#e8|w>WmmW6@d`$fEX82w}F3XX5YxMB?;dyh~sS9me@eqt?qbSoysW;!n z#BiqceVTGN>joeq`Skj(qm53Un0S~r-ZLD>Z1M8K z^f^=9LVUb#ef5^?tbk9Lui~)-W5Jt5*10&`RU_=ZF_fiDz!R6GpY45jKKj}2G18Gft11R5a73nJ0jKb;b2p#2UYQY2ig(AsB)cj{N>#yjoEjCQ ziXGTQcn6eVrsXV5_cPC*pEQq8_%ivRPOCd$HkdB4*1otIZEdA?j#RC`#7tEQpkcbF z)+{3h4{hIE8E75Wc<}7mXQ*}f1G`Q=ogdy9jq8)`dV8+?L!?o$-<@}Bw9N~r8}CRz z4a&)<{@^SXp4OR2qaGrFXVkN$>RS?C-oG=%HcxEe&w15+>l#jWd6pnPrzsrWR>-S* p3fD`GP)OZJMqv2gXgVvo{F%yP4!U0~E4I3FYfF1m6g`mZZ1Z0p&gD3{_Du644i>g3d}ea`;&cfPZ4 
z{b*+;yBWP1g+j?%pRhQCLWw28zeFi$-qbZeCl^a>_g%5FB%Ul3dmYd(vMQw zD}R?v+_dS3fbV<|4j>3NkkgfMiJGggEV@^W)wvUX}mKM zD6BNnfNXITT>@^`AV)D2PaJNQ#2gd~5QAHvjuYw{gfvPjVy?({?I>%maLT=SZN0(9-Ewfz))-*gSNo=uG%Lv2Z6>jra*t~V zKK;m@zD04_-4FKB{U){njD9eWmiFG-T{X>PvI1MU{CEeAWp|{aB~7v*F1{*+FnNKb zYo~41)Y-zB%A(h})-DM6_%U!GKRc9+za988xtbJo2T5gVyFUV{o5^A*EHaV@Fc!v zuB4ZlS#|H-1dImaxG5XCH#EH6m#aXd#&K?B1=tQMVae-$@dU1N21(o4f9aT#QVB-U__Delyae^jn^wZ z3c)*LEIyqh2w|7gRnD3aaE$G@T3s1{gF}RFW)SqORQu0UERpCEtu^h)>ZJ6~Iu9rT*XgBod&~rDNwh*(o zuKuJv&mJNTj(%GdysBTewX?u#*o6Hge47|=qD)hE1-lIdi zR+H#U4c;otyKx?|nM6s#AZT+>a#Ol;^VLP(O zKf3zS7{R-PpNgv)bVNgf_oo|PZKfTo$)?Wz{HmF5`Yiu4XkNkf%AwcQ;dgQ9g^`io zY>RsbqWp{=wAsY;q|l3T-!g9LYMg%GL|5?HyO_1$rt8%czTuO;tx%nR#=D2CVwuDG zRFYac5X487Ox_$6YGEPrfchnr?|*ys%UZ-nTM1syF~9Kp3Ejk8=7STvYKU|RtU2xugyN-H&{hj>e?d|B z6u+nYbOhp9zhX@CHYKtc9aDoIRp6`S?`2Bx26TAh*gFtbx=#nbrzin|N|nT4a6E0GZ5U<#kT@)5?=ZOP>{I%t{lDoHyHRZFfTpDP&>dHuw z{HcUp*mPM5>(|W_fJ#si6;pnt%QTw$Tp;s~NL+9~A&df8=bV@`tGOn>bJdY?c&VIV z!rr{n#*KrZxlu|N5GD0{p(gwMtP(sSKLws(iS*nxKB!Ts$ZSsLtI!Khd6VVVry5I6 zc^_|pC;ccR;K9}0(ioZS@>GY(Hghr)kK$HYbQ*ZSdGPeuWwL(=(r zc|bi81<)==Dgk26li=w#DHOcS*WmSr89gcfyMGk{gy%&ufmxnTRTiy#Eo>~+@Gc?hLBT=1bUjk(pIE{M?jYg%>$QJA8^|Xo!c+KfOQM*6eGU1X1Y#0~L#ga6 zQG^PSCBy&{zJET4^q^q#MP??_BA+OHOa~mj+eH|77}l?z=UTzg(}Ke>ZpLe@25U*x zh~wyZXx4z`+)UvoREzh4$Kr_4As&AJ2(m1yuXmQ{iU6s47oLk>7Ozp)i7GkAef(Sw zc{PNtc()lZ+_)Q_bB5(Vpe+fIqNJk5{&t?)Rv)RlMw48E12KnV?E_C=L{`Fn1bmwJ zIvutT828;D4PB!#`R`C48z`fIbT66VV4vq{HyQB-zQiwLjYV zn}hcQ|3m?bug4-c_AZkZ__I3bYtS<%jb63R$enfoXU>0#`d~SXE&iO7J|3iGfCii{ zvIB>p_a8eqv4S1+!1jCR2mDyukK=#vAcF5V0 z!tl3T6Rh-(JCMNr*5NZi+l>P0^bj+?2dlcbX>3JUG2p|+N$+r=Q;UxmJG5u5#Bt7g zy}f>f|B=|lIpeZYv3RxFbbTcZXn=E4Z%kRvFSr`V6YS3!xa%|pUbW&0>WC%KY%^Wr z?mJ?^JCfCaT5cRtHQur9=EZzo*L3i@+}f8KnSKa4q4 zDTRa(z-l}4?Ofv{;%8Q~+ob?8??8f2cD-ZERdf0ew8&g&G3e^ivh`P6ZMh)CtQdRV z(Zrds|1xRXd2iNq?OXv289LRj*We^DXc*}VYj(vsni|dwndiwkcgWFF1UsXv^J7H12qzF}ZqUg>*F8d9(yCb&xKy4tEz3Ec3u*bN(-e_KlQ zJmfLq6IF3IzXJM5D9d~&vv_stT()K5+d|xGSw-v6TP2Kp88j<3K30|7Tt&v|t0m8u 
z-Z1STjvpF8kFDch+R-Kk`i)-Z1da7yIIA%^(!HBjODx80A2E+p25K|30yOL>k%W&y z!xSr#A2Lq*!;yGr5}a(w?zAXojy4_UA}XhhU#l2U}-(BavK(- zmMo!}bic=g883YkCa^H0HMM~w)_`J>Ml-rnizy*v=0X;fav-^BOn2j+&%N*4CwCAy zGq~wk6S{EZ4l6Ms=b$0VwtGuzZ85CNlMv=7So%{NOC5JJloSte05h7 zHrcPt@YZ`g(JMY*HjGTyIzIH+%qm&+9som{&S|_E6Y2yieaFM6Hj|82qwe>Wl1x~k z^m#{wkBopGu_YZ(yr6V_XZZTuRqdQ8$}m^mfJ8PPlEIkxDbxdA!GiGCoupkxFv#%c zz^y;`N3X+VUfD-aiEY@2&|q~a;b^gQUJ&dYgBQMl7Bf%+*uD4<1S{@{4r(z0S|ENd zO_Q^a*l|Mj5xgTeemjC5aWh|nd9D?P?*E{18yW>R0uwKv&2V~Iz5B{g%%ZaRjweu+ zkQlo@lL(@5F`RbimLn$PBlR#z@Zdip z7?KNTq~Uea&L>P{KBmE6xdx%V+B})8H*nHsZxJyxidR|=;jyT|Xb7)f10y$XF` z{6P!A=QZ4|=$bPoHxgAHO{4zwflouE2W1fSjO61C70N>;`*bkke+!A(T!Vj&yT4=~ zX#*@U2m${gVgvEnAPOK8?sRsQrny(V4bdSe!%(I0^XOCs$5HU4Cp86Oj(b!-&-qRp z{#OSR&gaO~{BFYy*zts@>eb6_d*7_J2~6n)gLx?IxI6Fmk+xx&G%AUPp@bwi=J6;l zGyrA>rHZ6TAbf$tYB>x})|eCr^(Mmm5ljKTqTV=>wu1wLd5K>7Qa{o}!ACHFCQfy0 zo!?QW+81t21~;{KHN$*sA83X1+Yi1!Rg&j!deVk0J3r&*vDd?QY4{J=tuMUWu>~ou zFAneaV{WvqE+vb!AQ9c75ce3oJG_kXcATZ-N>?@PpbygNtz2wCvXL)eP>$nk|bW| zdRWfQswa;3k>0Yl6SxJ^x^oE1X0Ov=-hgKgx`;X)iSSb!f^GOf@s6}dQ*@xXiFvOM zTKU<0)qj7%2q7WlFmVlf?XEGUEKbL~K&!(~mIRF(FvVRU4XGnc11qn;G^I#_(J?;2 zse9eT!=73c{wEF2#SQn+Rqoy1)A@;RdE3!3P=2CAtzL&TFDR!&x~!7uS`R{_Qd#b(W zA2|LqFle&<(JNxSrDj()K4MJR(r3Szv@PuXsl@wS?Bk-^o~x>7(>G+;UfDJN7-I}f z+NIW#RF0m2k!6$+hjxq3Y};Vwbv^r_;Yp{yKs8tU4~B0{oxLcOO)5eCbxwVw7_-A8 z=*#IAui7@qu}6<-lhO|6fB0jwQTaZph5jG;NmEto2c3TnLC5nu7%9BNx1kzhJN89R zOhVb|X9~I|4)nduj7_}F_5)`UOd7DuVE191Hgy`<pzCCj6 z{S+9Ssr#^}iCfHEFWJ(*ojBogeo#~SpTgq9#{*Y{M}Y{Sx<1v1_o9RZC(dK;qOrldT{2 zn29CpTk93NMgG?9?_nGR8Q1|SUbf8k%*>Uz8EM*fmeU$L!YdwXJ21b!;wVG0j!0f} zd_SNAlWF$qY360)(Od8Iy%t~4J?5wI=X28z4RXCYyp8H99^IBPHj9an!92HLQy3o9 z^X_QF=PdhOzfL{i-B!IAb!?^g;=W=SrGFmbjlBWpY0xZS zlU0cjKm=Ua1_CHM1_A=I1`vTn6D1Sq{xPSgf6SSlsZ(|Dt?z#CoLjH%eed00PCHnM z3hxt!Kp>(v*5*zS2p_opK>2w{?fJGXp2|LL=WM~#5C{qa!9h?c2#O~`2#$AC6hJNy z+m|O3kHS4Xc6M-9i@Kd)a)$92f8f@g{8t>&C z=Qfhr3w`SnTrdSaE{=x!KPCWaQjN214l3~HZA{2=8d-k(k>xi zbNWa1K2wMEJuxp3nD5GA5&ebIz8-HvIcl{2Wlfr20A|>mQfe|K5bxCSak?pga4G<} 
zUA#YjrZsAV?3MR|I+XrkkW@~-JeXCI&ELDg;6`f)x*J%?%#|q~3#TVv??C0WLDe(g z&h{Z@jMepp)`FK+;eHeyH6ugm?md!DMIp@^aG%vIX-Rpq4slFO&mg@BzGN&=VIQKu z+r21h@gv*W4R9#FT#rA7%~|p}-wZci}2bMww&fVwBC#Qe6FXlnS0zu59Sat^_~H= zDxW=3nche_@73hzm9Z#)(KfnD+q*il;+ zL43)nzqw?w55fx|a9^!6(9pk`qkqcgl+%G;isQM^3?d`zrbpEKtdf~95udHiBJL2H zXa}T#(#|gvWH5K zCqUYh8e|?c9FYL%8vjyzpv@%@=JWV8Pyg02YOQU@wdtV3un76^ylR#MGwM|uINg>o zQqSq$e322H4ptodQp+NH=7q6-F5_(ZOH4)Ar6e|ro1=(dsAEs)Bm>(5_m0?dan3E% zoM_BK%9$?<-{+cX@Br>;==0SsPr!kB?IY(m$BtsNdXB&x;nKp_z}?<}!qCvIv}oDB zBVjiB!Z*&=XS@-~C@1|Wm2OgSk&%2UYk-xPtD1`@TvyGUgAtvpb)uGq1Ljs_jK?*) z2nH|QkCu~Rk_A7CAuT@(-9uCP5*kG>PhW!ql81{NC6Xb3wZz=1RY5Fumk(xU8X{)0 za+Ray+#l9i3N8KoZbgGwTT@Pg{xl>OuYrBR`H)iTX&A|6~YmIjJQ5n zvfkgMYlZbT&>$O&95ig|Qk5|etOLT3m!R5?jSc6h{8T-ah z2g>f6nQ8LIfiT06+UUHS?ol;#T@kp@V-|S6Dx~TVe7#-lP@p4l-4Gj-u%rgHIgZ4d zFdGd!0~uagw$P9SVT_IHSP9x>bPCssL4Njbk*2vsS*V{=m5+k09c5vvJBbd@k2 z*@Hv{W&V%^xpMOM?l5Q}w$#6}!uEJas5-)naO?T_h;AF&dRM2*&{q+b?}a&EN8sCN zYez~W)P^tR`S06F=rBEY8cCDDJ!)VYg(SSD%#YTMJngThzK`WhX9m7of9JxPPLB;5 zku_2Au9>J{Ty5o1=H#s{x{WM~Za>%Cvsp?5^g? 
ziFsUrBZPc9g0-StuGUNl=!kMB*Q2(wSsAx~}q%)(t%=#d9Yl%Dy(~Aihe{owMC^hQr zj_{TR-jhSktT5jdOpKKfBaf%!&1+==ASv>j&jRpFb3j7+=T!iTWe&A z)xHzZE_xw!E$3$iWfjDi)*H|aOvho^O|TXE!P?svI!A6HK4r*A0pb!;eWoOa;+WeR z{Id9ojuznk)BPg6l?cU~2&@OZ;+a*D_QuGC1kgYJ2i??$(RkX`!5@Jl8Obf3F&EZO zxQLe@uHhHAcsBJQmU}LEtVLe$3FsS0Tg>Nh1|;+fVGWNPW4K0PK|$^z@_J=3BPG#M zFb+%w4Au<4qM?ASGWM-t9rQ-dG)Sm|o`3?t|3r59#@8hYU$Fuul4+WRQCV#jY!oYT z0e;0lMPiDiMX^W}I`3Ki0JdX@Nl3k>TtdR-Nl>8$nM4>-2_gM|kzhvrQ!{g0D6bTV z)eNL@@I7XQ%a-?1`>qMUb<2Wq3C9KdfV5b*?!M2g&83Btu?OC!HA!D@%!sZ7@5e08 zvB!<23)Ju3-Efr99%NW69)LOKm)|D{!@D^Ha6WV-6oIA6?-H7C=)C;CSWHMRg!dvB zQXnSum&eh!66Dy9mE3Rs&**$*fw#q#Y2cd+ zYuD^13=?DD(KvKnx+>KT8H{N~jf3T+k#JWpcFp%1PU))qma@|9_~Wz10ezzGRf?Am zD0l}fN8W2Dwd0*5-=VzWM@VsMqw z6v*)l{v6l8mYCNU_2j2M!7*}yf|I5B0oMZ2IzJm=9`3=Hn1`OyNHd+Ro_D!_KA=<< zk91?l86KZf*)(R3C+kgqx;JIUT{N;%6R$g12&2X1Tc)+}wPuvOxxh`sFUEN$3n2sX zd+@5dF>%39jF)0N4P#!4{zG+hq+4Wx-=VUR;wafK=#}zm9n@M$N;aLu(lkHs+uGoB zL*xvES0z5&;M?LCDT%X{t?86BZVp2O@cNRo&MbSyPXg8v1q#E?3;c9}XrktS-v|8Z ejP-AA1Xqx*FSz%x`INgh3{z2Vp$Rqr{$rge^%LuX9kz zW1=w#%}GLqN}*}IqR8Zxw+3^!y6dic*SdF|d(T~WuWx_%Lal*UU$hv4RC+Y7ASCmU^$|0KqEJK>iiRQ*QA8L;Lk0&2ix3F; zCZcGoFcFzyB8@|YVIs_@eU+x6Xb4Ip!U&-u4~IkJh{DxSFcBH6(l8AL!+e;7Od<{q z#i0>7-=q*s#OH`ae3;1R&=4kH#OI4dqMbVtIRq7nR+$K(YD8N=p``mQOpTA6%^b^; z3D0X(`iVYZ(0eA!l6Nyv(6h(k?Q3@MizboCFG~lbFnayr<)y7z)A*W`j~UYDdSYmL zf+5+iHgv83da&U@+$OZV7(i7sT@S@w#O%Us75|UH#u_xw5Svuc2~Km0J*C#6|CZYXOCAD`z$yNe1i}iuYKZmZV=wf);QmUJ}|_9;%(A;rAX{`rty6)Z3V++n>b17=EHT z^x7{6{aN+15#T|EG2E%awAaooax*>MYY#8jxCy+`mVds6@0Ahi?0U`RFc%I<4?b!V zlTqgOqzrzgt@(pO%zb86(WM7ig`vVGUgxGy!+VuqGkJXQi^S}U5=M~V+RHBReoUB= zrb29;Z*0|Iq1gy2F63HRXvrg4v|o*i@Yc|NPn9ppVHP3l7Zz_n3wYMsg-GfWrt+Va z>(pD?+$X3GE4rMRr5{YUX~b>*u>YnY8up8OkP*O)P45|YA62zQ z@70e4IXT&tq~(cc_ZBw;!#8c?PAM~0TgcRv1I^VVisd?vA5Ob#S+=9HC($T=G~Q;h%N#8`uU9;|(L1I(TO zlKjQ28NY!mVFQig664wp6Mg&h&Ld{J%MV0UB_zbynqqV%m%)|gkGK?E_JTq2E3g?q zBeggwF>mntDarZlu|#pm8f3vj#^$|uCShkNGleB1V9MEE1|}Xl%7u@~JB(zjwb*rr 
zE4G<*>y7V~%h39IbT`C)<`#S3VDq%cWI_uzvveKYqHj>$JFs8cD5Ea}l~3RF)XrR! zM2(zI@B_xhYwEleAIv2iMsO}HVA;#NT9P;1COa>Icd%B8=vY9;uPK0q_OF_34jfl;A5i(*3G{W8g!+=q^a*9S2oq*zH! zaPadC5~%Bsq4Iwpe|E2DHg@Q4_{=s-!-1Sew+)Yv1*prHTApWj3=5_bOQ-=W`}0Mc zJ`F%iMsN$*yWK(l-`_*+49}`B3|MR%WZicY6M*`2*VebXOUIiINEG{h_zj96VW8S; z4K90fcgJHOYuT^&pQD!P=eySAxC`qJO{?YRA0=fFOz&DeLDQ3)%kf6`@<4_Qc}kqE zF0We%r)++4!?c+VE^Wi^vmH~}@JanU23}PkEy>ev+K+c~B@22FD+I(JZ_Z@Q7!3xT zEM}TE=Y#bZCm4x<9#%IX!6g-K+*~x$dKB&2c{{3GE8ltxPyUKiGsBgV$%QXzTbYqJ znJCzH$=BOC91cTdJ;9LVkv>}hpM!BxZ|FL6@a;e+9#mHfNa-tl?u6FzJc2q-b`A@~ z>iVuK&#)o87HC~%64f1~Hv&#jm2;JS+1ur@p+ft%SBfQ!1b7Qb(W1EbPFE&#Ga7Yk zvD#pHuoDOH!L0-e-S2gochsA$I@s~p&&NvH5mBNj&C^fxKo&vPI{5V)2T#A->KO(E z^Z0fZ{_?tI@Xn;F^2+0@C zHyMdotA`!n%2|w-VGu8!zjPPud=vCaoWa&lTO%%*=j zRd!whNR=wvxUuNBeu;g$VB;COw*r7Zx4Uky^sWo`->&rpH6?3MRs* zyN)lf28$|tHW4Em?gcGi>pR?(f$n7^ji|h0_6&^x4SOX@HHVFOwqPq)Q;fkG>1jmm z!fP2qYU_Zq8B8<=YG`McV8+K(ln+Q-SjhqC3~ThfSE6M914P3R0TPxsuJd^%LHF#=i<<|%e}iG>B37 zYU2Sij}nr6)vcnhy6@1@DFWCT%-}nub%@Q9ZdC863&R1efl&gfbCCs(LYBlfd@Stc zDt^Bd;5mi`PfyM}q??U&WP(Qf8MnL`8bx&lj{5VO!$WuulHWq;4XN8eL9;IGwq19; z3{rIza`B8{?guE-DsCh-#JxD?M1tZ(E&Q)oZO>f2UuL@M$5c$z#LRYtY-TZPyoaJ&plNtbg@4U30AH-k zEHn}Sm;GB1fb-zx(KRlUm<0kPn3l+dgAeKaEEhc` zv7kf9Uz=2YCy(&(ledblR2k`K>(dVw#b^f_^=8?wX^3l+N~M3kb^e3ihq|OX`qRa1 TJN_HO>aU1}nT=^Fcr5y_RoAuI literal 0 HcmV?d00001 diff --git a/vignettes/tutorial.Rnw b/vignettes/tutorial.Rnw new file mode 100644 index 0000000..8ce70a7 --- /dev/null +++ b/vignettes/tutorial.Rnw @@ -0,0 +1,302 @@ +%\VignetteIndexEntry{SARTools tutorial} +%\VignettePackage{SARTools} +%\VignetteEngine{knitr::knitr} + +\documentclass[12pt]{article} +\usepackage{graphicx} +\usepackage{listings} +\usepackage{amsmath} + +<>= +library("knitr") +opts_chunk$set(tidy=FALSE,dev="png",fig.show="hide", + fig.width=4,fig.height=4.5, + message=FALSE) +@ + +<>= +BiocStyle::latex() +@ + +\definecolor{mygreen}{rgb}{0,0.6,0} +\definecolor{mygray}{rgb}{0.5,0.5,0.5} +\definecolor{mymauve}{rgb}{0.58,0,0.82} +\lstset{ + 
backgroundcolor=\color{white}, % choose the background color; you must add \usepackage{color} or \usepackage{xcolor} + basicstyle=\scriptsize\ttfamily, % the size of the fonts that are used for the code + breakatwhitespace=false, % sets if automatic breaks should only happen at whitespace + breaklines=true, % sets automatic line breaking + captionpos=b, % sets the caption-position to bottom + columns=flexible, + commentstyle=\color{mygreen}, % comment style + extendedchars=false, % lets you use non-ASCII characters; for 8-bits encodings only, does not work with UTF-8 + fontadjust=true, + frame=trBL, % adds a frame around the code + keepspaces=true, % keeps spaces in text, useful for keeping indentation of code (possibly needs columns=flexible) + keywordstyle=\color{blue}, % keyword style + language=R, % the language of the code + numbers=none, % where to put the line-numbers; possible values are (none, left, right) + numbersep=5pt, % how far the line-numbers are from the code + numberstyle=\tiny\color{mygray}, % the style that is used for the line-numbers + rulecolor=\color{black}, % if not set, the frame-color may be changed on line-breaks within not-black text (e.g. comments (green here)) + showspaces=false, % show spaces everywhere adding particular underscores; it overrides 'showstringspaces' + showstringspaces=false, % underline spaces within strings only + showtabs=false, % show tabs within strings adding particular underscores + stepnumber=2, % the step between two line-numbers. 
If it's 1, each line will be numbered + stringstyle=\color{mymauve}, % string literal style + tabsize=2, % sets default tabsize to 2 spaces + title=\lstname % show the filename of files included with \lstinputlisting; also try caption instead of title +} + +\newcommand{\deseq}{\Biocpkg{DESeq2}} +\newcommand{\edger}{\Biocpkg{edgeR}} + +\bioctitle[\Rpackage{SARTools} vignette]{\Rpackage{SARTools} vignette for the differential analysis of 2 or more conditions with \deseq~or \edger} +\author{M.-A. Dillies and H. Varet$^{*}$ \\ \small{Transcriptome and Epigenome Platform, Institut Pasteur, Paris} \\ \small{$^*$ \email{hugo.varet@pasteur.fr}}} + +\begin{document} +\maketitle +\tableofcontents + +\section{Introduction} + +This document aims to illustrate the use of the \Rpackage{SARTools} \R{} package in order to compare two or more biological conditions in an RNA-Seq framework. \Rpackage{SARTools} provides tools to generate descriptive and diagnostic graphs, to run the differential analysis with one of the well-known \deseq~\cite{anders2010,love2014} or \edger~\cite{robinson2009} packages and to export the results into easily readable tab-delimited files. It also facilitates the generation of an HTML report which displays all the figures produced, explains the statistical methods and gives the results of the differential analysis. Note that \Rpackage{SARTools} does not intend to replace \deseq~or \edger: it simply provides an environment to go with them. For more details about the methodology behind \deseq~and \edger, the user should read their documentation and papers. \\ + +\Rpackage{SARTools} is distributed with two \R{} script templates which use functions of the package. For a more fluid analysis and to avoid possible bugs when creating the final HTML report, the user is encouraged to use them rather than writing a new script. 
\\ + +The next section details the tools and files required to perform an analysis and the third section explains the different steps of the analysis. Section 4 gives some examples of problems which can occur during an analysis and section 5 provides command lines to run a toy example of the workflow. Complete \R{} scripts of the workflow are given in the appendix. + +\section{Prerequisites} +\subsection{\R{} tools} + +In addition to the \Rpackage{SARTools} package itself, the workflow requires the installation of several packages: \deseq, \edger, \Biocpkg{genefilter}, \CRANpkg{xtable} and \CRANpkg{knitr} (all available online). This current version of \Rpackage{SARTools} has been developed under \R{}~3.1.1 and with \deseq~1.6.1, \edger~3.8.2, \Biocpkg{genefilter}~1.48.1 and \CRANpkg{knitr}~1.7. As a \deseq~or \edger~update might make the workflow unusable due to modifications on the statistical models, care is recommended when updating these packages. \\ + +The only file the user has to deal with for an analysis is either \file{template\_script\_DESeq2.r} or \file{template\_script\_edgeR.r} (supplied in the appendix at the end of this vignette). They contain all the code needed for the statistical analysis, and to generate figures, tables and the HTML report. + +\subsection{Data files} + +The statistical analysis assumes that reads have already been mapped and that counts per feature (gene or transcript) are available. If counting has been done with \texttt{HTSeq-count} \cite{htseq,anders2014}, output files are ready to be loaded in \R{} with the dedicated \Rpackage{SARTools} function. If not, the user must supply one count file per sample with two tab delimited columns without header: +\begin{itemize} + \item the unique IDs of the features in the first column; + \item the raw counts associated with these features in the second column (null or positive integers). 
+\end{itemize} + +All the count data files have to be placed in a directory whose name will be passed as a parameter at the beginning of the \R{} script. \\ + +The user has to supply another tab delimited file which describes the experiment, i.e. which contains the name of the biological condition associated with each sample. This file is called "target" as a reference to the target file needed when using the \Biocpkg{limma} package \cite{limma}. This file has one row per sample and is composed of at least three columns with headers: +\begin{itemize} + \item first column: unique names of the samples (short but informative as they will be displayed on all the figures); + \item second column: name of the count files; + \item third column: biological conditions; + \item optional columns: further information about the samples (day of library preparation for example). +\end{itemize} + +The table \ref{extarget} below shows an example of a target file: +\begin{table}[h!] +\centering +\texttt{ +\begin{tabular}{lll} +label & files & group \\ +s1c1 & count\_file\_sample1\_cond1.txt & cond1 \\ +s2c1 & count\_file\_sample2\_cond1.txt & cond1 \\ +s1c2 & count\_file\_sample1\_cond2.txt & cond2 \\ +s2c2 & count\_file\_sample2\_cond2.txt & cond2 \\ +\end{tabular} +} +\caption{Example of target file} +\label{extarget} +\end{table} + +\warning{if the counts and the target files are not supplied in the required formats, the workflow will probably crash and will not be able to run the analysis.} + +\section{Running the analysis} + +\subsection{Setting the parameters} + +All the parameters that can be modified by the user are at the beginning of the \R{} template files: + +\begin{itemize} + \item \Rcode{workDir}: path to the working directory for the \R{} session (must be supplied by the user); + \item \Rcode{projectName}: name of the project (must be supplied by the user); + \item \Rcode{author}: author of the analysis (must be supplied by the user); + \item \Rcode{targetFile}: path 
to the target file (\Rcode{"target.txt"} by default); + \item \Rcode{rawDir}: path to the directory where the counts files are stored (\Rcode{"raw"} by default); + \item \Rcode{featuresToRemove}: character vector containing the IDs of the features to remove before running the analysis (defaults are \Rcode{"alignment\_not\_unique"}, \Rcode{"ambiguous"}, \Rcode{"no\_feature"}, \Rcode{"not\_aligned"}, \Rcode{"too\_low\_aQual"} to remove \texttt{HTSeq-count} specific rows); + \item \Rcode{varInt}: variable of interest, i.e. biological condition, in the target file (\Rcode{"group"} by default); + \item \Rcode{condRef}: reference biological condition used to compute fold-changes (no default, must be one of the levels of \Rcode{varInt}); + \item \Rcode{batch}: adjustment variable to use as a batch effect, must be a column of the target file (\Rcode{NULL} if no batch effect needs to be taken into account); + \item \Rcode{fitType}: (if use of \deseq) type of model for the mean-dispersion relationship (\Rcode{"parametric"} by default, or \Rcode{"local"}); + \item \Rcode{cooksCutoff}: (if use of \deseq) \Rcode{NULL} to let \deseq~choose the threshold for the outlier detection, \Rcode{Inf} to turn off the outlier detection or a numeric of length one to give a specific value \cite{Cook1977Detection}; + \item \Rcode{independentFiltering}: (if use of \deseq) \Rcode{TRUE} (default) or \Rcode{FALSE} to execute or not the independent filtering \cite{bourgon2010}; + \item \Rcode{alpha}: significance threshold applied to the adjusted p-values to select the differentially expressed features (default is \Rcode{0.05}); + \item \Rcode{pAdjustMethod}: p-value adjustment method for multiple testing \cite{bh1995,yekutieli} (\Rcode{"BH"} by default, \Rcode{"BY"} or any value of \Rcode{p.adjust.methods}); + \item \Rcode{typeTrans}: (if use of \deseq) method of transformation of the counts for the clustering and the PCA (default is \Rcode{"VST"} for Variance Stabilizing Transformation, or 
\Rcode{"rlog"} for Regularized Log Transformation); + \item \Rcode{locfunc}: (if use of \deseq) function used for the estimation of the size factors (default is \Rcode{"median"}, or \Rcode{"shorth"} from the \Biocpkg{genefilter} package); + \item \Rcode{cpmCutoff}: (if use of \edger) counts-per-million cut-off to filter low counts (default is 1, set to 0 to disable filtering); + \item \Rcode{gene.selection}: (if use of \edger) method of selection of the features for the MultiDimensional Scaling plot (\Rcode{"pairwise"} by default or \Rcode{"common"}); + \item \Rcode{colors}: colors used for the figures (one per biological condition), 4 are given by default. +\end{itemize} + +All these parameters will be saved and written at the end of the HTML report in order to keep track of what has been done. + +\subsection{Executing the script} + +When the parameters have been defined, the user can run all the \R{} code, either step by step or in one block. The command lines use functions of the \Rpackage{SARTools} package to load data, to produce figures, to perform the differential analysis, to export the results and to create the HTML report. Some results and potential warning/error messages will be printed in the \R{} console: +\begin{itemize} + \item target with the count files loaded and the biological condition associated with each sample; + \item number of features and null counts in each file; + \item top and bottom of the count matrix; + \item SERE coefficients computed between each pair of samples \cite{sere}; + \item normalization factors (TMM for \edger~and size factors for \deseq); + \item number of features discarded by the independent filtering (if use of \deseq); + \item number of differentially expressed features. +\end{itemize} + +If the \R{} code was executed in one block, the user should have a look at the console at the end of the analysis to check that the analysis ran without any problem. 
+\subsection{Files generated} + +While running the script, PNG files are generated in the \texttt{figures} directory: +\begin{itemize} + \item \file{barplotTC.png}: total number of reads per sample; + \item \file{barplotNull.png}: percentage of null counts per sample; + \item \file{densplot.png}: estimation of the density of the counts for each sample; + \item \file{majSeq.png}: percentage of reads caught by the feature having the highest count in each sample; + \item \file{pairwiseScatter.png}: pairwise scatter plot between each pair of samples and SERE values; + \item \file{diagSizeFactorsHist.png}: diagnostic of the estimation of the size factors (if use of \deseq); + \item \file{diagSizeFactorsTC.png}: plot of the size factors vs the total number of reads (if use of \deseq); + \item \file{countsBoxplot.png}: boxplots on raw and normalized counts; + \item \file{cluster.png}: hierarchical clustering of the samples (based on VST or rlog data for \deseq, or CPM data for \edger); + \item \file{PCA.png}: first and second factorial planes of the PCA on the samples based on VST or rlog data (if use of \deseq); + \item \file{MDS.png}: Multi Dimensional Scaling plot of the samples (if use of \edger); + \item \file{dispersionsPlot.png}: graph of the estimations of the dispersions and diagnostic of log-linearity of the dispersions (if use of \deseq); + \item \file{BCV.png}: graph of the estimations of the tagwise, trended and common dispersions (if use of \edger); + \item \file{rawpHist.png}: histogram of the raw p-values for each comparison; + \item \file{MAplot.png}: MA-plot for each comparison (log ratio of the means vs intensity). +\end{itemize} + +Some tab-delimited files are exported in the \texttt{tables} directory. 
They store information on the features such as $\log_2\text{(FC)}$ or p-values and can be read easily in a spreadsheet: +\begin{itemize} + \item \file{TestVsRef.complete.txt}: contains all the features studied; + \item \file{TestVsRef.down.txt}: contains only significant down-regulated features, i.e. less expressed in Test than in Ref; + \item \file{TestVsRef.up.txt}: contains only significant up-regulated features, i.e. more expressed in Test than in Ref. +\end{itemize} + +A \file{.RData} file with all the \R{} objects created during the analysis is saved: it may be used to perform downstream analyses. Finally, an HTML report which explains the full analysis is produced. Its goal is to give details about the methodology, the different steps and the results. It displays all the figures produced and the most important results of the differential analysis, such as the number of up- and down-regulated features. The user should read the full HTML report and closely analyze each figure to check that the analysis ran smoothly. \\ + +Note that the HTML report is standalone and can be shared without the source figure files. It makes the report easily sendable via e-mail for instance. + +\section{Troubleshooting RNA-seq experiments with \Rpackage{SARTools}} + +This section aims at listing some problems that the user can face when analyzing data from an RNA-Seq experiment. + +\subsection{Inversion of samples} +For a variety of reasons, it might happen that some sample names are erroneously switched at a step of the experiment. This can be detected during the statistical analysis in several ways. Here, we have intentionally inverted two file names in a target file, such that the counts associated with these two samples (WT3 and KO3) are inverted. \\ + +The first tool to detect the inversion is the SERE statistic \cite{sere} since its goal is to measure the similarity between samples. The SERE values obtained are displayed on the lower triangle of the figure \ref{inversionSERE}. 
We clearly observe that KO3 is more similar to WT1 (SERE$=1.7$) than to KO2 ($3.4$), which potentially reveals a problem within the samples under study. The same phenomenon happens with WT3 which is more similar to KO1 ($1.6$) than to WT1 ($4.59$).\\ + +\begin{figure}[h!] +\centering +\includegraphics[width=0.45\textwidth]{inversionpairwiseScatter.png} +\caption{Pairwise scatter plot and SERE statistics when inverting samples} +\label{inversionSERE} +\end{figure} + +The clustering can also help detect such an inversion of samples. Indeed, on the dendrogram, samples from the same biological condition are supposed to cluster together while samples from two different biological conditions should group only at the final step of the algorithm. Figure \ref{inversionClusterPCA} (left) shows the dendrogram obtained: we can see that KO3 clusters immediately with WT1 and WT2 while WT3 clusters with KO1 and KO2. \\ + +The Principal Component Analysis on the right panel of figure \ref{inversionClusterPCA} (or the Multi-Dimensional Scaling plot) is a tool which allows exploration of the structure of the data. Samples are displayed on a two dimensional graph which can help the user to assess the distances between samples. The PCA presented here leads to the same conclusion as the dendrogram. \\ + +\begin{figure}[h!] +\centering +\includegraphics[width=0.40\textwidth]{inversioncluster.png} +\includegraphics[width=0.40\textwidth]{inversionPCA.png} +\caption{Clustering dendrogram (left) and PCA (right) when inverting samples} +\label{inversionClusterPCA} +\end{figure} + +Finally, when testing for differential expression, if two samples have been inverted during the process, the histogram of the raw p-values can have an unexpected shape. Instead of having a uniform distribution, with a possible peak at $0$ for the differentially expressed features, the distribution may be skewed toward the right (figure \ref{inversionHist}). + +\begin{figure}[h!] 
+\centering +\includegraphics[width=0.4\textwidth]{inversionrawpHist.png} +\caption{Raw p-values histogram when inverting samples} +\label{inversionHist} +\end{figure} + +\subsection{Batch effect} +A batch effect is a source of variation in the counts due to splitting the whole sample set into subgroups during the wet-lab part of the experiment. To illustrate this phenomenon, figure \ref{batchclusterPCA} shows the results of the clustering and of the PCA for an experiment with 12 samples: 6 WT and 6 KO labeled from 1 to 6 within each condition. \\ + +\begin{figure}[h!] +\centering +\includegraphics[width=0.40\textwidth]{batchcluster.png} +\includegraphics[width=0.40\textwidth]{batchPCA.png} +\caption{Clustering dendrogram (left) and PCA (right) with a batch effect} +\label{batchclusterPCA} +\end{figure} + +The first axis of the PCA, which catches 64.36\% of the variability, clearly separates WT samples from KO samples. However, we can see that the second axis separates samples labeled 1, 2 and 3 from samples labeled 4, 5 and 6 with a large percentage of variability (20.97\%). The clustering leads to the same conclusion: samples 1, 2 and 3 seem slightly different from samples 4, 5 and 6, both within WT and KO. \\ + +After a return to the conditions under which the experiment has been conducted, it has been found that the first three samples were not prepared on the same day as the last three ones (both for WT and KO). This is enough to create a batch effect. In that case, add a column to the target file reporting the day of preparation, set the \Rcode{batch} parameter value to "day of preparation" and re-do the analysis. 
It will result in a better fit of the model and potentially a gain of power when testing for differentially expressed features.\\ + + +\warning{batch effects can be taken into account only if they are not confounded with another technical or biological factor included in the model.} + +%In this situation, the analysis must be rerun taking the "day of preparation" effect into account by adding it both to the target file (in a new column) and to the design of the model as a blocking factor (\Rcode{batch} parameter at the beginning of the \R{} scripts). + +\subsection{Number of reads and outliers} +A sample with a total number of reads or a number of null counts very different from the others may reveal a problem during the experiment, the sequencing or the alignment. The user can check this in the first two barplots of the HTML report (total number of reads and percentage of null counts). Moreover, such a sample will probably be an outlier on the PCA/MDS plot, i.e. it will fall far from the other samples. It will often be preferable to remove it from the statistical analysis. For example, the figures \ref{outlierTCnull} and \ref{outlierPCA} illustrate this phenomenon and suggest the removal of sample WT3 from the analysis. + +\begin{figure}[h!] +\centering +\includegraphics[width=0.40\textwidth]{outlierbarplotTotal.png} +\includegraphics[width=0.40\textwidth]{outlierbarplotNull.png} +\caption{WT3 has a small total number of reads (left) and a high percentage of null counts (right)} +\label{outlierTCnull} +\end{figure} + +\begin{figure}[h!] +\centering +\includegraphics[width=0.40\textwidth]{outlierPCA.png} +\caption{WT3 falls far from the other samples on the first factorial plane of the PCA} +\label{outlierPCA} +\end{figure} + +\subsection{Ribosomal RNA} +It may happen that some features (ribosomal RNA for example) take up a large number of reads (up to 20\% or more). The user can detect them in a barplot in the HTML report. 
If these are not of interest for the experiment, these features can be removed by adding them to the \Rcode{featuresToRemove} argument at the beginning of the \R{} scripts. + +\subsection{Normalization parameter (only with \deseq)} +In order to normalize the counts, \deseq~computes size factors. There are two options to compute them: \Rcode{"median"} (default) or \Rcode{"shorth"}. The default parameter often works well but the HTML report contains a figure which allows an assessment of the quality of the estimation of the size factors: there is one histogram per sample with a vertical red line corresponding to the value of the size factor. If the estimation of the size factor is correct, the red line must fall on the mode of the histogram for each sample. If this is not the case, the user should use the \Rcode{"shorth"} parameter. Results with \Rcode{"median"} and \Rcode{"shorth"} for the same sample are given in figure \ref{diagSF} for an experiment where it was preferable to use \Rcode{"shorth"}. + +\begin{figure}[h!] +\centering +\includegraphics[width=0.8\textwidth]{diagSF.png} +\caption{Size factor diagnostic for one sample with \Rcode{"median"} (left) and \Rcode{"shorth"} (right)} +\label{diagSF} +\end{figure} + + +\section{Toy example} +A target file and counts files (4 samples: 2 WT and 2 KO) for a toy example are available within the package to enable the user to test the workflow. The target file and the directory containing the counts files can be reached with the following lines: +\begin{lstlisting} +targetFile <- system.file("target.txt", package="SARTools") +rawDir <- system.file("raw", package="SARTools") +\end{lstlisting} + +The user can try the \R{} scripts available in the appendix with the parameters above (all the others remaining unchanged). + +\clearpage +\appendix +\section{\R{} script templates} +Below are the \R{} scripts of the workflow. 
The user can copy and paste them in a text editor to modify the parameters and run the analysis with \R{}. The two scripts are also available in the directory where the package has been installed (i.e. the \file{SARTools} directory of the \R{} packages library). Even if it is possible to run all the code directly, it is preferable to run it step by step in order to detect possible warning/error messages when they appear. Note that the code to load data and to generate graphs is similar between the two files. The main difference begins when using the \deseq~or \edger~functions. + +\subsection{\file{template\_script\_DESeq2.r}} +\lstinputlisting[caption=]{\Sexpr{system.file("template_script_DESeq2.r",package="SARTools")}} + +\clearpage +\subsection{\file{template\_script\_edgeR.r}} +\lstinputlisting[caption=]{\Sexpr{system.file("template_script_edgeR.r",package="SARTools")}} + +\clearpage +\bibliography{library} + +\end{document} \ No newline at end of file