RNA-seq data analysis practical

[Some slides to help with the practical](https://drive.google.com/open?id=0B8OqXE2Rr7OHQVZQN1BXQ0RMOGM)

This tutorial illustrates how to use standalone tools, together with R and Bioconductor, to analyse RNA-seq data. We will also use one meta-pipeline, [IRAP](https://github.com/nunofonseca/irap). Keep in mind that this is a rapidly evolving field and that this document is not intended as a review of the many tools available for each step; instead, we will cover one of the many existing workflows for analysing this type of data.

We will be working with a subset of a publicly available dataset from Drosophila melanogaster, which is available:

- As raw data: in the Sequence Read Archive (SRP001537) or through ArrayExpress/ENA ([E-GEOD-18508](http://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-18508/))
- As processed data: in Bioconductor, as the pasilla package

For more information about this dataset, please refer to the original publication, [Brooks et al. (2010)](http://genome.cshlp.org/content/early/2010/10/04/gr.108662.110).

The tools and R packages that we will be using during the practical are listed below (see Software requirements) and the necessary data files can be found here. After downloading and uncompressing the tar.gz file, you should have the following directory structure on your computer:

DATA                                  # data used for the practicals
|-- demultiplexing                    # multiplexed data !!! Not used in the project - FYI only
|-- eqtl                              # data used in the eqtl practical !!! Not used in the project - FYI only
|-- fastq                             # fastq files -> starting point
|-- mapped                            # mapped data: BAM files
|-- QCreports                         # precomputed QC report
|-- reference                         # reference from release 62 of Ensembl
`-- IRAP_example                          # Directory setup for IRAP (raw_data +reference) + its output
    |-- data
    |   |-- contamination                 # E.coli reference
    |   |-- raw_data                      # fastq files
    |   |-- reference
    |   |   |-- drosophila_melanogaster   # All the reference files are in this directory
    `-- E-GEOD-18508                      # output of IRAP
        | ...
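
To confirm that the archive was uncompressed as expected, the layout above can be checked with a small shell function. This is only a sketch: the function name `check_layout` is made up for illustration, and the directory names are taken from the tree above.

```shell
# check_layout ROOT: report which of the expected sub-directories exist under ROOT.
# ROOT defaults to DATA. The function name is illustrative, not part of the course material.
check_layout() {
    root=${1:-DATA}
    missing=0
    for d in demultiplexing eqtl fastq mapped QCreports reference \
             IRAP_example/data/contamination \
             IRAP_example/data/raw_data \
             IRAP_example/data/reference/drosophila_melanogaster \
             IRAP_example/E-GEOD-18508; do
        if [ -d "$root/$d" ]; then
            echo "ok       $d"
        else
            echo "MISSING  $d"
            missing=$((missing + 1))
        fi
    done
    # Succeed only if every expected directory was found.
    [ "$missing" -eq 0 ]
}
```

Running `check_layout DATA` from the directory where the archive was uncompressed prints one line per expected directory and returns a non-zero status if any of them is missing.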

You can also browse the files online and download only the material you need from here.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. This means that you are able to copy, share and modify the work, as long as the result is distributed under the same license.

Dealing with raw data
1. The FASTQ format
2. Quality assessment (QA)
3. Filtering FASTQ files
4. Aligning reads to the genome (already processed - will not be run)
Dealing with aligned data
1. The SAM/BAM format
2. Visualising aligned reads (optional)
3. Filtering BAM files
4. Gene-centric analyses:
  1. Counting reads overlapping annotated genes
    - With htseq-count
    - With R
    - Alternative approaches
  2. Normalising counts
    - With RPKMs
    - With DESeq2
  3. Differential gene expression
Other topics - Not covered in the course
1. Dealing with raw data
  - De-multiplexing
2. Exon-centric analyses:
  - Differential exon usage

Software requirements

Note: depending on the topics covered in the course some of these tools might not be used.

Standalone tools:
- FastQC
- PRINSEQ
- ea-utils
- samtools
- IGV
- htseq-count
Bioconductor packages:
- GenomicRanges
- GenomicAlignments
- Rsamtools
- biomaRt
- pasilla
- DESeq - only for some dependencies
- DESeq2
- DEXSeq
(Meta)Pipeline
- IRAP - which will be used through Docker for this tutorial
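
Before starting, it can help to check which of the standalone tools are already on your PATH. The sketch below assumes a POSIX shell; the function names are illustrative, and the executable names in the example call are assumptions about a typical installation (for instance, PRINSEQ is often installed as `prinseq-lite.pl` and ea-utils provides `fastq-mcf`), so adjust them to your setup.

```shell
# have_tool NAME: succeed if NAME resolves to a command on the PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# report_tools NAME...: print one status line per tool name given.
report_tools() {
    for tool in "$@"; do
        if have_tool "$tool"; then
            echo "found    $tool"
        else
            echo "missing  $tool"
        fi
    done
}

# Example call (executable names are assumptions and may differ per installation):
# report_tools fastqc prinseq-lite.pl fastq-mcf samtools htseq-count docker
```

This only checks that a command with that name exists; it does not verify versions, and it does not cover the Bioconductor packages, which are installed from within R.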

Other resources

Course data

Complete course data

Tutorials

Course materials available at the Bioconductor website
Online training resources at the EBI website
R and Bioconductor tutorial by Thomas Girke
Do not forget to check the documentation for the packages used in the practical!

Cheat sheets

Acknowledgments

This tutorial has been inspired by material developed by Mar Gonzalez-Porta, Liliana Greger, Nuno Fonseca, Claudia Calabrese, Fatemeh Zamanzad, Ângela Gonçalves, Nicolas Delhomme, Simon Anders and Martin Morgan, who we would like to thank and acknowledge.
