Basic analysis of bacterial RNA-Seq data for differential gene expression.
- Read files
- RNA-Seq data in FASTQ format (Ex: dataset1.fastq, dataset2.fastq)
- Reference files
- Genome sequence in GenBank or GFF3 format (Ex: REL606.gbk)
- Adaptor filtering (Trimmomatic)
- Download Trimmomatic
- Use this protocol.
- Read mapping (Bowtie2)
- Download Bowtie2
- Download an executable for your platform
- Add this directory to your $PATH, or move the bowtie2 and bowtie2-* executables into a directory that is already on your $PATH
- Sequence conversion and runfile generation (breseq and gdtools)
- Read counting (HTSeq)
- Python3
- Install using pip
- Can install with bioconda (use 0.12.4 to avoid Python incompatibilities)
- Differential gene expression (DESeq2)
- Download and install R
- Bioconductor R modules
- library(DESeq2)
- These brnaseq scripts
- Download from GitHub.
- Put them into your $PATH.
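Because every step above relies on an executable being reachable through $PATH, a quick check before starting can save a failed run. The following is a minimal sketch, not part of the brnaseq scripts: the function name `missing_tools` is hypothetical, and the tool names in the example call are assumptions about how a typical install names its executables.

```shell
# Print each named tool that cannot be found on $PATH, one per line.
# Prints nothing when every tool is available.
missing_tools() {
    for _tool in "$@"; do
        if ! command -v "$_tool" >/dev/null 2>&1; then
            printf '%s\n' "$_tool"
        fi
    done
}

# Example call (tool names are assumptions; adjust to your installation):
# missing_tools trimmomatic bowtie2 breseq gdtools htseq-count samtools Rscript
```

Any name that is printed still needs to be installed or added to $PATH.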
The files should contain information about the reads and the references used in the analysis.
Example
Use this protocol.
gdtools RUNFILE --mode trimmomatic-PE-unique --preserve-pairs
brnaseq
gdtools RUNFILE --preserve-pairs --executable brnaseq --options "-j 12 -k"
differential_gene_expression.R
For an explanation of the methods: Manual and Instructions
graph_gene_counts.R
- MetaCyc
- clusterProfiler R package
You should always inspect your data at the nucleotide level to see if there are indications that an analysis has gone awry. For example, are all of the forward reads on the correct strand? Are the mutations you expect in a given strain actually present?
And convert to BAM format (assumes single-end data):
samtools faidx REL606.fna
samtools view -b -o datasetX.unsorted.bam datasetX.sam
samtools sort --threads 8 -o datasetX.bam datasetX.unsorted.bam
samtools index datasetX.bam
Now you can use IGV to view them!
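When there are several datasets, the conversion steps can be wrapped in a small function and run once per file. This is a sketch under assumptions: the function name `sam_to_sorted_bam` is hypothetical, it expects a `<prefix>.sam` file from single-end mapping, and it uses `samtools view -b` for the SAM-to-BAM step, which is the form recent samtools releases accept.

```shell
# Convert <prefix>.sam to a sorted, indexed <prefix>.bam.
# Usage: sam_to_sorted_bam <reference.fna> <prefix> [threads]
sam_to_sorted_bam() {
    ref=$1
    prefix=$2
    threads=${3:-8}   # default of 8 threads, as in the commands above

    # Each step runs only if the previous one succeeded.
    samtools faidx "$ref" &&
    samtools view -b -o "$prefix.unsorted.bam" "$prefix.sam" &&
    samtools sort --threads "$threads" -o "$prefix.bam" "$prefix.unsorted.bam" &&
    samtools index "$prefix.bam"
}
```

For example, `for d in dataset1 dataset2; do sam_to_sorted_bam REL606.fna "$d"; done` would process both example datasets in turn.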
Extract R1, sort and index; Index FASTA reference
samtools faidx reference.fna
samtools view -hbf 64 aligned.paired.sam > unsorted_R1.bam
samtools sort --threads 8 -o sorted_R1.bam unsorted_R1.bam
samtools index sorted_R1.bam
Plot to table file
breseq BAM2COV -b sorted_R1.bam -f reference.fna -t -p 0 <seq_id:start-end>
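The `<seq_id:start-end>` placeholder has to be replaced with a real region before running the command; typed literally, the shell would treat the angle brackets as redirection. A small wrapper can build the region string and catch bad coordinates first. This is a sketch only: the function name `coverage_table` is hypothetical, and the sequence ID in the usage note is an assumed example.

```shell
# Run breseq BAM2COV in table mode for one region.
# Usage: coverage_table <sorted.bam> <reference.fna> <seq_id> <start> <end>
coverage_table() {
    bam=$1
    ref=$2
    seq_id=$3
    start=$4
    end=$5

    # Reject missing or non-numeric coordinates before calling breseq.
    for _n in "$start" "$end"; do
        case "$_n" in
            ''|*[!0-9]*)
                echo "start and end must be non-negative integers" >&2
                return 1
                ;;
        esac
    done

    # Reject reversed coordinates.
    if [ "$start" -gt "$end" ]; then
        echo "start must not be greater than end" >&2
        return 1
    fi

    # Same options as the command above, with the region filled in.
    breseq BAM2COV -b "$bam" -f "$ref" -t -p 0 "${seq_id}:${start}-${end}"
}
```

For example, `coverage_table sorted_R1.bam reference.fna REL606 1000 2000` would request the region `REL606:1000-2000`, assuming `REL606` is the sequence ID used in the reference file.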
If you use this pipeline, you should cite:
- Trimmomatic
- Bowtie2
- HTSeq
- DESeq2