Methods and scripts used to characterize chimeric mitochondrial RNA transcripts.

paulstothard/chimeric-mitochondrial-RNA-analysis

chimeric-mitochondrial-RNA-analysis

Overview

This repository contains the code and methods used to characterize chimeric mitochondrial RNA transcripts in RNA-Seq datasets.

The results of this study are described in the following publication:

Vandiver AR, Herbst A, Stothard P, Wanagat J. Chimeric mitochondrial RNA transcripts predict mitochondrial genome deletion mutations in mitochondrial genetic diseases and aging. Genome Res. 2024 Nov 27:gr.279072.124. doi: 10.1101/gr.279072.124. Epub ahead of print. PMID: 39603705.

To download the repository:

git clone git@github.com:paulstothard/chimeric-mitochondrial-RNA-analysis

or download the latest release.

The scripts and procedures in this repository use STAR-Fusion to identify candidate fusion transcripts. R code is used to parse the STAR-Fusion output files for each dataset and to enumerate mitochondrial gene fusions within each sample. For each observed fusion type (defined by the genes involved, ignoring the precise boundaries of the fusion), the total number of supporting reads is calculated by summing the values in the JunctionReadCount column. Next, a table termed "raw counts" is generated, with samples as rows, fusion types as columns, and each cell containing the summed JunctionReadCount values. A second table, termed "FFPM" for "fusion fragments per million total RNA-Seq fragments", is generated from the first by dividing each raw count by the total number of sequenced fragments (in millions) in the corresponding sample. SRA metadata is programmatically added to each table as additional columns to facilitate further analyses. The raw counts and FFPM tables are written to a single Excel file as separate worksheets. PCA plots, with and without sample labels and loadings, are produced from the FFPM table and saved in PDF format.
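The aggregation itself is done by the R scripts in the scripts directory. As a rough illustration only, the two calculations can be sketched as Bash functions for a single sample. The sketch assumes a tab-delimited STAR-Fusion predictions file whose header line contains `#FusionName` and `JunctionReadCount` columns, fusion names of the form `GENE1--GENE2`, and mitochondrial gene symbols beginning with `MT-` (matched case-insensitively, since capitalization may differ between annotations). The function names `mt_fusion_counts` and `ffpm` are hypothetical and not part of this repository.

```shell
# Hypothetical helpers (not part of this repository) illustrating the
# raw-count and FFPM calculations for one sample.

# mt_fusion_counts FILE [PREFIX]
# Sums JunctionReadCount for every fusion whose two partner genes both start
# with PREFIX (case-insensitive, default "mt-"). Rows that share a FusionName
# but differ in breakpoints are merged, so precise boundaries are ignored.
mt_fusion_counts() {
  local file=$1 prefix=${2:-mt-}
  awk -F'\t' -v prefix="$prefix" '
    # Locate the needed columns from the header instead of assuming positions.
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "#FusionName") name_col = i
        if ($i == "JunctionReadCount") count_col = i
      }
      if (!name_col || !count_col) {
        print "missing #FusionName or JunctionReadCount column" > "/dev/stderr"
        exit 1
      }
      next
    }
    {
      # FusionName has the form GENE1--GENE2.
      n = split($name_col, genes, "--")
      p = tolower(prefix)
      if (n == 2 && index(tolower(genes[1]), p) == 1 && index(tolower(genes[2]), p) == 1)
        sum[$name_col] += $count_col
    }
    END { for (f in sum) print f "\t" sum[f] }
  ' "$file" | LC_ALL=C sort
}

# ffpm RAW_COUNT TOTAL_FRAGMENTS
# FFPM = raw count / (total fragments / 1,000,000), printed with 4 decimals.
ffpm() {
  awk -v raw="$1" -v total="$2" 'BEGIN { printf "%.4f\n", raw / (total / 1000000) }'
}
```

For example, `mt_fusion_counts sample1/star-fusion.fusion_predictions.tsv` (an illustrative path) lists each MT-MT fusion type with its summed read count, and `ffpm 12 3000000` prints `4.0000`, i.e. 12 supporting reads in a sample with 3 million fragments.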

Dataset download, STAR-Fusion analysis, and R analysis are performed using the scripts in the scripts directory, which are designed to be run from the top-level directory of the repository. The merged STAR-Fusion results for each dataset are written to a separate directory within the star-fusion-results directory. For each dataset analyzed in this study, the Excel file containing the raw counts and FFPM tables and the PCA plots (PDF) are included in a corresponding subdirectory of star-fusion-results-summary.

Custom GTF files are used with STAR-Fusion to convey that the MT-ATP8 and MT-ATP6 genes, as well as the MT-ND4L and MT-ND4 genes, are encoded within single transcripts that do not represent chimeric mitochondrial RNA.
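To see how mitochondrial genes are represented in a given GTF file (for example, to compare a custom GTF with the standard Ensembl one), one option is to list the gene records on the mitochondrial sequence. This Bash sketch assumes Ensembl-style attributes (`gene_name "..."`) and a mitochondrial sequence named `MT`, as in the Ensembl assemblies used here; `list_mt_genes` is a hypothetical helper, and the exact contents of the custom GTFs are not described in this README.

```shell
# list_mt_genes GTF [SEQNAME]
# Prints start, end, strand and gene_name for each "gene" feature on SEQNAME
# (default "MT"). Accepts plain or gzip-compressed GTF files.
list_mt_genes() {
  local gtf=$1 seq=${2:-MT}
  if [[ $gtf == *.gz ]]; then gzip -dc -- "$gtf"; else cat -- "$gtf"; fi |
    awk -F'\t' -v seq="$seq" '
      $0 !~ /^#/ && $1 == seq && $3 == "gene" {
        name = "NA"
        # Extract the value of the gene_name attribute, if present.
        if (match($9, /gene_name "[^"]*"/))
          name = substr($9, RSTART + 11, RLENGTH - 12)
        print $4 "\t" $5 "\t" $7 "\t" name
      }'
}
```

Usage: `list_mt_genes custom-GTFs/Homo_sapiens.GRCh38.104_custom.gtf` (compressed or uncompressed).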

The detailed analysis procedure is described below and can be used to reproduce the results.

RNA-Seq datasets

Five datasets are analyzed in this study:

Name                      Source
Rat aging muscle          PRJNA793055
Human Twinkle mutation    PRJNA532885
Human aging muscle        PRJNA662072
Human aging brain         PRJNA283498
Human common deletion     Available by request

Dependencies

Docker is used to run STAR-Fusion and to build the STAR-Fusion reference files.

To download the STAR-Fusion version 1.10.0 Docker image:

docker pull trinityctat/starfusion:1.10.0
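The `docker run` commands below refer to the image as `trinityctat/starfusion` without a tag, which Docker resolves to `trinityctat/starfusion:latest`. If only the 1.10.0 image has been pulled, Docker will pull `latest` on first use, and that may be a different STAR-Fusion version. One possible workaround, offered as an assumption rather than part of the original procedure, is to tag the pulled 1.10.0 image locally as `latest`; another is to append `:1.10.0` to the image name in each command. The helper name `pin_starfusion_image` is hypothetical, and the repository's scripts may reference the image differently.

```shell
# pin_starfusion_image [VERSION]
# Pulls trinityctat/starfusion:VERSION (default 1.10.0) and tags it locally as
# trinityctat/starfusion:latest, so untagged references use that version.
# Set DRY_RUN=1 to print the Docker commands instead of running them.
pin_starfusion_image() {
  local version=${1:-1.10.0}
  local image=trinityctat/starfusion
  local run=()
  [[ ${DRY_RUN:-0} == 1 ]] && run=(echo)
  "${run[@]}" docker pull "$image:$version" &&
    "${run[@]}" docker tag "$image:$version" "$image:latest"
}
```

Because `docker run` only pulls images that are missing locally by default, the local `latest` tag is used until it is overwritten, for example by an explicit `docker pull trinityctat/starfusion`.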

For the other dependencies, a Conda environment can be created using the following commands:

conda create -n chimeric-mtrna python=3.8
conda activate chimeric-mtrna
conda install -y -c bioconda fastp fastqc sra-tools trimmomatic
conda install -y -c anaconda h5py
conda install -y -c conda-forge parallel pigz r-base r-essentials
conda install -y -c conda-forge r-data.table r-ggfortify r-ggplot2 r-janitor r-openxlsx r-tidyverse r-writexl
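After activating the environment, a quick way to confirm that the command-line tools are available is to look each one up with `command -v`. This Bash function is a hypothetical convenience check, not a script from this repository; the suggested tool list assumes that sra-tools provides `fasterq-dump`, that r-base provides `Rscript`, and that `docker` is installed separately.

```shell
# check_deps TOOL...
# Reports each TOOL not found on PATH; returns 1 if at least one is missing.
check_deps() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_deps docker fastp fastqc fasterq-dump trimmomatic parallel pigz Rscript && echo "all tools found"`.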

Analysis procedure

The commands below assume that the scripts, metadata, and custom-GTFs directories from this repository are in the current working directory.

Prepare STAR-Fusion reference files

STAR-Fusion requires a CTAT genome lib, which includes various data files used in fusion finding. Separate CTAT genome libs are created for the rat and human datasets.

Build a Dfam file for the rat genome

if [ ! -f "Dfam.h5" ]; then
    wget https://www.dfam.org/releases/Dfam_3.3/families/Dfam.h5.gz
    gunzip Dfam.h5.gz
fi
./scripts/famdb.py -i Dfam.h5 lineage -a Rattus
./scripts/famdb.py -i Dfam.h5 families -f hmm -a Rattus > rat_dfam.hmm
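Before running `hmmpress`, it may be worth confirming that the extracted file actually contains profiles. In the HMMER3 text format each file starts with a `HMMER3` header line and each profile ends with a line containing only `//`, so counting those lines gives the number of models. `check_hmm_file` is a hypothetical Bash helper, not part of this repository.

```shell
# check_hmm_file FILE
# Verifies that FILE looks like a HMMER3 text file and prints how many profile
# HMMs it contains (one per "//" terminator line). Returns 1 on failure.
check_hmm_file() {
  local file=$1 n
  if ! head -n 1 "$file" | grep -q '^HMMER3'; then
    echo "$file: not a HMMER3 text file" >&2
    return 1
  fi
  n=$(grep -c '^//$' "$file")
  echo "$n"
  [ "$n" -gt 0 ]
}
```

Usage: `check_hmm_file rat_dfam.hmm`. A non-zero exit status may indicate that the famdb.py extraction produced no models, for example if the lineage name did not match.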

Prepare the rat Dfam file for STAR-Fusion

docker run -v "$(pwd)":/data --rm -u "$(id -u)":"$(id -g)" trinityctat/starfusion \
hmmpress /data/rat_dfam.hmm
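`hmmpress` writes four binary index files next to the input file, with the suffixes `.h3m`, `.h3i`, `.h3f`, and `.h3p`. A hypothetical Bash check (not part of this repository) that all four were created and are non-empty:

```shell
# check_hmmpress_outputs HMMFILE
# Returns 0 if all four hmmpress index files exist and are non-empty.
check_hmmpress_outputs() {
  local hmm=$1 ext status=0
  for ext in h3m h3i h3f h3p; do
    if [ ! -s "$hmm.$ext" ]; then
      echo "missing or empty: $hmm.$ext" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage: `check_hmmpress_outputs rat_dfam.hmm` (or `human_dfam.hmm` for the human build).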

Build the rat CTAT genome lib

Download and uncompress the rat reference genome sequence:

wget http://ftp.ensembl.org/pub/release-104/fasta/rattus_norvegicus/dna/\
Rattus_norvegicus.Rnor_6.0.dna.toplevel.fa.gz

gunzip Rattus_norvegicus.Rnor_6.0.dna.toplevel.fa.gz

Uncompress the custom GTF file:

gunzip custom-GTFs/Rattus_norvegicus.Rnor_6.0.104_custom.gtf.gz

Run the STAR-Fusion prep_genome_lib.pl script, writing the output to the rat_ctat_genome_lib_build_dir_custom_MT directory:

docker run -v "$(pwd)":/data --rm trinityctat/starfusion \
/usr/local/src/STAR-Fusion/ctat-genome-lib-builder/prep_genome_lib.pl \
--genome_fa /data/Rattus_norvegicus.Rnor_6.0.dna.toplevel.fa \
--gtf /data/custom-GTFs/Rattus_norvegicus.Rnor_6.0.104_custom.gtf \
--pfam_db current \
--dfam_db /data/rat_dfam.hmm \
--output_dir /data/rat_ctat_genome_lib_build_dir_custom_MT

Note that the above may create output owned by root. To change the ownership to the current user:

sudo chown -R $(id -u):$(id -g) rat_ctat_genome_lib_build_dir_custom_MT

If sudo is not available, try the following:

HOST_UID=$(id -u)
HOST_GID=$(id -g)

docker run -v "$(pwd)":/data --rm trinityctat/starfusion /bin/bash -c "\
chown -R $HOST_UID:$HOST_GID /data/rat_ctat_genome_lib_build_dir_custom_MT"
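To confirm that no root-owned files remain after either approach, `find` can list anything in the directory that is not owned by the current user (`-user` accepts a numeric user ID). `list_foreign_owned` is a hypothetical Bash helper:

```shell
# list_foreign_owned DIR
# Prints every path under DIR (including DIR itself) not owned by the current
# user; prints nothing if the ownership change succeeded.
list_foreign_owned() {
  find "$1" ! -user "$(id -u)" -print
}
```

Usage: `list_foreign_owned rat_ctat_genome_lib_build_dir_custom_MT | head`.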

Build a Dfam file for the human genome

if [ ! -f "Dfam.h5" ]; then
    wget https://www.dfam.org/releases/Dfam_3.3/families/Dfam.h5.gz
    gunzip Dfam.h5.gz
fi
./scripts/famdb.py -i Dfam.h5 lineage -a human
./scripts/famdb.py -i Dfam.h5 families -f hmm -a human > human_dfam.hmm

Prepare the human Dfam file for STAR-Fusion

docker run -v "$(pwd)":/data --rm -u "$(id -u)":"$(id -g)" trinityctat/starfusion \
hmmpress /data/human_dfam.hmm

Build the human CTAT genome lib

Download and uncompress the human reference genome sequence:

wget http://ftp.ensembl.org/pub/release-104/fasta/homo_sapiens/dna/\
Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz

gunzip Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz

Uncompress the custom GTF file:

gunzip custom-GTFs/Homo_sapiens.GRCh38.104_custom.gtf.gz

Run the STAR-Fusion prep_genome_lib.pl script, writing the output to the human_ctat_genome_lib_build_dir_custom_MT directory:

docker run -v "$(pwd)":/data --rm trinityctat/starfusion \
/usr/local/src/STAR-Fusion/ctat-genome-lib-builder/prep_genome_lib.pl \
--genome_fa /data/Homo_sapiens.GRCh38.dna.primary_assembly.fa \
--gtf /data/custom-GTFs/Homo_sapiens.GRCh38.104_custom.gtf \
--pfam_db current \
--dfam_db /data/human_dfam.hmm \
--output_dir /data/human_ctat_genome_lib_build_dir_custom_MT

Note that the above may create output owned by root. To change the ownership to the current user:

sudo chown -R $(id -u):$(id -g) human_ctat_genome_lib_build_dir_custom_MT

If sudo is not available, try the following:

HOST_UID=$(id -u)
HOST_GID=$(id -g)

docker run -v "$(pwd)":/data --rm trinityctat/starfusion /bin/bash -c "\
chown -R $HOST_UID:$HOST_GID /data/human_ctat_genome_lib_build_dir_custom_MT"
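Once a build finishes, a basic sanity check is to confirm that the output directory contains the reference genome, the annotation, and the STAR index. The names checked here (`ref_genome.fa`, `ref_annot.gtf`, and the `ref_genome.fa.star.idx` directory) are assumed from the usual CTAT genome lib layout rather than taken from this repository, and `check_ctat_lib` is a hypothetical Bash helper.

```shell
# check_ctat_lib DIR
# Returns 0 if DIR contains the core items of a CTAT genome lib (assumed names).
check_ctat_lib() {
  local lib=$1 status=0 item
  for item in ref_genome.fa ref_annot.gtf; do
    if [ ! -s "$lib/$item" ]; then
      echo "missing: $lib/$item" >&2
      status=1
    fi
  done
  if [ ! -d "$lib/ref_genome.fa.star.idx" ]; then
    echo "missing: $lib/ref_genome.fa.star.idx/" >&2
    status=1
  fi
  return "$status"
}
```

Usage: `check_ctat_lib rat_ctat_genome_lib_build_dir_custom_MT && check_ctat_lib human_ctat_genome_lib_build_dir_custom_MT`.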

Rat aging muscle dataset analysis

Download the rat aging muscle sequence data

./scripts/run-fasterq-dump.sh \
metadata/rat-aging-muscle/SRR_Acc_List.txt \
rat-aging-muscle-data
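The accession list passed as the first argument is expected to hold one SRA run accession per line, as in the SRR_Acc_List.txt files exported from the SRA Run Selector. A hypothetical Bash pre-flight check, to run before downloading, that flags blank or malformed lines, assuming run accessions of the form SRR, ERR, or DRR followed by digits:

```shell
# check_acc_list FILE
# Prints each line that is not a run accession (SRR/ERR/DRR + digits) with its
# line number; returns 1 if any such line exists.
check_acc_list() {
  awk '
    { sub(/\r$/, "") }    # tolerate Windows line endings
    $0 !~ /^[SED]RR[0-9]+$/ { print NR ": " $0; bad = 1 }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}
```

Usage: `check_acc_list metadata/rat-aging-muscle/SRR_Acc_List.txt`.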

Add fragment counts to the rat aging muscle data

./scripts/count-fragments.sh rat-aging-muscle-data
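These fragment counts become the FFPM denominator. How count-fragments.sh computes them is not described here; as an illustration only, for paired-end data the number of fragments equals the number of records in either mate's FASTQ file, and each record normally spans four lines. The Bash sketch below assumes unwrapped four-line records and accepts plain or gzip-compressed files; `count_fastq_records` is hypothetical.

```shell
# count_fastq_records FILE
# Prints the number of records in a FASTQ file (plain or .gz), assuming the
# usual unwrapped four-line record layout.
count_fastq_records() {
  local fq=$1 lines
  if [[ $fq == *.gz ]]; then
    lines=$(gzip -dc -- "$fq" | wc -l)
  else
    lines=$(wc -l < "$fq")
  fi
  echo $(( lines / 4 ))
}
```

Usage: `count_fastq_records rat-aging-muscle-data/SAMPLE_1.fastq.gz`, where `SAMPLE_1.fastq.gz` is a placeholder file name.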

Run STAR-Fusion on the rat aging muscle data

./scripts/run-star-fusion.sh \
-i rat-aging-muscle-data \
-o rat-aging-muscle-data-results \
-r rat_ctat_genome_lib_build_dir_custom_MT \
-p 1

Merge the STAR-Fusion results for the rat aging muscle data

./scripts/merge-star-fusion-results.sh \
rat-aging-muscle-data-results \
star-fusion-results/rat-aging-muscle

Add fragment counts to the rat aging muscle STAR-Fusion results

cp rat-aging-muscle-data/fragment_counts.txt \
star-fusion-results/rat-aging-muscle

Compare the STAR-Fusion results among samples for the rat aging muscle data

Rscript scripts/summarize-rat-aging-muscle.R

The resulting Excel file and PDF plots are available in the star-fusion-results-summary/rat-aging-muscle directory.
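Each public dataset follows the same steps as the rat aging muscle analysis, differing only in the dataset name, the CTAT genome lib, and the summary script. A hypothetical Bash wrapper (not included in this repository) that strings the steps together for one dataset is sketched below. It assumes the naming pattern used in the commands above (`metadata/NAME/SRR_Acc_List.txt`, `NAME-data`, `NAME-data-results`, `star-fusion-results/NAME`, `scripts/summarize-NAME.R`) and that each script returns a non-zero exit status on failure; with `DRY_RUN=1` it only prints the commands.

```shell
# analyze_dataset NAME GENOME_LIB
# Runs download, fragment counting, STAR-Fusion, merging and summarization for
# one dataset, stopping at the first step that fails.
analyze_dataset() {
  local name=$1 lib=$2
  local data="$name-data" results="$name-data-results"
  local merged="star-fusion-results/$name"
  local run=()
  [[ ${DRY_RUN:-0} == 1 ]] && run=(echo)

  "${run[@]}" ./scripts/run-fasterq-dump.sh "metadata/$name/SRR_Acc_List.txt" "$data" &&
    "${run[@]}" ./scripts/count-fragments.sh "$data" &&
    "${run[@]}" ./scripts/run-star-fusion.sh -i "$data" -o "$results" -r "$lib" -p 1 &&
    "${run[@]}" ./scripts/merge-star-fusion-results.sh "$results" "$merged" &&
    "${run[@]}" cp "$data/fragment_counts.txt" "$merged" &&
    "${run[@]}" Rscript "scripts/summarize-$name.R"
}
```

For example, `DRY_RUN=1 analyze_dataset human-aging-muscle human_ctat_genome_lib_build_dir_custom_MT` prints the six commands for that dataset. The human common deletion dataset does not fit this pattern, because it adds a trimming step and its data must be obtained by request.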

Human Twinkle mutation dataset analysis

Download the human Twinkle mutation sequence data

./scripts/run-fasterq-dump.sh \
metadata/human-Twinkle-mutation/SRR_Acc_List.txt \
human-Twinkle-mutation-data

Add fragment counts to the human Twinkle mutation data

./scripts/count-fragments.sh human-Twinkle-mutation-data

Run STAR-Fusion on the human Twinkle mutation data

./scripts/run-star-fusion.sh \
-i human-Twinkle-mutation-data \
-o human-Twinkle-mutation-data-results \
-r human_ctat_genome_lib_build_dir_custom_MT \
-p 1

Merge the STAR-Fusion results for the human Twinkle mutation data

./scripts/merge-star-fusion-results.sh \
human-Twinkle-mutation-data-results \
star-fusion-results/human-Twinkle-mutation

Add fragment counts to the human Twinkle mutation STAR-Fusion results

cp human-Twinkle-mutation-data/fragment_counts.txt \
star-fusion-results/human-Twinkle-mutation

Compare the STAR-Fusion results among samples for the human Twinkle mutation data

Rscript scripts/summarize-human-Twinkle-mutation.R

The resulting Excel file and PDF plots are available in the star-fusion-results-summary/human-Twinkle-mutation directory.

Human aging muscle dataset analysis

Download the human aging muscle sequence data

./scripts/run-fasterq-dump.sh \
metadata/human-aging-muscle/SRR_Acc_List.txt \
human-aging-muscle-data

Add fragment counts to the human aging muscle data

./scripts/count-fragments.sh human-aging-muscle-data

Run STAR-Fusion on the human aging muscle data

./scripts/run-star-fusion.sh \
-i human-aging-muscle-data \
-o human-aging-muscle-data-results \
-r human_ctat_genome_lib_build_dir_custom_MT \
-p 1

Merge the STAR-Fusion results for the human aging muscle data

./scripts/merge-star-fusion-results.sh \
human-aging-muscle-data-results \
star-fusion-results/human-aging-muscle

Add fragment counts to the human aging muscle STAR-Fusion results

cp human-aging-muscle-data/fragment_counts.txt \
star-fusion-results/human-aging-muscle

Compare the STAR-Fusion results among samples for the human aging muscle data

Rscript scripts/summarize-human-aging-muscle.R

The resulting Excel file and PDF plots are available in the star-fusion-results-summary/human-aging-muscle directory.

Human aging brain dataset analysis

Download the human aging brain sequence data

./scripts/run-fasterq-dump.sh \
metadata/human-aging-brain/SRR_Acc_List.txt \
human-aging-brain-data

Add fragment counts to the human aging brain data

./scripts/count-fragments.sh human-aging-brain-data

Run STAR-Fusion on the human aging brain data

./scripts/run-star-fusion.sh \
-i human-aging-brain-data \
-o human-aging-brain-data-results \
-r human_ctat_genome_lib_build_dir_custom_MT \
-p 1

Merge the STAR-Fusion results for the human aging brain data

./scripts/merge-star-fusion-results.sh \
human-aging-brain-data-results \
star-fusion-results/human-aging-brain

Add fragment counts to the human aging brain STAR-Fusion results

cp human-aging-brain-data/fragment_counts.txt \
star-fusion-results/human-aging-brain

Compare the STAR-Fusion results among samples for the human aging brain data

Rscript scripts/summarize-human-aging-brain.R

The resulting Excel file and PDF plots are available in the star-fusion-results-summary/human-aging-brain directory.

Human common deletion dataset analysis

Download the human common deletion sequence data

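These data are available by request. The commands below assume that the FASTQ files have been placed in a directory named human-common-deletion-data.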

Trim the human common deletion sequence data

./scripts/trim-reads-fastp.sh \
-i human-common-deletion-data \
-o human-common-deletion-data-trimmed
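The trimming itself is handled by trim-reads-fastp.sh, whose exact fastp settings are not listed here. For orientation only, a basic paired-end fastp invocation looks like the Bash sketch below; the file-naming pattern (`SAMPLE_1.fastq.gz` / `SAMPLE_2.fastq.gz`), the default of 4 threads, and the helper name `trim_pair` are assumptions, not the script's actual behavior.

```shell
# trim_pair IN_DIR OUT_DIR SAMPLE [THREADS]
# Trims one paired-end sample with fastp, writing trimmed reads plus JSON and
# HTML reports to OUT_DIR. Set DRY_RUN=1 to print the commands instead.
trim_pair() {
  local in=$1 out=$2 sample=$3 threads=${4:-4}
  local run=()
  [[ ${DRY_RUN:-0} == 1 ]] && run=(echo)
  "${run[@]}" mkdir -p "$out" &&
    "${run[@]}" fastp \
      -i "$in/${sample}_1.fastq.gz" -I "$in/${sample}_2.fastq.gz" \
      -o "$out/${sample}_1.fastq.gz" -O "$out/${sample}_2.fastq.gz" \
      -w "$threads" \
      -j "$out/${sample}.fastp.json" -h "$out/${sample}.fastp.html"
}
```

Usage: `DRY_RUN=1 trim_pair human-common-deletion-data human-common-deletion-data-trimmed SAMPLE`, where `SAMPLE` is a placeholder for a sample prefix.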

Add fragment counts to the human common deletion data

./scripts/count-fragments.sh human-common-deletion-data-trimmed

Run STAR-Fusion on the human common deletion data

./scripts/run-star-fusion.sh \
-i human-common-deletion-data-trimmed \
-o human-common-deletion-data-results \
-r human_ctat_genome_lib_build_dir_custom_MT \
-p 1

Merge the STAR-Fusion results for the human common deletion data

./scripts/merge-star-fusion-results.sh \
human-common-deletion-data-results \
star-fusion-results/human-common-deletion

Add fragment counts to the human common deletion STAR-Fusion results

cp human-common-deletion-data-trimmed/fragment_counts.txt \
star-fusion-results/human-common-deletion

Compare the STAR-Fusion results among samples for the human common deletion data

Rscript scripts/summarize-human-common-deletion.R

The resulting Excel file and PDF plots are available in the star-fusion-results-summary/human-common-deletion directory.