- chimeric-mitochondrial-RNA-analysis
- Overview
- RNA-Seq datasets
- Dependencies
- Analysis procedure
- Prepare STAR-Fusion reference files
- Rat aging muscle dataset analysis
- Download the rat aging muscle sequence data
- Add fragment counts to the rat aging muscle data
- Run STAR-Fusion on the rat aging muscle data
- Merge the STAR-Fusion results for the rat aging muscle data
- Add fragment counts to the rat aging muscle STAR-Fusion results
- Compare the STAR-Fusion results among samples for the rat aging muscle data
- Human Twinkle mutation dataset analysis
- Download the human Twinkle mutation sequence data
- Add fragment counts to the human Twinkle mutation data
- Run STAR-Fusion on the human Twinkle mutation data
- Merge the STAR-Fusion results for the human Twinkle mutation data
- Add fragment counts to the human Twinkle mutation STAR-Fusion results
- Compare the STAR-Fusion results among samples for the Twinkle mutation data
- Human aging muscle dataset analysis
- Download the human aging muscle sequence data
- Add fragment counts to the human aging muscle data
- Run STAR-Fusion on the human aging muscle data
- Merge the STAR-Fusion results for the human aging muscle data
- Add fragment counts to the human aging muscle STAR-Fusion results
- Compare the STAR-Fusion results among samples for the human aging muscle data
- Human aging brain dataset analysis
- Download the human aging brain sequence data
- Add fragment counts to the human aging brain data
- Run STAR-Fusion on the human aging brain data
- Merge the STAR-Fusion results for the human aging brain data
- Add fragment counts to the human aging brain STAR-Fusion results
- Compare the STAR-Fusion results among samples for the human aging brain data
- Human common deletion dataset analysis
- Download the human common deletion sequence data
- Trim the human common deletion sequence data
- Add fragment counts to the human common deletion data
- Run STAR-Fusion on the human common deletion data
- Merge the STAR-Fusion results for the human common deletion data
- Add fragment counts to the human common deletion STAR-Fusion results
- Compare the STAR-Fusion results among samples for the human common deletion data
This repository contains the code and methods used to characterize chimeric mitochondrial RNA transcripts in RNA-Seq datasets.
The results of this study are described in the following publication:
Vandiver AR, Herbst A, Stothard P, Wanagat J. Chimeric mitochondrial RNA transcripts predict mitochondrial genome deletion mutations in mitochondrial genetic diseases and aging. Genome Res. 2024 Nov 27:gr.279072.124. doi: 10.1101/gr.279072.124. Epub ahead of print. PMID: 39603705.
To download the repository:
git clone git@github.com:paulstothard/chimeric-mitochondrial-RNA-analysis
or download the latest release.
The scripts and procedures in this repository use STAR-Fusion to identify candidate fusion transcripts. R code is used to parse the STAR-Fusion output files for each dataset and to enumerate mitochondrial gene fusions within each sample. For each observed fusion type (defined by the genes involved, ignoring the precise fusion boundaries), the total number of supporting reads is calculated by summing the values in the JunctionReadCount column. Next, a table termed "raw counts" is generated, with samples as rows, fusion types as columns, and each cell containing the summed JunctionReadCount value. A second table, termed "FFPM" for "fusion fragments per million total RNA-Seq fragments", is generated from the first by dividing each raw count by the total number of sequenced fragments (in millions) in the corresponding sample. SRA metadata is programmatically added to each table as additional columns to facilitate further analyses. The raw counts and FFPM tables are written to a single Excel file as separate worksheets. PCA plots, with and without sample labels and loadings, are produced from the FFPM table and saved in PDF format.
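As a rough illustration of the raw count and FFPM calculation described above, the following shell sketch processes a single STAR-Fusion prediction table. It is not the R code used in this repository: the function name `ffpm_table` and its arguments are hypothetical, the total fragment count is assumed to be known already, and no filtering for mitochondrial genes is applied.

```shell
# ffpm_table PREDICTIONS_TSV TOTAL_FRAGMENTS
# Prints one line per fusion type: name, summed JunctionReadCount, FFPM.
# Hypothetical helper for illustration; not part of the repository scripts.
ffpm_table() {
  awk -F '\t' -v total="$2" '
    NR == 1 {
      # Locate columns by header name rather than by position.
      for (i = 1; i <= NF; i++) {
        name = $i
        sub(/^#/, "", name)
        col[name] = i
      }
      next
    }
    {
      # Rows sharing a FusionName differ only in breakpoint; sum them.
      raw[$(col["FusionName"])] += $(col["JunctionReadCount"])
    }
    END {
      # FFPM = raw count / (total fragments in millions).
      for (fusion in raw)
        printf "%s\t%d\t%.4f\n", fusion, raw[fusion], raw[fusion] / (total / 1000000)
    }
  ' "$1" | sort
}
```

For example, a fusion type supported by 15 junction reads in a sample with 2,000,000 sequenced fragments has an FFPM of 15 / 2 = 7.5.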
Dataset download, STAR-Fusion analysis, and R analysis are performed using scripts provided in the `scripts` directory. The scripts are designed to be run from the top-level directory of the repository. The output of the STAR-Fusion analysis for each dataset is written to a separate directory within a `star-fusion-results` directory. Excel files containing the raw counts and FFPM tables, and the PCA plots in PDF format, are included in the `star-fusion-results-summary` folder for each of the datasets analyzed in this study.
Custom GTF files are used with STAR-Fusion to convey that the MT-ATP8 and MT-ATP6 genes, as well as the MT-ND4L and MT-ND4 genes, are encoded within single transcripts that do not represent chimeric mitochondrial RNA.
The detailed analysis procedure is described below and can be used to reproduce the results.
Five datasets are analyzed in this study:
Name | Source |
---|---|
Rat aging muscle | PRJNA793055 |
Human Twinkle mutation | PRJNA532885 |
Human aging muscle | PRJNA662072 |
Human aging brain | PRJNA283498 |
Human common deletion | Available by request |
Docker is used to run STAR-Fusion and to build the STAR-Fusion reference files.
To download the STAR-Fusion version 1.10.0 Docker image:
docker pull trinityctat/starfusion:1.10.0
For the other dependencies, a Conda environment can be created using the following commands:
conda create -n chimeric-mtrna python=3.8
conda activate chimeric-mtrna
conda install -y -c bioconda fastp fastqc sra-tools trimmomatic
conda install -y -c anaconda h5py
conda install -y -c conda-forge parallel pigz r-base r-essentials
conda install -y -c conda-forge r-data.table r-ggfortify r-ggplot2 r-janitor r-openxlsx r-tidyverse r-writexl
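Before starting the analysis, it can be useful to confirm that the installed tools are visible on the `PATH`. The sketch below is one way to do that; the function name `missing_tools` is hypothetical and not part of this repository.

```shell
# missing_tools TOOL...
# Prints the name of each tool that is not found on PATH and
# returns a non-zero status if at least one is missing.
missing_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      rc=1
    fi
  done
  return "$rc"
}
```

With the Conda environment activated, a call such as `missing_tools docker fastp fastqc fasterq-dump trimmomatic parallel pigz Rscript` should print nothing if everything is installed.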
The commands below assume that the `scripts`, `metadata`, and `custom-GTFs` directories from this repository are in the current working directory.
STAR-Fusion requires a CTAT genome lib, which includes various data files used in fusion-finding. Separate CTAT genome libs will be created for the rat and human datasets.
For the rat CTAT genome lib, download Dfam (if not already present), extract the Rattus repeat family HMMs, and index them with hmmpress:
if [ ! -f "Dfam.h5" ]; then
  wget https://www.dfam.org/releases/Dfam_3.3/families/Dfam.h5.gz
  gunzip Dfam.h5.gz
fi
./scripts/famdb.py -i Dfam.h5 lineage -a Rattus
./scripts/famdb.py -i Dfam.h5 families -f hmm -a Rattus > rat_dfam.hmm
docker run -v "$(pwd)":/data --rm -u "$(id -u)":"$(id -g)" trinityctat/starfusion:1.10.0 \
  hmmpress /data/rat_dfam.hmm
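hmmpress writes four binary index files next to the input HMM file, with the extensions `.h3f`, `.h3i`, `.h3m`, and `.h3p`. A minimal check that the indexing step finished could look like the following; the function name `hmm_index_complete` is hypothetical.

```shell
# hmm_index_complete HMM_FILE
# Succeeds only if all four hmmpress index files exist and are non-empty.
hmm_index_complete() {
  for ext in h3f h3i h3m h3p; do
    if [ ! -s "$1.$ext" ]; then
      echo "missing or empty: $1.$ext" >&2
      return 1
    fi
  done
}
```

For example, `hmm_index_complete rat_dfam.hmm && echo "index OK"`.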
Download and uncompress the rat reference genome sequence:
wget http://ftp.ensembl.org/pub/release-104/fasta/rattus_norvegicus/dna/\
Rattus_norvegicus.Rnor_6.0.dna.toplevel.fa.gz
gunzip Rattus_norvegicus.Rnor_6.0.dna.toplevel.fa.gz
Uncompress the custom GTF file:
gunzip custom-GTFs/Rattus_norvegicus.Rnor_6.0.104_custom.gtf.gz
Run the STAR-Fusion `prep_genome_lib.pl` script, writing the output to the `rat_ctat_genome_lib_build_dir_custom_MT` directory:
docker run -v "$(pwd)":/data --rm trinityctat/starfusion:1.10.0 \
/usr/local/src/STAR-Fusion/ctat-genome-lib-builder/prep_genome_lib.pl \
--genome_fa /data/Rattus_norvegicus.Rnor_6.0.dna.toplevel.fa \
--gtf /data/custom-GTFs/Rattus_norvegicus.Rnor_6.0.104_custom.gtf \
--pfam_db current \
--dfam_db /data/rat_dfam.hmm \
--output_dir /data/rat_ctat_genome_lib_build_dir_custom_MT
Note that the above may create output owned by root. To change the ownership to the current user:
sudo chown -R $(id -u):$(id -g) rat_ctat_genome_lib_build_dir_custom_MT
If `sudo` is not available, try the following:
HOST_UID=$(id -u)
HOST_GID=$(id -g)
docker run -v "$(pwd)":/data --rm trinityctat/starfusion:1.10.0 /bin/bash -c "\
chown -R $HOST_UID:$HOST_GID /data/rat_ctat_genome_lib_build_dir_custom_MT"
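To see whether an ownership change is needed, or to confirm that it worked, the entries not owned by the current user can be listed with `find`. This is a small sketch; the function name `foreign_owned` is hypothetical.

```shell
# foreign_owned DIR
# Lists every file and directory under DIR that is not owned by the
# current user. No output means no ownership change is required.
foreign_owned() {
  find "$1" ! -user "$(id -u)"
}
```

For example, `foreign_owned rat_ctat_genome_lib_build_dir_custom_MT | head` shows the first few affected paths, if any.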
For the human CTAT genome lib, download Dfam (if not already present), extract the human repeat family HMMs, and index them with hmmpress:
if [ ! -f "Dfam.h5" ]; then
  wget https://www.dfam.org/releases/Dfam_3.3/families/Dfam.h5.gz
  gunzip Dfam.h5.gz
fi
./scripts/famdb.py -i Dfam.h5 lineage -a human
./scripts/famdb.py -i Dfam.h5 families -f hmm -a human > human_dfam.hmm
docker run -v "$(pwd)":/data --rm -u "$(id -u)":"$(id -g)" trinityctat/starfusion:1.10.0 \
  hmmpress /data/human_dfam.hmm
Download and uncompress the human reference genome sequence:
wget http://ftp.ensembl.org/pub/release-104/fasta/homo_sapiens/dna/\
Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
gunzip Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
Uncompress the custom GTF file:
gunzip custom-GTFs/Homo_sapiens.GRCh38.104_custom.gtf.gz
Run the STAR-Fusion `prep_genome_lib.pl` script, writing the output to the `human_ctat_genome_lib_build_dir_custom_MT` directory:
docker run -v "$(pwd)":/data --rm trinityctat/starfusion:1.10.0 \
/usr/local/src/STAR-Fusion/ctat-genome-lib-builder/prep_genome_lib.pl \
--genome_fa /data/Homo_sapiens.GRCh38.dna.primary_assembly.fa \
--gtf /data/custom-GTFs/Homo_sapiens.GRCh38.104_custom.gtf \
--pfam_db current \
--dfam_db /data/human_dfam.hmm \
--output_dir /data/human_ctat_genome_lib_build_dir_custom_MT
Note that the above may create output owned by root. To change the ownership to the current user:
sudo chown -R $(id -u):$(id -g) human_ctat_genome_lib_build_dir_custom_MT
If `sudo` is not available, try the following:
HOST_UID=$(id -u)
HOST_GID=$(id -g)
docker run -v "$(pwd)":/data --rm trinityctat/starfusion:1.10.0 /bin/bash -c "\
chown -R $HOST_UID:$HOST_GID /data/human_ctat_genome_lib_build_dir_custom_MT"
Download the rat aging muscle sequence data:
./scripts/run-fasterq-dump.sh \
  metadata/rat-aging-muscle/SRR_Acc_List.txt \
  rat-aging-muscle-data
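Large downloads can fail partway through, so it may be worth checking that every accession in the list produced at least one file. The sketch below assumes that output file names begin with the run accession followed by `_` or `.`, which has not been verified against `run-fasterq-dump.sh`; the function name `missing_accessions` is hypothetical.

```shell
# missing_accessions ACCESSION_LIST DATA_DIR
# Prints each accession from the list that has no matching file in DATA_DIR.
# Assumption: file names start with the accession followed by "_" or ".".
missing_accessions() {
  while IFS= read -r acc || [ -n "$acc" ]; do
    # Skip blank lines in the accession list.
    [ -n "$acc" ] || continue
    if ! ls "$2/$acc"[._]* >/dev/null 2>&1; then
      echo "$acc"
    fi
  done < "$1"
}
```

For example, `missing_accessions metadata/rat-aging-muscle/SRR_Acc_List.txt rat-aging-muscle-data` prints nothing when all runs are present.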
Add fragment counts to the rat aging muscle data:
./scripts/count-fragments.sh rat-aging-muscle-data
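The idea behind fragment counting can be sketched as follows: a FASTQ record occupies four lines, so the number of records is the line count divided by four, and for paired-end data each fragment appears once in the first-read file. This is an illustration only and may differ from what `count-fragments.sh` does; the function name `count_fastq_records` is hypothetical.

```shell
# count_fastq_records FASTQ_GZ
# Prints the number of records in a gzip-compressed FASTQ file,
# assuming the standard four lines per record.
count_fastq_records() {
  lines=$(gzip -dc "$1" | wc -l)
  echo $(( lines / 4 ))
}
```

For paired-end samples, counting the records in the first-read file alone gives the number of fragments.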
Run STAR-Fusion on the rat aging muscle data:
./scripts/run-star-fusion.sh \
  -i rat-aging-muscle-data \
  -o rat-aging-muscle-data-results \
  -r rat_ctat_genome_lib_build_dir_custom_MT \
  -p 1
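STAR-Fusion writes its predictions to a file named `star-fusion.fusion_predictions.tsv` in its output directory. Assuming that `run-star-fusion.sh` creates one subdirectory per sample inside the results directory (an assumption about the layout, not something confirmed here), samples that did not finish can be listed with a sketch like this; the function name `samples_without_predictions` is hypothetical.

```shell
# samples_without_predictions RESULTS_DIR
# Prints the name of each subdirectory of RESULTS_DIR that lacks a
# star-fusion.fusion_predictions.tsv file.
# Assumption: one subdirectory per sample.
samples_without_predictions() {
  for sample_dir in "$1"/*/; do
    [ -d "$sample_dir" ] || continue
    if [ ! -f "${sample_dir}star-fusion.fusion_predictions.tsv" ]; then
      basename "$sample_dir"
    fi
  done
}
```

For example, `samples_without_predictions rat-aging-muscle-data-results` prints nothing when every sample has a prediction table.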
Merge the STAR-Fusion results for the rat aging muscle data:
./scripts/merge-star-fusion-results.sh \
  rat-aging-muscle-data-results \
  star-fusion-results/rat-aging-muscle
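For a quick look at all predictions in one place, per-sample tables can be concatenated with a leading sample column. The sketch below is a generic concatenation, not a description of what `merge-star-fusion-results.sh` does; it assumes one subdirectory per sample, and the function name `merge_predictions` is hypothetical.

```shell
# merge_predictions RESULTS_DIR
# Concatenates the per-sample star-fusion.fusion_predictions.tsv files,
# adding a "sample" column and keeping a single header line.
# Assumption: one subdirectory per sample, named after the sample.
merge_predictions() {
  first=1
  for table in "$1"/*/star-fusion.fusion_predictions.tsv; do
    [ -f "$table" ] || continue
    sample=$(basename "$(dirname "$table")")
    awk -v OFS='\t' -v sample="$sample" -v first="$first" '
      NR == 1 {
        # Print the header once, without the leading "#".
        if (first == 1) { sub(/^#/, ""); print "sample", $0 }
        next
      }
      { print sample, $0 }
    ' "$table"
    first=0
  done
}
```

For example, `merge_predictions rat-aging-muscle-data-results > all_predictions.tsv`, where the output file name is arbitrary.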
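Add fragment counts to the rat aging muscle STAR-Fusion results:
cp rat-aging-muscle-data/fragment_counts.txt \
  star-fusion-results/rat-aging-muscle
Compare the STAR-Fusion results among samples for the rat aging muscle data:
Rscript scripts/summarize-rat-aging-muscle.R
The resulting Excel file and PDF plots are available in the `star-fusion-results-summary/rat-aging-muscle` directory.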
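Download the human Twinkle mutation sequence data:
./scripts/run-fasterq-dump.sh \
  metadata/human-Twinkle-mutation/SRR_Acc_List.txt \
  human-Twinkle-mutation-data
Add fragment counts to the human Twinkle mutation data:
./scripts/count-fragments.sh human-Twinkle-mutation-data
Run STAR-Fusion on the human Twinkle mutation data:
./scripts/run-star-fusion.sh \
  -i human-Twinkle-mutation-data \
  -o human-Twinkle-mutation-data-results \
  -r human_ctat_genome_lib_build_dir_custom_MT \
  -p 1
Merge the STAR-Fusion results for the human Twinkle mutation data:
./scripts/merge-star-fusion-results.sh \
  human-Twinkle-mutation-data-results \
  star-fusion-results/human-Twinkle-mutation
Add fragment counts to the human Twinkle mutation STAR-Fusion results:
cp human-Twinkle-mutation-data/fragment_counts.txt \
  star-fusion-results/human-Twinkle-mutation
Compare the STAR-Fusion results among samples for the Twinkle mutation data:
Rscript scripts/summarize-human-Twinkle-mutation.R
The resulting Excel file and PDF plots are available in the `star-fusion-results-summary/human-Twinkle-mutation` directory.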
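Download the human aging muscle sequence data:
./scripts/run-fasterq-dump.sh \
  metadata/human-aging-muscle/SRR_Acc_List.txt \
  human-aging-muscle-data
Add fragment counts to the human aging muscle data:
./scripts/count-fragments.sh human-aging-muscle-data
Run STAR-Fusion on the human aging muscle data:
./scripts/run-star-fusion.sh \
  -i human-aging-muscle-data \
  -o human-aging-muscle-data-results \
  -r human_ctat_genome_lib_build_dir_custom_MT \
  -p 1
Merge the STAR-Fusion results for the human aging muscle data:
./scripts/merge-star-fusion-results.sh \
  human-aging-muscle-data-results \
  star-fusion-results/human-aging-muscle
Add fragment counts to the human aging muscle STAR-Fusion results:
cp human-aging-muscle-data/fragment_counts.txt \
  star-fusion-results/human-aging-muscle
Compare the STAR-Fusion results among samples for the human aging muscle data:
Rscript scripts/summarize-human-aging-muscle.R
The resulting Excel file and PDF plots are available in the `star-fusion-results-summary/human-aging-muscle` directory.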
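Download the human aging brain sequence data:
./scripts/run-fasterq-dump.sh \
  metadata/human-aging-brain/SRR_Acc_List.txt \
  human-aging-brain-data
Add fragment counts to the human aging brain data:
./scripts/count-fragments.sh human-aging-brain-data
Run STAR-Fusion on the human aging brain data:
./scripts/run-star-fusion.sh \
  -i human-aging-brain-data \
  -o human-aging-brain-data-results \
  -r human_ctat_genome_lib_build_dir_custom_MT \
  -p 1
Merge the STAR-Fusion results for the human aging brain data:
./scripts/merge-star-fusion-results.sh \
  human-aging-brain-data-results \
  star-fusion-results/human-aging-brain
Add fragment counts to the human aging brain STAR-Fusion results:
cp human-aging-brain-data/fragment_counts.txt \
  star-fusion-results/human-aging-brain
Compare the STAR-Fusion results among samples for the human aging brain data:
Rscript scripts/summarize-human-aging-brain.R
The resulting Excel file and PDF plots are available in the `star-fusion-results-summary/human-aging-brain` directory.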
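The four SRA-based datasets follow the same sequence of commands, differing only in the dataset name and the CTAT genome lib. As a convenience, the steps could be wrapped in a function like the one below. This is a sketch rather than a script shipped with the repository: the names `run_sra_dataset`, `_run`, and the `DRY_RUN` variable are hypothetical, and setting `DRY_RUN=1` prints the commands instead of executing them.

```shell
# _run COMMAND [ARG...]
# Executes the command, or only prints it when DRY_RUN=1.
_run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# run_sra_dataset NAME CTAT_LIB
# Runs the download, counting, STAR-Fusion, merge, and summary steps
# for one dataset, stopping at the first step that fails.
run_sra_dataset() {
  name=$1
  lib=$2
  _run ./scripts/run-fasterq-dump.sh "metadata/$name/SRR_Acc_List.txt" "$name-data" &&
  _run ./scripts/count-fragments.sh "$name-data" &&
  _run ./scripts/run-star-fusion.sh -i "$name-data" -o "$name-data-results" -r "$lib" -p 1 &&
  _run ./scripts/merge-star-fusion-results.sh "$name-data-results" "star-fusion-results/$name" &&
  _run cp "$name-data/fragment_counts.txt" "star-fusion-results/$name" &&
  _run Rscript "scripts/summarize-$name.R"
}
```

For example, `DRY_RUN=1 run_sra_dataset human-aging-brain human_ctat_genome_lib_build_dir_custom_MT` prints the six commands for the human aging brain dataset without running them.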
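The human common deletion sequence data is available by request. The commands below assume that the data has been placed in the `human-common-deletion-data` directory.
Trim the human common deletion sequence data:
./scripts/trim-reads-fastp.sh \
  -i human-common-deletion-data \
  -o human-common-deletion-data-trimmed
Add fragment counts to the human common deletion data:
./scripts/count-fragments.sh human-common-deletion-data-trimmed
Run STAR-Fusion on the human common deletion data:
./scripts/run-star-fusion.sh \
  -i human-common-deletion-data-trimmed \
  -o human-common-deletion-data-results \
  -r human_ctat_genome_lib_build_dir_custom_MT \
  -p 1
Merge the STAR-Fusion results for the human common deletion data:
./scripts/merge-star-fusion-results.sh \
  human-common-deletion-data-results \
  star-fusion-results/human-common-deletion
Add fragment counts to the human common deletion STAR-Fusion results:
cp human-common-deletion-data-trimmed/fragment_counts.txt \
  star-fusion-results/human-common-deletion
Compare the STAR-Fusion results among samples for the human common deletion data:
Rscript scripts/summarize-human-common-deletion.R
The resulting Excel file and PDF plots are available in the `star-fusion-results-summary/human-common-deletion` directory.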