This repository contains utilities for the generation of gene models and annotations used in the RNA-seq and eQTL pipelines.
Gene-level expression and eQTLs from the GTEx project are calculated from a collapsed gene model, in which all isoforms of a gene are combined into a single transcript. The collapsed model is built according to the following rules:
- Transcripts annotated as “retained_intron” or “read_through” are excluded. Additionally, transcripts that overlap with annotated read-through transcripts may be blacklisted (blacklists for GENCODE v19, v24, and v25 are provided in this repository; no transcripts were blacklisted for v26).
- The union of all exon intervals of each gene is calculated.
- Overlapping intervals between genes are excluded from all genes.
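
The three rules above can be illustrated with a minimal sketch. This is not the code in collapse_annotation.py: the function names, the (gene_id, labels, exons) input layout, and the way the excluded labels are matched are all assumptions made for illustration, and the sketch ignores chromosome, strand, and blacklists. Coordinates are treated as 1-based inclusive, as in GTF.

```python
from collections import defaultdict

# Labels named in the rules above; how they are encoded in a real GTF
# (transcript type vs. tag) is an assumption of this sketch.
EXCLUDED = {"retained_intron", "read_through"}


def merge_intervals(intervals):
    """Union of closed intervals; overlapping or directly adjacent ones are joined."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def shared_regions(intervals_by_gene):
    """Regions covered by two or more genes, found with a sweep over interval ends."""
    events = []
    for intervals in intervals_by_gene.values():
        for start, end in intervals:
            events.append((start, 1))     # coverage rises at start
            events.append((end + 1, -1))  # coverage drops just past end
    events.sort()
    shared, depth, open_at = [], 0, None
    for pos, delta in events:
        prev_depth = depth
        depth += delta
        if prev_depth < 2 <= depth:
            open_at = pos
        elif depth < 2 <= prev_depth:
            shared.append((open_at, pos - 1))
    return merge_intervals(shared)


def subtract_intervals(intervals, blocked):
    """Remove positions in `blocked` from `intervals` (both sorted and disjoint)."""
    result = []
    for start, end in intervals:
        cur = start
        for b_start, b_end in blocked:
            if b_end < cur:
                continue
            if b_start > end:
                break
            if b_start > cur:
                result.append((cur, b_start - 1))
            cur = b_end + 1
            if cur > end:
                break
        if cur <= end:
            result.append((cur, end))
    return result


def collapse(transcripts):
    """Apply the three rules to (gene_id, labels, exons) records."""
    exons_by_gene = defaultdict(list)
    for gene_id, labels, exons in transcripts:
        if EXCLUDED & set(labels):  # rule 1: drop excluded transcripts
            continue
        exons_by_gene[gene_id].extend(exons)
    # Rule 2: union of exon intervals per gene.
    merged = {g: merge_intervals(ex) for g, ex in exons_by_gene.items()}
    # Rule 3: remove regions shared between genes from every gene.
    shared = shared_regions(merged)
    return {g: subtract_intervals(iv, shared) for g, iv in merged.items()}


# Toy input (made-up coordinates, single chromosome).
transcripts = [
    ("geneA", {"protein_coding"}, [(100, 200), (300, 400)]),
    ("geneA", {"protein_coding"}, [(150, 250)]),
    ("geneA", {"retained_intron"}, [(100, 500)]),
    ("geneB", {"protein_coding"}, [(380, 450)]),
]
collapsed = collapse(transcripts)
```

In this toy input the retained_intron transcript is ignored, the remaining geneA exons are merged into two intervals, and the stretch shared by geneA and geneB is removed from both genes.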
Command:
python3 collapse_annotation.py gencode.v26.GRCh38.annotation.gtf gencode.v26.GRCh38.genes.gtf
where gencode.v26.GRCh38.annotation.gtf is the GTF from GENCODE and gencode.v26.GRCh38.genes.gtf is the output file for the collapsed gene model.
Further documentation is available on the GTEx Portal.