marti-dotcom/multi-annotator

Variant Annotation Pipeline

Overview

This repository contains an automated pipeline that annotates human VCF files with five annotation tools:

  • ANNOVAR
  • SnpEff
  • Ensembl VEP
  • BCFtools/csq
  • FATHMM-MKL

The Python pipeline (five_tools.py) was developed to evaluate variant annotations on curated ClinVar and OMIM datasets, focusing on single-nucleotide variants (SNVs) associated with Mendelian and complex disorders. The author-curated VCF files, Clinvar_SNVs.vcf.gz and OMIM_SNVs.vcf.gz, are included in this repository. Together they contain 3,226,691 SNVs, split by source database (ClinVar and OMIM), and serve as benchmarks for assessing the accuracy and consistency of the annotation tools.
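As a quick sanity check on the benchmark files, the number of SNV records in a gzipped VCF can be counted with the standard library alone. This is a minimal sketch, not part of five_tools.py: the function name count_snvs is hypothetical, and it assumes an SNV is a record whose REF and every ALT allele are a single base.

```python
import gzip


def count_snvs(vcf_gz_path):
    """Count records whose REF and all ALT alleles are single bases.

    Assumption: this is the SNV definition used here; records with
    symbolic or missing alleles ("*", ".", "<DEL>") are not counted.
    """
    bases = set("ACGTN")
    count = 0
    with gzip.open(vcf_gz_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):  # skip meta-information and header lines
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 5:  # malformed record, ignore
                continue
            ref = fields[3].upper()
            alts = fields[4].upper().split(",")
            if ref in bases and all(alt in bases for alt in alts):
                count += 1
    return count
```

Calling count_snvs("Clinvar_SNVs.vcf.gz") and count_snvs("OMIM_SNVs.vcf.gz") gives per-file counts that can be compared against the total stated above.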


Dependencies

Please make sure that the five tools listed in the Overview (ANNOVAR, SnpEff, Ensembl VEP, BCFtools and FATHMM-MKL) are installed and accessible.


Setting up your Environment

Running the pipeline in a conda environment is recommended (especially for VEP), since it manages tool dependencies and avoids version conflicts:

conda create -n variant_env python=3.10
conda activate variant_env
conda install -c conda-forge -c bioconda ensembl-vep bcftools samtools

ANNOVAR, SnpEff and FATHMM-MKL are not installed by the commands above. Download them manually and place them in your home directory:

~/annovar/
~/snpEff/
~/fathmm-MKL/

Directory Structure

Your directory should be structured like this:

/home/user/
├── annovar/
├── snpEff/
├── fathmm-MKL/
├── input.vcf
└── five_tools.py
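Before launching a long run, it can help to confirm that this layout and the conda-installed executables are in place. The sketch below is illustrative only and is not taken from five_tools.py. The function name missing_dependencies is hypothetical, and the executable list is an assumption: vep and bcftools come from the conda command above, while java (needed by SnpEff) and perl (needed by ANNOVAR) are assumed to be on PATH.

```python
import shutil
from pathlib import Path

# Directories expected in the home directory, as shown in the tree above.
TOOL_DIRS = ("annovar", "snpEff", "fathmm-MKL")

# Executables assumed to be on PATH (an assumption, adjust as needed).
PATH_TOOLS = ("vep", "bcftools", "java", "perl")


def missing_dependencies(home=None, executables=PATH_TOOLS):
    """Return a list of tool directories and executables that were not found."""
    home = Path(home) if home is not None else Path.home()
    missing = [str(home / name) for name in TOOL_DIRS if not (home / name).is_dir()]
    missing += [exe for exe in executables if shutil.which(exe) is None]
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("All expected tool directories and executables were found.")
```

An empty list means the directory layout matches the tree above and every listed executable resolves on PATH.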

Usage

To run the pipeline, pass the full path of the input VCF file on the command line:

python five_tools.py /full/path/to/input.vcf

This will sequentially run:

  1. ANNOVAR annotation
  2. SnpEff annotation
  3. Ensembl VEP annotation
  4. BCFtools/csq annotation
  5. FATHMM-MKL prediction
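The sequential behaviour described above can be sketched as a small step runner. This is an assumption about the general pattern, not the actual code in five_tools.py: the function run_steps is hypothetical, and the demo uses placeholder commands rather than the real tool invocations, which are not shown in this README.

```python
import subprocess
import sys


def run_steps(steps, runner=subprocess.run):
    """Run (name, argv) pairs in order and stop at the first failure.

    `runner` defaults to subprocess.run; it is a parameter only so the
    function can be exercised without the real tools installed.
    """
    completed = []
    for name, argv in steps:
        print(f"[pipeline] running {name}", file=sys.stderr)
        result = runner(argv, check=False)
        if result.returncode != 0:
            raise RuntimeError(f"{name} failed with exit code {result.returncode}")
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Placeholder commands: each step just starts Python and exits.
    # Replace each argv with the real command line for that tool.
    names = ("ANNOVAR", "SnpEff", "Ensembl VEP", "BCFtools/csq", "FATHMM-MKL")
    demo_steps = [(name, [sys.executable, "-c", "pass"]) for name in names]
    print(run_steps(demo_steps))
```

Stopping at the first non-zero exit code keeps a failed tool from being followed by steps whose outputs would be hard to interpret.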

Output

The following output files are generated in your working directory, one per tool, each containing that tool's variant annotations:

  • output_annovar.hg38_multianno.txt
  • output_snpeff.vcf
  • output_vep.vcf
  • output_bcftools_csq.vcf
  • output_predictions.txt (for FATHMM-MKL)
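After a run, a short check can confirm that all five files listed above were written and are not empty. This is a minimal sketch under the assumption that the file names are exactly those in the list. The function name missing_outputs is hypothetical and not part of five_tools.py.

```python
from pathlib import Path

# File names copied from the output list above.
EXPECTED_OUTPUTS = (
    "output_annovar.hg38_multianno.txt",
    "output_snpeff.vcf",
    "output_vep.vcf",
    "output_bcftools_csq.vcf",
    "output_predictions.txt",
)


def missing_outputs(workdir="."):
    """Return expected output files that are absent or empty in workdir."""
    workdir = Path(workdir)
    problems = []
    for name in EXPECTED_OUTPUTS:
        path = workdir / name
        if not path.is_file() or path.stat().st_size == 0:
            problems.append(name)
    return problems


if __name__ == "__main__":
    problems = missing_outputs()
    if problems:
        print("Missing or empty:", ", ".join(problems))
    else:
        print("All five output files are present.")
```

An empty file is reported alongside a missing one, since a tool that failed partway can leave a zero-byte output behind.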

License

This project is for academic and research use only. Please cite the original tools (ANNOVAR, SnpEff, VEP, etc.) in any publications.


Contact

Developed by: Martina Debnath | MSc Genetics and Multiomics in Medicine | UCL

Thank you for using my multi-annotator tool <3

Feel free to reach out for collaborations.

GitHub: @marti-dotcom

Email: [email protected]

