This repository contains an automated annotation pipeline for processing human VCF files using multiple annotation tools, listed below:
- ANNOVAR
- SnpEff
- Ensembl VEP
- BCFtools/csq
- FATHMM-MKL
The Python pipeline (five_tools.py) was developed to evaluate variant annotations on curated ClinVar and OMIM datasets, focusing on single-nucleotide variants (SNVs) associated with Mendelian and complex disorders. The VCF files, curated by the author, are available in this GitHub repository as Clinvar_SNVs.vcf.gz and OMIM_SNVs.vcf.gz. Together they contain 3,226,691 SNVs, split by benchmark database (ClinVar and OMIM), and are used to assess the accuracy and consistency of the annotation tools.
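As a quick sanity check on a downloaded benchmark file, the SNV records can be counted with the standard library alone. This is a minimal sketch, not part of five_tools.py: the function name `count_snvs` is hypothetical, and it assumes an SNV is a record whose REF and every ALT allele is a single A, C, G, or T base, which may differ from the definition used when the files were curated.

```python
import gzip

BASES = set("ACGT")


def count_snvs(vcf_gz_path):
    """Count VCF records whose REF and all ALT alleles are single bases.

    Assumption: this definition of an SNV is ours, not necessarily the one
    used to build the benchmark files.
    """
    count = 0
    # bgzip-compressed files are valid multi-member gzip, so gzip.open reads them.
    with gzip.open(vcf_gz_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 5:
                continue  # not a complete VCF record
            ref = fields[3].upper()
            alts = fields[4].upper().split(",")
            if ref in BASES and all(alt in BASES for alt in alts):
                count += 1
    return count


if __name__ == "__main__":
    import sys
    print(count_snvs(sys.argv[1]))
```

It could be invoked as `python count_snvs.py Clinvar_SNVs.vcf.gz`, where the script name is again only illustrative.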
Please make sure that the following tools are installed and accessible:
- perl (for ANNOVAR)
- python ≥ 3.8
- java ≥ 8 (for SnpEff)
- bcftools ≥ 1.9
- samtools
- tabix
- ANNOVAR
- SnpEff
- Ensembl VEP
- FATHMM-MKL
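Before running the pipeline, it can help to confirm that the command-line prerequisites are actually on your PATH. The following is a minimal sketch under stated assumptions: the executable names are taken from the list above, the helper `find_missing` is hypothetical, and the check does not verify version numbers. Whether five_tools.py performs a similar check itself is not documented here.

```python
import shutil

# Executable names from the prerequisites list. On some systems the Python
# interpreter is only available as "python3"; adjust the name if needed.
REQUIRED_EXECUTABLES = ["perl", "python", "java", "bcftools", "samtools", "tabix"]


def find_missing(executables, which=shutil.which):
    """Return the executables that cannot be located on PATH."""
    return [name for name in executables if which(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_EXECUTABLES)
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required executables were found on PATH.")
```

ANNOVAR, SnpEff, and FATHMM-MKL are not covered by this check because they are expected as directories rather than as executables on PATH.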
It is recommended to run this pipeline in a conda environment (especially for VEP) to manage tool dependencies and avoid any errors and conflicts:
conda create -n variant_env python=3.10
conda activate variant_env
conda install -c bioconda ensembl-vep bcftools samtools
Also, please make sure that the following tools are downloaded manually and placed in your home directory:
~/annovar/
~/snpEff/
~/fathmm-MKL/
Your directory should be structured like this:
/home/user/
├── annovar/
├── snpEff/
├── fathmm-MKL/
├── input.vcf
└── five_tools.py
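The layout above can be verified with a few lines of Python. This is only an illustrative sketch: the helper name `missing_tool_dirs` is hypothetical, and it checks that the three tool directories exist, not that their contents are complete or correctly installed.

```python
from pathlib import Path

# Directory names as shown in the expected layout.
EXPECTED_DIRS = ["annovar", "snpEff", "fathmm-MKL"]


def missing_tool_dirs(home=None, expected=EXPECTED_DIRS):
    """Return the expected tool directories that are absent under `home`.

    If `home` is None, the current user's home directory is used.
    """
    base = Path(home) if home is not None else Path.home()
    return [name for name in expected if not (base / name).is_dir()]


if __name__ == "__main__":
    missing = missing_tool_dirs()
    if missing:
        print("Not found in home directory:", ", ".join(missing))
    else:
        print("All tool directories are in place.")
```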
To run the pipeline, simply call the following from the command line:
python five_tools.py /full/path/to/input.vcf
This will sequentially run:
- ANNOVAR annotation
- SnpEff annotation
- Ensembl VEP annotation
- BCFtools/csq annotation
- FATHMM-MKL prediction
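The general pattern of running several tools one after another can be sketched as below. This is not the code in five_tools.py: the function `run_steps` is hypothetical, the example commands are harmless placeholders rather than the real tool invocations, and stopping at the first failure is an assumption about the desired behaviour.

```python
import subprocess
import sys


def run_steps(steps):
    """Run (label, argv) pairs in order and return the labels that completed.

    Raises RuntimeError at the first step that exits with a non-zero status,
    so later steps never run on top of a failed one (an assumed policy).
    """
    completed = []
    for label, argv in steps:
        result = subprocess.run(argv)
        if result.returncode != 0:
            raise RuntimeError(f"{label} failed with exit code {result.returncode}")
        completed.append(label)
    return completed


if __name__ == "__main__":
    # Placeholder commands: each one just starts Python and exits cleanly.
    # The real annotation commands and their options are not reproduced here.
    placeholder = [sys.executable, "-c", "pass"]
    order = ["ANNOVAR", "SnpEff", "Ensembl VEP", "BCFtools/csq", "FATHMM-MKL"]
    print(run_steps([(label, placeholder) for label in order]))
```

Passing each command as an argument list, rather than a single shell string, avoids quoting problems when the input path contains spaces.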
The following output files, one per tool, will be generated in your working directory with the corresponding variant annotations:
output_annovar.hg38_multianno.txt
output_snpeff.vcf
output_vep.vcf
output_bcftools_csq.vcf
output_predictions.txt (for FATHMM-MKL)
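After a run, a short script can confirm that every expected file was written. This is a minimal sketch: the file names come from the list above, the helper `missing_outputs` is hypothetical, and treating an empty file as missing is an assumption rather than documented pipeline behaviour.

```python
from pathlib import Path

# Output file names as listed above, keyed by the tool that produces them.
EXPECTED_OUTPUTS = {
    "ANNOVAR": "output_annovar.hg38_multianno.txt",
    "SnpEff": "output_snpeff.vcf",
    "Ensembl VEP": "output_vep.vcf",
    "BCFtools/csq": "output_bcftools_csq.vcf",
    "FATHMM-MKL": "output_predictions.txt",
}


def missing_outputs(workdir="."):
    """Return the tools whose output file is absent or empty in `workdir`."""
    base = Path(workdir)
    missing = []
    for tool, filename in EXPECTED_OUTPUTS.items():
        path = base / filename
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(tool)
    return missing


if __name__ == "__main__":
    problems = missing_outputs()
    if problems:
        print("No usable output from:", ", ".join(problems))
    else:
        print("All five output files are present.")
```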
This project is for academic and research use only. Please cite the original tools (ANNOVAR, SnpEff, VEP, etc.) in any publications.
Developed by: Martina Debnath | MSc Genetics and Multiomics in Medicine | UCL
Thank you for using my multi-annotator tool <3
Feel free to reach out for collaborations.
GitHub: @marti-dotcom
Email: [email protected]