snpeff_annotation-nf

Nextflow DSL2 pipeline to annotate VCF files with SnpEff and dbSNP

This repository contains a Nextflow DSL2 pipeline for annotating genetic variants in VCF files using SnpEff, SnpSift, and the dbSNP and dbNSFP databases. The pipeline filters the input VCF files, annotates the variants with rsIDs, functional impact, and dbNSFP fields, and writes the result to a single tab-separated annotation file.

Prerequisites

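Before running the pipeline, make sure Nextflow is installed, along with conda if you use the conda profile shown under Usage. The versions of the tools the pipeline calls are specified in the environment.yml file.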

Pipeline Overview

  1. FilterInputFiles: Filters input VCF files using PLINK 2 to retain PASS variants with a maximum of 2 alleles.

  2. AnnotateWithRSID: Annotates variants with rsIDs using SnpSift and the dbSNP database.

  3. AnnotateWithImpact: Annotates variants with functional impact using snpEff and a specified reference genome.

  4. FullyAnnotateWithDbSNP: Performs comprehensive annotation using SnpSift and the dbNSFP database, including gene impact, gnomAD data, REVEL scores, ClinVar information, and more.

  5. ExtractFields: Extracts relevant fields from the annotated VCF files and creates a tab-separated text file with a header for downstream analysis.
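
To make the selection rule in step 1 concrete, here is a minimal sketch that applies the same criterion to an uncompressed VCF stream with awk. It is not the PLINK 2 command the pipeline runs. The function name filter_pass_biallelic is hypothetical, and the sketch assumes the standard VCF column order (ALT in column 5, FILTER in column 7) and that "a maximum of 2 alleles" means REF plus at most one ALT allele.

```shell
# Hypothetical helper, not part of the pipeline: keeps header lines and
# records whose FILTER is PASS and whose ALT column lists a single allele.
filter_pass_biallelic() {
    awk -F'\t' '
        /^#/ { print; next }                 # pass header lines through unchanged
        $7 == "PASS" && $5 !~ /,/ { print }  # PASS and no comma-separated ALT list
    '
}
```

Because the function reads standard input, a compressed file would first need to be decompressed, for example by piping it through zcat.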

Usage

  1. Clone the repository:

    git clone https://github.com/IARCbioinfo/snpeff_annotation-nf
    cd snpeff_annotation-nf
  2. Adjust the nextflow.config file if necessary. The package versions are specified in the environment.yml file.

  3. Run the pipeline with:

    nextflow run main.nf -profile conda

Input

| Name | Default value | Description |
|------|---------------|-------------|
| --input_folder_with_VCF_files | ${baseDir}/VCFs/ | Folder containing *vcf.gz files |
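
Before launching a run, it can help to confirm that the input folder actually contains matching files. The following is a minimal sketch of such a check. The function list_input_vcfs is hypothetical and not part of the pipeline, and it assumes the files sit directly inside the folder rather than in subfolders.

```shell
# Hypothetical pre-flight check, not part of the pipeline: prints each
# *vcf.gz file found directly inside the folder and fails if there are none.
list_input_vcfs() {
    dir=${1%/}            # drop one trailing slash so paths print cleanly
    found=1               # shell convention: non-zero means "nothing found"
    for f in "$dir"/*vcf.gz; do
        [ -e "$f" ] || continue   # an unmatched glob stays literal; skip it
        printf '%s\n' "$f"
        found=0
    done
    return "$found"
}
```

Calling list_input_vcfs VCFs/ from the repository folder lists the files under the default input location and returns a non-zero status when none are found.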

Parameters

  • Optional

| Name | Default value | Description |
|------|---------------|-------------|
| --reference_genome | GRCh37.75 | Reference genome |
| --dbNSF_path | ${baseDir}/dbNSFP4.1a.txt.gz | dbNSFP database |
| --dbSNP_path | ${baseDir}/dbsnp150.vcf.gz | dbSNP database |
| --output_path | ${baseDir}/output | Output folder |
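
Any of these defaults can be overridden on the command line by passing the parameter name followed by a value. The sketch below wraps the documented run command in a small function. The function name run_annotation and the NEXTFLOW override variable are assumptions introduced for illustration, and the function is meant to be called from the repository folder, where main.nf lives.

```shell
# Hypothetical wrapper around the documented command. Set NEXTFLOW=echo to
# print the arguments instead of launching the pipeline (dry run).
run_annotation() {
    vcf_dir=$1
    out_dir=$2
    "${NEXTFLOW:-nextflow}" run main.nf -profile conda \
        --input_folder_with_VCF_files "$vcf_dir" \
        --reference_genome GRCh37.75 \
        --output_path "$out_dir"
}
```

For example, run_annotation /data/vcfs/ /data/annotated would read VCF files from /data/vcfs/ and write results to /data/annotated; both paths are placeholders. Parameters not listed in the function keep their default values.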

Output

The final annotated and extracted fields are written to the output directory as full_annotation.txt, a tab-separated text file with a header line.
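
Since the output is a tab-separated file with a header, a quick way to see which columns it contains is to number the header fields. This is a minimal sketch; the function list_columns is hypothetical, and the actual column names depend on the fields extracted in main.nf.

```shell
# Hypothetical helper: prints "index: name" for each column in the header
# line of a tab-separated file such as full_annotation.txt.
list_columns() {
    awk -F'\t' 'NR == 1 { for (i = 1; i <= NF; i++) print i ": " $i; exit }' "$1"
}
```

With the default output folder, list_columns output/full_annotation.txt prints the numbered header, and the indices can then be passed to cut -f to pull out individual columns.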

Customization

  • Adjust memory requirements and other resource settings in the nextflow.config file.
  • Customize the annotation processes in the main.nf script based on your specific requirements.

Acknowledgments

  • This pipeline utilizes various bioinformatics tools and databases, including PLINK, bcftools, SnpSift, snpEff, dbNSFP, and dbSNP.
