snpeff_annotation-nf

Nextflow DSL2 pipeline to annotate VCF files with SnpEff and dbSNP

This repository contains a Nextflow DSL2 pipeline that annotates genetic variants in VCF files using SnpEff, SnpSift and the dbSNP and dbNSFP databases. It filters the input VCF files, adds rsIDs, functional-impact predictions and dbNSFP annotations, and writes the results to a single tab-separated annotation file.

Prerequisites

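Make sure the following are available before running the pipeline:

Nextflow (with DSL2 support)
conda, used by the `conda` profile to install the packages specified in environment.yml
The dbSNP and dbNSFP database files (see Parameters)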

Pipeline Overview

FilterInputFiles: Filters the input VCF files with PLINK 2, keeping PASS variants with at most two alleles.
AnnotateWithRSID: Annotates variants with rsIDs from the dbSNP database using SnpSift.
AnnotateWithImpact: Annotates variants with their predicted functional impact using SnpEff and the specified reference genome.
FullyAnnotateWithDbSNP: Adds further annotations from the dbNSFP database using SnpSift, including gene impact, gnomAD data, REVEL scores, ClinVar information and other fields.
ExtractFields: Extracts the relevant fields from the annotated VCF files into a tab-separated text file with a header, for downstream analysis.
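The filtering rule applied by FilterInputFiles can be illustrated in plain Python. This is only a sketch of the rule as described above, not the pipeline's code. The pipeline delegates filtering to PLINK 2, and `keep_variant` and `filter_vcf_lines` are hypothetical names. The sketch assumes uncompressed, tab-delimited VCF text and treats only the literal FILTER value `PASS` as passing. How PLINK 2 handles a FILTER value of `.` is not covered here.

```python
def keep_variant(vcf_line: str) -> bool:
    """Return True for a PASS variant with at most two alleles (REF + ALT).

    Illustrative only: mirrors the criterion described for FilterInputFiles,
    not PLINK 2's implementation.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alt, filter_status = fields[3], fields[4], fields[6]
    # Assumption: only the literal "PASS" counts as passing; "." is dropped here.
    if filter_status != "PASS":
        return False
    # ALT is a comma-separated list; "." means there is no alternate allele.
    alt_alleles = [] if alt == "." else alt.split(",")
    alleles = [ref] + alt_alleles
    return len(alleles) <= 2


def filter_vcf_lines(lines):
    """Yield header lines unchanged, plus the data lines that keep_variant accepts."""
    for line in lines:
        if line.startswith("#") or keep_variant(line):
            yield line


# Usage with a tiny hypothetical VCF.
records = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tG\t50\tPASS\t.\n",      # biallelic, PASS -> kept
    "1\t200\t.\tC\tT,G\t50\tPASS\t.\n",    # three alleles -> dropped
    "1\t300\t.\tG\tA\t50\tLowQual\t.\n",   # not PASS -> dropped
]
kept = list(filter_vcf_lines(records))
```

After filtering, `kept` holds the two header lines and the variant at position 100.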

Usage

Clone the repository:

```
git clone https://github.com/IARCbioinfo/snpeff_annotation-nf
cd snpeff_annotation-nf
```

Adjust nextflow.config if necessary. Package versions are specified in environment.yml.

Run the pipeline with:
```
nextflow run main.nf -profile conda
```

Input

| Name | Default value | Description |
| --- | --- | --- |
| `--input_folder_with_VCF_files` | `${baseDir}/VCFs/` | Folder containing the input `*vcf.gz` files |
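Before launching a run, it can help to check that the input folder actually contains files the `*vcf.gz` pattern would pick up. The helper below is a hypothetical pre-flight check. It assumes the pipeline matches files directly inside the folder, without recursing into subfolders, as a plain glob does.

```python
import tempfile
from pathlib import Path


def find_input_vcfs(folder):
    """Return the files directly inside `folder` whose names match *vcf.gz, sorted.

    Assumption: like a plain glob, this does not descend into subfolders.
    """
    return sorted(p for p in Path(folder).glob("*vcf.gz") if p.is_file())


# Usage on a throwaway folder with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["sampleA.vcf.gz", "sampleB.vcf.gz", "sampleC.vcf", "notes.txt"]:
        (Path(tmp) / name).touch()
    names = [p.name for p in find_input_vcfs(tmp)]

print(names)  # ['sampleA.vcf.gz', 'sampleB.vcf.gz']
```

An empty result usually means the files are uncompressed (`.vcf`) or sit in a subfolder.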

Parameters

Optional

| Name | Default value | Description |
| --- | --- | --- |
| `--reference_genome` | `GRCh37.75` | SnpEff reference genome database |
| `--dbNSF_path` | `${baseDir}/dbNSFP4.1a.txt.gz` | Path to the dbNSFP database file |
| `--dbSNP_path` | `${baseDir}/dbsnp150.vcf.gz` | Path to the dbSNP database file (VCF) |
| `--output_path` | `${baseDir}/output` | Output folder |
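Nextflow maps each `--name value` pair on the command line to `params.name`, while its own options, such as `-profile`, take a single dash. The sketch below assembles a run command that overrides some of the defaults listed in the Input and Parameters tables. `build_nextflow_command` is a hypothetical helper, and the `/data/...` paths are placeholders. The helper only builds and prints the command; it does not run it.

```python
import shlex


def build_nextflow_command(params, profile="conda"):
    """Build the argv for `nextflow run main.nf` with --name value overrides.

    Hypothetical helper: keys must match the pipeline's parameter names
    (e.g. reference_genome, dbSNP_path); omitted parameters keep their defaults.
    """
    cmd = ["nextflow", "run", "main.nf", "-profile", profile]
    for name, value in params.items():
        cmd += [f"--{name}", str(value)]
    return cmd


cmd = build_nextflow_command({
    "input_folder_with_VCF_files": "/data/VCFs/",      # placeholder path
    "reference_genome": "GRCh37.75",
    "dbSNP_path": "/data/db/dbsnp150.vcf.gz",           # placeholder path
    "output_path": "/data/results",                     # placeholder path
})
print(shlex.join(cmd))
```

This prints a single shell line that can be pasted into a terminal. Returning a list rather than a string avoids quoting problems if it is later passed to `subprocess.run`.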

Output

The extracted annotations are written to the output folder (`--output_path`) as full_annotation.txt, a tab-separated text file with a header line.
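Because full_annotation.txt is tab-separated with a header, it can be read with Python's standard `csv` module. In the sketch below, `count_by_column` is a hypothetical helper. The column names in the sample (`CHROM`, `POS`, `ID`, `IMPACT`) are placeholders, because the real header depends on the fields ExtractFields selects in main.nf.

```python
import csv
import io
from collections import Counter


def count_by_column(handle, column):
    """Count rows of a tab-separated table with a header by the value in `column`."""
    reader = csv.DictReader(handle, delimiter="\t")
    # Accessing fieldnames reads the header line.
    if column not in (reader.fieldnames or []):
        raise KeyError(f"column {column!r} not in header: {reader.fieldnames}")
    return Counter(row[column] for row in reader)


# In practice: with open("output/full_annotation.txt", newline="") as fh: ...
sample = io.StringIO(
    "CHROM\tPOS\tID\tIMPACT\n"
    "1\t100\trs1\tHIGH\n"
    "1\t200\trs2\tLOW\n"
    "2\t300\t.\tHIGH\n"
)
counts = count_by_column(sample, "IMPACT")
print(counts)  # Counter({'HIGH': 2, 'LOW': 1})
```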

Customization

Adjust memory and other resource requirements in nextflow.config.
Edit the annotation processes in main.nf to suit your requirements, for example to change the fields selected by ExtractFields.

Acknowledgments

This pipeline uses the following bioinformatics tools and databases: PLINK, bcftools, SnpSift, SnpEff, dbNSFP and dbSNP.
