This pipeline, developed by Duong Vu, generates a VCF file for a set of genomes compared against a reference genome.

WesterdijkInstitute/SNPanalysis

SNPanalysis

The SNP pipeline generates a VCF file from the raw data (FASTQ files) of the given samples. The script scripts/SNPpipeline.py takes as input a JSON file (see examples/SNP_data_B8441.json) containing all information about the genomes and the tools used for the analysis. Longshot (Edge et al. 2019) is used for long sequencing reads (e.g. Nanopore), and GATK (Van der Auwera & O'Connor 2020) is used for short sequencing reads (e.g. Illumina).
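The split between long-read and short-read processing can be sketched as follows. This is a minimal illustration, not the code in scripts/SNPpipeline.py: the JSON keys (`reference`, `samples`, `name`, `read_type`, `fastq`) are hypothetical, and the real schema of examples/SNP_data_B8441.json may differ. The sketch also assumes samtools for sorting and indexing BAM files, which is not listed among the dependencies. It only builds the command lines and does not execute them.

```python
import json

# HYPOTHETICAL config layout; the real examples/SNP_data_B8441.json may differ.
EXAMPLE_CONFIG = """
{
  "reference": "ref/B8441.fasta",
  "samples": [
    {"name": "S1", "read_type": "long",  "fastq": ["S1.fastq.gz"]},
    {"name": "S2", "read_type": "short", "fastq": ["S2_R1.fastq.gz", "S2_R2.fastq.gz"]}
  ]
}
"""


def plan_sample(sample: dict, reference: str) -> list[list[str]]:
    """Return the argv lists that would map one sample and call its variants."""
    name = sample["name"]
    sam, bam, vcf = f"{name}.sam", f"{name}.bam", f"{name}.vcf"
    if sample["read_type"] == "long":
        # Long reads (e.g. Nanopore): map with minimap2, call variants with Longshot.
        return [
            ["minimap2", "-ax", "map-ont", "-o", sam, reference, *sample["fastq"]],
            ["samtools", "sort", "-o", bam, sam],  # assumption: samtools is installed
            ["samtools", "index", bam],
            ["longshot", "--bam", bam, "--ref", reference, "--out", vcf],
        ]
    if sample["read_type"] == "short":
        # Short reads (e.g. Illumina): map with bwa, call variants with GATK.
        # GATK requires a read group; bwa expands the literal "\t" sequences.
        read_group = f"@RG\\tID:{name}\\tSM:{name}"
        return [
            ["bwa", "mem", "-R", read_group, "-o", sam, reference, *sample["fastq"]],
            ["samtools", "sort", "-o", bam, sam],
            ["samtools", "index", bam],
            ["gatk", "HaplotypeCaller", "-R", reference, "-I", bam, "-O", vcf],
        ]
    raise ValueError(f"unknown read_type for {name}: {sample['read_type']!r}")


if __name__ == "__main__":
    config = json.loads(EXAMPLE_CONFIG)
    for sample in config["samples"]:
        for argv in plan_sample(sample, config["reference"]):
            print(" ".join(argv))
```

Keeping the per-sample plan as plain argv lists makes it easy to inspect or dry-run before anything is executed. In practice GATK also expects the reference to have a `.fai` index and a `.dict` sequence dictionary.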

Dependencies (see examples/SNP_data_B8441.json for an example):

  • longshot, used for long reads
  • minimap2, used to map long reads to the reference genome
  • gatk, used for short reads
  • bwa, used to map short reads to the reference genome
  • sratoolkit, optional, for downloading SRA data
  • picard, optional, for marking duplicates in short-read alignments
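Before running the pipeline, it can help to confirm that the dependencies above are reachable. The sketch below is a hypothetical helper, not part of the repository. The executable names are assumptions (for example, bioconda installs expose `picard` as a wrapper command, and sratoolkit provides `fasterq-dump` rather than a `sratoolkit` binary); adjust them to your installation.

```python
import shutil
from typing import Callable, Optional

# ASSUMED executable names on PATH; your installation may differ.
REQUIRED = {"longshot": "longshot", "minimap2": "minimap2", "gatk": "gatk", "bwa": "bwa"}
OPTIONAL = {"sratoolkit": "fasterq-dump", "picard": "picard"}


def check_tools(which: Callable[[str], Optional[str]] = shutil.which) -> dict[str, bool]:
    """Report, for each dependency, whether its executable can be found."""
    return {tool: which(exe) is not None for tool, exe in {**REQUIRED, **OPTIONAL}.items()}


def missing_required(status: dict[str, bool]) -> list[str]:
    """List required tools whose executables were not found."""
    return [tool for tool in REQUIRED if not status[tool]]


if __name__ == "__main__":
    status = check_tools()
    for tool, found in status.items():
        kind = "required" if tool in REQUIRED else "optional"
        print(f"{tool:<11} {kind:<9} {'found' if found else 'MISSING'}")
    if missing_required(status):
        raise SystemExit("missing required tools: " + ", ".join(missing_required(status)))
```

The `which` parameter defaults to `shutil.which` but can be replaced with a stub, which keeps the check testable without installing any of the tools.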

How to create a VCF file

scripts/SNPpipeline.py -i examples/SNP_data_B8441.json -o B8441_vcf -prefix allsnps
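The same command can be launched from Python, for instance inside a larger workflow. This is a minimal sketch with hypothetical helper names (`build_command`, `run_pipeline`); it passes `-i`, `-o` and `-prefix` through exactly as in the command above and leaves their precise meaning to scripts/SNPpipeline.py. It assumes the script can be run with the current Python interpreter (`sys.executable`) and defaults to a dry run that only prints the command.

```python
import subprocess
import sys


def build_command(config: str, outdir: str, prefix: str,
                  script: str = "scripts/SNPpipeline.py") -> list[str]:
    """Assemble the documented invocation as an argv list."""
    return [sys.executable, script, "-i", config, "-o", outdir, "-prefix", prefix]


def run_pipeline(config: str, outdir: str, prefix: str, dry_run: bool = True) -> int:
    """Print the command (dry run) or execute it, raising if the script fails."""
    cmd = build_command(config, outdir, prefix)
    if dry_run:
        print(" ".join(cmd))
        return 0
    return subprocess.run(cmd, check=True).returncode


if __name__ == "__main__":
    run_pipeline("examples/SNP_data_B8441.json", "B8441_vcf", "allsnps")
```

Passing an argv list (rather than a shell string) to `subprocess.run` avoids shell quoting issues with paths that contain spaces.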

References

Duong Vu (2023). https://github.com/vuthuyduong/SNPanalysis. DOI: 10.5281/zenodo.8046747
