The SNP pipeline generates a VCF file from the raw sequencing data (FASTQ files) of the given samples. The script scripts/SNPpipeline.py takes as input a JSON file (for example, examples/SNP_data_B8441.json) that contains all information about the genomes and the tools used for the analysis. Longshot (Edge et al. 2019) is used to call variants from long sequencing reads (e.g., nanopore), and GATK (Van der Auwera & O'Connor 2020) is used to call variants from short sequencing reads (e.g., Illumina). The pipeline depends on the following tools:
- longshot, used to call variants from long reads
- minimap2, used to map long reads to the reference genome
- gatk, used to call variants from short reads
- bwa, used to map short reads to the reference genome
- picard, optional, for marking duplicate reads in short-read alignments
- vcftools, for merging VCF files
- sratoolkit, optional, for downloading FASTQ files if SRA accession numbers are given
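Before running the pipeline, it can help to confirm that these tools are on PATH. The following is a minimal sketch, not part of the pipeline itself; the executable names are assumptions (sratoolkit ships several binaries such as prefetch and fasterq-dump, and picard may instead be invoked as java -jar picard.jar on some systems).

```python
import shutil

# Assumed executable names; adjust them to how the tools are installed locally.
REQUIRED = ["minimap2", "longshot", "bwa", "gatk", "vcftools"]
OPTIONAL = ["fasterq-dump", "picard"]  # sratoolkit and picard are optional


def missing_tools(names):
    """Return the names that shutil.which cannot find as executables on PATH."""
    return [name for name in names if shutil.which(name) is None]


def report(required=REQUIRED, optional=OPTIONAL):
    """Return (missing_required, missing_optional) as two lists of names."""
    return missing_tools(required), missing_tools(optional)


if __name__ == "__main__":
    missing_required, missing_optional = report()
    for name in missing_required:
        print(f"ERROR: required tool not on PATH: {name}")
    for name in missing_optional:
        print(f"note: optional tool not on PATH: {name}")
```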
To run the pipeline on the example input:

scripts/SNPpipeline.py -i examples/SNP_data_B8441.json -o B8441_vcf -prefix allsnps
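For orientation, the per-sample routing described above (minimap2 followed by Longshot for long reads; bwa, optional Picard duplicate marking, then GATK for short reads) can be sketched as follows. This is a hypothetical illustration, not the logic of scripts/SNPpipeline.py. The JSON keys (reference, samples, name, read_type, fastq), the use of samtools for sorting and indexing, and the output file names are all assumptions. The sketch only builds command lines. It omits reference indexing (bwa index, samtools faidx, gatk CreateSequenceDictionary) and the final merge with vcftools.

```python
import json
import shlex

# Hypothetical configuration: these keys are illustrative only and are
# NOT taken from examples/SNP_data_B8441.json.
EXAMPLE_CONFIG = """
{
  "reference": "ref.fasta",
  "samples": [
    {"name": "S1", "read_type": "long", "fastq": ["S1.fastq.gz"]},
    {"name": "S2", "read_type": "short",
     "fastq": ["S2_1.fastq.gz", "S2_2.fastq.gz"]}
  ]
}
"""


def plan_commands(sample, reference, mark_duplicates=True):
    """Return (argv, stdout_path) steps that turn one sample's FASTQ into a VCF.

    stdout_path is None when the tool writes its own output file.
    Assumes samtools for sorting and indexing; the real pipeline may differ.
    """
    name = sample["name"]
    sam, bam, vcf = f"{name}.sam", f"{name}.sorted.bam", f"{name}.vcf"

    if sample["read_type"] == "long":
        # Long reads: minimap2 (map-ont preset assumes nanopore), then Longshot.
        return [
            (["minimap2", "-ax", "map-ont", "-o", sam, reference, *sample["fastq"]], None),
            (["samtools", "sort", "-o", bam, sam], None),
            (["samtools", "index", bam], None),
            (["longshot", "--bam", bam, "--ref", reference, "--out", vcf], None),
        ]

    if sample["read_type"] == "short":
        # Short reads: BWA-MEM writes SAM to stdout; the read group is needed by GATK.
        read_group = f"@RG\\tID:{name}\\tSM:{name}"
        steps = [
            (["bwa", "mem", "-R", read_group, reference, *sample["fastq"]], sam),
            (["samtools", "sort", "-o", bam, sam], None),
        ]
        if mark_duplicates:
            # Optional duplicate marking on the coordinate-sorted BAM.
            dedup = f"{name}.dedup.bam"
            steps.append((["picard", "MarkDuplicates", "-I", bam, "-O", dedup,
                           "-M", f"{name}.dup_metrics.txt"], None))
            bam = dedup
        steps.append((["samtools", "index", bam], None))
        steps.append((["gatk", "HaplotypeCaller", "-R", reference,
                       "-I", bam, "-O", vcf], None))
        return steps

    raise ValueError(f"unknown read_type for {name}: {sample['read_type']!r}")


if __name__ == "__main__":
    config = json.loads(EXAMPLE_CONFIG)
    for sample in config["samples"]:
        for argv, stdout_path in plan_commands(sample, config["reference"]):
            redirect = f" > {stdout_path}" if stdout_path else ""
            print(shlex.join(argv) + redirect)
```

Run as a script, it prints one shell command per step. For example, the first line for the long-read sample is minimap2 -ax map-ont -o S1.sam ref.fasta S1.fastq.gz. Representing each step as an argument list, rather than a single string, keeps quoting of values such as the read-group tag explicit.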
Duong Vu (2023). https://github.com/vuthuyduong/SNPanalysis. DOI: 10.5281/zenodo.8046747