Skip to content

PoonLab/sam2conseq

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

40 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

The purpose of this script is to generate a consensus sequence from a SAM (sequence alignment map) format file that is a standard output of a short-read mapping program such as bowtie2.

We are in the process of validating this script on all published NGS data sets in the NCBI SRA database that were generated from betacoronavirus/SARS-COV-2 patient samples on Illumina platforms. You can view our progress in the issue tracker and the wiki document.

This script is derived from the Python script sam2aln.py that is part of the MiCall pipeline.

Example workflow

  1. Use cutadapt to remove contaminant adapter sequences:

    cutadapt -q 20,20 -a CTGTCTCTTATACACATCT -A CTGTCTCTTATACACATCT -o SRR11140746_1.trim.fastq.gz -p SRR11140746_2.trim.fastq.gz SRR11140746_1.fastq.gz SRR11140746_2.fastq.gz --cores 6
  2. Use bowtie2 to map reads to SARS-COV-2 reference sequence NC_045512:

    bowtie2 -x NC_045512 -1 SRR11140746_1.trim.fastq.gz -2 SRR11140746_2.trim.fastq.gz -S SRR11140746.trim.sam --local -p 12
  3. Run sam2conseq.py on resulting SAM file. We have to specify output paths to write the frequency table (CSV) and consensus sequence (plain text) as the second and third positional arguments, respectively:

    python3 sam2conseq.py data/SRR11140746.trim.sam test.csv test.out

    If the SAM file was generated from a FASTQ file containing unpaired reads, you need to specify an --unpaired flag:

    python3 sam2conseq.py --unpaired data/SRR11278167.trim.fastq data/SRR11278167.freqs.csv data/SRR11278167.conseq.txt

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published