SIGA design

chungongyu edited this page Oct 18, 2018 · 3 revisions

Overview

SIGA implements a set of assembly algorithms based on the FM-index. As the FM-index is a compressed data structure, the algorithms are very memory efficient.
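To make the role of the FM-index concrete, here is a minimal sketch of how such an index answers "how many times does this pattern occur?" using backward search. It is an illustration only and not SIGA's implementation: it indexes a single string rather than a read set, builds the suffix array by naive sorting, and stores the occurrence table uncompressed. The function names build_fm_index and count_occurrences are made up for this example.

```python
def build_fm_index(text):
    """Build a naive, uncompressed FM-index for one string (illustrative only)."""
    text += "$"  # sentinel that sorts before A, C, G, T
    # Suffix array by direct sorting: fine for a demo, far too slow for real read sets.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character that precedes each suffix in sorted order.
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c]: number of characters in the text that are strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return C, occ, len(bwt)


def count_occurrences(index, pattern):
    """Count occurrences of pattern by backward search over the index."""
    C, occ, n = index
    lo, hi = 0, n  # half-open suffix array interval matching the empty suffix
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


index = build_fm_index("ACGTACGT")
print(count_occurrences(index, "ACG"))
```

Each pattern character costs a constant number of table lookups, so a query takes time proportional to the pattern length regardless of how large the indexed text is. A production index would compress the BWT and sample the occurrence table instead of storing it in full.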

A SIGA assembly has three distinct phases. The first phase corrects base-calling errors in the reads. The second phase assembles contigs from the corrected reads. The third phase uses paired-end and/or mate-pair data to build scaffolds from the contigs. Example real-data assemblies can be found here.

Error Correction

Error correction is the first stage of the assembly. An FM-index of the sequence reads is constructed, then base-calling errors are identified by finding low-frequency k-mers in the reads. The output from the error corrector is a set of FASTQ files containing the corrected read sequences.
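The idea of flagging low-frequency k-mers can be sketched as follows. This is a simplified illustration under stated assumptions: k-mer counts come from a plain in-memory counter rather than from FM-index queries, reverse complements are ignored, and the names count_kmers and weak_positions as well as the min_count threshold are hypothetical, not SIGA parameters.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_positions(read, counts, k, min_count):
    """Return start positions of k-mers in read seen fewer than min_count times."""
    return [
        i for i in range(len(read) - k + 1)
        if counts[read[i:i + k]] < min_count
    ]


reads = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "ACGTTCGT"]
counts = count_kmers(reads, k=4)
# The last read differs from the others at one base, so the k-mers
# covering that base are rare and get flagged.
print(weak_positions(reads[3], counts, k=4, min_count=2))
```

A run of consecutive weak k-mers that all cover the same base points at that base as the likely error. A corrector would then look for a substitution that turns the covering k-mers into frequent ones.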

Contig Assembly

An FM-index of the corrected sequence reads is constructed. Duplicate reads, and reads that are still low quality after correction, are found and discarded. The siga overlap command then computes the structure of the string graph, and siga assemble builds the contigs from it.
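As a rough picture of what the overlap step produces, the sketch below removes exact duplicate reads and then finds suffix-prefix overlaps, which become the edges of an overlap graph. It is a brute-force illustration and not how siga overlap works: it compares all read pairs directly instead of querying an FM-index, ignores reverse complements and contained reads, and does not perform the transitive reduction that turns an overlap graph into a string graph. The function names and the min_overlap parameter are hypothetical.

```python
def remove_duplicates(reads):
    """Drop exact duplicate reads, keeping the first occurrence of each."""
    return list(dict.fromkeys(reads))


def suffix_prefix_overlaps(reads, min_overlap):
    """Return (i, j, length) where a suffix of reads[i] equals a prefix of reads[j].

    Only the longest proper overlap (shorter than both reads) is reported
    for each ordered pair.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    edges = []
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i == j:
                continue
            # Try the longest candidate first and stop at the first match.
            for length in range(min(len(a), len(b)) - 1, min_overlap - 1, -1):
                if a[-length:] == b[:length]:
                    edges.append((i, j, length))
                    break
    return edges


reads = remove_duplicates(["ACGTAC", "GTACGG", "ACGTAC", "CGGTTT"])
for i, j, length in suffix_prefix_overlaps(reads, min_overlap=3):
    print(reads[i], "->", reads[j], "overlap", length)
```

Following the edges in this small example spells out a single path through the three distinct reads, which is the intuition behind building contigs from unambiguous paths in the graph.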