kspham/mdup

Overview

Developed by BioTuring (www.bioturing.com), mdup is a tool that preprocesses cloud-read data (reads that carry a barcode). mdup does the following:

  • Remove duplicate reads, non-primary and secondary alignments, and unmapped reads.
  • Detect molecules by clustering reads that share the same barcode into groups.
  • Report statistics on sequencing and GEM performance.
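
The read filtering in the first bullet can be pictured with the standard SAM flag bits. The snippet below is only an illustrative Python sketch, not mdup's own code: the helper name `keep_read` is hypothetical, and counting supplementary alignments (0x800) as "non-primary" is an assumption.

```python
# SAM flag bits as defined in the SAM specification.
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_DUPLICATE = 0x400
FLAG_SUPPLEMENTARY = 0x800  # assumption: treated as non-primary here


def keep_read(flag: int) -> bool:
    """Return True if a read with this SAM flag survives the filter.

    Hypothetical helper: drops unmapped, secondary, supplementary,
    and duplicate-marked reads.
    """
    drop_mask = (FLAG_UNMAPPED | FLAG_SECONDARY
                 | FLAG_DUPLICATE | FLAG_SUPPLEMENTARY)
    return (flag & drop_mask) == 0


# Flag 99: paired, proper pair, mate on reverse strand, first in pair.
print(keep_read(99))
# Flag 355: the same read with the secondary bit (0x100) also set.
print(keep_read(355))
```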

Two reads are considered duplicates if they share the same mapped position, mapped target, CIGAR, and mate information (if paired-end).
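
As a rough illustration of this rule, the Python sketch below builds a comparison key from those fields and keeps one read per key. It is not mdup's implementation: the `Read` type and function names are hypothetical, "mate information" is assumed to mean the mate's target and position, and keeping the first read seen is an arbitrary choice. The barcode is left out of the key because the rule above does not mention it.

```python
from typing import NamedTuple, Optional


class Read(NamedTuple):
    """Minimal stand-in for an alignment record (hypothetical type)."""
    name: str
    target: str                        # reference sequence name
    pos: int                           # leftmost mapped position
    cigar: str
    mate_target: Optional[str] = None  # None for single-end reads
    mate_pos: Optional[int] = None


def duplicate_key(read: Read) -> tuple:
    """Fields that must all match for two reads to count as duplicates."""
    return (read.target, read.pos, read.cigar,
            read.mate_target, read.mate_pos)


def drop_duplicates(reads):
    """Keep the first read seen for each key; drop later matches."""
    seen = set()
    kept = []
    for read in reads:
        key = duplicate_key(read)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept
```

Reads that differ in any one field, for example only in CIGAR, get different keys and are both kept.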

Install

git clone https://github.com/kspham/mdup.git
cd mdup
bash build.sh

Usage

mdup takes a BAM file as input. The BAM file must be sorted by coordinate and indexed. We recommend using BWA to align cloud-reads to the reference. Every alignment record must have a BX:Z: tag that carries the barcode.
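
To check the barcode requirement before running mdup, one option is to scan alignment records in SAM text form (for example, the output of `samtools view`) for the BX:Z: tag. The Python helper below is a minimal sketch; the function name `barcode_of` is hypothetical and not part of mdup.

```python
def barcode_of(sam_line: str):
    """Return the BX:Z: value of one SAM text record, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # SAM has 11 mandatory columns; optional tags start at column 12.
    for tag in fields[11:]:
        if tag.startswith("BX:Z:"):
            return tag[len("BX:Z:"):]
    return None


# Example record with a made-up barcode value.
record = ("r1\t99\tchr1\t100\t60\t50M\t=\t300\t250\tACGT\tFFFF"
          "\tNM:i:0\tBX:Z:ACGTACGT-1")
print(barcode_of(record))
```

A record for which this returns None would not satisfy the input requirement described above.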

mdup generates the following files in the output directory:

  • output.bam : new BAM file after removing unneeded reads.
  • molecule.tsv : information on all detected molecules.
  • summary.inf : statistics on sequencing and GEM performance.
  • plot.html : plots of selected metrics from the statistics.

./mdup [options] in.bam

Optional arguments:
  -t INT                number of threads [default: 1]
  -o DIR                output directory [default: "./mdup_out/"]
  -g FILE               reference file used to generate the BAM file (for better stats)
  -n INT                minimum number of reads required for a molecule [default: 4]
  -l INT                minimum length required for a molecule [default: 1000]
  -k                    do not mark duplicates
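
One way to picture how the -n and -l thresholds interact with barcode clustering is the Python sketch below. It is not mdup's algorithm. Reads are grouped by barcode and target, split into candidate molecules wherever the gap to the next read exceeds `max_gap`, and then filtered. The function name, the `max_gap` parameter and its value, the definition of length as end minus start, and the inclusive comparisons are all assumptions made for illustration; only the defaults 4 and 1000 come from the option list above.

```python
from collections import defaultdict


def detect_molecules(reads, min_reads=4, min_length=1000, max_gap=50_000):
    """Cluster reads into candidate molecules and apply the thresholds.

    reads: iterable of (barcode, target, start, end) tuples.
    max_gap is a hypothetical splitting distance, not an mdup option.
    Returns (barcode, target, start, end, n_reads) tuples.
    """
    groups = defaultdict(list)
    for barcode, target, start, end in reads:
        groups[(barcode, target)].append((start, end))

    molecules = []
    for (barcode, target), spans in groups.items():
        spans.sort()
        clusters = [[spans[0]]]
        cluster_end = spans[0][1]
        for start, end in spans[1:]:
            if start - cluster_end > max_gap:
                clusters.append([])   # gap too large: new molecule
                cluster_end = end
            clusters[-1].append((start, end))
            cluster_end = max(cluster_end, end)
        for cluster in clusters:
            mol_start = cluster[0][0]
            mol_end = max(end for _, end in cluster)
            # Assumed meaning of -n and -l: inclusive lower bounds.
            if (len(cluster) >= min_reads
                    and mol_end - mol_start >= min_length):
                molecules.append(
                    (barcode, target, mol_start, mol_end, len(cluster)))
    return molecules
```

Under these assumptions, a barcode with four reads spanning 1,600 bases passes the defaults, while a lone read from the same barcode far away forms its own cluster and is discarded.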

Contacts

Please report any issues directly on the GitHub issue tracker.