antigenomics/ilrfilter

Oct 28, 2019

Pre-filtering of immunoglobulin-like reads

Speeds up TCR/BCR mapping for large FASTQ files, such as those produced by RNA-Seq experiments. Inspired by the Vidjil algorithm.

You can get the latest JAR from the releases section; Java 1.8+ is required to run it. Run as

java -jar ilrfilter-0.0.1.jar hash -S hsa -I reads_R1.fastq reads_R2.fastq -O out_prefix

To see the list of available options, run either

java -jar ilrfilter-0.0.1.jar hash

for the hashmap-based (K-mer) algorithm, or

java -jar ilrfilter-0.0.1.jar tree

for the tree-based algorithm. To compile and check the build, clone the repo and run test.sh in the examples/ folder.

The tree-based algorithm is slower, but it takes less memory and less startup time than the hash-based one.

Note that we found that using a K-mer of length 15 with 1 mismatch (the default parameters for the hash-based algorithm) reduces data size roughly 10- to 50-fold while keeping the false-negative rate below 0.1%. Selecting longer K-mers or more mismatches for the hash-based algorithm, or allowing several substitutions and indels for the tree-based algorithm, can significantly increase running time and memory requirements and can result in the input file not being filtered at all.
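To illustrate why K-mer length and mismatch count drive memory use, here is a minimal sketch of a hash-based K-mer filter with one allowed substitution. It is not the code used by ilrfilter: the class name `KmerFilterSketch`, the methods `buildIndex` and `keep`, and the reference sequence are all hypothetical, and the real tool may index and match reads differently.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; names and the reference sequence are made up.
public class KmerFilterSketch {
    private static final char[] BASES = {'A', 'C', 'G', 'T'};

    /** Index every K-mer of the references together with all of its 1-substitution variants. */
    public static Set<String> buildIndex(List<String> references, int k) {
        Set<String> index = new HashSet<>();
        for (String ref : references) {
            for (int i = 0; i + k <= ref.length(); i++) {
                char[] kmer = ref.substring(i, i + k).toCharArray();
                index.add(new String(kmer)); // exact K-mer
                for (int pos = 0; pos < k; pos++) {
                    char original = kmer[pos];
                    for (char b : BASES) {
                        if (b != original) {
                            kmer[pos] = b;
                            index.add(new String(kmer)); // variant with one substitution
                        }
                    }
                    kmer[pos] = original; // restore before mutating the next position
                }
            }
        }
        return index;
    }

    /** Keep a read if at least one of its K-mers is present in the index. */
    public static boolean keep(String read, Set<String> index, int k) {
        for (int i = 0; i + k <= read.length(); i++) {
            if (index.contains(read.substring(i, i + k))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int k = 15;
        // Hypothetical reference segment, used only for this demonstration.
        Set<String> index = buildIndex(Arrays.asList("TGTGCCAGCAGTTTAGCGGGAGGGC"), k);
        System.out.println(keep("TGTGCCAGCAGTTTA", index, k)); // exact K-mer
        System.out.println(keep("TGTGCAAGCAGTTTA", index, k)); // one substitution
        System.out.println(keep("TGTGCAAGCAGATTA", index, k)); // two substitutions
    }
}
```

In this sketch each reference K-mer contributes up to 1 + 3k index entries (46 for k = 15), and allowing a second substitution would grow that number roughly quadratically in k. A larger index also matches more reads by chance, which is one way a filter can end up passing almost the whole input.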

Also note that this implementation uses the Java Stream API introduced in Java 1.8, so it will use all available cores by default.
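As background on the default core usage: parallel streams in Java normally run on the common `ForkJoinPool`, whose size is derived from the number of available processors. The sketch below shows how to inspect that pool and run a small parallel stream. The class `ParallelismSketch` and its methods are hypothetical, and it is an assumption that ilrfilter relies on the common pool rather than a custom one.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

// Illustrative sketch only; not part of ilrfilter.
public class ParallelismSketch {

    /** Number of worker threads the common pool targets for parallel streams. */
    public static int commonPoolParallelism() {
        return ForkJoinPool.commonPool().getParallelism();
    }

    /** Count multiples of 3 in [0, n) using a parallel stream on the common pool. */
    public static long countMultiplesOfThree(int n) {
        return IntStream.range(0, n)
                .parallel()
                .filter(i -> i % 3 == 0)
                .count();
    }

    public static void main(String[] args) {
        System.out.println("Available processors: " + Runtime.getRuntime().availableProcessors());
        System.out.println("Common pool parallelism: " + commonPoolParallelism());
        System.out.println("Multiples of 3 below 1000000: " + countMultiplesOfThree(1_000_000));
    }
}
```

If the tool does use the common pool, the standard JVM system property `java.util.concurrent.ForkJoinPool.common.parallelism` (set with `-D` before `-jar`) should limit the number of worker threads on a shared machine; this has not been verified against ilrfilter itself.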