GitHub - antigenomics/ilrfilter: Immunoglobulin-like Read Filter

Pre-filtering of immunoglobulin-like reads

Speeds up TCR/BCR mapping for large FASTQ files, such as those produced by RNA-Seq experiments. Inspired by the Vidjil algorithm.

You can get the latest JAR from the releases section; Java 1.8+ is required to run it. Run as

java -jar ilrfilter-0.0.1.jar hash -S hsa -I reads_R1.fastq reads_R2.fastq -O out_prefix

To see the list of available options run either

java -jar ilrfilter-0.0.1.jar hash

for the hashmap-based (k-mer) algorithm or

java -jar ilrfilter-0.0.1.jar tree

for the tree-based algorithm. To compile and test, clone the repo and run test.sh in the examples/ folder.

The tree-based algorithm is slower but needs less memory and startup time than the hash-based one.

Note that we found that using a K-mer of length 15 with 1 mismatch (the default parameters for the hash-based algorithm) reduces data size roughly 10- to 50-fold while keeping the false-negative rate below 0.1%. Selecting longer K-mers or more mismatches for the hash-based algorithm, or allowing several substitutions and indels for the tree-based algorithm, can significantly increase running time and memory requirements, and may result in no reads being filtered out of the input file.
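
To make the memory trade-off concrete, here is a minimal sketch of a k-mer hash filter that tolerates one substitution. It is an illustration of the general technique only, not the ilrfilter implementation: the class `KmerFilterSketch`, its methods, and the toy reference sequence are all hypothetical, and real reads would also need reverse-complement handling, which is omitted here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch; names are hypothetical and not taken from the ilrfilter source.
class KmerFilterSketch {
    private static final char[] BASES = {'A', 'C', 'G', 'T'};

    // Index every k-mer of each reference, plus every variant with one substitution.
    static Set<String> buildIndex(List<String> references, int k) {
        Set<String> index = new HashSet<>();
        for (String ref : references) {
            for (int i = 0; i + k <= ref.length(); i++) {
                String kmer = ref.substring(i, i + k);
                index.add(kmer);
                char[] chars = kmer.toCharArray();
                for (int pos = 0; pos < k; pos++) {
                    char original = chars[pos];
                    for (char base : BASES) {
                        if (base != original) {
                            chars[pos] = base;
                            index.add(new String(chars));
                        }
                    }
                    chars[pos] = original; // restore before mutating the next position
                }
            }
        }
        return index;
    }

    // A read passes the filter if at least one of its k-mers is in the index.
    static boolean passes(String read, Set<String> index, int k) {
        for (int i = 0; i + k <= read.length(); i++) {
            if (index.contains(read.substring(i, i + k))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int k = 15;
        // Toy reference sequence, made up for this example.
        Set<String> index = buildIndex(Arrays.asList("TGTGCCAGCAGCTTA"), k);
        System.out.println("Index entries: " + index.size());
        System.out.println("Exact match passes: " + passes("TGTGCCAGCAGCTTA", index, k));
        System.out.println("One mismatch passes: " + passes("TGTGCCAGGAGCTTA", index, k));
    }
}
```

In this scheme each 15-mer expands to 1 + 3·15 = 46 index entries with one substitution allowed; allowing two would give 1 + 45 + 9·105 = 991 entries per k-mer, which shows how quickly memory use grows with the mismatch count.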

Also note that this implementation uses the Java Stream API introduced in Java 1.8, so it will use all available cores by default.
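
As a rough illustration of why all cores get used, the sketch below filters a list of reads with a parallel stream. Parallel streams run on the JVM's common `ForkJoinPool`, which is sized from the number of available processors. The class `ParallelFilterSketch`, the method `keepMatching`, and the motif are hypothetical and only stand in for the real filtering logic.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;

// Illustrative sketch; names are hypothetical and not taken from the ilrfilter source.
class ParallelFilterSketch {
    // Keep reads containing the motif; work is split across the common ForkJoinPool.
    static List<String> keepMatching(List<String> reads, String motif) {
        return reads.parallelStream()
                .filter(read -> read.contains(motif))
                .collect(Collectors.toList()); // encounter order of the source list is preserved
    }

    public static void main(String[] args) {
        System.out.println("Common pool parallelism: "
                + ForkJoinPool.commonPool().getParallelism());
        List<String> reads = Arrays.asList("AATGTGCC", "CCCCCCCC", "TGTGAAAA");
        System.out.println(keepMatching(reads, "TGTG"));
    }
}
```

If ilrfilter relies on the common pool (an assumption, not verified against its source), the standard JVM property `-Djava.util.concurrent.ForkJoinPool.common.parallelism=N`, passed before `-jar`, should cap the number of worker threads on a shared machine.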
