STRling is still in development. Please report bugs via GitHub issues.
STRling (pronounced like “sterling”) is a method to detect large STR expansions from short-read sequencing data. It is capable of detecting novel STR expansions, that is expansions where there is no STR in the reference genome at that position (or a different repeat unit from what is in the reference). It can also detect STR expansions that are annotated in the reference genome. STRling uses kmer counting to recover mis-mapped STR reads. It then uses soft-clipped reads to precisely discover the position of the STR expansion in the reference genome.
We recommending downloading the static binary.
Download strling
from the latest release from here.
Make it executable:
chmod +x strling
Install nim:
curl https://nim-lang.org/choosenim/init.sh -sSf > init.sh && sh init.sh
Install STRling:
git clone <URL>
cd STRling
nimble install
Compile options for development:
Compile in fast mode (danger):
nim c -d:danger -d:release src/strling.nim
name=hg002
strling extract -f $reference_fasta /path/to/$name.cram $name.bin
mkdir -p str-results/
strling call --output-prefix str-results/$name -f $reference_fasta /path/to/$name.cram $name.bin
The main output file is {$prefix}-genotype.txt
. It reports all STR expansion loci that pass thresholds as well as any provided as input. The columns are:
- chrom: chromosome/contig name
- left: predicted left boundry of STR locus
- right: predicted right boundry of STR locus
- repeatunit: predicted STR repeat unit
- allele1_est: estimated size of the shorter allele in repeat units, from spanning reads (if any)
- allele2_est: estimated size of the larger allele in repeat units, from anchored reads
- total_reads: number of reads supporting an expansion at this locus
- spanning_reads: number of reads that span the locus
- spanning_pairs: number of read pairs that span the locus
- left_clips: number of soft-clipped reads supporting the left side of the locus position
- right_clips: number of soft-clipped reads supporting the right side of the locus position
- unplaced_pairs: number of unplaced STR reads assigned to this locus (will only be >0 for a uniquely expanded repeat unit)
- depth: local median depth around the locus
- sum_str_counts: the sum of STR repeat units in all reads supporting an expansion
Some additional outputs are provided with detailed supporting evidence used to make the genotype calls:
- Putative str bounds:
{$prefix}-bounds.txt
- All str-like reads:
{$prefix}-reads.txt
- Spanning reads and spanning pairs:
{$prefix}-spanning.txt
- Counts of str-like reads that are unplaced (could not be assigned to a locus):
{$prefix}-unplaced.txt
nimble tests
If you get the error:
could not load: libhts.so
try:
export LD_LIBRARY_PATH=./htslib/