Konnector merges very few reads for high-coverage genome #489

Open
zcalamari opened this issue Dec 26, 2024 · 3 comments

@zcalamari

abyss-pe version 2.1.5 (singularity container)
Linux Distribution: Rocky Linux release 8.9 (Green Obsidian)

Hello,

I am trying to troubleshoot using Konnector to generate pseudoreads for a mammalian genome of around 2.7 Gb. The genome in question is sequenced to approximately 100x coverage, but Konnector merges very few reads. Here is an example output:

Bloom filter FPR: 0.672%
Connecting read pairs
Processed 920776180 read pairs
Merged (Unique path + Multiple paths): 18 (1.95e-06%)
No start/goal kmer: 96582293 (10.5%)
No path: 515094849 (55.9%)
Unique path: 18 (1.95e-06%)
Multiple paths: 0 (0%)
Too many paths: 667414 (0.0725%)
Too many branches: 0 (0%)
Too many path/path mismatches: 16476 (0.00179%)
Too many path/read mismatches: 0 (0%)
Contains cycle: 0 (0%)
Max cost exceeded: 308415130 (33.5%)
Skipped: 0 (0%)
Bloom filter FPR: 0.672%

This was generated with this command:
singularity exec /home/calamari/abyss.sif konnector -j 128 -k 95 -b 220G --fastq -o Kon_2 -v Mg_1P.fq.gz Mg_2P.fq.gz

I have tried different k-mer sizes, Bloom filter sizes, and minimum coverage settings, with no improvement. I also increased the maximum search cost (--max-cost), but the run timed out after a week without producing a result. Konnector has worked for me on lower-coverage data (30x coverage for a species with a similar-sized genome), so I am hoping I can get it working on these higher-coverage genomes as well. Is there anything I can do to address the number of reads with "no path" or "max cost exceeded"?
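
For reference, the k-mer sweeps looked roughly like this (the k values shown are illustrative, not the exact ones I tested):

# illustrative sweep over k; all other settings as in the command above
for k in 55 65 75 85 95; do
    singularity exec /home/calamari/abyss.sif \
        konnector -j 128 -k $k -b 220G --fastq -o Kon_k$k -v Mg_1P.fq.gz Mg_2P.fq.gz
done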

Thanks for the help!

Best,
Zac

@warrenlr
Contributor

Hi Zac, thank you for your message and interest in Konnector.

Your intuition is correct and in line with what we recommend in such cases (trying different k-mer sizes, Bloom filter sizes, and minimum coverage settings). I recommend you take a look at similar closed issues for additional insights.

The no start/goal kmer, no path, and max cost exceeded percentages are so high that I wonder whether this could be due to low read accuracy in this particular dataset (and/or very uneven coverage of the genome).
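
If you want to sanity-check read accuracy first, a standard QC pass over the inputs is a quick way to do it (a sketch only; FastQC is one common option, with file names taken from your command):

# per-base quality plots should show whether error rates are unusually high
fastqc Mg_1P.fq.gz Mg_2P.fq.gz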

Thanks,
Rene

@zcalamari
Author

zcalamari commented Feb 1, 2025

Hello,

Thanks for the response. I checked through the other closed issues and wasn't able to find anything quite like this with Konnector.

Based on the Konnector paper's description of assembling a human genome, I tried splitting my file into smaller subsets and running each with a separate Bloom filter. I was trying to use the --extend option, but abyss is now producing an error that reads have the same ID. It looks like Konnector is outputting sequence IDs in this format:

@GWNJ-1013:244:GW2108111358th:3:1101:2103:1000/1 1:N:0:GGATCTGA+TCCGTAGT
@GWNJ-1013:244:GW2108111358th:3:1101:2103:1000/1/2

I would expect the second read to be named "/2", not "/1/2", if it's the paired read. Could the "/1/2" be causing the issue with abyss? Would it be safe to rename the second read to just "/2"? I have checked multiple times, and all my input files at each step are correct and supplied to abyss only once.
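
If renaming is safe, here is a minimal sketch of how I would do it (the input/output names are placeholders for whatever Konnector produced for the second-in-pair file):

# edit only FASTQ header lines (every 4th line, starting at line 1);
# quality strings can legitimately contain "/1/2", so a bare sed is unsafe
zcat reads_2.fq.gz \
    | awk 'NR % 4 == 1 { sub(/\/1\/2/, "/2") } { print }' \
    | gzip > reads_2.renamed.fq.gz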

Thanks,
Zac

@warrenlr
Contributor

warrenlr commented Feb 3, 2025

Please see my response to a similar issue:
#490 (comment)
Note: I am told by one of our developers that "the extend option is also something we don’t really use ourselves."

Please take a look at how we used Konnector for spruce genome assembly (cited in the issue linked above). Something important to consider: in that work, we used konnector and abyss-mergepairs iteratively (i.e., a different k will work better for different pairs).
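
A minimal sketch of that iterative scheme, reusing the command from the first post (and assuming konnector writes the still-unconnected pairs to <prefix>_reads_1.fq and <prefix>_reads_2.fq; please check the naming your version actually produces):

# start from the original pairs, then feed each round's unconnected
# pairs into the next round at a smaller k
R1=Mg_1P.fq.gz; R2=Mg_2P.fq.gz
for k in 95 75 55; do
    singularity exec /home/calamari/abyss.sif \
        konnector -j 128 -k $k -b 220G --fastq -o Kon_k$k -v $R1 $R2
    R1=Kon_k${k}_reads_1.fq; R2=Kon_k${k}_reads_2.fq
done

An abyss-mergepairs pass on the remaining pairs between rounds fits into the same loop.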

I hope this helps.
