Konnector merges very few reads for high-coverage genome #489

Open
zcalamari opened this issue Dec 26, 2024 · 3 comments

@zcalamari

abyss-pe version 2.1.5 (singularity container)
Linux Distribution: Rocky Linux release 8.9 (Green Obsidian)

Hello,

I am trying to troubleshoot using Konnector to generate pseudoreads for a mammalian genome of around 2.7 Gb. The genome in question is sequenced to approximately 100x coverage, but Konnector merges very few reads. Here is an example output:

Bloom filter FPR: 0.672%
Connecting read pairs
Processed 920776180 read pairs
Merged (Unique path + Multiple paths): 18 (1.95e-06%)
No start/goal kmer: 96582293 (10.5%)
No path: 515094849 (55.9%)
Unique path: 18 (1.95e-06%)
Multiple paths: 0 (0%)
Too many paths: 667414 (0.0725%)
Too many branches: 0 (0%)
Too many path/path mismatches: 16476 (0.00179%)
Too many path/read mismatches: 0 (0%)
Contains cycle: 0 (0%)
Max cost exceeded: 308415130 (33.5%)
Skipped: 0 (0%)
Bloom filter FPR: 0.672%

This was generated with this command:
singularity exec /home/calamari/abyss.sif konnector -j 128 -k 95 -b 220G --fastq -o Kon_2 -v Mg_1P.fq.gz Mg_2P.fq.gz

I have tried different k-mer sizes, Bloom filter sizes, and minimum coverage settings, with no improvement. I also increased the maximum search cost (--max-cost), but the run timed out after a week without producing a result. Konnector has worked for me on lower-coverage data (30x coverage for a species with a similar-sized genome), so I am hoping I can get it working on these higher-coverage genomes as well. Is there anything I can do to address the number of reads with "no path" or "max cost exceeded"?
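
For reference, the k-mer sweeps looked roughly like this (the k values shown are illustrative, not the exact ones I tested):

# illustrative sweep over k; all other settings as in the command above
for k in 55 65 75 85 95; do
    singularity exec /home/calamari/abyss.sif \
        konnector -j 128 -k $k -b 220G --fastq -o Kon_k$k -v Mg_1P.fq.gz Mg_2P.fq.gz
done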

Thanks for the help!

Best,
Zac

@warrenlr
Contributor

Hi Zac, thank you for your message and interest in Konnector.

Your intuition is correct and in line with what we recommend in such cases (trying different k-mer sizes, Bloom filter sizes, and minimum coverage settings). I recommend you take a look at similar closed issues for additional insights.

The no start/goal kmer, no path, and max cost exceeded percentages are so high that I wonder whether this could be due to low read accuracy in this particular dataset (and/or very uneven coverage of the genome).
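
If you want to sanity-check read accuracy first, a standard QC pass over the inputs is a quick way to do it (a sketch only; FastQC is one common option, with file names taken from your command):

# per-base quality plots should show whether error rates are unusually high
fastqc Mg_1P.fq.gz Mg_2P.fq.gz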

Thanks,
Rene

@zcalamari
Author

zcalamari commented Feb 1, 2025

Hello,

Thanks for the response. I checked through the other closed issues and wasn't able to find anything quite like this with Konnector.

Based on the Konnector paper's description of assembling a human genome, I tried splitting my file into smaller subsets and running each with a separate Bloom filter. I was trying to use the --extend option, but abyss is now producing an error that reads have the same ID. It looks like Konnector is outputting sequence IDs in this format:

@GWNJ-1013:244:GW2108111358th:3:1101:2103:1000/1 1:N:0:GGATCTGA+TCCGTAGT
@GWNJ-1013:244:GW2108111358th:3:1101:2103:1000/1/2

I would expect the second read to be named "/2", not "/1/2", if it's the paired read. Could the "/1/2" be causing the issue with abyss? Would it be safe to rename the second read to just "/2"? I have checked multiple times, and all my input files at each step are correct and supplied to abyss only once.
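
If renaming is safe, here is a minimal sketch of how I would do it (the input/output names are placeholders for whatever Konnector produced for the second-in-pair file):

# edit only FASTQ header lines (every 4th line, starting at line 1);
# quality strings can legitimately contain "/1/2", so a bare sed is unsafe
zcat reads_2.fq.gz \
    | awk 'NR % 4 == 1 { sub(/\/1\/2/, "/2") } { print }' \
    | gzip > reads_2.renamed.fq.gz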

Thanks,
Zac

@warrenlr
Contributor

warrenlr commented Feb 3, 2025

Please see my response to a similar issue:
#490 (comment)
Note: I am told by one of our developers that "the extend option is also something we don’t really use ourselves."

Please take a look at how we used Konnector for spruce genome assembly (cited in the issue linked above). Something important to consider: in that work, we used konnector and abyss-mergepairs iteratively (i.e., a different k will work better for different pairs).
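
A minimal sketch of that iterative scheme, reusing the command from the first post (and assuming konnector writes the still-unconnected pairs to <prefix>_reads_1.fq and <prefix>_reads_2.fq; please check the naming your version actually produces):

# start from the original pairs, then feed each round's unconnected
# pairs into the next round at a smaller k
R1=Mg_1P.fq.gz; R2=Mg_2P.fq.gz
for k in 95 75 55; do
    singularity exec /home/calamari/abyss.sif \
        konnector -j 128 -k $k -b 220G --fastq -o Kon_k$k -v $R1 $R2
    R1=Kon_k${k}_reads_1.fq; R2=Kon_k${k}_reads_2.fq
done

An abyss-mergepairs pass on the remaining pairs between rounds fits into the same loop.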

I hope this helps.
