Konnector merges very few reads for high-coverage genome #489
Comments
Hi Zac,

Thank you for your message and interest in Konnector. Your intuition is correct and in line with what we recommend in such cases (e.g., trying different k-mer sizes, Bloom filter sizes, and minimum coverage settings). I recommend you take a look at closed (similar) reported issues for additional insights.

The no start/goal, no path, and max cost percentages are so high that I wonder whether this could be due to low read accuracy in this particular dataset (and/or very uneven coverage of the genome).

Thanks,
Hello,

Thanks for the response. I did check through the other closed issues and wasn't able to find anything quite like this with Konnector. Based on the Konnector paper's description of assembling a human genome, I also tried splitting my file into smaller subsets and running each with a separate Bloom filter.

I was trying to use the --extend option, but abyss is now producing an error that reads have the same ID. It looks like Konnector is outputting sequence IDs in this format:

@GWNJ-1013:244:GW2108111358th:3:1101:2103:1000/1 1:N:0:GGATCTGA+TCCGTAGT

I would expect the second read to be named "/2" and not "/1/2" if it is the paired read. Could the "/1/2" be causing the issue with abyss? Would it be safe to rename the second read to just "/2"? I have checked multiple times, and all my input files at each step are correct and supplied to abyss only once.

Thanks,
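If the duplicated suffix is indeed what trips up abyss, one possible workaround is to normalize the headers before handing the reads back to abyss. This is an untested sketch; `reads_2.fq.gz` is a placeholder for whichever Konnector output file carries the "/1/2" IDs, not Konnector's actual output name:

```bash
# Untested sketch: collapse a duplicated pair suffix such as "/1/2" to "/2"
# on FASTQ header lines (every 4th line, starting at line 1).
# "reads_2.fq.gz" is a placeholder, not Konnector's actual output name.
zcat reads_2.fq.gz \
  | awk 'NR % 4 == 1 { sub(/\/1\/2/, "/2") } { print }' \
  | gzip > reads_2.renamed.fq.gz
```

Whether this is safe depends on abyss keying only off the trailing /1 and /2 to pair mates; verifying on a small subset first would be prudent.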
Please see my response to a similar issue:
Please take a look at how we used Konnector for spruce genome assembly (cited in the issue linked above). Something important to consider: in that work, we used Konnector and abyss-mergepairs iteratively (i.e., different k values work better for different read pairs). I hope this helps.
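For illustration, a minimal sketch of that iterative scheme, using the konnector flags reported in this issue. It assumes Konnector writes its unmerged pairs to `<prefix>_reads_1.fq` / `<prefix>_reads_2.fq`; that naming is an assumption, so check your build's actual output files:

```bash
# Hedged sketch of the iterative strategy: run Konnector with progressively
# smaller k, feeding each round's unmerged pairs into the next round.
# The <prefix>_reads_1.fq / <prefix>_reads_2.fq names for leftover pairs are
# an assumption -- check your build's actual output naming.
in1=Mg_1P.fq.gz
in2=Mg_2P.fq.gz
for k in 95 80 65 50; do
  singularity exec /home/calamari/abyss.sif \
    konnector -j 128 -k "$k" -b 220G --fastq -o "Kon_k${k}" -v "$in1" "$in2"
  # Unmerged pairs become the input for the next, smaller k.
  in1="Kon_k${k}_reads_1.fq"
  in2="Kon_k${k}_reads_2.fq"
done
```

abyss-mergepairs could be alternated into the same loop, as in the spruce assembly workflow cited above.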
abyss-pe version 2.1.5 (singularity container)
Linux Distribution: Rocky Linux release 8.9 (Green Obsidian)
Hello,
I am trying to troubleshoot using Konnector to generate pseudoreads for a mammalian genome of around 2.7 Gb. The dataset has approximately 100x coverage, but Konnector merges very few reads. Here is an example of the output:
Bloom filter FPR: 0.672%
Connecting read pairs
Processed 920776180 read pairs
Merged (Unique path + Multiple paths): 18 (1.95e-06%)
No start/goal kmer: 96582293 (10.5%)
No path: 515094849 (55.9%)
Unique path: 18 (1.95e-06%)
Multiple paths: 0 (0%)
Too many paths: 667414 (0.0725%)
Too many branches: 0 (0%)
Too many path/path mismatches: 16476 (0.00179%)
Too many path/read mismatches: 0 (0%)
Contains cycle: 0 (0%)
Max cost exceeded: 308415130 (33.5%)
Skipped: 0 (0%)
Bloom filter FPR: 0.672%
This was generated with this command:
singularity exec /home/calamari/abyss.sif konnector -j 128 -k 95 -b 220G --fastq -o Kon_2 -v Mg_1P.fq.gz Mg_2P.fq.gz
I have tried different k-mer sizes, Bloom filter sizes, and minimum coverage settings, with no improvement. I also increased the maximum search cost (--max-cost), but the run timed out after a week without producing a result. Konnector has worked for me on lower-coverage genomes (30x coverage for a species with a similar-sized genome), so I am hoping to get it working again on these higher-coverage datasets. Is there anything I can do to reduce the number of reads failing with "no path" or "max cost exceeded"?
Thanks for the help!
Best,
Zac
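As a concrete illustration of the subset approach mentioned in the comments above, here is a minimal sketch, assuming the file names from the command in this issue; the chunk size and part naming are arbitrary choices (400,000,000 lines = 100 million reads per chunk, and the line count must be a multiple of 4):

```bash
# Split each mate file into synchronized chunks (both files are in the same
# read order, so equal line counts keep pairs together), then run Konnector
# on each chunk with its own Bloom filter.
zcat Mg_1P.fq.gz | split -d -l 400000000 - Mg_1P.part_
zcat Mg_2P.fq.gz | split -d -l 400000000 - Mg_2P.part_
for part in Mg_1P.part_*; do
  idx=${part##*_}
  singularity exec /home/calamari/abyss.sif \
    konnector -j 128 -k 95 -b 220G --fastq -o "Kon_part${idx}" -v \
    "Mg_1P.part_${idx}" "Mg_2P.part_${idx}"
done
# Caveat: each chunk's Bloom filter only contains that chunk's k-mers, which
# can reduce path connectivity relative to one filter built from all reads.
```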