-
Hi MariaMS23, that sounds frustrating. Your problem sounds very similar to mine, which I wasn't able to solve in the end. Rather than trying to assemble the whole chloroplast using a de novo method, I constructed one by mapping the reads to a reference genome. Even using a reference genome of a different species of the same genus, I was able to identify variation within the chloroplast useful for phylogenetic studies. If you have the resources to pay for a method like PacBio (Pacific Biosciences) long-read sequencing, that could solve your issue. I know this method was used to help construct other long chloroplast genomes, like those in Vaccinium.
-
Thanks @Tmesipteris for actively participating; however, #256 is irrelevant here. #256 is about needing help getting a full cp genome for Ericaceae species with abundant repeats, which is why long-read sequencing will be helpful there (but not here). I've also edited both titles to stop misleading yourselves and others. Stop bothering yourselves and others about the size limits. There are plenty of published examples reporting much larger ones using GetOrganelle. The situation here is that the IR is not complete, so GetOrganelle is only exporting one copy. I would strongly advise against using a reference-based approach, which contaminates the published organelle genome pool. For @mariaMS23 , even using the There is only one gap in your assembly graph, so I still believe reducing
-
Hi Kinggerm,
I also had quite a similar issue, with an expected size of around 169-170 kb.
I followed the suggestions in the FAQ entry "What should I do with incomplete result/broken assembly graph": reducing the word size (-w), increasing the input reads (--reduce-reads-for-coverage or --max-reads), using a closely related seed (-s), increasing the number of rounds (-R), and using a wider and denser -k. I still could not get the complete genome, only 135 kb.
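For readers following along, the options listed above could be combined roughly as in the sketch below. It only builds and prints a `get_organelle_from_reads.py` command line and does not run anything. The file names, the seed FASTA, and every numeric value are placeholders chosen for illustration, not settings recommended in this thread.

```python
import shlex


def build_getorganelle_command(fq1, fq2, out_dir, seed_fasta=None,
                               rounds=30, word_size=0.6,
                               kmers=(21, 45, 65, 85, 105, 115, 127),
                               max_reads="3e7", reduce_cov="1000"):
    """Assemble an argument list for get_organelle_from_reads.py.

    All default values here are illustrative assumptions; tune them
    to your own data.
    """
    cmd = [
        "get_organelle_from_reads.py",
        "-1", fq1, "-2", fq2,              # paired-end input reads
        "-o", out_dir,                     # output directory
        "-F", "embplant_pt",               # plant plastome target
        "-R", str(rounds),                 # more extension rounds
        "-w", str(word_size),              # smaller word size (ratio of read length)
        "-k", ",".join(map(str, kmers)),   # wider and denser k-mer list (odd, max 127)
        "--max-reads", max_reads,          # allow more input reads
        "--reduce-reads-for-coverage", reduce_cov,
    ]
    if seed_fasta:
        cmd += ["-s", seed_fasta]          # closely related seed sequence
    return cmd


# Hypothetical file names, for illustration only.
command = build_getorganelle_command(
    "sample_1.fq.gz", "sample_2.fq.gz", "sample_out",
    seed_fasta="close_relative_cp.fasta",
)
print(shlex.join(command))
```

Printing the command first makes it easy to check which options changed between attempts before committing to a long run.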
I attached the graph and log file.
Thank you for your suggestion.
get_org.log.txt
Discussed in #256
Originally posted by Tmesipteris April 5, 2023
I am working on assembling the cp genome of a range of plant species and can easily obtain nicely assembled chloroplast genomes for those that are up to 160,000 bp in length using GetOrganelle. However, for two species in the Ericaceae family that have genomes between 170,000 and 200,000 bp long, I cannot get the whole genome. So far I have tried using a -R of 100 to obtain more chloroplast reads, changing the -w value, and increasing the disentangling time, but I haven't changed the -k values. However, no matter what I try, I only get an incomplete genome consisting of scaffolds that represent around 135,000 bp in total.
It doesn't seem to be due to a lack of chloroplast reads, because read depths are over 400, and when I map the reads identified by GetOrganelle as cp reads (the extended_1_paired.fq and extended_2_paired.fq files) to the reference genome, there is good coverage across the whole genome. However, when I align the scaffolds to the reference genome, the two IRs are not completely represented, suggesting that GetOrganelle is having trouble assembling the unusually long IRs in these species. NOVOPlasty also cannot assemble these long cp genomes.
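The scaffold-to-reference check described above can be summarised with a small helper that lists the reference intervals no scaffold aligns to. This is a minimal sketch: the function name, the 1-based inclusive coordinate convention, and the example coordinates are all assumptions made for illustration, not values taken from the assemblies discussed here.

```python
def uncovered_regions(ref_length, aligned_intervals):
    """Return 1-based inclusive reference intervals covered by no alignment.

    aligned_intervals is an iterable of (start, end) pairs on the reference,
    e.g. parsed from the output of whichever aligner was used.
    """
    gaps = []
    next_uncovered = 1  # first reference position not yet known to be covered
    for start, end in sorted(aligned_intervals):
        if start > next_uncovered:
            gaps.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    if next_uncovered <= ref_length:
        gaps.append((next_uncovered, ref_length))
    return gaps


# Hypothetical coordinates, for illustration only.
ref_length = 170_000
scaffold_hits = [(1, 90_000), (85_000, 120_000), (150_000, 165_000)]
gaps = uncovered_regions(ref_length, scaffold_hits)
for start, end in gaps:
    print(f"uncovered: {start}-{end} ({end - start + 1} bp)")
```

If the uncovered intervals fall within one of the reference IR copies, that would be consistent with an assembly in which only one IR copy was exported.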
Are there any ways to better assemble long chloroplast genomes using GetOrganelle?
Any help with this matter would be much appreciated.