Use LDWeaver with plasmidic dataset #10
Hi Bradd, this sounds like very exciting work. Unfortunately, this type of analysis is tricky when sequence diversity is high. We have another pipeline that may be better suited: https://github.com/Sudaraka88/PAN-GWES The idea is to build a pan-genome of the plasmids (using GenBank annotations) and run the analysis on the de Bruijn graph, at the level of unitig presence/absence. However, measuring the distance between unitigs can be tricky at times...
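As a rough illustration of what linkage between two features looks like at the presence/absence level, here is a minimal sketch of the classic LD statistic r² computed on two binary vectors (1 = feature present in a plasmid, 0 = absent). This is a generic example, not necessarily the statistic PAN-GWES computes; the function name and inputs are hypothetical.

```python
from typing import Sequence


def presence_absence_r2(a: Sequence[int], b: Sequence[int]) -> float:
    """LD-style r^2 between two binary presence/absence vectors.

    r^2 = D^2 / (pA(1 - pA) pB(1 - pB)), where D = pAB - pA * pB.
    For binary data this equals the squared Pearson (phi) correlation.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(a)
    p_a = sum(a) / n  # frequency of feature A across plasmids
    p_b = sum(b) / n  # frequency of feature B across plasmids
    p_ab = sum(1 for x, y in zip(a, b) if x and y) / n  # joint frequency
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        # One feature is present (or absent) everywhere: r^2 is undefined.
        return float("nan")
    return d * d / denom
```

One consequence worth keeping in mind: a strictly core gene (present in every plasmid) has zero variance at the presence/absence level, so r² with it is undefined. Linkage involving core genes has to come from variation within them, such as SNPs or distinct unitigs.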
I already have a presence/absence matrix from the pangenomic analysis, with plasmids annotated using KEGG (including proteins and metabolic modules). I plan to run an additive Bayesian network analysis to investigate gene co-occurrence. However, I'm also interested in exploring the physical distance (in base pairs) between co-occurring genes.
I see. It depends entirely on your question(s), but I think the pan-genome approach I mentioned above might be the most straightforward, so it is probably worth a try... Also, how diverse is the accessory genome? Instead of selecting a representative plasmid, you might be able to build a reference pseudo-plasmid from the pan-genome. If that is done reasonably well, you can run LDWeaver. But this can be tricky depending on the accessory diversity: if genes move around a lot, genomic distances measured on a pseudo-reference are unlikely to represent the true distances.
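To see how much the distance between two co-occurring genes actually varies, one option is to measure it separately in each plasmid from its own annotation coordinates and then look at the spread, rather than relying on a single pseudo-reference. The sketch below assumes a hypothetical per-plasmid layout (plasmid length plus 1-based `(start, end)` coordinates per gene, e.g. parsed from each GenBank file beforehand), treats plasmids as circular, and uses gene midpoints as positions. All names are illustrative.

```python
from statistics import median
from typing import Dict, Tuple

# Hypothetical layout: (plasmid length in bp, {gene: (start, end)}).
Layout = Tuple[int, Dict[str, Tuple[int, int]]]


def circular_distance(pos1: float, pos2: float, length: int) -> float:
    """Shortest distance (bp) between two positions on a circular molecule."""
    d = abs(pos1 - pos2) % length
    return min(d, length - d)


def gene_midpoint(start: int, end: int, length: int) -> float:
    """Midpoint of a feature; a feature with end < start wraps past the origin."""
    if end < start:
        end += length
    return ((start + end) / 2) % length


def per_plasmid_distances(
    layouts: Dict[str, Layout], gene_a: str, gene_b: str
) -> Dict[str, float]:
    """Distance between two genes in every plasmid that carries both."""
    out: Dict[str, float] = {}
    for pid, (length, genes) in layouts.items():
        if gene_a in genes and gene_b in genes:
            ma = gene_midpoint(*genes[gene_a], length)
            mb = gene_midpoint(*genes[gene_b], length)
            out[pid] = circular_distance(ma, mb, length)
    return out


if __name__ == "__main__":
    # Toy data: the same two (hypothetical) genes sit close together in p1
    # but roughly half a plasmid apart in p2.
    layouts = {
        "p1": (200_000, {"mdr_gene": (1_000, 2_000), "core_gene": (199_000, 199_800)}),
        "p2": (210_000, {"mdr_gene": (50_000, 51_000), "core_gene": (150_000, 150_900)}),
    }
    dists = per_plasmid_distances(layouts, "mdr_gene", "core_gene")
    print(dists)                   # {'p1': 2100.0, 'p2': 99950.0}
    print(median(dists.values()))  # 51025.0
```

A wide spread across plasmids, as in this toy example, is exactly the situation where a single pseudo-reference distance would be misleading.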
Hello Sudaraka! I hope you are doing well.
I would like to use the software to analyze a dataset of 400 related plasmids. While these plasmids are not identical, I aim to determine whether some of them exhibit linkage disequilibrium between the multidrug resistance (MDR) region and other core genes. The plasmids are quite large, each exceeding 200 kb, and I have their sequences in FASTA format.
My main question is: how should I proceed? Can I use multiple GenBank annotation files, one per plasmid sequence, or is that unnecessary? Selecting a single representative plasmid sequence would be difficult.