Use LDWeaver with plasmidic dataset #10

Open
braddmg opened this issue Jan 27, 2025 · 3 comments

Comments

@braddmg

braddmg commented Jan 27, 2025

Hello Sudaraka! I hope you are doing well.

I would like to use the software to analyze a dataset of 400 related plasmids. While these plasmids are not identical, I aim to determine whether a proportion of them exhibit linkage disequilibrium between the multidrug resistance (MDR) region and other core genes. The plasmids are quite large, exceeding 200 kb in size, and I have their sequences in FASTA format.

My main question is: how should I proceed? Can I use multiple GenBank annotation files for each plasmid sequence, or is that unnecessary? Selecting a single representative plasmid sequence would be hard.

@Sudaraka88
Owner

Hi Bradd, sounds like very exciting work.

Unfortunately, it is a bit tricky to do this type of analysis when sequence diversity is high. We have another pipeline that could be better suited: https://github.com/Sudaraka88/PAN-GWES

The idea is to build a pan-genome of the plasmids (using GenBank annotations) and run the analysis on the de Bruijn graph (at a unitig presence/absence level). However, measuring the distance between unitigs can be tricky at times...
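To make "presence/absence level" concrete, here is a minimal sketch that scores the association between two unitigs (or genes) across plasmids using mutual information. It assumes each element is encoded as a 0/1 vector with one entry per plasmid. The function name and toy vectors are hypothetical, and this is only an illustration of the idea, not code taken from PAN-GWES or LDWeaver.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length 0/1 vectors.

    x[i] and y[i] record presence (1) or absence (0) of two elements
    in plasmid i. Hypothetical helper, for illustration only.
    """
    if len(x) != len(y) or not x:
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))  # counts of each (x, y) combination
    px = Counter(x)             # marginal counts for x
    py = Counter(y)             # marginal counts for y
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


# Toy data: two elements that always co-occur, and two that are independent.
always_together = mutual_information([1, 1, 0, 0], [1, 1, 0, 0])
independent = mutual_information([1, 1, 0, 0], [1, 0, 1, 0])
```

A high score only says that two elements tend to be present or absent together; it says nothing about how far apart they sit, which is where the distance problem comes in.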

@braddmg
Author

braddmg commented Jan 28, 2025

I already have a presence/absence matrix from the pangenomic analysis with plasmids annotated using KEGG, including proteins and metabolic modules. I plan to run an additive Bayesian network analysis to investigate gene co-occurrence. However, I'm also interested in exploring the physical distance (in base pairs) between co-occurring genes.
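For the physical distance part, a minimal sketch of what is meant by the distance between two genes on a single plasmid is given below. It assumes the plasmid is circular, that gene positions are 0-based coordinates (for example, gene start positions taken from an annotation), and that the total plasmid length is known. The function name is hypothetical.

```python
def circular_distance(pos_a, pos_b, plasmid_length):
    """Shortest distance in bp between two positions on a circular plasmid.

    Assumes 0-based positions with 0 <= pos < plasmid_length.
    Hypothetical helper, for illustration only.
    """
    if plasmid_length <= 0:
        raise ValueError("plasmid_length must be positive")
    for pos in (pos_a, pos_b):
        if not 0 <= pos < plasmid_length:
            raise ValueError("positions must lie within the plasmid")
    direct = abs(pos_a - pos_b)
    # On a circle, the path going the other way round may be shorter.
    return min(direct, plasmid_length - direct)


# Two genes near opposite ends of the linearised sequence of a 200 kb plasmid
# are actually close together once the circle is closed.
d = circular_distance(1_000, 199_000, 200_000)
```

Computed per plasmid, this gives one distance per gene pair per plasmid, which can then be compared across the dataset.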

@Sudaraka88
Owner

I see. It completely depends on your question(s), but I think the pan-genome approach I mentioned above might be the most straightforward - it is probably worth a try...

Also, how diverse is the accessory genome? Instead of selecting a representative plasmid, you might be able to build a reference pseudo-plasmid using the pan-genome. If done reasonably well, you can run LDWeaver. But this can be tricky depending on the accessory diversity: if genes move around a lot, genomic distances measured using a pseudo-reference are unlikely to represent the true distances.
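One rough way to check whether a pseudo-reference is reasonable for a given gene pair is to compare the distance measured on the pseudo-reference with the distances measured on the individual plasmids. The sketch below assumes those per-plasmid distances have already been computed; the function name, the 10% relative tolerance, and the toy numbers are all hypothetical choices, not LDWeaver behaviour.

```python
from statistics import median


def distance_agreement(per_plasmid_distances, pseudo_reference_distance, tolerance=0.1):
    """Summarise how well a pseudo-reference distance matches real plasmids.

    Returns (median per-plasmid distance, fraction of plasmids whose distance
    is within `tolerance` * pseudo_reference_distance of the pseudo-reference).
    Hypothetical helper, for illustration only.
    """
    if not per_plasmid_distances:
        raise ValueError("need at least one per-plasmid distance")
    allowed = tolerance * pseudo_reference_distance
    within = sum(
        1 for d in per_plasmid_distances
        if abs(d - pseudo_reference_distance) <= allowed
    )
    return median(per_plasmid_distances), within / len(per_plasmid_distances)


# Toy example: in three plasmids the pair sits about 5 kb apart, but in a
# fourth plasmid one gene has moved and the pair is 60 kb apart.
med, frac = distance_agreement([5000, 5200, 4900, 60000], 5000)
```

If the agreeing fraction is low for many pairs, the genes are probably moving around too much for pseudo-reference distances to be meaningful.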
