Use LDWeaver with plasmidic dataset #10

braddmg · 2025-01-27T04:19:07Z

Hello Sudaraka! I hope you are doing well.

I would like to use the software to analyze a dataset of 400 related plasmids. While these plasmids are not identical, I aim to determine whether a proportion of them exhibit linkage disequilibrium between the multidrug resistance (MDR) region and other core genes. The plasmids are quite large, exceeding 200 kb in size, and I have their sequences in FASTA format.

My main question is: how should I proceed? Can I provide a separate GenBank annotation file for each plasmid sequence, or is that unnecessary? Selecting a single representative plasmid sequence would be difficult.

Sudaraka88 · 2025-01-27T23:59:22Z

Hi Bradd, sounds like very exciting work.

Unfortunately, it is a bit tricky to do this type of analysis when sequence diversity is high. We have another pipeline that could be better suited: https://github.com/Sudaraka88/PAN-GWES

The idea is to build a pan-genome of the plasmids (using GenBank annotations) and run the analysis on the de Bruijn graph (at the level of unitig presence/absence). However, measuring the distance between unitigs can be tricky at times...
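To make the unitig presence/absence idea a bit more concrete, here is a generic Python sketch of one possible linkage statistic: the r² between two 0/1 presence/absence columns (one entry per plasmid). This is only an illustration under that encoding assumption. It is not PAN-GWES's implementation, which may use a different statistic, and it does not address the unitig distance problem mentioned above.

```python
from typing import Sequence


def presence_absence_r2(x: Sequence[int], y: Sequence[int]) -> float:
    """r^2 between two 0/1 presence/absence vectors (one entry per plasmid).

    Uses the classic two-locus formula r^2 = D^2 / (pA(1-pA) pB(1-pB)),
    treating "present" as one allele and "absent" as the other.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(x)
    p_a = sum(x) / n                                    # frequency of unitig A
    p_b = sum(y) / n                                    # frequency of unitig B
    p_ab = sum(1 for a, b in zip(x, y) if a and b) / n  # joint presence
    d = p_ab - p_a * p_b                                # linkage disequilibrium D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        # A column present in all or no plasmids carries no linkage signal.
        return 0.0
    return d * d / denom


# Usage: two unitigs that always co-occur across four plasmids.
print(presence_absence_r2([1, 1, 0, 0], [1, 1, 0, 0]))  # 1.0
```

With 400 plasmids and many unitigs, computing this for every pair grows quadratically, which is one reason dedicated tools are preferable to a naive loop.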

braddmg · 2025-01-28T03:31:25Z

I already have a presence/absence matrix from a pangenome analysis of the plasmids, annotated using KEGG (proteins and metabolic modules). I plan to run an additive Bayesian network analysis to investigate gene co-occurrence. However, I'm also interested in exploring the physical distance (in base pairs) between co-occurring genes.
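For the physical-distance question, one possible sketch, assuming gene coordinates can be extracted per plasmid from the annotations: compute the bp distance between two co-occurring genes in each plasmid that carries both, treating plasmids as circular so the shorter way around is used. The `Annotation` layout, plasmid IDs, and gene IDs below are hypothetical placeholders.

```python
from statistics import median
from typing import Dict, Tuple

# Hypothetical layout: plasmid ID -> (plasmid length in bp, {gene ID: gene midpoint in bp}).
# The coordinates would need to be parsed from each plasmid's own annotation file.
Annotation = Dict[str, Tuple[int, Dict[str, int]]]


def circular_distance(a: int, b: int, length: int) -> int:
    """Shortest distance in bp between two positions on a circular molecule."""
    d = abs(a - b) % length
    return min(d, length - d)


def pair_distances(annot: Annotation, gene1: str, gene2: str) -> Dict[str, int]:
    """Distance between gene1 and gene2 in every plasmid that carries both."""
    return {
        pid: circular_distance(genes[gene1], genes[gene2], length)
        for pid, (length, genes) in annot.items()
        if gene1 in genes and gene2 in genes
    }


annot: Annotation = {
    "p1": (200_000, {"geneA": 10_000, "geneB": 190_000}),
    "p2": (210_000, {"geneA": 50_000, "geneB": 60_000}),
    "p3": (205_000, {"geneB": 1_000}),  # geneA absent, so p3 is skipped
}
dists = pair_distances(annot, "geneA", "geneB")
print(dists)                   # {'p1': 20000, 'p2': 10000}
print(median(dists.values()))  # 15000.0
```

Note that in `p1` the genes are 180 kb apart going one way but only 20 kb the other way; ignoring circularity would overstate the distance.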

Sudaraka88 · 2025-01-31T04:36:51Z

I see. It completely depends on your question(s), but I think the pan-genome approach I mentioned above might be the most straightforward - it is probably worth a try...

Also, how diverse is the accessory genome? Instead of selecting a representative plasmid, you might be able to build a reference pseudo-plasmid from the pan-genome. If done reasonably well, you can run LDWeaver on it. But this can be tricky depending on the accessory diversity: if genes move around a lot, genomic distances measured on a pseudo-reference are unlikely to represent the true distances.
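One hedged way to check whether genes "move around a lot" before trusting a pseudo-reference: for a gene pair, compare the distance measured on the pseudo-reference with the distance observed in each plasmid carrying both genes, and report the fraction that agree within a tolerance. The input layout, `ref_distance`, and `tol_bp` are hypothetical, and the tolerance is an arbitrary choice rather than an established threshold.

```python
from typing import Dict, Tuple

# Hypothetical layout: plasmid ID -> (plasmid length in bp, {gene ID: gene midpoint in bp}).
Annotation = Dict[str, Tuple[int, Dict[str, int]]]


def circular_distance(a: int, b: int, length: int) -> int:
    """Shortest distance in bp between two positions on a circular molecule."""
    d = abs(a - b) % length
    return min(d, length - d)


def fraction_consistent(annot: Annotation, gene1: str, gene2: str,
                        ref_distance: int, tol_bp: int) -> float:
    """Fraction of plasmids carrying both genes whose observed distance lies
    within tol_bp of the distance measured on the pseudo-reference."""
    observed = [
        circular_distance(genes[gene1], genes[gene2], length)
        for length, genes in annot.values()
        if gene1 in genes and gene2 in genes
    ]
    if not observed:
        raise ValueError("no plasmid carries both genes")
    return sum(abs(d - ref_distance) <= tol_bp for d in observed) / len(observed)


annot: Annotation = {
    "p1": (200_000, {"geneA": 10_000, "geneB": 190_000}),  # 20 kb apart
    "p2": (210_000, {"geneA": 50_000, "geneB": 60_000}),   # 10 kb apart
    "p3": (205_000, {"geneA": 5_000, "geneB": 26_000}),    # 21 kb apart
}
print(fraction_consistent(annot, "geneA", "geneB", ref_distance=20_000, tol_bp=5_000))
# 0.6666666666666666
```

A low fraction for many gene pairs would suggest that distances on the pseudo-plasmid are not representative of the individual plasmids.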
