Hi Matt!
I'm using dRep v3.4.5.
I've been using the compare command to look at secondary genome clusters that all belong to the same primary cluster. I noticed that some genomes are being assigned to their own secondary cluster even though their ANI to other genomes is above my threshold. However, their alignment fraction with many other genomes is only 0.12-0.4 because they are SAGs and quite incomplete. I thought this was happening because of the minimum coverage threshold, but the default of 0.1 is very low, so those genomes should still be grouped mainly on their ANI. I tried rerunning with the coverage threshold explicitly set to 0.1, but got the same results.
Here is my code:
dRep compare -p 24 -g ~/Mendota_genomes/1_dRep/starting_genomes/*fna -sa 0.96 --cov_thresh 0.1 dRep
Here is a snippet of my output:
26Sep2015rr0052-bin.134.fna,SAG_2739367632.fna,0.978604,0.1875,1
27Jul2012rr0045-bin.200.fna,SAG_2739367632.fna,0.969617,0.25,1
MAGv2_3300020483-bin.4.fna,SAG_2739367632.fna,0.977835,0.19377162629757785,
It seems like 26Sep2015rr0052-bin.134.fna, SAG_2739367632, and MAGv2_3300020483-bin.4 should all be in the same cluster. Is it splitting them because of the way cluster-wide ANI is calculated (i.e., as an average), or for some other reason?
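To make the question concrete, here is a minimal sketch of what I mean by a cluster-wide average. It assumes average-linkage clustering on a distance of 1 - ANI, and it assumes that a pair failing the coverage threshold is recorded with an ANI of 0; I have not confirmed that dRep works this way. The genome `hypothetical_X` and its ANI of 0 are made up for illustration, while the other two ANI values come from my output above.

```python
# Sketch only: average linkage over (1 - ANI), with made-up data for one genome.
S_ANI = 0.96  # secondary ANI threshold passed via -sa


def average_linkage_distance(cluster_a, cluster_b, ani):
    """Mean of (1 - ANI) over every cross-cluster genome pair."""
    dists = [1.0 - ani[frozenset((a, b))] for a in cluster_a for b in cluster_b]
    return sum(dists) / len(dists)


ani = {
    frozenset(("SAG", "bin.134")): 0.978604,     # from my output
    frozenset(("SAG", "MAGv2")): 0.977835,       # from my output
    frozenset(("SAG", "hypothetical_X")): 0.0,   # assumed: failed coverage, ANI set to 0
}

# Against only the two genomes from my output, the SAG is within the threshold.
d_two = average_linkage_distance(["SAG"], ["bin.134", "MAGv2"], ani)
merges_with_two = d_two <= 1.0 - S_ANI

# Adding one genome with a zeroed ANI pulls the average far past the threshold.
d_three = average_linkage_distance(
    ["SAG"], ["bin.134", "MAGv2", "hypothetical_X"], ani
)
merges_with_three = d_three <= 1.0 - S_ANI

print(f"two members:   distance={d_two:.4f}, merges={merges_with_two}")
print(f"three members: distance={d_three:.4f}, merges={merges_with_three}")
```

If something like this is going on, a single low-coverage comparison within the cluster could be enough to push an incomplete SAG into its own secondary cluster, even when its other pairwise ANIs are above 0.96.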
Love all your tools and they are so accessible. Thank you for your efforts!