Failure to plot secondary clustering plots #202

Open
etd530 opened this issue Jul 24, 2023 · 6 comments

etd530 commented Jul 24, 2023

Dear developers,

When I run dRep with the -nc parameter set to 0.6, the program generates most of the output correctly, but the PDF files for the secondary clustering dendrograms and the MDS plot cannot be opened. The log file says:

07-24 14:36 INFO     Plotting secondary dendrograms
07-24 14:36 INFO     Failed to make plot #2: invalid literal for int() with base 10: '19.21'
07-24 14:36 INFO     Plotting MDS plot
07-24 14:36 INFO     Failed to make plot #3: invalid literal for int() with base 10: '19.21'
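A minimal Python sketch of how this error can arise. It assumes, without confirmation from the thread, that the plotting code splits a secondary cluster name such as 2_19 on the underscore and passes the suffix to int(). The function parse_secondary_cluster is hypothetical; the message it prints is the one Python's built-in int() raises for a string like '19.21'.

```python
from typing import Tuple


def parse_secondary_cluster(name: str) -> Tuple[int, int]:
    """Hypothetical parser: split "<primary>_<secondary>" into two integers."""
    primary, secondary = name.split("_")
    return int(primary), int(secondary)


print(parse_secondary_cluster("2_19"))  # (2, 19)

try:
    # A cluster name containing a period, like those seen with --run_tertiary_clustering
    parse_secondary_cluster("2_19.21")
except ValueError as err:
    print(err)  # invalid literal for int() with base 10: '19.21'
```

Under this assumption, any cluster name with a period in its suffix makes int() fail, which matches the log lines above.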

The exact command is:

dRep dereplicate dRep_comparison.s_ani_099.tertiary_clust.MAF_60/ -g *.fna --S_ani 0.99 --S_algorithm fastANI --run_tertiary_clustering -nc 0.6

Strangely, when I set the parameter to 0.5 (or leave it at the default value), all plots are correctly generated. Could this be a bug? Or am I missing something?

I am running dRep version 3.4.3 on Ubuntu 18.04.

Thanks in advance!

Eric

MrOlm (Owner) commented Jul 26, 2023

Hi Eric,

Interesting. I think this bug comes from an incompatibility between those plots and --run_tertiary_clustering. To be sure, would you mind uploading the Cdb.csv file from the run that failed to make the plots?

Best,
Matt

etd530 (Author) commented Jul 26, 2023

Hi Matt,
Sure, here it is:
Cdb.csv

Best,
Eric

MrOlm (Owner) commented Jul 26, 2023

OK, it is definitely the tertiary clustering that is causing this problem. I will fix it in the next dRep update.

If you'd like to make the plots now, one hack is to edit Cdb.csv and rename every secondary cluster whose name contains a period (e.g. 2_19.21) to remove the period (e.g. 2_1921).
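The rename hack can be scripted. A minimal Python sketch using the standard csv module, assuming Cdb.csv has a header row with a secondary_cluster column (check the column name in your own file). remove_periods is a hypothetical helper; here it runs on a made-up in-memory CSV, and for a real file you would open Cdb.csv for reading and write to a separate file, keeping the original as a backup.

```python
import csv
import io

# Assumption: the cluster names live in a column called "secondary_cluster".
CLUSTER_COLUMN = "secondary_cluster"


def remove_periods(src, dst, column=CLUSTER_COLUMN):
    """Copy CSV rows from src to dst, deleting '.' from the cluster column."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[column] = row[column].replace(".", "")  # "2_19.21" -> "2_1921"
        writer.writerow(row)


# Demo on an in-memory file; these rows are invented for illustration.
example = io.StringIO("genome,secondary_cluster\na.fna,2_19.21\nb.fna,3_1\n")
fixed = io.StringIO()
remove_periods(example, fixed)
print(fixed.getvalue())
```

Names without a period pass through unchanged, so the script can be applied to the whole column rather than only the affected rows.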

Thanks for bringing this to my attention!

MO

MrOlm added the bug label Jul 26, 2023

etd530 (Author) commented Jul 27, 2023

Thanks for the help, Matt! I would like to make the plots, but I don't understand how. If I edit Cdb.csv and then run:

dRep dereplicate dRep_comparison.s_ani_099.tertiary_clust.MAF_60/ --S_ani 0.99 --S_algorithm fastANI --run_tertiary_clustering -nc 0.6

It overwrites the file again and the periods are back. Am I missing something? Is there an option to only make the plots?

Thanks!

Eric

MrOlm (Owner) commented Jul 27, 2023

My apologies, Eric! In previous versions of dRep it was possible to make the plots from an already-completed run. I forgot that I removed that functionality a few updates ago.

For now, the only way to make these plots is to drop the --run_tertiary_clustering flag. To get exactly the same result, run dRep twice with the same arguments: the first time on the genomes you're using now, and the second time on the output of the first run (the dereplicated_genomes folder). That second run is all --run_tertiary_clustering does, so running dRep twice this way is equivalent.
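The two-pass workaround can be driven from Python. A minimal sketch, assuming the genome files use the .fna extension (as in the original command) and that the dereplicated_genomes folder keeps those file names. The output directory names drep_pass1 and drep_pass2 and the helper functions are hypothetical; only flags that appear in this thread are used, with --run_tertiary_clustering dropped.

```python
import glob
import os
import subprocess

# Options from the original command, minus --run_tertiary_clustering.
COMMON_OPTS = ["--S_ani", "0.99", "--S_algorithm", "fastANI", "-nc", "0.6"]


def drep_cmd(out_dir, genomes):
    """Build a `dRep dereplicate` argument list for the given genome files."""
    return ["dRep", "dereplicate", out_dir, "-g", *genomes, *COMMON_OPTS]


def second_pass_genomes(first_out):
    """Genomes kept by the first pass (assumes the .fna extension is preserved)."""
    return sorted(glob.glob(os.path.join(first_out, "dereplicated_genomes", "*.fna")))


def run_two_pass(genomes, first_out="drep_pass1", second_out="drep_pass2"):
    """Run pass 1 on all genomes, then pass 2 on pass 1's dereplicated genomes."""
    subprocess.run(drep_cmd(first_out, genomes), check=True)
    # Pass 2's genome list is built only now, after pass 1 has written its output.
    subprocess.run(drep_cmd(second_out, second_pass_genomes(first_out)), check=True)


# Show the first-pass command without running it (requires dRep on PATH to run).
print(" ".join(drep_cmd("drep_pass1", ["a.fna", "b.fna"])))
```

Building the second command only after the first run finishes matters: globbing dereplicated_genomes before pass 1 completes would return an empty list.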

Sorry about the hacky solution while I work on a real update.

-MO

etd530 (Author) commented Jul 28, 2023

Thanks, Matt! Don't worry, the hacky solution is fine. The Cluster_scoring.pdf plots also work, so I can check the clusters there too.
Eric
