compute_3d_conformer takes a long time to run - Would it be reasonable to bypass it? #171

Open
wehs7661 opened this issue Jan 31, 2025 · 3 comments

wehs7661 commented Jan 31, 2025

Thanks to the team for making this amazing tool publicly available!

While testing Boltz-1, I noticed that in some cases the "processing input data" step can take an unreasonably long time (>30 minutes). One example is protein 1c5f, for which I ran a prediction with the following input:

>C|protein
MSKKDRRRVFLDVTIDGNLAGRIVMELYNDIAPRTCNNFLMLCTGMAGTGKISGKPLHYKGSTFHRVIKNFMIQGGDFTKGDGTGGESIYGGMFDDEEFVMKHDEPFVVSMANKGPNTNGSQFFITTTPAPHLNNIHVVFGKVVSGQEVVTKIEYLKTNSKNRPLADVVILNCGELV
>E|protein
MSKKDRRRVFLDVTIDGNLAGRIVMELYNDIAPRTCNNFLMLCTGMAGTGKISGKPLHYKGSTFHRVIKNFMIQGGDFTKGDGTGGESIYGGMFDDEEFVMKHDEPFVVSMANKGPNTNGSQFFITTTPAPHLNNIHVVFGKVVSGQEVVTKIEYLKTNSKNRPLADVVILNCGELV
>F|smiles
C/C=C/C[C@@H](C)[C@@H](O)[C@H]1C(=O)N[C@@H](CC)C(=O)N(C)CC(=O)N(C)[C@@H](CC(C)C)C(=O)N[C@@H](C(C)C)C(=O)N(C)[C@@H](CC(C)C)C(=O)N[C@@H](C)C(=O)N[C@H](C)C(=O)N(C)[C@@H](CC(C)C)C(=O)N(C)[C@@H](CC(C)C)C(=O)N(C)[C@@H](C(C)C)C(=O)N1C

I terminated the prediction process when, after 30 minutes, I received WARNING: RDKit ETKDGv3 failed to generate a conformer for molecule ... I assume the process would have failed even if I had not terminated it, as happened in other cases that showed the same warning after a long wait.

I believe the issue is that EmbedMolecule, called in compute_3d_conformer in schema.py, can take a very long time, presumably for relatively large ligands like the one in my case (I have briefly reported this here in the RDKit repo). Given this, I have two questions:

  • Is enabling SDF inputs part of the future plan for Boltz-1? In RoseTTAFold All-Atom, Open Babel is used to process the input SDF file (in parse_mol, see here), and that seems to be a nice approach as well. In my case, the RFAA prediction for 1c5f completed successfully in 13 minutes. Although Open Babel can have its own conformer generation issues for some other ligands, the process fails quickly in those cases, which is preferable in many scenarios.
  • I wonder if compute_3d_conformer is really necessary in parse_boltz_schema. It looks to me that in parse_boltz_schema, compute_3d_conformer is just a check, as it only returns a boolean indicating whether a conformer could be generated using EmbedMolecule. However, I don't see EmbedMolecule or compute_3d_conformer used anywhere else in the pipeline, so I wonder if this check is even necessary. Please let me know if I am missing something.
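
The fail-fast behaviour mentioned in the first question can be approximated from outside the library. The following is a minimal sketch, not a mechanism provided by Boltz-1 or RDKit: it runs a Python snippet in a child interpreter and gives up once a wall-clock budget is exceeded. `run_with_timeout` is a hypothetical helper name, and the snippet passed to it stands in for whatever conformer-generation call needs to be bounded.

```python
import subprocess
import sys


def run_with_timeout(code: str, timeout_s: float) -> tuple[bool, str]:
    """Run `code` in a child Python interpreter with a wall-clock budget.

    Returns (True, stdout) if the child exits cleanly within the budget,
    otherwise (False, "").
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed once this expires
        )
    except subprocess.TimeoutExpired:
        return False, ""
    if result.returncode != 0:
        # The snippet raised or exited with an error status.
        return False, ""
    return True, result.stdout
```

A separate process is used here because a long-running call inside a compiled extension generally cannot be interrupted from another Python thread, whereas a child process can simply be killed.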

Thanks a lot for your help in advance!

benf549 commented Jan 31, 2025

I haven't tried this myself at all, but I wrote this notebook to inject arbitrary covalently modified residues into the Boltz CCD, and you might be able to modify it to inject a precomputed 3D conformer into the CCD instead, which would avoid the need to run the compute_3d_conformer function. Might be a temporary fix for you? +1 to getting an SDF input mode (keeping atom orders consistent would be nice for pipelining).

https://github.com/benf549/boltz-generalized-covalent-modification
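
To make the point about precomputed coordinates and atom order concrete, here is a minimal plain-Python sketch that reads the atom block of a V2000 molfile (the record format used inside SDF files). It is not how Boltz-1 or the notebook above reads inputs; `read_v2000_atoms` and the hand-written water example are hypothetical, and the parser assumes the standard fixed-width V2000 columns. In practice a toolkit such as RDKit or Open Babel would do this parsing.

```python
def read_v2000_atoms(molblock: str) -> list[tuple[str, float, float, float]]:
    """Return (element, x, y, z) for each atom, in file order."""
    lines = molblock.splitlines()
    counts = lines[3]            # 4th line is the counts line
    n_atoms = int(counts[0:3])   # first 3 columns: number of atoms
    atoms = []
    for line in lines[4:4 + n_atoms]:
        x = float(line[0:10])    # coordinates are 10-character fields
        y = float(line[10:20])
        z = float(line[20:30])
        element = line[31:34].strip()
        atoms.append((element, x, y, z))
    return atoms


# Hand-written example record (water); coordinates are illustrative only.
EXAMPLE = "\n".join([
    "water",
    "  hand-written example",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.1173 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.0000    0.7572   -0.4692 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.0000   -0.7572   -0.4692 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "  1  3  1  0",
    "M  END",
])

atoms = read_v2000_atoms(EXAMPLE)
```

Because the atoms come back in file order with their coordinates attached, a pipeline that consumes such a record needs neither a conformer search nor a re-derivation of atom indices.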

wehs7661 commented Feb 9, 2025

Hi @benf549, thanks so much for sharing your code (and apologies for the delayed reply)! I'll take a look at it and post any updates I have.

wehs7661 commented Feb 11, 2025

@benf549 in my case I would like to avoid using EmbedMolecule as it could take a long time for relatively large ligands, but I see your point that cell 8 in your notebook can be modified to handle SDF files, so thanks again!

I noticed that there is actually a PR trying to enable handling SDF files (#40) but it has not been merged. I might play around with it but I wonder if @jwohlwend or the team already have plans to enable SDF inputs.

Regarding my original question about whether compute_3d_conformer is necessary, I missed that mol was actually modified by the function so the original question does not make sense anymore (and I have added a strikethrough). Still, it would be great to have SDF files supported.
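
The pattern that caused the confusion, a function that returns only a boolean while modifying its argument in place, can be illustrated with a small stand-alone sketch. `Molecule` and `try_add_conformer` are hypothetical stand-ins, not the RDKit or Boltz-1 code, and the coordinates are placeholders.

```python
class Molecule:
    """Stand-in for a molecule object; not the RDKit or Boltz-1 class."""

    def __init__(self, n_atoms: int) -> None:
        self.n_atoms = n_atoms
        self.conformers: list[list[tuple[float, float, float]]] = []


def try_add_conformer(mol: Molecule) -> bool:
    """Attach placeholder coordinates to `mol` and report success."""
    if mol.n_atoms == 0:
        return False
    # The side effect: `mol` itself now carries the coordinates.
    mol.conformers.append([(float(i), 0.0, 0.0) for i in range(mol.n_atoms)])
    return True


mol = Molecule(n_atoms=3)
succeeded = try_add_conformer(mol)
# The boolean is only a status flag; the coordinates live on `mol`.
print(succeeded, len(mol.conformers))
```

Reading only the return value makes such a call look like a pure check, which is why removing it would also remove the coordinates that later steps rely on.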

About supporting SDF files, here is another relevant issue: #34.

Thanks so much!
