`compute_3d_conformer` takes a long time to run - Would it be reasonable to bypass it? #171

wehs7661 · 2025-01-31T16:39:04Z

Thanks to the team for making this amazing tool publicly available!

As I tested Boltz-1, I noticed that for some cases, the step of "processing input data" could take an unreasonably long time (>30 minutes). One example is protein 1c5f, for which I performed a prediction task with the following input:

>C|protein
MSKKDRRRVFLDVTIDGNLAGRIVMELYNDIAPRTCNNFLMLCTGMAGTGKISGKPLHYKGSTFHRVIKNFMIQGGDFTKGDGTGGESIYGGMFDDEEFVMKHDEPFVVSMANKGPNTNGSQFFITTTPAPHLNNIHVVFGKVVSGQEVVTKIEYLKTNSKNRPLADVVILNCGELV
>E|protein
MSKKDRRRVFLDVTIDGNLAGRIVMELYNDIAPRTCNNFLMLCTGMAGTGKISGKPLHYKGSTFHRVIKNFMIQGGDFTKGDGTGGESIYGGMFDDEEFVMKHDEPFVVSMANKGPNTNGSQFFITTTPAPHLNNIHVVFGKVVSGQEVVTKIEYLKTNSKNRPLADVVILNCGELV
>F|smiles
C/C=C/C[C@@H](C)[C@@H](O)[C@H]1C(=O)N[C@@H](CC)C(=O)N(C)CC(=O)N(C)[C@@H](CC(C)C)C(=O)N[C@@H](C(C)C)C(=O)N(C)[C@@H](CC(C)C)C(=O)N[C@@H](C)C(=O)N[C@H](C)C(=O)N(C)[C@@H](CC(C)C)C(=O)N(C)[C@@H](CC(C)C)C(=O)N(C)[C@@H](C(C)C)C(=O)N1C

I terminated the prediction after 30 minutes, when I received `WARNING: RDKit ETKDGv3 failed to generate a conformer for molecule ...`. I assume the run would have failed anyway, as it did in other cases where the same warning appeared after a long wait.

I believe the issue is that `EmbedMolecule`, called in `compute_3d_conformer` in schema.py, can take a very long time, presumably for relatively large ligands like the one in my case (which I have briefly reported here in the RDKit repo). Given this, I have two questions:

- Are there plans for Boltz-1 to support SDF inputs? In RoseTTAFold All-Atom, Open Babel is used to process the input SDF file (in parse_mol, see here), and that seems to be a nice approach as well. In my case, the RFAA prediction for 1c5f completed successfully in 13 minutes. Open Babel can have its own conformer generation issues for some other ligands, but in those cases the process fails quickly, which is preferable in many scenarios.
- ~~I wonder if `compute_3d_conformer` is really necessary in `parse_boltz_schema`. It looks to me like `compute_3d_conformer` is only a check there, since it just returns a boolean indicating whether a conformer could be generated using `EmbedMolecule`. I don't see `EmbedMolecule` or `compute_3d_conformer` used anywhere else in the pipeline, so I wonder if this check is even necessary. Please let me know if I am missing something.~~
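To illustrate the fail-fast behavior mentioned in the first question, here is a minimal sketch that runs the embedding in a separate Python process with a hard time limit. It is not how Boltz-1 works. `EMBED_SCRIPT` and `run_with_timeout` are hypothetical names made up for this example, and the child script assumes RDKit is installed in the same interpreter.

```python
import subprocess
import sys

# Hypothetical child script (not part of Boltz-1): reads a SMILES string from
# argv, tries one ETKDGv3 embedding, and prints a MolBlock on success.
# Assumes RDKit is importable in the child interpreter.
EMBED_SCRIPT = """
import sys
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.AddHs(Chem.MolFromSmiles(sys.argv[1]))
params = AllChem.ETKDGv3()
params.randomSeed = 42
if AllChem.EmbedMolecule(mol, params) == -1:  # -1 means embedding failed
    sys.exit(1)
print(Chem.MolToMolBlock(mol))
"""


def run_with_timeout(script, arg, timeout_s):
    """Run `script` in a fresh Python process with `arg` as sys.argv[1].

    Returns the child's stdout on success, or None if the child exceeds
    `timeout_s` seconds or exits with a non-zero status.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", script, arg],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return None
    return proc.stdout if proc.returncode == 0 else None
```

Calling `run_with_timeout(EMBED_SCRIPT, smiles, 60.0)` would then give either a MolBlock string or `None` within roughly a minute, instead of blocking for 30 minutes or more. A separate process is used because a call that is stuck inside native code cannot be reliably interrupted from a Python thread.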

Thanks a lot for your help in advance!


benf549 · 2025-01-31T17:52:46Z

I haven't tried this myself, but you might be able to adapt a notebook I wrote. It injects arbitrary covalently modified residues into the Boltz CCD, and it could probably be modified to inject a precomputed 3D conformer into the CCD instead, which would avoid running the `compute_3d_conformer` function. Might be a temporary fix for you? +1 to getting an SDF input mode (keeping atom orders consistent would be nice for pipelining).

https://github.com/benf549/boltz-generalized-covalent-modification

wehs7661 · 2025-02-09T00:49:52Z

Hi @benf549, thanks so much for sharing your code (and apologies for the delayed reply)! I'll take a look at it and post any updates I have.

wehs7661 · 2025-02-11T12:41:09Z

@benf549 in my case I would like to avoid using EmbedMolecule as it could take a long time for relatively large ligands, but I see your point that cell 8 in your notebook can be modified to handle SDF files, so thanks again!

I noticed that there is actually a PR trying to enable handling SDF files (#40) but it has not been merged. I might play around with it but I wonder if @jwohlwend or the team already have plans to enable SDF inputs.

Regarding my original question about whether compute_3d_conformer is necessary, I missed that mol was actually modified by the function so the original question does not make sense anymore (and I have added a strikethrough). Still, it would be great to have SDF files supported.

About supporting SDF files, here is another relevant issue: #34.

Thanks so much!
