Thanks to the team for making this amazing tool publicly available!
While testing Boltz-1, I noticed that in some cases the "processing input data" step can take an unreasonably long time (>30 minutes). One example is protein 1c5f, for which I performed a prediction task with the following input:
I terminated the prediction process after 30 minutes, upon receiving WARNING: RDKit ETKDGv3 failed to generate a conformer for molecule ... I assume the process would have failed even if I had not terminated it, as it did in other cases that showed the same warning after a long wait.
I believe the issue comes from the fact that EmbedMolecule in compute_3d_conformer in schema.py can take a very long time, presumably for relatively large ligands like the one in my case (which I have briefly reported here in the RDKit repo). Given this, I have two questions:
- Is it within the future plan of Boltz-1 to enable SDF inputs? In RoseTTAFold All-Atom, Open Babel is used to process the input SDF file (in parse_mol, see here), and that seems to be a nice approach as well. In my case, the RFAA prediction for 1c5f completed successfully in 13 minutes. Although Open Babel could also have its own conformer generation issues for some other ligands, the process fails pretty quickly in those cases, which is preferable in many scenarios.
- I wonder if compute_3d_conformer is really necessary in parse_boltz_schema. It looks to me that in parse_boltz_schema, compute_3d_conformer is just a check, as it only returns a boolean indicating whether a conformer could be generated using EmbedMolecule. However, I don't see EmbedMolecule or compute_3d_conformer used anywhere else in the pipeline, so I wonder if this check is even necessary. Please let me know if I am missing something.
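To illustrate the "fail quickly" behaviour mentioned above, here is a minimal sketch (not Boltz code) that runs the RDKit embedding in a separate Python process and gives up after a time limit. A child process is used because a long-running call inside RDKit's C++ code is not interrupted by a Python-level signal handler. The names EMBED_SCRIPT and run_with_timeout, the fixed random seed, and the idea of passing a SMILES string on the command line are all assumptions made for this example, and it assumes RDKit is installed in the same interpreter.

```python
import subprocess
import sys
from typing import Optional

# Script executed in the child process (hypothetical helper, not part of Boltz).
# It embeds one molecule with ETKDGv3 and prints the result as a mol block.
EMBED_SCRIPT = """
import sys
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.AddHs(Chem.MolFromSmiles(sys.argv[1]))
params = AllChem.ETKDGv3()
params.randomSeed = 42  # arbitrary seed, only for reproducibility
conf_id = AllChem.EmbedMolecule(mol, params)  # returns -1 if embedding fails
if conf_id == -1:
    sys.exit(1)
print(Chem.MolToMolBlock(mol))
"""


def run_with_timeout(script: str, arg: str, timeout_s: float) -> Optional[str]:
    """Run `script` in a fresh interpreter with one argument.

    Returns the child's stdout, or None if it exits with a non-zero
    status or does not finish within `timeout_s` seconds.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", script, arg],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return None
    return result.stdout if result.returncode == 0 else None
```

Something like `run_with_timeout(EMBED_SCRIPT, smiles, 60.0)` would then return either a mol block or None within about a minute, so the caller can decide how to proceed instead of waiting indefinitely.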
Thanks a lot for your help in advance!
I haven't tried this myself at all, but you might be able to modify this notebook, which I wrote to inject arbitrary covalently modified residues into the Boltz CCD, so that it injects a precomputed 3D conformer into the CCD and avoids the need to run the compute_3d_conformer function. Might be a temporary fix for you? +1 to getting an SDF input mode (keeping atom orders consistent would be nice for pipelining).
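As a rough illustration of why an SDF input would help here: each record already carries 3D coordinates in a fixed atom order, so no embedding step is needed to recover them. The sketch below is a standard-library-only reader for the atom block of a single V2000 mol block. It is not taken from Boltz or from the notebook, the function name parse_molblock_atoms is made up for this example, and in practice a toolkit reader (RDKit or Open Babel) would be the more robust choice.

```python
from typing import List, Tuple

Atom = Tuple[str, float, float, float]


def parse_molblock_atoms(molblock: str) -> List[Atom]:
    """Return (element, x, y, z) for each atom of a V2000 mol block.

    Atoms are returned in file order, so indices stay consistent with
    the input. Only the fixed-width V2000 layout is handled.
    """
    lines = molblock.splitlines()
    # Three header lines come first; the fourth line is the counts line.
    counts = lines[3]
    if "V2000" not in counts:
        raise ValueError("only V2000 mol blocks are handled in this sketch")
    n_atoms = int(counts[0:3])

    atoms: List[Atom] = []
    for line in lines[4:4 + n_atoms]:
        # x, y, z occupy three 10-character columns; the element symbol
        # follows in columns 32-34 (1-based).
        x = float(line[0:10])
        y = float(line[10:20])
        z = float(line[20:30])
        symbol = line[31:34].strip()
        atoms.append((symbol, x, y, z))
    return atoms
```

The returned list could then be matched against the atom names of a CCD-style entry, with the caveat that hydrogens and atom naming would still need to be handled separately.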
@benf549 in my case I would like to avoid using EmbedMolecule as it could take a long time for relatively large ligands, but I see your point that cell 8 in your notebook can be modified to handle SDF files, so thanks again!
I noticed that there is actually a PR trying to enable handling SDF files (#40) but it has not been merged. I might play around with it but I wonder if @jwohlwend or the team already have plans to enable SDF inputs.
Regarding my original question about whether compute_3d_conformer is necessary: I missed that mol is actually modified by the function, so the question no longer makes sense (and I have added a strikethrough). Still, it would be great to have SDF files supported.
About supporting SDF files, here is another relevant issue: #34 .