Fixing does not appear to work for inferring valence & formal charge states from molecules from some PDB files #231
Hey @Croydon-Brixton, thanks for using datamol. Your specific question is a bit tricky. There are 2 reasons why the "fixing" fails on your molecule:
In other words, the issue comes from RDKit's aromaticity perception and from kekulization failing.
Therefore, the algorithm skips over the atom because everything looks fine. This is a direct consequence of each "aromatic" bond being perceived as a single connection at this point. A naive fix for your specific case would be:
I doubt, however, that this solves the main issue. If you can share your goal and how you load the original structure, I am sure there are better and more systematic approaches (perhaps including not using RDKit here) that I can point you towards.
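A minimal sketch of that kind of targeted fix, written against RDKit directly rather than datamol, follows. It asks `Chem.DetectChemistryProblems` which atoms RDKit flags with a valence error. It then assumes that any flagged neutral nitrogen is missing a +1 charge, as with a pyridinium nitrogen. The function name `charge_overvalent_nitrogens` and the N-methylpyridinium test input are illustrative assumptions. The heuristic deliberately ignores every other charge pattern, such as anionic oxygens.

```python
from rdkit import Chem


def charge_overvalent_nitrogens(mol):
    """Hypothetical helper: give +1 to each neutral N that RDKit flags as over-valent.

    Returns a sanitized copy of `mol`, or None if chemistry problems remain.
    """
    fixed = Chem.RWMol(mol)  # work on a copy; atom indices are preserved
    for problem in Chem.DetectChemistryProblems(fixed):
        if problem.GetType() != "AtomValenceException":
            continue
        atom = fixed.GetAtomWithIdx(problem.GetAtomIdx())
        # Assumption: excess valence on a neutral N means a missing +1 charge
        # (e.g. a four-bonded pyridinium/ammonium nitrogen).
        if atom.GetSymbol() == "N" and atom.GetFormalCharge() == 0:
            atom.SetFormalCharge(1)
    # Re-check; anything the heuristic could not repair is reported as None.
    if len(Chem.DetectChemistryProblems(fixed)) > 0:
        return None
    Chem.SanitizeMol(fixed)
    return fixed.GetMol()


# Illustrative input: N-methylpyridinium in Kekulé form with the N left neutral,
# loaded without sanitization so RDKit does not reject it up front.
raw = Chem.MolFromSmiles("CN1=CC=CC=C1", sanitize=False)
result = charge_overvalent_nitrogens(raw)
print(Chem.MolToSmiles(result) if result is not None else "could not fix")
```

The sketch lets RDKit decide which atoms are invalid rather than re-implementing valence rules. It returns None for anything it cannot repair, so a batch run over many ligands can log and skip those cases.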
Thank you for the quick and detailed answer, @maclandrol! Yes, I'm looking for a solution to this general problem:
The reason I am asking is that, when retrieving ligands from the PDB, the most reliable pieces of information are the bonded structure of the heavy atoms and the hybridization (which translates into the single/double/... flags), but the formal charge is often entirely unspecified. I would like a way to turn these molecules into valid ones while preserving this information. Does that make sense? I need to do this programmatically, as it will apply to many structures. I had a look at ChEMBL's pipeline, but it could not handle the example I gave either. Thank you for your input!
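Heavy-atom connectivity and bond orders are the reliable part of a PDB ligand. One hedged way to work programmatically is to build the RDKit molecule directly from that graph and infer charges with simple valence arithmetic before sanitizing. The sketch below assumes the ligand is already available as a list of element symbols plus `(i, j, order)` bond tuples, with no hydrogens. The names `mol_from_graph` and `BOND_TYPES` are hypothetical. The single rule it implements, that a nitrogen whose bond orders sum to 4 gets a +1 charge, is an illustrative assumption and not a complete charge model.

```python
from rdkit import Chem

# Assumed mapping from integer bond orders (as read from the ligand record)
# to RDKit bond types.
BOND_TYPES = {1: Chem.BondType.SINGLE, 2: Chem.BondType.DOUBLE, 3: Chem.BondType.TRIPLE}


def mol_from_graph(elements, bonds):
    """Hypothetical helper: build a molecule from heavy atoms and (i, j, order) bonds.

    All formal charges start at 0; hydrogens are added implicitly on sanitization.
    """
    rw = Chem.RWMol()
    for symbol in elements:
        rw.AddAtom(Chem.Atom(symbol))
    for i, j, order in bonds:
        rw.AddBond(i, j, BOND_TYPES[order])
    for atom in rw.GetAtoms():
        bond_order_sum = sum(b.GetBondTypeAsDouble() for b in atom.GetBonds())
        # Heuristic: four bonds' worth of order on nitrogen implies N+.
        if atom.GetSymbol() == "N" and bond_order_sum == 4:
            atom.SetFormalCharge(1)
    mol = rw.GetMol()
    Chem.SanitizeMol(mol)  # raises if the inferred charges still give an invalid structure
    return mol


# Illustrative input: N-methylpyridinium in Kekulé form, nitrogen charge unknown.
elements = ["C", "N", "C", "C", "C", "C", "C"]
bonds = [(0, 1, 1), (1, 2, 2), (2, 3, 1), (3, 4, 2), (4, 5, 1), (5, 6, 2), (6, 1, 1)]
print(Chem.MolToSmiles(mol_from_graph(elements, bonds)))
```

Because the bond orders come straight from the source record, nothing here depends on RDKit's aromaticity perception until the final sanitization step.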
Yeah, the SMARTS patterns they have do not cover your case:
If you are loading from PDB and PDB only, then you should consider this:
Alternatively,
This would probably be useful for the community, so I can definitely help implement it.
Thank you for this nice library!
I have a question about fixing 'broken' Mols by inferring the correct valences and charges, which I was hoping datamol could fix for me. If I load NAP structures from examples in the PDB (e.g. 5ocm) and simply transfer over the bond annotations and atoms (formal charge is not specified in this PDB, so I'm assuming a charge of 0), I end up with a structure like this:
RDKit then fails to load this due to sanitization problems.
This molecule can be 'rescued' by assigning a positive charge to nitrogen number 4, but the datamol pipeline unfortunately fails to do this:
Is there a way to fix this structure computationally with datamol?