Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New molecules returned by enumerator components do not handle molecule properties correctly #98

Open
trevorgokey opened this issue Feb 4, 2021 · 2 comments

Comments

@trevorgokey
Copy link
Contributor

I tried to enumerate protomers and stereoisomers for torsion driving, which returns some new molecules from the toolkit. The issue is that the TK returns e.g. mol.properties['dihedrals'] as a string in the new molecules, which is a serialized TorsionIndexer object in the original molecule. I have a PR here that has a quick fix so I was able to get a dataset submitted in qca-dataset-submissions. A general method which repacks data stored in properties would be ideal, as I know there are other fields ("extras", "keywords"), and possibly more in the future. Feel free to use the PR as you want to devise a solution.

@jthorton
Copy link
Contributor

Thanks for having a go at this. I think your solution will work well as long as the enumerated molecules returned from the toolkit are in the same order, one concern would be what happens when the ordering is changed the torsion indexer could then be on an incorrect torsion, if the torsion was not connected the validators would pick this up when making the dataset but we might need to think more about how we would get around this. I think this might come up when enumerating tautomers but I should check this.

@jthorton
Copy link
Contributor

jthorton commented Apr 28, 2021

I think we could solve this by using atom maps, these would tag the atoms in the dihedral we want to rotate and should be respected by the backend toolkits and allow use to extract the new index numbers of the dihedral atoms. Here is an example using rdkit and openeye, note that the openff-toolkit=0.9.2 does not automatically pass on the atom maps yet and this would need to be fixed.

rdkit

from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

mol = Molecule.from_smiles("CCC(C)=O")
# select the 4 carbon atoms 
atom_map = {0: 1, 1: 2, 2: 3, 3: 4}
rdmol = mol.to_rdkit()
# add the atom map
for atom, mapnum in atom_map.items():
    rdatom = rdmol.GetAtomWithIdx(atom)
    rdatom.SetAtomMapNum(mapnum)

enumerator = rdMolStandardize.TautomerEnumerator()
enumerator.SetMaxTautomers(10)
rdmol = Chem.RemoveHs(rdmol)

tautomers = enumerator.Enumerate(rdmol)

# make a list of OpenFF molecules
molecules = []
for taut in tautomers:
    taut_hs = Chem.AddHs(taut)
    mol = Molecule.from_smiles(
        Chem.MolToSmiles(taut_hs), allow_undefined_stereo=True
    )
    molecules.append(mol)
# get the index of the dihedral atoms
molecules[0].properties

{'atom_map': {1: 1, 4: 2, 7: 3, 9: 4}}

openeye

oemol = mol.to_openeye()
# add the atom map
for oeatom in oemol.GetAtoms():
    oe_index = oeatom.GetIdx()
    if oe_index in atom_map:
        oeatom.SetMapIdx(atom_map[oe_index])

tautomer_options = oequacpac.OETautomerOptions()
tautomer_options.SetApplyWarts(False)
tautomer_options.SetMaxTautomersGenerated(10)
tautomer_options.SetSaveStereo(True)
# this aligns the outputs of rdkit and openeye for the example cases
tautomer_options.SetCarbonHybridization(False)

tautomers = []
for tautomer in oequacpac.OEEnumerateTautomers(oemol, tautomer_options):
    tautomers.append(Molecule.from_openeye(tautomer, allow_undefined_stereo=True)

# look at the map ids
tautomers[0].properties

{'atom_map': {0: 1, 1: 2, 2: 3, 3: 4}}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants