New molecules returned by enumerator components do not handle molecule properties correctly #98

trevorgokey · 2021-02-04T03:51:50Z

I tried to enumerate protomers and stereoisomers for torsion driving, which returns some new molecules from the toolkit. The issue is that the TK returns e.g. mol.properties['dihedrals'] as a string in the new molecules, which is a serialized TorsionIndexer object in the original molecule. I have a PR here that has a quick fix so I was able to get a dataset submitted in qca-dataset-submissions. A general method which repacks data stored in properties would be ideal, as I know there are other fields ("extras", "keywords"), and possibly more in the future. Feel free to use the PR as you want to devise a solution.

The text was updated successfully, but these errors were encountered:

jthorton · 2021-03-19T19:16:23Z

Thanks for having a go at this. I think your solution will work well as long as the enumerated molecules returned from the toolkit are in the same order, one concern would be what happens when the ordering is changed the torsion indexer could then be on an incorrect torsion, if the torsion was not connected the validators would pick this up when making the dataset but we might need to think more about how we would get around this. I think this might come up when enumerating tautomers but I should check this.

jthorton · 2021-04-28T10:45:35Z

I think we could solve this by using atom maps, these would tag the atoms in the dihedral we want to rotate and should be respected by the backend toolkits and allow use to extract the new index numbers of the dihedral atoms. Here is an example using rdkit and openeye, note that the openff-toolkit=0.9.2 does not automatically pass on the atom maps yet and this would need to be fixed.

rdkit

from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

mol = Molecule.from_smiles("CCC(C)=O")
# select the 4 carbon atoms 
atom_map = {0: 1, 1: 2, 2: 3, 3: 4}
rdmol = mol.to_rdkit()
# add the atom map
for atom, mapnum in atom_map.items():
    rdatom = rdmol.GetAtomWithIdx(atom)
    rdatom.SetAtomMapNum(mapnum)

enumerator = rdMolStandardize.TautomerEnumerator()
enumerator.SetMaxTautomers(10)
rdmol = Chem.RemoveHs(rdmol)

tautomers = enumerator.Enumerate(rdmol)

# make a list of OpenFF molecules
molecules = []
for taut in tautomers:
    taut_hs = Chem.AddHs(taut)
    mol = Molecule.from_smiles(
        Chem.MolToSmiles(taut_hs), allow_undefined_stereo=True
    )
    molecules.append(mol)
# get the index of the dihedral atoms
molecules[0].properties

{'atom_map': {1: 1, 4: 2, 7: 3, 9: 4}}

openeye

oemol = mol.to_openeye()
# add the atom map
for oeatom in oemol.GetAtoms():
    oe_index = oeatom.GetIdx()
    if oe_index in atom_map:
        oeatom.SetMapIdx(atom_map[oe_index])

tautomer_options = oequacpac.OETautomerOptions()
tautomer_options.SetApplyWarts(False)
tautomer_options.SetMaxTautomersGenerated(10)
tautomer_options.SetSaveStereo(True)
# this aligns the outputs of rdkit and openeye for the example cases
tautomer_options.SetCarbonHybridization(False)

tautomers = []
for tautomer in oequacpac.OEEnumerateTautomers(oemol, tautomer_options):
    tautomers.append(Molecule.from_openeye(tautomer, allow_undefined_stereo=True)

# look at the map ids
tautomers[0].properties

{'atom_map': {0: 1, 1: 2, 2: 3, 3: 4}}

j-wags mentioned this issue May 4, 2021

Problem when input mol has atom map #131

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New molecules returned by enumerator components do not handle molecule properties correctly #98

New molecules returned by enumerator components do not handle molecule properties correctly #98

trevorgokey commented Feb 4, 2021

jthorton commented Mar 19, 2021

jthorton commented Apr 28, 2021 •

edited

Loading

New molecules returned by enumerator components do not handle molecule properties correctly #98

New molecules returned by enumerator components do not handle molecule properties correctly #98

Comments

trevorgokey commented Feb 4, 2021

jthorton commented Mar 19, 2021

jthorton commented Apr 28, 2021 • edited Loading

jthorton commented Apr 28, 2021 •

edited

Loading