Work done by each step
Greg Landrum edited this page Apr 17, 2019 · 14 revisions
- Clears S Group data from the mol file
- Kekulise structures
- Identifies and fixes bad valence (where possible)
- Fix KO to K+ O- and NaO to Na+ O- (Also add Li+ to this)
- Standardise NO2 groups to N+[O-]
- Change NH+ Cl- to N and HCl
- Remove stereo from tartrate to simplify salt matching
- Fix wiggly bonds on sp3 carbons - sets atoms and bonds marked as unknown stereo to no stereo
- Remove explicit hydrogens from molecules (excepting certain atom types)
- Normalise (straighten) triple bonds and allenes
- Standardise sulphoxides to the charge-separated form (taking care that this applies only to sulphoxides and not to sulphonamides or sulphones)
- “Fix wiggly bonds” on double bonds – set to crossed bond
- If the formal charge on the molecule is not zero, protonate and/or deprotonate to neutralise the molecule (accepting that this might not be possible for some compounds with quaternary nitrogens and multiple carboxyl, sulphate or phosphate groups).
- Correct amides with N=COH
- Correct bond angles where possible
- Handle unknown stereochemistry. Handled by the Mol file parser.
- Fix wiggly bonds on sp3 carbons - sets atoms and bonds marked as unknown stereo to no stereo
- Fix wiggly bonds on double bonds – set double bond to crossed bond
- Clears S Group data from the mol file
- Kekulize the structure
- Remove non-chiral H atoms (Hs that don't have wedged bonds to them in the mol block)
- Normalization:
- Fix hypervalent nitro groups
- Fix KO to K+ O- and NaO to Na+ O- (Also add Li+ to this)
- Correct amides with N=COH
- Standardise sulphoxides to charge separated form
- Standardize diazonium N (atom :2 here: [*:1]-[N;X2:2]#[N;X1:3]>>[*:1]) to N+
- Ensure quaternary N is charged
- Ensure trivalent O ([*:1]=[O;X2;v3;+0:2]-[#6:3]) is charged
- Ensure trivalent S ([O:1]=[S;D2;+0:2]-[#6:3]) is charged
- Ensure halogen with no neighbors ([F,Cl,Br,I;X0;+0:1]) is charged
- If the formal charge on the molecule is not zero, protonate and/or deprotonate to neutralise the molecule (accepting that this might not be possible for some compounds with quaternary nitrogens and multiple carboxyl, sulphate or phosphate groups).
- Remove stereo from tartrate to simplify salt matching
- Normalise (straighten) triple bonds and allenes
- Add chiral Hs
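
The neutralisation step above only applies when the net formal charge is non-zero. As a minimal sketch of that test, the function below sums the charges listed on the `M  CHG` property lines of a V2000 mol block. The function name is hypothetical, and the sketch assumes charges are recorded only on `M  CHG` lines (it ignores the charge column of the atom block); it is not taken from the pipeline itself.

```python
def net_charge_from_molblock(molblock: str) -> int:
    """Sum the formal charges found on 'M  CHG' lines of a V2000 mol block.

    Assumption: charges are stored only on 'M  CHG' property lines.
    """
    total = 0
    for line in molblock.splitlines():
        if line.startswith("M  CHG"):
            fields = line[6:].split()
            n_entries = int(fields[0])            # number of (atom, charge) pairs
            pairs = fields[1:1 + 2 * n_entries]   # atom1, charge1, atom2, charge2, ...
            total += sum(int(value) for value in pairs[1::2])
    return total


# Property block of a made-up structure: charges +1, -1 and -1.
example_properties = "\n".join([
    "M  CHG  2   1   1   5  -1",
    "M  CHG  1   7  -1",
    "M  END",
])

# A non-zero net charge means the neutralisation step would be attempted.
needs_neutralising = net_charge_from_molblock(example_properties) != 0
```
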
- Identify salts or isotopes for processing (currently this is done from the InChI, e.g. inchi like '%.%' or inchi like '%/i%', but can be achieved in other ways).
- Identify metal-containing compounds whose structures are to be excluded from ChEMBL (using a metal list and a set of rules; details to be supplied). These molecules do not need to be salt stripped. However, water molecules and isotopes are removed from them to make a parent, so that they can be grouped as parents and salts.
- For all other compounds salt stripping is performed to form a parent molecule
- Salts are removed according to the ChEMBL salt list
- Solvents are then removed according to the solvents list
- Isotopes are removed
- If no salts are removed, the molecule is defined as a mixture, but isotopes are then removed
- Check for "empty" salts, i.e. both components are salts. If this is true, the salt is recreated but any isotopes or solvent are removed
- Formal charges before and after salt stripping are checked and parent neutralised if necessary. This will deal with carboxylic acid salts and HCl salts where the molecules are drawn as XNH+ and Cl- rather than XN and HCl
- Allows the option to add specific compounds that are flagged to be salt stripped even though they fit the criteria for being excluded from the process. Currently one specific example is ranitidine bismuth citrate (CHEMBL2111286).
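
The InChI-based identification mentioned at the start of this list can be sketched in a few lines. The function below mirrors the two SQL patterns (`'%.%'` and `'%/i%'`) with plain substring tests; the function name and the dictionary keys are hypothetical, chosen only for illustration.

```python
def inchi_flags(inchi: str) -> dict:
    """Flag InChIs that look multi-component or isotopically labelled."""
    return {
        # A '.' separates components in the formula layer: inchi LIKE '%.%'
        "multi_component": "." in inchi,
        # '/i' introduces the isotopic layer: inchi LIKE '%/i%'
        "isotope": "/i" in inchi,
    }


ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
sodium_acetate = "InChI=1S/C2H4O2.Na/c1-2(3)4;/h1H3,(H,3,4);/q;+1/p-1"
deuterochloroform = "InChI=1S/CHCl3/c2-1(3)4/h1H/i1D"

flags = [inchi_flags(s) for s in (ethanol, sodium_acetate, deuterochloroform)]
```

Only structures with at least one flag set would need to go through the salt-stripping and isotope-removal steps.
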
- Number of atoms <1 i.e. empty CTAB
- Polymer
- V3000 mol file
- 3D coordinates in mol file
- Illegal bond type
- Illegal bond stereo
- Multiple stereobonds on stereoatom
- Overlapping atoms (atoms with identical coordinates)
- Zero coordinates (all atoms have zero coordinates) - can happen when the mol file is created from SMILES
- Stereobond in ring
- Stereobond between stereo centres
- Crossed bonds in ring
- Radicals that don’t fit known stable radical patterns (allowed are 'Nitric Oxide', 'Aminoxyl')
- StereoCentersMOL/INCHI/CSMILES mismatch
- StereoCentersMOL_CSMILES/INCHI mismatch
- StereoCentersMOL_INCHI/CSMILES mismatch
- StereoCentersINCHI_CSMILES/MOL mismatch
- InChI warning:Unknown element(s)
- InChI warning:Bond to nonexistent atom
- InChI warning:Multiple bonds between two atoms
- InChI warning:Atom has more than 3 aromatic bonds
- InChI warning:Too many atoms
- InChI warning:Atom X has more than 20 bonds
- InChI warning:Accepted unusual valence(s)
- InChI warning:Empty structure
- InChI warning:All other warnings
- Illegal input
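
Several of the checks above depend only on the counts line and the atom coordinates of the mol block, so they can be illustrated without a cheminformatics toolkit. The following is a minimal sketch that assumes fixed-width V2000 formatting; `check_molblock` and `make_molblock` are hypothetical names, and treating a single atom at the origin as acceptable is an assumption, not documented behaviour.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Atom = Tuple[float, float, float, str]  # x, y, z, element symbol


def make_molblock(atoms: Sequence[Atom], version: str = "V2000") -> str:
    """Build a bond-free mol block for demonstration (hypothetical helper)."""
    counts = "%3d%3d  0  0  0  0  0  0  0  0999 %s" % (len(atoms), 0, version)
    rows = ["%10.4f%10.4f%10.4f %-3s 0  0" % (x, y, z, sym)
            for (x, y, z, sym) in atoms]
    return "\n".join(["", "  sketch", "", counts] + rows + ["M  END"])


def check_molblock(molblock: str) -> List[str]:
    """Return the coordinate and format issues found in a mol block."""
    lines = molblock.splitlines()
    if len(lines) < 4:
        return ["Illegal input"]
    counts = lines[3]
    if "V3000" in counts:
        return ["V3000 mol file"]
    try:
        n_atoms = int(counts[0:3])
    except ValueError:
        return ["Illegal input"]
    if n_atoms < 1:
        return ["Number of atoms <1 i.e. empty CTAB"]
    atom_lines = lines[4:4 + n_atoms]
    if len(atom_lines) < n_atoms:
        return ["Illegal input"]
    try:
        coords = [(float(a[0:10]), float(a[10:20]), float(a[20:30]))
                  for a in atom_lines]
    except ValueError:
        return ["Illegal input"]

    issues = []
    if any(z != 0.0 for (_, _, z) in coords):
        issues.append("3D coordinates in mol file")
    all_zero = all(c == (0.0, 0.0, 0.0) for c in coords)
    if all_zero and len(coords) > 1:
        issues.append("Zero coordinates (all atoms have zero coordinates)")
    elif any(n > 1 for n in Counter(coords).values()):
        issues.append("Overlapping atoms (atoms with identical coordinates)")
    return issues
```

For example, `check_molblock(make_molblock([(0, 0, 0, "C"), (1.5, 0, 0.7, "O")]))` reports the 3D-coordinates issue because one atom has a non-zero z value.
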
- Number of atoms <1 i.e. empty CTAB
- Polymer
- V3000 mol file
- 3D coordinates in mol file
- Illegal bond type
- Illegal bond stereo
- Multiple stereobonds on stereoatom
- Overlapping atoms (atoms with identical coordinates)
- Zero coordinates (all atoms have zero coordinates) - can happen when the mol file is created from SMILES
- Stereobond in ring
- Stereobond between stereo centres
- Crossed bonds in ring
- Radicals that don’t fit known stable radical patterns (allowed are 'Nitric Oxide', 'Aminoxyl')
- StereoCentersMOL/INCHI/CSMILES mismatch
- StereoCentersMOL_CSMILES/INCHI mismatch
- StereoCentersMOL_INCHI/CSMILES mismatch
- StereoCentersINCHI_CSMILES/MOL mismatch
- InChI warning:Accepted unusual valence(s)
- InChI warning:Empty structure
- Any other InChI error
- InChI warning:All other warnings
- Illegal input
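
The four stereocentre mismatch messages in the list above can be read as comparing the number of stereocentres found in the mol block, the InChI and the canonical SMILES. The sketch below encodes one plausible reading of the message names: representations joined by `_` agree with each other, and the one after `/` disagrees. That interpretation, and the function name, are assumptions rather than documented behaviour.

```python
from typing import Optional


def stereo_mismatch(n_mol: int, n_inchi: int, n_csmiles: int) -> Optional[str]:
    """Map three stereocentre counts to a mismatch message, or None if all agree."""
    if n_mol == n_inchi == n_csmiles:
        return None
    if n_mol == n_csmiles:
        return "StereoCentersMOL_CSMILES/INCHI mismatch"
    if n_mol == n_inchi:
        return "StereoCentersMOL_INCHI/CSMILES mismatch"
    if n_inchi == n_csmiles:
        return "StereoCentersINCHI_CSMILES/MOL mismatch"
    # All three counts differ from one another.
    return "StereoCentersMOL/INCHI/CSMILES mismatch"
```
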