Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log charge assignment #1053

Open
wants to merge 10 commits into
base: develop
Choose a base branch
from
Open

Log charge assignment #1053

wants to merge 10 commits into from

Conversation

mattwthompson
Copy link
Member

@mattwthompson mattwthompson commented Sep 19, 2024

Description

Resolves #1048

Checklist

  • Log, for most cases, which atom (with some topology index) got what charge assigned by which method
    • Library charges
    • Preset charges
    • ChargeIncrementHandler
    • AM1-BCC
    • Virtual sites
  • Add tests
    • Sage with a ligand in vacuum
    • Sage with a ligand in water/ions
      • Water gets library charges
      • Ions get library charges
      • Non-water solvent gets AM1-BCC
      • Ligands get AM1-BCC
    • Sage with a ligand in non-water solvent
      • Non-water solvent gets AM1-BCC
      • Ligands get AM1-BCC
    • Sage with a ligand in mixed solvent
      • Water gets library charges
      • Ions get library charges
      • Non-water solvent gets AM1-BCC
      • Ligands get AM1-BCC
    • ff14SB with Sage
      • Protein gets library charges
      • Water gets library charges
      • Ions get library charges
      • Non-water solvent gets AM1-BCC
      • Ligands get AM1-BCC
    • ff14SB with Sage and preset charges on Protein A
      • Protein A gets preset charges
      • Other proteins get library charges
      • Water gets library charges
      • Ions get library charges
      • Non-water solvent gets AM1-BCC
      • Ligands get AM1-BCC
    • Sage with ligand and OPC water
      • Water gets library charges
      • Water virtual sites get ??
      • Ions get library charges
      • Non-water solvent gets AM1-BCC
      • Ligands get AM1-BCC
    • Sage with preset charges on ligand A
      • Water gets library charges
      • Ions get library charges
      • Non-water solvent gets AM1-BCC
      • Ligand A gets preset charges
      • Other ligands get AM1-BCC
    • Sage with preset charges on water
      • Water gets preset charges
      • Ions get library charges
      • Non-water solvent gets AM1-BCC
      • Ligands get AM1-BCC
    • Sage with (ligand) virtual site parameters
      • Water gets preset charges
      • Ions get library charges
      • Non-water solvent gets AM1-BCC
      • Ligand heavy atoms get AM1-BCC and charge increments
      • Virtual sites get charge increments
    • AM1-with-custom-BCCs Sage with ligand and ions water
      • Water gets library charges
      • Ions get library charges
      • Ligand gets charge increments
    • Capture specifics of charge method (AM1-BCC, AM1-BCC-ELF10, AM1-BCC via NAGL)
    • Slightly non-isomorphic molecules with one having preset charges
    • Preset charges and virtual sites is underspecified and cannot be tested (yet)
    • Run some test cases with shuffled topologies
    • Report all atom indices, not just unique molecules
  • Lint
  • Update docstrings No particular docstring to update, I don't think?
  • Add section in user guide
  • Update release history

Copy link

codecov bot commented Sep 19, 2024

Codecov Report

Attention: Patch coverage is 90.47619% with 2 lines in your changes missing coverage. Please review.

Project coverage is 93.51%. Comparing base (292faec) to head (a3679db).

Additional details and impacted files

@mattwthompson mattwthompson changed the title ENH: Initial implementation of charge assignment logging Log charge assignment Sep 20, 2024
@mattwthompson
Copy link
Member Author

Here's what this looks like for a set of fairly simple cases:

import logging

from openff.toolkit import ForceField, Molecule, Topology

logging.basicConfig(level=logging.INFO)

ff = ForceField("openff-2.0.0.offxml")

water = Molecule.from_smiles("O")
ligand = Molecule.from_smiles("CC")
ligand.assign_partial_charges("am1bcc")

ff.create_interchange(Topology.from_molecules([water, water]))
ff.create_interchange(ligand.to_topology())
ff.create_interchange(ligand.to_topology(), charge_from_molecules=[ligand])

ForceField("opc.offxml").create_openmm_system(Molecule.from_smiles("O").to_topology())
$ python run.py
INFO:openff.toolkit.typing.engines.smirnoff.parameters:Attempting to up-convert vdW section from 0.3 to 0.4
INFO:openff.toolkit.typing.engines.smirnoff.parameters:Successfully up-converted vdW section from 0.3 to 0.4. `method="cutoff"` is now split into `periodic_method="cutoff"` and `nonperiodic_method="no-cutoff"`.
INFO:openff.toolkit.typing.engines.smirnoff.parameters:Attempting to up-convert Electrostatics section from 0.3 to 0.4
INFO:openff.toolkit.typing.engines.smirnoff.parameters:Successfully up-converted Electrostatics section from 0.3 to 0.4. `method="PME"` is now split into `periodic_potential="Ewald3D-ConductingBoundary"`, `nonperiodic_potential="Coulomb"`, and `exception_potential="Coulomb"`.
/Users/mattthompson/micromamba/envs/new-models/lib/python3.11/site-packages/mdtraj/formats/__init__.py:13: DeprecationWarning: 'xdrlib' is deprecated and slated for removal in Python 3.13
  from mdtraj.formats.trr import TRRTrajectoryFile
INFO:openff.interchange.smirnoff._nonbonded:Charge section LibraryCharges applied to (topology) atom index 0
INFO:openff.interchange.smirnoff._nonbonded:Charge section LibraryCharges applied to (topology) atom index 3
INFO:openff.interchange.smirnoff._nonbonded:Charge section LibraryCharges applied to (topology) atom index 1
INFO:openff.interchange.smirnoff._nonbonded:Charge section LibraryCharges applied to (topology) atom index 4
INFO:openff.interchange.smirnoff._nonbonded:Charge section LibraryCharges applied to (topology) atom index 2
INFO:openff.interchange.smirnoff._nonbonded:Charge section LibraryCharges applied to (topology) atom index 5
INFO:openff.interchange.smirnoff._nonbonded:Charge section ToolkitAM1BCC, using charge method am1bccelf10, applied to (topology) atom index 0
INFO:openff.interchange.smirnoff._nonbonded:Charge section ToolkitAM1BCC, using charge method am1bccelf10, applied to (topology) atom index 1
INFO:openff.interchange.smirnoff._nonbonded:Charge section ToolkitAM1BCC, using charge method am1bccelf10, applied to (topology) atom index 2
INFO:openff.interchange.smirnoff._nonbonded:Charge section ToolkitAM1BCC, using charge method am1bccelf10, applied to (topology) atom index 3
INFO:openff.interchange.smirnoff._nonbonded:Charge section ToolkitAM1BCC, using charge method am1bccelf10, applied to (topology) atom index 4
INFO:openff.interchange.smirnoff._nonbonded:Charge section ToolkitAM1BCC, using charge method am1bccelf10, applied to (topology) atom index 5
INFO:openff.interchange.smirnoff._nonbonded:Charge section ToolkitAM1BCC, using charge method am1bccelf10, applied to (topology) atom index 6
INFO:openff.interchange.smirnoff._nonbonded:Charge section ToolkitAM1BCC, using charge method am1bccelf10, applied to (topology) atom index 7
INFO:openff.interchange.smirnoff._nonbonded:Preset charges applied to atom index 0
INFO:openff.interchange.smirnoff._nonbonded:Preset charges applied to atom index 1
INFO:openff.interchange.smirnoff._nonbonded:Preset charges applied to atom index 2
INFO:openff.interchange.smirnoff._nonbonded:Preset charges applied to atom index 3
INFO:openff.interchange.smirnoff._nonbonded:Preset charges applied to atom index 4
INFO:openff.interchange.smirnoff._nonbonded:Preset charges applied to atom index 5
INFO:openff.interchange.smirnoff._nonbonded:Preset charges applied to atom index 6
INFO:openff.interchange.smirnoff._nonbonded:Preset charges applied to atom index 7
INFO:openff.interchange.smirnoff._nonbonded:Charge section LibraryCharges applied to (topology) atom index 0
INFO:openff.interchange.smirnoff._nonbonded:Charge section LibraryCharges applied to (topology) atom index 1
INFO:openff.interchange.smirnoff._nonbonded:Charge section LibraryCharges applied to (topology) atom index 2
INFO:openff.interchange.smirnoff._virtual_sites:Charge section VirtualSites applied to virtual site with orientation atoms (0, 1, 2)

@mattwthompson mattwthompson marked this pull request as ready for review September 20, 2024 20:38
@mattwthompson mattwthompson linked an issue Sep 23, 2024 that may be closed by this pull request
@mattwthompson mattwthompson linked an issue Sep 23, 2024 that may be closed by this pull request
@IAlibay
Copy link

IAlibay commented Sep 24, 2024

Sorry only very briefly looking at this, so please ignore me if I'm missing something important.

Report all atom indices, not just unique molecules

This seems like it would be incredibly overwhelming to users, indeed a solvated protein-ligand complex could easily be incomprehensible.

Is there no way to:

  1. Condense this to each unique molecule
  2. Condense the per atom report of each molecule to just a summary (N atoms got this, N atoms got this, etc...)
  3. If necessary, offer a flag for folks to turn on to get a per-atom breakdown?

@@ -879,6 +897,42 @@ def store_matches(

# Have this new key (on a duplicate molecule) point to the same potential
# as the old key (on a unique/reference molecule)
if type(new_key) is LibraryChargeTopologyKey:
logger.info(
"Charge section LibraryCharges applied to (topology) atom index "
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Charge section LibraryCharges applied to (topology) atom index "
"Charge section LibraryCharges applied to topology atom index "

I initially read "(topology)" as standing in for some missing data, but I think it's here to disambiguate which atom index is meant? In which case I think it's essential information and doesn't need to be parenthesized

Copy link
Collaborator

@Yoshanuikabundi Yoshanuikabundi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty great!

>>> import logging
... 
... from openff.toolkit import ForceField, Molecule, Topology
... 
... logging.basicConfig(level=logging.INFO)
>>> mol1 = Molecule.from_smiles("O")
>>> mol2 = Molecule.from_smiles("CCO")
>>> mol3 = Molecule.from_smiles("CC(=O)O")
>>> mol3.assign_partial_charges("am1bcc")
>>> top = Topology.from_molecules([mol1, mol2, mol3])
>>> ForceField("openff-2.2.1.offxml").create_interchange(top, charge_from_molecules=[mol3])
INFO:openff.interchange.smirnoff._nonbonded:Charge section LibraryCharges applied to (topology) atom index 0
INFO:openff.interchange.smirnoff._nonbonded:Charge section LibraryCharges applied to (topology) atom index 1
INFO:openff.interchange.smirnoff._nonbonded:Charge section LibraryCharges applied to (topology) atom index 2
INFO:openff.interchange.smirnoff._nonbonded:Charge section ToolkitAM1BCC, using charge method am1bcc, applied to (topology) atom index 3
INFO:openff.interchange.smirnoff._nonbonded:Charge section ToolkitAM1BCC, using charge method am1bcc, applied to (topology) atom index 4
INFO:openff.interchange.smirnoff._nonbonded:Charge section ToolkitAM1BCC, using charge method am1bcc, applied to (topology) atom index 5
INFO:openff.interchange.smirnoff._nonbonded:Charge section ToolkitAM1BCC, using charge method am1bcc, applied to (topology) atom index 6
INFO:openff.interchange.smirnoff._nonbonded:Charge section ToolkitAM1BCC, using charge method am1bcc, applied to (topology) atom index 7
INFO:openff.interchange.smirnoff._nonbonded:Charge section ToolkitAM1BCC, using charge method am1bcc, applied to (topology) atom index 8
INFO:openff.interchange.smirnoff._nonbonded:Charge section ToolkitAM1BCC, using charge method am1bcc, applied to (topology) atom index 9
INFO:openff.interchange.smirnoff._nonbonded:Charge section ToolkitAM1BCC, using charge method am1bcc, applied to (topology) atom index 10
INFO:openff.interchange.smirnoff._nonbonded:Charge section ToolkitAM1BCC, using charge method am1bcc, applied to (topology) atom index 11
INFO:openff.interchange.smirnoff._nonbonded:Preset charges applied to atom index 12
INFO:openff.interchange.smirnoff._nonbonded:Preset charges applied to atom index 13
INFO:openff.interchange.smirnoff._nonbonded:Preset charges applied to atom index 14
INFO:openff.interchange.smirnoff._nonbonded:Preset charges applied to atom index 15
INFO:openff.interchange.smirnoff._nonbonded:Preset charges applied to atom index 16
INFO:openff.interchange.smirnoff._nonbonded:Preset charges applied to atom index 17
INFO:openff.interchange.smirnoff._nonbonded:Preset charges applied to atom index 18
INFO:openff.interchange.smirnoff._nonbonded:Preset charges applied to atom index 19
Interchange with 7 collections, non-periodic topology with 20 atoms.

I think you've got the amount of information in the log right - the atom is uniquely identified in the fewest number of characters possible. I can imagine condensing this down by molecule would add a lot of complexity, and the point of doing logging is to avoid that complexity. Also, a per-atom approach allows charge increments to be logged on an atom-by-atom basis, which more closely matches how the force fields work. In any case, with the highly formulaic messages users should be able to write logging handlers to get more info out if they want:

class MoleculeFormatter(logging.Formatter):
    def __init__(self, topology, *args, **kwargs):
        self.topology = topology
        self.last_mol = None
        return super().__init__(*args, **kwargs)

    def format(self, *args, **kwargs):
        string = super().format(*args, **kwargs)

        _, _, atom_index = string.rpartition(" ")
        atom_index = int(atom_index)
        atom = self.topology.atom(atom_index)
        if self.last_mol != atom.molecule:
            self.last_mol = atom.molecule
            string += f" from {atom.molecule}"
        
        return string
        
handler = logging.StreamHandler()
handler.formatter = MoleculeFormatter(top)
logging.basicConfig(level=logging.INFO, handlers=[handler])

My notes are mostly just to enhance the consistency and clarity of the logs - in particular I think we should specify that each of the numbers is a topology atom index, and that its clearer what that means if "topology" is not parenthesized.

One oversight seems to be charge increments from virtual sites - they do not seem to be reflected in the logs:

import logging
import sys

from openff.toolkit import ForceField, Molecule, Topology

logging.basicConfig(level=logging.INFO)

def sage_with_bond_charge(sage):
    from openff.toolkit.typing.engines.smirnoff.parameters import (
        BondType,
        ChargeIncrementModelHandler,
        VirtualSiteType,
    )
    
    sage["Bonds"].add_parameter(
        parameter=BondType(
            smirks="[#6:2]-[#17X1:1]",
            id="b0",
            length="1.6 * angstrom",
            k="500 * angstrom**-2 * mole**-1 * kilocalorie",
        ),
    )

    sage.get_parameter_handler("VirtualSites")
    sage["VirtualSites"].add_parameter(
        parameter=VirtualSiteType(
            smirks="[#6:2]-[#17X1:1]",
            type="BondCharge",
            match="all_permutations",
            distance="0.8 * angstrom ** 1",
            charge_increment1="0.2 * elementary_charge ** 1",
            charge_increment2="0.1 * elementary_charge ** 1",
        ),
    )

    return sage
    
sage_with_bond_charge(ForceField("openff-2.2.1.offxml")).create_interchange(Molecule.from_smiles("CCl").to_topology())
INFO:openff.interchange.smirnoff._nonbonded:Charge section ToolkitAM1BCC, using charge method am1bccelf10, applied to (topology) atom index 0
INFO:openff.interchange.smirnoff._nonbonded:Charge section ToolkitAM1BCC, using charge method am1bccelf10, applied to (topology) atom index 1
INFO:openff.interchange.smirnoff._nonbonded:Charge section ToolkitAM1BCC, using charge method am1bccelf10, applied to (topology) atom index 2
INFO:openff.interchange.smirnoff._nonbonded:Charge section ToolkitAM1BCC, using charge method am1bccelf10, applied to (topology) atom index 3
INFO:openff.interchange.smirnoff._nonbonded:Charge section ToolkitAM1BCC, using charge method am1bccelf10, applied to (topology) atom index 4
INFO:openff.interchange.smirnoff._virtual_sites:Charge section VirtualSites applied to virtual site with orientation atoms (1, 0)

And they aren't mentioned in the hierarchy docs.

The only other thing I can think of that might be useful is to actually log the charge. This might be useful with virtual site charge increments so that users can track how the charge changes as increments are applied - but I don't think its essential and it definitely adds noise.

logger.info(
"Charge section ToolkitAM1BCC, using charge method "
f"{new_key.extras['partial_charge_method']}, "
f"applied to (topology) atom index {topology_atom_index}",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f"applied to (topology) atom index {topology_atom_index}",
f"applied to topology atom index {topology_atom_index}",

Ditto above.


elif new_key.extras["handler"] == "preset":
logger.info(
f"Preset charges applied to atom index {topology_atom_index}",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f"Preset charges applied to atom index {topology_atom_index}",
f"Preset charges applied to topology atom index {topology_atom_index}",

I think the fact that its a topology atom index is important to clarify.

Optionally, refactoring all these log messages into a single string that gets format()ed in each branch to ensure consistency might be worth doing - people will probably end up processing logs programmatically and having a single regex match everything would be great.

I'm also not 100% sure about "preset", I'm not sure it makes the connection to charge_from_molecules clear. "charge_from_molecule charges applied" might be clearer for a first time user, though it is a little clunky.

logger.info(
"Charge section ChargeIncrementModel, using charge method "
f"{new_key.partial_charge_method}, "
f"applied to (topology) atom index {new_key.this_atom_index}",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f"applied to (topology) atom index {new_key.this_atom_index}",
f"applied to topology atom index {new_key.this_atom_index}",

See above.

1. Look for preset charges from the `charge_from_molecules` argument
1. Look for chemical environment matches within the `<LibraryCharges>` section
1. Look for chemical environment matches within the `<ChargeIncrementModel>` section
1. Try to run AM1-BCC or some variant
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
1. Try to run AM1-BCC or some variant
1. Try to run AM1-BCC according to the `<ToolkitAM1BCC>` section or some variant

Just to clarify that this is still following the force field

1. Look for chemical environment matches within the `<ChargeIncrementModel>` section
1. Try to run AM1-BCC or some variant

If a molecule gets charges from one method, attempts to match charges for later methods are skipped. For more on how SMIRNOFF defines this behavior, see [this issue](https://github.com/openforcefield/standards/issues/68) and linked discussions.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If a molecule gets charges from one method, attempts to match charges for later methods are skipped. For more on how SMIRNOFF defines this behavior, see [this issue](https://github.com/openforcefield/standards/issues/68) and linked discussions.
If a molecule gets charges from one method, attempts to match charges for later methods are skipped. Note that preset charges override the force field and are not checked for consistency; any charges provided to the `charge_from_molecules` argument technically modify the force field. For more on how SMIRNOFF defines this behavior, see [this issue](https://github.com/openforcefield/standards/issues/68) and linked discussions.

I think clarifying that this is an escape hatch is useful.


6. Sage with ligand and OPC water
* Water gets library charges
* Water virtual sites get ??
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be clarified?

Comment on lines +7 to +10
1. Look for preset charges from the `charge_from_molecules` argument
1. Look for chemical environment matches within the `<LibraryCharges>` section
1. Look for chemical environment matches within the `<ChargeIncrementModel>` section
1. Try to run AM1-BCC or some variant
Copy link
Collaborator

@Yoshanuikabundi Yoshanuikabundi Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do virtual site charge increments come in here?

@@ -328,6 +331,7 @@ def _get_charges(

total_charge: Quantity = numpy.sum(parameter_value)
# assumes virtual sites can only have charges determined in one step

charges[topology_key] = -1.0 * total_charge

# Apply increments to "orientation" atoms
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to log this? Since it will change the charges coming out. I can imagine being very confused that the charges are completely different to what the logs say they should be because they were modified by a charge increment from a virtual site.

@@ -178,6 +181,12 @@ def store_potentials( # type: ignore[override]
),
},
)

logger.info(
"Charge section VirtualSites applied to virtual site with orientation atoms "
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Charge section VirtualSites applied to virtual site with orientation atoms "
"Charge section VirtualSites applied to virtual site with orientation atoms at topology indices "

I'm not sure this is a huge improvement but I feel like clarifying what the atom numbers are is helpful.

@Yoshanuikabundi
Copy link
Collaborator

Welp of course as soon as I submit the review I realise I misunderstood how ChargeIncrementModel worked. I've gone back and editing things to hide my shame but if you run into something weird asking how charge increments are applied (outside the context of vsites)... just ignore it...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ENH] Improve logging / user experience for charge assignment
3 participants