Handle some edge cases with preset charges #1070

Open
wants to merge 8 commits into
base: develop

mattwthompson
Member

@mattwthompson mattwthompson commented Oct 3, 2024

Description

This PR better handles user inputs around the charge_from_molecules argument from the toolkit. Resolves #1057 #1058 #1059

Checklist

  • Error if charge_from_molecules contains any molecule without partial charges
  • Error if charge_from_molecules contains any duplicate molecules (defined by isomorphism, approximated by SMILES without hydrogens)
  • Test behavior discussed in "Partial charge assignment using charge_from_molecules on virtual sites / charge increments can be unexpected" (#1050)
  • Add tests
  • Lint
  • Update docstrings
    • Update from_smirnoff docstring
    • Add user guide
      • Describe charge assignment hierarchy
      • Describe limitations with preset charges
      • Add FAQ entry (I guess we don't have an FAQ here; maybe should check openff-docs)


codecov bot commented Oct 3, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.61%. Comparing base (bdd2177) to head (25b144e).
Report is 8 commits behind head on develop.

Additional details and impacted files

@mattwthompson mattwthompson marked this pull request as ready for review October 4, 2024 20:08
Collaborator

@Yoshanuikabundi Yoshanuikabundi left a comment


These are great improvements, I think. The documentation changes from #1048 could be merged with this so that the charge hierarchy is only documented once, which might improve findability: if there is similar documentation in two places, people sometimes read one place carefully, then later find the second place and assume they're already familiar with it because it looks the same.

My notes are mostly just adding more detail to what you've written, but this is great!

Comment on lines +9 to +12
1. **Preset charges**: Look for molecule matches in the `charge_from_molecules` argument
2. **Library charges**: Look for chemical environment matches in library charges
3. **Charge increments**: Look for chemical environment matches in charge increments
4. **AM1-BCC**: Try to run some variant of AM1-BCC (presumably this is where graph charges go)

Suggested change
1. **Preset charges**: Look for molecule matches in the `charge_from_molecules` argument
2. **Library charges**: Look for chemical environment matches in the `<LibraryCharges>` section of the force field
3. **Charge increment models**: Look for chemical environment matches in the `<ChargeIncrementModel>` section of the force field
4. **AM1-BCC**: Try to run some variant of AM1-BCC (presumably this is where graph charges go) as described by the `<ToolkitAM1BCC>` section of the force field

The spec uses the same language for ChargeIncrementModels and the charge_increments that are applied to move charges to virtual sites, so I think it's important to keep this distinction very clear. I also think it's clearer to specify the section of the force field that is being applied, instead of just repeating the name.
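
The hierarchy above reads as a first-match-wins fallthrough. A minimal sketch in plain Python, assuming hypothetical stand-ins for each level (none of these names come from the OpenFF API, and real matching works on chemical environments rather than on a whole-molecule SMILES key as assumed here):

```python
from typing import Callable, Optional, Sequence

# Hypothetical type: a charge method takes a molecule key (here, a SMILES
# string) and returns per-atom charges, or None if it does not apply.
ChargeMethod = Callable[[str], Optional[list[float]]]


def assign_charges(
    smiles: str,
    methods: Sequence[tuple[str, ChargeMethod]],
) -> tuple[str, list[float]]:
    """Return (method name, charges) from the first method that applies."""
    for name, method in methods:
        charges = method(smiles)
        if charges is not None:
            return name, charges
    raise ValueError(f"No charge method could handle {smiles!r}")


# Illustrative stand-ins for the four levels, in priority order.
preset = {"O": [-0.8, 0.4, 0.4]}
library = {"[Na+]": [1.0]}

hierarchy = [
    ("preset", preset.get),
    ("library", library.get),
    ("charge_increment_model", lambda smiles: None),  # never matches in this toy
    ("am1bcc", lambda smiles: [0.0] * len(smiles)),  # placeholder, not real AM1-BCC
]

water_source, water_charges = assign_charges("O", hierarchy)
sodium_source, _ = assign_charges("[Na+]", hierarchy)
fallback_source, _ = assign_charges("CCO", hierarchy)
```

Keeping the levels in an ordered list makes the priority explicit, which is the property the documentation is trying to convey.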


### Preset charges

The following restrictions are in place when using preset charges:

Suggested change
The charges specified by the force field can be overridden by providing molecules with partial charges to the `charge_from_molecules` argument. This can be used to apply alternate implementations of the appropriate charge generation method, or to provide different charges to the force field. Charges provided via `charge_from_molecules` are called "preset charges" because they are pre-set by the user, rather than computed by the force field. The following restrictions are in place when using preset charges:

Just think this could use a little extra context.

Comment on lines +27 to +28
* All molecules in the `charge_from_molecules` list must be non-isomorphic with each other.
* All molecules in the `charge_from_molecules` list must have partial charges.

Suggested change
* All molecules in the `charge_from_molecules` list must be non-isomorphic with each other.
* All molecules in the `charge_from_molecules` list must have partial charges.
* All copies of a molecule in the topology will be parametrized with the charges from an isomorphic molecule from the `charge_from_molecules` list.

I think this one-to-many relationship is worth making explicit.
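
The first two restrictions can be sketched as a small validation pass. This is an illustration only: `ToyMolecule` and `validate_preset_molecules` are hypothetical names, and comparing SMILES strings is used here as an assumed proxy for an isomorphism check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyMolecule:
    """Hypothetical stand-in for a molecule; not the toolkit's class."""

    smiles: str  # used here as a proxy for isomorphism
    partial_charges: Optional[list[float]] = None


def validate_preset_molecules(molecules: list[ToyMolecule]) -> None:
    """Raise ValueError if any molecule is uncharged or duplicated."""
    seen: set[str] = set()
    for molecule in molecules:
        if molecule.partial_charges is None:
            raise ValueError(f"{molecule.smiles} has no partial charges")
        if molecule.smiles in seen:
            raise ValueError(f"{molecule.smiles} appears more than once")
        seen.add(molecule.smiles)


def error_for(molecules: list[ToyMolecule]) -> Optional[str]:
    """Return the validation error message, or None if the list is valid."""
    try:
        validate_preset_molecules(molecules)
    except ValueError as error:
        return str(error)
    return None


water = ToyMolecule("O", [-0.8, 0.4, 0.4])
uncharged_methane = ToyMolecule("C")

valid_result = error_for([water])
missing_result = error_for([water, uncharged_methane])
duplicate_result = error_for([water, ToyMolecule("O", [-0.6, 0.3, 0.3])])
```

The duplicate case is rejected even though the two waters carry different charges, since there would be no unambiguous choice of which charges to apply.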

Comment on lines 111 to +114
If specified, partial charges will be taken from the given molecules
instead of being determined by the force field.
instead of being determined by the force field. All molecules in this list
must have partial charges assigned and must not be isomorphic with any other
molecules in the list.

Suggested change
If specified, partial charges for any molecules isomorphic to those
given will be taken from the given molecules' `partial_charges`
attribute instead of being determined by the force field. All
molecules in this list must have partial charges assigned and must
not be isomorphic with any other molecules in the list.

I think this might help clear up a misconception I sometimes see hinted at: that this only affects the same molecule object in the topology. For example, if I have four copies of a molecule and pass one of those objects to charge_from_molecules, the misconception is that only that object will get those charges.
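
The one-to-many behavior described in the suggested docstring can be sketched in plain Python. The function name and the use of SMILES strings as molecule keys are assumptions made for illustration, not the library's implementation:

```python
from typing import Optional


def apply_preset_charges(
    topology: list[str],
    presets: dict[str, list[float]],
) -> list[Optional[list[float]]]:
    """For each molecule in the topology, return a copy of the matching
    preset charges, or None if no preset matches (the force field would
    then determine the charges)."""
    assigned: list[Optional[list[float]]] = []
    for smiles in topology:
        charges = presets.get(smiles)
        assigned.append(list(charges) if charges is not None else None)
    return assigned


# Four copies of water and one ethanol; only water has preset charges.
topology = ["O", "O", "O", "O", "CCO"]
presets = {"O": [-0.8, 0.4, 0.4]}

assigned = apply_preset_charges(topology, presets)
n_waters_with_presets = sum(charges == presets["O"] for charges in assigned[:4])
```

Matching is by molecular identity rather than by object identity, so all four waters receive the preset charges while ethanol falls through to the force field.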

Comment on lines +149 to +151
"Preset charges were provided (via `charge_from_molecules`) alongside a force field that includes "
"virtual site parameters. Note that virtual sites will be applied charges from the force field and "
"cannot be given preset charges.",

Suggested change
"Preset charges were provided (via `charge_from_molecules`) alongside a force field that includes "
"virtual site parameters. Note that virtual sites will be applied charges from the force field and "
"cannot be given preset charges. Virtual sites may also affect the charges of their orientation "
"atoms, even if those atoms are given preset charges.",

Do you think adding something like this to the warning will be useful for people? It seems like the sort of thing that would surprise me but maybe it's just noise.
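
On the question of noise: if the warning is raised with its own category, users who find it unhelpful can silence just that category. A minimal sketch using the standard `warnings` module, where `ToyPresetChargesWarning` and `check_preset_charges` are hypothetical names and the message is abbreviated:

```python
import warnings


class ToyPresetChargesWarning(UserWarning):
    """Hypothetical category; the real project may use a different name."""


def check_preset_charges(has_preset_charges: bool, has_virtual_sites: bool) -> None:
    """Warn when preset charges are combined with virtual site parameters."""
    if has_preset_charges and has_virtual_sites:
        warnings.warn(
            "Preset charges were provided alongside virtual site parameters; "
            "virtual sites take their charges from the force field.",
            ToyPresetChargesWarning,
            stacklevel=2,
        )


# Record warnings so the behavior can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_preset_charges(has_preset_charges=True, has_virtual_sites=True)
    check_preset_charges(has_preset_charges=True, has_virtual_sites=False)

# A user who finds the message noisy can silence just this category.
with warnings.catch_warnings(record=True) as silenced:
    warnings.simplefilter("always")
    warnings.simplefilter("ignore", ToyPresetChargesWarning)
    check_preset_charges(has_preset_charges=True, has_virtual_sites=True)
```

Only the call that combines preset charges with virtual sites emits a warning, and the category-specific filter suppresses it without hiding other warnings.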

if molecules_with_preset_charges is None:
    return None

molecule_set = {molecule.to_smiles() for molecule in molecules_with_preset_charges}

Suggested change
molecule_set = {molecule for molecule in molecules_with_preset_charges}

Molecule.__eq__() exists, so you should be able to make a set out of molecules directly. It definitely seems to work for me. Is matching SMILES faster for large topologies or something?
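
One general Python detail is relevant here: set membership relies on `__hash__` as well as `__eq__`, and a class that defines only `__eq__` becomes unhashable. The sketch below uses a hypothetical `ToyMolecule` whose equality and hash are both derived from a SMILES string; whether the toolkit's `Molecule` behaves the same way is an assumption to verify, not something shown here.

```python
class ToyMolecule:
    """Hypothetical molecule whose identity is defined by its SMILES."""

    def __init__(self, smiles: str, name: str = "") -> None:
        self.smiles = smiles
        self.name = name

    def __eq__(self, other: object) -> bool:
        return isinstance(other, ToyMolecule) and self.smiles == other.smiles

    def __hash__(self) -> int:
        # Must agree with __eq__: equal objects need equal hashes.
        return hash(self.smiles)


class EqOnly:
    """Defines __eq__ but not __hash__, so instances are unhashable."""

    def __eq__(self, other: object) -> bool:
        return isinstance(other, EqOnly)


molecules = [ToyMolecule("CCO", "a"), ToyMolecule("CCO", "b"), ToyMolecule("C")]
unique = set(molecules)
has_duplicates = len(unique) != len(molecules)

try:
    {EqOnly()}
    eq_only_hashable = True
except TypeError:
    eq_only_hashable = False
```

If hashing and equality agree, building a set of molecules and building a set of their SMILES strings detect the same duplicates, so the choice comes down to cost and readability.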

Comment on lines +238 to +243
topology = Topology.from_molecules(
    [
        Molecule.from_smiles("C"),
        Molecule.from_smiles("CCO"),
    ],
)

Should this topology include a molecule with partial charges, to ensure that it's testing if there are any missing charges?

2 participants