Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Created OpenMM Topology and System virtual site atom indices don't match #1049

Open
IAlibay opened this issue Sep 16, 2024 · 6 comments
Open
Labels

Comments

@IAlibay
Copy link

IAlibay commented Sep 16, 2024

Expectation

The atom indices of an OpenMM Topology match those of its associated System, this includes virtual sites.

Current status

Creating an OpenMM Topology with virtual sites puts the virtual site at the end of each residue whilst the System has them completely at the end of the entire System. This means that if you have more than one residue, the atom indices don't match.

Reproduction

from openff.toolkit import Molecule, Topology, ForceField
from openff.interchange.components._packmol import solvate_topology
from openff.interchange import Interchange

ligand = Molecule.from_smiles('CCCCC')
ligand.generate_conformers()

off_top = Topology.from_molecules(ligand)
solvated_off_top = solvate_topology(topology=off_top)

ff = ForceField('openff-2.2.0.offxml', 'opc.offxml')
inter = Interchange.from_smirnoff(topology=solvated_off_top, force_field=ff)
omm_top = inter.to_openmm_topology()
omm_sys = inter.to_openmm_system()

print(list(omm_top.atoms())[20].name == 'EP')
print(omm_sys.isVirtualSite(20) == True)

Output

True
False

Software versions

Interchange v0.3.27

@IAlibay
Copy link
Author

IAlibay commented Sep 16, 2024

The "ideal scenario" ask

If we could change behaviour to not have virtual sites being added at the end of the system, but rather at the end of each residue (edit: read here as continuous molecule - I know that would be a pain for polymers otherwise), that would save a lot of headaches downstream.

Particularly, this behaviour doesn't match OpenMM's default behaviour (i.e. what happens with Modeller) when it comes to virtual sites.

What this means is that I can be presented with a pair of OpenMM System and Topology and not know if I should expect the indices to match or if I need to shift all the indices because the virtual sites got shuffled around. So I have to build shims everywhere to handle OpenFF specific behaviour.

The alternative asks

  1. Warn users on OpenMM Topology creation that the indices won't match
  2. Document this issue / behaviour
  3. Provide tooling to map the index shift (i.e. "topology_to_system_index(i) -> shifted_i")
    • I could do this myself downstream, but it's messy and prone to error if something changes in Interchange, it'd be a lot cleaner / safer if it was done here.

@mattwthompson
Copy link
Member

Virtual sites being added to the end of the system is intentional1, them not matching up with the topology is not. The ordering in the topology derives from toolkit behavior, although it's never properly supported virtual sites. I'm going to look into updating the topology output to match the system's ordering, alternatively with some options to pick between them. The internals already have some toggles here (by comparison, GROMACS prefers virtual sites at the end of each molecule - not residue) but this must not be exposed at the OpenMM export.

I can't commit to changing the default behavior, but these indices lining up seems like a completely reasonable user expectation. These details need to be better documented as well.

I notice that this particular behavior (matching up, but all virtual sites at end) isn't your ideal scenario nor is it an alternative ask - is this something that's workable, or would you still need to build a shim?

Footnotes

  1. I forget why, but changing this is something I'd like to avoid. OpenMM itself works with just about any particle bookkeepping

mattwthompson added a commit that referenced this issue Sep 17, 2024
@IAlibay
Copy link
Author

IAlibay commented Sep 17, 2024

I notice that this particular behavior (matching up, but all virtual sites at end)

If the OpenMM Topology matches the OpenMM System, then that works for us. That possibility didn't cross my mind since I just couldn't work out "how" (since OpenMM Topologies need contiguous indices for each residue).

@mattwthompson
Copy link
Member

OpenMM Topologies need contiguous indices for each residue

Sadly, I'm running into this now. The stopgap solution probably will have a weak spot with residue accounting, though I'm hoping the rest of the particle indexing will make sense or at least be self-consistent.

@IAlibay
Copy link
Author

IAlibay commented Sep 17, 2024

The stopgap solution probably will have a weak spot with residue accounting

I'll have to think about it a bit more, but I suspect that for us fixing indexing at the risk of residues being split kinda breaks things in different ways 😅

@mattwthompson
Copy link
Member

I get that - I'm not sure there'll be a perfect solution here. The major constraints I'm working with are

  • the toolkit does not always provide me topologies with coherent residue information1 so I cannot trust residue information in and of itself
  • OpenMM doesn't have clean accounting for molecules2 as distinct from residues
  • no community standard to work off of, particularly for important cases in which residues and molecular graphs differ, as simple as vanilla proteins or as tricky at PTM/organometallics/etc.

I have a swing at this in draft at #1051, from where I'll continue to add in some tests and iterate. Documentation probably won't be added until something crystalizes out.

Footnotes

  1. Nor should it, arguably

  2. Ditto

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants