Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ordering of charges in CHARGE section when using to_prmtop #1033

Closed

Conversation

lukasbaldauf
Copy link
Contributor

Description

The ordering of the atomic charges in the CHARGE section of the .prmtop was wrong because we iterated over a dictionary. Now we excplicitly iterate over the atoms of interchange.topolgy to avoid this and fixes the following bug:

Reproduction

from openff.toolkit import ForceField, Molecule
from openff.interchange import Interchange

water = Molecule.from_smiles("O")
topology = water.to_topology()

for i in range(2):
	topology.add_molecule(water)

sage = ForceField("openff-2.0.0.offxml")

interchange = Interchange.from_smirnoff(force_field=sage, topology=topology)
interchange.to_prmtop("test.prmtop")

Output
Among others, we can see the following entries in the test.prmtop file:

%FLAG ATOM_NAME
%FORMAT(20a4)
   O   H   H   O   H   H   O   H   H
...
%FLAG CHARGE
%FORMAT(5E16.8)
 -1.51973982E+01 -1.51973982E+01 -1.51973982E+01  7.59869910E+00  7.59869910E+00
  7.59869910E+00  7.59869910E+00  7.59869910E+00  7.59869910E+00

where the oxygen charges are listed first, which is wrong.

Checklist

  • Add tests
  • Lint
  • Update docstrings

The ordering of the atomic charges in the CHARGE section of the
.prmtop was wrong because we iterated over a dictionary. Now we
excplicitly iterate over the atoms of interchange.topolgy to avoid
this.
@mattwthompson
Copy link
Member

Thanks for catching this! I'll dig into this more later but on the surface -

I'm not sure this solution is the one we want to go with (I'll wait for tests to run, and there are some other considerations around i.e. performance of larger systems with proteins or many solvents) but we probably shouldn't rely on dictionary ordering here. The other writers do, and haven't had issues like this reported, so that'll be another thing to look at. Atom ordering is a pretty common pain point in our field and something I worry about often since it can frequently cause results that are incorrect in subtle ways.

Copy link

codecov bot commented Aug 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.36%. Comparing base (6137232) to head (51f8bd7).
Report is 15 commits behind head on main.

Additional details and impacted files

@mattwthompson
Copy link
Member

Hmm ... I expected this to very much not work since partial charges aren't stored on the topology. (One would be forgiven for thinking so, the reasons are complex and mostly historical.)

I'd like to yoink your little snippet into a test since what you're seeing there is very much not what we're after. (You don't have to do this - I can do it when I get to this.)

I just come across #970 which is what I was thinking of earlier; this could be it, but there could be something even worse going on. I'm a bit confused by all of this - if charges are written in such poor order, I'd expect almost nothing about the Amber interface to work!

@lukasbaldauf
Copy link
Contributor Author

I think the reason the Amber interface works is because the atomic indices are explicitly included in the .prmtop for all other interactions, but for the charges the ordering is implicitly assumed.

I’m still looking into this as I’m trying to match the forces between OpenMM and Cp2k (using the .prmtop). I’m still seeing some small but significant discrepancies between the forces, even with the charges correctly ordered.

Among others, I believe the 1-4 interactions are not calculated because they are part of the exclusion list given in the .prmtop. However, setting scale14 = 0.0 for both vdW and Electrostatics in the .offxml still doesn’t match the forces exactly.

So there is something else going on that I haven’t wrapped my head around yet. I’ll keep you posted on this when I figure this one out.

@lukasbaldauf
Copy link
Contributor Author

I found the reason for the discrepancies, which were related to Cp2k stuff. All forces match perfectly now, so the order of the charges was the only thing that was off in the .prmtop generation.

@mattwthompson
Copy link
Member

Thanks @lukasbaldauf - just an FYI, the "OpenEye true" tests will always fail because the license isn't configured to work from a fork. I'll yoink these changes into a separate PR when I start digging into this

@mattwthompson
Copy link
Member

Looking at this again, I do think the issue is localized to the ordering of the charge array. The atoms are listed as expected and the bond graph looks to also match an OHHOHH ordering

In [15]: [(atom.charge, atom.atomic_number) for atom in parmed.load_file("test.prmtop").atoms]
Out[15]:
[(-0.834, 8),
 (-0.834, 1),
 (-0.834, 1),
 (0.417, 8),
 (0.417, 1),
 (0.417, 1),
 (0.417, 8),
 (0.417, 1),
 (0.417, 1)]

In [16]: [
    ...:     int(x) / 3
    ...:     for x in "       0       3       1       0       6       1       9      12       1       9"
    ...:     "      15       1      18      21       1      18      24       1".split()
    ...: ]
Out[16]:
[0.0,
 1.0,
 0.3333333333333333,
 0.0,
 2.0,
 0.3333333333333333,
 3.0,
 4.0,
 0.3333333333333333,
 3.0,
 5.0,
 0.3333333333333333,
 6.0,
 7.0,
 0.3333333333333333,
 6.0,
 8.0,
 0.3333333333333333]

@mattwthompson
Copy link
Member

Indeed the underlying issue is how atoms are inserted into the electrostatics collection:

ipdb> [
    (key.atom_indices[0], value.m)
    for key, value in interchange["Electrostatics"].charges.items()
]
[(0, -0.834), (3, -0.834), (6, -0.834), (1, 0.417), (4, 0.417), (7, 0.417), (2, 0.417), (5, 0.417), (8, 0.417)]

The way OpenMM and GROMACS exports are constructed avoids this issue, if only by accident. LAMMPS might have this issue.

@mattwthompson
Copy link
Member

Okay, I think this is a straightforward enough fix, which I've tried at in #1045

@mattwthompson
Copy link
Member

Thanks much @lukasbaldauf for this report - a detailed reproduction like this may seem tedious but it made my job much easier. (The test I added is basically the script you wrote up.) Even though your proposed solution isn't the one we went with, it immediately brought me to where in the codebase we needed to make changes, and the first idea worked.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants