Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LAMMPS atom writer makes assigning unique IDs to chemically-identical molecules impossible #998

Closed
timbernat opened this issue Jul 1, 2024 · 3 comments
Labels
lammps Relating to LAMMPS

Comments

@timbernat
Copy link
Contributor

timbernat commented Jul 1, 2024

Description

I am trying to use Interchange to generate LAMMPS input files for some large polymer systems, which consist of many repeated-and-translated copies of oligomers, plasticizers, and solvent water molecules. I've found that in every case, a unique molecule ID in the atoms table section of the resulting LAMMPS file is only generated for each distinct Molecule chemistry, rather than for each individual Molecule in the Topology.

Some further digging reveals that the culprit is a speed-up update to the LAMMPS valence writers, in particular the _write_atoms() function, which generates molecule IDs from a chemical table hash of each molecule, allegedly to bypass an extra toolkit wrapper conversion step. This molecule CTAB hash, in turn, is only based on basic chemical info of the atoms and bonds in the Molecule object (namely atom symbols, formal charges, stereo, and bond orders).

The consequence of this is that any distinct Molecules with identical chemical tables in the same Topology will unavoidably assigned the same molecule ID in the atoms table when writing to LAMMPS

Reproduction

This turns out to affect even really basic systems, such as a box filled with nothing but water.
To illustrate, one could take the waterbox.pdb file found here and execute the following code:

from pathlib import Path
from openff.toolkit import Topology, ForceField


waterbox_path = Path('waterbox.pdb')
watertop = Topology.from_pdb(waterbox_path)

ff = ForceField('tip3p.offxml')
winc = ff.create_interchange(watertop)
winc.to_lammps('water.lammps')

Output

All atoms in the resulting water.lammps file will be labelled with molecule ID 500 (the number of unique water molecules in the Topology), rather than unique IDs from 1-500.

Solution
The simplest solution from my point of view is to salt the molecule ID hash in _write_atoms() with the molecule ID to ensure distinct molecules with identical CTABs are hashed as such. This would look something like (interop/lammps/export/export.py, lines 296-300):

    for molecule_index, molecule in enumerate(interchange.topology.molecules):
        molecule_hash = hash(f'{molecule_index}_{molecule.ordered_connection_table_hash()}') # salt with mol ID
        molecule_map[molecule_hash] = molecule_index
        for atom in molecule.atoms:
            atom_molecule_map[atom] = molecule_hash

I've tested on my machine (via manual source code edits in a conda env) that this circumvents the non-uniqueness issue, and have put together a simple PR to do exactly this.
Software versions

  • OS: running on Linux Ubuntu 20.04.6 LTS
  • Installed OpenFF Toolkit (0.15.2) and Interchange (0.3.23) via mamba, though the offending portions of code are still present in the most recent versions on GitHub.
@PEFrankel
Copy link

PEFrankel commented Jul 1, 2024

I was just working on this very issue and this seems to prove it's not isolated since it persists within interchange.to_gromacs as well. A broader solution may be necessary to fix this (i.e. somewhere in interop or just copied in each export).

@j-wags
Copy link
Member

j-wags commented Jul 2, 2024

Thanks @timbernat for the great writeup and quick PR. Matt is on a much-needed vacation right now so I'll help polish up the PR but his final review won't come until next week :-)

@mattwthompson mattwthompson added the lammps Relating to LAMMPS label Jul 16, 2024
@mattwthompson
Copy link
Member

We're pretty sure this is fixed in #1000, which I expect to include in 0.3.28

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lammps Relating to LAMMPS
Projects
None yet
Development

No branches or pull requests

4 participants