Fix to non-unique LAMMPS molecule IDs #999

timbernat · 2024-07-01T21:48:16Z

Description

Addresses issue 998, wherein distinct but chemically-identical molecules are forced to have the same molecule ID when writing the atom table section of LAMMPS files.

Checklist

Add molecule_index to CTAB hash to ensure uniqueness
Ensure this PR passes existing tests (single-line edit)
Lint

…emically-identical Molecules in a Topology

…r implementation

for more information, see https://pre-commit.ci

codecov · 2024-07-01T21:53:21Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.32%. Comparing base (a649485) to head (dec84f5).

Additional details and impacted files

j-wags · 2024-07-02T20:35:47Z

I'll let @mattwthompson give the final review when he's back next week, but in the meantime, there are three things that would improve this PR:

Could you move this from a fork to a branch? I've just added you to the openforcefield "developers" team so you should be able to make branches now. Feel free to ping me on Slack if you'd like to screenshare to do this. You'll probably need to close this PR and open a new one with the changes in a branch.
Is it possible to fix this further upstream for all writers since @PEFrankel mentioned having a similar issue with gromacs here?
Could you add a minimal test to ensure that this bug doesn't get reintroduced?

timbernat · 2024-07-03T22:36:09Z

Can do! To address point-by-point:

I can move the change to a branch without too much trouble, yes. There is an extant lammps-improvements branch which is 300+ commits behind main (and crucially does not contain the code I'd like to modify), should I try to pull into it from main to bring it up-to-date or would it be simpler for me to just create a new branch for these edits?
I'm not sure at this point in time if an upstream fix is possible, as I've never used the Interchange GMX interface myself. I'd like to make the changes to the LAMMPS writers first, and then once @PEFrankel provides a more detailed write-up of the GMX issues I can maybe dig thru the source to try and hunt down their cause.
I'd be more than happy to write a test! Just a few clarifying questions:
- I assume it will live here?
- Are there any prescribed requirements by OpenFF (behavioral, formatting, or otherwise) for unit tests that I should follow when writing it?
- If I'd like to include data that the test references (namely a PDB file), I assume that it should live in openff.interchange._tests.data and be referenced by openff.utilities.utilities.get_data_file_path(), is that correct?
  Thanks for the help!

mattwthompson · 2024-07-08T18:19:35Z

+1 on introducing these changes from a branch instead of a fork. Please also base the changes off of the current main branch if at all possible. That other branch can be ignored unless you need something from it. It looks like these changes are small, which makes it feasible to review.

Feel free to focus these changes on LAMMPS but please do add a comment like "TODO: See if this change is also needed in the GROMACS/Amber exporters" so it's at least noted somewhere. I'd be happy if Patrick or somebody carried the torch to another changeset (though I'm not sure if this problem also applied to GROMACS files).

I assume it will live here

Your tests can live there or here or anywhere that looks halfway appropriate. Interchange's test suite isn't very well-organized at the moment and I'm more concerned about the tests existing than being in the right place immediately.

Are there any prescribed requirements by OpenFF

Not really. Formatting is handled by automation. I'd like to think our tests follow "Arrange-Act-Assert" either in spirit or in a partially literal sense. What I look for in reviewing is if I can understand what the test is trying to do and that the test actually checks everything that it sets out to check (sometimes this takes several assert statements). A good reproducing example is most of the work of writing good tests, so you should be able to yoink some of your work in #998 over into tests.

If I'd like to include data [...]

Yep, precisely. I have a marginal preference for creating topologies on the fly vs. reading files, but if you already have PDB files sitting around those would be fine

timbernat · 2024-07-09T19:37:30Z

Alrighty, I've moved my changes to a new branch ("lammps-hotfix"), opened a new PR for these changes, and written a unit test to verify that the changes take. I rewrote it such that the Topology could be generated on the fly (no bundled PDB data necessary), but couldn't avoid the creation of a temporary .lammps file to test the correct final output was produced; everything else in the test is basically self-contained.

For full disclosure, I've never used Pytest before and wasn't clear what the role of the decoration and formatting of the TestLammps class was, so I wrote my test to simply return bool (True if passing, False otherwise). I imagine this is relatively easy to reformat if needed, but I would be interested to know what's involved in that if that's the case. In an environment made from the dev_env.yaml requirements on my machine, the tests runs and returns as expected, namely False with the current main code and True with my modifications.

Hope this will pass the review, if so I'd be happy to dig into the related GROMACS/AMBER issue and put forth further LAMMPS improvements as I spot them. Thanks!

mattwthompson · 2024-07-09T20:03:10Z

I've never used Pytest before

Well, ya fooled me. You seem to understand enough to write tests, but here's a good resource on the basics if you want to start from the ground up again: https://education.molssi.org/python_scripting_cms/08-testing/index.html

wasn't clear what the role ...

There isn't much of consequence in the structure of the TestLammps class, and in fact it's just a way of organizing tests in files that may have dozens of them. See, for example, some test files in the toolkit that have many tests grouped up into various TestThing classes purely for bookkeepping: https://github.com/openforcefield/openff-toolkit/blob/e34e786c2fc62a7dc7d217465c0217c44b9f4500/openff/toolkit/_tests/test_parameters.py

timbernat and others added 3 commits July 1, 2024 15:39

Added mol ID salt to molecule hash to permit unique ID numbers for ch…

86c85ee

…emically-identical Molecules in a Topology

Added hashing of salted mol string for greatest consistency with prio…

a26e09f

…r implementation

[pre-commit.ci] auto fixes from pre-commit.com hooks

dec84f5

for more information, see https://pre-commit.ci

timbernat mentioned this pull request Jul 1, 2024

LAMMPS atom writer makes assigning unique IDs to chemically-identical molecules impossible #998

Closed

timbernat mentioned this pull request Jul 9, 2024

LAMMPS non-unique molecule ID hotfix #1000

Merged

5 tasks

timbernat closed this Jul 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix to non-unique LAMMPS molecule IDs #999

Fix to non-unique LAMMPS molecule IDs #999

timbernat commented Jul 1, 2024 •

edited

Loading

codecov bot commented Jul 1, 2024 •

edited

Loading

j-wags commented Jul 2, 2024

timbernat commented Jul 3, 2024

mattwthompson commented Jul 8, 2024

timbernat commented Jul 9, 2024 •

edited

Loading

mattwthompson commented Jul 9, 2024

Fix to non-unique LAMMPS molecule IDs #999

Fix to non-unique LAMMPS molecule IDs #999

Conversation

timbernat commented Jul 1, 2024 • edited Loading

Description

Checklist

codecov bot commented Jul 1, 2024 • edited Loading

Codecov Report

j-wags commented Jul 2, 2024

timbernat commented Jul 3, 2024

mattwthompson commented Jul 8, 2024

timbernat commented Jul 9, 2024 • edited Loading

mattwthompson commented Jul 9, 2024

timbernat commented Jul 1, 2024 •

edited

Loading

codecov bot commented Jul 1, 2024 •

edited

Loading

timbernat commented Jul 9, 2024 •

edited

Loading