Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix to non-unique LAMMPS molecule IDs #999

Closed
wants to merge 3 commits into from

Conversation

timbernat
Copy link
Contributor

@timbernat timbernat commented Jul 1, 2024

Description

Addresses issue 998, wherein distinct but chemically-identical molecules are forced to have the same molecule ID when writing the atom table section of LAMMPS files.

Checklist

  • Add molecule_index to CTAB hash to ensure uniqueness
  • Ensure this PR passes existing tests (single-line edit)
  • Lint

Copy link

codecov bot commented Jul 1, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.32%. Comparing base (a649485) to head (dec84f5).

Additional details and impacted files

@j-wags
Copy link
Member

j-wags commented Jul 2, 2024

I'll let @mattwthompson give the final review when he's back next week, but in the meantime, there are three things that would improve this PR:

  • Could you move this from a fork to a branch? I've just added you to the openforcefield "developers" team so you should be able to make branches now. Feel free to ping me on Slack if you'd like to screenshare to do this. You'll probably need to close this PR and open a new one with the changes in a branch.
  • Is it possible to fix this further upstream for all writers since @PEFrankel mentioned having a similar issue with gromacs here?
  • Could you add a minimal test to ensure that this bug doesn't get reintroduced?

@timbernat
Copy link
Contributor Author

Can do! To address point-by-point:

  • I can move the change to a branch without too much trouble, yes. There is an extant lammps-improvements branch which is 300+ commits behind main (and crucially does not contain the code I'd like to modify), should I try to pull into it from main to bring it up-to-date or would it be simpler for me to just create a new branch for these edits?
  • I'm not sure at this point in time if an upstream fix is possible, as I've never used the Interchange GMX interface myself. I'd like to make the changes to the LAMMPS writers first, and then once @PEFrankel provides a more detailed write-up of the GMX issues I can maybe dig thru the source to try and hunt down their cause.
  • I'd be more than happy to write a test! Just a few clarifying questions:
    • I assume it will live here?
    • Are there any prescribed requirements by OpenFF (behavioral, formatting, or otherwise) for unit tests that I should follow when writing it?
    • If I'd like to include data that the test references (namely a PDB file), I assume that it should live in openff.interchange._tests.data and be referenced by openff.utilities.utilities.get_data_file_path(), is that correct?
      Thanks for the help!

@mattwthompson
Copy link
Member

+1 on introducing these changes from a branch instead of a fork. Please also base the changes off of the current main branch if at all possible. That other branch can be ignored unless you need something from it. It looks like these changes are small, which makes it feasible to review.

Feel free to focus these changes on LAMMPS but please do add a comment like "TODO: See if this change is also needed in the GROMACS/Amber exporters" so it's at least noted somewhere. I'd be happy if Patrick or somebody carried the torch to another changeset (though I'm not sure if this problem also applied to GROMACS files).

I assume it will live here

Your tests can live there or here or anywhere that looks halfway appropriate. Interchange's test suite isn't very well-organized at the moment and I'm more concerned about the tests existing than being in the right place immediately.

Are there any prescribed requirements by OpenFF

Not really. Formatting is handled by automation. I'd like to think our tests follow "Arrange-Act-Assert" either in spirit or in a partially literal sense. What I look for in reviewing is if I can understand what the test is trying to do and that the test actually checks everything that it sets out to check (sometimes this takes several assert statements). A good reproducing example is most of the work of writing good tests, so you should be able to yoink some of your work in #998 over into tests.

If I'd like to include data [...]

Yep, precisely. I have a marginal preference for creating topologies on the fly vs. reading files, but if you already have PDB files sitting around those would be fine

@timbernat
Copy link
Contributor Author

timbernat commented Jul 9, 2024

Alrighty, I've moved my changes to a new branch ("lammps-hotfix"), opened a new PR for these changes, and written a unit test to verify that the changes take. I rewrote it such that the Topology could be generated on the fly (no bundled PDB data necessary), but couldn't avoid the creation of a temporary .lammps file to test the correct final output was produced; everything else in the test is basically self-contained.

For full disclosure, I've never used Pytest before and wasn't clear what the role of the decoration and formatting of the TestLammps class was, so I wrote my test to simply return bool (True if passing, False otherwise). I imagine this is relatively easy to reformat if needed, but I would be interested to know what's involved in that if that's the case. In an environment made from the dev_env.yaml requirements on my machine, the tests runs and returns as expected, namely False with the current main code and True with my modifications.

Hope this will pass the review, if so I'd be happy to dig into the related GROMACS/AMBER issue and put forth further LAMMPS improvements as I spot them. Thanks!

@mattwthompson
Copy link
Member

I've never used Pytest before

Well, ya fooled me. You seem to understand enough to write tests, but here's a good resource on the basics if you want to start from the ground up again: https://education.molssi.org/python_scripting_cms/08-testing/index.html

wasn't clear what the role ...

There isn't much of consequence in the structure of the TestLammps class, and in fact it's just a way of organizing tests in files that may have dozens of them. See, for example, some test files in the toolkit that have many tests grouped up into various TestThing classes purely for bookkeepping: https://github.com/openforcefield/openff-toolkit/blob/e34e786c2fc62a7dc7d217465c0217c44b9f4500/openff/toolkit/_tests/test_parameters.py

@timbernat timbernat closed this Jul 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants