🌟 [FEATURE] OpenMM #288

asedova · 2022-12-22T15:25:19Z

Is your feature request related to a problem? Please describe.
Using LAMMPS can be a bit of a pain.

Describe the solution you'd like
OpenMM is a really easy to use, Pythonic MD code that also allows a user-defined set of forces and runs ridiculously fast on one GPU. It seems they are also staring to incorporate some ML methods. It would be amazing if one could run NEQUIP MD with OpenMM.

Describe alternatives you've considered
ASE is an option, but wouldn't get the speed of a GPU-accelerated MD code. OpenMM will JIT user-defined forces to the GPU.

asedova · 2022-12-22T19:46:41Z

Left an question on OpenMM's github about how this would work, and here is the reply: openmm/openmm-ml#48 (comment) I am about to see how hard this would be for a high-level user to do themselves. Any help would be appreciated!

Linux-cpp-lisp · 2022-12-23T01:07:50Z

Hi @asedova ,

Thanks very much for your interest in our methods and working on this integration!

We previously had some discussion about this (#76) and the great news is that the main thing that was missing---a PyTorch neighborlist--- has now been implemented very nicely by Felix Musil with good performance and what appear to be pretty rigorous correctness tests: https://github.com/felixmusil/torch_nl. So it's now the perfect time for a "high-level user" to get this done finally! 🎉

Reading @peastman 's comment in the linked thread, I think the best way forward would be a 1-to-1 translation of the OpenMM-ML ANI interface https://github.com/openmm/openmm-ml/blob/main/openmmml/models/anipotential.py.

It would be something like:

Load a deployed NequIP model using nequip.scripts.deploy.load_deployed_model (nequip.ase.NequIPCalculator is a good example of this); the result is a TorchScript model.
Similar to ANIForce, define a wrapper around the deployed NequIP model that:
- Uses torch_nl to construct edge_index for NequIP from the positions and, optionally, cell + PBC. It is important to set edge_cell_shift correctly as well, see here for how we do this with the ase neighborlist. For complete PBC correctness, make sure to replicate the self-interaction filtering logic as well.
- Builds a NequIP input dictionary (Dict[str, torch.Tensor], keys can be found here). Necessary keys are pos, edge_index, atom_types and, if PBC are used, cell and edge_cell_shift.
- Somehow map OpenMM atom types / species into the NequIP nequip.data.AtomicDataDict.ATOM_TYPES_KEY 0-based index. See again (nequip.ase.NequIPCalculator for an example of how to do this. I'm not super familiar with how the OpenMM type machinery works, but basically there should be a user-defined mapping of OpenMM types into NequIP type names, which the code then internally uses to create a mapping of OpenMM types to NequIP type numbers.
- Call the deployed NequIP model with this dict as input, and get the total energy and forces from the result: model(in_dict)[nequip.data.AtomicDataDict.TOTAL_ENERGY_KEY / FORCE_KEY]
- Do any unit conversions according to the user's provided conversions
- Return this total energy and forces, see https://github.com/openmm/openmm-torch#computing-forces-in-the-model
torch.jit.script this whole thing, then torch.jit.save it to a tempfile and pass it along to OpenMM-Torch.

It would be very nice to see this PRed into OpenMM-ML; please let me know if you have more questions, and feel free to ask/open PRs (off develop) here if there are any additions to nequip needed to make it work.

Linux-cpp-lisp · 2022-12-23T01:20:57Z

OK @asedova I've edited the above to reflect more recent versions of OpenMM-Torch 🙂

asedova · 2022-12-23T01:46:41Z

Thanks so much! I am going to try to see what I can do. I just started looking at NEQUIP after some time with DeePMDkit, so it may take me some time to work through all this. So happy there is momentum in this direction!

Linux-cpp-lisp · 2022-12-23T02:13:32Z

Great! Let me know if you have any questions. It looks like a lot but I think it should hopefully be straightforward, mostly "copy-paste" style coding.

peastman · 2022-12-23T04:38:14Z

there should be a user-defined mapping of OpenMM types into NequIP type names, which the code then internally uses to create a mapping of OpenMM types to NequIP type numbers.

OpenMM doesn't have any fixed set of types. It treats atom types as a property of a particular force field. addForces() receives a Topology object which contains chemical information: elements, bonds, residue names, etc. No set of atom types have been applied to them.

How does NequIP define atom types?

Linux-cpp-lisp · 2022-12-23T06:17:46Z

It treats atom types as a property of a particular force field.

That sounds quite compatible.

In nequip, atom types are just 0-based integer indexes of which there can be arbitrarily many. Each is also associated with a string name for usability. What they mean is specific to a given trained model and how the user did the training, but typically the names just correspond to element symbols. So clearly there should be some kind of user-defined mapping from OpenMM "groups" of atoms (however that is mostly naturally defined) to nequip type names; the user is then responsible for making sure this mapping makes sense for the model they are using.

Similarly, nequip places no restriction on system of units, as long as they are self-consistent; a trained model has the units of the data used to train it. So it's also necessary to allow the user to supply the unit (conversion) from the units of the particular model they are using into kJ/mol for OpenMM.

asedova · 2022-12-23T13:17:41Z

@peastman what troubles me is that with NEQUIP (and other similar codes like DeePMDkit) there is no topology, no bonds, no residues. And for good reason-- since it's trained on DFT, these all can actually change during a simulation, just like in AIMD. I'm not super familiar with ANI, but with these codes, input file would just be atom type (element) + features in, then the model gets energy + forces on each atom. The most compatible traditional input file to use as an example would be an xyz file, like CP2K uses.

Linux-cpp-lisp · 2022-12-23T16:47:50Z

receives a Topology object which contains chemical information: elements, bonds, residue names, etc.

I think as a first solution, you can just ignore all of that, take the elements, convert them to chemical symbols, and use those as NequIP type names? For most models this will be sufficient... what is missing would be something like how do you specify a model that treats, say, the oxygen atoms in one molecule as different from the oxygen atoms in a second molecule. That is clearly some kind of a function of the topology, I'm just not sure in OpenMM language how you let the user specify that.

asedova · 2022-12-23T16:56:25Z

That would be ok-- for now, we can let the model decide how the oxygen will act in different contexts, which is what I planned to start with anyway. I also think OpenMM would be ok if the topology object is not coming from any of the normal topology files from packages like AMBER and GROMACS. OpenMM in general allows a good amount of user-defined things, it's just not how I normally use it so I need to do some reading up. Here's an example: https://gpantel.github.io/computational-method/LJsimulation/

peastman · 2022-12-23T17:19:28Z

what is missing would be something like how do you specify a model that treats, say, the oxygen atoms in one molecule as different from the oxygen atoms in a second molecule.

In any model that allows forming and breaking bonds, I think the only option is to equate atom types with elements. Everything else can change with time. If the molecules are required to stay fixed over a simulation, that does allow incorporating more information into the atom types. I've experimented with including formal charges and/or hybridization states into the definitions of types, and it does seem to help accuracy.

Most force fields in OpenMM identify atom types by matching templates to residues. That isn't required though. For example, OpenFF uses a completely different mechanism of assigning parameters based on matching SMIRKS patterns (and not involving atom types at all). For ML models, you can build your model however you want. The PyTorch model and the Python code to build it are both completely in your control.

Linux-cpp-lisp · 2022-12-23T17:43:17Z

👍 I vote then to start with just taking OpenMM element -> chemical symbol as NequIP type name, possibly with an option of letting the user provide their own function that maps OpenMM topology -> NequIP type names, since every trained NequIP model will be different in this way.

Linux-cpp-lisp · 2023-02-06T18:54:25Z

Hi @asedova ,

Just curious if you'd had any luck with this. Thanks!

sef43 · 2023-02-13T18:58:44Z

Hi @asedova @Linux-cpp-lisp how is progress on this going? I am an RSE working with OpenMM, I have time available to help write/write a wrapper in openmm-ml for NequIP. Let me know if you have any questions and how far you have got with this!

asedova · 2023-02-13T22:27:51Z

Hey all-

Progress is slow, since we have never used NEQUIP that much. We are much more familiar with DeePMD. I should be able to get back to this soon; we are finishing up some DeePMD work. @sef43 I was wondering if there would be a way to pass the virial/stress tensor to OpenMM, based on this: #291 (comment)

Linux-cpp-lisp · 2023-02-13T23:33:11Z

@sef43 hi, thank you for reaching out! 👋 We'd love to see this happen, so your help and hopefully leadership on it as an RSE who knows OpenMM well would be fantastic.

I think all my relevant thoughts are collected in the post above (#288 (comment)), but please feel free to get in touch either here or by email (see my GitHub profile) if there is more we should discuss.

@asedova no worries; do you have any code we ought work off, or would it be better for @sef43 to start a fresh repo / fork to work on?

(I'm assuming @sef43 that, as laid out above, all of this wrapper should live in the fork of openmm-ml and eventually be merged there, but if there is code that we think is more appropriate in our repos we can make that happen easily too.)

Linux-cpp-lisp · 2023-02-13T23:48:48Z

I've just added a few more notes to the post above with information on how we can go about this, FYI.

Regarding stress/virial @asedova, my understanding is that OpenMM doesn't support or need it: openmm/openmm-torch#91 (comment)

asedova · 2023-02-14T00:40:51Z

I would suggest just starting with a basic box of water example first, as done in the nature comms paper. Our systems are really difficult and we are still working out some of the science. Has getting a condensed-phase NNP to work with OpenMM happened before? I am confused because it seems there is a lot of gas phase work within ANI/TorchANI

Linux-cpp-lisp · 2023-02-14T06:00:22Z

Sure--- I think, however, that the science questions that you bring up are fairly decoupled from the software-level interfacing questions, which are fairly mechanical and binary (either they correctly transfer the predictions of the neural network potential or not, regardless of how useful / accurate that potential model is)

At the software level, I think the biggest difficulty moving from gas to condensed phase is PBC, but torch-nl should handle that as described above.

sef43 · 2023-02-14T13:41:21Z

I will start doing this then!

(I'm assuming @sef43 that, as laid out above, all of this wrapper should live in the fork of openmm-ml and eventually be merged there, but if there is code that we think is more appropriate in our repos we can make that happen easily too.)

Yes I will make a fork with the aim that it ends up merged into openmm-ml main, It shouldn't require any changes to the nequip repo. You steps in the comment above are very clear thank you, should be most of what I need, I will let you know when I have questions.

As @Linux-cpp-lisp has said there should be no major reasons why condensed phase simulations wouldn't work provided an appropriately trained neural network is used.
And yes OpenMM uses a Monte-Carlo barostat for NPT simulations, this means it does not do, and does not need, a virial calculation.

Linux-cpp-lisp · 2023-02-14T16:24:51Z

Fantastic! Thanks again @sef43 , looking forward to having this!

It shouldn't require any changes to the nequip repo. You steps in the comment above are very clear thank you, should be most of what I need, I will let you know when I have questions.

Sounds good, feel free to reach out any time here or by email as is convenient for you.

sef43 · 2023-02-15T13:32:51Z

@Linux-cpp-lisp I have a WIP fork branch here: https://github.com/sef43/openmm-ml/tree/nequip

I have done a first implementation. It works for the Toluene example, I have put an example here, which can be run in colab

It uses a trained deployed model I ran using your example nequip-train configs/example.yaml

Things I still need to do:

Implement PBCs properly. Can you provide me with a test case that is a liquid/condensed phase? I will need to be able to run the training, as there are some issues with pytorch versions (more below)
cleanup/check the way atom types/atomic numbers are used
test on GPUs
deal with different unit conversions, currently angstrom and kcal/mol from toluene example are hardcoded

Some issues:

I think the version of torch used by nequip that comes from pip (pip install torch) is not compatible with the version openmm-torch uses (conda install -c conda-forge pytorch). I got around this by using nequip to train the toluene example with conda-forge pytorch.
when i use nequip.scripts.deploy.load_deployed_model I get errors which I have put below. If I instead just use torch.jit.load directly then it works. Do you know why this might be the case? It looks like load_deployed_model also just calls torch.jit.load

pytorch error from using nequip.scripts.deploy.load_deployed_model:


/Users/stephen/miniconda3/envs/neqmm/lib/python3.10/site-packages/nequip/utils/_global_options.py:58: UserWarning: Setting the GLOBAL value for jit fusion strategy to `[('DYNAMIC', 3)]` which is different than the previous value of `[('STATIC', 2), ('DYNAMIC', 10)]`
  warnings.warn(
Traceback (most recent call last):
  File "/Users/stephen/Documents/nequip/openmm-ml/example/run_nequip.py", line 19, in <module>
    system = potential.createSystem(pdb.topology)
  File "/Users/stephen/miniconda3/envs/neqmm/lib/python3.10/site-packages/openmmml/mlpotential.py", line 178, in createSystem
    self._impl.addForces(topology, system, None, 0, **args)
  File "/Users/stephen/miniconda3/envs/neqmm/lib/python3.10/site-packages/openmmml/models/nequippotential.py", line 169, in addForces
    module = torch.jit.script(nequipforce)
  File "/Users/stephen/miniconda3/envs/neqmm/lib/python3.10/site-packages/torch/jit/_script.py", line 1286, in script
    return torch.jit._recursive.create_script_module(
  File "/Users/stephen/miniconda3/envs/neqmm/lib/python3.10/site-packages/torch/jit/_recursive.py", line 458, in create_script_module
    return create_script_module_impl(nn_module, concrete_type, stubs_fn)
  File "/Users/stephen/miniconda3/envs/neqmm/lib/python3.10/site-packages/torch/jit/_recursive.py", line 520, in create_script_module_impl
    script_module = torch.jit.RecursiveScriptModule._construct(cpp_module, init_fn)
  File "/Users/stephen/miniconda3/envs/neqmm/lib/python3.10/site-packages/torch/jit/_script.py", line 615, in _construct
    init_fn(script_module)
  File "/Users/stephen/miniconda3/envs/neqmm/lib/python3.10/site-packages/torch/jit/_recursive.py", line 500, in init_fn
    cpp_module.setattr(name, scripted)
RuntimeError: Could not cast attribute 'model' to type __torch__.torch.jit._script.RecursiveScriptModule (of Python compilation unit at: 0x6000032964f8): Expected a value of type '__torch__.torch.jit._script.RecursiveScriptModule (of Python compilation unit at: 0x6000032964f8)' for field 'model', but found '__torch__.nequip.nn._rescale.___torch_mangle_0.RescaleOutput (of Python compilation unit at: 0x60000326ee58)'
Exception raised from setattr at /Users/runner/miniforge3/conda-bld/pytorch-recipe_1664817728005/work/torch/csrc/jit/api/object.h:67 (most recent call first):
frame #0: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 92 (0x110c0559c in libc10.dylib)
frame #1: torch::jit::Object::setattr(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, c10::IValue) + 1312 (0x1344d6794 in libtorch_python.dylib)
frame #2: void pybind11::cpp_function::initialize<torch::jit::initJitScriptBindings(_object*)::$_3, void, torch::jit::Object&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, pybind11::object, pybind11::name, pybind11::is_method, pybind11::sibling>(torch::jit::initJitScriptBindings(_object*)::$_3&&, void (*)(torch::jit::Object&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, pybind11::object), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::'lambda'(pybind11::detail::function_call&)::__invoke(pybind11::detail::function_call&) + 840 (0x1346775e8 in libtorch_python.dylib)
frame #3: pybind11::cpp_function::dispatcher(_object*, _object*, _object*) + 3464 (0x13400e71c in libtorch_python.dylib)
frame #4: cfunction_call + 80 (0x10434ca68 in python3.10)
frame #5: _PyObject_MakeTpCall + 612 (0x1042fa050 in python3.10)
frame #6: method_vectorcall + 268 (0x1042fd770 in python3.10)
frame #7: call_function + 524 (0x1043e8e38 in python3.10)
frame #8: _PyEval_EvalFrameDefault + 26388 (0x1043e4bc8 in python3.10)
frame #9: _PyEval_Vector + 2056 (0x1043ddba8 in python3.10)
frame #10: call_function + 524 (0x1043e8e38 in python3.10)
frame #11: _PyEval_EvalFrameDefault + 26500 (0x1043e4c38 in python3.10)
frame #12: _PyEval_Vector + 2056 (0x1043ddba8 in python3.10)
frame #13: call_function + 524 (0x1043e8e38 in python3.10)
frame #14: _PyEval_EvalFrameDefault + 26388 (0x1043e4bc8 in python3.10)
frame #15: _PyEval_Vector + 2056 (0x1043ddba8 in python3.10)
frame #16: call_function + 524 (0x1043e8e38 in python3.10)
frame #17: _PyEval_EvalFrameDefault + 26500 (0x1043e4c38 in python3.10)
frame #18: _PyEval_Vector + 2056 (0x1043ddba8 in python3.10)
frame #19: call_function + 524 (0x1043e8e38 in python3.10)
frame #20: _PyEval_EvalFrameDefault + 26388 (0x1043e4bc8 in python3.10)
frame #21: _PyEval_Vector + 2056 (0x1043ddba8 in python3.10)
frame #22: call_function + 524 (0x1043e8e38 in python3.10)
frame #23: _PyEval_EvalFrameDefault + 26388 (0x1043e4bc8 in python3.10)
frame #24: _PyEval_Vector + 2056 (0x1043ddba8 in python3.10)
frame #25: method_vectorcall + 516 (0x1042fd868 in python3.10)
frame #26: _PyEval_EvalFrameDefault + 27276 (0x1043e4f40 in python3.10)
frame #27: _PyEval_Vector + 2056 (0x1043ddba8 in python3.10)
frame #28: call_function + 524 (0x1043e8e38 in python3.10)
frame #29: _PyEval_EvalFrameDefault + 26348 (0x1043e4ba0 in python3.10)
frame #30: _PyEval_Vector + 2056 (0x1043ddba8 in python3.10)
frame #31: run_mod + 216 (0x104438d7c in python3.10)
frame #32: _PyRun_SimpleFileObject + 1264 (0x10443881c in python3.10)
frame #33: _PyRun_AnyFileObject + 240 (0x1044377f4 in python3.10)
frame #34: Py_RunMain + 2340 (0x10445c8ac in python3.10)
frame #35: pymain_main + 1272 (0x10445da68 in python3.10)
frame #36: main + 56 (0x1042a3dc8 in python3.10)
frame #37: start + 2544 (0x198003e50 in dyld)

Linux-cpp-lisp · 2023-02-15T15:45:06Z

@sef43 fantastic! That was quick 😆 , thanks!

Regarding the issues, sort of in order:

I've now removed on develop the explicit dependence of nequip on torch so that the user's installation takes precedence, since pip is really bad about trying to override your existing, correct PyTorch installation. We always use PyTorch from conda instead, but also always take from PyTorch's own channels, something like: conda install pytorch pytorch-cuda=11.7 -c pytorch -c nvidia... but I'm not sure how that would interact with OpenMM-torch's dependencies.
We support presently PyTorch 1.11 and 1.13 (2.0 should work once released but is untested; 1.12 is broken due to PyTorch bugs).
Using nequip.scripts.deploy.load_deployed_model(..., freeze=False) should fix it.

Implement PBCs properly. Can you provide me with a test case that is a liquid/condensed phase? I will need to be able to run the training, as there are some issues with pytorch versions (more below)

Two easy options would be:

Simple test data on a bulk metal from the EMT classical potential (see minimal_toy_emt.yaml):

dataset: EMTTest                                                                       # type of data set, can be npz or ase
dataset_element: Cu
dataset_supercell: [4, 4, 4]
dataset_num_frames: 50
chemical_symbols:
  - Cu

The silver dataset from our Allegro paper (again a one species bulk metal), if you prefer real data: Ag_warm_nospin.xyz (https://archive.materialscloud.org/record/2022.128)

Both should be in units of eV and Å. If it's easier, I'm happy to train a model quickly and email it to you as well if you prefer.

cleanup/check the way atom types/atomic numbers are used

Am I right in understanding that in OpenMM atom type == atomic number, and that OpenMM only reasons about atoms in terms of their atomic numbers? Is that why you have it set up to possibly allow mapping multiple nequip types to one OpenMM atomic number?

deal with different unit conversions, currently angstrom and kcal/mol from toluene example are hardcoded

👍 yep, NequIP/Allegro is entirely agnostic to units and preserves those of the training data (as you've seen), so a user-facing option in the OpenMM wrapper seems appropriate.

A finally, some random things:

https://github.com/sef43/openmm-ml/blob/nequip/openmmml/models/nequippotential.py#L128

batch is not required when there is only one example in the batch, and should be omitted.

sef43 · 2023-02-15T17:08:26Z

Am I right in understanding that in OpenMM atom type == atomic number, and that OpenMM only reasons about atoms in terms of their atomic numbers? Is that why you have it set up to possibly allow mapping multiple nequip types to one OpenMM atomic number?

I wouldn't read anything into what I have currently done, I need to understand a bit more about the nequip types!

Each particle in OpenMM is an Atom object which will be a specific Element, so you can go atom.element.atomic_number to get the atomic number and atom.element.symbol to get the chemical symbol.

As I understand nequip types are zero indexed and user defined. So there can be multiple types for the same chemical element. OpenMM is not going to know about this from reading in files (it's own concept of atom types and atom classes is tied to which ForceField you are using). The most flexible way would be to explicitly specify an array of length Natoms containing the nequip types of each atom and pass this to the MLP constructor, but not very user friendly

We support presently PyTorch 1.11 and 1.13 (2.0 should work once released but is untested; 1.12 is broken due to PyTorch bugs).

Ah ok I will avoid PyTorch 1.12 then

Linux-cpp-lisp · 2023-02-16T05:09:03Z

As I understand nequip types are zero indexed and user defined. So there can be multiple types for the same chemical element.

This is correct, yes.

The most flexible way would be to explicitly specify an array of length Natoms containing the nequip types of each atom and pass this to the MLP constructor, but not very user friendly

I see--- that sounds like a useful, optional choice to have, but I agree that it is an unpleasant interface for the common cases, which are best handled with a mapping. Probably best to restrict the mapping to one-to-one atomic numbers to types and let more complicated cases be handled explicitly, though, to avoid easy but hard to debug issues.

(Incidentally, the common case for nequip is to use our chemical_symbols option, where the user gives something like:

chemical_symbols:
 - H
 - C
 - O

which then sorts them by atomic number (https://github.com/mir-group/nequip/blob/main/nequip/data/transforms.py#L30-L36) and assigns consecutive atom types starting at zero. The other common case is chemical_symbol_to_type, where the mapping of atomic number to type is explicitly given. These are both train-time options, and are intentionally not propagated to deployed models, where users are expected to be explicit about their simulation setup to avoid challenging bugs for a mismatch of expectations around "reasonable" behavior.)

peastman · 2023-02-16T18:28:17Z

We can generate atom types in pretty much any way we want. The only requirement is that it has to be possible to determine them from the arguments to addForces(). As long as that's true, we can write whatever code is needed.

sef43 · 2023-02-17T12:12:23Z

I've now added PBC support, It seems to work for the minimal_toy_emt.yaml system, where my definition of "working" is that it runs a stable NPT simulation, It need to be tested it on some real systems, and It should be compared to the lammps version.

I have set the NequIP types default behaviour to be a 1 to 1 mapping from the OpenMM atomic element/symbol, with the option of passing a list that explicitly defines the zero indexed NequIP types of each atom. And conversion factors from the model units to OpenMM units need to be given when constructing the potential.

It should work on both GPU and CPU

I have put an example of running a deployed toy EMT model with PBC and NPT:
https://github.com/sef43/openmm-ml/blob/nequip/examples/run_nequip_pbc.ipynb

peastman · 2023-03-18T18:03:11Z

This code

shifts = torch.round(triu_deltas/cell_size)
wrapped_triu_deltas = triu_deltas - shifts*cell_size

is only accurate for rectangular boxes. If you want to support triclinic boxes its a little more complicated. Still not very complicated though.

sef43 · 2023-03-20T10:18:54Z

This code
shifts = torch.round(triu_deltas/cell_size)
wrapped_triu_deltas = triu_deltas - shifts*cell_size
is only accurate for rectangular boxes. If you want to support triclinic boxes its a little more complicated. Still not very complicated though.

Yes that is true, it adds some compute time. I have updated it to support OpenMM triclinic boxes. For completeness here is the benchmarks for the 3 cases:

Wrapping a triclinic box and giving to torch_nl O(n) function
Using the O(n^2) torchscript code with triclinic boxes
using NNPOps neighbour lists, and creating the full pairwise neighbour list and shift vectors from it

Updated code here: https://colab.research.google.com/drive/1Vz4QN8Y_cEUkCL2YxY86QhSruYPxfwtr?usp=sharing

@peastman this nicely demonstrates the speed of the NNPOps implementation!

once openmm/NNPOps#92 is fixed and released to the conda-forge build we can default to using NNPOps if available

Linux-cpp-lisp · 2023-03-20T21:41:25Z

Wow, very nice! What a plot...

asedova · 2023-03-28T17:41:36Z

That looks really nice!

If we want to get it into the main OpenMM-ML distribution, there are a couple of things we need to consider. First, it would be very helpful if all dependencies can be installed with conda. Otherwise we can't build and test the conda packages against them.

The other issue is more of a conceptual question. OpenMM-ML is intended to provide general purpose, ready to use models, not just model architectures. For example, you can write
potential = MLPotential('ani2x')
and immediately start simulating your own system with that potential. What you've implemented just provides support for an architecture. It requires the user to provide their own model. That's also useful, for example if you want to create a special purpose model for your particular system, but it requires a lot more work from the user. The goal of OpenMM-ML is to make ML simulations just as easy to run as ones with conventional force fields.

Are there any plans to create general purpose, ready to use models based on NequIP?

@peastman the types of systems that people using NEQUIP (or DeePMD or other such tools) are studying vary dramatically. For example our team is studying molten salts, organic solvents, ionic liquids, etc. I doubt there will be a totally generalizable general ML potential for any systems from the periodic table any time soon. Therefore, the fact that people have to train their own models is not a deal breaker by any means. We already do all that. Having a fast, Pythonic, efficient way to run the MD inference is going to be really helpful. Especially since we often need many rounds of active-learning iterative training (with potentially some enhanced sampling) before we get stuff right, like with reactive molecular systems. Here are some examples: https://www.sciencedirect.com/science/article/abs/pii/S092058612100136X
https://pubs.acs.org/doi/10.1021/acs.jctc.2c00816?ref=PDF
https://chemrxiv.org/engage/chemrxiv/article-details/638e25b2836ceb2e8b74cdf2

JMorado · 2024-04-23T15:12:48Z

Hi,

I am taking over this task from @sef43. Could you please clarify if there are any changes to the NequIP code that impact this feature? I have been working on this implementation, and it is working fine for me. The major change is that the NNPops neighbour list is now being used.

Perhaps worth discussing is the PyTorch version that is installed when following a standard installation like this:

conda install -c conda-forge openmm-torch nnpops 
pip install git+https://github.com/mir-group/nequip@develop
pip install git+https://github.com/sef43/openmm-ml@nequip

The output of conda list | grep torch gives me the following:

libtorch                  2.1.2           cuda120_h2aa5df7_303    conda-forge
openmm-torch              1.4             cuda120py311h478a873_4    conda-forge
pytorch                   2.1.2           cuda120_py311h25b6552_303    conda-forge
torch-ema                 0.3                      pypi_0    pypi
torch-runstats            0.2.0                    pypi_0    pypi
torchani                  2.2.4           cuda120py311he2766f7_3    conda-forge

From this, we can clearly see that the PyTorch version is greater than 1.11, and therefore, I see the following warning when running this example:

/home/joaomorado/software/miniconda3/envs/nq/lib/python3.11/site-packages/nequip/utils/_global_options.py:59: UserWarning: !! Upstream issues in PyTorch versions >1.11 have been seen to cause unusual performance degredations on some CUDA systems that become worse over time; see https://github.com/mir-group/nequip/discussions/311. At present we *strongly* recommend the use of PyTorch 1.11 if using CUDA devices; while using other versions if you observe this problem, an unexpected lack of this problem, or other strange behavior, please post in the linked GitHub issue.

Unfortunately, I don't think we can install the latest versions of NNPOps with PyTorch 1.11. As far as I understand, we would have to revert to NNPOps 0.2 for compatibility, which does not include the neighbor list.

What's your recommendation in this scenario? Is there any progress in making NequIP compatible with PyTorch >= 2.1.0?

Many thanks in advance.

Linux-cpp-lisp · 2024-05-01T23:10:20Z

Hi @JMorado ,

(CC @cw-tan re neighborlists)

Great to hear, thanks for your efforts on this!

The main issue that remains unclear for newer PyTorch versions is this: #311. I think it is reasonable to try latest stable 2.* versions and see if this issue is present. You can see the potential workaround (as yet not entirely proved) here (https://github.com/mir-group/nequip/blob/develop/nequip/utils/_global_options.py#L85-L107), which may be necessary to implement in the OpenMM interface as well? I see that you use
https://github.com/sef43/openmm-ml/blob/nequip/openmmml/models/nequippotential.py#L168
which should manage these global settings on its own (https://github.com/mir-group/nequip/blob/develop/nequip/scripts/deploy.py#L144). I think the main thing is just to test extended MD and see if there is unexpected behavior.

If you'd like to discuss in more detail, please feel free to email me and we can have a call.

JMorado · 2024-05-02T13:54:56Z

Hi @Linux-cpp-lisp,

Thank you for the clarifications!

Since the workaround is already implemented in _set_global_options, I think it's better for load_deployed_model to handle it instead of explicitly adding the workaround on the OpenMM-ML side.

I'm going to run a long MD simulation with one of the test systems and see how it behaves. I'll report the results later.

Linux-cpp-lisp · 2024-05-02T20:23:42Z

@JMorado I agree, as long as you are using load_deployed_model that should remain responsible for global state like this, as long as that doesn't conflict with some design principle / other code in OpenMM-ML.

Looking forward to hear whether it works!

JMorado · 2024-05-05T14:05:13Z

@Linux-cpp-lisp, I have made tests using toluene and Si. The performance looks pretty stable both in terms of GPU utilization and simulation speed:

I'm using PyTorch version 2.1.2:

 $ micromamba list | grep torch

    libtorch              2.1.2         cuda120_h2aa5df7_303       conda-forge
    openmm-torch          1.4           cuda120py311h478a873_4     conda-forge
    pytorch               2.1.2         cuda120_py311h25b6552_303  conda-forge
    torchani              2.2.4         cuda120py311he2766f7_3     conda-forge

Do you have any idea on why the Si model yields ca. four times more efficient simulations, even though the system is larger?

CalvinGe · 2024-05-10T17:42:02Z

I've now added PBC support, It seems to work for the minimal_toy_emt.yaml system, where my definition of "working" is that it runs a stable NPT simulation, It need to be tested it on some real systems, and It should be compared to the lammps version.

I have set the NequIP types default behaviour to be a 1 to 1 mapping from the OpenMM atomic element/symbol, with the option of passing a list that explicitly defines the zero indexed NequIP types of each atom. And conversion factors from the model units to OpenMM units need to be given when constructing the potential.

It should work on both GPU and CPU

I have put an example of running a deployed toy EMT model with PBC and NPT: https://github.com/sef43/openmm-ml/blob/nequip/examples/run_nequip_pbc.ipynb

Hi @sef43 ! Could you please upload this example again. It is not in the repo at the moment.

CalvinGe · 2024-05-13T22:57:48Z

@JMorado Hi! I have tried to create a mixed system for toluene in explicit water using NequIP for the toluene and TIP3P for the water. However, only the single point energy is correct, there are some weird value for forces on water molecules and the simulation quickly failed. While ANI works well. I guess there are some problems for NequIP implementaion when you setoutputforces=True. I'm using the example from here: https://github.com/sef43/openmm-ml/blob/nequip/test/test_nequip.py Could you check that?

JMorado · 2024-05-14T10:27:31Z

@CalvinGe, thank you for reporting. There is indeed an issue with the forces for periodic systems, which appears to be resolved (at least for pure ML systems) if the positions are not downcasted to float32. I am still trying to determine if it's the same type of issue causing hybrid simulations to fail, or if it's simply a matter of initial conditions. I will report back once I have reached a conclusion.

JMorado · 2024-05-14T15:06:45Z

What was causing your simulations to fail was actually a different issue. Since NequIPForce must return both energies and forces, it must do so for the whole system, not just for the ML region. Therefore, in hybrid ML/MM simulations, a final "zero-padded" forces tensor must be returned with shape $(N,3)$, where $N$ is the total number of atoms of the system, instead of a $(N_{ML},3)$ tensor. This explains why the code worked fine for pure ML systems.

The fix to this issue is here. Would you mind giving it a try? It works smoothly for me using the following script to run a simulation:

import openmm as mm
import openmm.app as app
import openmm.unit as unit
from openmmml import MLPotential
from sys import stdout

# Load the prmtop and inpcrd files
prmtop = app.AmberPrmtopFile("toluene-explicit.prm7")
inpcrd = app.AmberInpcrdFile("toluene-explicit.rst7")

# Define the atoms to be treated with the ML model
mlAtoms = list(range(15))

# Create the MM and ML/MM systems
mmSystem = prmtop.createSystem(nonbondedMethod=app.PME, nonbondedCutoff=0.9 * unit.nanometers)
potential = MLPotential("nequip", modelPath="toluene-nequip.pth", lengthScale=0.1, energyScale=4.184)
mixedSystem = potential.createMixedSystem(prmtop.topology, mmSystem, mlAtoms, interpolate=False)

# Add a barostat
mixedSystem.addForce(mm.MonteCarloBarostat(1 * unit.atmosphere, 300 * unit.kelvin, 25))

# Define the integrator, platform, and simulation
integrator = mm.LangevinMiddleIntegrator(300 * unit.kelvin, 1.0 / unit.picoseconds, 1 * unit.femtoseconds)
platform = mm.Platform.getPlatformByName("CUDA")
simulation = app.Simulation(prmtop.topology, mixedSystem, integrator, platform)

# Set the positions and velocities
simulation.context.setPositions(inpcrd.positions)
simulation.minimizeEnergy()
simulation.context.setVelocitiesToTemperature(300 * unit.kelvin)

# Run the simulation
simulation.reporters.append(
    app.StateDataReporter(
        stdout, 100, step=True, potentialEnergy=True, temperature=True, speed=True
    )
)

simulation.step(1000)

CalvinGe · 2024-05-14T15:17:45Z

@JMorado It works well for my test! The forces for MM region in ML/MM now also matches with the pure MM forces. Thank you very much!

Linux-cpp-lisp · 2024-05-15T03:21:25Z

Just to clarify, this means that the NequIP OpenMM integration supports ML/MM? Are there specific pieces of setup necessary to run in that mode?

Linux-cpp-lisp · 2024-05-15T03:23:21Z

@JMorado

Do you have any idea on why the Si model yields ca. four times more efficient simulations, even though the system is larger?

According to your plot this is a NequIP and an Allegro model, so I would not expect them to be directly comparable.

JMorado · 2024-05-15T12:06:49Z

Just to clarify, this means that the NequIP OpenMM integration supports ML/MM? Are there specific pieces of setup necessary to run in that mode?

Yes, it does support hybrid ML/MM simulations. There aren't really any specific pieces of setup required, just the standard way of first creating the MM and MLPotential, defining the ML atoms, and then setting up the mixed system. The procedure is described here in more detail.

marvcks · 2024-05-28T04:50:19Z

Just to clarify, this means that the NequIP OpenMM integration supports ML/MM? Are there specific pieces of setup necessary to run in that mode?

Yes, it does support hybrid ML/MM simulations. There aren't really any specific pieces of setup required, just the standard way of first creating the MM and MLPotential, defining the ML atoms, and then setting up the mixed system. The procedure is described here in more detail.

How do you deal with the interaction between ML and MM atoms?

JMorado · 2024-05-28T09:16:06Z

Just to clarify, this means that the NequIP OpenMM integration supports ML/MM? Are there specific pieces of setup necessary to run in that mode?

Yes, it does support hybrid ML/MM simulations. There aren't really any specific pieces of setup required, just the standard way of first creating the MM and MLPotential, defining the ML atoms, and then setting up the mixed system. The procedure is described here in more detail.

How do you deal with the interaction between ML and MM atoms?

Mechanical embedding is used, meaning that the interaction between ML and MM atoms is treated at the MM level using standard Coulomb and Lennard-Jones 12-6 potentials if you're using class I FFs.

asoyemi1 · 2024-09-04T19:14:15Z

Hi! I was wondering if one could add some padding to the cutoff distance when building the neighbor list in a similar way as done in LAMMPS when running Nequip in OpenMM.

JMorado · 2024-09-04T20:58:57Z

Hi! I was wondering if one could add some padding to the cutoff distance when building the neighbor list in a similar way as done in LAMMPS when running Nequip in OpenMM.

What do you exacly mean by adding some padding to the cutoff distance when building the neighbours list? I'm not familiar with the LAMMPS implementation.

asoyemi1 · 2024-09-04T21:12:41Z

@JMorado my understanding of what is done in LAMMPS and in Openmm is that adding some extra distance to the cutoff for building the neighbor list for nonbonded interactions lets you build the neighbor list less often and thus push the performance up a little. It seems to me, that this parameter is hardcoded in OpenMM but I was wondering if there's room to optimize on the Nequip side?

peastman · 2024-09-05T01:38:39Z

I don't think you would want to do that. With a point charge force field, building the neighbor list is relatively expensive, so it's worth computing a few extra nonbonded interactions at each step if it lets you rebuild the neighbor list less often. But computing a NequIP model is far more expensive than a simple Coulomb interaction. The cost of the extra interactions would almost certainly outweigh any savings to building the neighbor list.

JMorado · 2024-09-05T10:21:56Z

@peastman, I don't think we would need to calculate the additional interactions using the MLP itself. Instead, we could calculate the list of neighbors every X iterations and, based on that list, recalculate the distances at each iteration. We would then filter out the pairs of edges with interactions outside the cutoff and calculate the shifts for those. That said, the MLP calculation would still be the bottleneck, so I'm not entirely sure that this approach would be useful.

Based on the benchmark shown earlier in the thread, the NNPOps neighbor list implementation is quite performant and system-size independent. I don’t think much would be gained by skipping the NL calculation. Actually, the NNPOps function already provides options to skip the NL list entirely or to choose a maximum number of pairs, so perhaps exposing control of that to the user in the openmm-ml API could be an easy way to test if skipping the NL calculation would lead to any performance improvement in an ideal scenario.

CalvinGe · 2024-09-22T02:49:33Z

@CalvinGe, thank you for reporting. There is indeed an issue with the forces for periodic systems, which appears to be resolved (at least for pure ML systems) if the positions are not downcasted to float32. I am still trying to determine if it's the same type of issue causing hybrid simulations to fail, or if it's simply a matter of initial conditions. I will report back once I have reached a conclusion.

@JMorado I am using the code for periodic systems and wondering if you could elaborate more on the issue with the forces for periodic systems (will these forces be not reliable in periodic systems)? I am also wondering if it is still necessary to avoid downcasting the positions to float32? Besides, will this issue also be avoided in the same way for hybrid ML/MM systems?

JMorado · 2024-09-22T10:12:53Z

@CalvinGe, they are reliable. Initially, I mistakenly assumed this was the issue, but if you read the comment next to that, I explain that the real cause of your simulation failures was actually a different issue.

The positions are not being downcasted to float32. On the OpenMM side, the precision used for the positions is determined by the settings you choose for your simulations ('mixed' and 'double' precision use float64 for positions). The predictions of NequiP should also be float64 (see this), although there are some numerical caveats to consider if you need to compare results between different software, such as ASE and OpenMM.

For hybrid ML/MM systems it should work just as fine.

CalvinGe · 2024-09-22T13:21:46Z

@CalvinGe, they are reliable. Initially, I mistakenly assumed this was the issue, but if you read the comment next to that, I explain that the real cause of your simulation failures was actually a different issue.

The positions are not being downcasted to float32. On the OpenMM side, the precision used for the positions is determined by the settings you choose for your simulations ('mixed' and 'double' precision use float64 for positions). The predictions of NequiP should also be float64 (see this), although there are some numerical caveats to consider if you need to compare results between different software, such as ASE and OpenMM.

For hybrid ML/MM systems it should work just as fine.

@JMorado Thanks! Yes, I know that my previous issue was about padded forces for the whole system. But I am still wondering if it is necessary to avoid downcasting positions (and NequIP prediction) to float32 for periodic system? Would it be an issue if I am actually using float32?

JMorado · 2024-09-24T11:42:16Z

I do not know all the details of your system, but typically it is acceptable to calculate forces in single precision while integration is done in double precision. Generally, it's not recommended to perform all computations in single precision, as you can easily lose accuracy. That said, I'm not the best person to comment on the numerical stability of NequIP models. If you refer back to the link associated with my previous response, you'll find some information about how the model internally handles precision issues, even for float32 models.

asedova added the enhancement New feature or request label Dec 22, 2022

Linux-cpp-lisp changed the title ~~🌟 [FEATURE]~~ 🌟 [FEATURE] OpenMM Dec 23, 2022

asedova mentioned this issue Dec 23, 2022

Feature request: Interface with NEQUIP openmm/openmm-ml#48

Open

Linux-cpp-lisp mentioned this issue Jul 11, 2023

how can i conbine it with gromacs? mir-group/allegro#50

Closed

sef43 mentioned this issue Oct 4, 2023

Add support for NequIP models openmm/openmm-ml#60

Open

🌟 [FEATURE] OpenMM #288

🌟 [FEATURE] OpenMM #288

Comments

asedova commented Dec 22, 2022

asedova commented Dec 22, 2022

Linux-cpp-lisp commented Dec 23, 2022 • edited Loading

Linux-cpp-lisp commented Dec 23, 2022

asedova commented Dec 23, 2022

Linux-cpp-lisp commented Dec 23, 2022

peastman commented Dec 23, 2022

Linux-cpp-lisp commented Dec 23, 2022 • edited Loading

asedova commented Dec 23, 2022 • edited Loading

Linux-cpp-lisp commented Dec 23, 2022

asedova commented Dec 23, 2022

peastman commented Dec 23, 2022

Linux-cpp-lisp commented Dec 23, 2022

Linux-cpp-lisp commented Feb 6, 2023

sef43 commented Feb 13, 2023

asedova commented Feb 13, 2023

Linux-cpp-lisp commented Feb 13, 2023 • edited Loading

Linux-cpp-lisp commented Feb 13, 2023

asedova commented Feb 14, 2023 • edited Loading

Linux-cpp-lisp commented Feb 14, 2023 • edited Loading

sef43 commented Feb 14, 2023

Linux-cpp-lisp commented Feb 14, 2023 • edited Loading

sef43 commented Feb 15, 2023 • edited Loading

Linux-cpp-lisp commented Feb 15, 2023

sef43 commented Feb 15, 2023

Linux-cpp-lisp commented Feb 16, 2023

peastman commented Feb 16, 2023

sef43 commented Feb 17, 2023

peastman commented Mar 18, 2023

sef43 commented Mar 20, 2023

Linux-cpp-lisp commented Mar 20, 2023

asedova commented Mar 28, 2023

JMorado commented Apr 23, 2024 • edited Loading

Linux-cpp-lisp commented May 1, 2024

JMorado commented May 2, 2024

Linux-cpp-lisp commented May 2, 2024

JMorado commented May 5, 2024

CalvinGe commented May 10, 2024

CalvinGe commented May 13, 2024 • edited Loading

JMorado commented May 14, 2024

JMorado commented May 14, 2024 • edited Loading

CalvinGe commented May 14, 2024

Linux-cpp-lisp commented May 15, 2024

Linux-cpp-lisp commented May 15, 2024

JMorado commented May 15, 2024 • edited Loading

marvcks commented May 28, 2024

JMorado commented May 28, 2024 • edited Loading

asoyemi1 commented Sep 4, 2024

JMorado commented Sep 4, 2024

asoyemi1 commented Sep 4, 2024

peastman commented Sep 5, 2024

JMorado commented Sep 5, 2024

CalvinGe commented Sep 22, 2024

JMorado commented Sep 22, 2024

CalvinGe commented Sep 22, 2024

JMorado commented Sep 24, 2024

Linux-cpp-lisp commented Dec 23, 2022 •

edited

Loading

Linux-cpp-lisp commented Dec 23, 2022 •

edited

Loading

asedova commented Dec 23, 2022 •

edited

Loading

Linux-cpp-lisp commented Feb 13, 2023 •

edited

Loading

asedova commented Feb 14, 2023 •

edited

Loading

Linux-cpp-lisp commented Feb 14, 2023 •

edited

Loading

Linux-cpp-lisp commented Feb 14, 2023 •

edited

Loading

sef43 commented Feb 15, 2023 •

edited

Loading

JMorado commented Apr 23, 2024 •

edited

Loading

CalvinGe commented May 13, 2024 •

edited

Loading

JMorado commented May 14, 2024 •

edited

Loading

JMorado commented May 15, 2024 •

edited

Loading

JMorado commented May 28, 2024 •

edited

Loading