
Some issue in simulation Step #21

Open · steven60411 opened this issue Dec 17, 2021 · 11 comments

steven60411 commented Dec 17, 2021

Hi!
During the simulation step, we hit this issue:
[screenshot of the error]
I've noticed that in the old torchmd-net legacy code there is a file called torchmdcalc.py in /torchmdnet/nnp/calculators/,
but in the updated repo both the repo and the file name have been changed.
I found a file called calculators.py in the new torchmd-net repo and assume it serves the same purpose as torchmdcalc.py.
After changing the module line in simulation.yaml it shows:
[screenshot of the error]
I don't know why a file is missing. Help! :(

MaciejMajew (Contributor) commented

Hi!
data/train_fulldata/epoch=80.ckpt is a pretrained network that can be downloaded with:

wget pub.htmd.org/torchMD_tutorial_data.tar.gz

This is explained in the second cell of the tutorial:
https://github.com/torchmd/torchmd-cg/blob/master/tutorial/Chignolin_Coarse-Grained_Tutorial.ipynb
The calculator can be used with any network that you trained using the selected repo.
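
For reference, a minimal sketch of the wiring that torchmd's run.py does from the external: section of simulate.yaml. The calls mirror the tracebacks quoted later in this thread; the module, file and embeddings values are placeholders taken from the tutorial, and the module must come from the same repo that trained the network:

import importlib
import torch

# Placeholder config, normally read from simulate.yaml's "external" section.
external_cfg = {
    "module": "torchmdnet.calculators",              # must match the repo the network was trained with
    "file": "data/train_fulldata/epoch=80.ckpt",     # pretrained network from the tutorial tarball
    "embeddings": [4, 4, 5, 8, 6, 13, 2, 13, 7, 4],  # CG bead types for chignolin
}

# Mirrors run.py: import the calculator module named in the config ...
externalmodule = importlib.import_module(external_cfg["module"])
# ... and hand it the checkpoint, the embeddings (one replica here) and a device.
embeddings = torch.tensor(external_cfg["embeddings"]).unsqueeze(0)
external = externalmodule.External(external_cfg["file"], embeddings, "cpu")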

hansen7 commented Jan 24, 2022

Thanks @steven60411. I've also changed the module line to torchmdnet.calculators.

Hi @MaciejMajew! I can't find the .ckpt file at the provided address, nor is it included here: https://github.com/torchmd/torchmd-net/tree/main/examples/pretrained

MaciejMajew (Contributor) commented

Thanks for noticing the missing file. The *.tar.gz file has been updated. You can find the file by executing the first cells of the tutorial:

wget pub.htmd.org/torchMD_tutorial_data.tar.gz
tar -xvf torchMD_tutorial_data.tar.gz
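
After extraction, a quick (hypothetical) sanity check that the checkpoint landed where the tutorial expects it:

import os

# Path as referenced elsewhere in this thread; adjust if your layout differs.
ckpt = "data/train_fulldata/epoch=80.ckpt"
print(ckpt, "found" if os.path.isfile(ckpt) else "missing")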

hansen7 commented Jan 29, 2022

@MaciejMajew Thanks for providing the pre-trained weights. I do encounter a few issues in the following steps:

  1. Simulation from pre-trained weights:
(torchmd) [hanchen@qpu-4v100-32g0003 tutorial]$ python ./torchmd/torchmd/run.py --conf simulate.yaml
Force terms:  ['Bonds', 'RepulsionCG']
Traceback (most recent call last):
  File "./torchmd/torchmd/run.py", line 154, in <module>
    mol, system, forces = setup(args)
  File "./torchmd/torchmd/run.py", line 106, in setup
    external = externalmodule.External(args.external["file"], embeddings, device)
  File "/home/hanchen/torch_md/torchmd-cg/tutorial/torchmd-net/torchmdnet/calculators.py", line 7, in __init__
    self.model = load_model(netfile, device=device, derivative=True)
  File "/home/hanchen/torch_md/torchmd-cg/tutorial/torchmd-net/torchmdnet/models/model.py", line 106, in load_model
    model = create_model(args)
  File "/home/hanchen/torch_md/torchmd-cg/tutorial/torchmd-net/torchmdnet/models/model.py", line 15, in create_model
    hidden_channels=args["embedding_dimension"],
KeyError: 'embedding_dimension'

I tried using the model arguments from ANI1-equivariant_transformer or transformer, but then the model parameters and shapes don't match (missing/unexpected keys):

Missing key(s) in state_dict: "mean", "std", "representation_model.embedding.weight", "representation_model.distance_expansion.means", "representation_model.distance_expansion.betas", "representation_model.neighbor_embedding.embedding.weight", "representation_model.neighbor_embedding.distance_proj.weight", "representation_model.neighbor_embedding.distance_proj.bias", "representation_model.neighbor_embedding.combine.weight", "representation_model.neighbor_embedding.combine.bias", "representation_model.attention_layers.0.layernorm.weight", "representation_model.attention_layers.0.layernorm.bias", "representation_model.attention_layers.0.q_proj.weight", "representation_model.attention_layers.0.q_proj.bias", "representation_model.attention_layers.0.k_proj.weight", "representation_model.attention_layers.0.k_proj.bias", "representation_model.attention_layers.0.v_proj.weight", "representation_model.attention_layers.0.v_proj.bias", "representation_model.attention_layers.0.o_proj.weight", "representation_model.attention_layers.0.o_proj.bias", "representation_model.attention_layers.0.dk_proj.weight", "representation_model.attention_layers.0.dk_proj.bias", "representation_model.attention_layers.0.dv_proj.weight", "representation_model.attention_layers.0.dv_proj.bias", "representation_model.attention_layers.1.layernorm.weight", "representation_model.attention_layers.1.layernorm.bias", "representation_model.attention_layers.1.q_proj.weight", "representation_model.attention_layers.1.q_proj.bias", "representation_model.attention_layers.1.k_proj.weight", "representation_model.attention_layers.1.k_proj.bias", "representation_model.attention_layers.1.v_proj.weight", "representation_model.attention_layers.1.v_proj.bias", "representation_model.attention_layers.1.o_proj.weight", "representation_model.attention_layers.1.o_proj.bias", "representation_model.attention_layers.1.dk_proj.weight", "representation_model.attention_layers.1.dk_proj.bias", "representation_model.attention_layers.1.dv_proj.weight", "representation_model.attention_layers.1.dv_proj.bias", "representation_model.attention_layers.2.layernorm.weight", "representation_model.attention_layers.2.layernorm.bias", "representation_model.attention_layers.2.q_proj.weight", "representation_model.attention_layers.2.q_proj.bias", "representation_model.attention_layers.2.k_proj.weight", "representation_model.attention_layers.2.k_proj.bias", "representation_model.attention_layers.2.v_proj.weight", "representation_model.attention_layers.2.v_proj.bias", "representation_model.attention_layers.2.o_proj.weight", "representation_model.attention_layers.2.o_proj.bias", "representation_model.attention_layers.2.dk_proj.weight", "representation_model.attention_layers.2.dk_proj.bias", "representation_model.attention_layers.2.dv_proj.weight", "representation_model.attention_layers.2.dv_proj.bias", "representation_model.attention_layers.3.layernorm.weight", "representation_model.attention_layers.3.layernorm.bias", "representation_model.attention_layers.3.q_proj.weight", "representation_model.attention_layers.3.q_proj.bias", "representation_model.attention_layers.3.k_proj.weight", "representation_model.attention_layers.3.k_proj.bias", "representation_model.attention_layers.3.v_proj.weight", "representation_model.attention_layers.3.v_proj.bias", "representation_model.attention_layers.3.o_proj.weight", "representation_model.attention_layers.3.o_proj.bias", "representation_model.attention_layers.3.dk_proj.weight", "representation_model.attention_layers.3.dk_proj.bias", 
"representation_model.attention_layers.3.dv_proj.weight", "representation_model.attention_layers.3.dv_proj.bias", "representation_model.attention_layers.4.layernorm.weight", "representation_model.attention_layers.4.layernorm.bias", "representation_model.attention_layers.4.q_proj.weight", "representation_model.attention_layers.4.q_proj.bias", "representation_model.attention_layers.4.k_proj.weight", "representation_model.attention_layers.4.k_proj.bias", "representation_model.attention_layers.4.v_proj.weight", "representation_model.attention_layers.4.v_proj.bias", "representation_model.attention_layers.4.o_proj.weight", "representation_model.attention_layers.4.o_proj.bias", "representation_model.attention_layers.4.dk_proj.weight", "representation_model.attention_layers.4.dk_proj.bias", "representation_model.attention_layers.4.dv_proj.weight", "representation_model.attention_layers.4.dv_proj.bias", "representation_model.attention_layers.5.layernorm.weight", "representation_model.attention_layers.5.layernorm.bias", "representation_model.attention_layers.5.q_proj.weight", "representation_model.attention_layers.5.q_proj.bias", "representation_model.attention_layers.5.k_proj.weight", "representation_model.attention_layers.5.k_proj.bias", "representation_model.attention_layers.5.v_proj.weight", "representation_model.attention_layers.5.v_proj.bias", "representation_model.attention_layers.5.o_proj.weight", "representation_model.attention_layers.5.o_proj.bias", "representation_model.attention_layers.5.dk_proj.weight", "representation_model.attention_layers.5.dk_proj.bias", "representation_model.attention_layers.5.dv_proj.weight", "representation_model.attention_layers.5.dv_proj.bias", "representation_model.attention_layers.6.layernorm.weight", "representation_model.attention_layers.6.layernorm.bias", "representation_model.attention_layers.6.q_proj.weight", "representation_model.attention_layers.6.q_proj.bias", "representation_model.attention_layers.6.k_proj.weight", "representation_model.attention_layers.6.k_proj.bias", "representation_model.attention_layers.6.v_proj.weight", "representation_model.attention_layers.6.v_proj.bias", "representation_model.attention_layers.6.o_proj.weight", "representation_model.attention_layers.6.o_proj.bias", "representation_model.attention_layers.6.dk_proj.weight", "representation_model.attention_layers.6.dk_proj.bias", "representation_model.attention_layers.6.dv_proj.weight", "representation_model.attention_layers.6.dv_proj.bias", "representation_model.attention_layers.7.layernorm.weight", "representation_model.attention_layers.7.layernorm.bias", "representation_model.attention_layers.7.q_proj.weight", "representation_model.attention_layers.7.q_proj.bias", "representation_model.attention_layers.7.k_proj.weight", "representation_model.attention_layers.7.k_proj.bias", "representation_model.attention_layers.7.v_proj.weight", "representation_model.attention_layers.7.v_proj.bias", "representation_model.attention_layers.7.o_proj.weight", "representation_model.attention_layers.7.o_proj.bias", "representation_model.attention_layers.7.dk_proj.weight", "representation_model.attention_layers.7.dk_proj.bias", "representation_model.attention_layers.7.dv_proj.weight", "representation_model.attention_layers.7.dv_proj.bias", "representation_model.out_norm.weight", "representation_model.out_norm.bias", "output_model.output_network.0.weight", "output_model.output_network.0.bias", "output_model.output_network.2.weight", "output_model.output_network.2.bias", "prior_model.initial_atomref", 
"prior_model.atomref.weight"
Unexpected key(s) in state_dict: "representation.embedding.weight", "representation.distance_expansion.width", "representation.distance_expansion.offsets", "representation.interactions.0.filter_network.0.weight", "representation.interactions.0.filter_network.0.bias", "representation.interactions.0.filter_network.1.weight", "representation.interactions.0.filter_network.1.bias", "representation.interactions.0.cutoff_network.cutoff", "representation.interactions.0.cfconv.in2f.weight", "representation.interactions.0.cfconv.f2out.weight", "representation.interactions.0.cfconv.f2out.bias", "representation.interactions.0.cfconv.filter_network.0.weight", "representation.interactions.0.cfconv.filter_network.0.bias", "representation.interactions.0.cfconv.filter_network.1.weight", "representation.interactions.0.cfconv.filter_network.1.bias", "representation.interactions.0.cfconv.cutoff_network.cutoff", "representation.interactions.0.dense.weight", "representation.interactions.0.dense.bias", "representation.interactions.1.filter_network.0.weight", "representation.interactions.1.filter_network.0.bias", "representation.interactions.1.filter_network.1.weight", "representation.interactions.1.filter_network.1.bias", "representation.interactions.1.cutoff_network.cutoff", "representation.interactions.1.cfconv.in2f.weight", "representation.interactions.1.cfconv.f2out.weight", "representation.interactions.1.cfconv.f2out.bias", "representation.interactions.1.cfconv.filter_network.0.weight", "representation.interactions.1.cfconv.filter_network.0.bias", "representation.interactions.1.cfconv.filter_network.1.weight", "representation.interactions.1.cfconv.filter_network.1.bias", "representation.interactions.1.cfconv.cutoff_network.cutoff", "representation.interactions.1.dense.weight", "representation.interactions.1.dense.bias", "representation.interactions.2.filter_network.0.weight", "representation.interactions.2.filter_network.0.bias", "representation.interactions.2.filter_network.1.weight", "representation.interactions.2.filter_network.1.bias", "representation.interactions.2.cutoff_network.cutoff", "representation.interactions.2.cfconv.in2f.weight", "representation.interactions.2.cfconv.f2out.weight", "representation.interactions.2.cfconv.f2out.bias", "representation.interactions.2.cfconv.filter_network.0.weight", "representation.interactions.2.cfconv.filter_network.0.bias", "representation.interactions.2.cfconv.filter_network.1.weight", "representation.interactions.2.cfconv.filter_network.1.bias", "representation.interactions.2.cfconv.cutoff_network.cutoff", "representation.interactions.2.dense.weight", "representation.interactions.2.dense.bias", "output_modules.0.out_net.1.out_net.0.weight", "output_modules.0.out_net.1.out_net.0.bias", "output_modules.0.out_net.1.out_net.1.weight", "output_modules.0.out_net.1.out_net.1.bias", "output_modules.0.standardize.mean", "output_modules.0.standardize.stddev".
  2. Training from scratch (using python torchmd-net/scripts/train.py -c train.yaml) also encounters other issues:
Traceback (most recent call last):
  File "torchmd-net/scripts/train.py", line 171, in <module>
    main()
  File "torchmd-net/scripts/train.py", line 110, in main
    args = get_args()
  File "torchmd-net/scripts/train.py", line 92, in get_args
    args = parser.parse_args()
  File "/home/hanchen/anaconda3/envs/torchmd/lib/python3.8/argparse.py", line 1768, in parse_args
    args, argv = self.parse_known_args(args, namespace)
  File "/home/hanchen/anaconda3/envs/torchmd/lib/python3.8/argparse.py", line 1800, in parse_known_args
    namespace, args = self._parse_known_args(args, namespace)
  File "/home/hanchen/anaconda3/envs/torchmd/lib/python3.8/argparse.py", line 2006, in _parse_known_args
    start_index = consume_optional(start_index)
  File "/home/hanchen/anaconda3/envs/torchmd/lib/python3.8/argparse.py", line 1946, in consume_optional
    take_action(action, args, option_string)
  File "/home/hanchen/anaconda3/envs/torchmd/lib/python3.8/argparse.py", line 1874, in take_action
    action(self, namespace, argument_values, option_string)
  File "/home/hanchen/torch_md/torchmd-cg/tutorial/torchmd-net/torchmdnet/utils.py", line 105, in __call__
    raise ValueError(f"Unknown argument in config file: {key}")
ValueError: Unknown argument in config file: coords

Would you mind taking a look at this?
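
One way to see where the KeyError in point 1 comes from is to inspect what the checkpoint actually stores. This is a hedged sketch: it assumes the file is a standard PyTorch Lightning checkpoint (a dict with a hyper_parameters entry); a checkpoint written by the old code may be laid out differently, which is what the new load_model() then trips over:

import torch

# Hypothetical diagnostic, not part of the tutorial.
ckpt = torch.load("data/train_fulldata/epoch=80.ckpt", map_location="cpu")

if isinstance(ckpt, dict):
    print("top-level keys:", list(ckpt.keys()))
    # create_model() needs arguments like embedding_dimension; in Lightning
    # checkpoints those are stored under "hyper_parameters".
    print("hyper_parameters:", ckpt.get("hyper_parameters", "<none stored>"))
else:
    print("checkpoint is not a plain dict:", type(ckpt))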

MaciejMajew (Contributor) commented

  1. We had some major updates to the code recently, so the old training code was moved to a separate repo: torchmd-net-legacy:
    https://github.com/torchmd/torchmd-net-legacy

The net shared here was trained using the old code, so you should use the compatible calculator to simulate with it:
https://github.com/torchmd/torchmd-net-legacy/blob/main/torchmdnet/nnp/calculators/torchmdcalc.py

If you use the old code, you should be able to execute the tutorial without any issues.

  2. Similar to point 1, the train.yaml file is not compatible with the new version of torchmd-net. We changed the names of a few arguments (e.g. coords -> coord_files). You can see an example of the new format here:
    https://github.com/torchmd/torchmd-net/blob/main/examples/example.yaml
    and a list of all available args here:
    https://github.com/torchmd/torchmd-net/blob/main/scripts/train.py

The new train.yaml should look like this:

batch_size: 1024
dataset: Custom
coord_files: "data/chignolin_ca_coords.npy"
embed_files: "data/chignolin_ca_embeddings.npy"
force_files: "data/chignolin_ca_deltaforces.npy"
cutoff_upper: 9.0
derivative: true
distributed_backend: ddp
early_stopping_patience: 30
embedding_dimension: 128
log_dir: train_light
lr: 0.0001
lr_factor: 0.8
lr_min: 1.0e-06
lr_patience: 4
lr_warmup_steps: 0
max_num_neighbors: 128
model: graph-network
ngpus: 4
num_epochs: 200
num_layers: 3
num_nodes: 1
num_rbf: 150
num_workers: 8
rbf_type: gauss
save_interval: 1
seed: 1
test_interval: 1
test_size: 0.1
val_size: 0.05

Sorry for the inconvenience. I will try to update the torchmd-cg repo and the tutorial as soon as possible; until then I will put up a warning.
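
As a quick way to catch renamed arguments before launching training, here is a small hypothetical check of a train.yaml. Only the coords -> coord_files rename is confirmed in this thread; treat any other renames as guesses and consult scripts/train.py for the full argument list:

import yaml

# Hypothetical helper: flag old-style keys in a config meant for the new torchmd-net.
with open("train.yaml") as f:
    cfg = yaml.safe_load(f)

renamed = {"coords": "coord_files"}  # confirmed rename; see scripts/train.py for all accepted args
for old, new in renamed.items():
    if old in cfg:
        print(f"'{old}' is no longer accepted; use '{new}' instead")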

hansen7 commented Feb 6, 2022

@MaciejMajew Hi, Maciej. Thank you so much for the clarification!

I can now use the current version of the tutorial (with the newly provided pre-trained weights) + torchmd-net-legacy to run the simulation. The training still has issues, but my issues are resolved!

MaciejMajew (Contributor) commented

I'm glad I could help.

What is the error you get with training?

steven60411 (Author) commented

Hi! I'm back.
@hansen7 Would you mind teaching me how to run the simulation with torchmd-net-legacy?
Mine keeps showing:

Force terms:  ['Bonds', 'RepulsionCG']
Traceback (most recent call last):
  File "/gscratch/pfaendtner/yuqifang/torchmd/torchmd/run.py", line 151, in <module>
    mol, system, forces = setup(args)
  File "/gscratch/pfaendtner/yuqifang/torchmd/torchmd/run.py", line 102, in setup
    externalmodule = importlib.import_module(args.external["module"])
  File "/gscratch/pfaendtner/yuqifang/codes/miniconda/envs/torchmd_cg1/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 972, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 972, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 984, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'torchmdnet.nnp'

I've git cloned the "torchmd-net-legacy" repo; is there any change I need to make in any file?

MaciejMajew (Contributor) commented

It looks like you need to add the path of "torchmd-net-legacy" to your PYTHONPATH:

export PYTHONPATH="PATH/torchmd-net-legacy"

or set it directly in your .bashrc: https://bic-berkeley.github.io/psych-214-fall-2016/using_pythonpath.html
If you don't do that, torchmd is not able to find the external calculator that handles the NNP.
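
A small (hypothetical) check that the export worked, run from the same shell:

import importlib

# The legacy calculator lives under torchmdnet.nnp in torchmd-net-legacy;
# if PYTHONPATH is set correctly this import succeeds.
try:
    importlib.import_module("torchmdnet.nnp")
    print("torchmd-net-legacy found; the external module in simulate.yaml should now resolve")
except ModuleNotFoundError as err:
    print("still not found - check that PYTHONPATH points at the torchmd-net-legacy checkout:", err)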

steven60411 (Author) commented

@MaciejMajew Thanks! I'll try it later

daniel1991zy commented

batch_size: 1024
dataset: Custom
coord_files: "data/chignolin_ca_coords.npy"
embed_files: "data/chignolin_ca_embeddings.npy"
force_files: "data/chignolin_ca_deltaforces.npy"
cutoff_upper: 9.0
derivative: true
distributed_backend: ddp
early_stopping_patience: 30
embedding_dimension: 128
log_dir: train_light
lr: 0.0001
lr_factor: 0.8
lr_min: 1.0e-06
lr_patience: 4
lr_warmup_steps: 0
max_num_neighbors: 128
model: graph-network
ngpus: 4
num_epochs: 200
num_layers: 3
num_nodes: 1
num_rbf: 150
num_workers: 8
rbf_type: gauss
save_interval: 1
seed: 1
test_interval: 1
test_size: 0.1
val_size: 0.05

I have used this file to train a force field and it succeeded. However, when we use the force field in simulate.yaml, it still errors with "KeyError: 'embedding_dimension'". Can you give me an updated simulate.yaml file? In addition, I look forward to the new manual of TorchMD.
Appending my simulate.yaml:
device: cpu
log_dir: sim_80ep_350K
output: output
topology: /home/daizhongyang/notebook_file/tormd_cg_test/data/chignolin_ca_top.psf
coordinates: /home/daizhongyang/notebook_file/tormd_cg_test/data/chignolin_ca_initial_coords.xtc
replicas: 10
forcefield: /home/daizhongyang/notebook_file/tormd_cg_test/data/chignolin_priors_fulldata.yaml
forceterms:
  - Bonds
  - RepulsionCG
external:
  module: torchmdnet.calculators
  embeddings: [4, 4, 5, 8, 6, 13, 2, 13, 7, 4]
  file: /home/daizhongyang/notebook_file/tormd_cg_test/data/train_fulldata/epoch=80.ckpt
langevin_gamma: 1
langevin_temperature: 350
temperature: 350
precision: double
seed: 1
output_period: 1000
save_period: 1000
steps: 10000000
timestep: 1
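
One hedged way to narrow down where that KeyError comes from is to try loading the newly trained checkpoint directly with the new-repo loader, outside of torchmd. The function and import path are taken from the traceback earlier in this thread; the checkpoint path is a placeholder for whatever train.py wrote into your log_dir:

from torchmdnet.models.model import load_model

# Placeholder path: point this at the checkpoint produced by your training run (log_dir: train_light).
ckpt_path = "train_light/last.ckpt"

# Same call the calculator makes (see the earlier traceback); if this works, the checkpoint
# carries the expected hyperparameters and the problem is in the simulate.yaml wiring instead.
model = load_model(ckpt_path, device="cpu", derivative=True)
print(model)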
