Training step failed #25

Open
sli259 opened this issue Mar 29, 2022 · 0 comments


sli259 commented Mar 29, 2022

After successfully running the tutorial, I built a small test system (C10H22). All the preparatory steps (the delta-force preparation and the embedding preparation) ran without problems.
When I started training the network, I got the following error. Can anyone take a look and suggest where I made a mistake?

0 | model | TorchMD_Net | 276 K
--------------------------------------
276 K     Trainable params
0         Non-trainable params
276 K     Total params
1.107     Total estimated model params size (MB)
/scratch/user/sli259/Anaconda3/2020.07/envs/torchmd/lib/python3.9/site-packages/torch_geometric/deprecation.py:12: UserWarning: 'data.DataLoader' is deprecated, use 'loader.DataLoader' instead
  warnings.warn(out)
/scratch/user/sli259/Anaconda3/2020.07/envs/torchmd/lib/python3.9/site-packages/torch/utils/data/dataloader.py:487: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 1, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
Traceback (most recent call last):
  File "/scratch/user/sli259/torchmd/torchmd-net/scripts/train.py", line 173, in <module>
    main()
  File "/scratch/user/sli259/torchmd/torchmd-net/scripts/train.py", line 166, in main
    trainer.fit(model, data)
  File "/scratch/user/sli259/Anaconda3/2020.07/envs/torchmd/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 458, in fit
    self._run(model)
  File "/scratch/user/sli259/Anaconda3/2020.07/envs/torchmd/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 756, in _run
    self.dispatch()
  File "/scratch/user/sli259/Anaconda3/2020.07/envs/torchmd/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 797, in dispatch
    self.accelerator.start_training(self)
  File "/scratch/user/sli259/Anaconda3/2020.07/envs/torchmd/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 96, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/scratch/user/sli259/Anaconda3/2020.07/envs/torchmd/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 144, in start_training
    self._results = trainer.run_stage()
  File "/scratch/user/sli259/Anaconda3/2020.07/envs/torchmd/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 807, in run_stage
    return self.run_train()
  File "/scratch/user/sli259/Anaconda3/2020.07/envs/torchmd/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 842, in run_train
    self.run_sanity_check(self.lightning_module)
  File "/scratch/user/sli259/Anaconda3/2020.07/envs/torchmd/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1107, in run_sanity_check
    self.run_evaluation()
  File "/scratch/user/sli259/Anaconda3/2020.07/envs/torchmd/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 962, in run_evaluation
    output = self.evaluation_loop.evaluation_step(batch, batch_idx, dataloader_idx)
  File "/scratch/user/sli259/Anaconda3/2020.07/envs/torchmd/lib/python3.9/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 174, in evaluation_step
    output = self.trainer.accelerator.validation_step(args)
  File "/scratch/user/sli259/Anaconda3/2020.07/envs/torchmd/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 226, in validation_step
    return self.training_type_plugin.validation_step(*args)
  File "/scratch/user/sli259/Anaconda3/2020.07/envs/torchmd/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/ddp.py", line 322, in validation_step
    return self.model(*args, **kwargs)
  File "/scratch/user/sli259/Anaconda3/2020.07/envs/torchmd/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/scratch/user/sli259/Anaconda3/2020.07/envs/torchmd/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 965, in forward
    output = self.module(*inputs, **kwargs)
  File "/scratch/user/sli259/Anaconda3/2020.07/envs/torchmd/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/scratch/user/sli259/Anaconda3/2020.07/envs/torchmd/lib/python3.9/site-packages/pytorch_lightning/overrides/base.py", line 57, in forward
    output = self.module.validation_step(*inputs, **kwargs)
  File "/scratch/user/sli259/torchmd/torchmd-net/torchmdnet/module.py", line 64, in validation_step
    return self.step(batch, mse_loss, "val")
  File "/scratch/user/sli259/torchmd/torchmd-net/torchmdnet/module.py", line 75, in step
    pred, deriv = self(batch.z, batch.pos, batch=batch.batch,
  File "/scratch/user/sli259/Anaconda3/2020.07/envs/torchmd/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/scratch/user/sli259/torchmd/torchmd-net/torchmdnet/module.py", line 56, in forward
    return self.model(z, pos, batch=batch, q=q, s=s)
  File "/scratch/user/sli259/Anaconda3/2020.07/envs/torchmd/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/scratch/user/sli259/torchmd/torchmd-net/torchmdnet/models/model.py", line 170, in forward
    x, v, z, pos, batch = self.representation_model(z, pos, batch, q=q, s=s)
  File "/scratch/user/sli259/Anaconda3/2020.07/envs/torchmd/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/scratch/user/sli259/torchmd/torchmd-net/torchmdnet/models/torchmd_gn.py", line 162, in forward
    x = x + interaction(x, edge_index, edge_weight, edge_attr)
  File "/scratch/user/sli259/Anaconda3/2020.07/envs/torchmd/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/scratch/user/sli259/torchmd/torchmd-net/torchmdnet/models/torchmd_gn.py", line 224, in forward
    x = self.conv(x, edge_index, edge_weight, edge_attr)
  File "/scratch/user/sli259/Anaconda3/2020.07/envs/torchmd/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/tmp/job.3878131/sli259_pyg/tmpf0jhxsc9.py", line 182, in forward
    W = self.net(edge_attr) * C.view(-1, 1)
  File "/scratch/user/sli259/Anaconda3/2020.07/envs/torchmd/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/scratch/user/sli259/Anaconda3/2020.07/envs/torchmd/lib/python3.9/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/scratch/user/sli259/Anaconda3/2020.07/envs/torchmd/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/scratch/user/sli259/Anaconda3/2020.07/envs/torchmd/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: expected scalar type Float but found Double
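
The last frame of the traceback shows a `torch.nn.Linear` layer receiving an input whose dtype differs from that of its weights. The following is a minimal sketch of that kind of mismatch, not the TorchMD-Net code itself: the names `layer` and `edge_attr` and the tensor shapes are hypothetical stand-ins, and it assumes the input was float64 while the layer kept PyTorch's default float32 parameters. The exact wording of the error message can vary between PyTorch versions.

```python
import torch

# Hypothetical stand-ins for the failing call in the traceback:
# a Linear layer (float32 parameters by default) and a float64 input.
layer = torch.nn.Linear(4, 2)
edge_attr = torch.rand(3, 4, dtype=torch.float64)

try:
    layer(edge_attr)
    mismatch_raised = False
except RuntimeError as err:
    # PyTorch refuses to multiply a float64 input by float32 weights.
    mismatch_raised = True
    print("dtype mismatch:", err)

# Casting the input to the dtype of the layer's weights avoids the error.
out = layer(edge_attr.to(layer.weight.dtype))
print(out.dtype)
```

One possible cause, not confirmed in this thread, is that the prepared coordinate or force arrays were saved as float64 (the NumPy default), in which case converting them with `.astype(np.float32)` before training may be worth trying.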
