
Setting n_workers != 1 in alanine_dipeptide_basics causes process to continue indefinitely #35

Open
abbielear opened this issue Nov 23, 2021 · 2 comments

@abbielear

Hi all, I've been trying to reproduce your work in your alanine_dipeptide_basics notebook by running it as a .py script from the terminal. I removed the n_workers=1 argument from the energy model to speed up the KLL training section, but I've noticed that the program never finishes in this format. I get the same sort of output as from the notebook (all the graphs appear), but the process must be cancelled manually (keyboard interrupt) to get it to finish. From the traceback, it looks like the issue is caused by worker processes still existing; the traceback points to:
bgflow/distribution/energy/openmm.py", line 380, in run for task in iter(self._task_queue.get, None):
and then to:
python3.7/multiprocessing/queues.py

I've been trying to determine whether this behaviour alters the performance of the generator, but I'm unsure how to solve the problem, as I haven't had much experience with multiprocessing. Do you have any ideas about what could be causing the issue?

I'm running this with:
- CUDA version 11.0, driver version 450.51.06, on a GeForce GTX 980 Ti
- cudatoolkit 10.2.89
- Python 3.7
- pytorch 1.9.1

@JenkeScheen @jmichel80
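For context, the loop in the traceback, `iter(self._task_queue.get, None)`, is the standard multiprocessing sentinel pattern: each worker blocks in `queue.get()` and only exits once it receives a `None` sentinel. If the parent never enqueues one sentinel per worker, or shutdown races with the workers, they block forever and the interpreter cannot exit, which matches the hang described above. A minimal self-contained sketch of the pattern (illustrative only, not bgflow code):

```python
import multiprocessing as mp

def worker(task_queue):
    # Blocks in task_queue.get(); exits only on receiving the None sentinel.
    for task in iter(task_queue.get, None):
        print(f"processing {task}")

if __name__ == "__main__":
    q = mp.Queue()
    workers = [mp.Process(target=worker, args=(q,)) for _ in range(2)]
    for p in workers:
        p.start()
    for task in range(4):
        q.put(task)
    # Without one None sentinel per worker, every worker stays blocked in
    # q.get() and the parent hangs on join() -- the behaviour reported above.
    for _ in workers:
        q.put(None)
    for p in workers:
        p.join()
```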

@jonkhler jonkhler assigned jonkhler and Olllom and unassigned jonkhler Jan 14, 2022
@jonkhler (Collaborator)

Hi, sorry for the late reply; I just saw this now. This is a known issue that we have not yet fixed (but plan to). It seems to be a race-condition / deadlock issue. For now, it is best to keep n_workers=1.
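For readers hitting the same hang, here is a sketch of what keeping n_workers=1 looks like in notebook terms. The class names below match the module in the traceback (bgflow/distribution/energy/openmm.py), but the exact constructor signatures are an assumption and may differ between bgflow versions:

```python
# Sketch of the suggested workaround; the OpenMMBridge/OpenMMEnergy
# signatures are assumed and may differ between bgflow versions.
from bgflow.distribution.energy.openmm import OpenMMBridge, OpenMMEnergy
from openmmtools.testsystems import AlanineDipeptideImplicit
from openmm import LangevinIntegrator  # `from simtk.openmm import ...` on older OpenMM
from openmm.unit import kelvin, picosecond, femtosecond

ala2 = AlanineDipeptideImplicit()
integrator = LangevinIntegrator(300 * kelvin, 1 / picosecond, 1 * femtosecond)

# n_workers=1 evaluates energies in the calling process, so no worker
# processes are spawned and none can be left hanging at shutdown.
bridge = OpenMMBridge(ala2.system, integrator, n_workers=1)
target_energy = OpenMMEnergy(3 * ala2.system.getNumParticles(), bridge)
```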

@thelostscout

Any updates on this? When doing energy-based training, the energy evaluation is the slowest part, so it would be really helpful if we could use more workers.
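Until this is fixed upstream, one blunt user-side mitigation, sketched here with only the standard library (it is not a bgflow API, and it kills workers abruptly, so it is only safe once all energy evaluations are done), is to reap any multiprocessing children still alive at interpreter exit:

```python
import atexit
import multiprocessing as mp

def _reap_leftover_workers():
    # Terminate any worker processes still alive at shutdown so a run with
    # n_workers > 1 cannot hang the interpreter on exit.
    for proc in mp.active_children():
        proc.terminate()
        proc.join()

atexit.register(_reap_leftover_workers)
```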
