
Setting n_workers != 1 in alanine_dipeptide_basics causes process to continue indefinitely #35

Open
abbielear opened this issue Nov 23, 2021 · 2 comments

@abbielear

Hi all, I've been trying to reproduce your work in your alanine_dipeptide_basics notebook by running it as a .py script from the terminal. I removed the n_workers=1 argument from the energy model to speed up the KLL training section, but I've noticed that the program never finishes in this format. I get the same sort of output as from the notebook (all the graphs appear), but the process must be cancelled manually (keyboard interrupt) to get it to finish. From the traceback, it looks like the issue is caused by worker processes still existing; the traceback points to:
bgflow/distribution/energy/openmm.py", line 380, in run for task in iter(self._task_queue.get, None):
and then to:
python3.7/multiprocessing/queues.py

I've been trying to determine whether this behaviour alters the performance of the generator, but I'm unsure how to solve the problem, as I haven't had much experience with multiprocessing. Do you have any ideas about what could be causing the issue?

I'm running this with:
- CUDA version 11.0, driver version 450.51.06, on a GeForce GTX 980 Ti
- cudatoolkit 10.2.89
- Python 3.7
- pytorch 1.9.1

@JenkeScheen @jmichel80
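For context, the loop in the traceback, `iter(self._task_queue.get, None)`, is the standard multiprocessing sentinel pattern: each worker blocks in `queue.get()` and only exits once it receives a `None` sentinel. If the parent never enqueues one sentinel per worker, or shutdown races with the workers, they block forever and the interpreter cannot exit, which matches the hang described above. A minimal self-contained sketch of the pattern (illustrative only, not bgflow code):

```python
import multiprocessing as mp

def worker(task_queue):
    # Blocks in task_queue.get(); exits only on receiving the None sentinel.
    for task in iter(task_queue.get, None):
        print(f"processing {task}")

if __name__ == "__main__":
    q = mp.Queue()
    workers = [mp.Process(target=worker, args=(q,)) for _ in range(2)]
    for p in workers:
        p.start()
    for task in range(4):
        q.put(task)
    # Without one None sentinel per worker, every worker stays blocked in
    # q.get() and the parent hangs on join() -- the behaviour reported above.
    for _ in workers:
        q.put(None)
    for p in workers:
        p.join()
```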

@jonkhler jonkhler assigned jonkhler and Olllom and unassigned jonkhler Jan 14, 2022
@jonkhler (Collaborator)

Hi, sorry for the late reply; I just saw this now. This is a known issue that we have not yet fixed (but plan to). It seems to be a race-condition / deadlock issue. For now, it is best to keep n_workers=1.
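For readers hitting the same hang, here is a sketch of what keeping n_workers=1 looks like in notebook terms. The class names below match the module in the traceback (bgflow/distribution/energy/openmm.py), but the exact constructor signatures are an assumption and may differ between bgflow versions:

```python
# Sketch of the suggested workaround; the OpenMMBridge/OpenMMEnergy
# signatures are assumed and may differ between bgflow versions.
from bgflow.distribution.energy.openmm import OpenMMBridge, OpenMMEnergy
from openmmtools.testsystems import AlanineDipeptideImplicit
from openmm import LangevinIntegrator  # `from simtk.openmm import ...` on older OpenMM
from openmm.unit import kelvin, picosecond, femtosecond

ala2 = AlanineDipeptideImplicit()
integrator = LangevinIntegrator(300 * kelvin, 1 / picosecond, 1 * femtosecond)

# n_workers=1 evaluates energies in the calling process, so no worker
# processes are spawned and none can be left hanging at shutdown.
bridge = OpenMMBridge(ala2.system, integrator, n_workers=1)
target_energy = OpenMMEnergy(3 * ala2.system.getNumParticles(), bridge)
```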

@thelostscout

Any updates on this? When doing energy-based training, the energy evaluation is the slowest part, so it would be really helpful if we could use more workers.
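Until this is fixed upstream, one blunt user-side mitigation, sketched here with only the standard library (it is not a bgflow API, and it kills workers abruptly, so it is only safe once all energy evaluations are done), is to reap any multiprocessing children still alive at interpreter exit:

```python
import atexit
import multiprocessing as mp

def _reap_leftover_workers():
    # Terminate any worker processes still alive at shutdown so a run with
    # n_workers > 1 cannot hang the interpreter on exit.
    for proc in mp.active_children():
        proc.terminate()
        proc.join()

atexit.register(_reap_leftover_workers)
```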
