
Multiprocessing raises a RuntimeError when using Python >=3.8 on MacOS #24

Open
samecutler opened this issue Dec 2, 2022 · 5 comments

Comments

@samecutler
Contributor

On macOS with Python 3.8 and later, the process start method defaults to "spawn", which raises a RuntimeError if mp.Pool is not called within an if __name__ == '__main__': block. I temporarily fixed this by calling mp.set_start_method('fork') where multiprocessing is imported at the start of TemplateGrid. However, it may be better to import multiprocessing at the start of photoz.py and set the start method to 'fork' there.
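A minimal sketch of the kind of fix described above, not the actual eazy-py change. The helper name prefer_fork_on_macos is hypothetical, and where it would be called (for example at import time in photoz.py) is an assumption. It only uses multiprocessing calls that have existed since Python 3.4, so it should also import cleanly on 3.7.

```python
import multiprocessing as mp
import sys


def prefer_fork_on_macos():
    """Select the 'fork' start method on macOS; leave other platforms alone.

    Hypothetical helper for illustration. Returns the start method that is
    now fixed, or None if no start method has been fixed yet.
    """
    if sys.platform == "darwin" and "fork" in mp.get_all_start_methods():
        # force=True avoids "context has already been set" if something
        # else selected a start method earlier in the session.
        mp.set_start_method("fork", force=True)
    # allow_none=True inspects the setting without fixing the default.
    return mp.get_start_method(allow_none=True)
```

On Python 3.7 the macOS default is already "fork", so the call changes nothing there.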

@gbrammer
Owner

gbrammer commented Dec 6, 2022

Is the fix back-compatible to 3.7, which is still allowed and tested in the CI actions? Can you try submitting a PR, which will then run through the CI tests?

@samecutler
Contributor Author

Just wanted to update you that my fix doesn’t seem to solve all the issues on macOS with newer versions of Python. Although it works for smaller data sets (<10k objects), I got a TimeoutError during the rest-frame colors step when running on a catalog of ~45k objects. As a test, I made a new environment with Python 3.7, and it runs much quicker and does not time out. I’m guessing that switching the process start method to "fork" on newer versions of Python on macOS breaks other things, which makes the code more CPU intensive and leads to the TimeoutError.

The Python 3.7 workaround seems to work for now, but I’m wondering if it’s going to cause more issues as more packages start to require later versions of Python. As for fixes: on Python 3.8+ and macOS, the default process start method is "spawn", which requires any calls to mp.Pool to be protected within an if __name__ == "__main__" block. However, this seems to be impossible with the current setup, because all the functions being given to multiprocessing need to be serialized or called from the main script.
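For reference, the general "spawn"-safe pattern looks roughly like the sketch below. It is a standalone illustration, not eazy-py code: square and parallel_squares are hypothetical names, and the explicit "spawn" context is used only to mimic the macOS default on Python 3.8+ on any platform.

```python
import multiprocessing as mp


def square(x):
    # Defined at module level so spawned workers can import and unpickle it.
    return x * x


def parallel_squares(values, n_proc=2):
    # An explicit "spawn" context reproduces the macOS default on 3.8+.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=n_proc) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    # Only the parent process runs this block. Spawned children re-import
    # the module under a different name and skip it, so no pool is created
    # recursively.
    print(parallel_squares([1, 2, 3]))
```

The guard has to live in the script that is actually executed, which is why a library that creates a pool internally cannot add it on the user's behalf.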

@christinawilliams

This got me too, glad I found this thread. An easy solution in the meantime could be to update the install instructions here to install with Python 3.7 instead of 3.9: https://eazy-py.readthedocs.io/en/latest/eazy/install.html

@gbrammer
Owner

gbrammer commented Apr 12, 2023 via email

@samecutler
Contributor Author

@christinawilliams John Weaver had the same issue and found a hack that solves both the process start method error and the TimeoutError. You can set the start method when the TemplateGrid is initialized via mp.set_start_method('fork') and then disable multiprocessing for the rest_frame_fluxes method by setting n_proc=0 in that function definition.
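The idea behind the n_proc=0 part of the hack can be sketched as below. This is not the eazy-py implementation: map_with_optional_pool is a hypothetical helper, and treating n_proc=0 as "run serially" is an assumption based on the description above.

```python
import multiprocessing as mp


def map_with_optional_pool(func, items, n_proc=0):
    """Apply func to each item, serially when n_proc == 0.

    Hypothetical helper for illustration only.
    """
    if n_proc == 0:
        # Serial path: no worker processes are started, so the start
        # method and pool timeouts never come into play.
        return [func(item) for item in items]
    # "fork" is unavailable on Windows; get_context raises ValueError there.
    ctx = mp.get_context("fork")
    with ctx.Pool(processes=n_proc) as pool:
        return pool.map(func, items)
```

The serial path trades speed for robustness, which may be acceptable when only one step of a pipeline is affected.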


3 participants