jax disagreement with pandarallel #147

Open
jgallowa07 opened this issue Mar 22, 2024 · 1 comment

@jgallowa07
Member

Note that, for the time being, the issue described below is not causing any real problems (except the annoying warning). But it is worth documenting here, as it would be nice to patch at some point.

The Problem

pandarallel conflicts with the jax code because it explicitly sets the multiprocessing context to "fork", and thus we get the following warning:

../../../../mambaforge-pypy3/envs/multidms-dev/lib/python3.12/multiprocessing/popen_fork.py:66: 15 warnings
multidms/data.py: 15 warnings
tests/test_data.py: 210 warnings
  /home/jared/mambaforge-pypy3/envs/multidms-dev/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=99501) is multi-threaded, use of fork() may lead to deadlocks in the child.
    self.pid = os.fork()

The Cause

Most likely, the warning is thrown just because jax is loaded when the processes are forked. However, because no jax operations happen within the forked processes, no actual deadlocks or other issues arise.

The Solution

It seems that pandarallel is no longer being maintained, so it may be nice to remove it completely and replace it with something a little better, like swifter or polars, for fast table operations. It's unclear, though, whether these will have the same issue.

Ultimately, removing the jax import from the Data module is a reasonable thing to do. The training data could simply be converted to jax pytrees at the time of fitting, so long as the memory burden of copying the training data each time you fit is feasible. Testing will need to be done.
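A minimal sketch of what converting at fit time could look like. This is not the actual multidms code: `TrainingData`, `to_jax_arrays`, and `fit` are hypothetical names, and the data layout (a dict with `"X"` and `"y"`) is an assumption made only for illustration. The point is that the data container never imports jax, so worker processes forked while preparing the data would not have jax loaded.

```python
class TrainingData:
    """Hypothetical stand-in for the Data module; it never imports jax."""

    def __init__(self, features, targets):
        # Plain Python (or NumPy) containers only.
        self.features = features
        self.targets = targets


def to_jax_arrays(data):
    """Copy the training data into jax arrays (a dict of arrays is a pytree)."""
    # Deferred import: jax is loaded only when a fit actually starts.
    import jax.numpy as jnp

    return {"X": jnp.asarray(data.features), "y": jnp.asarray(data.targets)}


def fit(data, convert=to_jax_arrays):
    """Convert at fit time, then hand the pytree to the optimizer."""
    arrays = convert(data)  # one fresh copy per call to fit
    # Placeholder: a real fit would run the optimization on `arrays` here.
    return arrays
```

The `convert` argument is only there so the conversion step can be swapped out or measured in isolation, which may help when testing the memory cost of copying the data on every fit.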

jgallowa07 self-assigned this Mar 22, 2024
jgallowa07 mentioned this issue Mar 22, 2024
@jgallowa07
Member Author

  • Make a utility function that uses pandarallel or another library to make this faster
  • Check that all places that use this code are updated
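One possible shape for such a utility, sketched with the standard library only. It does not use pandarallel, and `parallel_map`, `n_workers`, and `start_method` are hypothetical names. It assumes the work can be expressed as a function applied to independent items. Using the "spawn" start method avoids forking a multi-threaded process, at the cost of slower worker startup.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor


def parallel_map(func, items, n_workers=1, start_method="spawn"):
    """Apply func to each item, optionally across worker processes."""
    items = list(items)
    if n_workers <= 1:
        # Serial fallback: no child processes are created at all.
        return [func(item) for item in items]
    # An explicit context keeps the start method local to this call
    # instead of changing the global multiprocessing default.
    ctx = multiprocessing.get_context(start_method)
    with ProcessPoolExecutor(max_workers=n_workers, mp_context=ctx) as pool:
        # pool.map preserves the input order of the results.
        return list(pool.map(func, items))
```

With "spawn", `func` and the items must be picklable, `func` should be defined at module level, and scripts that call this with `n_workers > 1` need the usual `if __name__ == "__main__":` guard.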
