Open MPI hangs on Modin tests #342

Open
YarShev opened this issue Oct 2, 2023 · 2 comments
Labels: bug 🦗 (Something isn't working), MPI (MPI backend related issues), P0 (Highest priority tasks requiring immediate investigation and fix)

Comments

@YarShev
Collaborator

YarShev commented Oct 2, 2023

Running the following Modin test on Open MPI hangs, both in CI and locally:

MODIN_ENGINE=unidist mpiexec --oversubscribe -x UNIDIST_MPI_SHARED_OBJECT_STORE=True -n 1 python -m pytest modin/pandas/test/internals/test_benchmark_mode.py

However, it works on Intel MPI.
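To tell a hang from a slow run when reproducing this, the command can be wrapped in a timeout. The following is a minimal sketch, not part of unidist or Modin: `run_with_timeout`, `ISSUE_CMD`, and `issue_env` are hypothetical names, and the timeout value is left to the caller. It uses only the standard-library `subprocess` module.

```python
import os
import subprocess

# The command from this issue, split into argv form.
ISSUE_CMD = [
    "mpiexec", "--oversubscribe",
    "-x", "UNIDIST_MPI_SHARED_OBJECT_STORE=True",
    "-n", "1",
    "python", "-m", "pytest",
    "modin/pandas/test/internals/test_benchmark_mode.py",
]


def issue_env():
    """Copy the current environment and select the unidist engine."""
    return dict(os.environ, MODIN_ENGINE="unidist")


def run_with_timeout(cmd, timeout_s, env=None):
    """Run cmd and return (hung, returncode).

    hung is True when the process did not finish within timeout_s seconds;
    in that case subprocess.run kills the direct child and returncode is None.
    """
    try:
        proc = subprocess.run(cmd, timeout=timeout_s, env=env)
    except subprocess.TimeoutExpired:
        return True, None
    return False, proc.returncode


# Example call (commented out so that importing this file does not launch MPI):
# hung, rc = run_with_timeout(ISSUE_CMD, timeout_s=600, env=issue_env())
```

Note that on timeout only the direct child (`mpiexec`) is killed; whether the ranks it launched are also cleaned up depends on the MPI launcher, so stray processes may need to be checked for manually.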

YarShev added the bug 🦗 (Something isn't working) and MPI (MPI backend related issues) labels Oct 2, 2023
YarShev changed the title from "Open MPI hangs on all CPUs using --oversubscribe" to "Open MPI hangs on Modin tests" Oct 2, 2023
@YarShev
Collaborator Author

YarShev commented Oct 5, 2023

The issue was reproducible even before the shared memory feature was introduced.

YarShev added the P0 (Highest priority tasks requiring immediate investigation and fix) label Oct 31, 2023
@anmyachev

anmyachev commented Nov 5, 2023

I see another test that hangs with unidist:
https://github.com/modin-project/modin/actions/runs/6756756901/job/18366579579?pr=6707
or
https://github.com/modin-project/modin/actions/runs/6737323284/job/18314544412?pr=6697

mpiexec aborting job...
modin/pandas/test/test_io.py::TestCsv::test_dataframe_to_csv 
job aborted:
[ranks] message

[0] job terminated by the user

[1-4] terminated

---- error analysis -----

[0] on fv-az836-953
ctrl-c was hit. job aborted by the user.

---- error analysis -----
