
support optional env variable to skip any attempt to import MPI for Cray MPI #209

Open
heather999 opened this issue Apr 8, 2022 · 2 comments


@heather999

Hi,
This is related to #113 and #173.
I have a conda environment at NERSC with the Cray MPICH libraries and mpi4py available. I want to support users who may use the batch nodes as well as the login nodes and JupyterHub at NERSC (which effectively runs on their login nodes). On a batch node everything works fine, but when running in this conda environment on a NERSC login node and doing import pymultinest we receive the dreaded:

[Thu Apr  7 21:29:11 2022] [unknown] Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(537):
MPID_Init(246).......: channel initialization failed
MPID_Init(647).......:  PMI2 init failed: 1
Aborted

This is specific to Cray MPI, which fails because it looks for hardware that is not present on NERSC's login nodes. I also reached out to the mpi4py developer, and he kindly explained that the try block added in #173 doesn't help in this case: error handlers can only be set after MPI_Init(), so this fatal error cannot be caught.
It would be helpful if we could add an environment variable that, if set, causes pymultinest to completely skip attempting the MPI import here.

I'm happy to help submit a PR to get that going. Does that seem reasonable?
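For concreteness, here is a rough sketch of the kind of guard I have in mind; the variable name PYMULTINEST_NO_MPI is just a placeholder, and the actual import logic in pymultinest may look different:

```python
import os

# Placeholder name for the proposed opt-out variable; the real name
# would be decided in the PR.
if os.environ.get('PYMULTINEST_NO_MPI'):
    # Skip mpi4py entirely, so Cray MPI never gets a chance to call
    # MPI_Init and abort the process on a login node.
    MPI = None
    mpi_rank = 0
else:
    try:
        from mpi4py import MPI
        mpi_rank = MPI.COMM_WORLD.Get_rank()
    except ImportError:
        MPI = None
        mpi_rank = 0
```

A user on a login node (or in JupyterHub) would set PYMULTINEST_NO_MPI=1 before importing pymultinest, while batch jobs would leave it unset and keep the current MPI behaviour.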

@heather999 heather999 changed the title support optional env variable to skip any attempt to import MPI support optional env variable to skip any attempt to import MPI for Cray MPI Apr 8, 2022
@JohannesBuchner
Owner

Further solutions for such interactive environments:

  • uninstall mpi4py in that conda env, or
  • remove the libmultinest_mpi.so library file, or
  • hide the libmultinest_mpi.so library file by pointing LD_LIBRARY_PATH to a folder that contains only libmultinest.so

I am a bit hesitant to include more and more code which circumvents broken MPI setups.

@heather999
Author

The point is to provide a single conda environment that supports both interactive use and running on batch nodes at HPC centers that have Cray MPI available. Uninstalling mpi4py, or removing or hiding libmultinest_mpi.so, is not a reasonable way to let a single conda environment cover all the potential use cases.

MPI is not meant to be used on the login nodes of these centers, so I can understand why the setup is "broken", yet users do occasionally want to run Python from a conda environment interactively, and they should be able to do that. Another option is to maintain two separate conda environments, one that properly supports MPI and one that doesn't, but that's not a very user- or maintainer-friendly solution either.
