Q: when calling a Python script from D, how to properly set up the env (for loading dynamic libraries .so)? #156

Open
mw66 opened this issue May 24, 2021 · 0 comments

mw66 (Contributor) commented May 24, 2021

Hi,

I encountered a strange error: my Python program behaves differently when run stand-alone versus when called from a D program via pyd. I suspect the cause is that the dynamic libraries (.so) are loaded differently:

Python stand-alone, run log:
"""
2021-05-24 18:48:06.939476: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-05-24 18:48:06.958735: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3492480000 Hz
2021-05-24 18:48:07.603718: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-05-24 18:48:10.760915: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-05-24 18:48:10.763691: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
"""
Please note: libcublas.so.11 is loaded first, and the run succeeds.

When the same Python script is called via pyd, the run log is:
"""
2021-05-24 19:03:30.943183: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-05-24 19:03:31.098807: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3492480000 Hz
[New Thread 0x7ffe397fa700 (LWP 23546)]
[Thread 0x7ffe397fa700 (LWP 23546) exited]
[New Thread 0x7ffe397fa700 (LWP 23547)]
[New Thread 0x7ffe38ff9700 (LWP 23548)]
[New Thread 0x7ffe39ffb700 (LWP 23549)]
[New Thread 0x7ffcf3fff700 (LWP 23550)]
2021-05-24 19:03:35.463795: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-05-24 19:03:38.561243: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-05-24 19:03:57.926635: E tensorflow/stream_executor/cuda/cuda_blas.cc:197] failed to set new cublas math mode: CUBLAS_STATUS_INVALID_VALUE
Traceback (most recent call last):
"""
Please note: libcublas.so.11 is NOT loaded, so the first library loaded becomes libcublasLt.so.11, and then the run fails.

I tried very hard to make sure that, at the shell level, I set the same env vars in both scenarios.
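One way to check that the two scenarios really see the same environment is to take a snapshot from inside the Python script itself, once stand-alone and once under pyd, and diff the two files. The following is a minimal sketch, not part of pyd; the helper names `env_snapshot` and `dump_snapshot` and the list of variables in `KEYS` are assumptions made for illustration.

```python
import json
import os
import sys

# Variables that commonly influence shared-library and CUDA lookup on Linux.
# This selection is an assumption; extend it as needed.
KEYS = (
    "LD_LIBRARY_PATH",
    "LD_PRELOAD",
    "PATH",
    "PYTHONPATH",
    "PYTHONHOME",
    "CUDA_HOME",
    "CUDA_VISIBLE_DEVICES",
)


def env_snapshot(environ=None):
    """Collect the settings most likely to differ between the two runs."""
    environ = os.environ if environ is None else environ
    return {
        "env": {key: environ.get(key) for key in KEYS},
        # Under an embedded interpreter this may not point at a python binary.
        "executable": sys.executable,
        "sys_path": list(sys.path),
    }


def dump_snapshot(path):
    """Write the snapshot as sorted JSON so two runs can be compared with diff."""
    with open(path, "w") as f:
        json.dump(env_snapshot(), f, indent=2, sort_keys=True)
```

Calling `dump_snapshot("standalone.json")` at the top of the script in one run and `dump_snapshot("pyd.json")` in the other, then diffing the files, would show whether the interpreter sees a different `LD_LIBRARY_PATH` or `sys.path` even though the shell settings look identical.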

But why does the Python program called via pyd from D skip loading some dynamic libraries (i.e. libcublas.so.11 in this case)?
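The TensorFlow log only shows the libraries it reports opening, so it may help to confirm directly which shared objects are mapped into the process at a given point. The sketch below assumes Linux and reads `/proc/self/maps`; the helper names `loaded_libraries` and `matching` are hypothetical.

```python
import os


def loaded_libraries(maps_text=None):
    """Return sorted paths of shared objects mapped into this process (Linux only)."""
    if maps_text is None:
        with open("/proc/self/maps") as f:
            maps_text = f.read()
    libs = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, optional pathname.
        parts = line.split(None, 5)
        if len(parts) == 6:
            path = parts[5].strip()
            if ".so" in os.path.basename(path):
                libs.add(path)
    return sorted(libs)


def matching(libs, needle):
    """Filter library paths whose file name contains the given substring."""
    return [path for path in libs if needle in os.path.basename(path)]
```

Printing `matching(loaded_libraries(), "libcublas")` right before the failing call, in both the stand-alone and the pyd run, would show whether libcublas.so.11 is really absent in the pyd case and from which directory each library was picked up.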

Is there something (e.g. an env var) I need to set up in the D program when calling pyd?
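One thing worth ruling out here: on Linux with glibc, the dynamic loader reads `LD_LIBRARY_PATH` when a process starts, so changing it from inside an already running process does not affect later library lookups. Any such variable therefore has to be in place before the host program is launched. The sketch below illustrates the principle with a child Python process; `run_python_with_env` and `DEMO_VAR` are hypothetical names, and the same idea would apply to launching the D executable.

```python
import os
import subprocess
import sys


def run_python_with_env(code, extra_env):
    """Run a Python snippet in a child process whose environment is set at launch."""
    env = os.environ.copy()
    env.update(extra_env)  # values must be present before the child starts
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # DEMO_VAR is a placeholder; in practice this would be e.g. LD_LIBRARY_PATH.
    print(run_python_with_env(
        "import os; print(os.environ.get('DEMO_VAR'))",
        {"DEMO_VAR": "set-at-launch"},
    ))
```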

(Another thing that looks suspicious: there is some thread activity before those libraries are loaded. This only happens in the pyd run; I'm not sure if it's related.)
