I encountered a strange error: my Python program behaves differently when run stand-alone vs. when called from a D program via pyd. I noticed it could be caused by the dynamic libraries (.so) being loaded differently:
Python stand-alone, run log:
"""
2021-05-24 18:48:06.939476: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-05-24 18:48:06.958735: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3492480000 Hz
2021-05-24 18:48:07.603718: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-05-24 18:48:10.760915: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-05-24 18:48:10.763691: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
"""
Please note: libcublas.so.11 is loaded first, and the run succeeds.
When the same Python script is called via pyd, the run log is:
"""
2021-05-24 19:03:30.943183: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-05-24 19:03:31.098807: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3492480000 Hz
[New Thread 0x7ffe397fa700 (LWP 23546)]
[Thread 0x7ffe397fa700 (LWP 23546) exited]
[New Thread 0x7ffe397fa700 (LWP 23547)]
[New Thread 0x7ffe38ff9700 (LWP 23548)]
[New Thread 0x7ffe39ffb700 (LWP 23549)]
[New Thread 0x7ffcf3fff700 (LWP 23550)]
2021-05-24 19:03:35.463795: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-05-24 19:03:38.561243: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-05-24 19:03:57.926635: E tensorflow/stream_executor/cuda/cuda_blas.cc:197] failed to set new cublas math mode: CUBLAS_STATUS_INVALID_VALUE
Traceback (most recent call last):
"""
Please note: libcublas.so.11 is NOT loaded, and the first library loaded becomes libcublasLt.so.11; the run then fails.
I tried very hard to make sure that, at the shell command level, I'm setting the same env vars in the two scenarios.
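To check the env vars from inside the process rather than at the shell level, a minimal sketch like the following could be run at the top of the Python script in both scenarios, and the two outputs compared. The list of variable names is my own assumption about what might influence library lookup; it is not taken from TensorFlow or pyd, and the helper names are hypothetical.

```python
import json
import os

# Variables that commonly influence shared-library lookup and CUDA setup.
# NOTE: this list is an assumption, not something documented by TensorFlow or pyd.
RELEVANT_VARS = (
    "LD_LIBRARY_PATH",
    "LD_PRELOAD",
    "PATH",
    "PYTHONPATH",
    "CUDA_HOME",
    "CUDA_VISIBLE_DEVICES",
)


def snapshot_env(names=RELEVANT_VARS, environ=None):
    """Return {name: value or None} as seen by the current process."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name) for name in names}


def diff_snapshots(a, b):
    """Return {name: (value_in_a, value_in_b)} for every name whose value differs."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


if __name__ == "__main__":
    # Print as JSON so the stand-alone and pyd outputs can be diffed directly.
    print(json.dumps(snapshot_env(), indent=2, sort_keys=True))
```

If the two JSON dumps are identical, the difference is probably not in these variables, though other settings outside this list could still differ.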
But why does the Python program, when called via pyd from D, skip loading some dynamic libraries (i.e. libcublas.so.11 in this case)?
Is there something (e.g. an env var) I need to set up in the D program when calling pyd?
(Another thing that looks suspicious: there is some thread activity before those libraries are loaded. This only happens in the pyd run; I'm not sure if it's related.)
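One way to see which CUDA libraries are actually mapped into the process in each scenario, independent of what the TensorFlow log prints, is to read /proc/self/maps. The sketch below is Linux-only; the substrings it filters on ("cublas", "cudnn", "cudart") and the function names are my own choices for illustration. A library missing from the log could still be mapped (for example, if the D host process or another module had already loaded it), and this would show that.

```python
import os


def parse_maps(text, needles=("cublas", "cudnn", "cudart")):
    """Extract unique paths of mapped files whose basename contains any needle.

    `text` is the content of a /proc/<pid>/maps file, where each line is:
    address perms offset dev inode [pathname]
    """
    libs = set()
    for line in text.splitlines():
        parts = line.split(None, 5)
        if len(parts) < 6:
            continue  # anonymous mapping, no pathname field
        path = parts[5].strip()
        if any(needle in os.path.basename(path) for needle in needles):
            libs.add(path)
    return sorted(libs)


def loaded_cuda_libs(maps_path="/proc/self/maps"):
    """Return the CUDA-related shared objects mapped into this process (Linux only)."""
    with open(maps_path) as f:
        return parse_maps(f.read())


if __name__ == "__main__":
    # Call this after the TensorFlow code has run (or just before the failure)
    # in both the stand-alone and the pyd scenario, then compare the lists.
    for lib in loaded_cuda_libs():
        print(lib)
```

Comparing the full paths in the two runs would also reveal whether a different copy of the same library is being picked up.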