
cudarc fails to load libraries on official nvidia ubuntu images #274

Closed
manifest opened this issue Jul 15, 2024 · 11 comments

@manifest

docker image: nvidia/cuda:12.5.1-runtime-ubuntu24.04
cudarc version: 0.11.7

Error message:

Unable to dynamically load the "cublas" shared library - searched for library names: ["cublas", "cublas64", "cublas64_12", "cublas64_125", "cublas64_125_0", "cublas64_120_5", "cublas64_10", "cublas64_120_0", "cublas64_9"]. Ensure that `LD_LIBRARY_PATH` has the correct path to the installed library. If the shared library is present on the system under a different name than one of those listed above, please open a GitHub issue.

Location of the libraries on nvidia/cuda:12.5.1-runtime-ubuntu24.04:

/usr/local/cuda/targets/x86_64-linux/lib/libOpenCL.so.1
/usr/local/cuda/targets/x86_64-linux/lib/libcublas.so.12
/usr/local/cuda/targets/x86_64-linux/lib/libcublasLt.so.12
/usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.12
/usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.11
/usr/local/cuda/targets/x86_64-linux/lib/libcufftw.so.11
/usr/local/cuda/targets/x86_64-linux/lib/libcufile.so.0
/usr/local/cuda/targets/x86_64-linux/lib/libcufile_rdma.so.1
/usr/local/cuda/targets/x86_64-linux/lib/libcurand.so.10
/usr/local/cuda/targets/x86_64-linux/lib/libcusolver.so.11
/usr/local/cuda/targets/x86_64-linux/lib/libcusolverMg.so.11
/usr/local/cuda/targets/x86_64-linux/lib/libcusparse.so.12
/usr/local/cuda/targets/x86_64-linux/lib/libnppc.so.12
/usr/local/cuda/targets/x86_64-linux/lib/libnppial.so.12
/usr/local/cuda/targets/x86_64-linux/lib/libnppicc.so.12
/usr/local/cuda/targets/x86_64-linux/lib/libnppidei.so.12
/usr/local/cuda/targets/x86_64-linux/lib/libnppif.so.12
/usr/local/cuda/targets/x86_64-linux/lib/libnppig.so.12
/usr/local/cuda/targets/x86_64-linux/lib/libnppim.so.12
/usr/local/cuda/targets/x86_64-linux/lib/libnppist.so.12
/usr/local/cuda/targets/x86_64-linux/lib/libnppisu.so.12
/usr/local/cuda/targets/x86_64-linux/lib/libnppitc.so.12
/usr/local/cuda/targets/x86_64-linux/lib/libnpps.so.12
/usr/local/cuda/targets/x86_64-linux/lib/libnvJitLink.so.12
/usr/local/cuda/targets/x86_64-linux/lib/libnvblas.so.12
/usr/local/cuda/targets/x86_64-linux/lib/libnvfatbin.so.12
/usr/local/cuda/targets/x86_64-linux/lib/libnvjpeg.so.12
/usr/local/cuda/targets/x86_64-linux/lib/libnvrtc-builtins.so.12.5
/usr/local/cuda/targets/x86_64-linux/lib/libnvrtc.so.12
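
For what it's worth, a quick way to double-check what the dynamic loader can actually resolve inside the container (plain shell, nothing cudarc-specific) is something like:

```sh
# List the cuBLAS files the image ships under the CUDA target directory.
ls /usr/local/cuda/targets/x86_64-linux/lib/ | grep -i cublas

# Check whether the loader cache and search path know about them.
ldconfig -p | grep -i cublas
echo "$LD_LIBRARY_PATH"
```
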
@coreylowman
Owner

Hmm, I've always used the CUDA devel Docker images (e.g. 12.5.1-cudnn-devel-ubuntu20.04), and those have worked for me.

Can you try the devel images? If runtime images are necessary for you, I can look into why they are different (I'm thinking the .12 at the end of the library name is messing up the dynamic loading search).

Alternatively, you can disable dynamic loading in favor of dynamic linking, and that will likely work.
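
If you depend on cudarc directly, that switch is done through Cargo features. A rough sketch, with the caveat that the exact feature names here are placeholders and should be checked against the cudarc README for the version you're on:

```toml
# Cargo.toml sketch: turn off the default dynamic-loading behaviour and link
# the CUDA libraries at build time instead. Feature names are illustrative;
# verify them against your cudarc version.
[dependencies]
cudarc = { version = "0.11", default-features = false, features = ["dynamic-linking", "driver", "cublas"] }
```
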

@manifest
Author

> Can you try the devel images? If runtime images are necessary for you, I can look into why they are different (I'm thinking the .12 at the end of the library name is messing up the dynamic loading search).

That could be the reason: creating symlinks for the libraries above resolves the issue on the runtime image. That workaround works, but I would love to get rid of it :-)
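
For reference, the workaround is just adding un-suffixed symlinks next to the versioned .so files so the plain names the loader searches for can resolve; roughly like this (adjust the set of libraries to whatever your binary actually loads):

```sh
# Add un-versioned symlinks so names like "libcublas.so" can be dlopen'd.
cd /usr/local/cuda/targets/x86_64-linux/lib
ln -s libcublas.so.12   libcublas.so
ln -s libcublasLt.so.12 libcublasLt.so
ln -s libcurand.so.10   libcurand.so
ldconfig  # refresh the loader cache (needs root)
```
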

We use the devel image for the first build stage and then copy the binary into the runtime image to keep the image size small.
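
A simplified sketch of that kind of multi-stage build, using the image tags from this thread and a placeholder binary name:

```dockerfile
# Build stage: the devel image has the headers and unversioned .so names needed for linking.
FROM nvidia/cuda:12.5.1-devel-ubuntu24.04 AS builder
# ... install the Rust toolchain and run `cargo build --release` here ...

# Runtime stage: only the CUDA runtime libraries, to keep the final image small.
FROM nvidia/cuda:12.5.1-runtime-ubuntu24.04
COPY --from=builder /app/target/release/my-app /usr/local/bin/my-app  # "my-app" is a placeholder
CMD ["my-app"]
```
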

> Hmm, I've always used the CUDA devel Docker images (e.g. 12.5.1-cudnn-devel-ubuntu20.04), and those have worked for me.

We used nvidia/cuda:11.8.0-runtime-ubuntu22.04 with cudarc 0.10.0 and it worked fine. After upgrading to cudarc 0.11.7, we got this problem.

I would have tested cudarc 0.11.7 against cuda:11, but some other dependency in our application now requires cuda:12.

@manifest
Author

> Can you try the devel images?

I've just tried nvidia/cuda:12.5.1-devel-ubuntu24.04; it works fine.

> Alternatively, you can disable dynamic loading in favor of dynamic linking, and that will likely work.

How can I do that? We use cudarc as a dependency of candle.

@coreylowman
Owner

Hmm, it looks like the main branch of candle is using dynamic linking already. Are y'all on an older version or a branch?

Also, FYI, there was a bug in 0.11.7, so I recommend either upgrading to 0.11.8 or downgrading to 0.11.6 (which is the version candle is targeting).
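
If cudarc is only a transitive dependency (e.g. via candle), you can usually pin it without touching the direct dependency, for example:

```sh
# Pin the transitive cudarc dependency to a specific release in Cargo.lock.
cargo update -p cudarc --precise 0.11.8
```
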

I'll play around and see if I can get the dynamic loader to account for suffixes on the library name. I'm not sure we have that much control over prefixes and suffixes, though (e.g. adding a .<major> after the .so), so symlinks may have to suffice for now.

@coreylowman
Owner

BTW, I don't see the driver library libcuda.so; it seems like the Docker image only has the runtime library (libcudart.so). At minimum, I think this would be blocked until #262 is merged, AND candle would then have to swap over to using the runtime API feature, which might take a bit longer.

Did y'all see any errors related to not finding cuda? I would expect cudarc to fail on that before cublas, so I'm wondering if y'all already included that somehow.

@manifest
Author

I've upgraded to 0.11.8.

In the logs, I see only the following message, the same one as before.

thread 'main' panicked at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.11.8/src/lib.rs:98:5:
Unable to dynamically load the "cublas" shared library - searched for library names: ["cublas", "cublas64", "cublas64_12", "cublas64_125", "cublas64_125_0", "cublas64_120_5", "cublas64_10", "cublas64_120_0", "cublas64_9"]. Ensure that `LD_LIBRARY_PATH` has the correct path to the installed library. If the shared library is present on the system under a different name than one of those listed above, please open a GitHub issue.
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@coreylowman
Owner

Ah yeah, sorry for miscommunicating: upgrading to 0.11.8 won't fix the message in this issue.

@manifest
Author

Got it. Just wanted to clarify that I'm on the latest version, in case you want me to test something :-)

@maulberto3

maulberto3 commented Aug 9, 2024

@coreylowman Wondering if LibreCuda might help?

@manifest
Author

I've built candle with cudarc from the master branch. This issue has been resolved by the latest commit. Thanks.
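
(In case it helps anyone waiting on a release: a Cargo [patch] entry is one way to pull the git version of cudarc into a candle-based project; this is illustrative only, and in practice you'd pin a specific rev.)

```toml
# Cargo.toml sketch: override the crates.io release of cudarc with the git version
# that contains the fix. Prefer pinning a specific `rev` rather than tracking a branch.
[patch.crates-io]
cudarc = { git = "https://github.com/coreylowman/cudarc" }
```
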

@coreylowman Are you planning on making a release?

@manifest
Author

Fixed
