Locate nvvm, libdevice and nvrtc from nvidia-cuda-nvcc-cu12 wheels #155

Open
wants to merge 15 commits into
base: main

brandon-b-miller
Collaborator

Closes #66
Closes #65

WIP. The current code finds nvvm and libdevice, which is enough to launch kernels; nvrtc support is next. The logic is vendored from nvmath-python.

Comment on lines +84 to +94
if sp is not None:
    dso_dir = os.path.join(
        sp,
        "nvidia",
        "cuda_nvcc",
        "nvvm",
        dso_dir
    )
    dso_path = os.path.join(dso_dir, dso_path)
    if os.path.exists(dso_path):
        return str(Path(dso_path).parent)


I commented on this in NVIDIA/cuda-python#441 (comment), but we may want to consider importing the nvidia package and then importing the cuda_nvcc package from it, instead of manually traversing the paths? We can then use nvidia.cuda_nvcc.__path__, which always resolves to sp/nvidia/cuda_nvcc and properly follows the general Python rules for which package takes priority.
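The import-based lookup suggested above could be sketched roughly as follows. This is a minimal illustration rather than the PR's code: the helper names `find_package_dir` and `find_nvvm_dir_from_wheel` are hypothetical, and the `nvvm/lib64` and `nvvm/bin` subdirectories are assumed from the diff excerpts quoted in this review.

```python
import importlib
import os
import sys


def find_package_dir(package_name):
    """Return the first directory in the package's __path__, or None.

    Hypothetical helper: lets the import system decide which installed
    copy of the package wins, instead of walking site-packages by hand.
    """
    try:
        module = importlib.import_module(package_name)
    except ImportError:
        # The package (or one of its parents) is not installed.
        return None
    # Plain modules have no __path__; packages and namespace packages do.
    paths = list(getattr(module, "__path__", []))
    return paths[0] if paths else None


def find_nvvm_dir_from_wheel():
    """Return the directory expected to hold the NVVM library, or None."""
    base = find_package_dir("nvidia.cuda_nvcc")
    if base is None:
        return None
    # Assumed layout: nvvm/bin on Windows, nvvm/lib64 elsewhere.
    subdir = "bin" if sys.platform.startswith("win32") else "lib64"
    candidate = os.path.join(base, "nvvm", subdir)
    return candidate if os.path.isdir(candidate) else None
```

If the wheel is not installed, both helpers return `None`, so a caller can fall through to the next search location.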

def _get_nvvm_wheel():
    site_paths = [
        site.getusersitepackages()
    ] + site.getsitepackages() + ["conda", None]


Why do we need to add a conda path here? If someone installed the wheel in a conda environment, it would presumably be found via site.getsitepackages()?

Comment on lines 72 to 82
# The SONAME is taken based on public CTK 12.x releases
if sys.platform.startswith("linux"):
    dso_dir = "lib64"
    # Hack: libnvvm from Linux wheel
    # does not have any soname (CUDAINST-3183)
    dso_path = "libnvvm.so"
elif sys.platform.startswith("win32"):
    dso_dir = "bin"
    dso_path = "nvvm64_40_0.dll"
else:
    raise AssertionError()


I think this can be pulled out of the site_paths loop?

Collaborator

It might also be good to raise the exception with some explanation of what is wrong?
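Both review points on this excerpt (hoisting the platform check out of the loop and raising a descriptive error) could be combined in a small helper. The sketch below is only an illustration: the function name `nvvm_dso_location` and the choice of `NotImplementedError` are assumptions, while the directory and library names are copied from the diff excerpt.

```python
import sys


def nvvm_dso_location(platform=None):
    """Return (dso_dir, dso_name) for the NVVM library on this platform.

    Hypothetical helper: it is computed once, before iterating over any
    candidate site-packages directories.
    """
    platform = sys.platform if platform is None else platform
    if platform.startswith("linux"):
        # The Linux wheel ships libnvvm without a SONAME.
        return "lib64", "libnvvm.so"
    if platform.startswith("win32"):
        return "bin", "nvvm64_40_0.dll"
    # Explain what went wrong instead of a bare AssertionError().
    raise NotImplementedError(
        f"Locating NVVM from wheels is not supported on platform "
        f"{platform!r}; expected Linux or Windows."
    )
```

The optional `platform` argument is there only to make the function easy to exercise in tests without patching `sys.platform`.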

('Debian package', get_debian_pkg_libdevice()),
('NVIDIA NVCC Wheel', get_libdevice_wheel()),
Collaborator

I think we want to be looking for this ahead of the Debian package, otherwise Debian-packaged versions will always get in front of the wheel.

]
libdevice_ctk_dir = get_system_ctk('nvvm', 'libdevice')
Collaborator

Why did we move the system toolkit after the Debian-packaged version? I think we want to preserve the order if we can.
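The ordering concern in the two comments above comes from a "first match wins" search over the candidate list. A minimal sketch of that behaviour, assuming the first entry with a non-`None` path is selected; the function name `first_available` and the example paths are hypothetical:

```python
def first_available(candidates):
    """Return the first (name, path) pair whose path is not None.

    Earlier entries shadow later ones, which is why the position of the
    wheel entry relative to the Debian package entry matters.
    """
    for name, path in candidates:
        if path is not None:
            return name, path
    return "<unknown>", None


# Illustrative paths only; both locations are assumed to be present.
wheel_first = [
    ("NVIDIA NVCC Wheel", "/venv/site-packages/nvidia/cuda_nvcc/nvvm/libdevice"),
    ("Debian package", "/usr/lib/nvidia-cuda-toolkit/libdevice"),
]
debian_first = list(reversed(wheel_first))

print(first_available(wheel_first)[0])   # NVIDIA NVCC Wheel
print(first_available(debian_first)[0])  # Debian package
```

When both sources are installed, the entry listed first always wins, so placing the Debian package ahead of the wheel means the wheel is never chosen on such systems.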

# Keep only the max (most recent version) of the bitcode files.
out = max(candidates, default=None)
if by == "NVIDIA NVCC Wheel":
    # The NVVM path is a directory, not a file
Collaborator

What's the relevance of the NVVM path here?

Collaborator

I just realised this is a copy/paste error from below.

    # The NVVM path is a directory, not a file
    out = os.path.join(libdir, "libdevice.10.bc")
else:
    # Search for pattern
Collaborator

I think it's called libdevice.10.bc in all supported toolkit versions, so this logic is probably no longer needed - will check and update here.

Collaborator

I just checked - even back in 11.2 it is called libdevice.10.bc - so we don't need to search and choose from a set of candidates anymore.
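If the file name is fixed, as the comments above conclude, the lookup no longer needs a pattern search or a `max()` over candidates. A minimal sketch under that assumption; the helper name `libdevice_path` and the constant name are hypothetical:

```python
import os

# Assumed constant, based on the review comment that the bitcode file has
# had this name since at least CUDA 11.2.
LIBDEVICE_FILENAME = "libdevice.10.bc"


def libdevice_path(libdir):
    """Return the libdevice bitcode path inside libdir, or None if absent."""
    if libdir is None:
        return None
    path = os.path.join(libdir, LIBDEVICE_FILENAME)
    return path if os.path.isfile(path) else None
```

With a single known name, every search location (wheel, Debian package, system toolkit) can share the same code path.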

candidates = find_lib('nvvm', path)
path = max(candidates) if candidates else None
if by == "NVIDIA NVCC Wheel":
    # The NVVM path is a directory, not a file
Collaborator

I can't figure out what this comment means / adds - can you explain / reword / delete it?

Collaborator

@gmarkall gmarkall left a comment


A few questions on the diff - in addition, do we plan to add a CI config that installs these from wheels so that we know it will continue to work?

@gmarkall gmarkall added 4 - Waiting on author Waiting for author to respond to review and removed 3 - Ready for Review Ready for review by team labels Mar 13, 2025
@brandon-b-miller
Collaborator Author

A few questions on the diff - in addition, do we plan to add a CI config that installs these from wheels so that we know it will continue to work?

Yes, I'll see about adding a separate CI job for this


# remove cuda-nvvm-12-5 leaving libnvvm.so from nvidia-cuda-nvcc-cu12 only
apt-get update
apt remove --purge cuda-nvvm-12-5 -y
Collaborator Author

This, combined with the addition of nvidia-cuda-nvcc-cu12, was the easiest way I could think of to get to the relevant test environment, but I'm by no means married to it. It would also have to be dynamic with respect to the minor version.

Collaborator

You can get the installed package name with something like

CUDA_NVVM_PACKAGE=`dpkg --get-selections | grep cuda-nvvm | awk '{print $1}'`
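For readers who prefer to do the same lookup from a test script, the shell pipeline above can be mirrored in Python. This is only a sketch: the function names are hypothetical, the sample text is made up for illustration, and `query_dpkg_selections` assumes a Debian-based system where `dpkg` is available.

```python
import subprocess


def installed_packages_matching(selections_text, needle):
    """Return package names from `dpkg --get-selections` output.

    Mirrors `grep needle | awk '{print $1}'`: keep lines containing
    needle and take the first whitespace-separated field.
    """
    names = []
    for line in selections_text.splitlines():
        fields = line.split()
        if fields and needle in line:
            names.append(fields[0])
    return names


def query_dpkg_selections():
    """Run `dpkg --get-selections` and return its stdout (Debian only)."""
    result = subprocess.run(
        ["dpkg", "--get-selections"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Illustrative input only; real output depends on the machine.
sample = "cuda-nvvm-12-5\t\t\tinstall\nlibc6:amd64\t\t\tinstall\n"
print(installed_packages_matching(sample, "cuda-nvvm"))
```

Matching on the `cuda-nvvm` prefix rather than a full name avoids hard-coding the toolkit minor version in the CI script.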

@gmarkall gmarkall added 4 - Waiting on reviewer Waiting for reviewer to respond to author and removed 4 - Waiting on author Waiting for author to respond to review labels Mar 19, 2025
rwgk added a commit to rwgk/cuda-python that referenced this pull request Mar 19, 2025
rwgk added a commit to rwgk/cuda-python that referenced this pull request Mar 19, 2025
@ZzEeKkAa
Contributor

ZzEeKkAa commented Mar 21, 2025

I've merged this branch with main (fbbc040) and tested it on nvmath-python. I was able to successfully get rid of this patch:

    # our device apis only support cuda 12+
    _utils.force_loading_nvrtc("12")
    nvrtc.NVRTC.__new__ = __nvrtc_new__

But I can't get rid of:

    # Patch Numba to support wheels
    _utils.patch_numba_nvvm(nvvm)

I'm getting the error:

> python ./examples/device/cublasdx_simple_gemm_fp32.py
/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/numba-cuda/numba_cuda/numba/cuda/dispatcher.py:663: NumbaPerformanceWarning: Grid size 1 will likely result in GPU under-utilization due to low occupancy.
  warn(NumbaPerformanceWarning(msg))
Traceback (most recent call last):
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/./examples/device/cublasdx_simple_gemm_fp32.py", line 78, in <module>
    main()
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/./examples/device/cublasdx_simple_gemm_fp32.py", line 68, in main
    f[1, block_dim](a_d, b_d, c_d, alpha, beta, o_d)
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/numba-cuda/numba_cuda/numba/cuda/dispatcher.py", line 666, in __call__
    return self.dispatcher.call(args, self.griddim, self.blockdim,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/numba-cuda/numba_cuda/numba/cuda/dispatcher.py", line 808, in call
    kernel = _dispatcher.Dispatcher._cuda_call(self, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/numba-cuda/numba_cuda/numba/cuda/dispatcher.py", line 816, in _compile_for_args
    return self.compile(tuple(argtypes))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/numba-cuda/numba_cuda/numba/cuda/dispatcher.py", line 1065, in compile
    kernel = _Kernel(self.py_func, argtypes, **self.targetoptions)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/numba-cuda/numba_cuda/numba/cuda/dispatcher.py", line 156, in __init__
    cres = compile_cuda(self.py_func, types.void, self.argtypes,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/numba-cuda/numba_cuda/numba/cuda/compiler.py", line 290, in compile_cuda
    cres = compiler.compile_extra(typingctx=typingctx,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler.py", line 739, in compile_extra
    return pipeline.compile_extra(func)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler.py", line 439, in compile_extra
    return self._compile_bytecode()
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler.py", line 505, in _compile_bytecode
    return self._compile_core()
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler.py", line 481, in _compile_core
    raise e
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler.py", line 473, in _compile_core
    pm.run(self.state)
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler_machinery.py", line 363, in run
    raise e
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler_machinery.py", line 356, in run
    self._runPass(idx, pass_inst, state)
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler_machinery.py", line 311, in _runPass
    mutated |= check(pss.run_pass, internal_state)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler_machinery.py", line 272, in check
    mangled = func(compiler_state)
              ^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/typed_passes.py", line 112, in run_pass
    typemap, return_type, calltypes, errs = type_inference_stage(
                                            ^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/typed_passes.py", line 93, in type_inference_stage
    errs = infer.propagate(raise_errors=raise_errors)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/typeinfer.py", line 1066, in propagate
    errors = self.constraints.propagate(self)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/typeinfer.py", line 160, in propagate
    constraint(typeinfer)
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/typeinfer.py", line 566, in __call__
    self.resolve(typeinfer, typevars, fnty)
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/typeinfer.py", line 589, in resolve
    sig = typeinfer.resolve_call(fnty, pos_args, kw_args)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/typeinfer.py", line 1560, in resolve_call
    return self.context.resolve_function_type(fnty, pos_args, kw_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/typing/context.py", line 195, in resolve_function_type
    res = self._resolve_user_function_type(func, args, kws)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/typing/context.py", line 247, in _resolve_user_function_type
    return func.get_call_type(self, args, kws)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/types/functions.py", line 538, in get_call_type
    self.dispatcher.get_call_template(args, kws)
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/numba-cuda/numba_cuda/numba/cuda/dispatcher.py", line 979, in get_call_template
    self.compile_device(tuple(args))
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/numba-cuda/numba_cuda/numba/cuda/dispatcher.py", line 1016, in compile_device
    cres = compile_cuda(self.py_func, return_type, args,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/numba-cuda/numba_cuda/numba/cuda/compiler.py", line 290, in compile_cuda
    cres = compiler.compile_extra(typingctx=typingctx,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler.py", line 739, in compile_extra
    return pipeline.compile_extra(func)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler.py", line 439, in compile_extra
    return self._compile_bytecode()
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler.py", line 505, in _compile_bytecode
    return self._compile_core()
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler.py", line 481, in _compile_core
    raise e
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler.py", line 473, in _compile_core
    pm.run(self.state)
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler_machinery.py", line 363, in run
    raise e
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler_machinery.py", line 356, in run
    self._runPass(idx, pass_inst, state)
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler_machinery.py", line 311, in _runPass
    mutated |= check(pss.run_pass, internal_state)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler_machinery.py", line 272, in check
    mangled = func(compiler_state)
              ^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/typed_passes.py", line 466, in run_pass
    lower = self.lowering_class(targetctx, library, fndesc, interp,
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/lowering.py", line 40, in __init__
    self.module = self.library.create_ir_module(self.fndesc.unique_name)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/codegen.py", line 574, in create_ir_module
    ir_module = self._codegen._create_empty_module(name)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/numba-cuda/numba_cuda/numba/cuda/codegen.py", line 399, in _create_empty_module
    ir_module.data_layout = nvvm.NVVM().data_layout
                            ^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/numba-cuda/numba_cuda/numba/cuda/cudadrv/nvvm.py", line 139, in __new__
    inst.driver = open_cudalib('nvvm')
                  ^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/numba-cuda/numba_cuda/numba/cuda/cudadrv/libs.py", line 83, in open_cudalib
    path = get_cudalib(lib)
           ^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/numba-cuda/numba_cuda/numba/cuda/cudadrv/libs.py", line 54, in get_cudalib
    return get_cuda_paths()['nvvm'].info or _dllnamepattern % 'nvvm'
           ^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/numba-cuda/numba_cuda/numba/cuda/cuda_paths.py", line 290, in get_cuda_paths
    'nvvm': _get_nvvm_path(),
            ^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/numba-cuda/numba_cuda/numba/cuda/cuda_paths.py", line 263, in _get_nvvm_path
    by, path = _get_nvvm_path_decision()
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/numba-cuda/numba_cuda/numba/cuda/cuda_paths.py", line 60, in _get_nvvm_path_decision
    if os.path.exists(nvvm_ctk_dir):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen genericpath>", line 19, in exists
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

Context:
pynvjitlink is on and lto is set to True
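The last frames of the traceback show `os.path.exists(nvvm_ctk_dir)` being called with `nvvm_ctk_dir` equal to `None`. One possible guard is sketched below; the helper name `existing_path_or_none` is hypothetical, and this is not necessarily how the PR resolves the issue.

```python
import os


def existing_path_or_none(path):
    """Return path if it is not None and exists on disk, otherwise None.

    Checking for None first avoids passing None to os.path.exists(),
    which is the call that fails in the traceback above.
    """
    if path is None:
        return None
    return path if os.path.exists(path) else None
```

A caller can then treat "no system toolkit found" and "toolkit directory missing" the same way and move on to the next candidate, such as the wheel.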

@brandon-b-miller
Collaborator Author

Hi @ZzEeKkAa, there are a couple of pieces of this that are still WIP, so I think you'll probably run into bugs right now. I'm working on this PR over the next few days, so hopefully there will be more updates soon.

Labels
4 - Waiting on reviewer Waiting for reviewer to respond to author
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Support for loading NVRTC from a wheel
[FEA] Support for loading NVVM from a wheel
4 participants