
Use cuda.bindings and cuda.core for Linker #133

Draft
wants to merge 19 commits into base: main
Conversation

brandon-b-miller
Collaborator

WIP
xref #129

@leofang
Member

leofang commented Feb 22, 2025

Thanks, @brandon-b-miller. Remember our goal is to drop every Linker subclass inside Numba in favor of cuda.core.Linker; the current PR is not what we want. Also note that to help phase out pynvjitlink, we already have rapidsai/pynvjitlink#111, which is essentially what this PR does today.

@brandon-b-miller brandon-b-miller changed the title Use cuda.bindings and cuda.core for nvjitlink Use cuda.bindings and cuda.core for Linker Feb 24, 2025

copy-pr-bot bot commented Feb 24, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


# Load
# cufunc = module.get_function(self._entry_name)
cufunc = cubin.get_kernel(self._entry_name)
Collaborator


Since we switched to the context-independent loading API, the CUDADispatcher.bind method should probably be renamed, since it no longer binds to a context by calling get_cufunc.

@gmarkall gmarkall added the 2 - In Progress Currently a work in progress label Mar 7, 2025
@brandon-b-miller
Collaborator Author

/ok to test

@brandon-b-miller
Collaborator Author

@gmarkall @leofang numba-cuda contains nvjitlink tests, should we maintain support for these as part of this PR or drop them in favor of testing in upstream cuda-python?

@gmarkall
Collaborator

> @gmarkall @leofang numba-cuda contains nvjitlink tests, should we maintain support for these as part of this PR or drop them in favor of testing in upstream cuda-python?

I think we need to maintain the tests that exercise Numba-CUDA's interaction with the linker, like the TestLinkerUsage class and the test_*_with_linkable_code tests. I don't think we need to keep the tests that purely exercise the PyNvJitLinker API, like the ones that pass different flags to it.

@gmarkall
Collaborator

Also, I think we can probably delete the PyNvJitLinker class in this PR as well - is there any reason to keep it around?

I'm comfortable with:

  • Using cuda.core.Linker when the user asks for pynvjitlink or the NVIDIA bindings, and
  • Using the ctypes linker otherwise

which is what this PR seems to offer. (correct me if I've read it wrong 🙂)

@brandon-b-miller
Collaborator Author

> Also, I think we can probably delete the PyNvJitLinker class in this PR as well - is there any reason to keep it around?
>
> I'm comfortable with:
>
>   • Using cuda.core.Linker when the user asks for pynvjitlink or the NVIDIA bindings, and
>   • Using the ctypes linker otherwise
>
> which is what this PR seems to offer. (correct me if I've read it wrong 🙂)

Correct, this is the outcome I am aiming for.

@brandon-b-miller
Collaborator Author

@gmarkall on second thought, we might need to leave the MVCLinker in, in some capacity, as long as we're supporting CUDA 11. I don't think that cuda-python supports the functionality that cubinlinker enables.

@gmarkall
Collaborator

@brandon-b-miller Sorry, yes - I had that in mind but didn't write it down.

@brandon-b-miller
Collaborator Author

> @brandon-b-miller Sorry, yes - I had that in mind but didn't write it down.

Ok, just to have it written down somewhere, after this PR we will:

For CUDA 11, maintain the current way of configuring which bindings to use:

  • Default ctypes bindings, optional cuda-python bindings with NUMBA_CUDA_USE_NVIDIA_BINDING=1, optional MVCLinker with NUMBA_CUDA_ENABLE_MINOR_VERSION_COMPATIBILITY=1.

For CUDA 12, we will have:

  • Default ctypes bindings, optional cuda-python bindings with NUMBA_CUDA_USE_NVIDIA_BINDING=1
  • Use of pynvjitlink through cuda-python if NUMBA_CUDA_ENABLE_PYNVJITLINK=1
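As shell exports, the configurations above would look roughly like this (a sketch based on the variable names given in this thread; consult the numba-cuda docs for exact semantics):

```shell
# CUDA 11: NVIDIA bindings, plus the MVCLinker for minor version compatibility
export NUMBA_CUDA_USE_NVIDIA_BINDING=1
export NUMBA_CUDA_ENABLE_MINOR_VERSION_COMPATIBILITY=1

# CUDA 12: route linking through pynvjitlink via cuda.core
export NUMBA_CUDA_ENABLE_PYNVJITLINK=1
```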

This will leave us with 3 linkers:

  • The ctypes linker, which is used by default regardless of CUDA version
  • The MVC linker, which is used in a CUDA 11 environment when MVC is required, regardless of which binding is in use
  • The new linker, which is used in a CUDA 12 environment when either the cuda-python bindings or pynvjitlink is enabled
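The selection logic described above could be sketched as a pure function; the returned class names ("MVCLinker", "CudaPythonLinker", "CtypesLinker") are illustrative placeholders, not necessarily the names used in this PR:

```python
# Illustrative sketch of the three-linker selection plan; not the actual
# numba-cuda implementation.
def select_linker(cuda_major, use_nvidia_binding=False,
                  enable_mvc=False, enable_pynvjitlink=False):
    """Return the linker to use for a given CUDA version and config flags."""
    if cuda_major == 11 and enable_mvc:
        # CUDA 11 + minor version compatibility: the cubinlinker-based linker
        return "MVCLinker"
    if cuda_major >= 12 and (use_nvidia_binding or enable_pynvjitlink):
        # CUDA 12 with the NVIDIA bindings or pynvjitlink: the cuda.core linker
        return "CudaPythonLinker"
    # Default: the ctypes-based linker, regardless of CUDA version
    return "CtypesLinker"
```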

@gmarkall
Collaborator

Thanks for the summary! To look a little further ahead, we want to end up with only one linker, which is the new linker. This would be achieved by deprecating / removing the other linkers as soon as appropriate:

  • MVCLinker can be removed as soon as CUDA 11 support is dropped.
  • The ctypes linker can be deprecated and removed once we can take a hard dependency on cuda.core and have tested the new linker in use for a bit to shake out any issues.

Are you in alignment with the above plan @brandon-b-miller ?

@brandon-b-miller
Collaborator Author

> Thanks for the summary! To look a little further ahead, we want to end up with only one linker, which is the new linker. This would be achieved by deprecating / removing the other linkers as soon as appropriate:
>
>   • MVCLinker can be removed as soon as CUDA 11 support is dropped.
>   • The ctypes linker can be deprecated and removed once we can take a hard dependency on cuda.core and have tested the new linker in use for a bit to shake out any issues.
>
> Are you in alignment with the above plan @brandon-b-miller ?

Yup, this sounds good to me.

self._object_codes.append(obj)


def add_library(self, lib, name='<cudapy-lib>'):
Collaborator Author


There are a few cases in this class where I'm finding I have to do something like this. We discussed wanting to avoid having an ObjectCode constructor for every possible type of input, but it seems like Numba requires the ability to assemble the eventual inputs to the link from a pretty broad variety of sources.

Collaborator


It seems like we need an API to construct ObjectCode instances from inputs of arbitrary kinds - archives, cubins, LTOIR, fatbins, etc. It seems like the existing interface is aimed at providing something relatively safe, but doesn't allow for the lower-level, fine-grained control we need here to set up and complete a link.

Is that understanding correct?
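For concreteness, the kind of input classification being discussed might look like the following hypothetical helper. The suffix-to-kind table is purely illustrative; it is not a cuda.core API, and cuda.core may expose an entirely different mechanism:

```python
import os

# Hypothetical mapping from file suffix to the kind of code a linker input
# contains; illustrative only.
_SUFFIX_TO_KIND = {
    ".cubin": "cubin",
    ".ptx": "ptx",
    ".ltoir": "ltoir",
    ".fatbin": "fatbin",
    ".o": "object",
    ".a": "library",  # static archive
}

def classify_link_input(path):
    """Return the code kind for a linker input, or raise for unknown suffixes."""
    _, ext = os.path.splitext(path)
    try:
        return _SUFFIX_TO_KIND[ext]
    except KeyError:
        raise ValueError(f"don't know how to link {path!r}") from None
```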

Collaborator Author


This is what I've arrived at doing. @leofang and I have had a few conversations about how this capability would fit into the broader goals of the cuda-python API; I'd be curious to hear his thoughts.

@gmarkall
Collaborator

/ok to test
