Ensure correct handling of buffers allocated with LegacyPinnedMemoryResource.allocate as kernel parameters #717


Merged: 11 commits merged into NVIDIA:main on Jun 26, 2025

Conversation

Contributor

@shwina shwina commented Jun 18, 2025

Description

Buffers allocated using the LegacyPinnedMemoryResource have plain Python int handles, but the kernel param handler doesn't know how to handle such buffers. This PR fixes that.

Closes #715

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Contributor

copy-pr-bot bot commented Jun 18, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@@ -212,7 +212,13 @@ cdef class ParamHolder:
        for i, arg in enumerate(kernel_args):
            if isinstance(arg, Buffer):
                # we need the address of where the actual buffer address is stored
                self.data_addresses[i] = <void*><intptr_t>(arg.handle.getPtr())
                if isinstance(arg.handle, int):
Contributor Author

@shwina shwina Jun 18, 2025


Can we stomach the cost of an isinstance check here?

  • One alternative is to use a try..except, where entering the try block is cheap, but entering the except block is expensive.

  • Another alternative, which will eliminate the need to make any changes to the kernel arg handling logic here:

    • introduce a new type HostPtr which wraps an integer representing a pointer, and exposes a getPtr() method to get it.
    • Expand the return type of Buffer.handle to DevicePtrT | HostPtr
    • Change LegacyPinnedMemoryResource to return a buffer whose handle is a HostPtr.
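The HostPtr alternative in the bullets above could be sketched like this (a hypothetical sketch: the HostPtr name and getPtr() shape come from the proposal, everything else is illustrative):

```python
class HostPtr:
    """Hypothetical wrapper for an integer host pointer, mirroring the
    getPtr() interface of driver.CUdeviceptr so the kernel-arg handler
    can treat device and pinned-host handles uniformly."""

    def __init__(self, ptr: int):
        self._ptr = ptr

    def getPtr(self) -> int:
        return self._ptr

# The handler could then call arg.handle.getPtr() unconditionally,
# whether the buffer came from a device MR or LegacyPinnedMemoryResource.
```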

Member

@leofang leofang Jun 18, 2025


I think isinstance in Cython is cheap and what you have here is good. I don't want to introduce more types than needed, partly because we want MR providers to focus on the MR properties (is_host_accessible etc), which is nicer for programmatic checks. I actually think that Buffer.handle should be of Any type so as to not get in the way of the MR providers. From both CUDA and cccl-rt perspectives they should be all void*. We don't want to encode the memory space information as part of the type.

Contributor Author

I actually think that Buffer.handle should be of Any type so as to not get in the way of the MR providers.

If we did type it as Any, how would _kernel_arg_handler know how to grab the pointer from underneath the Buffer?

Member

Well Python does not care about type annotations, right? 🙂

Contributor Author

@shwina shwina Jun 18, 2025


My concern wasn't so much about the type annotation, but more that the kernel handler won't know what to do with a Buffer whose .handle is any arbitrary type.

Prior to this PR it could only handle the case when .handle is a CUdeviceptr, or something that has a .getPtr() method.

if isinstance(arg, Buffer):
    # we need the address of where the actual buffer address is stored
    self.data_addresses[i] = <void*><intptr_t>(arg.handle.getPtr())

This PR adds the ability to handle int.

Technically, .handle is also allowed to be None:

DevicePointerT = Union[driver.CUdeviceptr, int, None]
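Put together, a plain-Python sketch of what the mini dispatcher has to enumerate (the real code is Cython; FakeCUdeviceptr is an illustrative stand-in for driver.CUdeviceptr):

```python
from typing import Optional, Union

class FakeCUdeviceptr:
    """Illustrative stand-in for driver.CUdeviceptr: exposes getPtr()."""
    def __init__(self, ptr: int):
        self._ptr = ptr
    def getPtr(self) -> int:
        return self._ptr

def extract_ptr(handle: Optional[Union[FakeCUdeviceptr, int]]) -> int:
    """Enumerate the cases the DevicePointerT annotation allows."""
    if isinstance(handle, int):      # pinned-host buffers (the case this PR adds)
        return handle
    if handle is None:               # also permitted by the annotation
        raise ValueError("buffer has no backing allocation")
    return handle.getPtr()           # CUdeviceptr-like handles
```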

Member

Ahh, I see, you meant the mini dispatcher here needs to enumerate all possible types.

Let me think about it. What you have is good and a generic treatment can follow later.

Most likely with #564 we could rewrite the dispatcher that looks like this

if isinstance(arg, Buffer):
    prepare_arg[intptr_t](self.data, self.data_addresses, get_cuda_native_handle(arg.handle), i)

On the MR provider side, we just need them to implement a protocol

class IsHandleT(Protocol):
    def __int__(self) -> int: ...

if they are not using generic cuda.bindings or Python types. (FWIW we already have IsStreamT.) So maybe eventually Buffer.handle can be typed as

DevicePointerT = Optional[Union[IsHandleT, int]] 
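A runnable sketch of the protocol idea above (assuming the IsHandleT name from the comment; CustomHandle is an illustrative MR-provider type):

```python
from typing import Optional, Protocol, Union, runtime_checkable

@runtime_checkable
class IsHandleT(Protocol):
    """Anything convertible to a raw pointer value via __int__."""
    def __int__(self) -> int: ...

DevicePointerT = Optional[Union[IsHandleT, int]]

class CustomHandle:
    """An MR provider's handle type: it only needs to implement __int__."""
    def __init__(self, ptr: int):
        self._ptr = ptr
    def __int__(self) -> int:
        return self._ptr

# A generic dispatcher could then obtain the pointer with int(handle),
# with no per-type branches.
```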

Contributor Author

@shwina shwina Jun 19, 2025


It seems like a reasonable approach and I agree it would simplify the handling here. A couple of comments:

  • Perhaps we should rename DevicePointerT to just PointerT? In the case of pinned memory for instance, it doesn't actually represent a device pointer AFAIU.
  • If we use the protocol as written, then Union[IsHandleT, int] is equivalent to just IsHandleT (int type implements __int__). The protocol would also allow types like float or bool.
    • I feel like this discussion has been had before, but it might be worth considering a protocol with a __cuda_handle__() method or something, rather than __int__()
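The second bullet can be checked directly: with a runtime-checkable version of the __int__-only protocol, int, float, and bool all satisfy it (a small demonstration; IsHandleT is as defined in the earlier comment):

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class IsHandleT(Protocol):
    def __int__(self) -> int: ...

# int, float, and bool all implement __int__, so all satisfy the protocol.
# This is why Union[IsHandleT, int] collapses to IsHandleT, and why a
# dedicated method such as __cuda_handle__() would be more discriminating.
```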

Member

@leofang leofang left a comment


Thanks for catching the bug, Ashwin! I left some comments.

It seems to be a miss (due to the way we implicitly test LegacyPinnedMemoryResource through NumPy today) that we do not test passing Buffer from various MRs to launch(). Could you add some tests to test_launcher.py? In a comment below, I suggest we could move part of what you have there.

@github-project-automation github-project-automation bot moved this from Todo to In Progress in CCCL Jun 18, 2025
@leofang leofang added this to the cuda.core beta 5 milestone Jun 18, 2025
@leofang leofang added bug Something isn't working P0 High priority - Must do! cuda.core Everything related to the cuda.core module labels Jun 18, 2025
Member

@leofang leofang left a comment

Thanks, Ashwin! The added test LGTM. Left a few comments mainly for the code sample correctness.

@leofang
Member

leofang commented Jun 24, 2025

/ok to test 40df59e

leofang previously approved these changes Jun 24, 2025
@github-project-automation github-project-automation bot moved this from In Progress to In Review in CCCL Jun 24, 2025
@leofang leofang enabled auto-merge (squash) June 24, 2025 13:32

@leofang
Member

leofang commented Jun 24, 2025

I think the memory_ops.py sample has at least one bug that causes later tests to segfault (pytest runs the sample tests first).

We should early-return in the sample if NumPy is <2.1.0, because np.from_dlpack had a bug. In one of the samples I did this:

if cuda_path is None:
    print("this demo requires a valid CUDA_PATH environment variable set", file=sys.stderr)
    sys.exit(0)
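An analogous early-return guard for the NumPy version could look like this (a sketch: numpy_too_old is a hypothetical helper, and the 2.1.0 cutoff is the one named above):

```python
import sys

def numpy_too_old(version: str) -> bool:
    """Return True if the given NumPy version predates the 2.1.0 cutoff
    mentioned above for the np.from_dlpack bug."""
    major, minor = (int(x) for x in version.split(".")[:2])
    return (major, minor) < (2, 1)

# In the sample, the guard would mirror the CUDA_PATH check (assuming
# `np` is imported there):
# if numpy_too_old(np.__version__):
#     print("this demo requires NumPy >= 2.1.0", file=sys.stderr)
#     sys.exit(0)
```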

@shwina
Contributor Author

shwina commented Jun 24, 2025

I'm also seeing this failure mode in addition to the segfault:

>           assert cp.allclose(array, original * 3.0), f"{memory_resource_class.__name__} operation failed"
E           AssertionError: DeviceMemoryResource operation failed
E           assert array(False)
E            +  where array(False) = <function allclose at 0xf5ce6f977740>(array([0.30819342, 0.78738075, 0.71034026, ..., 0.3719493 , 0.7937206 ,\n       0.90285766], shape=(1024,), dtype=float32), (array([0.30819342, 0.78738075, 0.71034026, ..., 0.3719493 , 0.7937206 ,\n       0.90285766], shape=(1024,), dtype=float32) * 3.0))
E            +    where <function allclose at 0xf5ce6f977740> = <module 'cupy' from '/opt/hostedtoolcache/Python/3.13.5/arm64/lib/python3.13/site-packages/cupy/__init__.py'>.allclose

tests/test_launcher.py:296: AssertionError

@shwina
Contributor Author

shwina commented Jun 24, 2025

/ok to test 0b2e207

@leofang
Member

leofang commented Jun 24, 2025

/ok to test 0b2e207

@shwina
Contributor Author

shwina commented Jun 24, 2025

OK, now I'm seeing:

>           array[:] = rng.random(size, dtype=dtype)
E           ValueError: assignment destination is read-only

This one I can reproduce with older numpy.

@shwina
Contributor Author

shwina commented Jun 25, 2025

/ok to test 2699ff1

leofang previously approved these changes Jun 25, 2025
@leofang
Member

leofang commented Jun 26, 2025

/ok to test 7e3c468

leofang previously approved these changes Jun 26, 2025
@leofang
Member

leofang commented Jun 26, 2025

pre-commit.ci autofix

@leofang
Member

leofang commented Jun 26, 2025

/ok to test c2ff8cc

@leofang leofang merged commit fd8e07b into NVIDIA:main Jun 26, 2025
53 checks passed
@github-project-automation github-project-automation bot moved this from In Review to Done in CCCL Jun 26, 2025

Labels
bug Something isn't working cuda.core Everything related to the cuda.core module P0 High priority - Must do!
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[BUG]: Cannot pass pinned memory buffer to kernels