Ensure correct handling of buffers allocated with LegacyPinnedMemoryResource.allocate as kernel parameters #717
Conversation
```diff
@@ -212,7 +212,13 @@ cdef class ParamHolder:
         for i, arg in enumerate(kernel_args):
             if isinstance(arg, Buffer):
                 # we need the address of where the actual buffer address is stored
-                self.data_addresses[i] = <void*><intptr_t>(arg.handle.getPtr())
+                if isinstance(arg.handle, int):
```
Can we stomach the cost of an `isinstance` check here?

- One alternative is to use a `try..except`, where entering the `try` block is cheap, but entering the `except` block is expensive.
- Another alternative, which would eliminate the need to make any changes to the kernel arg handling logic here:
  - Introduce a new type `HostPtr` which wraps an integer representing a pointer, and exposes a `getPtr()` method to get it (see the sketch after this list).
  - Expand the return type of `Buffer.handle` to `DevicePointerT | HostPtr`.
  - Change `LegacyPinnedMemoryResource` to return a buffer whose handle is a `HostPtr`.
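A minimal sketch of what such a `HostPtr` wrapper could look like, assuming it only needs to satisfy the `getPtr()` call the kernel-arg handler already makes on driver handles (the name and shape are hypothetical, not part of this PR):

```python
class HostPtr:
    """Hypothetical wrapper around a plain integer host pointer."""

    def __init__(self, ptr: int):
        self._ptr = ptr

    def getPtr(self) -> int:
        # Mirrors the accessor the handler calls on CUdeviceptr handles,
        # so the existing `arg.handle.getPtr()` code path keeps working.
        return self._ptr
```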
I think `isinstance` in Cython is cheap and what you have here is good. I don't want to introduce more types than needed, partly because we want MR providers to focus on the MR properties (`is_host_accessible` etc.), which is nicer for programmatic checks. I actually think that `Buffer.handle` should be of `Any` type so as to not get in the way of the MR providers. From both CUDA and cccl-rt perspectives they should be all `void*`. We don't want to encode the memory space information as part of the type.
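As an illustration of the kind of property-based check favored above, here is a minimal sketch that classifies a buffer by its memory resource's accessibility flags rather than by the type of its handle (the `memory_resource` accessor and exact property names are assumptions for this sketch):

```python
def classify_buffer(buf) -> str:
    # Relies only on memory-resource properties, never on type(buf.handle).
    mr = buf.memory_resource  # assumed accessor; names are illustrative
    if mr.is_device_accessible and mr.is_host_accessible:
        return "host-and-device accessible (e.g. pinned)"
    if mr.is_device_accessible:
        return "device only"
    return "host only"
```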
> I actually think that Buffer.handle should be of Any type so as to not get in the way of the MR providers.

If we did type it as `Any`, how would `_kernel_arg_handler` know how to grab the pointer from underneath the `Buffer`?
Well Python does not care about type annotations, right? 🙂
My concern wasn't so much about the type annotation, but more that the kernel handler won't know what to do with a `Buffer` whose `.handle` is any arbitrary type.

Prior to this PR it could only handle the case when `.handle` is a `CUdeviceptr`, or something that has a `.getPtr()` method:

cuda-python/cuda_core/cuda/core/experimental/_kernel_arg_handler.pyx, lines 213 to 215 in 24fde17:

```cython
if isinstance(arg, Buffer):
    # we need the address of where the actual buffer address is stored
    self.data_addresses[i] = <void*><intptr_t>(arg.handle.getPtr())
```

This PR adds the ability to handle `int`.

Technically, `.handle` is also allowed to be `None`:

```python
DevicePointerT = Union[driver.CUdeviceptr, int, None]
```
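To summarize the cases involved, here is a hypothetical pure-Python equivalent of the dispatch (the real handler is Cython and casts the result to `void*` via `intptr_t`; the helper name is made up for illustration):

```python
def handle_to_int(handle) -> int:
    # Covers the three forms DevicePointerT currently allows.
    if handle is None:
        raise ValueError("Buffer has no underlying allocation")
    if isinstance(handle, int):
        # e.g. buffers from LegacyPinnedMemoryResource, which this PR adds support for
        return handle
    # e.g. driver.CUdeviceptr, which exposes getPtr()
    return int(handle.getPtr())
```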
Ahh, I see, you meant the mini dispatcher here needs to enumerate all possible types.

Let me think about it. What you have is good and a generic treatment can follow later.

Most likely with #564 we could rewrite the dispatcher to look like this:

```cython
if isinstance(arg, Buffer):
    prepare_arg[intptr_t](self.data, self.data_addresses, get_cuda_native_handle(arg.handle), i)
```

On the MR provider side, we just need them to implement a protocol

```python
class IsHandleT(Protocol):
    def __int__(self) -> int: ...
```

if they are not using generic cuda.bindings or Python types. (FWIW we already have `IsStreamT`.) So maybe eventually `Buffer.handle` can be typed as

```python
DevicePointerT = Optional[Union[IsHandleT, int]]
```
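For illustration, a self-contained sketch of how a third-party handle type would satisfy such a protocol (the `MyPoolPtr` class is hypothetical, not an existing MR provider type):

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class IsHandleT(Protocol):
    def __int__(self) -> int: ...


class MyPoolPtr:
    """A made-up MR-provider handle type that opts into the protocol."""

    def __init__(self, ptr: int):
        self._ptr = ptr

    def __int__(self) -> int:
        return self._ptr


handle = MyPoolPtr(0x7000)
assert isinstance(handle, IsHandleT)  # structural check, thanks to runtime_checkable
assert int(handle) == 0x7000          # what a get_cuda_native_handle()-style helper would rely on
```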
It seems like a reasonable approach and I agree it would simplify the handling here. A couple of comments:

- Perhaps we should rename `DevicePointerT` to just `PointerT`? In the case of pinned memory for instance, it doesn't actually represent a device pointer AFAIU.
- If we use the protocol as written, then `Union[IsHandleT, int]` is equivalent to just `IsHandleT` (the `int` type implements `__int__`). The protocol would also allow types like `float` or `bool`.
  - I feel like this discussion has been had before, but it might be worth considering a protocol with a `__cuda_handle__()` method or something, rather than `__int__()` (see the sketch below).
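A small sketch of that stricter protocol idea, assuming a dedicated dunder (the names `HasCudaHandle` and `__cuda_handle__` are illustrative, not an existing API):

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class HasCudaHandle(Protocol):
    def __cuda_handle__(self) -> int: ...


class PinnedAllocation:
    """Made-up handle type that explicitly opts in."""

    def __init__(self, ptr: int):
        self._ptr = ptr

    def __cuda_handle__(self) -> int:
        return self._ptr


assert isinstance(PinnedAllocation(0x1000), HasCudaHandle)
assert not isinstance(3.14, HasCudaHandle)  # float implements __int__ but not __cuda_handle__
```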
Thanks for catching the bug, Ashwin! I left some comments.

It seems to be a miss (due to the way we implicitly test `LegacyPinnedMemoryResource` through NumPy today) that we do not test passing `Buffer` from various MRs to `launch()`. Could you add some tests to `test_launcher.py`? In a comment below, I suggest we could move part of what you have there.
Thanks, Ashwin! The added test LGTM. Left a few comments mainly for the code sample correctness.
/ok to test 40df59e
We should early-return in the sample if NumPy is < 2.1.0 (see cuda-python/cuda_core/examples/thread_block_cluster.py, lines 12 to 14 in 24fde17).
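A minimal sketch of such a guard, assuming the sample should simply exit when the NumPy requirement is not met (the message and placement are illustrative, not the sample's current code):

```python
import sys

import numpy as np

# Early-return if the installed NumPy is too old for this sample.
if np.lib.NumpyVersion(np.__version__) < "2.1.0":
    print("this sample requires NumPy >= 2.1.0, exiting...")
    sys.exit(0)
```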
I'm also seeing this failure mode in addition to the segfault:

```
>       assert cp.allclose(array, original * 3.0), f"{memory_resource_class.__name__} operation failed"
E       AssertionError: DeviceMemoryResource operation failed
E       assert array(False)
E        +  where array(False) = <function allclose at 0xf5ce6f977740>(array([0.30819342, 0.78738075, 0.71034026, ..., 0.3719493 , 0.7937206 ,\n        0.90285766], shape=(1024,), dtype=float32), (array([0.30819342, 0.78738075, 0.71034026, ..., 0.3719493 , 0.7937206 ,\n        0.90285766], shape=(1024,), dtype=float32) * 3.0))
E        +  where <function allclose at 0xf5ce6f977740> = <module 'cupy' from '/opt/hostedtoolcache/Python/3.13.5/arm64/lib/python3.13/site-packages/cupy/__init__.py'>.allclose

tests/test_launcher.py:296: AssertionError
```
/ok to test 0b2e207
OK, now I'm seeing:
This one I can reproduce with older numpy.
/ok to test 2699ff1
/ok to test 7e3c468
pre-commit.ci autofix
/ok to test c2ff8cc
Description
Buffers allocated using the `LegacyPinnedMemoryResource` have a plain `int` handle, but the kernel param handler doesn't know how to handle these buffers. This PR fixes that.

Closes #715
Checklist