Implement IPC-enabled events. #1145

Andy-Jost · 2025-10-16T20:07:38Z

Description

Adds construction keywords to Event to allow creation of IPC-enabled events. Adds reduce methods to allow events to be sent to other processes. Updates tests.

Closes #1040

copy-pr-bot · 2025-10-16T20:07:41Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Andy-Jost · 2025-10-16T20:08:31Z

/ok to test 3d16913

Andy-Jost · 2025-10-17T17:41:30Z

/ok to test 5ad48ca

Andy-Jost

This PR is ready for review.

Andy-Jost · 2025-10-17T16:30:58Z

cuda_core/cuda/core/experimental/_event.pyx

+        self._busy_waited = ipc_descriptor._busy_waited
+        self._ipc_enabled = True
+        self._ipc_descriptor = ipc_descriptor
+        self._device_id = -1  # ??


Events hold a device and context handle. I could not find these being used anywhere except the property getters in this class. For imported events, these are not available. The current implementation returns None for these properties for IPC-imported events.

I'm curious about the intended use of the device and context. Do we need setters for these?

Andy-Jost · 2025-10-17T16:35:09Z

cuda_core/cuda/core/experimental/_event.pyx

        self._device_id = device_id
        self._ctx_handle = ctx_handle
+        if opts.ipc_enabled:
+            self.get_ipc_descriptor()


multiprocessing serializes arguments in a separate thread. If get_ipc_descriptor is called for the first time during serialization, then cuIpcGetEventHandle raises an error saying no CUDA context is bound to that thread. The best solution I found is this: if an event is created with IPC support, then create the descriptor right away (in the main thread) and cache it.
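The eager-caching idea can be illustrated without a GPU. The sketch below is not the cuda.core implementation: `SketchEvent`, `fake_get_event_handle`, and the thread-local flag are hypothetical stand-ins, and calling `__reduce__` on a helper thread only mimics a serializer thread. It assumes the export call fails on any thread that has no context bound.

```python
import threading

# Stand-in for a per-thread CUDA context binding; purely illustrative.
_tls = threading.local()


def fake_get_event_handle():
    """Hypothetical stand-in for the driver call; fails without a bound context."""
    if not getattr(_tls, "context_bound", False):
        raise RuntimeError("no context bound to this thread")
    return b"opaque-handle"


class SketchEvent:
    def __init__(self, ipc_enabled=False, eager=True):
        self.ipc_enabled = ipc_enabled
        self._descriptor = None
        if ipc_enabled and eager:
            # Export on the creating thread and cache the result.
            self.get_ipc_descriptor()

    def get_ipc_descriptor(self):
        if self._descriptor is None:
            self._descriptor = fake_get_event_handle()
        return self._descriptor

    @classmethod
    def _from_descriptor(cls, descriptor):
        event = cls(ipc_enabled=True, eager=False)
        event._descriptor = descriptor
        return event

    def __reduce__(self):
        # Serialization only needs the (ideally cached) descriptor.
        return (SketchEvent._from_descriptor, (self.get_ipc_descriptor(),))


def reduce_in_thread(event):
    """Call __reduce__ on a helper thread, mimicking a serializer thread."""
    outcome = {}

    def work():
        try:
            outcome["args"] = event.__reduce__()[1]
        except RuntimeError as exc:
            outcome["error"] = str(exc)

    worker = threading.Thread(target=work)
    worker.start()
    worker.join()
    return outcome


_tls.context_bound = True  # pretend the creating thread has a context bound
eager = SketchEvent(ipc_enabled=True, eager=True)
lazy = SketchEvent(ipc_enabled=True, eager=False)
```

In this toy model, `reduce_in_thread(eager)` succeeds because the descriptor was cached up front, while `reduce_in_thread(lazy)` reports the missing-context error.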

Andy-Jost · 2025-10-17T16:53:41Z

cuda_core/cuda/core/experimental/_event.pyx

+
+    cdef:
+        bytes _reserved
+        bint _busy_waited


I send the _busy_waited property to the child because there is no driver API to query it. I failed to devise a test that actually checks whether an event blocks or busy-waits in the child process, so I cannot confirm whether the property is accurate.

Andy-Jost · 2025-10-17T16:58:00Z

cuda_core/tests/helpers/__init__.py

 except ImportError:
    # Import shared platform helpers for tests across repos
-    sys.path.insert(0, str(pathlib.Path(__file__).resolve().parents[2] / "cuda_python_test_helpers"))
+    sys.path.insert(0, str(pathlib.Path(__file__).resolve().parents[3] / "cuda_python_test_helpers"))


I made helpers.py into a package and merged the IPC utility.py into it.
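The index change in the diff follows from the file moving one directory deeper. A small sketch, assuming the old module lived at `cuda_core/tests/helpers.py` and using a hypothetical checkout root `/repo`:

```python
from pathlib import PurePosixPath

# "/repo" is a hypothetical checkout root used only for illustration.
old_file = PurePosixPath("/repo/cuda_core/tests/helpers.py")
new_file = PurePosixPath("/repo/cuda_core/tests/helpers/__init__.py")

# parents[0] is the containing directory; each further index climbs one level.
old_root = old_file.parents[2]
new_root = new_file.parents[3]
```

Both expressions land on the same root directory; keeping `parents[2]` after the move would point at `cuda_core` instead.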

Andy-Jost · 2025-10-17T17:25:47Z

cuda_core/tests/memory_ipc/test_event_ipc.py

+        log("done")
+
+
+def test_event_is_monadic(ipc_device):


We can discuss this design decision. My view is that it's better for users to potentially trip over these limitations (~5 minute fix) rather than risk creating a race with mutable events (hours or days).

Andy-Jost · 2025-10-17T17:29:54Z

cuda_core/tests/memory_ipc/test_memory_ipc.py

        # Set up the IPC-enabled memory pool and share it.
        device = ipc_device
        mr = ipc_memory_resource
+        pgen = PatternGen(device, NBYTES)


There are lots of changes coming from the rename of IPCBufferTestHelper to PatternGen, but the new class is much more sane and useful.

Could it make sense to break the rename out as a separate PR?

I'm open to it. I think there's not much risk of this change causing trouble for being overly complex, because the actual changes to cuda.core are so small. IMO, moving to the next project has more value but I'm not set in stone.

> because the actual changes to cuda.core are so small.

That's very difficult for me to see at the moment. Let's discuss offline. I don't want to generate extra work.

Andy-Jost · 2025-10-17T17:34:09Z

cuda_core/tests/test_event.py

-
-    # This kernel is designed to busy loop until a signal is received
-    code = """
-#include <cuda/atomic>


Factored this out to helpers/latch.py

Andy-Jost · 2025-10-17T17:35:38Z

cuda_core/tests/test_memory.py

        raise RuntimeError("the pinned memory resource is not bound to any GPU")


-class DummyUnifiedMemoryResource(MemoryResource):


Moved to helpers/buffers.py

Andy-Jost · 2025-10-17T21:37:42Z

/ok to test 0fe70cf

Andy-Jost · 2025-10-17T23:26:38Z

/ok to test 820bf1b

rparolin · 2025-10-20T23:36:38Z

cuda_core/cuda/core/experimental/_device.pyx

 from cuda.core.experimental._stream cimport default_stream


+


Necessary whitespace?

Probably not. I assume the pre-commit hooks will reformat as needed, but maybe not in this case?

rparolin · 2025-10-20T23:41:48Z

cuda_core/cuda/core/experimental/_event.pxd

 cdef class Event:

    cdef:
        cydriver.CUevent _handle


Does cython throw warnings when data members inject padding overhead that could be eliminated by reordering members?

I don't know. FYI, CUevent is 32 bytes IIRC

rparolin · 2025-10-20T23:56:30Z

cuda_core/cuda/core/experimental/_event.pyx

+        """Export an event allocated for sharing between processes."""
+        if self._ipc_descriptor is not None:
+            return self._ipc_descriptor
+        if not self.is_ipc_enabled:


If is_ipc_enabled is false, would we want this error to fire even if there is some value in the _ipc_descriptor field?

It should not be possible to fill the _ipc_descriptor field in that case. It is only ever set non-None below, in this function, after the check.

rparolin · 2025-10-20T23:57:56Z

cuda_core/cuda/core/experimental/_event.pyx

+    def from_ipc_descriptor(cls, ipc_descriptor: IPCEventDescriptor) -> Event:
+        """Import an event that was exported from another process."""
+        cdef cydriver.CUipcEventHandle data
+        memcpy(data.reserved, <const void*><const char*>(ipc_descriptor._reserved), sizeof(data.reserved))


Is the explicit cast to void* required here? It should be implicitly convertible for pointer types.

rparolin · 2025-10-21T00:05:14Z

cuda_core/tests/helpers/latch.py

+
+    def __init__(self, device):
+        if helpers.CUDA_INCLUDE_PATH is None:
+            pytest.skip("need CUDA header")


Can we provide more information to the user if they hit this? What would their next steps be?

I'm not sure, I moved this out of the existing test_events.py test.

Andy-Jost · 2025-10-21T17:57:40Z

/ok to test 6ef6b03

greptile-apps

20 files reviewed, 6 comments

greptile-apps · 2025-10-21T21:02:44Z

cuda_core/tests/helpers/latch.py

+        mr = LegacyPinnedMemoryResource()
+        self.buffer = mr.allocate(4)
+        self.busy_wait_flag[0] = 0


logic: Property busy_wait_flag (defined on line 62) is accessed before the property decorator is evaluated. This will raise AttributeError at runtime

Suggested change

-        mr = LegacyPinnedMemoryResource()
-        self.buffer = mr.allocate(4)
-        self.busy_wait_flag[0] = 0
+        mr = LegacyPinnedMemoryResource()
+        self.buffer = mr.allocate(4)
+        ctypes.cast(int(self.buffer.handle), ctypes.POINTER(ctypes.c_int32))[0] = 0
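For reference when weighing this comment: in plain Python, the whole class body executes before any instance is created, so a property defined further down the class is already installed when `__init__` runs. A minimal sketch with a hypothetical `SketchLatch` class (not the code in latch.py, which may differ in ways the thread does not show):

```python
class SketchLatch:
    def __init__(self):
        self._storage = [None]
        # The property below is already part of the class at this point.
        self.busy_wait_flag[0] = 0

    @property
    def busy_wait_flag(self):
        return self._storage


latch = SketchLatch()
```

Constructing `SketchLatch()` here does not raise `AttributeError`; the write goes through the property to the backing list.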

greptile-apps · 2025-10-21T21:02:45Z

cuda_core/tests/memory_ipc/test_workerpool.py

+        pgen = PatternGen(device, NBYTES)
        for buffer in buffers:
-            IPCBufferTestHelper(device, buffer).verify_buffer(flipped=True)
+            pgen.verify_buffer(buffer, seed=True)
            buffer.close()


logic: PatternGen instance created outside the worker pool context but used to verify buffers modified by child processes - verify that PatternGen.verify_buffer generates the same pattern across processes when seed=True. Does PatternGen use a deterministic seed that produces identical patterns in parent and child processes?
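Whether the check holds across processes depends on how PatternGen derives its bytes, which this thread does not show. As a sketch under that caveat, a generator whose output depends only on its arguments produces the same pattern in every process. `make_pattern` and `verify_pattern` are hypothetical names, not the helpers from the PR:

```python
def make_pattern(nbytes, seed=0):
    """Hypothetical pattern: byte i is (seed + i) mod 256, a pure function of its inputs."""
    return bytes((int(seed) + i) % 256 for i in range(nbytes))


def verify_pattern(data, seed=0):
    """Return True when data matches the pattern for the given seed."""
    return bytes(data) == make_pattern(len(data), seed)
```

Because no process-local state (such as an unseeded random generator) is involved, a parent can verify a buffer that a child filled with the same seed. Note that in this sketch `seed=True` is treated as the integer 1.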

greptile-apps · 2025-10-21T21:02:46Z

cuda_core/tests/helpers/buffers.py

+    def device_id(self) -> int:
+        return self.device


logic: device_id property returns self.device, but the type is unclear – if device is a Device object (line 20-21), this returns the wrong type. Should return self.device.device_id or cast to int. Is self.device an integer device ID or a Device object?

greptile-apps · 2025-10-21T21:02:46Z

cuda_core/cuda/core/experimental/_event.pyx

+        self._device_id = -1  # ??
+        self._ctx_handle = None  # ??


logic: Device ID and context handle left uninitialized for imported events. May cause AttributeError or incorrect behavior if device or context properties accessed. Should imported events query the current device/context, or is it expected that these properties will return None?

greptile-apps · 2025-10-21T21:02:47Z

cuda_core/cuda/core/experimental/_event.pyx

+    def __eq__(self, IPCEventDescriptor rhs):
+        # No need to check self._busy_waited.
+        return self._reserved == rhs._reserved


logic: Equality check ignores _busy_waited flag. Two descriptors with different sync behavior will compare equal. Is the busy_waited flag intentionally excluded from equality to allow interoperability, or should it be part of the comparison?
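One way to read the design in the diff is that the handle bytes alone identify the shared event, with the flag carried along as extra metadata. That reading is an assumption. The sketch below uses a hypothetical `SketchDescriptor` class (not the cuda.core type) to show what handle-only equality looks like, with hashing kept consistent:

```python
class SketchDescriptor:
    """Illustrative stand-in for an IPC event descriptor (not the cuda.core class)."""

    def __init__(self, reserved, busy_waited):
        self._reserved = bytes(reserved)
        self._busy_waited = bool(busy_waited)

    def __eq__(self, other):
        if not isinstance(other, SketchDescriptor):
            return NotImplemented
        # Only the handle bytes take part in the comparison.
        return self._reserved == other._reserved

    def __hash__(self):
        # Keep hashing consistent with __eq__.
        return hash(self._reserved)
```

With this definition, two descriptors carrying the same handle bytes compare equal even when their `_busy_waited` flags differ, which is exactly the behavior the comment asks about.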

greptile-apps · 2025-10-21T21:02:48Z

cuda_core/tests/memory_ipc/test_event_ipc.py

+        latch.release()
+        process.join()


logic: Latch is released after waiting for the child acknowledgment (line 56), but the child's copy (line 83) is enqueued on stream2, which waits on the event from stream1 (line 81). If the latch blocks stream1, the event may never complete and the child will hang. Verify that the event is recorded before the latch blocks, or reorder the release to before the child wait. Does stream1.record guarantee that the event timestamp is captured before any subsequent kernel on stream1 starts executing, or does the event wait for the kernel to finish?

github-actions · 2025-10-22T01:01:20Z

Doc Preview CI
Preview removed because the pull request was closed or merged.

* Restore "Implement IPC-enabled events. (#1145)" This reverts commit bcd40ff.
* Add timeout to LatchKernel.
* Fix test_latchkernel crash on Windows.
* Add timeout to child process join in tests.
* Minor test fix

Initial implementation of IPC-enabled events. Changes the type of `max_size` memory resource attribute to size_t from int32. Various updates and additions to test helpers.

e13182b

Andy-Jost self-assigned this Oct 16, 2025

Andy-Jost marked this pull request as draft October 16, 2025 20:07

Move contents of memory_ipc/utility module to helpers. Rename `IPCBufferTestHelper` to `PatternGen` and combine `flipped` and `starting_from` arguments to just `seed`. Rename `compare_buffers` to `compare_equal_buffers` and have it return a Boolean.

26678bf

Andy-Jost force-pushed the ipc_events branch from 3d16913 to c3f41d9 Compare October 16, 2025 23:21

leofang added the P0 (High priority - Must do!), feature (New feature or request), and cuda.core (Everything related to the cuda.core module) labels Oct 17, 2025

leofang self-requested a review October 17, 2025 04:04

leofang modified the milestones: cuda.core beta 9, cuda.core beta 8 Oct 17, 2025

leofang linked an issue Oct 17, 2025 that may be closed by this pull request

Enable IPC events and update tests to test stream-ordered IPC #1040

Open

Andy-Jost added 3 commits October 17, 2025 09:09

Remove redundant supports_ipc_mempool checks.

556a9ad

Rename test_ipc_event.py to test_event_ipc.py to align with other tests.

88c25c4

Remove the copy-based IPC event test, which is less efficient than the latch-based test.

438ebd4

Andy-Jost force-pushed the ipc_events branch from c3f41d9 to 438ebd4 Compare October 17, 2025 16:09

Simplify PatternGen.

02975e9

Andy-Jost force-pushed the ipc_events branch 2 times, most recently from fedc205 to 8a84c51 Compare October 17, 2025 17:32

Andy-Jost requested review from cpcloud and rwgk October 17, 2025 17:36

Simplify LatchKernel and fix formatting.

f98a286

Andy-Jost force-pushed the ipc_events branch from 8a84c51 to f98a286 Compare October 17, 2025 17:38

Andy-Jost added 2 commits October 17, 2025 10:39

Merge branch 'main' into ipc_events

fc6937d

Fix cython-lint error.

5ad48ca

Andy-Jost marked this pull request as ready for review October 17, 2025 17:41

Andy-Jost commented Oct 17, 2025

Test fix.

820bf1b

Andy-Jost force-pushed the ipc_events branch from 0fe70cf to 820bf1b Compare October 17, 2025 23:25

rparolin reviewed Oct 20, 2025

rparolin reviewed Oct 21, 2025

rparolin approved these changes Oct 21, 2025

Merge branch 'main' into ipc_events

6ef6b03

Andy-Jost enabled auto-merge (squash) October 21, 2025 17:56

greptile-apps bot reviewed Oct 21, 2025

Andy-Jost merged commit 20f29e9 into NVIDIA:main Oct 22, 2025
66 of 71 checks passed

Andy-Jost added a commit to Andy-Jost/cuda-python that referenced this pull request Oct 22, 2025

Revert "Implement IPC-enabled events. (NVIDIA#1145)"

1d5bd94

This reverts commit 20f29e9.

Andy-Jost added a commit that referenced this pull request Oct 22, 2025

Revert "Implement IPC-enabled events. (#1145)" (#1173)

bcd40ff

This reverts commit 20f29e9.

Andy-Jost added a commit to Andy-Jost/cuda-python that referenced this pull request Oct 22, 2025

Restore "Implement IPC-enabled events. (NVIDIA#1145)"

9dd067e

This reverts commit bcd40ff.
