[libcu++] Add make_device_buffer, make_pinned_buffer, and make_managed_buffer#8250
[libcu++] Add make_device_buffer, make_pinned_buffer, and make_managed_buffer#8250pciolkosz wants to merge 2 commits intoNVIDIA:mainfrom
Conversation
This comment has been minimized.
This comment has been minimized.
| //! @param __stream The stream used for allocation. | ||
| //! @param __device The device whose default memory pool will be used. | ||
| template <class _Tp, class _Env = ::cuda::std::execution::env<>> | ||
| _CCCL_HOST_API auto make_device_buffer(stream_ref __stream, device_ref __device, const _Env& __env = {}) |
There was a problem hiding this comment.
I believe there was another PR with a similar question: Is the device argument obolete given that stream and device must match?
If so should we verify that
There was a problem hiding this comment.
The stream and device doesn't need to match. Stream is used for sequencing and device argument selects the device where the memory should reside.
We could provide another overload where the device is taken from the stream, it might be a good idea. I was a bit hesitant, because the pinned and managed variants only take a stream and the device variant not taking an explicit device could make it seem like the pinned and managed variants also are localized to the stream's device, where its not the case. I am 50/50 on this right now, but we can always add another overload later
| //! @param __stream The stream used for allocation and copy. | ||
| //! @param __range The input range to be copied into the buffer. | ||
| template <class _Tp, class _Range, class _Env = ::cuda::std::execution::env<>> | ||
| _CCCL_HOST_API auto make_pinned_buffer(stream_ref __stream, _Range&& __range, const _Env& __env = {}) |
There was a problem hiding this comment.
Critical: None of those APIs is constrained. We really need to add some constraints to disambiguate the different overloads
There was a problem hiding this comment.
I wanted to avoid repeating all the constraints, but you are right they could be ambiguous. I reworked these wrappers to just blindly forward arguments to make_buffer, the error message will be marginally worse, but at least we don't need to list all those overloads.
This approach doesn't work with the initializer_list overload, but I am questioning if its needed at all and maybe we should just deprecate it
Convenience wrappers around make_buffer that use the default memory pools: - make_device_buffer: uses device_default_memory_pool(device_ref) - make_pinned_buffer: uses pinned_default_memory_pool() (CTK 12.9+) - make_managed_buffer: uses managed_default_memory_pool() (CTK 13.0+) They forward arguments to make_buffer
7353175 to
2abd228
Compare
This comment has been minimized.
This comment has been minimized.
🥳 CI Workflow Results🟩 Finished in 4h 05m: Pass: 100%/99 | Total: 1d 00h | Max: 3h 41m | Hits: 99%/258072See results here. |
This PR adds
make_device/managed/pinned_bufferas a shorthand formake_bufferwithdevice/managed/pinned_default_memory_pool.Also fly-by fix to add
_CCCL_HOST_APItomake_buffer.Closes #8240