[FEA]: Thrust algorithms invoked with par_nosync should asynchronously allocate temporary storage #4117

Open
brycelelbach opened this issue Mar 12, 2025 · 0 comments


cudax::stream stream;
auto policy = thrust::cuda::par_nosync.on(stream.get());

thrust::device_vector A(...);

auto it = thrust::inclusive_scan(policy, A.begin(), A.end(), A.begin());
// Should allocate temporary storage asynchronously and not block.

Proposed semantics:

  • If an algorithm that allocates temporary storage is invoked with a par_nosync policy and the allocator attached to the policy supports asynchronous allocation, the temporary storage should be allocated asynchronously.
  • The default allocator for par_nosync should be an allocator that supports asynchronous allocation, so that you get asynchronous allocation with par_nosync by default.
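
The two rules above can be modelled at compile time. The following is only a sketch: `policy`, `par_tag`, `par_nosync_tag`, `blocking_allocator`, `stream_ordered_allocator`, `default_allocator`, and `allocates_asynchronously` are all hypothetical names invented for illustration, not Thrust types, and the `supports_async` flag stands in for whatever capability check Thrust would actually use.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical allocators; only the compile-time capability flag matters here.
struct blocking_allocator       { static constexpr bool supports_async = false; };
struct stream_ordered_allocator { static constexpr bool supports_async = true; };

// Hypothetical tags standing in for the par and par_nosync policies.
struct par_tag {};
struct par_nosync_tag {};

// Rule 2: the default allocator for the nosync policy is async-capable.
template <class Tag>
struct default_allocator { using type = blocking_allocator; };
template <>
struct default_allocator<par_nosync_tag> { using type = stream_ordered_allocator; };

template <class Tag, class Alloc = typename default_allocator<Tag>::type>
struct policy {
  using tag            = Tag;
  using allocator_type = Alloc;

  // Attaching a different allocator yields a new policy type.
  template <class Other>
  constexpr policy<Tag, Other> with_allocator(Other) const { return {}; }
};

// Rule 1: temporary storage is allocated asynchronously only if the policy
// is nosync AND the attached allocator supports asynchronous allocation.
template <class Policy>
constexpr bool allocates_asynchronously(Policy) {
  return std::is_same<typename Policy::tag, par_nosync_tag>::value
      && Policy::allocator_type::supports_async;
}
```

Under these assumptions, `allocates_asynchronously(policy<par_nosync_tag>{})` is true by default, while attaching `blocking_allocator` to the same policy, or using `par_tag`, falls back to the blocking path.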

Why: A lot of existing Thrust code uses algorithms, such as inclusive_scan and sort, that allocate temporary storage but don't have a compute-dependent result. That code could benefit from par_nosync, but doesn't today because the synchronous free of the temporary storage blocks.

How:

  • The Scary Way: Temporary allocations always go through temporary_array (which is implemented with contiguous_buffer), which uses Thrust allocator traits. I believe temporary_array takes a policy on construction, not an allocator, so we have the information we need. We could extend allocator traits to support async allocation.
  • The Easy Way: We could just check for the async allocator/MR concepts and then use more modern libcu++ interfaces directly in temporary_array/contiguous_buffer.
  • I believe at some point in Thrust's history we had a get_temporary_buffer/return_temporary_buffer interface that was intended to be customizable by users. I don't think it's still around, but if it is, that may be worth exploring. I recall it being incredibly hazardous though.
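
As a rough illustration of "The Easy Way", the sketch below detects an async-capable allocator at compile time and dispatches on it inside a RAII temporary buffer. It is host-only so that it compiles without CUDA, and every name in it is hypothetical: `fake_stream` stands in for a CUDA stream, `supports_async` for the real async allocator/MR concept, and the member names `allocate_async`/`deallocate_async` are assumed here rather than taken from Thrust or libcu++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a CUDA stream; it only records what was requested.
struct fake_stream { std::vector<std::string> log; };

// Detection idiom: true if A has allocate_async and deallocate_async members.
template <class A, class = void>
struct supports_async : std::false_type {};
template <class A>
struct supports_async<A, std::void_t<
    decltype(std::declval<A&>().allocate_async(std::size_t{}, std::declval<fake_stream&>())),
    decltype(std::declval<A&>().deallocate_async(std::declval<void*>(), std::size_t{},
                                                 std::declval<fake_stream&>()))>>
    : std::true_type {};

// Hypothetical synchronous-only allocator.
struct sync_allocator {
  void* allocate(std::size_t n) { return std::malloc(n); }
  void deallocate(void* p, std::size_t) { std::free(p); }
};

// Hypothetical async-capable allocator. It logs the request instead of doing
// real stream-ordered work, so the dispatch can be observed on the host.
struct async_allocator : sync_allocator {
  void* allocate_async(std::size_t n, fake_stream& s) {
    s.log.push_back("allocate_async");
    return std::malloc(n);
  }
  void deallocate_async(void* p, std::size_t, fake_stream& s) {
    s.log.push_back("deallocate_async");
    std::free(p);
  }
};

// RAII temporary buffer that takes the async path when the allocator offers it.
template <class Alloc>
class temporary_buffer {
public:
  temporary_buffer(Alloc alloc, fake_stream& s, std::size_t bytes)
      : alloc_(alloc), stream_(s), bytes_(bytes) {
    if constexpr (supports_async<Alloc>::value) {
      ptr_ = alloc_.allocate_async(bytes_, stream_);
    } else {
      stream_.log.push_back("allocate (blocking)");
      ptr_ = alloc_.allocate(bytes_);
    }
  }
  ~temporary_buffer() {
    if constexpr (supports_async<Alloc>::value) {
      alloc_.deallocate_async(ptr_, bytes_, stream_);
    } else {
      stream_.log.push_back("deallocate (blocking)");
      alloc_.deallocate(ptr_, bytes_);
    }
  }
  temporary_buffer(const temporary_buffer&)            = delete;
  temporary_buffer& operator=(const temporary_buffer&) = delete;

  void* data() const { return ptr_; }

private:
  Alloc        alloc_;
  fake_stream& stream_;
  std::size_t  bytes_;
  void*        ptr_ = nullptr;
};
```

In this model, constructing a `temporary_buffer<async_allocator>` in a scope records `allocate_async` followed by `deallocate_async` on the stream, while `temporary_buffer<sync_allocator>` records the two blocking calls. A real change would make the same kind of decision inside temporary_array/contiguous_buffer.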

Concerns and considerations:

  • Pooling of allocations.
  • Exploding memory footprints.
  • Touching Thrust temporary_array/contiguous_buffer.
  • Touching Thrust allocator traits is terrifying. Be careful.