cudax::stream stream;
auto policy = thrust::cuda::par_nosync.on(stream.get());
thrust::device_vector A(...);
auto it = thrust::inclusive_scan(policy, A.begin(), A.end(), A.begin());
// Should allocate temporary storage asynchronously and not block.
Proposed semantics:
If an algorithm that allocates temporary storage is invoked with a par_nosync policy and the allocator attached to the policy supports asynchronous allocation, the temporary storage should be allocated asynchronously.
The default allocator for par_nosync should be an allocator that supports asynchronous allocation, so that you get asynchronous allocation with par_nosync by default.
Why: There's a lot of existing Thrust code out there today that uses algorithms that allocate temporary storage but don't have a compute-dependent result (like inclusive_scan and sort). This existing code could benefit from par_nosync, but doesn't today because the synchronous free of the temporary storage will block.
How:
The Scary Way: Temporary allocations always go through temporary_array (which is implemented with contiguous_buffer) and use Thrust allocator traits. I believe temporary_array takes a policy on construction, not an allocator, so we have the information we need. We could extend allocator traits to support async allocation.
The Easy Way: We could just check for the async allocator/MR concepts and then use more modern libcu++ interfaces directly in temporary_array/contiguous_buffer.
I believe at some point in Thrust's history we had a get_temporary_buffer/return_temporary_buffer interface that was intended to be customizable by users. I don't think it's still around, but if it is, that may be worth exploring. I recall it being incredibly hazardous though.
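To make "The Easy Way" a bit more concrete, here is a minimal C++17 sketch of the detect-and-dispatch idea. It is not Thrust or libcu++ code: `fake_stream`, `sync_resource`, `async_resource`, `supports_async`, and `temporary_buffer` are all hypothetical stand-ins, and host `malloc`/`free` takes the place of real stream-ordered allocation so the example runs without a GPU. The `allocate_async`/`deallocate_async` member names are only assumed to resemble what an async memory resource concept would require.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a CUDA stream handle.
struct fake_stream { int id; };

// Hypothetical resource that only offers blocking allocation.
struct sync_resource {
  int sync_calls = 0;
  void* allocate(std::size_t bytes) { ++sync_calls; return std::malloc(bytes); }
  void deallocate(void* p, std::size_t) { ++sync_calls; std::free(p); }
};

// Hypothetical resource that also offers stream-ordered allocation.
// malloc/free stand in for the real asynchronous calls.
struct async_resource {
  int sync_calls = 0;
  int async_calls = 0;
  void* allocate(std::size_t bytes) { ++sync_calls; return std::malloc(bytes); }
  void deallocate(void* p, std::size_t) { ++sync_calls; std::free(p); }
  void* allocate_async(std::size_t bytes, fake_stream) { ++async_calls; return std::malloc(bytes); }
  void deallocate_async(void* p, std::size_t, fake_stream) { ++async_calls; std::free(p); }
};

// Detection trait: true only if both async members are callable.
template <class R, class = void>
struct supports_async : std::false_type {};

template <class R>
struct supports_async<R, std::void_t<
    decltype(std::declval<R&>().allocate_async(std::size_t{}, std::declval<fake_stream>())),
    decltype(std::declval<R&>().deallocate_async(
        std::declval<void*>(), std::size_t{}, std::declval<fake_stream>()))>>
    : std::true_type {};

// RAII temporary storage that picks the async path when the resource has one
// and falls back to the blocking path otherwise.
template <class Resource>
class temporary_buffer {
public:
  static constexpr bool is_async = supports_async<Resource>::value;

  temporary_buffer(Resource& r, fake_stream s, std::size_t bytes)
      : resource_(r), stream_(s), bytes_(bytes) {
    if constexpr (is_async) {
      ptr_ = resource_.allocate_async(bytes_, stream_);
    } else {
      ptr_ = resource_.allocate(bytes_);
    }
    assert(ptr_ != nullptr || bytes_ == 0);
  }

  ~temporary_buffer() {
    if constexpr (is_async) {
      resource_.deallocate_async(ptr_, bytes_, stream_);  // would not block
    } else {
      resource_.deallocate(ptr_, bytes_);  // blocking fallback
    }
  }

  temporary_buffer(const temporary_buffer&) = delete;
  temporary_buffer& operator=(const temporary_buffer&) = delete;

  void* data() const { return ptr_; }

private:
  Resource& resource_;
  fake_stream stream_;
  std::size_t bytes_;
  void* ptr_ = nullptr;
};
```

Constructing `temporary_buffer<async_resource>` takes the async path for both the allocation and the free, while `temporary_buffer<sync_resource>` compiles unchanged and keeps today's blocking behavior. In a real implementation the stream would presumably come from the execution policy rather than being passed in separately.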
Concerns and considerations:
Pooling of allocations.
Exploding memory footprints.
Touching Thrust temporary_array/contiguous_buffer.
Touching Thrust allocator traits is terrifying. Be careful.