I'd much rather have Thrust raise a compile-time error when `thrust::something(thrust::cuda::par_nosync, ...);` is called and `something` can only be blocking. I don't want new API names, because the API surface is already very large and I need my muscle memory. However, I do want to know the run-time behavior at compile time, so that I can rely on the compiler to tell me to fix my code. Is this already possible today?
We should do this. Even if you don't want to expose this to users (which we should), we want something like this so that we can write unit tests that verify whether algorithms block or not, and improve our documentation of which algorithms block.
Options:
1. Add a new nosync policy type that fails at compile time if blocking would happen.
2. Add a macro that makes the existing `par_nosync` fail at compile time if blocking happens. I don't like macros for this.
3. Either of the above options, but with compile-time warnings instead of failures.
To both implement this feature and to improve our docs, we need to funnel all blocking operations through a common interface.
- We already do this for some obvious operations like `cuda(Stream|Device)Synchronize`.
- We don't do this for `cudaFree`, which always blocks.
- We don't do this for `cudaMalloc`, which rarely blocks, but can under some circumstances on multi-GPU systems with cross-GPU allocation visibility enabled.
There may be other CUDA runtime interfaces that block under subtle conditions. We should speak with someone from the CUDA driver team and get a comprehensive list.
Does the CUDA driver/runtime accurately document which interfaces can block?