What part of `&CudaSlice<T>` can be mutated by a `CudaFunction`? #272

workingjubilee · 2024-07-10T16:33:25Z

Currently, it is defined thusly:

pub struct CudaSlice<T> {
    pub(crate) cu_device_ptr: sys::CUdeviceptr,
    pub(crate) len: usize,
    pub(crate) device: Arc<CudaDevice>,
    pub(crate) host_buf: Option<Pin<Vec<T>>>,
}

That's 4 fields. Which of these is potentially mutated by a CudaFunction, per this note?

The text was updated successfully, but these errors were encountered:

coreylowman · 2024-07-16T19:28:48Z

The data on the GPU pointed to by cu_device_ptr. Note that kernels can do anything with that pointer, so if a kernel extends past that pointer it will modify other data in gpu device memory (not the same as host memory).

The only field passed to the kernel when using a CudaSlice is the cu_device_ptr. The other fields are not passed unless you explicitly include that as a parameter in launch.

Would probably be good to add this information to the docs

workingjubilee · 2024-07-16T20:54:52Z

It would be useful, yes.

I suggest relating the slice to being essentially &[UnsafeCell<T>], because of this. That, or to require CudaFunctions to only execute after all views into the CudaSlice from Rust have ended.

Certainly, &mut CudaSlice<T> should not exist in Rust while a CudaFunction is running.

coreylowman · 2024-07-16T21:10:27Z

Ideally yes, but practically that makes the API very hard to use. Instead we just make everything involved in calling a kernel unsafe, so all users of the launching api have to ensure that the constraints are not violated themselves, because we cannot do that in this library without making the API really complex (feel free to try a different approach if you believe you can make it work!)

I'll also add that you really can only mutate CudaSlice<T> using the cuda kernels, so we at least don't have to worry about users modifying the data on the rust side at the same time as a cuda kernel is. Every function (at the safe level) that transfers from device to host has synchronous calls inside of it to cover this case (i.e. wait for all kernels to finish running before you can copy data from the device to host side).

coreylowman added documentation Improvements or additions to documentation question Further information is requested labels Jul 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

What part of `&CudaSlice<T>` can be mutated by a `CudaFunction`? #272

What part of `&CudaSlice<T>` can be mutated by a `CudaFunction`? #272

workingjubilee commented Jul 10, 2024

coreylowman commented Jul 16, 2024

workingjubilee commented Jul 16, 2024

coreylowman commented Jul 16, 2024 •

edited

Loading

What part of &CudaSlice<T> can be mutated by a CudaFunction? #272

What part of &CudaSlice<T> can be mutated by a CudaFunction? #272

Comments

workingjubilee commented Jul 10, 2024

coreylowman commented Jul 16, 2024

workingjubilee commented Jul 16, 2024

coreylowman commented Jul 16, 2024 • edited Loading

What part of `&CudaSlice<T>` can be mutated by a `CudaFunction`? #272

What part of `&CudaSlice<T>` can be mutated by a `CudaFunction`? #272

coreylowman commented Jul 16, 2024 •

edited

Loading