CudaDevice::htod_copy_into is unsound #314

Open
CatsAreFluffy opened this issue Jan 20, 2025 · 3 comments

Comments

@CatsAreFluffy
Contributor

If two htod_copy_into calls to the same CudaSlice are issued in quick succession, the source buffer for the first copy may be freed before that copy completes. I haven't been able to find an example that actually misbehaves because of this, though.
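A sketch of the pattern in question, assuming roughly the current driver API (CudaDevice::new, alloc_zeros, and an htod_copy_into that takes the source Vec by value and a &mut CudaSlice; check the cudarc docs for the exact signatures):

use cudarc::driver::CudaDevice;

fn main() -> Result<(), cudarc::driver::DriverError> {
    let dev = CudaDevice::new(0)?;
    let mut dst = dev.alloc_zeros::<f32>(1024)?;

    // First copy: the Vec is moved in and stashed in the CudaSlice's host_buf
    // so it stays alive while the (potentially asynchronous) copy runs.
    dev.htod_copy_into(vec![1.0f32; 1024], &mut dst)?;

    // Second copy, issued immediately: it replaces host_buf, dropping the first
    // Vec even though the first copy may not have completed yet.
    dev.htod_copy_into(vec![2.0f32; 1024], &mut dst)?;

    dev.synchronize()?;
    Ok(())
}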

@coreylowman
Owner

Ah, are you saying this because we overwrite the first owned vec with the second in dst.host_buf = Some(Pin::new(src));?

I suppose that's true. Perhaps we should add:

if self.host_buf.is_some() {
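    // A previous copy's host buffer is still queued; wait for the device
    // before overwriting (and freeing) it.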
    self.synchronize()?;
}

@CatsAreFluffy
Contributor Author

You could also do it by creating an event when you copy, storing it in the CudaSlice, and synchronizing on it whenever you drop the host memory. This would block execution less if more work is queued between the copies. Synchronization also needs to happen for anything else that can free the host memory (e.g. leak and drop), so I think the easiest way to do that would be a struct containing an event and a Pin<Vec<_>> that synchronizes on the event when it's dropped, and to use that for storing the host memory.
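A minimal sketch of that idea; CudaEvent and DriverError here are hypothetical stand-ins, not the actual cudarc types:

use std::pin::Pin;

// Hypothetical stand-ins for illustration only; the real cudarc event API may differ.
struct CudaEvent;
#[derive(Debug)]
struct DriverError;

impl CudaEvent {
    // Would wrap cuEventSynchronize: block the host until the event has completed.
    fn synchronize(&self) -> Result<(), DriverError> {
        Ok(())
    }
}

// Host buffer that cannot be freed before the copy reading from it has finished.
struct HostBuf<T> {
    #[allow(dead_code)]
    buf: Pin<Vec<T>>,
    // Event recorded on the stream right after the async htod copy was issued.
    copy_done: CudaEvent,
}

impl<T> Drop for HostBuf<T> {
    fn drop(&mut self) {
        // Whether the buffer is dropped by a second htod_copy_into, by leak(),
        // or by dropping the CudaSlice, the Vec's memory is only released after
        // the device has finished reading from it.
        let _ = self.copy_done.synchronize();
    }
}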

Also, now that I think about it, I probably wasn't able to cause issues with this because copies from unpinned memory (in the CUDA sense) are always synchronous, and Vecs are always allocated in unpinned memory, so you can't actually get any use-after-frees. I'm not sure it's a good idea to rely on that in general, though.

@coreylowman
Owner

coreylowman commented Jan 21, 2025

Oh yeah, good points. For future reference: while we are using Rust-side pins, this is not the same as CUDA-side pinned memory (which requires using a specific API to allocate the memory; see #80).
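For instance (a sketch; it only shows the Rust-side Pin, since page-locked allocation goes through a separate CUDA driver/runtime API):

use std::pin::Pin;

fn main() {
    // Rust-side pinning: the Vec can't be moved out through the Pin, but its
    // pages are still ordinary pageable host memory as far as CUDA is concerned.
    let host: Pin<Vec<f32>> = Pin::new(vec![0.0f32; 1024]);

    // CUDA-side pinned (page-locked) memory is different: it has to be allocated
    // or registered through a dedicated API (cudaMallocHost / cudaHostRegister),
    // and only then can htod copies run truly asynchronously w.r.t. the host.
    drop(host);
}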
