Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Send + Sync for CudaStream #254

Closed
wants to merge 1 commit into from

Conversation

sebhtml
Copy link
Contributor

@sebhtml sebhtml commented Jun 12, 2024

Hi !

First, thank you for the cudarc project. It's a nice piece of engineering.

In my Novigrad project, I use cudarc.

My models are declared like in PyTorch (but in Rust).
Then, I generate instructions with opcodes and operands.

Using a dependency analysis, I assign each instruction to a logical stream. Logical streams have dependencies too.
At runtime, a logical stream is mapped to a physical CUDA stream.

For example, for multi-head attention, with 12 attention heads, the dependency analysis figures out that each head can be executed on a separate logical stream. Then the logical stream for Concat depends on the 12 logical streams of the heads.

I use a scheduler with a controller and execution units in a multi-threaded implementation.
It requires CudaStream to implement Send + Sync.

I ran the tests (cargo test --release). I don't currently have cuddn and nccl, so those tests failed.

failures:

---- cudnn::safe::tests::test_create_descriptors stdout ----
thread 'cudnn::safe::tests::test_create_descriptors' panicked at src/cudnn/sys/mod.rs:61:9:
Unable to find cudnn lib under the names ["cudnn", "cudnn64", "cudnn64_11", "cudnn64_115", "cudnn64_115_0", "cudnn64_110_5", "cudnn64_10", "cudnn64_110_0"]. Please open GitHub issue.
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- cudnn::safe::tests::test_conv2d_pick_algorithms stdout ----
thread 'cudnn::safe::tests::test_conv2d_pick_algorithms' panicked at src/cudnn/sys/mod.rs:61:9:
Unable to find cudnn lib under the names ["cudnn", "cudnn64", "cudnn64_11", "cudnn64_115", "cudnn64_115_0", "cudnn64_110_5", "cudnn64_10", "cudnn64_110_0"]. Please open GitHub issue.

---- cudnn::safe::tests::test_conv1d stdout ----
thread 'cudnn::safe::tests::test_conv1d' panicked at src/cudnn/sys/mod.rs:61:9:
Unable to find cudnn lib under the names ["cudnn", "cudnn64", "cudnn64_11", "cudnn64_115", "cudnn64_115_0", "cudnn64_110_5", "cudnn64_10", "cudnn64_110_0"]. Please open GitHub issue.

---- cudnn::safe::tests::test_conv3d stdout ----
thread 'cudnn::safe::tests::test_conv3d' panicked at src/cudnn/sys/mod.rs:61:9:
Unable to find cudnn lib under the names ["cudnn", "cudnn64", "cudnn64_11", "cudnn64_115", "cudnn64_115_0", "cudnn64_110_5", "cudnn64_10", "cudnn64_110_0"]. Please open GitHub issue.

---- cudnn::safe::tests::test_reduction stdout ----
thread 'cudnn::safe::tests::test_reduction' panicked at src/cudnn/sys/mod.rs:61:9:
Unable to find cudnn lib under the names ["cudnn", "cudnn64", "cudnn64_11", "cudnn64_115", "cudnn64_115_0", "cudnn64_110_5", "cudnn64_10", "cudnn64_110_0"]. Please open GitHub issue.

---- nccl::result::tests::multi_thread stdout ----
thread 'nccl::result::tests::multi_thread' panicked at src/nccl/sys/mod.rs:61:9:
Unable to find nccl lib under the names ["nccl", "nccl64", "nccl64_11", "nccl64_115", "nccl64_115_0", "nccl64_110_5", "nccl64_10", "nccl64_110_0"]. Please open GitHub issue.

---- nccl::safe::tests::test_all_reduce stdout ----
thread 'nccl::safe::tests::test_all_reduce' panicked at src/nccl/sys/mod.rs:61:9:
Unable to find nccl lib under the names ["nccl", "nccl64", "nccl64_11", "nccl64_115", "nccl64_115_0", "nccl64_110_5", "nccl64_10", "nccl64_110_0"]. Please open GitHub issue.

---- nccl::result::tests::single_thread stdout ----
thread 'nccl::result::tests::single_thread' panicked at src/nccl/sys/mod.rs:61:9:
Unable to find nccl lib under the names ["nccl", "nccl64", "nccl64_11", "nccl64_115", "nccl64_115_0", "nccl64_110_5", "nccl64_10", "nccl64_110_0"]. Please open GitHub issue.


failures:
    cudnn::safe::tests::test_conv1d
    cudnn::safe::tests::test_conv2d_pick_algorithms
    cudnn::safe::tests::test_conv3d
    cudnn::safe::tests::test_create_descriptors
    cudnn::safe::tests::test_reduction
    nccl::result::tests::multi_thread
    nccl::result::tests::single_thread
    nccl::safe::tests::test_all_reduce

test result: FAILED. 137 passed; 8 failed; 1 ignored; 0 measured; 0 filtered out; finished in 2.80s


@coreylowman
Copy link
Owner

Thanks for this PR. Do you know if the CUDA docs mention thread safety for the same CUstream object? That's the main thing here. Specifically can multiple threads use the same CUstream object at the same time? If not, the thing to do here would be to create the stream object inside the thread.

@sebhtml
Copy link
Contributor Author

sebhtml commented Jun 13, 2024

Good point !

On StackOverflow, Robert Crovella (an NVIDIA employee) seems to indicate that CUstream objects are thread-safe but the order of operations from thread 1 and thread 2 can non-deterministic.

See https://stackoverflow.com/questions/57775326/multiple-threads-access-the-same-cuda-stream

But I think that your design is superior. It makes more sense to create the streams in each thread.

Thanks !

@sebhtml
Copy link
Contributor Author

sebhtml commented Jun 13, 2024

Hi @coreylowman
So your design suggestion works for what I want to achieve with cudarc in novigrad.
Thanks !
You can close this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants