106 changes: 13 additions & 93 deletions README.md
@@ -2,116 +2,36 @@
<h1>The Rust CUDA Project</h1>

<p>
<strong>An ecosystem of libraries and tools for writing and executing extremely fast GPU code
fully in <a href="https://www.rust-lang.org/">Rust</a></strong>
</p>

<h3>
<a href="https://rust-gpu.github.io/rust-cuda/index.html">Guide</a>
<span> | </span>
<a href="https://rust-gpu.github.io/rust-cuda/guide/getting_started.html">Getting Started</a>
<span> | </span>
<a href="https://rust-gpu.github.io/rust-cuda/features.html">Features</a>
</h3>
</div>

<br/>

> [!IMPORTANT]
> This project is no longer dormant and is [being
> rebooted](https://rust-gpu.github.io/blog/2025/01/27/rust-cuda-reboot). Read the [latest status update](https://rust-gpu.github.io/blog/2025/08/11/rust-cuda-update).
> Please contribute!
>
> The project is still in early development, however. Expect bugs, safety issues, and things that
> don't work.

## Goal

The Rust CUDA Project is a project aimed at making Rust a tier-1 language for extremely fast GPU computing
using the CUDA Toolkit. It provides tools for compiling Rust to extremely fast PTX code as well as libraries
for using existing CUDA libraries with it.

## Background

Historically, general purpose high performance GPU computing has been done using the CUDA toolkit. The CUDA toolkit primarily
provides a way to use Fortran/C/C++ code for GPU computing in tandem with CPU code with a single source. It also provides
many libraries, tools, forums, and documentation to supplement the single-source CPU/GPU code.

CUDA is exclusively an NVIDIA-only toolkit. Many tools have been proposed for cross-platform GPU computing such as
OpenCL, Vulkan Computing, and HIP. However, CUDA remains the most used toolkit for such tasks by far. This is why it is
imperative to make Rust a viable option for use with the CUDA toolkit.

However, CUDA with Rust has been a historically very rocky road. The only viable option until now has been to use the LLVM PTX
backend, however, the LLVM PTX backend does not always work and would generate invalid PTX for many common Rust operations, and
in recent years it has been shown time and time again that a specialized solution is needed for Rust on the GPU with the advent
of projects such as rust-gpu (for Rust -> SPIR-V).

Our hope is that with this project we can push the Rust GPU computing industry forward and make Rust an excellent language
for such tasks. Rust offers plenty of benefits such as `__restrict__` performance benefits for every kernel, An excellent module/crate system,
delimiting of unsafe areas of CPU/GPU code with `unsafe`, high level wrappers to low level CUDA libraries, etc.

## Structure

The scope of the Rust CUDA Project is quite broad, it spans the entirety of the CUDA ecosystem, with libraries and tools to make it
usable using Rust. Therefore, the project contains many crates for all corners of the CUDA ecosystem.

The current line-up of libraries is the following:

- `rustc_codegen_nvvm` Which is a rustc backend that targets NVVM IR (a subset of LLVM IR) for the [libnvvm](https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html) library.
- Generates highly optimized PTX code which can be loaded by the CUDA Driver API to execute on the GPU.
- For the near future it will be CUDA-only, but it may be used to target amdgpu in the future.
- `cuda_std` for GPU-side functions and utilities, such as thread index queries, memory allocation, warp intrinsics, etc.
- _Not_ a low level library, provides many utility functions to make it easier to write cleaner and more reliable GPU kernels.
- Closely tied to `rustc_codegen_nvvm` which exposes GPU features through it internally.
- [`cudnn`](https://github.com/Rust-GPU/rust-cuda/tree/master/crates/cudnn) for a collection of GPU-accelerated primitives for deep neural networks.
- `cust` for CPU-side CUDA features such as launching GPU kernels, GPU memory allocation, device queries, etc.
- High level with features such as RAII and Rust Results that make it easier and cleaner to manage the interface to the GPU.
- A high level wrapper for the CUDA Driver API, the lower level version of the more common CUDA Runtime API used from C++.
- Provides much more fine grained control over things like kernel concurrency and module loading than the C++ Runtime API.
- `gpu_rand` for GPU-friendly random number generation, currently only implements xoroshiro RNGs from `rand_xoshiro`.
- `optix` for CPU-side hardware raytracing and denoising using the CUDA OptiX library.

In addition to many "glue" crates for things such as high level wrappers for certain smaller CUDA libraries.

## Related Projects

Other projects related to using Rust on the GPU:

- 2016: [glassful](https://github.com/kmcallister/glassful) Subset of Rust that compiles to GLSL.
- 2017: [inspirv-rust](https://github.com/msiglreith/inspirv-rust) Experimental Rust MIR -> SPIR-V Compiler.
- 2018: [nvptx](https://github.com/japaric-archived/nvptx) Rust to PTX compiler using the `nvptx` target for rustc (using the LLVM PTX backend).
- 2020: [accel](https://github.com/termoshtt/accel) Higher-level library that relied on the same mechanism that `nvptx` does.
- 2020: [rlsl](https://github.com/MaikKlein/rlsl) Experimental Rust -> SPIR-V compiler (predecessor to rust-gpu)
- 2020: [rust-gpu](https://github.com/Rust-GPU/rust-gpu) `rustc` compiler backend to compile Rust to SPIR-V for use in shaders, similar mechanism as our project.

## Usage
```bash
# Set up your environment, e.g.:
# export OPTIX_ROOT=/opt/NVIDIA-OptiX-SDK-9.0.0-linux64-x86_64
# export OPTIX_ROOT_DIR=/opt/NVIDIA-OptiX-SDK-9.0.0-linux64-x86_64

# Build the project
cargo build
```

## Use Rust CUDA in Container Environments

The distribution-related Dockerfiles are located in the `container` folder.
Taking Ubuntu 24.04 as an example, run the following commands in the repository root:
```bash
docker build -f ./container/ubuntu24-cuda12/Dockerfile -t rust-cuda-ubuntu24 .
docker run --rm --runtime=nvidia --gpus all -it rust-cuda-ubuntu24
```
## Documentation

Please see [The Rust CUDA Guide](https://rust-gpu.github.io/rust-cuda/) for documentation on Rust
CUDA.

## License

Licensed under either of

- Apache License, Version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or
http://www.apache.org/licenses/LICENSE-2.0)
- MIT license ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)

at your discretion.

### Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in
the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any
additional terms or conditions.
3 changes: 0 additions & 3 deletions guide/src/README.md

This file was deleted.

10 changes: 7 additions & 3 deletions guide/src/SUMMARY.md
@@ -1,8 +1,6 @@
# Summary

- [Introduction](README.md)
- [Supported Features](features.md)
- [Frequently Asked Questions](faq.md)
- [Introduction](introduction.md)
- [Guide](guide/README.md)
- [Getting Started](guide/getting_started.md)
- [Compute Capability Gating](guide/compute_capabilities.md)
@@ -18,3 +16,9 @@
- [Types](nvvm/types.md)
- [PTX Generation](nvvm/ptxgen.md)
- [Debugging](nvvm/debugging.md)

----

[Supported Features](features.md)
[Frequently Asked Questions](faq.md)

File renamed without changes
File renamed without changes
21 changes: 18 additions & 3 deletions guide/src/guide/getting_started.md
@@ -276,12 +276,24 @@ There are two ways to build and run this example: natively, and with docker.

### Native

If you have all the required libraries installed, try building with `cargo build`.

If you get an error "libnvvm.so.4: cannot open shared object file", you will need to adjust
`LD_LIBRARY_PATH`, something like this:
```
export LD_LIBRARY_PATH="/usr/local/cuda/nvvm/lib64:${LD_LIBRARY_PATH}"
```

If you get an error "error: couldn't load codegen backend" on Windows, you will need to adjust
`PATH`, something like this with CUDA 12:
```
$env:PATH += ";C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9\nvvm\bin"
```
or this with CUDA 13 (note the extra `x64` directory; substitute your installed version):
```
$env:PATH += ";C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\nvvm\bin\x64"
```

You should then be able to `cargo run`, and see the expected output:
```
c = [3.0, 5.0, 7.0, 9.0]
@@ -328,6 +340,9 @@ is recognized.
`make`ing and running the [`deviceQuery`] sample. If all is well it will print various details
about your GPU.
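As a sanity check, the expected `cargo run` output shown earlier can be reproduced on the CPU. This is a sketch under an assumption: the inputs `a = [1.0, 2.0, 3.0, 4.0]` and `b = [2.0, 3.0, 4.0, 5.0]` are inferred from the printed result, not taken from the example's source.

```rust
// CPU reference for the GPU vector-add example. The input values are an
// assumption inferred from the expected output `c = [3.0, 5.0, 7.0, 9.0]`.
fn vector_add(a: &[f32], b: &[f32]) -> Vec<f32> {
    a.iter().zip(b).map(|(x, y)| x + y).collect()
}

fn main() {
    let a = [1.0f32, 2.0, 3.0, 4.0];
    let b = [2.0f32, 3.0, 4.0, 5.0];
    let c = vector_add(&a, &b);
    println!("c = {:?}", c); // c = [3.0, 5.0, 7.0, 9.0]
}
```

Comparing GPU results against a CPU reference like this is an easy way to confirm your toolchain is producing correct PTX, not just PTX that runs.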

A sample `.devcontainer.json` file is also included, configured for Ubuntu 24.04. Copy this to
`.devcontainer/devcontainer.json` to make additional customizations.

[`deviceQuery`]: https://github.com/NVIDIA/cuda-samples/tree/ba04faaf7328dbcc87bfc9acaf17f951ee5ddcf3/Samples/deviceQuery

## More examples
24 changes: 12 additions & 12 deletions guide/src/guide/kernel_abi.md
@@ -52,31 +52,31 @@ by reference (by allocating a device box):

```rs
let foo = Foo {
a: 5,
b: 6,
c: 7
};

unsafe {
launch!(
module.kernel<<<1, 1, 0, stream>>>(foo)
)?;
}
```

And not

```rs
let foo = DeviceBox::new(Foo {
a: 5,
b: 6,
c: 7
});

unsafe {
launch!(
module.kernel<<<1, 1, 0, stream>>>(foo.as_device_ptr())
)?;
}
```
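In either case, the struct's in-memory layout must match what the compiled kernel expects, which is why kernel argument types should use a defined representation. Here is a small layout sketch for the `Foo` used above; the concrete field types (`u32`) are an assumption for illustration:

```rs
// Hypothetical definition of the `Foo` used above; the field types (`u32`)
// are an assumption. #[repr(C)] gives the struct a defined, C-compatible
// layout so the host-side bytes match what the kernel reads.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
struct Foo {
    a: u32,
    b: u32,
    c: u32,
}

fn main() {
    // Three u32 fields, no padding: 12 bytes, 4-byte alignment.
    assert_eq!(std::mem::size_of::<Foo>(), 12);
    assert_eq!(std::mem::align_of::<Foo>(), 4);
    println!("Foo: size={} align={}", std::mem::size_of::<Foo>(), std::mem::align_of::<Foo>());
}
```

Checking `size_of`/`align_of` on the host is a cheap way to catch layout mismatches before they turn into silent memory corruption on the device.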

81 changes: 81 additions & 0 deletions guide/src/introduction.md
@@ -0,0 +1,81 @@
# Introduction

Welcome to the Rust CUDA Guide!

## Goal

The Rust CUDA Project is a project aimed at making Rust a tier-1 language for GPU computing using
the CUDA Toolkit. It provides tools for compiling Rust to fast PTX code as well as libraries for
using existing CUDA libraries with it.

## Background

Historically, general-purpose high-performance GPU computing has been done using the CUDA toolkit.
The CUDA toolkit primarily provides a way to use Fortran/C/C++ code for GPU computing in tandem
with CPU code with a single source. It also provides many libraries, tools, forums, and
documentation to supplement the single-source CPU/GPU code.

CUDA is exclusively an NVIDIA-only toolkit. Many tools have been proposed for cross-platform GPU
computing such as OpenCL, Vulkan Computing, and HIP. However, CUDA remains the most used toolkit
for such tasks by far. This is why it is imperative to make Rust a viable option for use with the
CUDA toolkit.

However, CUDA with Rust has been a historically very rocky road. The only viable option until now
has been to use the LLVM PTX backend. However, the LLVM PTX backend does not always work and would
generate invalid PTX for many common Rust operations. In recent years it has been shown time and
time again that a specialized solution is needed for Rust on the GPU with the advent of projects
such as rust-gpu (for translating Rust to SPIR-V).

Our hope is that with this project we can push Rust on GPUs forward and make Rust an excellent
language for such tasks. Rust offers plenty of benefits such as `__restrict__` performance benefits
for every kernel, an excellent module/crate system, delimiting of unsafe areas of CPU/GPU code with
`unsafe`, high-level wrappers to low-level CUDA libraries, etc.

## Structure

The scope of the Rust CUDA Project is broad, spanning the entirety of the CUDA ecosystem, with
libraries and tools to make it usable using Rust. Therefore, the project contains many crates for
all corners of the CUDA ecosystem.

- `rustc_codegen_nvvm` is a rustc backend that targets NVVM IR (a subset of LLVM IR) for the
  [libnvvm](https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html) library.
  - Generates highly optimized PTX code which can be loaded by the CUDA Driver API to execute on
    the GPU.
  - For now it is CUDA-only, but it may be used to target AMD GPUs in the future.
- `cuda_std` contains GPU-side functions and utilities, such as thread index queries, memory
  allocation, warp intrinsics, etc.
  - It is _not_ a low-level library. It provides many utility functions to make it easier to write
    cleaner and more reliable GPU kernels.
  - It is closely tied to `rustc_codegen_nvvm`, which exposes GPU features through it internally.
- `cust` contains CPU-side CUDA features such as launching GPU kernels, GPU memory allocation,
  device queries, etc.
  - It is a high-level wrapper for the CUDA Driver API, the lower-level alternative to the more
    common CUDA Runtime API used from C++. It provides more fine-grained control over things like
    kernel concurrency and module loading than the Runtime API.
  - High-level Rust features such as RAII and `Result` make it easier and cleaner to manage the
    interface to the GPU.
- `cudnn` is a collection of GPU-accelerated primitives for deep neural networks.
- `gpu_rand` does GPU-friendly random number generation. It currently only implements xoroshiro
  RNGs from `rand_xoshiro`.
- `optix` provides CPU-side hardware raytracing and denoising using the CUDA OptiX library.
  (This library is currently commented out because the OptiX SDK is difficult to install.)

There are also several "glue" crates for things such as high level wrappers for certain smaller
CUDA libraries.
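To illustrate the RNG family `gpu_rand` builds on, here is a minimal CPU-side sketch of the xoroshiro128\*\* step function as published by Blackman and Vigna. This is not `gpu_rand`'s actual API, just the underlying algorithm:

```rust
// Minimal xoroshiro128** sketch (Blackman & Vigna). Illustrative only; this
// mirrors the published algorithm, not gpu_rand's real types or API.
struct Xoroshiro128StarStar {
    s: [u64; 2],
}

impl Xoroshiro128StarStar {
    fn new(seed: [u64; 2]) -> Self {
        // An all-zero state is invalid for xoroshiro; nudge it if needed.
        let s = if seed == [0, 0] { [0x9E3779B97F4A7C15, 1] } else { seed };
        Self { s }
    }

    fn next_u64(&mut self) -> u64 {
        let s0 = self.s[0];
        let mut s1 = self.s[1];
        let result = s0.wrapping_mul(5).rotate_left(7).wrapping_mul(9);
        s1 ^= s0;
        self.s[0] = s0.rotate_left(24) ^ s1 ^ (s1 << 16);
        self.s[1] = s1.rotate_left(37);
        result
    }
}

fn main() {
    // Same seed, same sequence: the reproducibility that makes seeded RNGs
    // practical for giving each GPU thread its own deterministic stream.
    let mut a = Xoroshiro128StarStar::new([1, 2]);
    let mut b = Xoroshiro128StarStar::new([1, 2]);
    let xs: Vec<u64> = (0..4).map(|_| a.next_u64()).collect();
    let ys: Vec<u64> = (0..4).map(|_| b.next_u64()).collect();
    assert_eq!(xs, ys);
    println!("deterministic: {}", xs == ys);
}
```

Generators in this family are popular on GPUs because the state is tiny (two `u64`s per thread) and the step function uses only cheap integer operations.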

## Related Projects

Other projects related to using Rust on the GPU:

- 2016: [glassful](https://github.com/kmcallister/glassful) translates a subset of Rust to GLSL.
- 2017: [inspirv-rust](https://github.com/msiglreith/inspirv-rust) is an experimental
Rust-MIR-to-SPIR-V compiler.
- 2018: [nvptx](https://github.com/japaric-archived/nvptx) is a Rust-to-PTX compiler using the
`nvptx` target for rustc (using the LLVM PTX backend).
- 2020: [accel](https://github.com/termoshtt/accel) is a higher-level library that relied on the
same mechanism that `nvptx` does.
- 2020: [rlsl](https://github.com/MaikKlein/rlsl) is an experimental Rust-to-SPIR-V compiler
(and a predecessor to rust-gpu).
- 2020: [rust-gpu](https://github.com/Rust-GPU/rust-gpu) is a `rustc` compiler backend to compile
Rust to SPIR-V for use in shaders. Like Rust CUDA, it is part of the broader [Rust
GPU](https://rust-gpu.github.io/) project.
8 changes: 5 additions & 3 deletions guide/src/nvvm/debugging.md
@@ -33,8 +33,10 @@ which I will add to the project soon.
## Miscompilations

Miscompilations are rare but annoying. They usually result in one of two things happening:
- CUDA rejecting the PTX as a whole (throwing an `InvalidPtx` error). Run `ptxas` on the `.ptx`
file to get a more informative error message. This is rare but the most common cause is declaring
invalid extern functions (just grep for `extern` in the PTX file and check if it's odd functions
that aren't CUDA syscalls like vprintf, malloc, free, etc).
- The PTX containing invalid behavior. This is very specific and rare but if you find this, the best way to debug it is:
- Try to get a minimal working example so we don't have to search through megabytes of LLVM IR/PTX.
- Use `RUSTFLAGS="--emit=llvm-ir"` and find `crate_name.ll` in `target/nvptx64-nvidia-cuda/<debug/release>/deps/` and attach it in any bug report.
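The "grep for `extern`" advice can be mechanized. Here is a hedged sketch that scans PTX text for extern declarations not on a known-syscall allowlist; the allowlist is an assumption based on the syscalls named above (plus `__assertfail`, which commonly also appears):

```rust
// Scan PTX text for `.extern` declarations that aren't known CUDA syscalls;
// anything flagged is a candidate cause of an InvalidPtx error. The allowlist
// is illustrative, based on the syscalls mentioned above.
fn suspicious_extern_lines(ptx: &str) -> Vec<String> {
    const KNOWN: [&str; 4] = ["vprintf", "malloc", "free", "__assertfail"];
    ptx.lines()
        .filter(|l| l.trim_start().starts_with(".extern"))
        .filter(|l| !KNOWN.iter().any(|k| l.contains(k)))
        .map(|l| l.trim().to_string())
        .collect()
}

fn main() {
    let ptx = "\
.extern .func (.param .b32 r) vprintf (.param .b64 fmt, .param .b64 args);
.extern .func (.param .b32 r) my_missing_helper (.param .b64 x);
";
    // Flags only the my_missing_helper line.
    println!("{:?}", suspicious_extern_lines(ptx));
}
```

Running something like this over the `.ptx` file (alongside `ptxas`) narrows an `InvalidPtx` failure down to the offending declaration quickly.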
@@ -51,4 +53,4 @@ If you set up the codegen backend for debug, it should give you a mapping from R

Here is an example of the screen you should see:

![](../../../assets/nsight.png)
![](../../assets/nsight.png)