Add amdgpu target #823
Comments
This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed. Concerns or objections to the proposal should be discussed on Zulip and formally registered here by adding a comment with the following syntax:
Concerns can be lifted with:
See documentation at https://forge.rust-lang.org cc @rust-lang/compiler
This target will require quite a bit of consideration by rustc due to being unusual in many ways, but I see no problem with adding it on an experimental basis (so, tier 3), especially given that we've already added PTX targets. @rustbot second
Just a quick note,
We are very interested in address spaces as well, /cc @eddyb
@rustbot label -final-comment-period +major-change-accepted
Proposal
Add the `amdgpu` target to rustc to allow generating code for AMD GPUs.

LLVM has good support for this backend. The goal is to expose it to Rust, enabling Rust as another language on these GPUs. The main target is compute capabilities, in contrast to the rust-gpu project, which targets graphics capabilities through SPIR-V (the graphics variant). The base runtime for running compute programs on AMD GPUs is HSA (Heterogeneous System Architecture), which is implemented in ROCR-Runtime. Therefore, the Rust backend should target the amdhsa OS. To support the target in rustc, LLVM needs to be compiled with the amdgpu backend enabled. On Windows (and Linux), HIP can be used to load the same compiled amdgpu programs.
The amdgpu target differs from other (mainstream/x86) targets in two ways.
Address spaces
Address spaces can be thought of as denoting different physical memory areas (this is a mental model; hardware may implement it that way, but it does not need to). In LLVM IR, each pointer has an address space, defaulting to `addrspace(0)` (this is implicit in textual IR, which is why you won’t see it there). Different address spaces have different properties, e.g. they can have a different pointer size, the null pointer can be different (e.g. `0` vs `-1`) and the machine instructions used to access them can be different.

The amdgpu LLVM backend makes heavy use of address spaces. This is also a problem for other targets that want to support Rust (though mostly more exotic ones). The use of address spaces leads to situations where a pointer in one address space needs to be cast to a pointer in a different address space. In LLVM IR, `bitcast` is invalid for this case; `addrspacecast` needs to be used.

The changes to rustc code should be mostly about fixing problems in a rather contained way. (I don’t know for sure how contained it will be, but compiling `core` required surprisingly few changes, as can be seen in rust-lang/rust#134740; disclaimer: I only tried running a very simple program with `-Zbuild-std=core` so far.) The changes will bring the Rust LLVM backend closer to how LLVM envisions address spaces, which should make it easier to support other future targets that use more address spaces (there was already some work for other targets, which in turn made it easier for amdgpu).

To get a feeling for what LLVM address spaces are used for, here is the list of the important amdgpu address spaces (from https://llvm.org/docs/AMDGPUUsage.html#address-spaces):
- 0 (generic/flat): pointers in `addrspace(0)` may go to any of the below address spaces. The hardware switches at runtime, depending on the pointer. This works for all pointers, but is the slowest.
- 1 (global): global memory.
- 3 (local): memory shared within a workgroup (`groupshared` in HLSL or `shared` in GLSL).
- 5 (private): per-thread memory. `alloca`s need to be in `addrspace(5)`. A thread can only access its own private memory.

Basic support for the amdgpu target means using address spaces 1 (incoming pointers), 5 (allocas) and 0 (if we don’t know which one it is). Support for groupshared memory in the language requires its own RFC (something similar to
`thread_local` probably makes sense).

Casting pointers to `addrspace(0)` before use

I experimented more and came to the conclusion that all pointers need to be cast to `addrspace(0)` before they are used (this affects alloca and global variables). If we don’t do that, things go wrong with the below code:
results in
This may store a 32-bit pointer and read it back as a 64-bit pointer, which is obviously wrong and cannot work. Instead, we need to `addrspacecast %local to ptr addrspace(0)`, then we store and load the correct type.
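The width mismatch described above can be modelled in plain Rust. This is only an illustrative sketch and not what rustc or LLVM do internally: the names `AddrSpace`, `private_to_flat`, `store_ptr`, `load_flat_ptr` and `round_trip` are hypothetical, as is the aperture base constant. It assumes 32-bit pointers in `addrspace(5)`, 64-bit pointers in `addrspace(0)`, and `-1` as the private null pointer.

```rust
/// Illustrative model of two amdgpu address spaces (not a rustc or LLVM type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum AddrSpace {
    Flat,    // addrspace(0), assumed to use 64-bit pointers
    Private, // addrspace(5), assumed to use 32-bit pointers
}

impl AddrSpace {
    /// Size in bytes of a pointer in this address space.
    fn pointer_bytes(self) -> usize {
        match self {
            AddrSpace::Flat => 8,
            AddrSpace::Private => 4,
        }
    }
}

/// Hypothetical base address of the private aperture; made up for this sketch.
const PRIVATE_APERTURE_BASE: u64 = 0x0000_7000_0000_0000;
/// Assumed null pointer of the private address space (-1).
const PRIVATE_NULL: u32 = u32::MAX;

/// Model of an `addrspacecast` from addrspace(5) to addrspace(0).
/// It changes both the width and the null value, so it is not a bitcast.
fn private_to_flat(p: u32) -> u64 {
    if p == PRIVATE_NULL {
        0
    } else {
        PRIVATE_APERTURE_BASE | u64::from(p)
    }
}

/// Store a pointer into a memory slot, writing only as many bytes as a
/// pointer in `space` occupies.
fn store_ptr(slot: &mut [u8; 8], value: u64, space: AddrSpace) {
    let n = space.pointer_bytes();
    slot[..n].copy_from_slice(&value.to_le_bytes()[..n]);
}

/// Load a 64-bit flat pointer from the slot.
fn load_flat_ptr(slot: &[u8; 8]) -> u64 {
    u64::from_le_bytes(*slot)
}

/// Store a private pointer and read it back as a flat pointer.
/// `cast_first` selects whether the pointer is cast before the store.
fn round_trip(local: u32, cast_first: bool) -> u64 {
    let mut slot = [0xAA_u8; 8]; // stale bytes already present in memory
    if cast_first {
        store_ptr(&mut slot, private_to_flat(local), AddrSpace::Flat);
    } else {
        store_ptr(&mut slot, u64::from(local), AddrSpace::Private);
    }
    load_flat_ptr(&slot)
}
```

Without the cast, `round_trip` writes only four bytes and the load picks up four stale bytes, so the result is not a valid flat pointer. With the cast, the stored and loaded widths agree and the value round-trips.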
So, I think the way to go is casting every pointer to `addrspace(0)` immediately after creating an `alloca` or a global. For alloca, the change is just 2 lines; for globals it is a bit more involved due to vtables, where the global variable is modified after it is created and therefore we need to look through the `addrspacecast` constexpr when adding attributes.

See https://rust-lang.zulipchat.com/#narrow/channel/233931-t-compiler.2Fmajor-changes/topic/Add.20amdgpu.20target.20compiler-team.23823/near/491175958.
Many processors / target-cpus
Every generation of GPUs uses different machine code (to some extent). LLVM supports them as different “cpu”s or processors (`-Ctarget-cpu=` argument for rustc).

There are two challenges for this regarding Rust support:
1. There is no obvious default processor.
2. If pre-compiled code is shipped (e.g. `core`), it would need to be for all processors to be useful.

There is no obvious choice for the default processor. If some processor is the default and a user tries to use the amdgpu target without overriding the `target-cpu`, it likely results in an unusable binary. We could use a non-existing “cpu” as the default, resulting in compiler errors, to make users aware of the need to set a `target-cpu`. E.g. `"please specify -Ctarget-cpu"` results in failing compilations and warnings. Alternatively, some choice can be made, like `gfx900`.

There is a PR that fails compilation if no cpu is specified explicitly: rust-lang/rust#135030
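The "no default processor" policy can be sketched as follows. This is a hypothetical illustration of the idea, not the code in rust-lang/rust#135030 and not rustc's actual option handling; the names `PLACEHOLDER_CPU` and `resolve_target_cpu` are made up for this example.

```rust
/// Hypothetical placeholder "cpu" name; it does not name any real processor.
const PLACEHOLDER_CPU: &str = "please specify -Ctarget-cpu";

/// Hypothetical helper: return the processor to compile for, or an error
/// if the user did not pass one explicitly.
fn resolve_target_cpu(flag: Option<&str>) -> Result<&str, String> {
    match flag {
        // An explicit, non-empty processor such as "gfx900" is accepted as-is.
        Some(cpu) if !cpu.is_empty() && cpu != PLACEHOLDER_CPU => Ok(cpu),
        // No processor given: fail early instead of guessing a default.
        _ => Err(format!(
            "the amdgpu target has no default processor; {}",
            PLACEHOLDER_CPU
        )),
    }
}
```

Failing early trades convenience for safety: a silently chosen default would likely produce a binary that does not run on the user's GPU.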
Regarding 2., there may be a generic backend for amdgpu in the future, relying on SPIR-V (the compute variant), which would solve both these issues, but that is a long way off and it is unclear whether it will eventually happen. For the reason of binary size alone, it does not make sense for Rust to distribute pre-compiled code for the amdgpu backend. Users should instead specify their processor via `-Ctarget-cpu=` and compile core via `-Zbuild-std=core` or similar means.

The list of processors supported by LLVM is here: https://llvm.org/docs/AMDGPUUsage.html#processors
Related issues
Mentors or Reviewers
None yet, this is my first Rust contribution :)
Process
The main points of the Major Change Process are as follows:
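- A compiler team member or contributor who is knowledgeable in the area can second by writing `@rustbot second`.
  - Finding a "second" suffices for internal changes. If, however, you are proposing a new public-facing feature, such as a `-C flag`, then full team check-off is required.
  - Compiler team members can initiate a check-off via `@rfcbot fcp merge` on either the MCP or the PR.

You can read more about Major Change Proposals on forge.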