diff --git a/text/3838-repr-scalable.md b/text/3838-repr-scalable.md new file mode 100644 index 00000000000..7646153a303 --- /dev/null +++ b/text/3838-repr-scalable.md @@ -0,0 +1,386 @@ +- Feature Name: `repr_scalable` +- Start Date: 2025-07-07 +- RFC PR: [rust-lang/rfcs#3838](https://github.com/rust-lang/rfcs/pull/3838) +- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) + +# Summary +[summary]: #summary + +Extends Rust's existing SIMD infrastructure, `#[repr(simd)]`, with a +complementary scalable representation, `#[repr(scalable(N)]`, to support +scalable vector types, such as Arm's Scalable Vector Extension (SVE), or +RISC-V's Vector Extension (RVV). + +Like the existing `repr(simd)` representation, `repr(scalable(N))` is internal +compiler infrastructure that will be used only in the standard library to +introduce scalable vector types which can then be stablised. Only the +infrastructure to define these types are introduced in this RFC, not the types +or intrinsics that use it. + +This RFC builds on Rust's existing SIMD infrastructure, introduced in +[rfcs#1199: SIMD Infrastructure][rfcs#1199]. It depends on +[rfcs#3729: Hierarchy of Sized traits][rfcs#3729]. + +SVE is used in examples throughout this RFC, but the proposed features should be +sufficient to enable support for similar extensions in other architectures, such +as RISC-V's V Extension. + +# Motivation +[motivation]: #motivation + +SIMD types and instructions are a crucial element of high-performance Rust +applications and allow for operating on multiple values in a single instruction. +Many processors have SIMD registers of a known fixed length and provide +intrinsics which operate on these registers. For example, Arm's Neon extension +is well-supported by Rust and provides 128-bit registers and a wide range of +intrinsics. + +Instead of releasing more extensions with ever increasing register bit widths, +AArch64 has introduced a Scalable Vector Extension (SVE). Similarly, RISC-V has +a Vector Extension (RVV). These extensions have vector registers whose width +depends on the CPU implementation and bit-width-agnostic intrinsics for +operating on these registers. By using scalable vectors, code won't need to be +re-written using new architecture extensions with larger registers, new types +and intrinsics, but instead will work on newer processors with different vector +register lengths and performance characteristics. + +Scalable vectors have interesting and challenging implications for Rust, +introducing value types with sizes that can only be known at runtime, requiring +significant changes to the language's notion of sizedness - this support is +being proposed in the [rfcs#3729]. + +Hardware is generally available with SVE, and key Rust stakeholders want to be +able to use these architecture features from Rust. In a [recent discussion on +SVE, Amanieu, co-lead of the library team, said][quote_amanieu]: + +> I've talked with several people in Google, Huawei and Microsoft, all of whom +> have expressed a rather urgent desire for the ability to use SVE intrinsics in +> Rust code, especially now that SVE hardware is generally available. + +Without support in the compiler, leveraging the +[*Hierarchy of Sized traits*][rfcs#3729] proposal, it is not possible to +introduce intrinsics and types exposing the scalable vector support in hardware. + +# Guide-level explanation +[guide-level-explanation]: #guide-level-explanation + +None of the infrastructure proposed in this RFC is intended to be used directly +by Rust users. + +`repr(scalable)` as described later in +[*Reference-level explanation*][reference-level-explanation] is perma-unstable +and exists only enables scalable vector types to be defined in the standard +library. The specific vector types are intended to eventually be stabilised, but +none are proposed in this RFC. + +## Using scalable vectors +[using-scalable-vectors]: #using-scalable-vectors + +Scalable vector types correspond to vector registers in hardware with unknown +size at compile time. However, it will be a known and fixed size at runtime. +Additional properties could be known during compilation, depending on the +architecture, such as a minimum or maximum size or that the size must be a +multiple of some factor. + +As previously described, users will not define their own scalable vector types +and instead use intrinsics from `std::arch`, and this RFC is not proposing any +such intrinsics, just the infrastructure. However, to illustrate how the types +and intrinsics that this infrastructure will enable can be used, consider the +following example that sums two input vectors: + +```rust +fn sve_add(in_a: Vec, in_b: Vec, out_c: &mut Vec) { + let len = in_a.len(); + unsafe { + // `svcntw` returns the actual number of elements that are in a 32-bit + // element vector + let step = svcntw() as usize; + for i in (0..len).step_by(step) { + let a = in_a.as_ptr().add(i); + let b = in_b.as_ptr().add(i); + let c = out_c as *mut f32; + let c = c.add(i); + + // `svwhilelt_b32` generates a mask based on comparing the current + // index against the `len` + let pred = svwhilelt_b32(i as _, len as _); + + // `svld1_f32` loads a vector register with the data from address + // `a`, zeroing any elements in the vector that are masked out + let sva = svld1_f32(pred, a); + let svb = svld1_f32(pred, b); + + // `svadd_f32_m` adds `a` and `b`, any lanes that are masked out will + // take the keep value of `a` + let svc = svadd_f32_m(pred, sva, svb); + + // `svst1_f32` will store the result without accessing any memory + // locations that are masked out + svst1_f32(svc, pred, c); + } + } +} +``` + +From a user's perspective, writing code for scalable vectors isn't too different +from when writing code with a fixed sized vector. + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +Types annotated with the `#[repr(simd)]` attribute contains either an array +field or multiple fields to indicate the intended size of the SIMD vector that +the type represents. + +Similarly, a `scalable(N)` representation is introduced to define a scalable +vector type. `scalable(N)` accepts an integer to determine the minimum number of +elements the vector contains. For example: + +```rust +#[repr(simd, scalable(4))] +pub struct svfloat32_t { _ty: [f32], } +``` + +As with the existing `repr(simd)`, `_ty` is purely a type marker, used to get +the element type for the codegen backend. + +`svfloat32_t` is a scalable vector with a minimum of four `f32` elements and +potentially more depending on the length of the vector register at runtime. + +## Choosing `N` +[choosing-n]: #choosing-n + +Many intrinsics using scalable vectors accept both a predicate vector argument +and data vector arguments. Predicate vectors determine whether a lane is on or +off for the operation performed by any given intrinsic. Predicate vectors may +use different registers of sizes to the vectors containing data. +`repr(scalable)` is used to define vectors containing both data and predicates. + +As `repr(scalable(N))` is intended to be a permanently unstable attribute, any +value of `N` is accepted by the attribute and it is the responsibility of +whomever is defining the type to provide a valid value. A correct value for `N` +depends on the purpose of the specific scalable vector type and the +architecture. + +For example, with SVE, the scalable vector register length is a minimum of 128 +bits, must be a multiple of 128 bits and a power of 2; and predicate registers +have one bit for each byte in the vector registers. So, for `svfloat32_t` +defined shown above, an `f32` is 32-bits and with `N=4`, the entire minimum +register length of 128 bits is used (4 x 32 = 128). An intrinsic that takes a +`svfloat32_t` may also want to accept as an argument a predicate vector with a +matching four elements (`N=4`), which would only use 4 bits of the predicate +register rather than the full 16 bits. + +See +[*Manually-chosen or compiler-calculated element count*][manual-or-calculated-element-count] +for a discussion on why `N` is not calculated by the compiler. + +## Properties of scalable vectors +[properties-of-scalable-vector-types]: #properties-of-scalable-vectors + +Scalable vectors are necessarily non-`const Sized` (from [rfcs#3729]) as they +behave like value types but the exact size cannot be known at compilation time. + +[rfcs#3729] allows these types to implement `Clone` (and consequently `Copy`) as +`Clone` only requires an implementation of `Sized`, irrespective of constness. + +Scalable vector types have some further restrictions due to limitations of the +codegen backend: + +- Cannot be stored in compound types (structs, enums, etc) + + - Including coroutines, so these types cannot be held across an await + boundary in async functions + + - `repr(transparent)` newtypes could be permitted with scalable vectors + +- Cannot be used in arrays + +- Cannot be the type of a static variable. + +Some of these limitations may be able to be lifted in future depending on what +is supported by rustc's codegen backends. + +## ABI +[abi]: #abi + +Rust currently always passes SIMD vectors on the stack to avoid ABI mismatches +between functions annotated with `target_feature` - where the relevant vector +register is guaranteed to be present - and those without - where the relevant +vector register might not be present. + +However, this approach will not work for scalable vector types as the relevant +target feature must to be present to use the instruction that can allocate the +correct size on the stack for the scalable vector. + +Therefore, there is an additional restriction that these types cannot be used in +the argument or return types of functions unless those functions are annotated +with the relevant target feature. + +## Target features +[target-features]: #target-features + +Similarly to the issues with the ABI of scalable vectors, without the relevant +target features, few operations can actually be performed on scalable vectors - +causing issues for the use of scalable vectors in generic code and with traits. +For example, implementations of traits like `Clone` would not be able to +actually perform a clone, and generic functions that are instantiated with +scalable vectors would during instruction selection in the codegen backend. + +When a scalable vector is instantiated into a generic function during +monomorphisation, or a trait method is being implemented for a scalable vector, +then the relevant target feature will be added to the function. + +For example, when instantiating `std::mem::size_of_val` with a scalable vector +during monomorphisation, the relevant target feature will be added to `size_of_val` +for codegen. + +## Implementing `repr(scalable)` +[implementing-repr-scalable]: #implementing-reprscalable + +Implementing `repr(scalable)` largely involves lowering scalable vectors to the +appropriate type in the codegen backend. LLVM has robust support for scalable +vectors and is the default backend, so this section will focus on implementation +in the LLVM codegen backend. Other codegen backends can implement support when +scalable vectors are supported by the backend. + +Most of the complexity of SVE is handled by LLVM: lowering Rust's scalable +vectors to the correct type in LLVM and the `vscale` modifier that is applied to +LLVM's vector types. + +LLVM's scalable vector type is of the form ``. +`vscale` is the scaling factor determined by the hardware at runtime, it can be +any value providing it gives a legal vector register size for the architecture. + +For example, a `` is a scalable vector with a minimum of four +`f32` elements and with SVE, `vscale` could then be any power of two which would +result in register sizes of 128, 256, 512, 1024 or 2048 and 4, 8, 16, 32, or 64 +`f32` elements respectively. + +The `N` in the `#[repr(scalable(N))]` determines the `element_count` used in the +LLVM type for a scalable vector. + +While it is possible to change the vector length at runtime using a +[`prctl()`][prctl] call to the kernel, this would require that `vscale` change, +which is unsupported. `prctl` must only be used to set up the vector length for +child processes, not to change the vector length of the current process. As Rust +cannot prevent users from doing this, it will be documented as undefined +behaviour, consistent with C and C++. + +# Drawbacks +[drawbacks]: #drawbacks + +- `repr(scalable(N))` is inherently additional complexity to the language, + despite being largely hidden from users. + +# Rationale and alternatives +[rationale-and-alternatives]: #rationale-and-alternatives + +Without support for scalable vectors in the language and compiler, it is not +possible to leverage hardware with scalable vectors from Rust. As extensions +with scalable vectors are available in architectures as either the only or +recommended way to do SIMD, lack of support in Rust would severely limit Rust's +suitability on these architectures compared to other systems programming +languages. + +By aligning with the approach taken by C (discussed in the +[*Prior art*][prior-art] below), most of the documentation that already exists +for scalable vector intrinsics in C should still be applicable to Rust. + +## Manually-chosen or compiler-calculated element count +[manual-or-calculated-element-count]: #manually-chosen-or-compiler-calculated-element-count + +`repr(scalable(N))` expects `N` to be provided rather than calculating it. This +avoids needing to teach the compiler how to calculate the required `element` +count, which isn't always trivial. + +Many of the intrinsics which accept scalable vectors as an argument also accept +a predicate vector. Predicate vectors decide which lanes are on or off for an +operation (i.e. which elements in the vector are operated on). Predicate vectors +can be in different and smaller registers than the data. For example, +`` could be the predicate vector for a `` +vector + +For non-predicate scalable vectors, it will be typical that `N` will be +`$minimum_register_length / $type_size` (e.g. `4` for `f32` or `8` for `f16` +with a minimum 128-bit register length). In this circumstance, `N` could be +trivially calculated by the compiler. + +For predicate vectors, it is desirable to be able to to define types where `N` +matches the number of elements in the non-predicate vector, i.e. a +`` to match a ``, `` to +match ``, or `` to match +``. In this circumstance, it might still be possible to give +rustc all of the relevant information such that it could compute `N`, but it +would add extra complexity. + +This RFC takes the position that the additional complexity required to have the +compiler always be able to calculate `N` isn't justified given the permanently +unstable nature of the `repr(scalable(N))` attribute and the scalable vector +types defined in `std::arch` are likely to be few in number, automatically +generated and well-tested. + +# Prior art +[prior-art]: #prior-art + +There are not many languages with support for scalable vectors: + +- SVE in C takes a similar approach as this proposal by using sizeless + incomplete types to represent scalable vectors. However, sizeless types are + not part of the C specification and Arm's C Language Extensions (ACLE) provide + [an edit to the C standard][acle_sizeless] which formally define "sizeless + types". +- [.NET 9 has experimental support for SVE][dotnet], but as a managed language, + the design and implementation considerations in .NET are quite different to + Rust. + +[rfcs#3268] was a previous iteration of this RFC. + +# Unresolved questions +[unresolved-questions]: #unresolved-questions + +There are currently no unresolved questions. + +# Future possibilities +[future-possibilities]: #future-possibilities + +There are a handful of future possibilities enabled by this RFC: + +## General mechanism for target-feature-affected types +[general-mechanism-target-feature-types]: #general-mechanism-for-target-feature-affected-types + +A more general mechanism for enforcing that SIMD types are only used in +`target_feature`-annotated functions would be useful, as this would enable SVE +types to have fewer distinct restrictions than other SIMD types, and would +enable SIMD vectors to be passed by-register, a performance improvement. + +Such a mechanism would need be introduced gradually to existing SIMD types with +a forward compatibility lint. This will be addressed in a forthcoming RFC. + +## Relaxed restrictions +[relaxed-restrictions]: #relaxed-restrictions + +Some of the restrictions on these types (e.g. use in compound types) could be +relaxed at a later time either by extending rustc's codegen or leveraging newly +added support in LLVM. + +However, as C also has restriction and scalable vectors are nevertheless used in +production code, it is unlikely there will be much demand for those restrictions +to be relaxed. + +## Portable SIMD +[portable-simd]: #portable-simd + +Given that there are significant differences between scalable vectors and +fixed-length vectors, and that `std::simd` is unstable, it is worth +experimenting with architecture-specific support and implementation initially. +Later, there are a variety of approaches that could be taken to incorporate +support for scalable vectors into Portable SIMD. + +[acle_sizeless]: https://arm-software.github.io/acle/main/acle.html#formal-definition-of-sizeless-types +[dotnet]: https://github.com/dotnet/runtime/issues/93095 +[prctl]: https://www.kernel.org/doc/Documentation/arm64/sve.txt +[rfcs#1199]: https://rust-lang.github.io/rfcs/1199-simd-infrastructure.html +[rfcs#3268]: https://github.com/rust-lang/rfcs/pull/3268 +[rfcs#3729]: https://github.com/rust-lang/rfcs/pull/3729 +[quote_amanieu]: https://github.com/rust-lang/rust/pull/118917#issuecomment-2202256754