repr(scalable) #3838


Open

wants to merge 5 commits into master


Conversation

Member

@davidtwco davidtwco commented Jul 14, 2025

Supersedes #3268.

Extends Rust's existing SIMD infrastructure, #[repr(simd)], with a complementary scalable representation, #[repr(scalable)], to support scalable vector types, such as Arm's Scalable Vector Extension (SVE), or RISC-V's Vector Extension (RVV).

Like the existing repr(simd) representation, repr(scalable) is internal compiler infrastructure that will be used only in the standard library to introduce scalable vector types, which can then be stabilised. Only the infrastructure to define these types is introduced in this RFC, not the types or intrinsics that use it.

  • It is hoped that the majority of the interesting properties of these types from the perspective of the language team are proposed in Hierarchy of Sized traits #3729.
    • Any further restrictions that this RFC proposes are those that are strictly necessary due to limitations of the codegen backend.
    • It is not expected that this RFC would be accepted in the short-term, only after Hierarchy of Sized traits #3729 and its dependencies are accepted.
  • This RFC isn't a significant departure from RFC: Add a scalable representation to allow support for scalable vectors #3268, but it is being re-opened as much of the discussion on it is no longer relevant - there were many iterations and changes in scope since it was opened in 2022.
  • Given that this proposes a perma-unstable attribute for use in the standard library and Hierarchy of Sized traits #3729 defines the majority of the interesting properties of these types, I hope that this RFC can be under the purview of t-compiler/t-libs - I've asked about this in #t-lang on Zulip.

Rendered


Similarly, a `scalable` repr is introduced to define a scalable vector type.
`scalable` accepts an integer to determine the minimum number of elements the
vector contains. For example:
Member

Wouldn't it be that the amount of elements in the vector needs to be a whole multiple of the scalable value? Otherwise scalable(4) would allow 5 elements, which would need to be represented as <vscale x 1 x f32> in LLVM rather than <vscale x 4 x f32>.

Member

@RalfJung RalfJung Jul 14, 2025

I asked the same question in the previous RFC; unfortunately, that never got resolved, and the new RFC perpetuates the same confusion. Also see this subthread. @davidtwco It would be good to ensure that all the valuable feedback the previous RFC got is not just deleted and forgotten for the 2nd revision.

For the question at hand:
Yeah, this description of the argument as "minimum" is extremely misleading. This field apparently must be set to the "hardware scaling unit" (I don't know the proper term for this, but it's 128bit for ARM) divided by the size of the field. Everything else would at best be a giant waste, if it even works. For instance, IIUC, putting scalable(2) here means only half the register will ever get used (no matter the register size). I wonder why we even let the code pick that field at all; it seems to me the compiler should just compute it. Is there ever a situation where a 4-byte element type should not use scalable(4) on ARM?

Member Author

@davidtwco davidtwco Jul 14, 2025

I don't know that I follow - the N in repr(scalable(N)) is the N in <vscale x N x f32>. There's a correct value for that N for any given valid type that the user of repr(scalable) needs to write. We'd need to get that correct when defining these types in stdarch but because this is an internal-facing thing, that's fine, we can make sure they're correct.

I can only really speak to SVE, but as I understand it, N is basically "how many of this type can you fit in the minimum vector length?". For a f32 on SVE, that's four, because the minimum vector length is 128 bits and a f32 is 32 bits. vscale is the hardware-dependent part that we don't pick, on some processors, it'll be one and you just have the 4x f32, on some it'll be two and you have 8x f32, etc.
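To put that rule in code (an illustrative sketch only, not anything the compiler does; the helper name is made up):

```rust
// N = minimum vector length in bits / element size in bits.
// For SVE the minimum vector length is 128 bits.
fn scalable_n(min_vector_bits: usize, element_bits: usize) -> usize {
    min_vector_bits / element_bits
}

fn main() {
    assert_eq!(scalable_n(128, 32), 4);  // f32 -> repr(scalable(4))
    assert_eq!(scalable_n(128, 16), 8);  // f16 -> repr(scalable(8))
    assert_eq!(scalable_n(128, 8), 16);  // u8  -> repr(scalable(16))
}
```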

Member

@RalfJung RalfJung Jul 14, 2025

Which part of my comment is confusing? I even asked a concrete question:

Is there ever a situation where a 4-byte element type should not use scalable(4) on ARM?

It seems the answer is "no", which then means there is apparently no motivation for having this degree of freedom in the first place?

I think you confirmed my theory from above:

  • Calling this the minimum is very confusing. If I set the minimum to 2 I would expect that an array of size 8 still uses the most efficient representation for that size, but that's not what happens here.
  • There is only one desired value for this "minimum size", and it can be computed by the compiler.

I don't understand why we should be exposing this very confusing parameter, even if it's just exposed to libcore. That still needs good documentation and testing for all possible values etc. The RFC surely does not do a good job documenting it. (And no, relating it to some obscure LLVM concept doesn't count. This needs a self-contained explanation.)

Member Author

@davidtwco davidtwco Jul 14, 2025

Which part of my comment is confusing?

GitHub only showed me bjorn's comment when I replied, just seeing yours now, apologies.

I asked the same question (#3268 (comment)) in the previous RFC, unfortunately that never got resolved and the new RFC perpetuates the same confusion. Also see #3268 (comment). @davidtwco It would be good to ensure that all the valuable feedback the previous RFC got is not just deleted and forgotten for the 2nd revision.

Apologies, I tried to capture anything that seemed like it was still relevant but may have missed some threads.

Yeah, this description of the argument as "minimum" is extremely misleading. This field apparently must be set to the "hardware scaling unit" (I don't know the proper term for this, but it's 128bit for ARM) divided by the size of the field. Everything else would at best be a giant waste, if it even works. For instance, IIUC, putting scalable(2) here means only half the register will ever get used (no matter the register size). I wonder why we even let the code pick that field at all; it seems to me the compiler should just compute it. Is there ever a situation where a 4-byte element type should not use scalable(4) on ARM?

It seems the answer is "no", which then means there is apparently no motivation for having this degree of freedom in the first place?

I think you confirmed my theory from above:

Calling this the minimum is very confusing. If I set the minimum to 2 I would expect that an array of size 8 still uses the most efficient representation for that size, but that's not what happens here.
There is only one desired value for this "minimum size", and it can be computed by the compiler.
I don't understand why we should be exposing this very confusing parameter, even if it's just exposed to libcore.

I phrased it as "minimum" considering the number of elements that might be present when one includes vscale in that calculation. E.g. for an svint32_t, the minimum number of elements is four (when you have a 128-bit register and vscale=1), but it could be greater than four: it could be eight (when you have a 256-bit register and vscale=2) or it could be sixteen (when you have a 512-bit register and vscale=4), etc. This wording is clearly confusing though, so I'll try to make it clearer.

In the "Reference-level explanation", I do mention why the compiler doesn't compute N (though that should probably be in the "Rationale and alternatives" section):

repr(scalable) expects the number of elements to be provided rather than calculating it. This avoids needing to teach the compiler how to calculate the required element count, particularly as some of these scalable types can have different element counts. For instance, the predicates used in SVE have different element counts depending on the types they are a predicate for.

I'll expand this in the RFC also, but to elaborate: For SVE, many of the intrinsics take a predicate vector alongside the data vectors, and the predicate vector decides which lanes are on or off for the operation (i.e. which elements in the vector are operated on: all of them, none of them, even-numbered indices, whatever you want). Predicate vectors are in different and smaller registers than the data, these have a bit for every byte in the vector register (for a minimum size of 16 bits). e.g. <vscale x 16 x i8> vector has a <vscale x 16 x i1> predicate.

For the non-predicate vectors, you're right that we're basically only going to want to define them with the minimum element count that uses the whole minimum register size. However, for predicate vectors, we want to define types where the N matches the number of elements in the non-predicate vector, i.e. a <vscale x 4 x i1> to match a <vscale x 4 x f32>, <vscale x 8 x i1> to match <vscale x 8 x u16>, or <vscale x 16 x i1> to match <vscale x 16 x u8>. Some of the sign-extension intrinsics use types with non-128-bit multiples inside their definitions too, though I'd need to refresh my memory on the specifics for those.

That makes it trickier than just $min_register_size / $type_size, which would always be <vscale x 16 x i1>. We could do something more complicated that gives the compiler all the information it needs to be able to calculate this in every circumstance, but that's a bunch of extra complexity for types that are going to be defined once in the standard library and then well-tested.

That still needs good documentation and testing for all possible values etc. The RFC surely does not do a good job documenting it. (And no, relating it to some obscure LLVM concept doesn't count. This needs a self-contained explanation.)

I don't think it needs testing for all possible values. Just those that we end up defining to use with the intrinsics, and those will all get plenty of testing.


It's a bit of a balancing act how much detail to provide about how the scalable vectors work or what rustc's contract with the codegen backend is, and how much to omit as too much detail. Especially when somewhat familiar with these types/extensions, it's easy to miss where we've gone too far one way or the other, so I appreciate the feedback.

Member Author

Added a bunch more wording along the lines of what I've written in my previous comment in 5f5f7d2, let me know if that clears things up.

Member

@RalfJung RalfJung Jul 14, 2025

GitHub only showed me bjorn's comment when I replied, just seeing yours now, apologies.

Ah, that's fair. :)

Thanks for clarifying. I guess with predicates the point is that they do indeed not use the entire width of the corresponding register but that's fine -- we only want one bit per element, the hardware has one bit per byte, and deals with the mismatch?

Added a bunch more wording along the lines of what I've written in my previous comment in 5f5f7d2, let me know if that clears things up.

You still introduce this number as the "minimum", which is still confusing. It's not just the minimum, it is the "quantum" or whatever you want to call it -- the vector will always have a length that is an integer multiple of this number. (And moreover, it'll be the same integer multiple for all types.) It is formally correct that this number is also the minimum size, but by leading with that you send the reader down the entirely wrong track.
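Concretely (a trivial sketch of the "quantum" reading; names are just for illustration):

```rust
// The runtime element count is always an integer multiple of N:
// count = vscale * N, where vscale is the hardware-dependent factor
// and is the same for every scalable type on a given machine.
fn element_count(vscale: usize, n: usize) -> usize {
    vscale * n
}

fn main() {
    // For <vscale x 4 x f32> on SVE:
    assert_eq!(element_count(1, 4), 4);  // 128-bit registers
    assert_eq!(element_count(2, 4), 8);  // 256-bit registers
    assert_eq!(element_count(4, 4), 16); // 512-bit registers
}
```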

I don't think it needs testing for all possible values. Just those that we end up defining to use with the intrinsics, and those will all get plenty of testing.

I can almost guarantee that some library contributor will one day pick a wrong value for this, and they'll have a really bad time if the result is some odd garbage behavior. Even internal interfaces need to be designed with care and tested in an "open source hive mind" project like Rust. The only fundamental difference to external interfaces is that internal interfaces are perma-unstable, so we can more easily adjust them if mistakes have been made. But in terms of documentation, they absolutely need as much care as stable features, and they should get proper testing too.

Comment on lines +92 to +93
fn sve_add(in_a: Vec<f32>, in_b: Vec<f32>, out_c: &mut Vec<f32>) {
let len = in_a.len();
Member

@RalfJung RalfJung Jul 14, 2025

I think you are assuming here that len is a multiple of step? The scalar version would not typically make such an assumption, so this is worth calling out.

EDIT: Or does the svwhilelt_b32 part handle that? The comments are not clear enough for me to tell. The mask being "based on" index and len doesn't say how it is based on them...

Comment on lines +104 to +105
// `svwhilelt_b32` generates a mask based on comparing the current
// index against the `len`
Member

Suggested change
// `svwhilelt_b32` generates a mask based on comparing the current
// index against the `len`
// `svwhilelt_b32` deals with the tail of the iteration: it generates a mask that
// is enabled for the first `len` elements overall, but disables the last `len % step`
// elements in the last iteration.

Is this correct? I am guessing here.
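In scalar code, the semantics I'm assuming are roughly the following (`whilelt` is a stand-in for the intrinsic, not its real signature):

```rust
// Lane j of the predicate is active iff i + j < len, so lanes past the
// end of the data get disabled automatically in the last iteration.
fn whilelt(i: usize, len: usize, lanes: usize) -> Vec<bool> {
    (0..lanes).map(|j| i + j < len).collect()
}

fn main() {
    // 4 lanes, len = 10: full vectors until the last iteration at i = 8,
    // where only 2 of the 4 lanes remain in bounds.
    assert_eq!(whilelt(0, 10, 4), vec![true, true, true, true]);
    assert_eq!(whilelt(8, 10, 4), vec![true, true, false, false]);
}
```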

let pred = svwhilelt_b32(i as _, len as _);

// `svld1_f32` loads a vector register with the data from address
// `a`, zeroing any elements in the vector that are masked out
Member

And crucially, those "zeroed" elements never even get loaded, so it is okay if they are out-of-bounds?
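That is, in a scalar model of what I assume a masked load does (names hypothetical), inactive lanes never touch memory at all:

```rust
// Only active lanes index into `data`; masked-out lanes short-circuit
// before the access and produce zero, so out-of-bounds tail elements
// are never read.
fn masked_load(data: &[f32], start: usize, pred: &[bool]) -> Vec<f32> {
    pred.iter()
        .enumerate()
        .map(|(j, &active)| if active { data[start + j] } else { 0.0 })
        .collect()
}

fn main() {
    let data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    // Last iteration: start = 4, only 2 of 4 lanes are in bounds;
    // data[6] and data[7] would be out of bounds but are never accessed.
    let pred = [true, true, false, false];
    assert_eq!(masked_load(&data, 4, &pred), vec![5.0, 6.0, 0.0, 0.0]);
}
```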

Member

@workingjubilee workingjubilee left a comment

I do not think this is the correct approach nor implementation. It is invested in reusing an existing framework in repr(simd) which has many undesirable features.

Comment on lines +128 to +137
# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

Types annotated with the `#[repr(simd)]` attribute contain either an array
field or multiple fields to indicate the intended size of the SIMD vector that
the type represents.

Similarly, a `scalable(N)` representation is introduced to define a scalable
vector type. `scalable(N)` accepts an integer to determine the minimum number of
elements the vector contains. For example:
Member

We use repr(C) because otherwise some hypothetical CStruct repr type would have to be parameterized by every single arrangement of types, in an ordered fashion.

But repr(simd) does not pair with either repr(packed) or repr(align(N)) coherently, and neither would repr(scalable).

I do not think this should be handled by a new repr... modeling it on a still-unstable repr(simd)... which, as we have discovered over time with existing reprs, has a million footguns for how they work and get compiled. A lang item seems more to-the-point. I intend to make this true for repr(simd) as well anyways.


```rust
#[repr(simd, scalable(4))]
pub struct svfloat32_t { _ty: [f32], }
Member

@workingjubilee workingjubilee Jul 14, 2025

One of the aspects of using these random repr attributes is it is misleading. This looks like it is unsized to begin with, but these are unconst Sized.

Comment on lines +216 to +218
Therefore, there is an additional restriction that these types cannot be used in
the argument or return types of functions unless those functions are annotated
with the relevant target feature.
Member

Would this not mean that these types cannot be used in function pointers, as we have no knowledge of the target features of a function pointer?


For the same reason it would also clash with dyn-compatible traits.

Member

I think this is similar to the existing checks we have for __m256 only being allowed in extern "C" functions when AVX is available. They can be used in function ptrs just fine, because someone somewhere must have made the unsafe promise that it is okay to call a #[target_feature] function since we actually do have the target feature.

Comment on lines +159 to +163
As `repr(scalable(N))` is intended to be a permanently unstable attribute, any
value of `N` is accepted by the attribute and it is the responsibility of
whomever is defining the type to provide a valid value. A correct value for `N`
depends on the purpose of the specific scalable vector type and the
architecture.
Member

@workingjubilee workingjubilee Jul 14, 2025

This kind of abdication of responsibility in practice, when the rules are simply not that hard to work out, just leaves it harder to maintain the compiler in the future as it amounts to saying "We don't have to write down any reasoning about validity, not even in documentation, and we will allow someone in the future to discover why our decisions were fragile and sometimes incorrect without protecting them against more easily-induced miscompilations". I realize rust-lang has accepted this kind of reasoning in the past, (edited by moderator) future maintainers once you can no longer ask the previous ones due to their inevitable burnout.

Member Author

I'd prefer if you engaged on this RFC without describing decisions you disagree with as "sucks ass" or "pissing in the eyes of future maintainers". I'm open to discussing the details of this proposal and changing them, as with any proposal, but I'm much more open to that when concerns are phrased like this concern was in Ralf's comment rather than how you've chosen to phrase yours.


Comment on lines +230 to +232
When a scalable vector is instantiated into a generic function during
monomorphisation, or a trait method is being implemented for a scalable vector,
then the relevant target feature will be added to the function.
@hanna-kruppe hanna-kruppe Jul 14, 2025

Adding target features to functions based on the types being mentioned in them was also proposed for a different reason in #3525 but that ran into all sorts of complications (I'm not sure off-hand which of these are relevant here).

Unlike that other RFC, what's proposed here isn't even visible in the function signature and not tied to some "if a value of this type exists, the target feature must be enabled, so it's safe" reasoning, which causes new problems. So far the unsafe enforcement and manual safety reasoning for target features assumes it's clear which target features a function has, long before monomorphization. For example, consider this program:

```rust
// crate A
fn foo<T: Sized>() -> usize {
    size_of::<T>()
}

// crate B
use crate_a::foo;
fn main() {
    println!("vector size: {}", foo::<svfloat32_t>());
}
```

If the instantiation of foo gets the target feature, it needs to be unsafe to call. How can the compiler ensure this? And how is it surfaced to the programmer so they can write the correct #[cfg] or is_aarch64_feature_detected!(...) condition?

Member

@RalfJung RalfJung Jul 14, 2025

Adding target features depending on the concrete instantiation is also fundamentally incompatible with how target features currently work in rustc. Of course the architecture can be changed, but it's a quite non-trivial change, and it has non-trivial ramifications: the MIR inliner crucially relies (for soundness) on knowing the target features of a function without knowing the concrete instance of the generics (inlining a function with more features into a function with fewer features is unsound), so functions that are generic in their target feature set risk serious perf regressions due to the MIR inliner becoming less effective. Making every generic function potentially have target features thus seems to me to be a non-starter, both in terms of soundness (that's @hanna-kruppe's issue) and because it completely kills the MIR inliner for generic code (which is the code where we most care about the MIR inliner).

Member

When a scalable vector is instantiated into a generic function during
monomorphisation

It's also somewhat unclear what this means -- does instantiating T with fn() -> svfloat32_t also add the target feature? It pretty much has to, for soundness.

But then what about something like this

```rust
trait Tr { type Assoc; fn make() -> Self::Assoc; }

fn foo<T: Tr>() {
    size_of_val(&T::make());
}
```

Now if I call foo with a type T such that T::Assoc is svfloat32_t, I also need the target feature:

```rust
impl Tr for i32 { type Assoc = svfloat32_t; /* ... */ }

foo::<i32>(); // oopsie, this must be unsafe
```

The target feature is needed any time svfloat32_t is in any way reachable via the given trait instance. This is not something the type system can even express, which also means I don't think it can possibly be sound.

Comment on lines +304 to +307
For non-predicate scalable vectors, it will be typical that `N` will be
`$minimum_register_length / $type_size` (e.g. `4` for `f32` or `8` for `f16`
with a minimum 128-bit register length). In this circumstance, `N` could be
trivially calculated by the compiler.


Note that even this part doesn't really work for RVV, which has the added dimension of LMUL: there's not a single $minimum_register_length but several useful sizes of N for every element type. See the table at https://llvm.org/docs//RISCV/RISCVVectorExtension.html#mapping-to-llvm-ir-types
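If I'm reading that table right, LLVM fixes vscale = VLEN/64 for RVV, so N scales with LMUL rather than being a single per-type value. Illustrative arithmetic only (the helper is made up):

```rust
// Under LLVM's RVV mapping, vscale = VLEN / 64, so for a given element
// type and register group multiplier LMUL: N = (64 * LMUL) / element_bits.
fn rvv_n(element_bits: usize, lmul: usize) -> usize {
    64 * lmul / element_bits
}

fn main() {
    assert_eq!(rvv_n(32, 1), 2); // LMUL=1, i32 -> <vscale x 2 x i32>
    assert_eq!(rvv_n(32, 4), 8); // LMUL=4, i32 -> <vscale x 8 x i32>
    assert_eq!(rvv_n(8, 1), 8);  // LMUL=1, i8  -> <vscale x 8 x i8>
}
```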


Therefore, there is an additional restriction that these types cannot be used in
the argument or return types of functions unless those functions are annotated
with the relevant target feature.
Member

What does this mean for generic functions? I could write a fn foo<T>(x: Box<T>) that internally passes a T to some function by-value. Exactly during which stage of the compiler do we stop compilation if T ends up being a scalable vector type?

It seems to me that the only practical answer is "during monomorphization". Post-mono errors are rather undesirable though (and this would be easily observable by users on stable), so this would definitely need t-lang signoff. Please explicitly note this as a drawback.

Member
The reason will be displayed to describe this comment to others. Learn more.

Oh, I missed the sentence about automatically adding target features to generic functions... for reasons discussed here, that is unfortunately not practical, so I discarded it as an option in my comment above.


`repr(scalable)` as described later in
[*Reference-level explanation*][reference-level-explanation] is perma-unstable
and exists only enables scalable vector types to be defined in the standard
Member

Suggested change
and exists only enables scalable vector types to be defined in the standard
and exists only to enable scalable vector types to be defined in the standard


```rust
fn sve_add(in_a: Vec<f32>, in_b: Vec<f32>, out_c: &mut Vec<f32>) {
let len = in_a.len();
Member

Suggested change
let len = in_a.len();
assert_eq!(in_a.len(), in_b.len());
assert_eq!(in_a.len(), out_c.len());
let len = in_a.len();

[reference-level-explanation]: #reference-level-explanation

Types annotated with the `#[repr(simd)]` attribute contain either an array
field or multiple fields to indicate the intended size of the SIMD vector that
Member

As of rust-lang/rust#129403, #[repr(simd)] only supports an array field.

Without support for scalable vectors in the language and compiler, it is not
possible to leverage hardware with scalable vectors from Rust. As extensions
with scalable vectors are available in architectures as either the only or
recommended way to do SIMD, lack of support in Rust would severely limit Rust's
Member

assuming you're referring to RISC-V V by "only", you can use fixed-length vectors (e.g. std::simd::Simd<f32, 16>) just fine, LLVM just uses the minimum guaranteed size when deciding how many registers to use, so assuming the V extension (which implies Zvl128b giving a minimum length of 128 bits), a std::simd::Simd<f32, 16> would take 4 registers (though LLVM would probably just set LMUL to 4 and use 1 register).

compiler always be able to calculate `N` isn't justified given the permanently
unstable nature of the `repr(scalable(N))` attribute and the scalable vector
types defined in `std::arch` are likely to be few in number, automatically
generated and well-tested.
Member

I want there to be a std::simd way to use scalable vectors at some point, so we would need a way for the compiler to either generate the correct N or be able to correctly handle any N (within some architecture-independent bounds, e.g. N is always 1, 2, 4, 8, or 16) by translating to some other N that the current architecture can handle.

types to have fewer distinct restrictions than other SIMD types, and would
enable SIMD vectors to be passed by-register, a performance improvement.

Such a mechanism would need be introduced gradually to existing SIMD types with
Member

Suggested change
Such a mechanism would need be introduced gradually to existing SIMD types with
Such a mechanism would need to be introduced gradually to existing SIMD types with


However, as C also has restrictions and scalable vectors are nevertheless used in
production code, it is unlikely there will be much demand for those restrictions
to be relaxed.
Member

structs containing a scalable SIMD type may actually end up being in large demand due to project-portable-simd:
One idea IIRC we've talked about is that we would end up with something kinda like a SIMD version of ArrayVec: a struct that contains both an active length (like Vec::len, not to be confused with vscale, which is more like Vec::capacity) and a scalable SIMD type. That way we could do operations where the tail of the SIMD type is not valid, but user code wouldn't need unsafe, since those extra elements are ignored by using auto-generated masks or setvl.

this would need structs like so:

```rust
pub struct SimdArrayVec<T> {
    len: usize,
    value: ScalableSimd<T>,
}
```

6 participants