Skip to content

Conversation

Firestar99
Copy link
Member

@Firestar99 Firestar99 commented Sep 8, 2025

Requires glam spirv PRs:

Requires rust-gpu PRs:

Old embark PR trying to do the same, though I made this PR is made from scratch due to the branches being too different: https://github.com/EmbarkStudios/rust-gpu/pull/1158/files

Objective

Currently, rust-gpu has special layout rules for glam types, which gives them a higher alignment than necessary. This causes structs using them to have a different layout (size, alignment, member offsets) in rust-gpu than they do on the CPU, causing a lot of UB when sending data between host and device. Worst of them is Vec3, which has a size of 12 on the CPU but 16 in rust-gpu, which is why we always recommended not to use them in structs shared between targets.

Another common workaround was to use glam's feature cuda, which would give all glam vectors a higher than necessary alignment. These alignments seem (not sure!) match those of rust-gpu, though they are defined in very different places and could drift out of sync.

This is a mess, and this PR tries to clean it up by:

  • have SpirvType::Vector size and alignment to match layout of the type
  • remove all spirv-specific layout hacks in glam
  • replacing #[repr(SIMD)] with a new custom #[rust_gpu::vector::v1]
  • glam/cuda feature will continue to work, though must be turned on on both targets!
  • we get support for glam::BVecN (bool vecs) for free

Size and Alignment chart

(rust-gpu chooses chaotic evil)

type offset in
difftest
CPU current removing
SIMD hacks
removing glam
feature gates
size align size align size align size align
UVec2 0x400 8 4 8 *8* 8 4 8 4
IVec2 0x500 8 4 8 *8* 8 4 8 4
Vec2 0x600 8 4 8 *8* 8 4 8 4
UVec3 0x800 12 4 *16* *16* 12 4 12 4
IVec3 0x900 12 4 *16* *16* 12 4 12 4
Vec3 0xa00 12 4 *16* *16* 12 4 12 4
Vec3A 0xb00 16 16 16 16 *12* *4* 16 16
UVec4 0xc00 16 4 16 *16* 16 4 16 4
IVec4 0xd00 16 4 16 *16* 16 4 16 4
Vec4 0xe00 16 16 16 16 16 *4* 16 16

Notes:

  • removing SIMD hacks: misalignment caused by glam hiding alignment from spirv, see comment below
  • current implementation matches CPU exactly, but requires modifications in glam to remove these feature gates hiding alignment
  • if we have to update glam anyway, may as well remove our and glam's #[repr(SIMD)] hacks and replace them with a custom #[spirv(vector)] that we control (and rustc can't change on a whim)
  • the new abi::vector_layout* difftest tests every single glam type, not just vecs
    • we also test with glam/cuda in abi::vector_layout_cuda and glam/scalar-math in abi::vector_layout_scalar_math

@Firestar99 Firestar99 changed the base branch from main to difftest_refactor September 8, 2025 15:20
@Firestar99 Firestar99 force-pushed the vec3-12-bytes branch 3 times, most recently from d8b1b46 to 11c141a Compare September 11, 2025 17:25
@LegNeato
Copy link
Collaborator

Hard for me to grasp the why. What are the pros and cons here? Will this break projects like https://github.com/LegNeato/rust-gpu-chimera or will it align spirv with what glam w/ cuda feature and/or rust-cuda does?

#[repr(transparent)]
#[derive(Copy, Clone, Default, Eq, PartialEq)]
#[cfg_attr(feature = "bytemuck", derive(bytemuck::Zeroable, bytemuck::Pod))]
pub struct SubgroupMask(pub glam::UVec4);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of changing it from a separate struct to a typedef. The #[repr(transparent)] stopped working since Vec4 from rustc's perspective is just a regular struct, which causes us to emit a struct wrapping OpTypeVector f32 4 instead of the OpTypeVector directly. In glsl you explicitly use UVec4 for the bitmasks, which I'm not a big fan of since they represent something quite different than just a UVec4. So I've opted for a wrapper struct that you may manually construct or destruct if you need to.

Alternative to keep the struct: I have a branch where I've adjusted all the intrinsics to unwrap the struct, which works flawlessly. The problem are entry point params like these:

#[spirv(subgroup_eq_mask)] subgroup_eq_mask: SubgroupMask,
#[spirv(subgroup_ge_mask)] subgroup_ge_mask: SubgroupMask,
#[spirv(subgroup_gt_mask)] subgroup_gt_mask: SubgroupMask,
#[spirv(subgroup_le_mask)] subgroup_le_mask: SubgroupMask,
#[spirv(subgroup_lt_mask)] subgroup_lt_mask: SubgroupMask,

I'd have to refactor our entry point logic to allow us to not just pass a plain BuiltIn, but also modify it aka. wrap it with a SubgroupMask. But since these are rarely used (like even more rare than any of the subgroup intrinsics themselves), I could see us changing their type to UVec4 and having the end user manually convert them, for now. Whenever we decide to revisit BuiltIns we can fix that. (I really want to change most read-only builtins to global functions in spirv-std, so they're not just magic values you have to know / research in the code)

@nnethercote
Copy link
Contributor

nnethercote commented Sep 18, 2025

Vec3A 0xb00 16 16 16 16 *12* *4*
Vec4 0xe00 16 16 16 16 16 *4*

Why is Vec3A's size only 12 bytes?

Why are Vec3A and Vec4s alignment only 4 bytes?

@Firestar99
Copy link
Member Author

glam is explicitly hiding the alignment specifiers on our spirv target. The table is a bit outdated, I already removed the cfg's in the glam branch this repo links to.

I'm also on like the 3rd implementation, each time solving it slightly differently, but I'm happy with what I have right now. Though it'll block the release until glam has released a minor 0.30.6 patch for us fixing this on glam's side and we'll loose backwards compat.

#[cfg_attr(not(target_arch = "spirv"), repr(align(16)))]
#[cfg_attr(not(target_arch = "spirv"), repr(C))]
#[cfg_attr(target_arch = "spirv", repr(simd))]
pub struct Vec3A {
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

https://github.com/bitshifter/glam-rs/blob/9a992a8bacba784053368bc0b7000c95aa5895b6/src/f32/scalar/vec3a.rs#L30-L37

#[cfg_attr(
    any(
        not(any(feature = "scalar-math", target_arch = "spirv")),
        feature = "cuda"
    ),
    repr(align(16))
)]
#[cfg_attr(not(target_arch = "spirv"), repr(C))]
#[cfg_attr(target_arch = "spirv", repr(simd))]
pub struct Vec4 {
    pub x: f32,
    pub y: f32,
    pub z: f32,
    pub w: f32,
}

https://github.com/bitshifter/glam-rs/blob/9a992a8bacba784053368bc0b7000c95aa5895b6/src/f32/scalar/vec4.rs#L27-L41

@FacelessTiger
Copy link

With the glam patch will it only match on default settings, or if you use the scalar-math feature will Vec4 be 4 byte alignement for example

@Firestar99
Copy link
Member Author

I assumed there's going to me more issues with glam's cuda and scalar-math features, since both of them modify alignments of structs. Currently modifying the abi::vector_layout difftest to also test with these features and verifying that feature unification didn't mess with the results.

(T, U) pairs in entrypoints and regular functions are now supported.
@Firestar99
Copy link
Member Author

Firestar99 commented Sep 23, 2025

Review guide

@LegNeato @eddyb @nnethercote This is ready to review, but merging is a bit more involved:

  1. After glam PRs are merged, we can merge with a git rev glam until glam has released a new version
  2. Or we can delay the merge until glam has merged their PRs and released a new version

What do you think?

Note: Just a few days ago glam released 0.30.6, so our SPIRV changes will hopefully be released in 0.30.7, not .6 as I may have mentioned previously.

Also the diff has both #381 and #401 in it as well, so this is the actual diff or review commit by commit.

@Firestar99 Firestar99 changed the base branch from main to tool_rename September 23, 2025 11:23
@nnethercote
Copy link
Contributor

If glam is likely to issue a new release quickly, I would just wait for that and skip the "merge with a git rev" step, avoiding unnecessary work.

#[cfg(not(target_arch = "spirv"))]
pub mod cpu_driver;
pub mod layout;
pub mod shader;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are all these modules needed? Splitting the code across so many modules and files just makes it harder to read.

let r_mix = 8u32 ^ 16u32 ^ 0.5f32.to_bits() ^ (-2.0f32).to_bits();
out[12] = p_mix;
out[13] = q_mix;
out[14] = r_mix;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this function is computing something very specific but I don't know what it is. A brief comment would be useful.

Copy link
Member Author

@Firestar99 Firestar99 Sep 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know either, that's from the scalar pair PR: https://github.com/Rust-GPU/rust-gpu/pull/381/files#r2375700762

@nnethercote
Copy link
Contributor

I don't know much about this code yet but I've given a few points of feedback.

@Firestar99

This comment was marked as resolved.

pub struct NotVectorField2 {
_x: NonZeroU32,
_y: NonZeroU32,
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curiosity: This does not error, as NonZeroU32 (with #[repr(transparent)]) compiles to a SpirvType::Integer. @eddyb is this intented?

I tried exploiting this to get a 0 value, but what I came up with is rather contrived due to needing to implement VectorOrScalar with false information:

// build-fail
// compile-flags: -C target-feature=+GroupNonUniform,+GroupNonUniformArithmetic

use spirv_std::spirv;
use spirv_std::arch::subgroup_i_add;
use spirv_std::glam::UVec3;
use core::num::NonZeroI32;
use core::num::NonZeroUsize;

#[rust_gpu::vector::v1]
#[derive(Copy, Clone)]
pub struct MyVec {
    x: NonZeroI32,
    y: NonZeroI32,
}

impl Default for MyVec {
    fn default() -> Self {
        Self {
            x: NonZeroI32::new(1).unwrap(),
            y: NonZeroI32::new(1).unwrap(),
        }
    }
}

unsafe impl spirv_std::vector::VectorOrScalar for MyVec {
    type Scalar = i32;
    const DIM: NonZeroUsize = NonZeroUsize::new(2).unwrap();
}

unsafe impl spirv_std::vector::Vector<i32, 2> for MyVec {}

#[spirv(compute(threads(2)))]
pub fn entry(
    #[spirv(local_invocation_id)] tid: UVec3,
) {
    let value = if tid.x == 0 {-1} else {1};
    let vec = MyVec {
        x: NonZeroI32::new(value).unwrap(),
        y: NonZeroI32::new(value).unwrap(),
    };
    let result = subgroup_i_add(vec);

    // assert_eq! fails to compile
    if result.x.get() == 0 && result.y.get() == 0 {
        panic!("guaranteed");
    }
}

@Firestar99
Copy link
Member Author

glam has released 0.30.8 17h ago. So the semver breaking change is in, so you can expect some people reporting issues, if they are on a git rev or even current master.

I've made a small announcement on the discord:

Semver breakage in glam 0.30.8

glam 0.30.8 just released and contains some breaking changes for rust-gpu. Do NOT update beyond 0.30.7 until PR 380 is merged. If you want to, you can fix the glam version like this to prevent cargo update from updating it:

glam = "=0.30.7"

Background: How many of you fell into the trap that is Vec3, IVec3 or UVec3? Cause they have a size of 12 bytes on the CPU, but a size of 16 on the GPU. This can mess up your struct layout, so sending data from one to the other can give you a lot of garbage and out of bounds. This PR will fix that and ensure the GPU layout matches the CPU layout exactly, and you don't have to ever think about it again. But it required patching glam with a breaking change, and to stay compatible with current libraries, we decided to break semver.

@tuguzT

This comment was marked as off-topic.

@ARez2

This comment was marked as off-topic.

@tuguzT

This comment was marked as off-topic.

Copy link
Contributor

@fu5ha fu5ha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I think this is a better way than continuing to sorta bastardize #[repr(simd)] like my draft pr did. Thanks for pushing this forward and getting it done!

@LegNeato
Copy link
Collaborator

I'm not qualified to review this, so I will let others look at it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants