-
Notifications
You must be signed in to change notification settings - Fork 72
abi: Make glam
types abi layout match CPU
#380
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: tool_rename
Are you sure you want to change the base?
Conversation
a881c58
to
acf572d
Compare
acf572d
to
5c6932a
Compare
0c2df59
to
23fe024
Compare
d8b1b46
to
11c141a
Compare
Hard for me to grasp the why. What are the pros and cons here? Will this break projects like https://github.com/LegNeato/rust-gpu-chimera or will it align spirv with what glam w/ cuda feature and/or rust-cuda does? |
4669fa0
to
e77ed1c
Compare
f4af86b
to
1b4b2d9
Compare
e77ed1c
to
c8e6c60
Compare
#[repr(transparent)] | ||
#[derive(Copy, Clone, Default, Eq, PartialEq)] | ||
#[cfg_attr(feature = "bytemuck", derive(bytemuck::Zeroable, bytemuck::Pod))] | ||
pub struct SubgroupMask(pub glam::UVec4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of changing it from a separate struct to a typedef. The #[repr(transparent)]
stopped working since Vec4
from rustc's perspective is just a regular struct, which causes us to emit a struct wrapping OpTypeVector f32 4
instead of the OpTypeVector
directly. In glsl you explicitly use UVec4
for the bitmasks, which I'm not a big fan of since they represent something quite different than just a UVec4
. So I've opted for a wrapper struct that you may manually construct or destruct if you need to.
Alternative to keep the struct: I have a branch where I've adjusted all the intrinsics to unwrap the struct, which works flawlessly. The problem are entry point params like these:
rust-gpu/tests/compiletests/ui/arch/subgroup/subgroup_builtins.rs
Lines 11 to 15 in d9bb8aa
#[spirv(subgroup_eq_mask)] subgroup_eq_mask: SubgroupMask, | |
#[spirv(subgroup_ge_mask)] subgroup_ge_mask: SubgroupMask, | |
#[spirv(subgroup_gt_mask)] subgroup_gt_mask: SubgroupMask, | |
#[spirv(subgroup_le_mask)] subgroup_le_mask: SubgroupMask, | |
#[spirv(subgroup_lt_mask)] subgroup_lt_mask: SubgroupMask, |
I'd have to refactor our entry point logic to allow us to not just pass a plain BuiltIn
, but also modify it aka. wrap it with a SubgroupMask
. But since these are rarely used (like even more rare than any of the subgroup intrinsics themselves), I could see us changing their type to UVec4
and having the end user manually convert them, for now. Whenever we decide to revisit BuiltIns we can fix that. (I really want to change most read-only builtins to global functions in spirv-std, so they're not just magic values you have to know / research in the code)
aa8292c
to
f14db4e
Compare
Why is Why are |
glam is explicitly hiding the alignment specifiers on our spirv target. The table is a bit outdated, I already removed the cfg's in the glam branch this repo links to. I'm also on like the 3rd implementation, each time solving it slightly differently, but I'm happy with what I have right now. Though it'll block the release until glam has released a minor #[cfg_attr(not(target_arch = "spirv"), repr(align(16)))]
#[cfg_attr(not(target_arch = "spirv"), repr(C))]
#[cfg_attr(target_arch = "spirv", repr(simd))]
pub struct Vec3A {
pub x: f32,
pub y: f32,
pub z: f32,
} #[cfg_attr(
any(
not(any(feature = "scalar-math", target_arch = "spirv")),
feature = "cuda"
),
repr(align(16))
)]
#[cfg_attr(not(target_arch = "spirv"), repr(C))]
#[cfg_attr(target_arch = "spirv", repr(simd))]
pub struct Vec4 {
pub x: f32,
pub y: f32,
pub z: f32,
pub w: f32,
} |
With the glam patch will it only match on default settings, or if you use the |
I assumed there's going to me more issues with glam's |
f14db4e
to
a727c08
Compare
1b4b2d9
to
24a9bc9
Compare
a727c08
to
b17666f
Compare
38649e4
to
b91eda9
Compare
(T, U) pairs in entrypoints and regular functions are now supported.
Review guide@LegNeato @eddyb @nnethercote This is ready to review, but merging is a bit more involved:
What do you think? Note: Just a few days ago glam released Also the diff has both #381 and #401 in it as well, so this is the actual diff or review commit by commit. |
87e47c7
to
32eb0c7
Compare
32eb0c7
to
8bdec82
Compare
If glam is likely to issue a new release quickly, I would just wait for that and skip the "merge with a git rev" step, avoiding unnecessary work. |
#[cfg(not(target_arch = "spirv"))] | ||
pub mod cpu_driver; | ||
pub mod layout; | ||
pub mod shader; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are all these modules needed? Splitting the code across so many modules and files just makes it harder to read.
let r_mix = 8u32 ^ 16u32 ^ 0.5f32.to_bits() ^ (-2.0f32).to_bits(); | ||
out[12] = p_mix; | ||
out[13] = q_mix; | ||
out[14] = r_mix; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this function is computing something very specific but I don't know what it is. A brief comment would be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know either, that's from the scalar pair PR: https://github.com/Rust-GPU/rust-gpu/pull/381/files#r2375700762
I don't know much about this code yet but I've given a few points of feedback. |
This comment was marked as resolved.
This comment was marked as resolved.
ca59b8a
to
288a4d1
Compare
288a4d1
to
c4f2227
Compare
pub struct NotVectorField2 { | ||
_x: NonZeroU32, | ||
_y: NonZeroU32, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curiosity: This does not error, as NonZeroU32
(with #[repr(transparent)]
) compiles to a SpirvType::Integer
. @eddyb is this intented?
I tried exploiting this to get a 0 value, but what I came up with is rather contrived due to needing to implement VectorOrScalar
with false information:
// build-fail
// compile-flags: -C target-feature=+GroupNonUniform,+GroupNonUniformArithmetic
use spirv_std::spirv;
use spirv_std::arch::subgroup_i_add;
use spirv_std::glam::UVec3;
use core::num::NonZeroI32;
use core::num::NonZeroUsize;
#[rust_gpu::vector::v1]
#[derive(Copy, Clone)]
pub struct MyVec {
x: NonZeroI32,
y: NonZeroI32,
}
impl Default for MyVec {
fn default() -> Self {
Self {
x: NonZeroI32::new(1).unwrap(),
y: NonZeroI32::new(1).unwrap(),
}
}
}
unsafe impl spirv_std::vector::VectorOrScalar for MyVec {
type Scalar = i32;
const DIM: NonZeroUsize = NonZeroUsize::new(2).unwrap();
}
unsafe impl spirv_std::vector::Vector<i32, 2> for MyVec {}
#[spirv(compute(threads(2)))]
pub fn entry(
#[spirv(local_invocation_id)] tid: UVec3,
) {
let value = if tid.x == 0 {-1} else {1};
let vec = MyVec {
x: NonZeroI32::new(value).unwrap(),
y: NonZeroI32::new(value).unwrap(),
};
let result = subgroup_i_add(vec);
// assert_eq! fails to compile
if result.x.get() == 0 && result.y.get() == 0 {
panic!("guaranteed");
}
}
glam has released I've made a small announcement on the discord:
Semver breakage in glam
|
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I think this is a better way than continuing to sorta bastardize #[repr(simd)]
like my draft pr did. Thanks for pushing this forward and getting it done!
I'm not qualified to review this, so I will let others look at it |
Old embark PR trying to do the same, though I made this PR is made from scratch due to the branches being too different: https://github.com/EmbarkStudios/rust-gpu/pull/1158/files
Objective
Currently, rust-gpu has special layout rules for glam types, which gives them a higher alignment than necessary. This causes structs using them to have a different layout (size, alignment, member offsets) in rust-gpu than they do on the CPU, causing a lot of UB when sending data between host and device. Worst of them is
Vec3
, which has a size of 12 on the CPU but 16 in rust-gpu, which is why we always recommended not to use them in structs shared between targets.Another common workaround was to use
glam
's featurecuda
, which would give all glam vectors a higher than necessary alignment. These alignments seem (not sure!) match those of rust-gpu, though they are defined in very different places and could drift out of sync.This is a mess, and this PR tries to clean it up by:
SpirvType::Vector
size and alignment to match layout of the type#[repr(SIMD)]
with a new custom#[rust_gpu::vector::v1]
#[rust_gpu::vector::v1]
is a new "stable" attribute, allows us to offer backwards compat in the futureglam/cuda
feature will continue to work, though must be turned on on both targets!glam::BVecN
(bool vecs) for freeSize and Alignment chart
(rust-gpu chooses chaotic evil)
difftest
SIMD hacks
feature gates
Notes:
#[repr(SIMD)]
hacks and replace them with a custom#[spirv(vector)]
that we control (and rustc can't change on a whim)abi::vector_layout*
difftest tests every single glam type, not just vecsglam/cuda
inabi::vector_layout_cuda
andglam/scalar-math
inabi::vector_layout_scalar_math