@Traxar Traxar commented Sep 14, 2025

changes

  • reduce the fields into an array of columns
  • use SIMD to calculate all 4 entries at once
    • use an inline loop to further reduce the sum
  • remove the now unnecessary row function
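
For readers skimming the thread, here is a rough sketch of what the list above describes. It is not the code from this PR: the names `Vec4`, `Mat4`, `cols`, and `mul` are assumptions made for illustration, and the matrix is assumed to be column-major.

```zig
const Vec4 = @Vector(4, f32);

/// Hypothetical column-major 4x4 matrix: one SIMD vector per column.
const Mat4 = struct {
    cols: [4]Vec4,

    /// Returns a * b. Column j of the result is a linear combination of
    /// a's columns, weighted by the entries of b's column j.
    fn mul(a: Mat4, b: Mat4) Mat4 {
        var out: Mat4 = undefined;
        // Both loops have comptime-known bounds, so they are fully unrolled.
        inline for (0..4) |j| {
            var acc: Vec4 = @splat(0.0);
            inline for (0..4) |k| {
                // Broadcast the scalar b[k][j] and scale column k of a,
                // which updates all 4 entries of the result column at once.
                const weight: Vec4 = @splat(b.cols[j][k]);
                acc += a.cols[k] * weight;
            }
            out.cols[j] = acc;
        }
        return out;
    }
};
```

With this shape, a separate row accessor is not needed, because each result column is built directly from whole columns of `a`.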

c1: @Vector(4, f32),
c2: @Vector(4, f32),
c3: @Vector(4, f32),
const Mat4 = struct {
Contributor Author

Packedness is lost by switching to an array, which does not guarantee memory layout. The sample still works (at least for me). If packedness is important, I will rewrite the changes on both the CPU and GPU side with the original struct fields and the simplified maths.

Owner

Hm, now this is a good question. If we mark the struct as extern, that conforms to the C ABI packing, right? But is that consistent across platforms?

Contributor Author

I think marking it extern should work 👍
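
A minimal sketch of the `extern` idea discussed in this thread, under stated assumptions: the struct below is hypothetical and stores plain `f32` arrays instead of `@Vector` fields, so that the layout is simply 16 contiguous floats. The `comptime` block turns the cross-platform question into a build-time check on whichever target is being compiled.

```zig
const Vec4 = @Vector(4, f32);

/// Hypothetical extern variant: nested f32 arrays, laid out per the C ABI.
const Mat4 = extern struct {
    cols: [4][4]f32,

    /// Loads column i as a SIMD vector (a [4]f32 coerces to @Vector(4, f32)).
    fn col(m: Mat4, i: usize) Vec4 {
        return m.cols[i];
    }
};

comptime {
    // Fail the build on any target where Mat4 is not 16 tightly packed floats.
    if (@sizeOf(Mat4) != 16 * @sizeOf(f32)) @compileError("Mat4 has unexpected size");
    if (@offsetOf(Mat4, "cols") != 0) @compileError("Mat4.cols is not at offset 0");
}
```

Keeping the storage as plain arrays and converting to vectors only for the maths is one possible design; whether it matches what the GPU side of the sample expects would still need to be checked against the shader.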
