Skip to content

Conversation

sbernauer
Copy link
Owner

@sbernauer sbernauer commented Jun 18, 2023

Can be extended to parse rgba when we already have it in our vector. I think the swizzle (shuffle) part is costing most of the performance

@sbernauer sbernauer mentioned this pull request Jun 18, 2023
Copy link
Contributor

@fabi321 fabi321 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm feeling your pain, it takes 54ms on my machine, compared to 20ms for my solution, or 18.5ms for the existing solution

Comment on lines +327 to +330
let chars = u8x32::from_array([
value[0], value[1], value[2], value[3], value[4], value[5], value[6], value[7], value[8],
value[9], 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from my testing, it is faster to cast the values into the final type at this point.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is the idea with shuffling all the needed stuff (x, y and (later on) rgb) into a well-defined position only works with u8 as far as I could find :(

Comment on lines +336 to +338
let spaces_bitmask = chars.simd_eq(SIMD_SPACE_CHAR).to_bitmask();
let newline_bitmask = chars.simd_eq(SIMD_NEWLINE_CHAR).to_bitmask();
let spaces_bitmask = (spaces_bitmask | newline_bitmask) & SPACES_BITMASK_MASK;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just doing a digits>=10 on digits, as everything below b'0' will wrap around, this catches all characters other than digits.

@fabi321
Copy link
Contributor

fabi321 commented Jun 18, 2023

Also a note on types: I tried to change my solution to u16 (I'm currently using u32), but it was 20% slower. That's why I'm using u32.

@fabi321
Copy link
Contributor

fabi321 commented Jun 18, 2023

If I follow the docs, and use cargo bench -Zbuild-std --target x86_64-unknown-linux-gnu, the time usage goes down to 21.5ms, from 50ms.

@sbernauer
Copy link
Owner Author

Ahh you're right! Have read that and had it back of my mind, but totally missed that. Thx!

@sbernauer
Copy link
Owner Author

That kind of motivates me to continue on this and add the rgb parsing as well ^^

@sbernauer sbernauer changed the title spike: Test using SIMD to parse coordinates. Approx 50% slowdown spike: Test using SIMD to parse coordinates Jun 18, 2023
@fabi321
Copy link
Contributor

fabi321 commented Jun 18, 2023

the following test fails:
assert_eq!(simd_parse_coord(b"1 2\n "), (1, 2, 3));
The problem is, that the spaces after the newline are not ignored, and lead to an invalid pattern

@fabi321
Copy link
Contributor

fabi321 commented Jun 18, 2023

This is the reason why I use trailing_zeros to get the number of digits, as it stops at the first one, and ignores all the other clutter. This obviously only works on one number at a time.

@sbernauer
Copy link
Owner Author

Yep, I have noticed the failing tests as well. I already have an idea on how to solve this (with no performance penalty), but I did not continue any further until we solved the performance problem. The -Zbuild-std improved things quite a bit, so I might add the rgb parsing and than re-visit the performance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants