Optimize Frame::fill_rgb/fill_rgba clip function, reduce code in BoolReader::read_bool match statement #72

Merged (1 commit) on May 14, 2024

Conversation

okaneco (Contributor) commented May 7, 2024

  • Extract shared clip and mulhi functions outside of Frame member functions
  • Rearrange the order of clamping operations to enable better auto-vectorization
  • Remove no-op returning branch value in BoolReader::read_bool match statement

The clip function can be implemented as an arithmetic shift followed by two packed saturating truncation operations, which lets SIMD instructions do the clamping while casting from i32 to u8.
Currently, that's inhibited by clipping with .max(0) first, which results in a lot of select code for max and min in baseline x86 instructions.
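
For illustration, the two clamp orderings can be sketched roughly as below. This is a minimal sketch, not the crate's actual code: the `SHIFT` constant and the names `clip_max_first`, `clip_min_first`, and `fill_clipped` are hypothetical. Both orderings return the same value for every input; only the generated code may differ, and that depends on the compiler version and target features.

```rust
/// Hypothetical fixed-point shift amount (an assumption for this sketch;
/// the constant used in the crate may differ).
const SHIFT: u32 = 6;

/// Ordering described as inhibiting vectorization: clamp to 0 first, then to 255.
pub fn clip_max_first(v: i32) -> u8 {
    (v >> SHIFT).max(0).min(255) as u8
}

/// Rearranged ordering: clamp to 255 first, then to 0.
/// Since 0 <= 255, this yields the same result as `clip_max_first`.
pub fn clip_min_first(v: i32) -> u8 {
    (v >> SHIFT).min(255).max(0) as u8
}

/// Hypothetical inner loop of the kind an auto-vectorizer would see:
/// each i32 input is shifted, clamped, and narrowed to u8.
pub fn fill_clipped(dst: &mut [u8], src: &[i32]) {
    for (d, &s) in dst.iter_mut().zip(src) {
        *d = clip_min_first(s);
    }
}
```

Because the result is identical either way, a change like this can be checked by comparing the two functions over boundary inputs and then inspecting the assembly, as in the Godbolt links in this description.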

Example in portable_simd of what to look for:
https://rust.godbolt.org/z/f6cxThK75
You can see some of the effect if you right-click on L11 in both editors and choose "Reveal linked code". The main loop at label .LBB0_9 (L20 in the editors, again with "Reveal linked code") ends up with about 40% fewer instructions.
https://rust.godbolt.org/z/h1nM6cG4K

Noticed while poking around the profile from #71.

`BoolReader::read_bool` match statement

Rearranging the order of clamping allows for better autovectorization.
Extract shared clip and mulhi functions outside of `Frame`
Remove no-op returning branch value in `BoolReader::read_bool` match statement
2 participants