Optimize Frame::fill_rgb/fill_rgba `clip` function, reduce code in BoolReader::read_bool match statement #72
- `clip` and `mulhi` functions outside of `Frame` member functions
- `BoolReader::read_bool` match statement

The `clip` function can be done with an arithmetic shift and then two packed saturating truncation operations, letting SIMD instructions clamp while casting from `i32` to `u8`.

Currently, that's inhibited by clipping to `.max(0)` first, which results in a lot of select code for `max` and `min` in baseline x86 instructions.

Example in `portable_simd` of what to look for: https://rust.godbolt.org/z/f6cxThK75
You can see some of the effect if you right-click on `L11` in both editors and reveal linked code. It results in about 40% fewer instructions in the main loop at label `.LBB0_9`, `L20` in the editors with reveal linked code. https://rust.godbolt.org/z/h1nM6cG4K
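
For reference, here is a minimal scalar sketch of the two clamping styles discussed above. It is not the code from this PR: the function names `clip_max_min` and `clip_saturating` and the shift amount `SHIFT` are hypothetical, chosen only for illustration. The second form narrows `i32` to `i16` and then `i16` to `u8`, mirroring the two saturating pack steps, and it should give the same results as the `max`/`min` form for every input. Whether the compiler actually turns it into packed saturating instructions depends on the target and the surrounding loop.

```rust
/// Hypothetical fixed-point shift; the real value is not stated in this description.
const SHIFT: u32 = 6;

/// Baseline style: arithmetic shift, then clamp via `max`/`min` before the cast.
fn clip_max_min(v: i32) -> u8 {
    (v >> SHIFT).max(0).min(255) as u8
}

/// Two-step saturating narrowing after the arithmetic shift:
/// i32 -> i16 with signed saturation, then i16 -> u8 with unsigned saturation.
fn clip_saturating(v: i32) -> u8 {
    let shifted = v >> SHIFT;
    // Step 1: saturate into the i16 range (what a signed pack of 32-bit lanes does).
    let narrowed = shifted.clamp(i16::MIN as i32, i16::MAX as i32) as i16;
    // Step 2: saturate into the u8 range (what an unsigned pack of 16-bit lanes does).
    narrowed.clamp(0, u8::MAX as i16) as u8
}
```

Both functions take the same `i32` input, so one can be swapped for the other in a per-pixel loop and the generated assembly compared.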
Noticed while poking around the profile from #71.