While refactoring Rust code, I ended up with the `src` loop body, which produces extra max/min instructions compared to `tgt`.
Moving the right shift before the signed max with 0 produces better auto-vectorized assembly.
An arithmetic right shift preserves the sign, so the saturating truncation instructions can handle the clamp to 0: https://rust.godbolt.org/z/WW4sGoaf4
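A minimal sketch of the two loop shapes, assuming an `i32` input that is shifted right by a constant and stored as `u8`. `SHIFT`, `convert_src`, and `convert_tgt` are hypothetical names; the actual loop body from the report is not reproduced here.

```rust
/// Hypothetical right-shift amount; the real value is not stated in the report.
const SHIFT: u32 = 6;

/// Assumed `src` shape: clamp to 0 first, then shift, then cap at 255.
fn convert_src(input: &[i32], out: &mut [u8]) {
    for (o, &x) in out.iter_mut().zip(input) {
        // max(0) runs before the shift, so negative inputs are zeroed explicitly.
        *o = (x.max(0) >> SHIFT).min(255) as u8;
    }
}

/// Assumed `tgt` shape: shift first, then clamp both ends at once.
fn convert_tgt(input: &[i32], out: &mut [u8]) {
    for (o, &x) in out.iter_mut().zip(input) {
        // `>>` on i32 is an arithmetic shift: negative inputs stay negative,
        // so the lower bound of the clamp still maps them to 0.
        *o = (x >> SHIFT).clamp(0, 255) as u8;
    }
}
```

Both shapes give the same bytes: for `x >= 0` the order of shift and max does not matter, and for `x < 0` the shifted value is at most -1, so the clamp's lower bound yields 0 just as `max(0)` did. On x86, packing instructions such as `packusdw` and `packuswb` saturate signed lanes into the unsigned range, which is plausibly why the separate max becomes redundant in the `tgt` form.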
You can see part of the effect by right-clicking line 11 in both editors and selecting "Reveal linked code". The change results in about 40% fewer instructions in the main loop at label `.LBB0_9` (line 20 in the editors with linked code revealed): https://rust.godbolt.org/z/h1nM6cG4K
Assembly instructions

Emitted IR: https://alive2.llvm.org/ce/z/fa8cRT

`src` body

`tgt` body

alive2 proof: https://alive2.llvm.org/ce/z/iUbk-i
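The alive2 link is the authoritative check on the IR. As a rough, independent illustration, the core identity behind the reordering, max(x, 0) >> s == max(x >> s, 0) for an arithmetic shift, can also be brute-forced in plain Rust. The 16-bit width and the function name are assumptions chosen to keep the loop small; this does not reproduce what the alive2 proof checks.

```rust
/// Brute-forces max(x, 0) >> s == max(x >> s, 0) for every i16 value and
/// every in-range shift amount (16-bit width is an assumption for speed).
fn max0_commutes_with_ashr_i16() -> bool {
    for s in 0..i16::BITS {
        for x in i16::MIN..=i16::MAX {
            // `>>` on a signed integer is an arithmetic (sign-preserving) shift.
            let max_then_shift = x.max(0) >> s;
            let shift_then_max = (x >> s).max(0);
            if max_then_shift != shift_then_max {
                return false;
            }
        }
    }
    true
}
```

Since the two sides agree before any upper clamp is applied, capping both at 255 afterwards keeps them equal as well.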
A real-world case of this came from the Rust `image-webp` crate: image-rs/image-webp#72