Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize ZSTD_decodeSequence when ofBits==0 #3651

Open
wants to merge 1 commit into
base: dev
Choose a base branch
from

Conversation

yoniko
Copy link
Contributor

@yoniko yoniko commented May 22, 2023

This patch adds a branch to a previously branchless code in decompress hot loop handling the case where ofBits == 0.
Even though a branch is added, the branch saves on instructions that introduce memory dependency an unneeded memory operations when the condition isn't met.

Testing on intel Skylake shows positive decompression speed improvements across different corpora and compilers, with speed improvements of 1% to 7%.
On M1 Macbook Pro performance is mostly neutral with a possible very small regression.

Full benchmark results - https://docs.google.com/spreadsheets/d/1hEUY5Gkf6Ebz6Gq5X9U5mURC_SsI43BhIFpE7uBDVsw/edit?usp=sharing

This patch adds a branch to a previously branchless code in decompress hot loop handling the case where `ofBits == 0`.
Even though a branch is added, the branch saves on instructions that introduce memory dependency an unneeded memory operations when the condition isn't met.

Testing on intel Skylake shows positive decompression speed improvements across different corpora and compilers, with speed improvements of 1% to 7%.
On M1 Macbook Pro performance is mostly neutral with a possible very small regression.
@terrelln
Copy link
Contributor

Seems reasonable for most data, since we probably almost never use ll0 repcodes. I wonder what the perf looks like when we do. E.g. maybe kennedy.xls has this pattern.

@terrelln
Copy link
Contributor

I will run benchmarks on my server as well

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants