-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Calculate CRC32C faster #65
Comments
This comment was marked as resolved.
This comment was marked as resolved.
Hi @lizhanhui. Yes, if a different CRC implementation gives better performance with the same result then such change would be very much appreciated. I'd prefer to have the implementation in some crate though, versus building a wildly custom implementation inside our WAL codebase. Maybe In any case, changes like this are always welcome! |
All right, maybe it's a good idea to build a similar crate for CRC32C. |
Would it be worth to try and include it in |
crc32fast targets 'Fast, SIMD-accelerated CRC32 (IEEE) checksum computation', not sure if it would expand its scope or not. I would drop a message there to check it out. |
There is a 3-year-old thread discussing support of other polynomials, srijs/rust-crc32fast#9, but the maintainer has not yet responded. |
Thank you for checking. In that case it might make sense to work on a new crate. By the way, if the actual implementation of this ends up being small then it's probably fine to implement it inside WAL. But yes, setting up a new crate will allow others to benefit from this as well. |
While I am building a crate, I found this crate quite popularly adopted. Even if it still leaves much space to improve, I create a pull request for you guys to review: #66 |
This crate uses CRC32C algorithm from
crc
crate to check and verify data integrity. Given that every byte written to disk and read from disk is verified to detect potential errors, this function is very hot.Unfortunately, crc crate only employs table-lookup approach, leaving much performance space to optimize. One CPU core may achieve around 200MiB/s with the compiler's implicit optimization. See metrics from crc32fast. (Note, they are comparing CRC32, CRC32C differs in the polynomials only, perf metrics are close enough).
This repo offers good explanations to the optimization space ahead and this blog marches further, achiving performances all the way to some 50GiB/s per CPU core through using haredware accelerated SIMD intristics:
_mm_crc32_u64
and_mm_clmulepi64_si128
in particular.One more thing, zlib, chrome brower and Java hotspot VM uses the three-interleave-crc32x
With that said, let me know if you guys are intested in this issue and the potential improvement
The text was updated successfully, but these errors were encountered: