Pass through nightly feature to crc32fast crate to get SIMD crc32 on Aarch64 #545

Merged
merged 1 commit into from
Dec 7, 2024

@Shnatsel Shnatsel commented Dec 7, 2024

The crc32fast crate contains a SIMD implementation of CRC32 for AArch64, but it's gated behind the nightly feature flag.

The relevant intrinsics have actually been stable since Rust 1.80, but the PR to make use of them on stable seems to have stalled: srijs/rust-crc32fast#36

This PR lets us make use of them on nightly at least until upstream sorts out stable support.
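
For context, here is a minimal sketch of what using the stable AArch64 CRC32 intrinsics with runtime feature detection could look like. It is an illustration only, not crc32fast's actual implementation, which is structured differently; the function names `crc32`, `crc32_hardware` and `crc32_software` are hypothetical. On targets other than AArch64 the sketch falls back to a simple bitwise routine.

```rust
/// Bitwise software CRC-32 (IEEE, reflected polynomial 0xEDB88320).
/// Slow, but portable; used here only as a fallback.
fn crc32_software(data: &[u8]) -> u32 {
    let mut crc = !0u32;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // All ones if the low bit is set, all zeros otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Hardware CRC-32 using the AArch64 `crc` extension.
/// Hypothetical helper: the caller must ensure the CPU supports `crc`.
#[cfg(target_arch = "aarch64")]
#[target_feature(enable = "crc")]
unsafe fn crc32_hardware(data: &[u8]) -> u32 {
    use std::arch::aarch64::{__crc32b, __crc32d};

    let mut crc = !0u32;
    let mut chunks = data.chunks_exact(8);
    // Consume eight bytes per instruction while possible.
    for chunk in &mut chunks {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        crc = __crc32d(crc, u64::from_le_bytes(buf));
    }
    // Handle the trailing bytes one at a time.
    for &byte in chunks.remainder() {
        crc = __crc32b(crc, byte);
    }
    !crc
}

/// Picks the hardware path when available, otherwise the software fallback.
pub fn crc32(data: &[u8]) -> u32 {
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("crc") {
            // Safe to call: the `crc` feature was just detected at runtime.
            return unsafe { crc32_hardware(data) };
        }
    }
    crc32_software(data)
}
```

Calling `crc32(b"123456789")` returns the standard CRC-32 check value `0xCBF43926` on either path.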

@kornelski kornelski merged commit 2232f83 into image-rs:master Dec 7, 2024
24 checks passed
@Shnatsel
Contributor Author

Shnatsel commented Dec 7, 2024

Thanks for merging!

@kornelski if you have the time, I would be very interested in seeing the numbers from https://github.com/fintelia/corpus-bench/ on Apple silicon, both before and after this PR. I'd like to make a public announcement about the recent performance gains, and it would be very nice to include the numbers from an ARM system. It would also tell me if the gains from this PR are compelling enough to cut a new release.

@kornelski
Contributor

Running decoding benchmark with corpus: QoiBench
image-rs PNG:     256.059 MP/s (average) 210.616 MP/s (geomean)
zune-png:         221.543 MP/s (average) 178.502 MP/s (geomean)
wuffs PNG:        255.111 MP/s (average) 200.834 MP/s (geomean)
libpng:           168.912 MP/s (average) 143.849 MP/s (geomean)
spng:             138.046 MP/s (average) 112.993 MP/s (geomean)
stb_image PNG:    186.223 MP/s (average) 139.381 MP/s (geomean)

The code very optimistically assumed that including png.h would work; I had to dust off libpng-sys.
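
The results above report both an average and a geomean per decoder. As a rough sketch of how the two aggregates differ, assuming they are the arithmetic and geometric means of per-image throughput (an assumption; the benchmark's own code is not shown here), with hypothetical helper names and made-up sample values:

```rust
/// Arithmetic mean of per-image throughputs (hypothetical helper).
fn arithmetic_mean(values: &[f64]) -> f64 {
    values.iter().sum::<f64>() / values.len() as f64
}

/// Geometric mean, computed through logarithms to avoid overflow
/// when many values are multiplied together (hypothetical helper).
fn geometric_mean(values: &[f64]) -> f64 {
    let log_sum: f64 = values.iter().map(|v| v.ln()).sum();
    (log_sum / values.len() as f64).exp()
}
```

For the made-up inputs `[100.0, 400.0]` the arithmetic mean is 250 and the geometric mean is 200. The geometric mean is less sensitive to a few very fast images, which is consistent with every geomean above being lower than the matching average.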

@Shnatsel
Contributor Author

Shnatsel commented Dec 8, 2024

Thanks! I understand this is measured using unmodified https://github.com/fintelia/corpus-bench, not repointed to this PR?

@kornelski
Contributor

kornelski commented Dec 8, 2024

The difference between 2232f83 and 10644db, measured with cargo +nightly run --release -- decode qoi-bench --image-rs-only, is below the noise threshold. Both are around 295MB/s.

@kornelski
Contributor

kornelski commented Dec 8, 2024

2232f83 with the unstable flag is slightly faster (301MB/s instead of 295MB/s), but it's hard to say for sure. There's high variance, and I sometimes get 285MB/s without changing anything.

@Shnatsel
Contributor Author

Shnatsel commented Dec 8, 2024

I see. Thank you!
