SIMD support #5

Merged
merged 5 commits into from
Dec 16, 2023

Conversation

leonbotros
Contributor

@leonbotros leonbotros commented Jun 29, 2023

  • SIMD is enabled by the simd feature. Make sure to compile using nightly (cargo +nightly) for a supported target (using RUSTFLAGS='-C target-cpu=native' or RUSTFLAGS='-C target-feature=+avx2', for example).
  • Checks at compile time whether the target supports AVX512/AVX2/WASM SIMD and chooses x16/x8/x4 accordingly.
  • Keeps bigger in and out buffers in the SIMD implementation, so it is only useful if at least 48 * 16/8/4 bytes of input/output are absorbed/squeezed.
  • The SIMD xoodoo permutation is fully generic over the number of SIMD lanes.
  • The Xoofff implementations are not fully generic, because unrolling on a generic constant is not possible using crunchy.
  • Operations converting bytes to N parallel states and vice versa are not optimized. They might be faster using gather/scatter methods.
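The compile-time lane selection and the byte-to-parallel-state conversion described in the list above could look roughly like the sketch below. It is not the code from this PR: the names `simd_lanes`, `PARALLEL_STATES`, `MIN_USEFUL_BYTES` and `load_parallel` are hypothetical, the `avx512f` feature name is an assumption for "AVX512", the fallback of one state for other targets is an assumption, and the conversion is a plain scalar loop (little-endian words assumed) rather than a gather/scatter implementation.

```rust
/// Size in bytes of one Xoodoo state (12 words of 32 bits each).
const STATE_BYTES: usize = 48;

/// Number of Xoodoo states processed in parallel, chosen at compile time
/// from the target features the crate is built with.
/// Assumption: "AVX512" is detected through the `avx512f` feature.
const fn simd_lanes() -> usize {
    if cfg!(all(target_arch = "x86_64", target_feature = "avx512f")) {
        16
    } else if cfg!(all(target_arch = "x86_64", target_feature = "avx2")) {
        8
    } else if cfg!(all(target_arch = "wasm32", target_feature = "simd128")) {
        4
    } else {
        1 // Assumed scalar fallback; not described in the PR.
    }
}

const PARALLEL_STATES: usize = simd_lanes();

/// Below this many bytes, the wider buffers cannot be filled even once.
const MIN_USEFUL_BYTES: usize = STATE_BYTES * PARALLEL_STATES;

/// Scalar reference for "bytes to N parallel states": word `w` of block `s`
/// ends up at `state[w][s]`, so each row holds the same word of every state.
/// Assumption: words are read as little-endian 32-bit integers.
fn load_parallel<const N: usize>(blocks: &[[u8; STATE_BYTES]; N]) -> [[u32; N]; 12] {
    let mut state = [[0u32; N]; 12];
    for (s, block) in blocks.iter().enumerate() {
        for (w, chunk) in block.chunks_exact(4).enumerate() {
            state[w][s] = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
        }
    }
    state
}
```

Each row of the result corresponds to one SIMD vector, so a permutation written over rows works for any N. A gather-based load would replace the two nested loops.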

TODO:

  • Document how to enable the feature.
  • Doc test compilation currently fails.
  • Test SIMD in CI.
  • Test the x16 code or leave it out (hardware support is low, and I don't have it either).

@itzmeanjan
Owner

Amazing, thanks for the PR @leonbotros. I'll take a look.

@leonbotros
Contributor Author

Thanks, feedback is appreciated. Since WASM does not have runtime feature detection, I decided not to use any of the #[target_feature(enable = "...")] stuff.
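To illustrate the distinction made in this comment, here is a minimal sketch contrasting compile-time feature checks with runtime detection. It is not taken from the PR; the names `HAS_SIMD128_AT_BUILD`, `HAS_AVX2_AT_BUILD` and `has_avx2_at_runtime` are hypothetical.

```rust
/// Compile-time checks: resolved when the crate is built, from the
/// features passed through RUSTFLAGS. This works on every target,
/// including WASM, which is the approach the comment describes.
const HAS_SIMD128_AT_BUILD: bool =
    cfg!(all(target_arch = "wasm32", target_feature = "simd128"));
const HAS_AVX2_AT_BUILD: bool =
    cfg!(all(target_arch = "x86_64", target_feature = "avx2"));

/// Runtime check: asks the CPU the program is actually running on.
/// The standard library macro used here exists only for x86 targets.
#[cfg(target_arch = "x86_64")]
fn has_avx2_at_runtime() -> bool {
    std::arch::is_x86_feature_detected!("avx2")
}

/// On other targets this sketch simply reports no AVX2.
#[cfg(not(target_arch = "x86_64"))]
fn has_avx2_at_runtime() -> bool {
    false
}
```

Runtime dispatch is usually paired with functions marked #[target_feature(enable = "...")]. With only compile-time checks, the whole crate is built for one feature set and no such attribute is needed.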

@itzmeanjan
Owner

> Thanks, feedback is appreciated. Since WASM does not have runtime feature detection I decided to not use any of the #[target_feature(enable = "...")] stuff.

Totally makes sense. I'll need some time before I can review the PR. Though it's a relief to see that tests are passing.

@itzmeanjan itzmeanjan mentioned this pull request Nov 21, 2023
@itzmeanjan
Owner

Hi @leonbotros, I merged the latest master back into your PR branch, and when I benchmark on my machine (12th Gen Intel(R) Core(TM) i7-1260P, running GNU/Linux kernel 6.5.0-13-generic), I see ⬇️

[Image: Screenshot from 2023-11-26 09-11-40]

It's pretty good 💯

I'd like to merge this and publish it on crates.io. Would you like to do the remaining work, or do you want me to take over the rest?

@leonbotros
Contributor Author

leonbotros commented Nov 27, 2023 via email

Hi, I currently have no time to clean this up a bit so feel free ;)

@itzmeanjan
Owner

Thanks for your contribution. I'll proceed then ;)

@itzmeanjan itzmeanjan self-requested a review December 16, 2023 05:57
@itzmeanjan itzmeanjan marked this pull request as ready for review December 16, 2023 05:57
@itzmeanjan itzmeanjan merged commit 35654f8 into itzmeanjan:master Dec 16, 2023
1 check passed