ARM support #31
Comments
Sorry, I had been meaning to document that and kept forgetting to. The main culprits are src/prefix_sum.rs and src/transpose.rs. An ARM Neon version of them could definitely be implemented, but it is unlikely to ever be a priority for us. Patches welcome though! There's also [...] Unfortunately, a non-SIMD version is unusably slow.
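For readers unfamiliar with the operation, here is a minimal scalar sketch of a byte-wise prefix sum and its inverse (delta encoding). It is an illustration only, not the code in src/prefix_sum.rs: the function names are hypothetical, and it assumes the transformation is a wrapping running sum over `u8` values. As noted above, a scalar loop like this is too slow for real use; it only shows what a Neon port would need to compute.

```rust
/// Delta-encode in place: each byte becomes the wrapping difference
/// from the byte before it (the first byte is compared against 0).
/// Hypothetical helper, not part of wprs.
pub fn delta_encode(buf: &mut [u8]) {
    let mut prev = 0u8;
    for b in buf.iter_mut() {
        let cur = *b;
        *b = cur.wrapping_sub(prev);
        prev = cur;
    }
}

/// Inverse of `delta_encode`: a running wrapping sum over the bytes.
/// Hypothetical helper, not part of wprs.
pub fn prefix_sum(buf: &mut [u8]) {
    let mut acc = 0u8;
    for b in buf.iter_mut() {
        acc = acc.wrapping_add(*b);
        *b = acc;
    }
}
```

Smoothly varying data turns into runs of small values after delta encoding, which a general-purpose compressor tends to handle well; the prefix sum restores the original bytes on the receiving side.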
Another option is adding an option to disable compression and/or to only use zstd directly, without the transpose/prefix_sum transformations to improve the compression (both ratio and speed). That would result in significantly increased bandwidth usage, but might still be worth it to be able to use wprs on ARM at all (until ARM SIMD versions of the functions are implemented).
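A rough sketch of what such a switch could look like follows. Everything here is an assumption for illustration: the `PreTransform` enum, the function names, and the 4-bytes-per-pixel interleaved layout are hypothetical and not taken from wprs. The transpose shown is a plain scalar interleaved-to-planar split; the resulting bytes would then be handed to the compressor.

```rust
/// Hypothetical setting; wprs does not necessarily expose such an option.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PreTransform {
    /// Pass the pixel bytes to the compressor unchanged.
    None,
    /// Split interleaved pixels into per-channel planes first.
    Planar,
}

/// Split interleaved 4-byte pixels into four planes laid out back to back.
/// Assumes 4 bytes per pixel; panics if the length is not a multiple of 4.
pub fn to_planar(pixels: &[u8]) -> Vec<u8> {
    assert!(pixels.len() % 4 == 0, "expected 4 bytes per pixel");
    let n = pixels.len() / 4;
    let mut out = vec![0u8; pixels.len()];
    for (i, px) in pixels.chunks_exact(4).enumerate() {
        for (c, &v) in px.iter().enumerate() {
            // Channel c of pixel i lands in plane c at position i.
            out[c * n + i] = v;
        }
    }
    out
}

/// Produce the bytes that would be passed to the compressor.
pub fn prepare_for_compression(pixels: &[u8], mode: PreTransform) -> Vec<u8> {
    match mode {
        PreTransform::None => pixels.to_vec(),
        PreTransform::Planar => to_planar(pixels),
    }
}
```

With `PreTransform::None` no architecture-specific code is needed at all, at the cost of the higher bandwidth usage mentioned above.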
Oh, there may also be some endianness issues, but those should be easy to resolve.
Thanks for your quick reply!
I'm probably not up for this task, unfortunately. But it's good to know you'd welcome external contributions should someone more capable implement it.
Would generating
Do you have a rough idea what the bandwidth usage would be? Especially in comparison to Waypipe?
That shouldn't be an issue. Waypipe also has the limitation that both systems need to have the same endianness, and it works for my use case. That is because even though ARM CPUs are bi-endian, in the real world Linux on ARM is always little-endian like x86.
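If endianness ever did need handling, one common approach is to fix the wire format to little-endian and convert explicitly, which is a no-op on little-endian hosts such as x86-64 and typical aarch64 Linux. The sketch below uses only standard-library calls; the function names are hypothetical and not from wprs.

```rust
/// Encode a u32 as little-endian wire bytes, regardless of host endianness.
pub fn encode_u32_le(v: u32) -> [u8; 4] {
    v.to_le_bytes()
}

/// Decode little-endian wire bytes back into a host u32.
pub fn decode_u32_le(bytes: [u8; 4]) -> u32 {
    u32::from_le_bytes(bytes)
}

/// Compile-time check of the target's byte order.
pub fn host_is_little_endian() -> bool {
    cfg!(target_endian = "little")
}
```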
Hi,
this project looks very interesting.
I'm currently using Waypipe to forward Wayland apps running on a NixOS aarch64 host to a NixOS x86-64 client. I guess the more common use case would be the reverse: forwarding apps running on a more powerful x86-64 host to a less powerful ARM SBC.
As I found out while packaging this project with Nix and trying to build it for aarch64-linux, the code only builds for x86-64-v3 as it uses instructions specific to that architecture (SSE2/AVX2).
The SIMD / architecture-specific instruction stuff is frankly a bit above my pay grade.
But I know that ARM has its own instructions for SIMD. Unfortunately, just like for x86, the set of instructions varies by device and vendor. So for example, an Apple M1 chip (ARMv8.5-A) supports newer SIMD instructions that a Raspberry Pi 5 (ARMv8.2-A) doesn't.
Do you think it would be possible to reimplement the code specific to x86-64-v3 for ARM using some of the ARM SIMD extensions? Alternatively, would it make sense to have a non-optimized, generic implementation that does not rely on any architecture-specific instructions as a fallback?
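To make the fallback idea concrete, here is a minimal sketch of how a Rust crate can pick an implementation per architecture and fall back to portable code elsewhere. The `Backend` enum and `select_backend` function are hypothetical and not part of wprs; the feature-detection macros are the ones provided by the Rust standard library.

```rust
/// Hypothetical backend labels; these are not names used by wprs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Avx2,
    Neon,
    Scalar,
}

/// Choose a backend for the CPU the program is running on.
pub fn select_backend() -> Backend {
    // Only compiled on x86-64; checks for AVX2 at runtime.
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            return Backend::Avx2;
        }
    }
    // Only compiled on aarch64; checks for Neon at runtime.
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return Backend::Neon;
        }
    }
    // Portable fallback for every other case.
    Backend::Scalar
}
```

A caller would match on the returned value and invoke the corresponding implementation, so the crate still builds on targets that have no SIMD version yet.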