The prefetch in the ARM implementation of Stream (https://github.com/google/highway/blob/master/hwy/ops/arm_neon-inl.h#L4061) can degrade throughput. On a Jetson Nano, I have a Memset-like operation that reaches 11 GB/s with Store but drops to ~3.5 GB/s with Stream unless I remove the prefetch. Can the prefetch be removed or made optional?
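Below is a minimal sketch of what "made optional" could look like, using a compile-time flag. This is not Highway code. FillBlocks and kPrefetch are hypothetical names. std::memcpy of a 16-byte block stands in for a 128-bit vector Store, and the GCC/Clang builtin __builtin_prefetch (rw=1 for write, locality=0 for no temporal reuse) stands in for the prefetch issued before each store.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical memset-like fill. kPrefetch toggles a write prefetch before
// each 16-byte block store, mimicking a prefetch-then-store Stream.
// Not the Highway implementation; memcpy stands in for a vector Store.
template <bool kPrefetch>
void FillBlocks(uint8_t* dst, size_t num_bytes, uint8_t value) {
  constexpr size_t kBlock = 16;  // size of one 128-bit NEON register
  uint8_t block[kBlock];
  std::memset(block, value, kBlock);

  size_t i = 0;
  for (; i + kBlock <= num_bytes; i += kBlock) {
    if (kPrefetch) {
      // rw=1: prefetch for write; locality=0: data is not expected to be reused.
      __builtin_prefetch(dst + i, 1, 0);
    }
    std::memcpy(dst + i, block, kBlock);  // stands in for Store/Stream
  }
  for (; i < num_bytes; ++i) dst[i] = value;  // scalar tail
}
```

Benchmarking FillBlocks<true> against FillBlocks<false> on the target would help confirm whether the prefetch alone accounts for the gap between Store and Stream.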
Hi, thanks for reporting. Are you aware of another way to implement the non-temporal behavior? Data Cache Clean is a system instruction.
If not, we'd welcome a pull request to remove the prefetch.