-
Notifications
You must be signed in to change notification settings - Fork 313
Revert "Use LLVM intrinsics for madd intrinsics"
#2014
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
See also https://godbolt.org/z/8z53f6WPE r? sayantn |
This reverts commit 3214671.
3ae174a to
ed3475f
Compare
crates/core_arch/src/x86/avx2.rs
Outdated
| /// This is a trick used in the adler32 algorithm to get a widening addition. The | ||
| /// multiplication by 1 is trivial, but must not be optimized out because then the vpmaddwd | ||
| /// instruction is no longer selected. The assert_instr verifies that this is the case. | ||
| #[target_feature(enable = "avx2")] | ||
| #[cfg_attr(test, assert_instr(vpmaddwd))] | ||
| unsafe fn test_mm256_madd_epi16_mul_one(mad: __m256i) -> __m256i { | ||
| let one_v = _mm256_set1_epi16(1); | ||
| _mm256_madd_epi16(mad, one_v) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please just append this to the test_mm256_madd_epi16 function. Or if you really want to keep it as a separate function, just move it to the tests module
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test checks that the instruction is emitted, so it can't just be added to that existing test. Locally it didn't run when I up it with the other tests, but on CI it does, so I've moved it now.
ed3475f to
8a883ad
Compare
|
This seems like it would break the usage of base64. https://godbolt.org/z/q6e1ne568 12 I have some concerns about this kind of change. It might introduce reliance on hard-to-predict optimizer behavior and lead to performance issues that are difficult to detect. Footnotes |
|
Interesting, I think it uses a shift instead of a multiply and that fools the pattern match. Yeah that needs a fix. Do you have any other cases that don't optimize? I'll revert this, and then add separate tests for the adler32 and base64 patterns. |
Can't find more on GitHub. https://github.com/search?q=_mm_madd_epi16+NOT+is%3Afork+language%3ARust&type=code If not limited to real-world cases, there are also cases like |
This reverts commit 3214671.
The commit is from #1985, which itself reverted this original change.
Use
intrinsics::simdformadd(again), now that llvm/llvm-project#174149 should make this optimize properly.