Skip to content

Conversation

@folkertdev
Copy link
Contributor

@folkertdev folkertdev commented Feb 1, 2026

This reverts commit 3214671.

The commit is from #1985, which itself reverted this original change.

Use intrinsics::simd for madd (again), now that llvm/llvm-project#174149 should make this optimize properly.

@folkertdev
Copy link
Contributor Author

See also https://godbolt.org/z/8z53f6WPE

r? sayantn

@folkertdev folkertdev marked this pull request as ready for review February 1, 2026 01:12
Comment on lines 1826 to 1862
/// This is a trick used in the adler32 algorithm to get a widening addition. The
/// multiplication by 1 is trivial, but must not be optimized out because then the vpmaddwd
/// instruction is no longer selected. The assert_instr verifies that this is the case.
#[target_feature(enable = "avx2")]
#[cfg_attr(test, assert_instr(vpmaddwd))]
unsafe fn test_mm256_madd_epi16_mul_one(mad: __m256i) -> __m256i {
let one_v = _mm256_set1_epi16(1);
_mm256_madd_epi16(mad, one_v)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please just append this to the test_mm256_madd_epi16 function. Or if you really want to keep it as a separate function, just move it to the tests module

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test checks that the instruction is emitted, so it can't just be added to that existing test. Locally it didn't run when I up it with the other tests, but on CI it does, so I've moved it now.

@sayantn sayantn added this pull request to the merge queue Feb 2, 2026
Merged via the queue into rust-lang:main with commit 030e64c Feb 2, 2026
75 checks passed
@usamoi
Copy link
Contributor

usamoi commented Feb 2, 2026

This seems like it would break the usage of base64. https://godbolt.org/z/q6e1ne568 12

I have some concerns about this kind of change. It might introduce reliance on hard-to-predict optimizer behavior and lead to performance issues that are difficult to detect.

Footnotes

  1. https://github.com/ozgb/bs64/blob/ef033b373cb08d76808da9242fe57e89b2eff002/src/avx2/mod.rs#L136

  2. https://github.com/aki-akaguma/basexx/blob/536976ff8aefb8299e0dba335a34198c6ed62d79/src/base64/base64_avx2.rs#L158

@folkertdev
Copy link
Contributor Author

Interesting, I think it uses a shift instead of a multiply and that fools the pattern match. Yeah that needs a fix.

Do you have any other cases that don't optimize?

I'll revert this, and then add separate tests for the adler32 and base64 patterns.

@usamoi
Copy link
Contributor

usamoi commented Feb 2, 2026

Do you have any other cases that don't optimize?

Can't find more on GitHub. https://github.com/search?q=_mm_madd_epi16+NOT+is%3Afork+language%3ARust&type=code

If not limited to real-world cases, there are also cases like _mm256_madd_epi16(v, _mm256_set1_epi16(-1)) that might make sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants