-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Optimize std::includes
by using ranges::includes
approach
#5543
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize std::includes
by using ranges::includes
approach
#5543
Conversation
Thanks! 😻 I pushed medium-small changes, please double-check. |
I think the chance of UB were lower, than your commit message says. We pick the middle needle elemnt to break the match, not an arbitrary random number. Otherwise fine. Please confirm that my results where some cases became worse, don't concern you too much. |
Good point, yeah. 🎲
Yeah, that's fine. Thanks for the analysis! 😸 |
I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed. |
Must go faster! 🚗 🦕 🦖 |
❔ Why ranges approach
We have three outcomes of comparison of elements pair:
return false
Out of these outcomes, we can get one in one branch, the rest will be two branches.
Since we have "less" predicate, we can't have equal in one branch.
Out of haystack less and needle less, the current
std::
algorithm favors needle less, and the ranges algorithm favors haystack less.Favoring haystack less optimizes long algorithm run, where haystack is way bigger than needle.
Favoring needle less optimizes early return, which is not so useful, as it happens only once per algorithm run.
🏁 Benchmarks
Not sure what is common case of
includes
, trying these three:(0) Needle is seen contiguously in haystack
(1) Needle is seen not completely contiguously in haystack, but still compact (stride is geometric distribution)
(2) Needle is spread in haystack with almost the same stride (plus-minus one)
(3) Needle is sampled randomly from haystack
All these are multiplied by
(1) The needle is actually found
(0) The middle element of the needle does not match anything in haystack
I think these combinations are enough to explore the algorithm performance properties
Benchmark results show being affected by codegen gremlin, and expectedly 300/290 cases are less faster and more slower.
But overall looks like an improvement.
📜 Benchmark results
Mixed results with overall improvement, but some great degradation
And random variation for ranges part