-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
optimize slice::ptr_rotate for small rotates #135847
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @scottmcm (or someone else) some time within the next two weeks. Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (
|
On my machine I'm measuring a ~15% speedup on rotating an array of 23 usizes to the right by one 1000000 times: fn main() {
let mut vec = [0; 23];
for _ in 0..1000000 {
std::hint::black_box(vec.rotate_right(1));
}
} before:
after:
Arrays of length >= 24 should be unaffected by this change since they were already using the correct algorithm before, however since it is now used unconditionally for small rotates, we can optimize out algorithm 2 in cases where the compiler doesn't know the slice's length but does know the rotate amount to be small. |
One other request: can you add a codegen test? I think the simplest way of phrasing what we want to ensure continues to be the case even if the algorithms here are tweaked further is "look, shifting by a constant 1 should do the obvious thing", which is reasonably feasible to do as a codegen test. Something like a #[no_mangle]
// CHECK-LABEL: @rotate_left_by_one
pub fn rotate_left_by_one(slice: &mut [i32]) {
// Ensure that the simple case of rotating by a constant 1 optimizes to the obvious thing
// CHECK-NOT: call
// CHECK-NOT: load
// CHECK-NOT: store
// CHECK: %[[FIRST:.+]] = load i32, ptr %slice.0
// CHECK-NOT: call
// CHECK-NOT: load
// CHECK-NOT: store
// CHECK: call void @llvm.memmove
// CHECK-NOT: call
// CHECK-NOT: load
// CHECK-NOT: store
// CHECK: store i32 %[[FIRST]], ptr
// CHECK-NOT: call
// CHECK-NOT: load
// CHECK-NOT: store
if !slice.is_empty() {
slice.rotate_left(1);
}
} (bonus points if you feel like making it more specific in what it's checking, or adding one for @rustbot author |
I'm having a hard time writing useful codegen tests for |
Hmm, doing the gcd and such is a bit more code than I think is appropriate to Maybe you can split the three different algorithms into separate functions? Then the top-level And ideally it won't be needed, but there's also https://doc.rust-lang.org/nightly/std/intrinsics/fn.is_val_statically_known.html that could be used if necessary to force an extra check for this case. |
I've done this now, however it doesn't seem to be enough convincing for the compiler :( It does help with code clarity though, imo. Additionally the std-dev-guide recommends not applying any inlining attributes to private or generic functions, so I'm not sure what else I can try in this case. |
Hmm, that's de-facto out of date. If it helps to It's just |
@rustbot ready I've made the codegen tests work by modifying LLVM's inline-threshold for those tests specifically. I'm not sure that's a good solution, since it won't affect actual user-written code, but it works for now. I added the inline attribute on the "selection" function anyways, just in case. The case for |
This comment has been minimized.
This comment has been minimized.
Ugh, it worked locally. Lemme have another look. Is there a way for me to take away the "waiting on review" tag? 😆 |
This comment has been minimized.
This comment has been minimized.
d7f10ac
to
bb01b5a
Compare
This comment has been minimized.
This comment has been minimized.
bb01b5a
to
225fb74
Compare
This comment has been minimized.
This comment has been minimized.
225fb74
to
252cc7d
Compare
This comment has been minimized.
This comment has been minimized.
252cc7d
to
d1b547c
Compare
Now that the codegen test finally works I'll move the |
d1b547c
to
fb3d1d0
Compare
Sounds good! Thanks for taking this on :) @bors r+ rollup=iffy (new codegen tests are always a bit sketchy even after they pass CI) |
Rollup of 5 pull requests Successful merges: - rust-lang#135847 (optimize slice::ptr_rotate for small rotates) - rust-lang#136215 (btree/node.rs: remove incorrect comment from pop_internal_level docs) - rust-lang#136252 (spastorino back from vacations) - rust-lang#136254 (Rustc dev guide subtree update) - rust-lang#136259 (Cleanup docs for Allocator) r? `@ghost` `@rustbot` modify labels: rollup
Rollup merge of rust-lang#135847 - edwloef:slice_ptr_rotate_opt, r=scottmcm optimize slice::ptr_rotate for small rotates r? `@scottmcm` This swaps the positions and numberings of algorithms 1 and 2 in `slice::ptr_rotate`, and pulls the entire outer loop into algorithm 3 since it was redundant for the first two. Effectively, `ptr_rotate` now always does the `memcpy`+`memmove`+`memcpy` sequence if the shifts fit into the stack buffer. With this change, an `IndexMap`-style `move_index` function is optimized correctly. Assembly comparisons: - `move_index`, before: https://godbolt.org/z/Kr616KnYM - `move_index`, after: https://godbolt.org/z/1aoov6j8h - the code from `rust-lang#89714`, before: https://godbolt.org/z/Y4zaPxEG6 - the code from `rust-lang#89714`, after: https://godbolt.org/z/1dPx83axc related to rust-lang#89714 some relevant discussion in https://internals.rust-lang.org/t/idea-shift-move-to-efficiently-move-elements-in-a-vec/22184 Behavior tests pass locally. I can't get any consistent microbenchmark results on my machine, but the assembly diffs look promising.
r? @scottmcm
This swaps the positions and numberings of algorithms 1 and 2 in
slice::ptr_rotate
, and pulls the entire outer loop into algorithm 3 since it was redundant for the first two. Effectively,ptr_rotate
now always does thememcpy
+memmove
+memcpy
sequence if the shifts fit into the stack buffer.With this change, an
IndexMap
-stylemove_index
function is optimized correctly.Assembly comparisons:
move_index
, before: https://godbolt.org/z/Kr616KnYMmove_index
, after: https://godbolt.org/z/1aoov6j8h#89714
, before: https://godbolt.org/z/Y4zaPxEG6#89714
, after: https://godbolt.org/z/1dPx83axcrelated to #89714
some relevant discussion in https://internals.rust-lang.org/t/idea-shift-move-to-efficiently-move-elements-in-a-vec/22184
Behavior tests pass locally. I can't get any consistent microbenchmark results on my machine, but the assembly diffs look promising.