optimize slice::ptr_rotate for small rotates #135847

edwloef · 2025-01-21T22:28:39Z

This swaps the positions and numberings of algorithms 1 and 2 in slice::ptr_rotate, and pulls the entire outer loop into algorithm 3 since it was redundant for the first two. Effectively, ptr_rotate now always does the memcpy+memmove+memcpy sequence if the shifts fit into the stack buffer.
With this change, an IndexMap-style move_index function is optimized correctly.

Assembly comparisons:

move_index, before: https://godbolt.org/z/Kr616KnYM
move_index, after: https://godbolt.org/z/1aoov6j8h
the code from #89714, before: https://godbolt.org/z/Y4zaPxEG6
the code from #89714, after: https://godbolt.org/z/1dPx83axc

related to #89714
some relevant discussion in https://internals.rust-lang.org/t/idea-shift-move-to-efficiently-move-elements-in-a-vec/22184

Behavior tests pass locally. I can't get any consistent microbenchmark results on my machine, but the assembly diffs look promising.

rustbot · 2025-01-21T22:28:48Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @scottmcm (or someone else) some time within the next two weeks.

Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review and S-waiting-on-author) stays updated, invoking these commands when appropriate:

@rustbot author: the review is finished, PR author should check the comments and take action accordingly
@rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

edwloef · 2025-01-22T11:56:01Z

On my machine I'm measuring a ~15% speedup on rotating an array of 23 usizes to the right by one 1000000 times:

fn main() {
    let mut vec = [0; 23];

    for _ in 0..1000000 {
        std::hint::black_box(vec.rotate_right(1));
    }
}

before:

Time (mean ± σ):       7.0 ms ±   1.4 ms    [User: 6.4 ms, System: 0.4 ms]
Range (min … max):     4.6 ms …  10.5 ms    1000 runs

after:

Time (mean ± σ):       6.1 ms ±   1.0 ms    [User: 5.5 ms, System: 0.4 ms]
Range (min … max):     4.8 ms …  15.1 ms    1000 runs

Arrays of length >= 24 should be unaffected by this change since they were already using the correct algorithm before, however since it is now used unconditionally for small rotates, we can optimize out algorithm 2 in cases where the compiler doesn't know the slice's length but does know the rotate amount to be small.

library/core/src/slice/rotate.rs

scottmcm · 2025-01-23T06:44:35Z

One other request: can you add a codegen test? I think the simplest way of phrasing what we want to ensure continues to be the case even if the algorithms here are tweaked further is "look, shifting by a constant 1 should do the obvious thing", which is reasonably feasible to do as a codegen test.

Something like a slice_rotate.rs with

#[no_mangle]
// CHECK-LABEL: @rotate_left_by_one
pub fn rotate_left_by_one(slice: &mut [i32]) {
    // Ensure that the simple case of rotating by a constant 1 optimizes to the obvious thing

    // CHECK-NOT: call
    // CHECK-NOT: load
    // CHECK-NOT: store
    // CHECK: %[[FIRST:.+]] = load i32, ptr %slice.0
    // CHECK-NOT: call
    // CHECK-NOT: load
    // CHECK-NOT: store
    // CHECK: call void @llvm.memmove
    // CHECK-NOT: call
    // CHECK-NOT: load
    // CHECK-NOT: store
    // CHECK: store i32 %[[FIRST]], ptr
    // CHECK-NOT: call
    // CHECK-NOT: load
    // CHECK-NOT: store
    if !slice.is_empty() {
        slice.rotate_left(1);
    }
}

(bonus points if you feel like making it more specific in what it's checking, or adding one for .rotate_right(1) too.)

@rustbot author

edwloef · 2025-01-26T09:44:10Z

I'm having a hard time writing useful codegen tests for rotate_left and rotate_right without slapping #[inline(always)] on slice::ptr_rotate, since for some reason the compiler doesn't inline that function on its own. Should I just test that function directly instead?

scottmcm · 2025-01-27T01:06:45Z

Hmm, doing the gcd and such is a bit more code than I think is appropriate to inline(always).

Maybe you can split the three different algorithms into separate functions? Then the top-level ptr_rotate would just be picking between them, which should make it an obvious inlining candidate with a literal argument. And then LLVM can pick which, if any, of them should be inlined when.

And ideally it won't be needed, but there's also https://doc.rust-lang.org/nightly/std/intrinsics/fn.is_val_statically_known.html that could be used if necessary to force an extra check for this case.

edwloef · 2025-01-27T15:36:33Z

Maybe you can split the three different algorithms into separate functions? Then the top-level ptr_rotate would just be picking between them, which should make it an obvious inlining candidate with a literal argument.

I've done this now, however it doesn't seem to be enough convincing for the compiler :( It does help with code clarity though, imo. Additionally the std-dev-guide recommends not applying any inlining attributes to private or generic functions, so I'm not sure what else I can try in this case.

scottmcm · 2025-01-28T05:46:56Z

Hmm, that's de-facto out of date. If it helps to #[inline], then that's fine.

It's just inline(always) that's strongly discouraged.

edwloef · 2025-01-28T11:41:09Z

@rustbot ready

I've made the codegen tests work by modifying LLVM's inline-threshold for those tests specifically. I'm not sure that's a good solution, since it won't affect actual user-written code, but it works for now. I added the inline attribute on the "selection" function anyways, just in case.

The case for rotate_right_by_one is a bit more complicated than rotate_left_by_one since LLVM can't tell that both memmove branches are equivalent for left == right (when slice.len() == 2). Special-casing that in ptr_rotate would be fairly easy, however I don't think it'd be worth it, since it would effectively mean duplicating most of the code as far as I can tell.

edwloef · 2025-01-28T11:54:14Z

Ugh, it worked locally. Lemme have another look. Is there a way for me to take away the "waiting on review" tag? 😆

tests/codegen/lib-optimizations/slice_rotate.rs

library/core/src/slice/rotate.rs

…help inlining

edwloef · 2025-01-29T18:38:14Z

Now that the codegen test finally works I'll move the left == 0 || right == 0 check back into the selection function (so it's still included when compiling for size), then this should be ready to go as far as I can tell.

scottmcm · 2025-01-30T01:54:26Z

Sounds good! Thanks for taking this on :)

@bors r+ rollup=iffy (new codegen tests are always a bit sketchy even after they pass CI)

bors · 2025-01-30T01:54:29Z

📌 Commit fb3d1d0 has been approved by scottmcm

It is now in the queue for this repository.

Rollup of 5 pull requests Successful merges: - rust-lang#135847 (optimize slice::ptr_rotate for small rotates) - rust-lang#136215 (btree/node.rs: remove incorrect comment from pop_internal_level docs) - rust-lang#136252 (spastorino back from vacations) - rust-lang#136254 (Rustc dev guide subtree update) - rust-lang#136259 (Cleanup docs for Allocator) r? `@ghost` `@rustbot` modify labels: rollup

Rollup merge of rust-lang#135847 - edwloef:slice_ptr_rotate_opt, r=scottmcm optimize slice::ptr_rotate for small rotates r? `@scottmcm` This swaps the positions and numberings of algorithms 1 and 2 in `slice::ptr_rotate`, and pulls the entire outer loop into algorithm 3 since it was redundant for the first two. Effectively, `ptr_rotate` now always does the `memcpy`+`memmove`+`memcpy` sequence if the shifts fit into the stack buffer. With this change, an `IndexMap`-style `move_index` function is optimized correctly. Assembly comparisons: - `move_index`, before: https://godbolt.org/z/Kr616KnYM - `move_index`, after: https://godbolt.org/z/1aoov6j8h - the code from `rust-lang#89714`, before: https://godbolt.org/z/Y4zaPxEG6 - the code from `rust-lang#89714`, after: https://godbolt.org/z/1dPx83axc related to rust-lang#89714 some relevant discussion in https://internals.rust-lang.org/t/idea-shift-move-to-efficiently-move-elements-in-a-vec/22184 Behavior tests pass locally. I can't get any consistent microbenchmark results on my machine, but the assembly diffs look promising.

optimize slice::ptr_rotate for compile-time-constant small rotates

6b48b67

rustbot assigned scottmcm Jan 21, 2025

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jan 21, 2025

scottmcm reviewed Jan 23, 2025

View reviewed changes

library/core/src/slice/rotate.rs Outdated Show resolved Hide resolved

scottmcm reviewed Jan 23, 2025

View reviewed changes

library/core/src/slice/rotate.rs Outdated Show resolved Hide resolved

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 23, 2025

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jan 28, 2025

This comment has been minimized.

Sign in to view

rustbot added has-merge-commits PR has merge commits, merge with caution. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jan 28, 2025

This comment has been minimized.

Sign in to view

edwloef force-pushed the slice_ptr_rotate_opt branch from d7f10ac to bb01b5a Compare January 28, 2025 12:32

rustbot removed has-merge-commits PR has merge commits, merge with caution. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jan 28, 2025

This comment has been minimized.

Sign in to view

scottmcm reviewed Jan 29, 2025

View reviewed changes

tests/codegen/lib-optimizations/slice_rotate.rs Outdated Show resolved Hide resolved

scottmcm reviewed Jan 29, 2025

View reviewed changes

library/core/src/slice/rotate.rs Show resolved Hide resolved

scottmcm reviewed Jan 29, 2025

View reviewed changes

library/core/src/slice/rotate.rs Show resolved Hide resolved

scottmcm reviewed Jan 29, 2025

View reviewed changes

library/core/src/slice/rotate.rs Show resolved Hide resolved

edwloef force-pushed the slice_ptr_rotate_opt branch from bb01b5a to 225fb74 Compare January 29, 2025 16:34

This comment has been minimized.

Sign in to view

edwloef force-pushed the slice_ptr_rotate_opt branch from 225fb74 to 252cc7d Compare January 29, 2025 16:59

This comment has been minimized.

Sign in to view

edwloef force-pushed the slice_ptr_rotate_opt branch from 252cc7d to d1b547c Compare January 29, 2025 17:32

edwloef added 2 commits January 29, 2025 19:34

split slice::ptr_rotate into three separate algorithms, to hopefully …

311c3b7

…help inlining

add inline attribute and codegen test

fb3d1d0

edwloef force-pushed the slice_ptr_rotate_opt branch from d1b547c to fb3d1d0 Compare January 29, 2025 18:41

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 30, 2025

Zalathar mentioned this pull request Jan 30, 2025

Rollup of 5 pull requests #136272

Merged

bors merged commit 6ebe590 into rust-lang:master Jan 30, 2025
6 checks passed

rustbot added this to the 1.86.0 milestone Jan 30, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

optimize slice::ptr_rotate for small rotates #135847

optimize slice::ptr_rotate for small rotates #135847

edwloef commented Jan 21, 2025

rustbot commented Jan 21, 2025

edwloef commented Jan 22, 2025 •

edited

Loading

scottmcm commented Jan 23, 2025 •

edited

Loading

edwloef commented Jan 26, 2025 •

edited

Loading

scottmcm commented Jan 27, 2025

edwloef commented Jan 27, 2025

scottmcm commented Jan 28, 2025

edwloef commented Jan 28, 2025 •

edited

Loading

This comment has been minimized.

edwloef commented Jan 28, 2025 •

edited

Loading

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

edwloef commented Jan 29, 2025

scottmcm commented Jan 30, 2025

bors commented Jan 30, 2025

optimize slice::ptr_rotate for small rotates #135847

optimize slice::ptr_rotate for small rotates #135847

Conversation

edwloef commented Jan 21, 2025

rustbot commented Jan 21, 2025

edwloef commented Jan 22, 2025 • edited Loading

scottmcm commented Jan 23, 2025 • edited Loading

edwloef commented Jan 26, 2025 • edited Loading

scottmcm commented Jan 27, 2025

edwloef commented Jan 27, 2025

scottmcm commented Jan 28, 2025

edwloef commented Jan 28, 2025 • edited Loading

This comment has been minimized.

edwloef commented Jan 28, 2025 • edited Loading

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

edwloef commented Jan 29, 2025

scottmcm commented Jan 30, 2025

bors commented Jan 30, 2025

edwloef commented Jan 22, 2025 •

edited

Loading

scottmcm commented Jan 23, 2025 •

edited

Loading

edwloef commented Jan 26, 2025 •

edited

Loading

edwloef commented Jan 28, 2025 •

edited

Loading

edwloef commented Jan 28, 2025 •

edited

Loading