Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimize slice::ptr_rotate for small rotates #135847

Merged
merged 3 commits into from
Jan 30, 2025

Conversation

edwloef
Copy link
Contributor

@edwloef edwloef commented Jan 21, 2025

r? @scottmcm

This swaps the positions and numberings of algorithms 1 and 2 in slice::ptr_rotate, and pulls the entire outer loop into algorithm 3 since it was redundant for the first two. Effectively, ptr_rotate now always does the memcpy+memmove+memcpy sequence if the shifts fit into the stack buffer.
With this change, an IndexMap-style move_index function is optimized correctly.

Assembly comparisons:

related to #89714
some relevant discussion in https://internals.rust-lang.org/t/idea-shift-move-to-efficiently-move-elements-in-a-vec/22184

Behavior tests pass locally. I can't get any consistent microbenchmark results on my machine, but the assembly diffs look promising.

@rustbot
Copy link
Collaborator

rustbot commented Jan 21, 2025

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @scottmcm (or someone else) some time within the next two weeks.

Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review and S-waiting-on-author) stays updated, invoking these commands when appropriate:

  • @rustbot author: the review is finished, PR author should check the comments and take action accordingly
  • @rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jan 21, 2025
@edwloef
Copy link
Contributor Author

edwloef commented Jan 22, 2025

On my machine I'm measuring a ~15% speedup on rotating an array of 23 usizes to the right by one 1000000 times:

fn main() {
    let mut vec = [0; 23];

    for _ in 0..1000000 {
        std::hint::black_box(vec.rotate_right(1));
    }
}

before:

Time (mean ± σ):       7.0 ms ±   1.4 ms    [User: 6.4 ms, System: 0.4 ms]
Range (min … max):     4.6 ms …  10.5 ms    1000 runs

after:

Time (mean ± σ):       6.1 ms ±   1.0 ms    [User: 5.5 ms, System: 0.4 ms]
Range (min … max):     4.8 ms …  15.1 ms    1000 runs

Arrays of length >= 24 should be unaffected by this change since they were already using the correct algorithm before, however since it is now used unconditionally for small rotates, we can optimize out algorithm 2 in cases where the compiler doesn't know the slice's length but does know the rotate amount to be small.

@scottmcm
Copy link
Member

scottmcm commented Jan 23, 2025

One other request: can you add a codegen test? I think the simplest way of phrasing what we want to ensure continues to be the case even if the algorithms here are tweaked further is "look, shifting by a constant 1 should do the obvious thing", which is reasonably feasible to do as a codegen test.

Something like a slice_rotate.rs with

#[no_mangle]
// CHECK-LABEL: @rotate_left_by_one
pub fn rotate_left_by_one(slice: &mut [i32]) {
    // Ensure that the simple case of rotating by a constant 1 optimizes to the obvious thing

    // CHECK-NOT: call
    // CHECK-NOT: load
    // CHECK-NOT: store
    // CHECK: %[[FIRST:.+]] = load i32, ptr %slice.0
    // CHECK-NOT: call
    // CHECK-NOT: load
    // CHECK-NOT: store
    // CHECK: call void @llvm.memmove
    // CHECK-NOT: call
    // CHECK-NOT: load
    // CHECK-NOT: store
    // CHECK: store i32 %[[FIRST]], ptr
    // CHECK-NOT: call
    // CHECK-NOT: load
    // CHECK-NOT: store
    if !slice.is_empty() {
        slice.rotate_left(1);
    }
}

(bonus points if you feel like making it more specific in what it's checking, or adding one for .rotate_right(1) too.)

@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 23, 2025
@edwloef
Copy link
Contributor Author

edwloef commented Jan 26, 2025

I'm having a hard time writing useful codegen tests for rotate_left and rotate_right without slapping #[inline(always)] on slice::ptr_rotate, since for some reason the compiler doesn't inline that function on its own. Should I just test that function directly instead?

@scottmcm
Copy link
Member

Hmm, doing the gcd and such is a bit more code than I think is appropriate to inline(always).

Maybe you can split the three different algorithms into separate functions? Then the top-level ptr_rotate would just be picking between them, which should make it an obvious inlining candidate with a literal argument. And then LLVM can pick which, if any, of them should be inlined when.

And ideally it won't be needed, but there's also https://doc.rust-lang.org/nightly/std/intrinsics/fn.is_val_statically_known.html that could be used if necessary to force an extra check for this case.

@edwloef
Copy link
Contributor Author

edwloef commented Jan 27, 2025

Maybe you can split the three different algorithms into separate functions? Then the top-level ptr_rotate would just be picking between them, which should make it an obvious inlining candidate with a literal argument.

I've done this now, however it doesn't seem to be enough convincing for the compiler :( It does help with code clarity though, imo. Additionally the std-dev-guide recommends not applying any inlining attributes to private or generic functions, so I'm not sure what else I can try in this case.

@scottmcm
Copy link
Member

Hmm, that's de-facto out of date. If it helps to #[inline], then that's fine.

It's just inline(always) that's strongly discouraged.

@edwloef
Copy link
Contributor Author

edwloef commented Jan 28, 2025

@rustbot ready

I've made the codegen tests work by modifying LLVM's inline-threshold for those tests specifically. I'm not sure that's a good solution, since it won't affect actual user-written code, but it works for now. I added the inline attribute on the "selection" function anyways, just in case.

The case for rotate_right_by_one is a bit more complicated than rotate_left_by_one since LLVM can't tell that both memmove branches are equivalent for left == right (when slice.len() == 2). Special-casing that in ptr_rotate would be fairly easy, however I don't think it'd be worth it, since it would effectively mean duplicating most of the code as far as I can tell.

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jan 28, 2025
@rust-log-analyzer

This comment has been minimized.

@edwloef
Copy link
Contributor Author

edwloef commented Jan 28, 2025

Ugh, it worked locally. Lemme have another look. Is there a way for me to take away the "waiting on review" tag? 😆

@rustbot rustbot added has-merge-commits PR has merge commits, merge with caution. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jan 28, 2025
@rustbot

This comment has been minimized.

@edwloef edwloef force-pushed the slice_ptr_rotate_opt branch from d7f10ac to bb01b5a Compare January 28, 2025 12:32
@rustbot rustbot removed has-merge-commits PR has merge commits, merge with caution. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jan 28, 2025
@rust-log-analyzer

This comment has been minimized.

@edwloef edwloef force-pushed the slice_ptr_rotate_opt branch from bb01b5a to 225fb74 Compare January 29, 2025 16:34
@rust-log-analyzer

This comment has been minimized.

@edwloef edwloef force-pushed the slice_ptr_rotate_opt branch from 225fb74 to 252cc7d Compare January 29, 2025 16:59
@rust-log-analyzer

This comment has been minimized.

@edwloef edwloef force-pushed the slice_ptr_rotate_opt branch from 252cc7d to d1b547c Compare January 29, 2025 17:32
@edwloef
Copy link
Contributor Author

edwloef commented Jan 29, 2025

Now that the codegen test finally works I'll move the left == 0 || right == 0 check back into the selection function (so it's still included when compiling for size), then this should be ready to go as far as I can tell.

@edwloef edwloef force-pushed the slice_ptr_rotate_opt branch from d1b547c to fb3d1d0 Compare January 29, 2025 18:41
@scottmcm
Copy link
Member

Sounds good! Thanks for taking this on :)

@bors r+ rollup=iffy (new codegen tests are always a bit sketchy even after they pass CI)

@bors
Copy link
Contributor

bors commented Jan 30, 2025

📌 Commit fb3d1d0 has been approved by scottmcm

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 30, 2025
bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 30, 2025
Rollup of 5 pull requests

Successful merges:

 - rust-lang#135847 (optimize slice::ptr_rotate for small rotates)
 - rust-lang#136215 (btree/node.rs: remove incorrect comment from pop_internal_level docs)
 - rust-lang#136252 (spastorino back from vacations)
 - rust-lang#136254 (Rustc dev guide subtree update)
 - rust-lang#136259 (Cleanup docs for Allocator)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 6ebe590 into rust-lang:master Jan 30, 2025
6 checks passed
@rustbot rustbot added this to the 1.86.0 milestone Jan 30, 2025
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Jan 30, 2025
Rollup merge of rust-lang#135847 - edwloef:slice_ptr_rotate_opt, r=scottmcm

optimize slice::ptr_rotate for small rotates

r? `@scottmcm`

This swaps the positions and numberings of algorithms 1 and 2 in `slice::ptr_rotate`, and pulls the entire outer loop into algorithm 3 since it was redundant for the first two. Effectively, `ptr_rotate` now always does the `memcpy`+`memmove`+`memcpy` sequence if the shifts fit into the stack buffer.
With this change, an `IndexMap`-style `move_index` function is optimized correctly.

Assembly comparisons:
- `move_index`, before: https://godbolt.org/z/Kr616KnYM
- `move_index`, after: https://godbolt.org/z/1aoov6j8h
- the code from `rust-lang#89714`, before: https://godbolt.org/z/Y4zaPxEG6
- the code from `rust-lang#89714`, after: https://godbolt.org/z/1dPx83axc

related to rust-lang#89714
some relevant discussion in https://internals.rust-lang.org/t/idea-shift-move-to-efficiently-move-elements-in-a-vec/22184

Behavior tests pass locally. I can't get any consistent microbenchmark results on my machine, but the assembly diffs look promising.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants