Add unstable loop unrolling hint attributes#156816
Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
9c8f21c to
22381f6
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
4d232fd to
9c8493b
Compare
|
Some changes occurred in compiler/rustc_passes/src/check_attr.rs cc @jdonszelmann, @JonathanBrouwer Some changes occurred in compiler/rustc_attr_parsing cc @jdonszelmann, @JonathanBrouwer Some changes occurred in compiler/rustc_hir/src/attrs cc @jdonszelmann, @JonathanBrouwer Some changes occurred to MIR optimizations cc @rust-lang/wg-mir-opt Some changes occurred in match lowering cc @Nadrieril Some changes occurred in coverage instrumentation. cc @Zalathar |
|
r? @folkertdev rustbot has assigned @folkertdev. Use Why was this reviewer chosen?The reviewer was selected based on:
|
|
(would like to take a look at this as well, should have time in the next few days) |
This comment has been minimized.
This comment has been minimized.
| #[no_mangle] | ||
| pub fn unroll_count() { | ||
| // CHECK-LABEL: @unroll_count | ||
| // CHECK: !llvm.loop ![[COUNT:[0-9]+]] |
There was a problem hiding this comment.
is checking the actual number tricky? or does it not map one-to-one?
There was a problem hiding this comment.
I am checking the number, it's at the very bottom of this file. Loop metadata looks like this:
bb7: ; preds = %bb3
call void @maybe_has_side_effect() #5
br label %bb1, !llvm.loop !11
...
!11 = distinct !{!11, !12}
!12 = !{!"llvm.loop.unroll.count", i32 5}|
Wow wonder what causes that insane improvement in |
|
It might be due to layout changes because some types got bigger so now something is aligned that wasn't before? Still that is a massive change. |
| MacroCall, | ||
| Crate, | ||
| Delegation { mac: bool }, | ||
| ForLoop, |
There was a problem hiding this comment.
Could you add this information as a field of Target::Expr, rather than its own target type?
I think this is confusing because now not all expressions produce Target::Expr
There was a problem hiding this comment.
Furthermore, does it make sense to combine these three to just Loop, or do they need to be separate targets?
There was a problem hiding this comment.
For #[loop_match] it's kind of nice to just have a Loop one corresponding to loop {}, but that's validated separately further down the line too.
There was a problem hiding this comment.
Ah right if that attribute is only valid on Loop then keeping the seperate targets is perfectly reasonable
| use super::*; | ||
| // tidy-alphabetical-start | ||
| static_assert_size!(BasicBlockData<'_>, 152); | ||
| static_assert_size!(BasicBlockData<'_>, 160); |
There was a problem hiding this comment.
The perf result of (probably) this change is a bad sad, can we improve that?
There was a problem hiding this comment.
I looked through the cachegrind diffs and I'm pretty sure the perf impact is mostly caused by adding a new field to an encoded struct (MIR Terminators), not by the size increase of the struct.
I did think about where to stash the data for a while. In THIR I was able to create a single collection for each Body, which means that the effect when not used is a single empty collection (or single zero byte when encoding/decoding). But in MIR, I want this to have a chance of surviving MIR optimizations, so putting attributes on something like a FxHashMap<BasicBlock, Vec<Attribute>> would mean that we'd need to repair that mapping any time a basic block was added or removed by a MIR transform. That sounds hard to maintain. So I think the only viable approach is to attach this to the Goto Terminators somehow, and there are often many terminators per basic block, so it's not shocking that the overhead surfaces here.
There are things I could do here so I'll try one, but you might not like it 😛
| } | ||
| } | ||
| ArgParser::NameValue(_) => { | ||
| cx.adcx().warn_ill_formed_attribute_input(ILL_FORMED_ATTRIBUTE_INPUT); |
There was a problem hiding this comment.
This can be .expected_list_or_no_args
There was a problem hiding this comment.
We should start linting against weird parsing practices :3 we totally can detect these. @JonathanBrouwer
| } | ||
| } | ||
|
|
||
| match l.meta_item().and_then(|i| i.path().word_sym()) { |
There was a problem hiding this comment.
This forgets to check whether l has any arguments, use the new meta_item_no_args method introduced in #155193 (which is in the queue atm)
| print_tup!(A B C D E F G H); | ||
| print_skip!(Span, (), ErrorGuaranteed, AttrId); | ||
| print_disp!(u8, u16, u32, u128, usize, bool, NonZero<u32>, Limit); | ||
| print_disp!(u8, u16, u32, u64, u128, usize, bool, NonZero<u32>, Limit); |
There was a problem hiding this comment.
Is the u64 used anywhere?
There was a problem hiding this comment.
Whoops. Previously I was accepting any u64 in the unroll count, but I changed to u32 because that's what clang does.
|
I think the image perf result is just something broken in collection. https://rust-lang.zulipchat.com/#narrow/channel/247081-t-compiler.2Fperformance/topic/sus.20perf.20results/near/599190868 |
|
Let's try again to see if it persists. @bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Add unstable loop unrolling hint attributes
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (57c302e): comparison URL. Overall result: ❌✅ regressions and improvements - please read:Benchmarking means the PR may be perf-sensitive. It's automatically marked not fit for rolling up. Overriding is possible but disadvised: it risks changing compiler perf. Next, please: If you can, justify the regressions found in this try perf run in writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary -2.1%, secondary 1.6%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary -37.5%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeResults (primary 0.3%, secondary 0.4%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 511.125s -> 514.465s (0.65%) |
|
This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers. |
|
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Add unstable loop unrolling hint attributes
|
The job Click to see the possible cause of the failure (guessed by this bot) |
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (c1e5e0f): comparison URL. Overall result: ❌ regressions - please read:Benchmarking means the PR may be perf-sensitive. It's automatically marked not fit for rolling up. Overriding is possible but disadvised: it risks changing compiler perf. Next, please: If you can, justify the regressions found in this try perf run in writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary 0.1%, secondary 2.9%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary -2.1%, secondary 5.1%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeResults (primary -0.0%, secondary -0.0%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 511.17s -> 513.02s (0.36%) |
View all comments
Tracking issue: #156874
This adds as new attribute
#[unroll]/#[unroll(full)]/#[unroll(never)]/#[unroll(16)](or any u32).#[unroll]is behind a new feature gate#![feature(loop_hints)]because I intend to add an attribute for loop vectorization as well. If a user wants to turn off loop unrolling to locally minimize code size, LLVM may vectorize the loop even though it isn't unrolled which can produce a similar code size explosion.