-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Optimize the codegen for Span::from_expansion
#140485
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This comment has been minimized.
This comment has been minimized.
Is this function this hot that it needs micro-optimized? Have you benchmark it on real world case? |
There are more than 300 call sites (most from clippy) and it ends up being called more than once per expression while linting (in clippy). It's definitely used a lot and frequently called. |
e332f70
to
761d0ec
Compare
@@ -306,8 +306,16 @@ impl Span { | |||
/// Returns `true` if this span comes from any kind of macro, desugaring or inlining. | |||
#[inline] | |||
pub fn from_expansion(self) -> bool { | |||
// If the span is fully inferred then ctxt > MAX_CTXT | |||
self.inline_ctxt().map_or(true, |ctxt| !ctxt.is_root()) | |||
// Optimizes much better than `inline_ctxt` since it avoids comparing against `lo_or_index`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Optimizes much better than `inline_ctxt` since it avoids comparing against `lo_or_index`. | |
// Optimizes much better than `inline_ctxt` since it avoids comparing against `ctxt_or_parent_or_marker`. |
I wonder how to write this better so it doesn't look like magic. |
Perhaps like this: let ctxt = match_span_kind! {
self,
// All branches here, except `InlineParent`, actually return `span.ctxt_or_parent_or_marker`,
// this makes the code optimize very well by eliminating the `ctxt_or_parent_or_marker` comparison.
InlineCtxt(span) => SyntaxContext::from_u16(span.ctxt),
InlineParent(_span) => SyntaxContext::root(),
PartiallyInterned(span) => SyntaxContext::from_u16(span.ctxt),
Interned(_span) => SyntaxContext::from_u16(CTXT_INTERNED_MARKER),
}; |
In any case, let's benchmark this on rustc too. |
This comment has been minimized.
This comment has been minimized.
Optimize the codegen for `Span::from_expansion` See https://godbolt.org/z/bq65Y6bc4 for the difference. the new version is less than half the number of instructions. Also tried fully writing the function by hand: ```rust sp.ctxt_or_parent_or_marker != 0 && ( sp.len_with_tag_or_marker == BASE_LEN_INTERNED_MARKER || sp.len_with_tag_or_marker & PARENT_TAG == 0 ) ``` But that was no better than this PR's current use of `match_span_kind`.
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (fb65274): comparison URL. Overall result: ✅ improvements - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.
Max RSS (memory usage)Results (primary 0.7%, secondary 1.3%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResults (primary -0.7%, secondary -2.5%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResults (primary -1.1%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 770.233s -> 770.093s (-0.02%) |
761d0ec
to
d9c060b
Compare
Went with a more verbose explanation of why this version optimizes better and why it gets the right result. Also that's a little more of a benefit than I was expecting from just rustc. |
See https://godbolt.org/z/bq65Y6bc4 for the difference. the new version is less than half the number of instructions.
Also tried fully writing the function by hand:
But that was no better than this PR's current use of
match_span_kind
.