Optimize the codegen for `Span::from_expansion` #140485

Jarcho · 2025-04-29T19:40:36Z

See https://godbolt.org/z/bq65Y6bc4 for the difference. The new version has fewer than half as many instructions.

Also tried fully writing the function by hand:

sp.ctxt_or_parent_or_marker != 0
        && (
            sp.len_with_tag_or_marker == BASE_LEN_INTERNED_MARKER 
            || sp.len_with_tag_or_marker & PARENT_TAG == 0
        )

But that was no better than this PR's current use of match_span_kind.
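
For readers who want to try the hand-written predicate in isolation, here is a minimal sketch. The field names and constant names are taken from the snippet above; the `RawSpan` struct and the function name `from_expansion_by_hand` are hypothetical, and the constant values are assumptions, since the thread does not show them.

```rust
// Hypothetical stand-in for the two 16-bit fields the predicate reads.
// The field names come from the PR description; the struct does not.
#[derive(Clone, Copy)]
struct RawSpan {
    len_with_tag_or_marker: u16,
    ctxt_or_parent_or_marker: u16,
}

// Assumed values: the thread names these constants but never shows them.
const PARENT_TAG: u16 = 0b1000_0000_0000_0000;
const BASE_LEN_INTERNED_MARKER: u16 = 0b1111_1111_1111_1111;

/// The hand-written check quoted in the PR description.
/// The second field is only trusted as a context when the span is
/// interned (length marker set) or the parent tag bit is clear.
fn from_expansion_by_hand(sp: RawSpan) -> bool {
    sp.ctxt_or_parent_or_marker != 0
        && (sp.len_with_tag_or_marker == BASE_LEN_INTERNED_MARKER
            || sp.len_with_tag_or_marker & PARENT_TAG == 0)
}
```

Under these assumptions, a span with the parent tag bit set is never reported as coming from an expansion, whatever the second field holds, because that field then stores a parent rather than a context.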

rustbot · 2025-04-29T19:40:41Z

r? @jieyouxu

rustbot has assigned @jieyouxu.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Urgau · 2025-04-29T20:09:40Z

Is this function so hot that it needs to be micro-optimized? Have you benchmarked it on a real-world case?

Jarcho · 2025-04-29T20:46:20Z

There are more than 300 call sites (most of them in clippy), and it ends up being called more than once per expression while linting in clippy. It is both widely used and frequently called.

jieyouxu · 2025-04-30T04:31:21Z

r? @petrochenkov

compiler/rustc_span/src/span_encoding.rs

petrochenkov · 2025-04-30T14:12:08Z

I wonder how to write this better so it doesn't look like magic.
It took me some time and manual inlining to understand why this works.

petrochenkov · 2025-04-30T14:30:39Z

Perhaps like this:

        let ctxt = match_span_kind! {
            self,
            // All branches here, except `InlineParent`, actually return `span.ctxt_or_parent_or_marker`,
            // this makes the code optimize very well by eliminating the `ctxt_or_parent_or_marker` comparison.
            InlineCtxt(span) => SyntaxContext::from_u16(span.ctxt),
            InlineParent(_span) => SyntaxContext::root(),
            PartiallyInterned(span) => SyntaxContext::from_u16(span.ctxt),
            Interned(_span) => SyntaxContext::from_u16(CTXT_INTERNED_MARKER),
        };
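
A self-contained model of this idea may make the "magic" easier to follow. It is only a sketch: `RawSpan`, `Kind`, `kind`, and `from_expansion` are hypothetical stand-ins for the real types and the `match_span_kind!` macro, the constant values and the order of the checks in `kind` are assumptions, and the root context is assumed to be encoded as 0.

```rust
// Hypothetical stand-in for the two 16-bit fields involved.
#[derive(Clone, Copy)]
struct RawSpan {
    len_with_tag_or_marker: u16,
    ctxt_or_parent_or_marker: u16,
}

// Assumed values: the thread names these constants but never shows them.
const PARENT_TAG: u16 = 0b1000_0000_0000_0000;
const BASE_LEN_INTERNED_MARKER: u16 = 0b1111_1111_1111_1111;
const CTXT_INTERNED_MARKER: u16 = 0b1111_1111_1111_1111;

// Hypothetical enum playing the role of the `match_span_kind!` arms.
enum Kind {
    InlineCtxt { ctxt: u16 },
    InlineParent,
    PartiallyInterned { ctxt: u16 },
    Interned,
}

// Assumed dispatch order: length marker first, then tag bit or context marker.
fn kind(sp: RawSpan) -> Kind {
    if sp.len_with_tag_or_marker != BASE_LEN_INTERNED_MARKER {
        if sp.len_with_tag_or_marker & PARENT_TAG == 0 {
            Kind::InlineCtxt { ctxt: sp.ctxt_or_parent_or_marker }
        } else {
            Kind::InlineParent
        }
    } else if sp.ctxt_or_parent_or_marker != CTXT_INTERNED_MARKER {
        Kind::PartiallyInterned { ctxt: sp.ctxt_or_parent_or_marker }
    } else {
        Kind::Interned
    }
}

fn from_expansion(sp: RawSpan) -> bool {
    // Every arm except `InlineParent` yields the raw field value:
    // `Interned` is only reached when the field equals the marker,
    // so returning the marker is the same as returning the field.
    let ctxt = match kind(sp) {
        Kind::InlineCtxt { ctxt } => ctxt,
        Kind::InlineParent => 0, // root context (assumed to be 0)
        Kind::PartiallyInterned { ctxt } => ctxt,
        Kind::Interned => CTXT_INTERNED_MARKER,
    };
    ctxt != 0
}
```

Because three of the four arms produce the same value, the optimizer only has to distinguish the `InlineParent` case, which is presumably why the second comparison against `CTXT_INTERNED_MARKER` disappears from the generated code.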

petrochenkov · 2025-04-30T14:31:41Z

In any case, let's benchmark this on rustc too.
@bors try @rust-timer queue

bors · 2025-04-30T14:32:58Z

⌛ Trying commit 761d0ec with merge fb65274...

bors · 2025-04-30T16:43:59Z

☀️ Try build successful - checks-actions
Build commit: fb65274 (fb65274cfcdf4129ff7a2c2c9c7c63b4014b82d4)

rust-timer · 2025-04-30T18:23:32Z

Finished benchmarking commit (fb65274): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.5%	[-2.8%, -0.3%]	2
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-1.5%	[-2.8%, -0.3%]	2

Max RSS (memory usage)

Results (primary 0.7%, secondary 1.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	2.4%	[0.4%, 6.4%]	6
Regressions ❌ (secondary)	3.2%	[2.6%, 3.7%]	2
Improvements ✅ (primary)	-0.5%	[-0.6%, -0.4%]	8
Improvements ✅ (secondary)	-2.4%	[-2.4%, -2.4%]	1
All ❌✅ (primary)	0.7%	[-0.6%, 6.4%]	14

Cycles

Results (primary -0.7%, secondary -2.5%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.4%	[0.4%, 0.5%]	2
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.9%	[-2.6%, -0.4%]	11
Improvements ✅ (secondary)	-2.5%	[-3.0%, -2.2%]	5
All ❌✅ (primary)	-0.7%	[-2.6%, 0.5%]	13

Binary size

Results (primary -1.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.1%	[-1.1%, -1.1%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-1.1%	[-1.1%, -1.1%]	1

Bootstrap: 770.233s -> 770.093s (-0.02%)
Artifact size: 365.54 MiB -> 365.45 MiB (-0.02%)

Jarcho · 2025-04-30T19:34:48Z

Went with a more verbose explanation of why this version optimizes better and why it gets the right result. Also that's a little more of a benefit than I was expecting from just rustc.

petrochenkov · 2025-04-30T21:35:42Z

> Also that's a little more of a benefit than I was expecting from just rustc.

The clap_derive change is definitely spurious.
typenum and match-stress may be real, or may be just inlining working differently.

petrochenkov · 2025-04-30T21:36:03Z

@bors r+ rollup=maybe

bors · 2025-04-30T21:36:05Z

📌 Commit d9c060b has been approved by petrochenkov

It is now in the queue for this repository.

…iaskrgr Rollup of 7 pull requests

Successful merges:

- rust-lang#134034 (handle paren in macro expand for let-init-else expr)
- rust-lang#139186 (Refactor `diy_float`)
- rust-lang#140062 (std: mention `remove_dir_all` can emit `DirectoryNotEmpty` when concurrently written into)
- rust-lang#140430 (Improve test coverage of HIR pretty printing.)
- rust-lang#140485 (Optimize the codegen for `Span::from_expansion`)
- rust-lang#140505 (linker: Quote symbol names in .def files)
- rust-lang#140521 (interpret: better error message for out-of-bounds pointer arithmetic and accesses)

r? `@ghost`
`@rustbot` modify labels: rollup

Rollup of 11 pull requests

Successful merges:

- rust-lang#134034 (handle paren in macro expand for let-init-else expr)
- rust-lang#138703 (chore: remove redundant words in comment)
- rust-lang#139186 (Refactor `diy_float`)
- rust-lang#139343 (Change signature of File::try_lock and File::try_lock_shared)
- rust-lang#139780 (docs: Add example to `Iterator::take` with `by_ref`)
- rust-lang#139802 (Fix some grammar errors and hyperlinks in doc for `trait Allocator`)
- rust-lang#140034 (simd_select_bitmask: the 'padding' bits in the mask are just ignored)
- rust-lang#140062 (std: mention `remove_dir_all` can emit `DirectoryNotEmpty` when concurrently written into)
- rust-lang#140485 (Optimize the codegen for `Span::from_expansion`)
- rust-lang#140505 (linker: Quote symbol names in .def files)
- rust-lang#140521 (interpret: better error message for out-of-bounds pointer arithmetic and accesses)

r? `@ghost`
`@rustbot` modify labels: rollup

Rollup of 10 pull requests

Successful merges:

- rust-lang#134034 (handle paren in macro expand for let-init-else expr)
- rust-lang#138703 (chore: remove redundant words in comment)
- rust-lang#139186 (Refactor `diy_float`)
- rust-lang#139343 (Change signature of File::try_lock and File::try_lock_shared)
- rust-lang#139780 (docs: Add example to `Iterator::take` with `by_ref`)
- rust-lang#139802 (Fix some grammar errors and hyperlinks in doc for `trait Allocator`)
- rust-lang#140034 (simd_select_bitmask: the 'padding' bits in the mask are just ignored)
- rust-lang#140062 (std: mention `remove_dir_all` can emit `DirectoryNotEmpty` when concurrently written into)
- rust-lang#140485 (Optimize the codegen for `Span::from_expansion`)
- rust-lang#140521 (interpret: better error message for out-of-bounds pointer arithmetic and accesses)

r? `@ghost`
`@rustbot` modify labels: rollup

…iaskrgr Rollup of 9 pull requests

Successful merges:

- rust-lang#140485 (Optimize the codegen for `Span::from_expansion`)
- rust-lang#140509 (transmutability: merge contiguous runs with a common destination)
- rust-lang#140519 (Use select in projection lookup in `report_projection_error`)
- rust-lang#140521 (interpret: better error message for out-of-bounds pointer arithmetic and accesses)
- rust-lang#140536 (Rename `*Guard::try_map` to `filter_map`.)
- rust-lang#140550 (Stabilize `select_unpredictable`)
- rust-lang#140563 (extend the list of registered dylibs on `test::prepare_cargo_test`)
- rust-lang#140572 (Add useful comments on `ExprKind::If` variants.)
- rust-lang#140574 (Add regression test for 133065)

r? `@ghost`
`@rustbot` modify labels: rollup

Rollup merge of rust-lang#140485 - Jarcho:from_expansion_opt, r=petrochenkov

rustbot assigned jieyouxu Apr 29, 2025

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Apr 29, 2025

Jarcho force-pushed the from_expansion_opt branch from e332f70 to 761d0ec Compare April 29, 2025 20:48

rustbot assigned petrochenkov and unassigned jieyouxu Apr 30, 2025

petrochenkov reviewed Apr 30, 2025

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 30, 2025

petrochenkov removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 30, 2025

Optimize the codegen for Span::from_expansion

d9c060b

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 30, 2025

petrochenkov added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Apr 30, 2025

Jarcho force-pushed the from_expansion_opt branch from 761d0ec to d9c060b Compare April 30, 2025 19:32

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Apr 30, 2025

matthiaskrgr mentioned this pull request May 1, 2025

Rollup of 7 pull requests #140537

Closed

Zalathar mentioned this pull request May 1, 2025

Rollup of 11 pull requests #140541

Closed

Zalathar mentioned this pull request May 1, 2025

Rollup of 10 pull requests #140547

Closed

matthiaskrgr mentioned this pull request May 2, 2025

Rollup of 9 pull requests #140596

Merged

bors merged commit a2ae171 into rust-lang:master May 3, 2025
6 checks passed

rustbot added this to the 1.88.0 milestone May 3, 2025
