Skip to content

pub async fn impl is monomorphized when func itself is monomorphized #143290

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

azhogin
Copy link
Contributor

@azhogin azhogin commented Jul 1, 2025

Implentation coroutine (func::{closure#0}) is monomorphized, when func itself is monomorphized.

Currently, when pub async fn foo(..) is exported from lib and used in several dependent crates, only 'header' function is monomorphized in the defining crate. 'header' function, returning coroutine object, is monomorphized, but the coroutine's poll function (which actually implements all the logic for the function) is not. In such situation, func::{closure#0} will be monomorphized in every dependency.

This PR adds monomorphization for func::{closure#0} (coroutine poll function), when func itself is monomorphized.

Simple test with one lib async function and ten dependent crates (executable) that use the function, shows 5-7% compilation time improvement (single-threaded).

@rustbot
Copy link
Collaborator

rustbot commented Jul 1, 2025

r? @oli-obk

rustbot has assigned @oli-obk.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 1, 2025
@oli-obk
Copy link
Contributor

oli-obk commented Jul 2, 2025

I don't think this needs a -Z flag. It makes a lot of sense to just change this everywhere. We can then benchmark it in the benchmark suite, too.

A similar change could be done for iterator or closure returning functions.

On that note: instead of collecting nested bodies in general, wouldn't it be slightly more correct to collect types that are in the opaque (non-opque types must already have all their impls monomorphized, as they are publicly reachable) return type of monomorphized functions? For opaque types we'd only need to monomorphize the trait impls that the opaque type has in its bounds

cc @compiler-errors for thoughts as you wrote #135314

@azhogin azhogin force-pushed the azhogin/link-pub-async-impls branch from d6bb74b to a6b81d5 Compare July 2, 2025 11:54
@rustbot
Copy link
Collaborator

rustbot commented Jul 2, 2025

Some changes occurred in coverage tests.

cc @Zalathar

@azhogin azhogin force-pushed the azhogin/link-pub-async-impls branch from a6b81d5 to d6e2c24 Compare July 2, 2025 11:55
@azhogin azhogin changed the title -Zlink-pub-async-impls flag added to monomorphize pub async fn impl pub async fn impl is monomorphized when func itself is monomorphized Jul 2, 2025
@azhogin
Copy link
Contributor Author

azhogin commented Jul 2, 2025

Flag removed, behaviour changed to be default.
"opaque return type of monomorphized functions" - not yet.

@rust-log-analyzer

This comment has been minimized.

@oli-obk
Copy link
Contributor

oli-obk commented Jul 2, 2025

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 2, 2025
@bors
Copy link
Collaborator

bors commented Jul 2, 2025

⌛ Trying commit d6e2c24 with merge 9443527...

bors added a commit that referenced this pull request Jul 2, 2025
pub async fn impl is monomorphized when func itself is monomorphized

Implentation coroutine (`func::{closure#0}`) is monomorphized, when func itself is monomorphized.

Currently, when `pub async fn foo(..)` is exported from lib and used in several dependent crates, only 'header' function is monomorphized in the defining crate. 'header' function, returning coroutine object, is monomorphized, but the coroutine's poll function (which actually implements all the logic for the function) is not. In such situation, `func::{closure#0}` will be monomorphized in every dependency.

This PR adds monomorphization for `func::{closure#0}` (coroutine poll function), when func itself is monomorphized.

Simple test with one lib async function and ten dependent crates (executable) that use the function, shows 5-7% compilation time improvement (single-threaded).
@bors
Copy link
Collaborator

bors commented Jul 2, 2025

☀️ Try build successful - checks-actions
Build commit: 9443527 (9443527f3f94abad97e5cecc88f429e299eac1f0)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (9443527): comparison URL.

Overall result: ❌ regressions - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
6.0% [0.8%, 13.6%] 6
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) - - 0

Max RSS (memory usage)

Results (primary -3.1%, secondary 2.4%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
2.4% [0.4%, 5.0%] 4
Improvements ✅
(primary)
-3.1% [-3.8%, -2.5%] 2
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -3.1% [-3.8%, -2.5%] 2

Cycles

Results (primary 2.5%, secondary 4.9%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
2.5% [2.0%, 3.7%] 11
Regressions ❌
(secondary)
5.6% [0.9%, 14.1%] 9
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-1.2% [-1.2%, -1.2%] 1
All ❌✅ (primary) 2.5% [2.0%, 3.7%] 11

Binary size

Results (secondary 9.4%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
9.4% [1.2%, 15.8%] 5
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) - - 0

Bootstrap: 462.08s -> 462.853s (0.17%)
Artifact size: 372.26 MiB -> 372.28 MiB (0.01%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jul 3, 2025
@oli-obk
Copy link
Contributor

oli-obk commented Jul 4, 2025

Some tests need blessing

The performance regression is expected, as the regressing benchmark has public async fn as a library build, and will thus now do the work that normally happened in the downstream crate.

@Zalathar
Copy link
Contributor

Zalathar commented Jul 6, 2025

In particular, you will need to set build.profiler = true in your bootstrap.toml config to properly bless the modified coverage test.

@azhogin azhogin force-pushed the azhogin/link-pub-async-impls branch from d6e2c24 to 57c2602 Compare July 8, 2025 19:25
@rust-log-analyzer

This comment has been minimized.

@azhogin
Copy link
Contributor Author

azhogin commented Jul 8, 2025

It looks like '-Z print-type-sizes' shows different results for x86_64 and aarch64 for some tests. Are there some helpfull flags to remove the difference between targets (except -C panic=abort)?

Also, I performed deeply-nested-multi test locally with one dependent crate with usage of main async func poll. And compilation time is the same (lib crate + use crate compilation with/without this change).

@oli-obk
Copy link
Contributor

oli-obk commented Jul 9, 2025

you could limit the test to one platform with //@only-x86_64

@azhogin azhogin force-pushed the azhogin/link-pub-async-impls branch from 57c2602 to e6b35f4 Compare July 9, 2025 11:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
perf-regression Performance regression. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants