
Use self-hosted org runners to speed up CI #3065

Open
MabezDev opened this issue Jan 30, 2025 · 5 comments

@MabezDev (Member)

We should be able to add the following runner names to the CI workflow:

  • macos-m1-self-hosted
  • linux-x86_64-self-hosted
  • windows-x86_64-self-hosted

See the GitHub documentation.
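
A job could then target one of them via runs-on, roughly like this (untested sketch; the job name and build step are made up for illustration, only the runner label comes from the list above):

```yaml
jobs:
  build-examples:
    # Hypothetical job: only the runner label is taken from the list above.
    runs-on: macos-m1-self-hosted
    steps:
      - uses: actions/checkout@v4
      - run: cargo xtask build-examples esp32   # illustrative invocation
```
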

@MabezDev MabezDev added the CI Continuous integration/deployment label Jan 30, 2025
@bugadani bugadani self-assigned this Jan 30, 2025
@MabezDev MabezDev added this to the 1.0.0-beta.1 milestone Jan 30, 2025
@bugadani (Contributor) commented Jan 31, 2025

So, the runners (at least the macOS one) are zippy, but:

  • We only have 3 self-hosted runners, as opposed to a large number of github-hosted ones
  • Even if our runners are 2x as fast as the github runners (which is reasonable), we are testing 7 chips, so we'd be at a net loss if we just replaced the runners blindly.
  • We are running every check on every PR, which is fairly excessive.
  • At least on my PC, CPU usage is rather low when I run a command like xtask build-examples.
  • A common source of wasted time is installing and reinstalling the Xtensa toolchain (it takes around a minute on the github runners). On self-hosted runners, we would only have to do it once per release, provided that espup doesn't fumble an update.

We could probably improve CI times if we didn't run every single job on a separate runner, but instead parallelized the tasks on the same runner. We could probably easily build 2 or 3 chips on the macOS runner alone in the same time it takes to build one.

We are building each and every example for every PR. cargo check takes 3-10x less time than cargo build, so I think we can save ~4-5 minutes on each runner by not building everything. I appreciate the concern that cargo build catches linker errors as well, but we are already building a large number of HIL tests, so I don't know how valuable this is.

We are testing all test cases on every PR, which is a waste of time. If a PR doesn't produce a different binary for a test than one that has already passed, that test shouldn't be run. I don't know how we should approach the bookkeeping here, but the HIL test runtime can only be optimized so much, and we'll just keep adding more and more tests.

We should also look into running fewer checks on a PR push, and doing a more comprehensive set of checks in the merge queue. We should probably run HIL tests on PR update, with somewhat stricter checking against warnings to catch the low-hanging fruit, but we can leave linting and documentation builds to the MQ. We can also probably just replace the MSRV checks with building the HIL tests using the MSRV compilers.
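
A rough sketch of what that trigger split could look like (file names, job names and commands below are hypothetical; merge_group is the GitHub Actions event for merge queues):

```yaml
# pr.yml (sketch): fast feedback on every PR push
on:
  pull_request:
jobs:
  hil-tests:
    runs-on: linux-x86_64-self-hosted
    env:
      RUSTFLAGS: "-D warnings"   # stricter warning checking on PR updates
    steps:
      - uses: actions/checkout@v4
      - run: cargo xtask run-tests   # hypothetical invocation

---
# mq.yml (sketch): heavier checks, only in the merge queue
on:
  merge_group:
jobs:
  lint-and-docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: cargo clippy --workspace -- -D warnings
      - run: cargo doc --no-deps
```
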

@MabezDev (Member, Author)

At least on my PC, CPU usage is rather low when I run a command like xtask build-examples

Maybe we can just drop in rayon here where we iterate the examples for a quick win 🤔.

Even if our runners are 2x as fast as the github runners (which is reasonable), we are testing 7 chips, so we'd be at a net loss if we just replaced the runners blindly.

Do you know if the order of specifying runs-on matters? Like, could we prioritize the self-hosted runners and then fall back to github runners? (I think we will always want this, because your analysis only covers a single PR, not 2-3 being pushed to at a time, or multiple in the queue.)

A common source of wasted time is installing and reinstalling the Xtensa toolchain (it takes around a minute on the github runners). On self-hosted runners, we would only have to do it once per release, provided that espup doesn't fumble an update.

Can we cache this somehow? I'm not sure if downloading from the github runner cache instead of the github CDN will make any difference, but it might be worth exploring.
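
In case it helps, a cache step for the toolchain folder might look roughly like this (the espup install path is a guess and would need checking):

```yaml
steps:
  - name: Cache Xtensa toolchain
    id: xtensa-cache
    uses: actions/cache@v4
    with:
      # Guessing the espup install location here; verify before relying on it.
      path: ~/.rustup/toolchains/esp
      key: xtensa-${{ runner.os }}-v1   # bump the suffix when the toolchain is updated
  - name: Install Xtensa toolchain
    if: steps.xtensa-cache.outputs.cache-hit != 'true'
    run: espup install
```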

@bugadani (Contributor) commented Jan 31, 2025

Can we cache this somehow? I'm not sure if downloading from the github runner cache instead of the github CDN will make any difference, but it might be worth exploring.

The self-hosted runners don't start from zero every time: the work folder isn't cleared, the OS isn't set up from scratch, and a lot of state is retained between runs. On GHA we could probably include the toolchain folder in the cache, but we'd run out of cache space. We have 10.02/10.00 GB currently used without that 🤣

Maybe we can just drop in rayon here where we iterate the examples for a quick win 🤔.

I'm thinking more about cargo-batch. The annoying part is always the build logs; just shoving them all into a single stream would be unreadable.

Do you know if the order of specifying runs-on matters? Like, could we prioritize the self-hosted runners and then fall back to github runners? (I think we will always want this, because your analysis only covers a single PR, not 2-3 being pushed to at a time, or multiple in the queue.)

We can specify a number of tags, and the runners that match ALL of them will run the job. I don't know if we have common tags between a GHA runner and a self-hosted one, but generally selecting a runner isn't very convenient.
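
With GitHub's default self-hosted runner labels, that selection looks roughly like this (labels from memory; a runner has to carry all of them to pick the job up):

```yaml
jobs:
  hil-tests:
    # Only a runner carrying every one of these labels will run this job.
    runs-on: [self-hosted, linux, x64]
```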

@bjoernQ (Contributor) commented Jan 31, 2025

We are building each and every example for every PR. cargo check takes 3-10x less time than cargo build, so I think we can save ~4-5 minutes on each runner by not building everything. I appreciate the concern that cargo build catches linker errors as well, but we are already building a large number of HIL tests, so I don't know how valuable this is.

I think we changed from check to build before we had the HIL tests built; I agree we should catch most linker errors via the HIL tests. Additionally, we could do the build checks in nightly, just to be safe(r).
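
If "nightly" here means a scheduled run, that could be a small extra workflow along these lines (cron time, job name and build command are placeholders):

```yaml
# nightly.yml (sketch): keep the full cargo build of the examples, but only on a schedule
on:
  schedule:
    - cron: "0 3 * * *"   # placeholder time
jobs:
  build-examples:
    runs-on: linux-x86_64-self-hosted
    steps:
      - uses: actions/checkout@v4
      - run: cargo xtask build-examples esp32   # illustrative; one job per chip in practice
```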

@bugadani (Contributor) commented Jan 31, 2025

We can specify a number of tags, and the runners that match ALL of them will run the job. I don't know if we have common tags between a GHA runner and a self-hosted one, but generally selecting a runner isn't very convenient.

Actually, a comment on https://stackoverflow.com/questions/77997951/can-i-specify-github-actions-runs-on-as-either-one-label-or-another-or-logic-in proposes a cheeky way to pick either self-hosted or gh-hosted runners: just label the self-hosted runners with ubuntu-latest. It's unclear which one github would prefer; I guess we can run an experiment to find out, although it would be somewhat confusing if a Windows machine picked up an ubuntu-latest build.

Also I just realized we have 7 VMs in addition to the 3 self-hosted machines.
