-
-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parallel GH actions workflow for Nixpkgs eval #356023
Conversation
.github/workflows/eval.yml
Outdated
- name: Enable swap | ||
if: env.mergedSha | ||
run: | | ||
sudo fallocate -l 10G /swapfile | ||
sudo chmod 600 /swapfile | ||
sudo mkswap /swapfile | ||
sudo swapon /swapfile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is unused, we can just drop it for now. Especially given that we have another swap already enabled that can act as a canary:
- name: Enable swap | |
if: env.mergedSha | |
run: | | |
sudo fallocate -l 10G /swapfile | |
sudo chmod 600 /swapfile | |
sudo mkswap /swapfile | |
sudo swapon /swapfile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If allocating swap is not slowing down the gh action, I would recommend keeping it as a backup. You never know what happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll change it such that we have a constant chunk size, so that we effectively have an upper limit to memory usage. Then we definitely don't need swap :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JohnRTitor there is already a swap file. So we would see if it is starting to swap before it is a problem and we can add this back as needed.
I added several improvements to this script: tweag#100 notably more useful json output. |
I like this approach already. I want to incorporate this code in one form or the other in nixpkgs-review as it will make evaluating on normal laptops possible again. |
Please do! Currently evaluating uses too much memory and time! |
@GaetanLepage wants to look into integrating it into nixpkgs-review. |
@infinisil I would suggest to just address all comments for this pull request for now and maybe merge my pull request but not to extend the scope for the first pull request too much. |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: |
Agreed, due to slowness of Ofborg recently a few commiters are merging PRs without waiting for a successful eval which can take up to 13-15 hours. (I am guilty of the act myself) This fast but dangerous approach of review makes nixpkgs prone to a tree with broken eval. Having this basic CI check would help to detect issues beforehand, as we plan to phase out Ofborg in less than two months. |
82e4957
to
1348471
Compare
The nixpkgs-review counterpart is being worked on here: Mic92/nixpkgs-review#429 |
…ttrNamesOnly The attrNamesOnly feature is used by pkgs/top-level/release-attrpaths-superset.nix to return a superset of all attributes that might be built by Hydra. Before this change it would include all attribute paths to derivations that could be found recursively, ignoring the recurseForDerivations and recurseForRelease attributes that control Hydra's recursion. This had the effect that it would end up with ~266000 attributes, most of which definitely won't be built by Hydra. We can remove those while staying true to the superset notion to end up with just ~97000, a reduction of ~63.6%! This also comes with an eval time reduction from 31.7s to 18.7s (-41.0%)! As an example, all derivations in pypy310Packages don't get included anymore, because it doesn't have a `.recurseForDerivations` set. As a nice side effect, with `--arg enableWarnings false`, no more warnings are printed, because things like `checkpointBuildTools.mkCheckpointedBuild` (which is deprecated) isn't being recursed to anymore.
7c36b3d
to
9360b35
Compare
This is done now imo! I added some docs and some comments, I also updated the PR description. Pinging @Mic92 @Erethon @balsoft @JohnRTitor @a-kenji @azuwis @TheGiraffe3 for review and approval |
Motivated by ofborg struggling [1] and its evaluations taking too long, inspired by Jörg's initial PR [2] and Adam's previous attempt to parallelise Nixpkgs evaluation [3], this PR contains initial work to relief ofborg from its evaluation duty by using GitHub Actions to evaluate Nixpkgs. For now this doesn't take care of all of what ofborg does, such as requesting appropriate reviewers or labeling mass rebuilds, but this can be follow-up work. [1]: https://discourse.nixos.org/t/infrastructure-announcement-the-future-of-ofborg-your-help-needed/56025?u=infinisil [2]: NixOS#352808 [3]: NixOS#269403 Co-Authored-By: Jörg Thalheim <[email protected]> Co-Authored-By: Adam Joseph <[email protected]>
Successfully created backport PR for |
Follow-up fix: #357965 |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/equinix-metal-cta-infra-short-update/56538/1 |
I'm seeing ofborg eval failing on staging https://gist.github.com/GrahamcOfBorg/f50183ae86ce9f339b6b41cc33e14619 but the parallel evals are passing: #359106 |
staging will be fixed very soon. In this particular case the GH actions ran first, then staging broke, and finally ofborg got around to the PR. |
WIP follow-up towards adding rebuild labels: #359704 |
This PR primarily introduces a new GitHub Action workflow to effectively do a full Hydra evaluation for every PR, paving the way towards relieving ofborg of its evaluation duty. This is motivated by https://discourse.nixos.org/t/infrastructure-announcement-the-future-of-ofborg-your-help-needed/56025, see also #355847.
To make this as efficient as possible, 4 workflow jobs are spawned in parallel (one for each supported system, as originally experimented by @Mic92 in #352808), while each job makes full use of its 4 cores and 16GB of memory (based off amjoseph's idea in #269403 and initial work in #269356).
I've also implemented it in a way that limits maximum memory usage as Nixpkgs grows. CI will only take longer to finish, instead of OOMing. Currently it takes about 5 minutes, check out this test run.
Note that ofborg does more than just evaluate Nixpkgs, but this PR can be used as a base to also implement output path comparisons to assign labels based on changed paths, to request appropriate maintainers, and do evaluation time comparisons.
Ping @philiptaron @Mic92 @Erethon @balsoft @JohnRTitor @a-kenji
Things done
This work is funded by Tweag and Antithesis ✨
Add a 👍 reaction to pull requests you find important.