Nightly release CI action is broken #1591
This seems to be the culprit but I don't really understand it yet...
The last successful nightly release was 3rd September 2024: so I presume that one of the commits since that date caused the problem? I hope it's not one of mine! :-)
I created a PR that should address this issue: #1592
Thanks @cmuellner. 👍
Any idea why the nightly build still doesn't seem to be working or, at least, hasn't completed and uploaded a complete set of built artifacts yet?
Still something wrong I guess? Only sources in the latest release again.
Edit: oh, out of disk space? Even though it's supposed to clean up after itself as far as I can see?
Does it maybe need to do more to clean up?
It looks like it is the "create release" job that is running out of space. It downloads all of the artifacts from the previous steps, which take up 25 GB, but the runner only has 21 GB available. Each job runs on a separate runner, so the space needs to be cleaned up in this job too.
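A minimal sketch (not the repo's actual workflow) of what such a cleanup step could look like in the create-release job; the removed paths are the usual pre-installed SDKs on the hosted ubuntu runners and the artifact action version is illustrative:

```yaml
  create-release:
    runs-on: ubuntu-latest
    steps:
      # Free space before downloading ~25 GB of artifacts; these paths hold
      # pre-installed SDKs the job doesn't need (exact paths may change).
      - name: Free up disk space
        run: |
          sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc
          df -h
      - name: Download build artifacts
        uses: actions/download-artifact@v4
```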
The CI seems to be regularly broken because of git/musl issues (I've observed this multiple times since we added the musl builds to the CI/CD):
I'm not sure what the best way forward here is.
That's the wrong URL as far as I can see:
Edit: ah - sorry - ignore that...
Maybe we are hitting an issue with HTTP (e.g. http.postBuffer is not enough to hold the pack file); does doing a shallow clone solve this? Yocto at some point used this mirror: https://github.com/kraj/musl/tree/master and it seems to be up to date. Maybe @richfelker can help.
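For what it's worth, a shallow clone with a larger post buffer is easy to try; this is only a reproduction sketch (using the kraj mirror mentioned above as the example target, with an arbitrary buffer size), not a proposed workflow change:

```yaml
      - name: Try a shallow musl clone
        run: |
          # Raise the HTTP post buffer in case the default is too small for the pack
          git config --global http.postBuffer 524288000
          # --depth 1 avoids transferring the full history
          git clone --depth 1 https://github.com/kraj/musl.git musl-test
```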
Can you provide a minimal test case to reproduce the failure to git-clone? I just cloned successfully.
FWIW if you're re-cloning on every CI job, the polite thing to do is make it a shallow clone. The more polite thing to do would be to cache git clones. But I don't think this is related to the problem.
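One hedged sketch of what caching the clones could look like with actions/cache, keyed on the submodule commits recorded in the superproject (it assumes the repo itself is already checked out; step names and paths are illustrative, not the repo's actual workflow):

```yaml
      - name: Compute a key from the recorded submodule commits
        id: subkey
        run: echo "hash=$(git submodule status | sha256sum | cut -d' ' -f1)" >> "$GITHUB_OUTPUT"
      - name: Restore cached submodule checkouts
        uses: actions/cache@v4
        with:
          path: .git/modules
          key: submodules-${{ steps.subkey.outputs.hash }}
      - name: Fetch anything missing (shallow)
        run: git submodule update --init --recursive --depth 1
```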
FWIW that's what this recent PR was intended to deal with, but it's closed pending further investigation:
Do you know what this would involve for this repo's actions?
No, I don't. I've really avoided getting into CI workflows myself because I deem them a gigantically irresponsible abuse of resources. So I'm not sure what tooling there is to fix this (avoid downloading the same thing thousands of times), but it's something I very much hope someone is working on.
I'm in the same camp. However, there is a significant demand for pre-built releases of this repo. A possible solution is to have a mirror repo on GitHub, which regularly pulls the changes from the official repo. This reduces the load on upstream git servers.
Another possibility might be to
In case this helps at all (may belong elsewhere?):
Notes:
On one hand, if we run CI very often it's indeed a waste of resources; on the other hand, it's useful for regression testing (and we can't solve that with a mirror repo, by the way: we can't check pull requests that way, for example, and that's super useful), and it's an even worse waste of resources to have the users of this repo (or their CIs) build it again and again. That being said, there are a few ways to optimize the flow; here are a few suggestions:
Tarball repositories that I've seen (e.g. see above) suggest that LZ compression may be even better than XZ (at least from a compression perspective, not sure if it's slower?).
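If someone wants to measure this, something along these lines would show the size difference between an lzip and an xz tarball; the /opt/riscv install prefix is just the conventional location and is an assumption here:

```yaml
      - name: Compare lzip vs xz tarball sizes
        run: |
          sudo apt-get install -y lzip
          tar --lzip -cf riscv-toolchain.tar.lz -C /opt riscv
          tar -cJf riscv-toolchain.tar.xz -C /opt riscv
          ls -lh riscv-toolchain.tar.*
```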
PR exists (#1605).
I also thought of this, but I have zero experience with it. It is hard to get up and running if it cannot be tested locally.
We trigger the build every night, but it will not download/build anything if there were no changes in the last 24 hours.
I will look into this. I usually use
Thanks!
I'm working on a branch on my fork on this topic. Not quite happy with it yet. TShapinsky#2
Another way the toolchain size can be reduced is by installing stripped versions of the programs. A good portion of the dependencies already support a variant of
I'll play a bit with install-strip-host; IMHO we shouldn't strip target libraries.
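For reference, a hedged sketch of what a stripped install could look like; the build directory names below are illustrative, and whether each component honours the target depends on its build system:

```yaml
      - name: Install stripped host tools
        run: |
          # install-strip is the usual autotools target that strips host binaries
          # at install time; target libraries are not touched.
          make -C build-binutils install-strip
          make -C build-gcc install-strip
```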
BTW we install qemu as part of the toolchain for no reason (users would probably use their distro's qemu package), along with ROMs etc., and we also install both the 32-bit and 64-bit qemu regardless of the toolchain. The same goes for dejagnu; they both come from make-report (btw we also clone their repos each time, so I'll include them in the cache too). I'll see if I can clean this up a bit. Any idea if we need the amdgpu-arch / nvptx-arch tools on llvm? Is there a way to disable them (I don't think they make much sense in a cross-compile toolchain)?
I think as far as the dependencies installed to make the report, the tarball should probably be created before running
Additionally, for fun I put together a workflow which incorporated
+1 for ccache. I use it all the time; I didn't realise that it could maybe also be used in the GitHub Actions CI.
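A rough sketch of wiring ccache into a job with actions/cache; the key scheme, paths and step layout are illustrative, not what the repo ended up doing:

```yaml
      - name: Set up ccache
        run: |
          sudo apt-get install -y ccache
          # Pin the cache directory so the actions/cache path below matches
          echo "CCACHE_DIR=$HOME/.ccache" >> "$GITHUB_ENV"
          # Put the ccache compiler wrappers ahead of the real compilers
          echo "/usr/lib/ccache" >> "$GITHUB_PATH"
      - name: Restore compiler cache
        uses: actions/cache@v4
        with:
          path: ~/.ccache
          key: ccache-${{ runner.os }}-${{ github.sha }}
          restore-keys: ccache-${{ runner.os }}-
      - name: Show ccache statistics
        run: ccache -s
```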
That's also my approach. I also think we should install things on /mnt (the SSD partition that has 14 GB guaranteed) instead of /opt, but I'm still checking it out; I'll update the pull request during the weekend with more stuff.
Although I like the idea of using ccache across runs to speed up the process, we complicate the workflow and add yet another thing to debug in case things break. We'll also need to either create one cache per host/build environment (e.g. one for ubuntu-22.04 and another for ubuntu-24.04) or try to combine them, which may complicate things even further. Finally, in order for this to survive across runs we'll need to upload the cache(s) as artifacts, wasting storage resources. I checked out your approach and you upload a cache for each build configuration; this doesn't make much sense, since the ccache is not used for the target binaries (libc/libgcc etc.) but for the host binaries. Also, if we are going to use a persistent cache across runs, it would be better to do it for both the submodule cache and ccache; we could even use the submodule commit hashes to invalidate them in case, for example, we update the compiler.
I've been optimizing the approach a bit since then. The order of operations is
My PR #1607 already does this, it just uses the hash of the
In the case of any hashing, I think it's probably most effective to only have caches be generated by the
@mickflemm have you seen this action? https://github.com/easimon/maximize-build-space It is a hack like what is currently being done, but can give you something like 60 GB of build space at the cost of 2 seconds of run time. It's what I've been using in my tests.
In case it matters/helps, there's another open PR that changes the CI to only build on
ACK, will check it out; at this point the persistent cache is further down on my todo list, I'm not there yet.
I'd prefer to just use the default branch so that the workflow can be triggered manually; it won't make much of a difference for upstream, but it'd help when debugging. Also, I'm still not sure the persistent cache idea is ideal: most of the commits in this repo are for bumping submodule versions, so in most cases we'll be invalidating the cache anyway.
I've freed up to 56 GB, but it doesn't make much sense: we don't need this much space, and we don't utilize /mnt at all. As for the speed of the process, at this point most of the time is spent in the rms; when I'm done I'll just wrap those in a script and let them run in the background, since there is no reason for the CI to wait for this to finish, it can continue the process asap. Regarding the action you mention, I'd prefer not to add a dependency for something as simple as doing rm (or even apt remove); it'll be yet another thing to keep an eye on during maintenance.
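If it helps, the background-cleanup idea can be as simple as the sketch below (the paths are the usual space hogs on the hosted runners and are assumptions; redirecting output keeps the step from waiting on the background process):

```yaml
      - name: Free disk space in the background
        run: |
          # Kick off the slow removals and continue immediately; redirecting
          # output means the step doesn't wait on the background process.
          sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc > /dev/null 2>&1 &
          echo "cleanup running in the background"
```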
I reverted this, though, since GitHub hasn't switched ubuntu-latest to ubuntu-24.04 yet; however, I'm still in favor of this approach. It also affects the cleanup part, since on each iteration they install different packages.
Sorry - I failed to realise that it was actually another of your own commits! :-D
No hurry or worries. When it's ready I'll open a PR over here and you all can decide if it's something you want or not! :)
I added most of what we discussed here, along with some further fixes and optimizations, in #1608. I also allowed the submodule cache to persist across runs based on the hash of the current state of the submodules, as mentioned above. The build workflow works as expected. I also updated the nightly build workflow, but it needs further updates since the create-release and upload-release-asset actions are deprecated/unmaintained; it works for now (with the PR applied) but needs more work, and I'll come back with another PR for it when I get some time. @cmuellner would you mind merging #1608?
BTW, is it ok if we remove one of the two "nightly" parts from the filename? Currently it's "nightly-tag-nightly.tar.gz".
As for the ccache approach, @TShapinsky, how about this: we have another job, not part of build/nightly, that we can trigger once a week or manually to generate a combined ccache for each OS environment (combined: we just compile gcc, llvm, binutils etc. to populate the cache, no install or anything). It should be less than 4 GB each, so we'd have two of them (or one if we just use ubuntu-latest as I suggested) and we save them as normal caches (not as artifacts). Then the build/nightly process just restores the ccache for its OS environment if found and uses it; otherwise it's business as usual. Also, if we see things breaking (which is a possibility with ccache), we remove the caches from the list of caches (if they were artifacts it would be more complicated). Even with two caches and the submodule cache we should be under 10 GB, so within the size limit for storing caches, and if one component changes we'll need to invalidate them all together, so they form one batch (although it doesn't make sense to combine them, since the submodule cache is OS-independent and we'd be duplicating the same thing). We could even have a workflow for init/update of all caches (including the submodule cache) that handles rotation etc., and let the build workflow be only a consumer of the caches.
The more I think about using ccache in the build workflow, the more I don't like it: if the cache doesn't exist, we need someone to populate it (so others depend on it), and while populating the ccache the build process would take longer (since it'll be full of cache misses). If all jobs run in parallel (which is the desired scenario, and it happens often when runners are available), this gets quite messy and results in the workflow taking longer. Given that commits on this repo often update submodules (hence the caches would be invalidated), we may not win any time at all; on the contrary, we may slow things down with ccache (with the submodule cache it's different, because it's the same across all configurations, so it can easily be part of the build workflow). Your approach of having one ccache per configuration makes it less messy on one hand (it doesn't break parallelization, though it would stall the workflow in case the ccache is invalidated); on the other hand, we pollute the list of artifacts and there is a lot of duplication in there, so it's a different kind of mess.
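To make the weekly cache-warming idea concrete, a hypothetical standalone workflow could look roughly like this; the name, cron schedule and key scheme are all illustrative, not an agreed design:

```yaml
name: warm-ccache
on:
  schedule:
    - cron: '0 3 * * 0'    # once a week
  workflow_dispatch:        # or on demand
jobs:
  warm:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          submodules: true
      - uses: actions/cache@v4
        with:
          path: ~/.ccache
          key: ccache-${{ runner.os }}-${{ github.run_id }}
          restore-keys: ccache-${{ runner.os }}-
      # ... configure and build gcc/binutils/llvm here purely to populate
      #     the cache; nothing is installed or uploaded as an artifact ...
```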
Pre-built binaries are back, thanks @mickflemm! https://github.com/riscv-collab/riscv-gnu-toolchain/releases/tag/2024.11.22 |
Thanks @mickflemm (and @kito-cheng). 👍
Fixed by
Thanks a lot again @mickflemm. 👍
See here:
Only source bundles generated, no binary toolchains.