bazel spends a lot of time uploading events/cache in CI #26
Looking at that run's logs, it doesn't look like it's fetching much from cache. It looks more like it's building a lot from scratch and pushing it to cache, e.g. the output doesn't include any cache hits. Would you have expected more cache hits for this CI run?
Thanks for looking at this.
Yes, I'm puzzled why there are no cache hits. Rerunning the job is fast enough and shows cache hits: https://github.com/tweag/gazelle_cabal/runs/5691354186?check_suite_focus=true
How long before that did the last build run? Perhaps it was GC'ed on BuildBuddy's side?
On March 21st we should have had these entries cached by this job. However, on March 24th the same entries weren't there when this other job ran.
According to Slack, there doesn't seem to be a guaranteed retention period.
Posted this in Slack, but posting here as well:

Hey @facundominguez! Unfortunately we don't guarantee a cache TTL for our free-tier users. How frequently artifacts expire from the cache is based on an LRU policy, so if there is a period during which the artifacts aren't hit, it's likely that they will get evicted. The exact duration depends on overall system load and isn't very straightforward to predict. We are working on some architectural changes that will allow us to offer longer / more predictable TTLs for both free-tier and paid users.

I'm not sure the title of that bug is exactly right. If you look at the timing profile of one of the invocations from the run, you can see very little time is spent uploading artifacts (on the order of a few seconds throughout the entire build): https://app.buildbuddy.io/invocation/494d73a3-30d7-4db6-8e58-b0c57f7c22ed#timing

Most of the time seems to be spent actually performing the build. A couple recommendations I'd make to speed up that build:
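For reference, a timing profile like the one linked above can also be generated locally with standard Bazel profiling flags. A minimal sketch, where the target pattern and output path are placeholders rather than this repository's actual configuration:

```bash
# Write a trace profile for the invocation (viewable in chrome://tracing,
# or in the BuildBuddy timing tab when the invocation is streamed via BES).
bazel build //... --profile=/tmp/build_profile.gz

# Print a summary of where the time went (analysis, execution, etc.).
bazel analyze-profile /tmp/build_profile.gz
```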
I answered in Slack and also here. Thanks for the tips; I'll give them a try. Just being able to see when caches were last discarded could help to discern whether there are other configuration issues. The title comes from observing the builds while they were running: bazel spends a lot of time printing these four messages at the end of each invocation:
Hm, that invocation claims to have taken just 10 minutes total. Where are you seeing how long it takes to print those four messages? Taking a look at the GitHub Actions, it looks like there are also a ton of Bazel invocations being run back-to-back in serial rather than in parallel: there are about 9 Bazel invocations run back-to-back, taking ~10 min each. Running this many invocations back to back, it might also be worth setting this Bazel flag:
Here's an invocation which takes longer: https://github.com/tweag/gazelle_cabal/runs/5635411297?check_suite_focus=true#step:7:275
I see the time it takes to print those messages by watching the console output while it is being produced by the ongoing job.
Thanks, we already have this here.
I'd try using remote.buildbuddy.io instead of cache.buildbuddy.io to improve upload performance. Separately, I think increasing the --jobs= flag (currently implicitly set to 2) and running the Bazel invocations in parallel instead of in serial would speed up this pipeline in general.

It also seems like the invocations from the different GitHub Actions steps (i.e. "Build & test", "Test the example", "Test alternative dependencies") aren't hitting each other's cache. For example, if you look at these two actions for the same target (@bazel_gazelle//language/proto:proto) run 25 minutes apart in separate invocations: they have different digests, different input root digests, and different output digests.

To debug this, I'd recommend capturing --execution_log_json_file= from invocations in those 3 different steps and following the cache hit debugging instructions here to find the root cause.
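As an illustration of those suggestions, a minimal sketch of a single invocation wired up this way; the endpoint scheme, the --jobs value, and the log path are assumptions for the example, not taken from this repository's actual configuration:

```bash
# Point both the remote cache and the build event stream at remote.buildbuddy.io,
# raise the parallelism, and capture an execution log for cache-hit debugging.
bazel test //... \
  --remote_cache=grpcs://remote.buildbuddy.io \
  --bes_backend=grpcs://remote.buildbuddy.io \
  --bes_results_url=https://app.buildbuddy.io/invocation/ \
  --jobs=8 \
  --execution_log_json_file=/tmp/exec_log.json
```

Comparing the execution logs captured from two steps that should share a cache will show which action inputs (and therefore digests) differ between them.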
Here is an example job where bazel completes successfully each time, yet it spends a lot of time at the end talking to BuildBuddy.
Ideally, we wouldn't spend so long. We need to either fix our BuildBuddy configuration or switch to using caching from GitHub Actions.
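If the slow part really is the end-of-invocation upload to BuildBuddy, one knob worth investigating is Bazel's BES upload mode. A minimal sketch, assuming a Bazel version that supports --bes_upload_mode; the endpoint shown is illustrative:

```bash
# By default Bazel blocks at the end of the invocation until the build event
# stream has been fully uploaded; fully_async returns as soon as the build
# finishes and lets the upload complete in the background.
bazel build //... \
  --bes_backend=grpcs://remote.buildbuddy.io \
  --bes_upload_mode=fully_async
```

The trade-off is that a CI runner which shuts down immediately after the build may cut the background upload short.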