@cstamas cstamas commented Oct 29, 2025

Problem: Our caches (all of them combined) exceed the 10 GB of cache space GitHub allows. Because of this, GitHub evicts caches and our jobs end up exposed to HTTP 500 errors, which have happened quite often since the Central migration. We used plain actions/cache to store Mimir caches for each node (200-400 MB, depending on the node), and we have 3 active branches (3.9, 4.0 and master), which together simply exhaust the 10 GB limit.

This PR gives us one "cache blob" per OS (times three, for 3.9, 4.0 and master).

Changes:

  • implement the "always save" pattern (see the cache documentation)
  • keep the cache "per lane" (per OS)
  • all 3 kinds of builds (initial, full and integration-tests) use the same per-OS cache at start, and at the end each uploads its cache as an artifact (1 day retention)
  • at the end, a matrix job "consolidate caches" (runs on all 3 OSes) downloads these cache artifacts, consolidates them and saves the result as the cache
  • hence, we will have 3 OS-specific caches
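
The cache budget behind this change can be checked with a little arithmetic. The sketch below is only illustrative: it takes 10 GB as 10 * 1024 MB and assumes each branch gets an equal share of the limit, neither of which is stated in this PR. The entry sizes (200-400 MB), the 3 branches and the 3 OSes come from the description above; the function names are hypothetical.

```python
LIMIT_MB = 10 * 1024  # assumption: the 10 GB limit taken as 10 * 1024 MB
BRANCHES = 3          # 3.9, 4.0 and master
OSES = 3              # one cache "lane" per OS


def max_entries_per_branch(entry_mb: int) -> int:
    """Cache entries of the given size that fit in one branch's equal share."""
    return (LIMIT_MB // BRANCHES) // entry_mb


def consolidated_total_mb(entry_mb: int) -> int:
    """Total cache size with one entry per OS and per branch."""
    return OSES * BRANCHES * entry_mb


print(max_entries_per_branch(400))  # 8
print(max_entries_per_branch(200))  # 17
print(consolidated_total_mb(400))   # 3600
```

Under these assumptions, per-node caches run out of room after roughly 8 to 17 entries per branch, while one consolidated cache per OS and branch stays around 3.6 GB even at the 400 MB upper bound.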

We are over the 10 GB of allowed cache space. Hence, we need somewhat
improved cache handling.

Changes:
* disable cache maintenance on Mimir (if this works, we will end up with one cache of circa 400 MB)
* implement "rolling caches": we always get a cache miss, restore an older cache, and save a newer one
* initial-build gets the `master-XXX` cache, uses it, and uploads it as the `cache-initial-build` artifact
* full-builds get the `master-XXX` cache, use it, and upload it as the `cache-full-build-XXX` artifact
* integration-tests get the `master-XXX` cache, use it, and upload it as the `cache-integration-tests-XXX` artifact
* introduce a final "consolidate" job (depends on all of the above): it downloads all `cache-*` artifacts and saves them to the cache as `master-XXX`
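
To illustrate what the "consolidate" step amounts to, here is a minimal sketch that overlays several downloaded cache directories onto one target directory. It is not the workflow code from this PR: the `consolidate` function is hypothetical, and the "first copy of a file wins" rule is an assumption made for the example.

```python
import shutil
from pathlib import Path


def consolidate(sources: list[Path], target: Path) -> int:
    """Overlay each source directory onto target and return the files copied.

    Assumption: a file already present in target is kept, so the first
    source that provides a given relative path wins.
    """
    copied = 0
    for source in sources:
        for file in sorted(source.rglob("*")):
            if not file.is_file():
                continue
            dest = target / file.relative_to(source)
            if dest.exists():
                continue  # already provided by an earlier source
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(file, dest)
            copied += 1
    return copied
```

Called with the unpacked `cache-*` artifact directories as `sources`, the resulting `target` would be the single directory saved back to the cache.
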
@cstamas cstamas self-assigned this Oct 29, 2025
@cstamas cstamas marked this pull request as ready for review October 31, 2025 17:50
@cstamas cstamas added this to the 4.1.0 milestone Oct 31, 2025
@cstamas cstamas merged commit 304791e into apache:master Nov 3, 2025
41 of 42 checks passed
@cstamas cstamas deleted the consolidate-caches branch November 3, 2025 11:30