ci: mirror CI service images to GHCR (fork-safe Docker Hub pulls, groundwork)#40880
Open
rusackas wants to merge 1 commit into
Add a scheduled/dispatchable workflow that mirrors the Docker Hub service-container images CI depends on (postgres, redis, mysql, presto) into this repo's GHCR namespace under a ci/ prefix.

This is the groundwork for replacing anonymous Docker Hub service pulls (which share the runner-IP rate limit and flake on master/same-repo PRs) with public GHCR pulls that need no credentials, so the consuming workflows can drop the credentials: blocks entirely and fork PRs work unchanged. Adding credentials: directly to the service blocks (as #40875 did) breaks forks: empty secrets resolve to '' and GitHub rejects the workflow at parse time.

The matrix mirrors only the images declared as services: containers. The bde2020 hive-metastore image pulled via docker compose is left for a follow-up.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
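As a rough illustration of the mirror workflow this commit message describes, a minimal sketch might look like the following. It is not the workflow file from this PR: the workflow name, the cron cadence, the matrix key names (`source`, `target`), and the plain pull/tag/push copy step are all assumptions made for the example. Only the four image tags and the `ghcr.io/apache/superset/ci/` prefix come from the PR text.

```yaml
# Hypothetical sketch, not the file added by this PR.
name: Mirror CI service images to GHCR

on:
  workflow_dispatch:        # dispatchable from the Actions tab
  schedule:
    - cron: "0 6 * * 1"     # assumed cadence: Mondays 06:00 UTC

permissions:
  contents: read
  packages: write           # lets GITHUB_TOKEN push to ghcr.io

jobs:
  mirror:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false      # one failed copy should not cancel the others
      matrix:
        include:
          - source: postgres:17-alpine
            target: postgres:17-alpine
          - source: redis:7-alpine
            target: redis:7-alpine
          - source: mysql:8.0
            target: mysql:8.0
          - source: starburstdata/presto:350-e.6
            target: presto:350-e.6
    env:
      SRC: docker.io/${{ matrix.source }}
      DST: ghcr.io/apache/superset/ci/${{ matrix.target }}
    steps:
      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Copy image from Docker Hub to GHCR
        run: |
          docker pull "$SRC"
          docker tag "$SRC" "$DST"
          docker push "$DST"

      - name: Record the copy in the job summary
        run: echo "- \`$SRC\` → \`$DST\`" >> "$GITHUB_STEP_SUMMARY"
```

Note that a plain pull/tag/push copies only the runner's own platform variant of each image; a real mirror that needs multi-arch manifests would likely use a manifest-aware copy instead.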
Follow-up repoint is staged as a draft in #40882 — ready to rebase + un-draft once this merges, the mirror runs, and the four |
rusackas added a commit that referenced this pull request on Jun 9, 2026
Repoint the postgres/redis/mysql/presto service containers across the E2E, Python-Integration, and Presto/Hive workflows at the GHCR mirror (ghcr.io/apache/superset/ci/*).

Public GHCR images pull without Docker Hub's anonymous rate limit and without any credentials, so this removes the service-pull flakes on master/same-repo PRs while keeping fork PRs working. That is unlike credentials: on the service blocks (#40875, reverted in #40879), where empty fork secrets resolve to '' and fail the workflow at parse time.

Depends on the GHCR mirror being populated and public; see the mirror workflow in #40880.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
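To make the repoint concrete, a service block after the change might look roughly like this. The job name `test-postgres` and the surrounding job structure are hypothetical; only the image references follow the naming given in the commit message.

```yaml
# Hypothetical excerpt of a consuming workflow after the repoint.
jobs:
  test-postgres:            # assumed job name, for illustration only
    runs-on: ubuntu-latest
    services:
      postgres:
        # Previously pulled from Docker Hub as postgres:17-alpine.
        image: ghcr.io/apache/superset/ci/postgres:17-alpine
        # No credentials: block, since public GHCR images pull anonymously.
      redis:
        # Previously pulled from Docker Hub as redis:7-alpine.
        image: ghcr.io/apache/superset/ci/redis:7-alpine
    steps:
      - uses: actions/checkout@v4
```

Because nothing in the block references a secret, the same file evaluates identically on fork PRs and same-repo PRs.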
SUMMARY
Groundwork for fixing CI's Docker Hub service-pull flakes without breaking fork PRs — the failure mode that #40875 hit and #40879 reverts.
Root cause of the #40875 fork breakage: adding `credentials:` to a `services:` container looks safe, but on fork PRs the `DOCKERHUB_USER` / `DOCKERHUB_TOKEN` secrets are unavailable, so the templated values resolve to empty strings. GitHub Actions validates the `credentials:` block at job-setup time and rejects an empty `username`/`password` with a hard template error.

So every fork PR's Python-Integration / E2E / Presto-Hive job died at "Set up job". Empty creds do not fall back to anonymous pulls; they fail to parse. (See run 27179055813 on a fork.)
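For readers unfamiliar with the pattern, the shape that breaks on forks looks roughly like the sketch below. The job name is hypothetical and this is not a copy of the #40875 diff; the secret names are the ones mentioned above.

```yaml
# Hypothetical sketch of the fork-unsafe pattern.
jobs:
  test-postgres:            # assumed job name, for illustration only
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:17-alpine
        credentials:
          # On fork PRs these secrets are unavailable, so both
          # expressions evaluate to an empty string and the job
          # is rejected during "Set up job".
          username: ${{ secrets.DOCKERHUB_USER }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
    steps:
      - uses: actions/checkout@v4
```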
This PR (groundwork): add a scheduled/dispatchable workflow that mirrors the four Docker Hub service-container images CI relies on (`postgres:17-alpine`, `redis:7-alpine`, `mysql:8.0`, `starburstdata/presto:350-e.6`) into this repo's GHCR namespace under a `ci/` prefix (`ghcr.io/apache/superset/ci/<name>`).

Why GHCR fixes it for everyone: public GHCR images are pulled without Docker Hub's anonymous rate limit and without any credentials. Once CI points at the mirrored copies, the consuming workflows can drop their `credentials:` blocks entirely, so there is no empty-secret parse error, forks work unchanged, and same-repo/`master` runs stop flaking.

This PR adds only the mirror workflow. The repoint of the `services.*.image` refs is the follow-up (ready-to-go diff below), staged so CI never points at images that don't exist yet.

- … `workflow_dispatch` requires that).
- … `ghcr.io/apache/superset/*` push works under ASF infra. This is the key unknown: the first run will tell us whether the repo's `GITHUB_TOKEN` has `packages: write` to the apache GHCR namespace. If it doesn't, that's the blocker to resolve with ASF infra (apache/airflow et al. publish to GHCR, so it's likely fine).

Follow-up repoint (PR B): for reference, NOT in this PR
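For each `services:` image across `superset-e2e.yml`, `superset-python-integrationtest.yml`, and `superset-python-presto-hive.yml`, the image ref is repointed at its mirror under `ghcr.io/apache/superset/`: `postgres:17-alpine` → `ci/postgres:17-alpine`.

(…same for `redis:7-alpine` → `ci/redis:7-alpine`, `mysql:8.0` → `ci/mysql:8.0`, `starburstdata/presto:350-e.6` → `ci/presto:350-e.6`. Every `credentials:` block on a mirrored service is removed.)

Out of scope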
The `bde2020/hive-metastore-postgresql` image is pulled via `docker compose` (not a `services:` block), so it never hit the parse error. Mirroring it is a separate, optional follow-up.

TESTING INSTRUCTIONS
The mirror workflow is exercised by running it from the Actions tab; its job summary lists each `docker.io/... → ghcr.io/...` copy. The repoint is validated by the existing integration/E2E/Presto-Hive suites once it lands.

ADDITIONAL INFORMATION
🤖 Generated with Claude Code