ci: mirror CI service images to GHCR (fork-safe Docker Hub pulls, groundwork) #40880

Open
rusackas wants to merge 1 commit into master from ci/mirror-service-images-to-ghcr
Conversation

rusackas commented Jun 9, 2026

Member

SUMMARY

Groundwork for fixing CI's Docker Hub service-pull flakes without breaking fork PRs — the failure mode that #40875 hit and #40879 reverts.

Root cause of the #40875 fork breakage: adding credentials: to a services: container looks safe, but on fork PRs the DOCKERHUB_USER/DOCKERHUB_TOKEN secrets are unavailable, so the templated values resolve to empty strings. GitHub Actions validates the credentials: block at job-setup time and rejects an empty username/password with a hard template error:

The template is not valid. superset-python-integrationtest.yml
(Line 55,56,69,70): Unexpected value ''

So every fork PR's Python-Integration / E2E / Presto-Hive job died at "Set up job". Empty creds do not fall back to anonymous pulls — they fail to parse. (See run 27179055813 on a fork.)

This PR (groundwork): add a scheduled/dispatchable workflow that mirrors the four Docker Hub service-container images CI relies on — postgres:17-alpine, redis:7-alpine, mysql:8.0, starburstdata/presto:350-e.6 — into this repo's GHCR namespace under a ci/ prefix (ghcr.io/apache/superset/ci/<name>).
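The copy step described above can be sketched in bash. This is an illustration only, not the workflow added by this PR: the function names and the `DRY_RUN` switch are hypothetical, and `docker buildx imagetools create` is just one commonly used way to copy an image between registries (the real workflow may use a different tool). The image list and the `ghcr.io/apache/superset/ci` prefix come from the description.

```shell
# Destination namespace, as given in the PR description.
MIRROR_PREFIX="ghcr.io/apache/superset/ci"

# The four Docker Hub service images named in the PR description.
IMAGES=(
  "postgres:17-alpine"
  "redis:7-alpine"
  "mysql:8.0"
  "starburstdata/presto:350-e.6"
)

# Map a Docker Hub ref to its mirrored ref: drop any org prefix, keep name:tag.
mirror_ref() {
  local src="$1"
  printf '%s/%s\n' "$MIRROR_PREFIX" "${src##*/}"
}

# Fully qualify a Docker Hub ref; official images live under library/.
hub_ref() {
  local src="$1"
  case "$src" in
    */*) printf 'docker.io/%s\n' "$src" ;;
    *)   printf 'docker.io/library/%s\n' "$src" ;;
  esac
}

# Copy every image. DRY_RUN (hypothetical switch, default 1) only prints the
# commands; a real run needs a prior `docker login ghcr.io` with push rights.
mirror_all() {
  local src
  for src in "${IMAGES[@]}"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "docker buildx imagetools create --tag $(mirror_ref "$src") $(hub_ref "$src")"
    else
      docker buildx imagetools create --tag "$(mirror_ref "$src")" "$(hub_ref "$src")"
    fi
  done
}
```

Calling `mirror_all` with the default `DRY_RUN=1` prints one copy command per image, which is a cheap way to review the source and destination refs before anything is pushed.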

Why GHCR fixes it for everyone: public GHCR images are pulled without Docker Hub's anonymous rate limit and without any credentials. Once CI points at the mirrored copies, the consuming workflows can drop their credentials: blocks entirely → no empty-secret parse error → forks work unchanged, and same-repo/master stop flaking.

This PR adds only the mirror workflow. The repoint of the services.*.image refs is the follow-up (ready-to-go diff below), staged so CI never points at images that don't exist yet.

⚠️ One-time bootstrap (maintainer) — do these before the repoint

  • Merge this PR so the workflow is on the default branch (workflow_dispatch requires that).
  • Run "Mirror service images to GHCR" once (Actions → Run workflow).
  • Confirm ghcr.io/apache/superset/* push works under ASF infra. This is the key unknown — the first run will tell us whether the repo's GITHUB_TOKEN has packages: write to the apache GHCR namespace. If it doesn't, that's the blocker to resolve with ASF infra (apache/airflow et al. publish to GHCR, so it's likely fine).
  • Set the four mirrored packages' visibility to public (this is what lets fork CI pull without auth).
  • Open the repoint follow-up (diff below).
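Before opening the repoint, the visibility step above can be spot-checked from any machine with Docker installed. The sketch below is an assumption-laden helper, not part of this PR: `is_public` and `check_mirrors` are hypothetical names, and the check relies on pointing `DOCKER_CONFIG` at an empty directory so that `docker manifest inspect` runs with no stored credentials, which approximates what a fork's CI job would see.

```shell
# Return 0 if the image manifest can be read with no registry credentials.
is_public() {
  local ref="$1" empty_cfg status=0
  empty_cfg="$(mktemp -d)"
  # An empty config directory means no saved logins or credential helpers.
  DOCKER_CONFIG="$empty_cfg" docker manifest inspect "$ref" >/dev/null 2>&1 || status=$?
  rm -rf "$empty_cfg"
  return "$status"
}

# Report each ref; exit status is non-zero if any ref is not anonymously readable.
check_mirrors() {
  local failed=0 ref
  for ref in "$@"; do
    if is_public "$ref"; then
      echo "public: $ref"
    else
      echo "not pullable anonymously: $ref"
      failed=1
    fi
  done
  return "$failed"
}
```

A typical call would be `check_mirrors ghcr.io/apache/superset/ci/postgres:17-alpine ghcr.io/apache/superset/ci/redis:7-alpine`, extended with the other two mirrored refs. A failure can mean either that the package is still private or that the mirror run has not pushed it yet; the check does not distinguish between the two.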

Follow-up repoint (PR B) — for reference, NOT in this PR

For each services: image across superset-e2e.yml, superset-python-integrationtest.yml, superset-python-presto-hive.yml:

       postgres:
-        image: postgres:17-alpine
+        image: ghcr.io/apache/superset/ci/postgres:17-alpine
-        credentials:
-          username: ${{ secrets.DOCKERHUB_USER }}
-          password: ${{ secrets.DOCKERHUB_TOKEN }}

(…same for redis:7-alpine → ci/redis:7-alpine, mysql:8.0 → ci/mysql:8.0, starburstdata/presto:350-e.6 → ci/presto:350-e.6. Every credentials: block on a mirrored service is removed.)
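A rough sanity check for the repoint is to grep the touched workflow files for anything the diff should have removed. The helper below is a hypothetical sketch: `leftover_refs` is not part of either PR, it is a plain text match rather than a YAML parser, and the `.github/workflows/` location in the usage note is assumed from the standard GitHub Actions layout.

```shell
# Print lines that still carry a credentials: key or a Docker Hub image ref for
# one of the four mirrored services. Prints nothing when the files are clean.
leftover_refs() {
  grep -nE \
    '^[[:space:]]*credentials:|^[[:space:]]*image:[[:space:]]*(postgres|redis|mysql|starburstdata/presto):' \
    "$@" || true
}
```

For example, `leftover_refs .github/workflows/superset-e2e.yml .github/workflows/superset-python-integrationtest.yml .github/workflows/superset-python-presto-hive.yml` should print nothing once every mirrored service has been repointed. A `credentials:` block on a service that is intentionally not mirrored would also be reported, so any hits need a manual look.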

Out of scope

The bde2020/hive-metastore-postgresql image is pulled via docker compose (not a services: block), so it never hit the parse error. Mirroring it is a separate, optional follow-up.

TESTING INSTRUCTIONS

The mirror workflow is exercised by running it from the Actions tab; its job summary lists each docker.io/... → ghcr.io/... copy. The repoint is validated by the existing integration/E2E/Presto-Hive suites once it lands.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration

🤖 Generated with Claude Code

Add a scheduled/dispatchable workflow that mirrors the Docker Hub service-
container images CI depends on (postgres, redis, mysql, presto) into this
repo's GHCR namespace under a ci/ prefix.

This is the groundwork for replacing anonymous Docker Hub service pulls
(which share the runner-IP rate limit and flake on master/same-repo PRs)
with public GHCR pulls that need no credentials — so the consuming
workflows can drop the credentials: blocks entirely and fork PRs work
unchanged. Adding credentials: directly to the service blocks (as #40875
did) breaks forks: empty secrets resolve to '' and GitHub rejects the
workflow at parse time.

The matrix mirrors only the images declared as services: containers. The
bde2020 hive-metastore image pulled via docker compose is left for a
follow-up.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
github-actions bot added the github_actions label (Pull requests that update GitHub Actions code) Jun 9, 2026
@rusackas rusackas marked this pull request as ready for review June 9, 2026 04:14
rusackas commented Jun 9, 2026

Member Author

Follow-up repoint is staged as a draft in #40882 — ready to rebase + un-draft once this merges, the mirror runs, and the four ci/* packages are made public.

rusackas added a commit that referenced this pull request Jun 9, 2026
Repoint the postgres/redis/mysql/presto service containers across the
E2E, Python-Integration, and Presto/Hive workflows at the GHCR mirror
(ghcr.io/apache/superset/ci/*).

Public GHCR images pull without Docker Hub's anonymous rate limit and
without any credentials, so this removes the service-pull flakes on
master/same-repo PRs while keeping fork PRs working — unlike credentials:
on the service blocks (#40875, reverted in #40879), where empty fork
secrets resolve to '' and fail the workflow at parse time.

Depends on the GHCR mirror being populated and public — see the mirror
workflow in #40880.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Labels

github_actions Pull requests that update GitHub Actions code size/L
