
feat: implement AWS ECR credentials loader in Kubernetes keychains #3864

Open
Nachiket-Roy wants to merge 12 commits into knative:main from Nachiket-Roy:feat/ecr-cred-loader


Conversation

@Nachiket-Roy
Contributor

Summary

This change adds programmatic AWS Elastic Container Registry (ECR) authentication support when resolving OCI registry keychains, aligning it with the existing GCP (Google Container Registry) and Azure (ACR) patterns. Previously, the ECR credentials loader was left unimplemented (GetECRCredentialLoader returned an empty slice), which prevented automatic, programmatic ECR credential resolution when pushing/pulling images in Kubernetes keychains workflows.

What changes were made?

  1. Registry Matching (isECRRegistry): Added matching logic to detect if a registry hostname corresponds to an AWS ECR registry. This includes public.ecr.aws as well as private registry formats across various AWS partitions.
  2. ECR Helper Integration (GetECRCredentialLoader): Leveraged the official amazon-ecr-credential-helper/ecr-login library programmatically, wrapped via authn.NewKeychainFromHelper().
  3. Graceful Fallback: Silenced standard library logging with io.Discard and ensured that if no ambient credentials exist, creds.ErrCredentialsNotFound is returned, letting subsequent credential loaders in the chain attempt authentication.

Testing

  • Unit tests were added (pkg/k8s/keychains_test.go)
  • Tests were verified locally

Closes #3863

@knative-prow

knative-prow Bot commented May 28, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Nachiket-Roy
Once this PR has been reviewed and has the lgtm label, please assign lkingland for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@linux-foundation-easycla

linux-foundation-easycla Bot commented May 28, 2026

CLA Signed
The committers listed above are authorized under a signed CLA.

  • ✅ login: Nachiket-Roy / name: Nachiket Roy (9df4cbc)

@knative-prow knative-prow Bot requested review from dsimansk and jrangelramos May 28, 2026 20:31
@knative-prow

knative-prow Bot commented May 28, 2026

Welcome @Nachiket-Roy! It looks like this is your first PR to knative/func 🎉

@knative-prow knative-prow Bot added size/L 🤖 PR changes 100-499 lines, ignoring generated files. needs-ok-to-test 🤖 Needs an org member to approve testing labels May 28, 2026
@knative-prow

knative-prow Bot commented May 28, 2026

Hi @Nachiket-Roy. Thanks for your PR.

I'm waiting for a knative member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@gauron99 gauron99 requested review from Copilot and removed request for dsimansk and jrangelramos May 29, 2026 06:08
Contributor

Copilot AI left a comment


Pull request overview

This PR adds programmatic AWS ECR authentication support to the Kubernetes keychain credential loader pipeline, aligning it with the existing GCR and ACR loaders so image push/pull can automatically resolve ECR credentials.

Changes:

  • Add ECR registry hostname detection (isECRRegistry) and wire an ECR credential loader using amazon-ecr-credential-helper via authn.NewKeychainFromHelper.
  • Return creds.ErrCredentialsNotFound when the loader should not apply (non‑ECR registries) to allow other loaders to proceed.
  • Add unit tests for ECR registry detection and basic loader behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| pkg/k8s/keychains.go | Implements ECR registry detection and an ECR credential loader using the AWS ECR credential helper. |
| pkg/k8s/keychains_test.go | Adds tests for ECR registry detection and ECR loader fallback behavior. |


Comment thread pkg/k8s/keychains.go
Comment thread pkg/k8s/keychains_test.go
Comment thread pkg/k8s/keychains_test.go Outdated
@Nachiket-Roy Nachiket-Roy marked this pull request as draft May 29, 2026 07:03
@knative-prow knative-prow Bot added the do-not-merge/work-in-progress 🤖 PR should not merge because it is a work in progress. label May 29, 2026
@Nachiket-Roy Nachiket-Roy marked this pull request as ready for review May 29, 2026 11:18
@knative-prow knative-prow Bot removed the do-not-merge/work-in-progress 🤖 PR should not merge because it is a work in progress. label May 29, 2026
@knative-prow knative-prow Bot requested review from dsimansk and jrangelramos May 29, 2026 11:18
@Nachiket-Roy Nachiket-Roy requested a review from Copilot May 29, 2026 11:19
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

Comment thread pkg/k8s/keychains_test.go Outdated
Comment thread pkg/k8s/keychains.go Outdated
Comment thread pkg/k8s/keychains.go Outdated
Comment thread pkg/k8s/keychains.go Outdated
Comment thread pkg/k8s/keychains.go
@Nachiket-Roy Nachiket-Roy marked this pull request as draft May 29, 2026 11:24
@knative-prow knative-prow Bot added the do-not-merge/work-in-progress 🤖 PR should not merge because it is a work in progress. label May 29, 2026
@Nachiket-Roy Nachiket-Roy marked this pull request as ready for review May 29, 2026 11:30
@knative-prow knative-prow Bot removed the do-not-merge/work-in-progress 🤖 PR should not merge because it is a work in progress. label May 29, 2026
@Nachiket-Roy Nachiket-Roy requested a review from Copilot May 29, 2026 11:30
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment thread pkg/k8s/keychains.go Outdated
Comment thread pkg/k8s/keychains_test.go Outdated
@Nachiket-Roy Nachiket-Roy requested a review from Copilot May 29, 2026 11:34
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Comment thread pkg/k8s/keychains.go
Comment thread pkg/k8s/keychains.go
Comment thread pkg/k8s/keychains.go
@matejvasek
Contributor

/ok-to-test

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Comment thread pkg/k8s/keychains.go Outdated
Comment thread pkg/k8s/keychains.go Outdated
Comment thread pkg/k8s/keychains_test.go
@Nachiket-Roy Nachiket-Roy requested a review from Copilot June 3, 2026 10:37
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

Comment thread pkg/k8s/keychains.go Outdated
Comment thread pkg/k8s/keychains.go Outdated
Comment thread pkg/k8s/keychains.go
Comment thread pkg/k8s/keychains_test.go
@Nachiket-Roy Nachiket-Roy marked this pull request as draft June 3, 2026 10:39
@knative-prow knative-prow Bot added the do-not-merge/work-in-progress 🤖 PR should not merge because it is a work in progress. label Jun 3, 2026
@matejvasek matejvasek marked this pull request as ready for review June 3, 2026 10:59
@knative-prow knative-prow Bot removed the do-not-merge/work-in-progress 🤖 PR should not merge because it is a work in progress. label Jun 3, 2026
@matejvasek
Contributor

The semaphore introduces a cross-registry blocking problem. Consider:

  1. Goroutine A starts for registry X, acquires semaphore (1/2 used)
  2. Resolve() hangs on IMDS, 5s timeout fires, function returns error to caller
  3. Goroutine A is still running, holding its semaphore slot (1/2 used)
  4. Goroutine B starts for registry Y (different registry), acquires semaphore (2/2 used)
  5. Same thing — timeout, goroutine still running (2/2 used)
  6. Any subsequent ECR lookup for any registry blocks on Acquire, then fails with "queue full"

After just 2 timeouts, potentially for completely unrelated registries, all ECR credential lookups are dead until one of the leaked goroutines' Resolve() calls eventually returns (which depends on the AWS SDK's internal HTTP timeout, typically 30s+).

The semaphore doesn't prevent the goroutine leak (the leaked goroutine holds the slot until Resolve completes), it just caps the damage at 2 leaked goroutines while introducing a new failure mode where unrelated registries block each other.

I'd suggest dropping it. The goroutine leak is inherent to Resolve() / Get() not accepting a context.Context — it can't be fixed at this layer. The TTL cache and not caching timeouts (both good changes in this revision) are sufficient mitigation: a leaked goroutine runs to completion eventually, and the next lookup retries cleanly.

@matejvasek
Contributor

The caching here is more complex than it needs to be. The ECR helper already has its own file-based credential cache (cache.BuildCredentialsCache) that stores successful auth tokens with expiration and even falls back to expired tokens when the API fails. So caching successes in the loader is redundant — and the current code already avoids that, which is good.

The only value of caching at this layer is avoiding repeated 5-second timeout hangs when credentials aren't configured. For that, a TTL'd ecrCacheEntry struct is overkill. A simple set of failed registries would do:

var failedRegistries sync.Map

// early return
if _, ok := failedRegistries.Load(registry); ok {
    return oci.Credentials{}, creds.ErrCredentialsNotFound
}

// after the lookup, only cache "no credentials" errors (not timeouts)
if errors.Is(resErr, creds.ErrCredentialsNotFound) {
    failedRegistries.Store(registry, struct{}{})
}

No TTL is needed — if AWS credentials aren't configured, they won't appear mid-process. The closure scoping already ensures the state is fresh per GetECRCredentialLoader() call, so a process restart clears it.
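A runnable version of the negative cache suggested above, assuming a loader shaped like `func(registry string) (Credentials, error)`. `newCachingLoader`, `Credentials`, `ErrCredentialsNotFound`, and `errTimeout` are hypothetical stand-ins, not the PR's identifiers. The `sync.Map` lives inside the closure, so each call to `newCachingLoader` starts with an empty set, and only definitive not-found answers are remembered.

```go
package main

import (
	"errors"
	"sync"
)

var (
	ErrCredentialsNotFound = errors.New("credentials not found")
	errTimeout             = errors.New("timed out")
)

// Credentials is a hypothetical stand-in for oci.Credentials.
type Credentials struct{ Username, Password string }

// newCachingLoader wraps lookup with a set of registries known to have no credentials.
func newCachingLoader(lookup func(registry string) (Credentials, error)) func(string) (Credentials, error) {
	var failedRegistries sync.Map // registry -> struct{}; fresh per call to newCachingLoader
	return func(registry string) (Credentials, error) {
		if _, ok := failedRegistries.Load(registry); ok {
			return Credentials{}, ErrCredentialsNotFound
		}
		c, err := lookup(registry)
		// Cache only "no credentials"; timeouts and other errors are retried next time.
		if errors.Is(err, ErrCredentialsNotFound) {
			failedRegistries.Store(registry, struct{}{})
		}
		return c, err
	}
}
```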

Alternatively, drop the cache entirely. The 5-second timeout is bounded, the helper caches its own successes, and isECRRegistry filters out non-ECR registries. The worst case is a few 5-second waits during a single push when credentials aren't configured — which is arguably correct behavior.

@matejvasek
Contributor

isECRRegistry is duplicating what the helper already does internally. ECRHelper.Get() calls api.ExtractRegistry() as its very first operation — a regex match, no network calls, no AWS SDK initialization — and returns credentials.NewErrCredentialsNotFound() for non-ECR registries.

Removing isECRRegistry and letting the helper handle registry detection would:

  • Fix the missing partition coverage (.amazonaws.eu, .on.aws, .cloud.adc-e.uk, .csp.hci.ic.gov, etc.) that the helper's regex already supports
  • Fix the missing ecr-public.aws.com dual-stack hostname
  • Keep detection automatically in sync with the helper across version upgrades
  • Remove ~15 lines of code and the associated tests

The only argument for keeping it is avoiding the goroutine spawn for non-ECR registries, but that's a micro-optimization around the goroutine pattern itself.
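To illustrate why hand-rolled detection drifts, here is a deliberately simplified hostname check. The pattern is an assumption that covers only `<12-digit account>.dkr.ecr[-fips].<region>.amazonaws.com[.cn]` and `public.ecr.aws`. It is not the helper's `api.ExtractRegistry` regex, and `isECRHost` is a hypothetical name. Like the `isECRRegistry` discussed above, it misses the `ecr-public.aws.com` dual-stack hostname and the other partition suffixes listed, which is the kind of gap that delegating to the helper avoids.

```go
package main

import (
	"regexp"
	"strings"
)

// ecrHostPattern is a simplified, assumed pattern for illustration only.
var ecrHostPattern = regexp.MustCompile(
	`^(?:[0-9]{12}\.dkr\.ecr(?:-fips)?\.[a-z0-9-]+\.amazonaws\.com(?:\.cn)?|public\.ecr\.aws)$`)

// isECRHost reports whether host matches the simplified pattern above.
func isECRHost(host string) bool {
	return ecrHostPattern.MatchString(strings.ToLower(host))
}
```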

@Nachiket-Roy Nachiket-Roy marked this pull request as draft June 3, 2026 12:39
@knative-prow knative-prow Bot added the do-not-merge/work-in-progress 🤖 PR should not merge because it is a work in progress. label Jun 3, 2026
@matejvasek
Copy link
Copy Markdown
Contributor

My PR to the AWS ECR credential helper was merged.

@Nachiket-Roy
Contributor Author

@matejvasek I will update my PR in a few hours.

@Nachiket-Roy Nachiket-Roy force-pushed the feat/ecr-cred-loader branch from 426d4a5 to 463f3c7 June 6, 2026 16:31
@Nachiket-Roy Nachiket-Roy marked this pull request as ready for review June 6, 2026 16:32
@knative-prow knative-prow Bot removed the do-not-merge/work-in-progress 🤖 PR should not merge because it is a work in progress. label Jun 6, 2026

Labels

ok-to-test 🤖 Non-member PR verified by an org member that is safe to test. size/L 🤖 PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature : Implement AWS ECR credentials loader in Kubernetes keychains

3 participants