New Registry url for Kubernetes (registry.k8s.io)

Arnaud M edited this page Oct 14, 2022 · 4 revisions

Background

For a few years now, all our repositories have used k8s.gcr.io as the default registry for downloading images. We are used to the image promoter process, which promotes images to the official Kubernetes container registry using the infrastructure (GCR staging repos, etc.) provided by sig-k8s-infra.

Why a new URL?

So far we (the Kubernetes project) have used GCP as our default infrastructure provider for everything: GCS, GCR, GKE-based prow clusters, and so on. Google has graciously sponsored a lot of our infrastructure costs as well. However, for about a year now our costs have been skyrocketing because community usage of this infrastructure comes from other cloud providers such as AWS and Azure. So, in conjunction with CNCF staff, we are putting together a plan to host copies of images and binaries nearer to where they are used rather than incur cross-cloud costs.

One part of this plan is to set up an opinionated OCI proxy service that can identify where traffic is coming from and redirect it to the nearest image layer/repository. This is why we are setting up a new service, using what we call an oci-proxy, for everyone to use. The proxy will identify traffic coming from, for example, a certain AWS region, and then issue an HTTP redirect to a source in that AWS region. If we get traffic from GKE/GCP, or we don't know where the traffic is coming from, it will still redirect to the current GCP infrastructure.
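
The redirect decision described above can be sketched roughly as follows. This is only an illustration, not the oci-proxy code: the CIDR blocks are reserved documentation ranges, the region keys and hostnames are placeholders, and the 307 status code is an assumption (the text only says "HTTP redirect").

```python
import ipaddress
from typing import Tuple

# Hypothetical values for illustration only: reserved documentation CIDRs
# and placeholder hostnames, not real oci-proxy configuration.
REGION_CIDRS = {
    "aws-region-a": [ipaddress.ip_network("192.0.2.0/24")],
    "aws-region-b": [ipaddress.ip_network("198.51.100.0/24")],
}
REGION_BACKENDS = {
    "aws-region-a": "https://layers-region-a.example.com",
    "aws-region-b": "https://layers-region-b.example.com",
}
# Stands in for the current GCP infrastructure (placeholder hostname).
DEFAULT_BACKEND = "https://gcp-default.example.com"


def backend_for(client_ip: str) -> str:
    """Return the backend base URL for a client, defaulting to GCP."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return DEFAULT_BACKEND  # unparseable source: treat as unknown
    for region, networks in REGION_CIDRS.items():
        if any(addr in net for net in networks):
            return REGION_BACKENDS[region]
    return DEFAULT_BACKEND  # GCP traffic or unknown origin


def redirect_for(client_ip: str, path: str) -> Tuple[int, str]:
    """Build an HTTP redirect as (status, Location) for a requested path."""
    # 307 is an assumed status code chosen for this sketch.
    return 307, backend_for(client_ip) + path
```

Any address that matches no known range, including one that cannot be parsed at all, ends up at the default backend, which mirrors the "we don't know where the traffic is coming from" case.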

How can we help?

When Kubernetes master opens up for v1.25 development, we need to update all default URLs in our code and test harness to the new registry URL. As a team, sig-k8s-infra is signing up to ensure that registry.k8s.io will be as robust and available as the current setup. As a backup, we will continue to run the current k8s.gcr.io as well, so do not worry about that going away. Turning on traffic to the new URL will help us monitor and fix things if/when they break, and it will let us tune traffic and lower our cost of operation.
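
As a rough illustration of what updating a default URL means, the helper below swaps only the registry host of an image reference. It is a minimal sketch that assumes image paths and tags are identical under both hosts; `rewrite_image` is a hypothetical name, not part of any Kubernetes tooling, and the image in the usage note is only an example.

```python
OLD_REGISTRY = "k8s.gcr.io"
NEW_REGISTRY = "registry.k8s.io"


def rewrite_image(ref: str) -> str:
    """Swap the registry host of an image reference, leaving the rest intact.

    Assumes the path and tag are the same under both registries.
    """
    host, sep, rest = ref.partition("/")
    # Only rewrite when the host is exactly the old registry.
    if sep and host == OLD_REGISTRY:
        return f"{NEW_REGISTRY}/{rest}"
    return ref
```

For example, `rewrite_image("k8s.gcr.io/pause:3.7")` gives `registry.k8s.io/pause:3.7`, while references to other registries are returned unchanged.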

What exactly are you doing?

  • We are setting up an AWS account with an IAM role and S3 buckets in AWS regions where we see a large percentage of container image pull traffic
  • We will iterate on a sandbox URL (registry-sandbox.k8s.io) for our experiments and ONLY promote features and code changes to registry.k8s.io when we have complete confidence in them
  • Both registry.k8s.io and registry-sandbox.k8s.io serve traffic using oci-proxy on Google Cloud Run
  • oci-proxy will be updated to identify incoming traffic from AWS regions based on IP ranges so that we can route traffic to AWS S3 buckets in that region. If a specific AWS region does not currently host AWS S3 buckets, we will redirect to the nearest region that does (a tradeoff between storage and network costs)
  • We will bulk sync existing container image layers to these AWS S3 buckets as a starting point (from GCS/GCR)
  • We will update the image-promoter to push to these AWS S3 buckets in addition to the current setup
  • We will set up monitoring/reporting to track the new costs we incur on the AWS infrastructure, and update what we do for the GCP infrastructure to include the new components
  • We will have a plan in place for adding more AWS regions in the future
  • We will have CI jobs that will run against registry-sandbox.k8s.io as well to monitor stability before we promote code to the production registry
  • We will automate the deployment/monitoring and testing of code landing in the oci-proxy repository
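
The nearest-region fallback from the list above could look something like the sketch below. The region identifiers are real AWS names, but which regions host buckets and the fallback order are invented for illustration; `bucket_region` is a hypothetical helper, not the oci-proxy implementation.

```python
from typing import Optional

# Hypothetical data for illustration: which regions host S3 buckets and the
# preferred fallback order are assumptions, not the project's real layout.
REGIONS_WITH_BUCKETS = {"us-east-1", "us-west-2", "eu-central-1"}
NEAREST_FALLBACKS = {
    "us-east-2": ["us-east-1", "us-west-2"],
    "eu-west-1": ["eu-central-1"],
    "ap-south-1": [],  # no nearby region with buckets in this sketch
}


def bucket_region(client_region: str) -> Optional[str]:
    """Pick the region whose buckets should serve a client in client_region.

    Returns None when no suitable region exists, in which case the caller
    would keep redirecting to the current GCP infrastructure.
    """
    if client_region in REGIONS_WITH_BUCKETS:
        return client_region
    for candidate in NEAREST_FALLBACKS.get(client_region, []):
        if candidate in REGIONS_WITH_BUCKETS:
            return candidate
    return None
```

Keeping the fallback order as data rather than code is one way to make adding a region a configuration change: add the region to the bucket set and adjust the neighbors' fallback lists.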

What is not in scope

  • Currently, we focus on AWS only. We are getting a lot of help from AWS, both with technical details and with targeted infrastructure costs for standing up and running this infrastructure

What are good goals to shoot for

  • In terms of cost reduction, monitor GCP infrastructure and get to the point where we fully avoid serving large binary image layers from GCR/GCS
  • We can add other AWS regions and clouds as needed in a well-known, documented way
  • Seamless transition for the community from the old k8s.gcr.io to registry.k8s.io, with the same rock-solid stability we now have with k8s.gcr.io