diff --git a/projects/k8gb/tech-review/2026-01-30.md b/projects/k8gb/tech-review/2026-01-30.md new file mode 100644 index 000000000..f3113c4e0 --- /dev/null +++ b/projects/k8gb/tech-review/2026-01-30.md @@ -0,0 +1,231 @@ +# General Technical Review - [k8gb] / [Incubation] + +* **Project:** k8gb +* **Project Version:** v0.17.0/Sandbox +* **Website:** +* **Date Updated:** 2026-01-30 +* **Template Version:** v1.0 +* **Reviewer:** Kashif Khan, Co-chair, TAG Infrastructure +* **Description:** + +A Global Service Load Balancing solution for Kubernetes + +## Overall Assessment + +This technical review finds k8gb satisfactory overall for progression from Sandbox to Incubation status. The project demonstrates strong technical maturity with a well-architected, Kubernetes-native solution that is proven in production by multiple adopters. Security practices are adequate. The documentation is thorough, with clear tutorials and integration guides. The multi-cluster distributed architecture eliminates single points of failure and follows cloud-native principles effectively. + +While some areas could be strengthened, such as formalizing the roadmap planning process and publishing load testing results, these are typical areas for growth toward eventual graduation rather than blockers for incubation. The project shows healthy community engagement, regular quarterly releases, and responsive security maintenance. The technical implementation is sound, the use case is well-defined, and the project fills a genuine gap in the Kubernetes ecosystem for DNS-based global load balancing. + +## Day 0 - Planning Phase + +### Scope + + * Describe the roadmap process, how scope is determined for mid to long term features, as well as how the roadmap maps back to current contributions and maintainer ladder? 
+ * The project maintains a [roadmap](https://github.com/k8gb-io/k8gb/blob/master/ROADMAP.md) and tracks work through a [GitHub project board](https://github.com/orgs/k8gb-io/projects/2/views/2), though the formal process for roadmap planning isn't explicitly documented. + * Describe the target persona or user(s) for the project? + * The project targets platform engineers and SREs managing multi-cluster Kubernetes deployments that need global load balancing ([intro documentation](https://www.k8gb.io/intro/)). + * Explain the primary use case for the project. What additional use cases are supported by the project? + * Multi-cluster load balancing with health-based failover is the core use case. The project also supports geo-routing and split-brain prevention across clusters. See the [strategy documentation](https://www.k8gb.io/strategy/) for implementation details. + * Explain which use cases have been identified as unsupported by the project. + * While k8gb can work in [single-cluster mode](https://www.k8gb.io/intro/?h=single#1-basic-single-cluster) for testing, it's not designed for non-DNS based routing or application-layer load balancing. The primary value is in multi-cluster scenarios. + * Describe the intended types of organizations who would benefit from adopting this project. (i.e. financial services, any software manufacturer, organizations providing platform engineering services)? + * Any organization running multi-region or multi-cloud Kubernetes that needs high availability and disaster recovery capabilities. The [ADOPTERS list](https://github.com/k8gb-io/k8gb/blob/master/ADOPTERS.md) shows a variety of companies already using k8gb in production. + * Please describe any completed end user research and link to any reports. + * I did not find any research reports. + +### Usability + + * How should the target personas interact with your project? 
+ * Users install the project via the [Helm chart](https://github.com/k8gb-io/k8gb/tree/master/chart/k8gb), configure their DNS provider credentials, and declare `Gslb` resources to define load balancing behavior for their services. + * Describe the user experience (UX) and user interface (UI) of the project. + * The typical workflow involves: installing the Helm chart, configuring DNS provider credentials and cluster geo tags, creating a `Gslb` resource pointing to an Ingress or Service, and monitoring via standard Kubernetes tools. + * The learning curve is eased by comprehensive [tutorials](https://www.k8gb.io/tutorials/) and [reference documentation](https://www.k8gb.io/resource_ref/). + * Describe how this project integrates with other projects in a production environment. + * k8gb plays well with any Kubernetes ingress controller, external DNS providers like Route53 or GCP DNS, and standard monitoring tools. The [component documentation](https://www.k8gb.io/components/) explains the integration points. + +### Design + + * Explain the design principles and best practices the project is following. + * The architecture is built around being Kubernetes-native with DNS-based load balancing and avoiding any single points of failure. The [project website](https://www.k8gb.io/) explains the design philosophy. + * Outline or link to the project's architecture requirements? Describe how they differ for Proof of Concept, Development, Test and Production environments, as applicable. + * For PoC/Dev work, one can use 2+ local k3d clusters. Production deployments need 2+ geo-dispersed Kubernetes clusters with an external DNS provider configured. The [intro](https://www.k8gb.io/intro/) walks through both scenarios. + * Define any specific service dependencies the project relies on in the cluster. + * CoreDNS (bundled with k8gb), an ingress controller, and an external DNS provider. Details in the [components guide](https://www.k8gb.io/components/). 
+ * Describe how the project implements Identity and Access Management. + * Uses a ServiceAccount with RBAC, runs as non-root, and mounts a read-only filesystem. Creation of `Gslb` resources is restricted to users with appropriate RBAC permissions. + * Describe how the project has addressed sovereignty. + * Each cluster operates independently without a centralized control plane—coordination happens purely through DNS. + * Describe any compliance requirements addressed by the project. + * The project has earned an [OpenSSF Best Practices badge](https://www.bestpractices.dev/en/projects/4866), uses DCO for contributions, completed a TAG Security [self-assessment](https://github.com/cncf/toc/issues/1472#security), and also has SLSA with signed releases and SBOM. + * Describe the project’s High Availability requirements. + * The multi-cluster distributed architecture eliminates single points of failure. Health-based failover is built in ([strategy docs](https://www.k8gb.io/strategy/)). + * Describe the project’s resource requirements, including CPU, Network and Memory. + * Pretty lightweight: operator needs 100m-500m CPU and 32-128Mi memory; CoreDNS uses 100m CPU and 128Mi memory. See the [Helm values](https://github.com/k8gb-io/k8gb/blob/master/chart/k8gb/values.yaml) for defaults. + * Describe the project’s storage requirements, including its use of ephemeral and/or persistent storage. + * No persistent storage needed—just ephemeral volumes. All state lives in Kubernetes CRDs. + * Please outline the project’s API Design: + * Describe the project's API topology and conventions + * The main API is the [Gslb CRD](https://www.k8gb.io/resource_ref/) (k8gb.absa.oss/v1beta1), which follows standard Kubernetes conventions. Full API reference at [doc.crds.dev](https://doc.crds.dev/github.com/k8gb-io/k8gb). 
+ * Describe the project defaults + * Out of the box: roundRobin strategy, 30 second TTL, ClusterIP service type ([strategy.type](https://doc.crds.dev/github.com/k8gb-io/k8gb/k8gb.absa.oss/Gslb/v1beta1@v0.17.0#spec-strategy-type)). + * Outline any additional configurations from default to make reasonable use of the project + * You'll typically configure: resourceRef (what to load balance), strategy type, DNS zones, and cluster geo tags. The [tutorials](https://www.k8gb.io/tutorials/) walk through common setups. + * Describe any new or changed API types and calls \- including to cloud providers \- that will result from this project being enabled and used + * Adds Gslb and DNSEndpoint CRDs to your cluster. Doesn't break or modify the Kubernetes API itself. ([API docs](https://doc.crds.dev/github.com/k8gb-io/k8gb)) + * The operator makes API calls to external DNS providers such as Route53, GCP Cloud DNS, and others to manage DNS records. + * Describe compatibility of any new or changed APIs with API servers, including the Kubernetes API server + * Works with Kubernetes 1.21+ and any compliant ingress controller. Version details in the [documentation](https://www.k8gb.io/intro/). + * Describe versioning of any new or changed APIs, including how breaking changes are handled + * Currently v1beta1. The project follows Kubernetes versioning practices and SemVer, maintaining backward compatibility. Check the [CHANGELOG](https://github.com/k8gb-io/k8gb/blob/master/CHANGELOG.md) for version history. + * Note: Work is underway to adopt a new API group name, see [k8gb-io/k8gb#2180](https://github.com/k8gb-io/k8gb/issues/2180). + * Describe the project’s release processes, including major, minor and patch releases. + * Releases happen quarterly following SemVer. The process is [automated](https://github.com/k8gb-io/k8gb/blob/master/CONTRIBUTING.md#release-process) with SLSA provenance, cosign signatures, and SBOM generation. 
+ + * Describe how the project is installed and initialized, e.g. a minimal install with a few lines of code or does it require more complex integration and configuration? + * Installation is straightforward via Helm chart or kubectl apply, but you'll need to configure DNS zones and set up your external DNS provider first. The [tutorials](https://www.k8gb.io/tutorials/) and [quick start](https://github.com/k8gb-io/k8gb?tab=readme-ov-file#quick-start) guide you through it. + * How does an adopter test and validate the installation? + * Create a test Gslb resource, verify DNS resolution works, check CoreDNS and operator logs. There are also [terratest integration tests](https://github.com/k8gb-io/k8gb/tree/master/terratest) you can run, and a [local playground](https://www.k8gb.io/local/) setup for validation. + +### Security + + * Please provide a link to the project's cloud native [security self assessment](https://tag-security.cncf.io/community/assessments/). + * https://github.com/k8gb-io/k8gb/blob/master/self-assessment.md and https://github.com/cncf/tag-security/pull/1446 + * Please review the [Cloud Native Security Tenets](https://github.com/cncf/tag-security/blob/main/community/resources/security-whitepaper/secure-defaults-cloud-native-8.md) from TAG Security. + * How are you satisfying the tenets of cloud native security projects? + * Runs as non-root, uses read-only filesystem, applies minimal RBAC, signs releases, and provides SBOM. The [self-assessment](https://github.com/k8gb-io/k8gb/blob/master/self-assessment.md) covers all the details. + * Describe how each of the cloud native principles apply to your project. + * Distributed by design, immutable infrastructure, declarative APIs. See the [actors section](https://github.com/k8gb-io/k8gb/blob/master/self-assessment.md#actors) of the self-assessment. + * How do you recommend users alter security defaults in order to "loosen" the security of the project? 
Please link to any documentation the project has written concerning these use cases. + * I don't recommend loosening security, but if you really need to, you can modify the security context through [Helm values](https://github.com/k8gb-io/k8gb/blob/master/chart/k8gb/values.yaml). + * Security Hygiene + * Please describe the frameworks, practices and procedures the project uses to maintain the basic health and security of the project. + * 2FA is required for maintainers, DCO for all commits, documented [security reporting process](https://github.com/k8gb-io/k8gb/blob/master/SECURITY.md), regular dependency updates, and an OpenSSF badge. + * Describe how the project has evaluated which features will be a security risk to users if they are not maintained by the project? + * The main security-sensitive areas are external DNS credentials, CoreDNS exposure to the network, and RBAC permissions—all called out in the [security functions section](https://github.com/k8gb-io/k8gb/blob/master/self-assessment.md#security-functions-and-features). + * Cloud Native Threat Modeling + * Explain the least minimal privileges required by the project and reasons for additional privileges. + * Needs read/write on Gslb and DNSEndpoint CRDs, plus read access to Ingress, Service, and ConfigMap resources. RBAC permissions are defined in the [Helm chart](https://github.com/k8gb-io/k8gb/tree/master/chart/k8gb). + * Describe how the project is handling certificate rotation and mitigates any issues with certificates. + * The operator itself doesn't require TLS certificates. External DNS provider credentials are user-managed. + * Describe how the project is following and implementing [secure software supply chain best practices](https://project.linuxfoundation.org/hubfs/CNCF\_SSCP\_v1.pdf) + * SLSA Level 3 provenance, cosign signatures, SBOM in every release, reproducible builds. Details in [CONTRIBUTING.md](https://github.com/k8gb-io/k8gb/blob/master/CONTRIBUTING.md#signed-releases). 
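
The default `roundRobin` strategy and the health-based failover described under Design can be illustrated with a short Go sketch. This is not k8gb's implementation: the `Endpoint` type and the `healthyIPs` and `rotate` helpers are hypothetical names introduced only for illustration, and the addresses are made up. It shows the general idea that unhealthy endpoints drop out of the answer set and the remaining addresses are rotated between queries.

```go
package main

import "fmt"

// Endpoint is a hypothetical stand-in for one cluster's ingress address
// together with the result of its most recent health check.
type Endpoint struct {
	IP      string
	Healthy bool
}

// healthyIPs returns the addresses of endpoints that currently pass
// their health check, preserving the input order.
func healthyIPs(eps []Endpoint) []string {
	var out []string
	for _, e := range eps {
		if e.Healthy {
			out = append(out, e.IP)
		}
	}
	return out
}

// rotate returns a copy of ips rotated left by n positions (n >= 0),
// so that successive queries start with a different address.
func rotate(ips []string, n int) []string {
	if len(ips) == 0 {
		return nil
	}
	k := n % len(ips)
	out := make([]string, 0, len(ips))
	out = append(out, ips[k:]...)
	return append(out, ips[:k]...)
}

func main() {
	// Illustrative addresses only; the second cluster is marked unhealthy.
	eps := []Endpoint{
		{IP: "10.0.0.1", Healthy: true},
		{IP: "10.1.0.1", Healthy: false},
		{IP: "10.2.0.1", Healthy: true},
	}
	for query := 0; query < 3; query++ {
		fmt.Println(rotate(healthyIPs(eps), query))
	}
}
```

In this sketch the unhealthy address never appears in an answer, and the two healthy addresses alternate as the first entry across queries. How long a client keeps using a given answer is governed by the record TTL (30 seconds by default, per the defaults listed above).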
+ +## Day 1 \- Installation and Deployment Phase + +### Project Installation and Configuration + + * Describe what project installation and configuration look like. + * Install via Helm with your DNS zone and provider configured, then deploy Gslb CRDs for the applications you want to globally load balance. [Tutorials here](https://www.k8gb.io/tutorials/) and a [local setup guide](https://www.k8gb.io/local/) for testing. + +### Project Enablement and Rollback + + * How can this project be enabled or disabled in a live cluster? Please describe any downtime required of the control plane or nodes. + * Simple helm install/uninstall. No control plane downtime—only DNS resolution for Gslb-managed services is affected. + * Describe how enabling the project changes any default behavior of the cluster or running workloads. + * Nothing changes until you create a Gslb resource. Once enabled, it adds a CoreDNS deployment for DNS resolution. + * Describe how the project tests enablement and disablement. + * Through [terratest integration tests](https://github.com/k8gb-io/k8gb/tree/master/terratest) and a local playground setup. + * How does the project clean up any resources created, including CRDs? + * Helm uninstall removes the operator and CRDs; DNSEndpoint resources get cleaned up automatically. + +### Rollout, Upgrade and Rollback Planning + + * How does the project intend to provide and maintain compatibility with infrastructure and orchestration management tools like Kubernetes and with what frequency? + * Tested with Kubernetes 1.21 and newer, with [quarterly releases](https://github.com/k8gb-io/k8gb/releases) to keep up with the ecosystem. + * Describe how the project handles rollback procedures. + * Helm rollback is supported. There are [documented rollback procedures](https://www.k8gb.io/rollback_procedures/) to follow. + * How can a rollout or rollback fail? Describe any impact to already running workloads. 
+ * DNS resolution might temporarily fail, but your workloads keep running—only the DNS routing is affected. + * Describe any specific metrics that should inform a rollback. + * Watch for DNS query errors, reconciliation errors, or CoreDNS failures in the [metrics](https://www.k8gb.io/metrics/). + * Explain how upgrades and rollbacks were tested and how the upgrade->downgrade->upgrade path was tested. + * Integration tests cover upgrades. Downgrade paths are tested manually (see [terratest suite](https://github.com/k8gb-io/k8gb/tree/master/terratest)). + * Explain how the project informs users of deprecations and removals of features and APIs. + * Through the [CHANGELOG](https://github.com/k8gb-io/k8gb/blob/master/CHANGELOG.md) and GitHub releases. + * Explain how the project permits utilization of alpha and beta capabilities as part of a rollout. + * The API is currently v1beta1. New features get flagged in the documentation and CHANGELOG. + +## Day 2 \- Day-to-Day Operations Phase + +### Scalability/Reliability + + * Describe how the project increases the size or count of existing API objects. + * Scales with the number of Gslb resources you create—each one generates corresponding DNSEndpoint resources. + * Describe how the project defines Service Level Objectives (SLOs) and Service Level Indicators (SLIs). + * Exposes [metrics](https://www.k8gb.io/metrics/) for reconciliation time, DNS queries, and errors that you can use to define your own SLOs. + * Describe any operations that will increase in time covered by existing SLIs/SLOs. + * More Gslb resources, higher DNS query volume, or adding clusters to the topology will all increase load. + * Describe the increase in resource usage in any components as a result of enabling this project, to include CPU, Memory, Storage, Throughput. + * Pretty minimal footprint: around 100m CPU and 160Mi memory total for the operator plus CoreDNS for the deployment itself. 
+ * Describe which conditions enabling / using this project would result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.) + * Extremely high DNS query load to CoreDNS could exhaust network connections, but that's unlikely in typical deployments. + * Describe the load testing that has been performed on the project and the results. + * Integration tests exist, and multiple [adopters](https://github.com/k8gb-io/k8gb/blob/master/ADOPTERS.md) run it in production. No formal load testing results are published. + * Describe the recommended limits of users, requests, system resources, etc. and how they were obtained. + * No hard limits are documented—it scales with your cluster resources. The [ADOPTERS list](https://github.com/k8gb-io/k8gb/blob/master/ADOPTERS.md) shows production usage patterns. + * Describe which resilience pattern the project uses and how, including the circuit breaker pattern. + * Uses a health-based failover pattern—unhealthy endpoints automatically drop out of DNS responses. See the [strategy docs](https://www.k8gb.io/strategy/). + +### Observability Requirements + + * Describe the signals the project is using or producing, including logs, metrics, profiles and traces. Please include supported formats, recommended configurations and data storage. + * Produces [Prometheus metrics](https://www.k8gb.io/metrics/), structured logs (JSON or simple format), and optionally [OpenTelemetry traces](https://www.k8gb.io/traces/). + * Describe how the project captures audit logging. + * Relies on Kubernetes audit logging for API operations. The operator logs all reconciliation actions. + * Describe any dashboards the project uses or implements as well as any dashboard requirements. + * There's a Grafana dashboard available. Works with any Prometheus-compatible setup. 
([metrics docs](https://www.k8gb.io/metrics/)) + * Describe how the project surfaces project resource requirements for adopters to monitor cloud and infrastructure costs, e.g. FinOps + * Resource requests and limits show up in metrics. External DNS API calls will appear in your provider's billing. + * Which parameters is the project covering to ensure the health of the application/service and its workloads? + * Pod liveness and readiness probes, DNS query success rate, reconciliation errors. All visible in [metrics](https://www.k8gb.io/metrics/). + * How can an operator determine if the project is in use by workloads? + * Look for Gslb resources, DNSEndpoint resources, and the CoreDNS deployment in your cluster. + * How can someone using this project know that it is working for their instance? + * DNS queries should return the expected IPs, metrics should show successful reconciliation, and you shouldn't see errors in the logs. + * Describe the SLOs (Service Level Objectives) for this project. + * Not formally defined by the project. + * What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? + * Reconciliation success rate, DNS query success rate, and CoreDNS uptime, all available through [metrics](https://www.k8gb.io/metrics/). + +### Dependencies + + * Describe the specific running services the project depends on in the cluster. + * External DNS provider (like Route53 or GCP DNS), Kubernetes API server, and CoreDNS. [Component documentation](https://www.k8gb.io/components/) explains the dependencies. + * Describe the project's dependency lifecycle policy. + * Dependencies get updated regularly via Dependabot. Security vulnerabilities are addressed in patch releases. + * How does the project incorporate and consider source composition analysis as part of its development and security hygiene? Describe how this source composition analysis (SCA) is tracked. 
+ * Uses CodeQL, Dependabot, Renovate and GitHub security scanning. Every release includes an SBOM. + * Describe how the project implements changes based on source composition analysis (SCA) and the timescale. + * Critical vulnerabilities get fixed within days. Other issues are addressed in the next regular release cycle. + +### Troubleshooting + + * How does this project recover if a key component or feature becomes unavailable? e.g Kubernetes API server, etcd, database, leader node, etc. + * The operator will reconcile once the API server comes back. Meanwhile, CoreDNS serves cached DNS records during failures. [Strategy documentation](https://www.k8gb.io/strategy/) explains failover behavior. + * Describe the known failure modes. + * DNS provider API failures, CoreDNS becoming unavailable, network partitions between clusters. Common issues are documented in the [project documentation](https://www.k8gb.io/). + +### Compliance + + * What steps does the project take to ensure that all third-party code and components have correct and complete attribution and license notices? + * Maintains an Apache 2.0 LICENSE, generates SBOMs for all releases, and tracks third-party license compliance using automated scanning (FOSSA). + * Describe how the project ensures alignment with CNCF [recommendations](https://github.com/cncf/foundation/blob/main/policies-guidance/recommendations-for-attribution.md) for attribution notices. + * Follows CNCF guidance by providing a project LICENSE, SBOMs for all released artifacts, and continuous license compliance tracking via FOSSA. See the [LICENSE file](https://github.com/k8gb-io/k8gb/blob/master/LICENSE). + + * How are notices managed for third-party code incorporated directly into the project's source files? + * Apache 2.0 headers go in source files. Vendored code keeps its original licenses. + * How are notices retained for unmodified third-party components included within the project's repository? 
+ * The project does not vendor third-party source code; dependencies are consumed via Go modules, with license compliance tracked through FOSSA, so no aggregated notice (NOTICE file) is needed. + * How are notices for all dependencies obtained at build time included in the project's distributed build artifacts (e.g. compiled binaries, container images)? + * SBOM attestation is attached to container images via cosign. Details in [SECURITY.md](https://github.com/k8gb-io/k8gb/blob/master/SECURITY.md). + +### Security + + * Security Hygiene + * How is the project executing access control? + * Uses RBAC for Kubernetes resources with least-privilege service accounts. The [Helm chart](https://github.com/k8gb-io/k8gb/tree/master/chart/k8gb) defines all the permissions. + * Cloud Native Threat Modeling + * How does the project ensure its security reporting and response team is representative of its community diversity (organizational and individual)? + * Maintainers come from multiple organizations and handle security issues. The team is open to community security experts joining. + * How does the project invite and rotate security reporting team members? + * There's no formal rotation process. The team expands through the maintainer promotion process based on contributions. \ No newline at end of file