Skip to content

[feat gw api] Implement Gateway Status + Some bug fixes for Gateway #4189

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 29, 2025

Conversation

zac-nixon
Copy link
Collaborator

@zac-nixon zac-nixon commented May 20, 2025

Description

Implements Gateway conditions for both Programmed and Accepted. I had to refactor the route util in order to propagate the exact problem back to the gateway controller in order to properly implement the various failure causes on the Accepted condition.

I had to refactor the deployer to get back the LB status to correctly implement requeuing the Gateway in order to update the programmed condition when the LB finishes provisioning.

Added support infrastructure annotations, they get propagated to the TGB created by the Gateway.

Tests

  • Verified Accepted / Program conditions:
nixozach@80a997300bd5 aws-load-balancer-controller % kubectl -n gateway-nlb get gateway test-gw-nlb -o yaml
	apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  annotations:
    foo: baz
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"gateway.networking.k8s.io/v1","kind":"Gateway","metadata":{"annotations":{},"name":"test-gw-nlb","namespace":"gateway-nlb"},"spec":{"gatewayClassName":"nlb-gateway","listeners":[{"allowedRoutes":{"namespaces":{"from":"Same"}},"name":"test-2024","port":80,"protocol":"TCP"}]}}
  creationTimestamp: "2025-04-17T05:38:04Z"
  finalizers:
  - gateway.k8s.aws/nlb
  generation: 3
  name: test-gw-nlb
  namespace: gateway-nlb
  resourceVersion: "26639358"
  uid: 823cdc65-4ea0-4dfc-b8b5-48b3a7e86516
spec:
  gatewayClassName: nlb-gateway
  infrastructure:
    parametersRef:
      group: gateway.k8s.aws
      kind: LoadBalancerConfiguration
      name: local-nlb
  listeners:
  - allowedRoutes:
      namespaces:
        from: Same
    name: test-2024
    port: 80
    protocol: TCP
status:
  addresses:
  - type: Hostname
    value: k8s-gatewayn-testgwnl-3797607738-fbef7aa6d15b46aa.elb.us-east-1.amazonaws.com
  conditions:
  - lastTransitionTime: "2025-05-15T22:13:13Z"
    message: ""
    observedGeneration: 3
    reason: Accepted
    status: "True"
    type: Accepted
  - lastTransitionTime: "2025-05-16T23:03:00Z"
    message: arn:aws:elasticloadbalancing:us-east-1:565768096483:loadbalancer/net/k8s-gatewayn-testgwnl-3797607738/fbef7aa6d15b46aa
    observedGeneration: 3
    reason: Programmed
    status: "True"
    type: Programmed
  • Verified labels / annotations are applied from Gateway infrastructure parameter to TGB. (also backfilled tests here)

Checklist

  • Added tests that cover your change (if possible)
  • Added/modified documentation as required (such as the README.md, or the docs directory)
  • Manually tested
  • Made sure the title of the PR is a good description that can go into the release notes

BONUS POINTS checklist: complete for good vibes and maybe prizes?! 🤯

  • Backfilled missing tests for code in same general area 🎉
  • Refactored something and made the world a better place 🌟

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels May 20, 2025
@zac-nixon zac-nixon force-pushed the znixon/gateway-status branch from d91565a to d2b5899 Compare May 20, 2025 23:01
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 21, 2025
@zac-nixon zac-nixon force-pushed the znixon/gateway-status branch from d2b5899 to 030a637 Compare May 27, 2025 22:06
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 27, 2025
@zac-nixon zac-nixon force-pushed the znixon/gateway-status branch 2 times, most recently from 1812d4b to 459977e Compare May 27, 2025 22:32
@@ -186,12 +196,23 @@ func (r *gatewayReconciler) reconcileHelper(ctx context.Context, req reconcile.R
mergedLbConfig, err := r.cfgResolver.getLoadBalancerConfigForGateway(ctx, r.k8sClient, gw, gwClass)

if err != nil {
statusErr := r.updateGatewayStatusFailure(ctx, gw, gwv1.GatewayReasonInvalid, err)
if statusErr != nil {
r.logger.Error(err, "Unable to update gateway status on failure to retrieve attached config")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

err or statusErr?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be statusErr. thanks :)

@@ -31,15 +31,17 @@ func commonBackendLoader(ctx context.Context, k8sClient client.Client, typeSpeci

// We only support references of type service.
if backendRef.Kind != nil && *backendRef.Kind != "Service" {
return nil, nil
initialErrorMessage := "Backend Ref must be of kind 'Service'"
return nil, wrapError(errors.Errorf("%s", generateInvalidMessageWithRouteDetails(initialErrorMessage, routeKind, routeIdentifier)), gwv1.GatewayReasonListenersNotValid)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we reuse some of the existing error constant? e.g. gwv1.RouteReasonInvalidKind

Copy link
Collaborator Author

@zac-nixon zac-nixon May 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do you want that introduced? Sorry it was not clear to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was thinking this can be the returned reason. but now realize that gateway status has a different set of error reason (gwv1.GatewayReasonListenersNotValid). in this case, we might wanna think about how to handle both route and gateway status here, cuz in the same place, route status is supposed to have gwv1.RouteReasonInvalidKind. i can take gwv1.GatewayReasonListenersNotValid and map it to invalidKind if this is the only place ListenersNotValid is thrown

@@ -72,9 +75,17 @@ func commonBackendLoader(ctx context.Context, k8sClient client.Client, typeSpeci
svc := &corev1.Service{}
err := k8sClient.Get(ctx, svcIdentifier, svc)
if err != nil {

convertToNotFoundError := client.IgnoreNotFound(err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gwv1.RouteReasonBackendNotFound

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that should be applied to the route condition, however the Gateway condition has it's own set of conditions to set.

@zac-nixon zac-nixon force-pushed the znixon/gateway-status branch 2 times, most recently from 7c5ad7e to 283c5bf Compare May 28, 2025 19:55
@@ -86,6 +94,7 @@ func newGatewayReconciler(controllerName string, lbType elbv2model.LoadBalancerT
reconcileTracker: reconcileTracker,
cfgResolver: cfgResolver,
routeReconciler: routeReconciler,
gatewayConditionUpdater: prepareGatewayConditionUpdate,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you are calling prepareGatewayConditionUpdate directly everywhere where you are using it. Do you want to remove it from the struct

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice find, thanks. i've replaced the direct usage with calls from the struct.

if len(gw.Status.Addresses) != 1 ||
gw.Status.Addresses[0].Value != "" ||
gw.Status.Addresses[0].Value != lbDNS {
gwOld := gw.DeepCopy()
gw.Status.Addresses[0].Value != lbStatus.DNSName {
ipAddressType := gwv1.HostnameAddressType
gw.Status.Addresses = []gwv1.GatewayStatusAddress{
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to confirm. Is this correct ? `gw.Status.Addresses[0].Value != "" The second condition will always be true if the address value is not empty.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't follow this comment, sorry.

Are you saying that if the address is not empty, then it must equal the DNS name of the LB?

Copy link
Collaborator

@wweiwei-li wweiwei-li May 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may need to understand the intent here better. I was thinking gw.Status.Addresses[0].Value cannot be simultaneously equal to both an empty string and lbStatus.DNSName. One of these conditions will always be true. Therefore, gw.Status.Addresses[0].Value != "" || gw.Status.Addresses[0].Value != lbStatus.DNSName will always be true.

Was your intent here to update the address only when necessary. More specifically, only when either:

  1. The address value is empty OR
  2. The address does not match the load balancer's DNS name
if len(gw.Status.Addresses) != 1 ||
    gw.Status.Addresses[0].Value == "" || gw.Status.Addresses[0].Value != lbStatus.DNSName) {
    ipAddressType := gwv1.HostnameAddressType
    gw.Status.Addresses = []gwv1.GatewayStatusAddress{
        {
            Type:  &ipAddressType,
            Value: lbStatus.DNSName,
        },
    }
    needPatch = true
}

Else, simply remove gw.Status.Addresses[0].Value != ""

if len(gw.Status.Addresses) != 1 ||
    gw.Status.Addresses[0].Value != lbStatus.DNSName {
    ipAddressType := gwv1.HostnameAddressType
    gw.Status.Addresses = []gwv1.GatewayStatusAddress{
        {
            Type:  &ipAddressType,
            Value: lbStatus.DNSName,
        },
    }
    needPatch = true
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦 wow -- thanks for catching that and the explanation.

// LB Status should always be set, if it's not, we need to prevent NPE
if lbStatus == nil {
r.logger.Info("Unable to update Gateway Status due to null LB status")
return nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if lbStatus == nil, the updateGatewayStatusSuccess return nil (no error).
There's a potential issue with this part in reconcileUpdate. it will success reconcile. Should this not be expected ? How about return an error ?

if err = r.updateGatewayStatusSuccess(ctx, lb.Status, gw); err != nil {
    r.eventRecorder.Event(gw, corev1.EventTypeWarning, k8s.GatewayEventReasonFailedUpdateStatus, fmt.Sprintf("Failed update status due to %v", err))
    return err
}
r.eventRecorder.Event(gw, corev1.EventTypeNormal, k8s.GatewayEventReasonSuccessfullyReconciled, "Successfully reconciled")
	return nil
} 

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, i wasn't exactly sure what to do there. My thinking is that the lbstatus being null is a weird and unexpected condition. I'd prefer to silently fail (with a log message) rather than keep performing reconciles when the LBStatus is busted.

@zac-nixon zac-nixon force-pushed the znixon/gateway-status branch from 283c5bf to 8f7a43f Compare May 29, 2025 00:52
@zac-nixon zac-nixon force-pushed the znixon/gateway-status branch from 8f7a43f to cecfe5b Compare May 29, 2025 05:24
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wweiwei-li, zac-nixon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [wweiwei-li,zac-nixon]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@wweiwei-li
Copy link
Collaborator

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 29, 2025
@k8s-ci-robot k8s-ci-robot merged commit 7399a96 into kubernetes-sigs:main May 29, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants