
feat: continue using and drain serving endpointslices during termination #4946

Open · wants to merge 1 commit into main
Conversation


@fogninid fogninid commented Dec 18, 2024

Continue routing to endpoints while their graceful termination is in progress

What this PR does / why we need it:
use serving condition that was defined in kubernetes/kubernetes#92968

Release Notes: Yes

@fogninid fogninid requested a review from a team as a code owner December 18, 2024 19:56
@zhaohuabing zhaohuabing (Member) left a comment


LGTM thanks!

@arkodg
Contributor

arkodg commented Dec 31, 2024

@fogninid why should traffic be routed to a pod during graceful termination? Imo with the right rolling update strategy in place, traffic should be shifted to the newer pod

@fogninid
Author

fogninid commented Jan 7, 2025

> @fogninid why should traffic be routed to a pod during graceful termination? Imo with the right rolling update strategy in place, traffic should be shifted to the newer pod

@arkodg the {Serving=true,Terminating=true} state was defined by kubernetes in the PR I linked in the cover message.
My understanding is that this state is necessary so that backend applications can signal that traffic can still be routed to terminating pods, instead of being dropped and causing disruptions.
The clearest example is a rolling update of a StatefulSet with a single replica: the one terminating pod now has time to stop gracefully while still handling all incoming requests.

My expectation is for envoy-gateway to behave the same way as other networking solutions. The most explicit description I found of this logic is close to what cilium is doing here.

envoy-gateway could do something similar: use all endpoints with state {Serving=true,Terminating=false} if any are available, and fall back to including those with {Serving=true,Terminating=true} if none are found.
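A minimal sketch of that fallback selection, assuming the discovery/v1 EndpointSlice API; the helper and package names here are illustrative, not actual envoy-gateway code:

```go
package endpoints

import discoveryv1 "k8s.io/api/discovery/v1"

// isServing and isTerminating read the pointer-typed EndpointSlice
// conditions, treating nil as false for brevity.
func isServing(ep discoveryv1.Endpoint) bool {
	return ep.Conditions.Serving != nil && *ep.Conditions.Serving
}

func isTerminating(ep discoveryv1.Endpoint) bool {
	return ep.Conditions.Terminating != nil && *ep.Conditions.Terminating
}

// selectEndpoints prefers {Serving=true,Terminating=false} endpoints and
// falls back to {Serving=true,Terminating=true} only when none exist.
func selectEndpoints(eps []discoveryv1.Endpoint) []discoveryv1.Endpoint {
	var active, terminating []discoveryv1.Endpoint
	for _, ep := range eps {
		switch {
		case isServing(ep) && !isTerminating(ep):
			active = append(active, ep)
		case isServing(ep) && isTerminating(ep):
			terminating = append(terminating, ep)
		}
	}
	if len(active) > 0 {
		return active
	}
	return terminating
}
```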

Would this be the right place to introduce this logic?

@arkodg
Contributor

arkodg commented Jan 7, 2025

thanks for sharing the link @fogninid, I found this blog https://kubernetes.io/blog/2022/12/30/advancements-in-kubernetes-traffic-engineering/#traffic-loss-from-load-balancers-during-rolling-updates which highlights how it can be used by Ingress Controllers:

> Consumers of the EndpointSlice API, such as Kube-proxy and Ingress Controllers, can now use these conditions to coordinate connection draining events, by continuing to forward traffic for existing connections but rerouting new connections to other non-terminating endpoints.

This would translate to: if an endpoint is not Ready but Serving, we can set its lb endpoint status to Draining, so as not to disrupt the existing connections right away, while newer connections go to other endpoints.

wdyt @envoyproxy/gateway-maintainers / @envoyproxy/gateway-reviewers

@fogninid
Author

fogninid commented Jan 8, 2025

> thanks for sharing the link @fogninid, I found this blog https://kubernetes.io/blog/2022/12/30/advancements-in-kubernetes-traffic-engineering/#traffic-loss-from-load-balancers-during-rolling-updates which highlights how it can be used by Ingress Controllers:
>
> > Consumers of the EndpointSlice API, such as Kube-proxy and Ingress Controllers, can now use these conditions to coordinate connection draining events, by continuing to forward traffic for existing connections but rerouting new connections to other non-terminating endpoints.
>
> This would translate to: if an endpoint is not Ready but Serving, we can set its lb endpoint status to Draining, so as not to disrupt the existing connections right away, while newer connections go to other endpoints.
>
> wdyt @envoyproxy/gateway-maintainers / @envoyproxy/gateway-reviewers

The description of the load balancer behavior in that link is too simplified for my taste. There is no 1:1 mapping from a single field of the EndpointConditions to the expected lb state of the matching endpoint; rather, the full list of endpoints and the combination of Serving and Terminating should map to a set of lb endpoints with appropriate states.

Again, my main reasoning is that new connections should not be dropped if any Serving=true endpoint is currently available (irrespective of its termination state); this is the same behavior described for kube-proxy in this quote from that link:

> Starting in Kubernetes v1.26, kube-proxy enables the ProxyTerminatingEndpoints feature by default, which adds automatic failover and routing to terminating endpoints in scenarios where the traffic would otherwise be dropped. More specifically, when there is a rolling update and a Node only contains terminating Pods, kube-proxy will route traffic to the terminating Pods based on their readiness. In addition, kube-proxy will actively fail the health check NodePort if there are only terminating Pods available. By doing so, kube-proxy alerts the external load balancer that new connections should not be sent to that Node but will gracefully handle requests for existing connections.

Based on the envoy config link you provided, I believe the k8s endpoints => envoy config mapping should look something like this:

| k8s endpoints | envoy config | notes |
|---|---|---|
| [{Address1,Serving=true,Terminating=false}] | [{Address1,HealthStatus=HEALTHY}] | |
| [{Address1,Serving=false,Terminating=false}] | [{Address1,HealthStatus=UNHEALTHY}] | |
| [{Address1,Serving=false,Terminating=true}] | [{Address1,HealthStatus=UNHEALTHY}] | |
| [{Address1,Serving=true,Terminating=true}] | [{Address1,HealthStatus=HEALTHY}] | Endpoint1 enters termination phase (preStop running) |
| [{Address1,Serving=true,Terminating=true}, {Address2,Serving=false,Terminating=false}] | [{Address1,HealthStatus=HEALTHY}, {Address2,HealthStatus=UNHEALTHY}] | Endpoint2 appears, but is not ready yet => allow full use of the terminating endpoint |
| [{Address1,Serving=true,Terminating=true}, {Address2,Serving=true,Terminating=false}] | [{Address1,HealthStatus=DRAINING}, {Address2,HealthStatus=HEALTHY}] | Endpoint2 becomes ready => drain the terminating endpoint |
| [{Address1,Serving=false,Terminating=true}, {Address2,Serving=true,Terminating=false}] | [{Address1,HealthStatus=DRAINING}, {Address2,HealthStatus=HEALTHY}] | Endpoint1 enters final termination phase (preStop done) |
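A hedged sketch of that mapping as a function, using the go-control-plane HealthStatus enum; `hasActive` (whether any {Serving=true,Terminating=false} endpoint exists in the same set) is an assumed parameter, and the names are illustrative:

```go
package endpoints

import corev3 "github.com/envoyproxy/go-control-plane/envoy/config/core/v3"

// healthStatus maps one endpoint's conditions to an envoy health status,
// following the table above. hasActive reports whether the endpoint set
// contains any {Serving=true,Terminating=false} endpoint.
func healthStatus(serving, terminating, hasActive bool) corev3.HealthStatus {
	switch {
	case serving && !terminating:
		return corev3.HealthStatus_HEALTHY
	case serving && terminating && !hasActive:
		// No ready replacement yet: keep fully using the terminating endpoint.
		return corev3.HealthStatus_HEALTHY
	case terminating && hasActive:
		// A ready endpoint exists: drain the terminating one.
		return corev3.HealthStatus_DRAINING
	default:
		return corev3.HealthStatus_UNHEALTHY
	}
}
```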

@arkodg
Contributor

arkodg commented Jan 10, 2025

thanks for creating the detailed table, it looks good except for this row:

> [{Address1,Serving=false,Terminating=true}, {Address2,Serving=true,Terminating=false}]
> which maps to [{Address1,HealthStatus=DRAINING}, {Address2,HealthStatus=HEALTHY}]

if Ready is false and so is Serving, we should not be including this endpoint at all; also, the translator doesn't rely on the previous state to decipher state transitions

draining is an important distinction: if it is computed for each endpoint as draining := !ready && serving, we can retain these endpoints (which is not done today) and maintain existing connections, and also send new requests to these draining endpoints in case there are no ready endpoints. (the default panic threshold in envoy is 50% https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/load_balancing/panic_threshold, which envoy gateway doesn't change, so once health drops below 50% traffic is sent to all endpoints, healthy and non-healthy; draining is treated as non-healthy)
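As a sketch, the flag described here could be computed per endpoint like this (assuming discovery/v1 conditions; names illustrative):

```go
package endpoints

import discoveryv1 "k8s.io/api/discovery/v1"

// isDraining implements draining := !ready && serving on the
// pointer-typed EndpointSlice conditions (nil treated as false
// for brevity).
func isDraining(ep discoveryv1.Endpoint) bool {
	ready := ep.Conditions.Ready != nil && *ep.Conditions.Ready
	serving := ep.Conditions.Serving != nil && *ep.Conditions.Serving
	return !ready && serving
}
```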

to implement this:

* we'd need to introduce a draining field in the IR https://github.com/envoyproxy/gateway/blob/8b4cf17ef7d8ef860a361d49014ffcaf003154bb/internal/ir/xds.go#L1327
* set this in https://github.com/envoyproxy/gateway/blob/8b4cf17ef7d8ef860a361d49014ffcaf003154bb/internal/gatewayapi/route.go#L1623
* use the value to set the status to `Draining` in xds https://github.com/envoyproxy/gateway/blob/8b4cf17ef7d8ef860a361d49014ffcaf003154bb/internal/xds/translator/cluster.go#L418

@fogninid
Author

> [..] to implement this:
>
> * we'd need to introduce a draining field in the IR https://github.com/envoyproxy/gateway/blob/8b4cf17ef7d8ef860a361d49014ffcaf003154bb/internal/ir/xds.go#L1327
> * set this in https://github.com/envoyproxy/gateway/blob/8b4cf17ef7d8ef860a361d49014ffcaf003154bb/internal/gatewayapi/route.go#L1623
> * use the value to set the status to `Draining` in xds https://github.com/envoyproxy/gateway/blob/8b4cf17ef7d8ef860a361d49014ffcaf003154bb/internal/xds/translator/cluster.go#L418

thanks for the hints, I will give it a shot implementing it that way.
One question though: I would have put the complete HealthStatus in the IR (as an int32 type matching the relevant values from the envoy config). Or do you have a specific reason to prefer only a Draining boolean there?
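For reference, a rough sketch of how the three quoted steps could fit together; the types, fields, and function names here are illustrative stand-ins, not the actual envoy-gateway code:

```go
package sketch

import corev3 "github.com/envoyproxy/go-control-plane/envoy/config/core/v3"

// Step 1 (internal/ir/xds.go): a Draining flag on the IR endpoint.
// This is a trimmed-down stand-in for the real DestinationEndpoint type.
type DestinationEndpoint struct {
	Host     string
	Port     uint32
	Draining bool // true when the endpoint is Serving but not Ready
}

// Step 2 (internal/gatewayapi/route.go): set the flag from the
// EndpointSlice conditions when building IR endpoints.
func newIREndpoint(host string, port uint32, ready, serving bool) *DestinationEndpoint {
	return &DestinationEndpoint{
		Host:     host,
		Port:     port,
		Draining: !ready && serving,
	}
}

// Step 3 (internal/xds/translator/cluster.go): map the flag to the
// envoy lb endpoint health status.
func lbHealthStatus(ep *DestinationEndpoint) corev3.HealthStatus {
	if ep.Draining {
		return corev3.HealthStatus_DRAINING
	}
	return corev3.HealthStatus_HEALTHY
}
```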

@arkodg
Contributor

arkodg commented Jan 10, 2025

thanks @fogninid, prefer a bool to reduce the memory footprint (at scale the number of endpoints is in the thousands, and the IR is cached in memory)

@fogninid fogninid changed the title feat: continue routing to serving endpointslices during termination feat: continue using and drain serving endpointslices during termination Jan 10, 2025

codecov bot commented Jan 11, 2025

Codecov Report

Attention: Patch coverage is 83.33333% with 5 lines in your changes missing coverage. Please review.

Project coverage is 66.66%. Comparing base (271a697) to head (1848fb3).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/ir/xds.go | 0.00% | 4 Missing ⚠️ |
| internal/gatewayapi/route.go | 94.11% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4946      +/-   ##
==========================================
- Coverage   66.73%   66.66%   -0.07%     
==========================================
  Files         210      210              
  Lines       32609    32615       +6     
==========================================
- Hits        21761    21743      -18     
- Misses       9537     9554      +17     
- Partials     1311     1318       +7     


@fogninid
Copy link
Author

@arkodg can you have a look at the latest changes? I think they now match your suggestions.

I am hacking together an e2e test to convince myself that the implementation, together with envoyproxy, is correct.
You can find it here if you want to have a look, with this being the source for the custom image for echo-basic. Eventually I would also like to create PRs for them, but for now I would rather focus on completing the implementation changes currently on this branch.

@arkodg
Contributor

arkodg commented Jan 13, 2025

thanks @fogninid! the PR looks good; I added some comments to improve the maintainability of the code.
it would be great if you could also enhance or edit an xds translator test https://github.com/envoyproxy/gateway/blob/main/internal/xds/translator/testdata/in/xds-ir/http-route.yaml and add an endpoint with draining set.
you may have already figured this out :) running `make testdata` can help automate updating the test output yaml files

@arkodg arkodg (Contributor) left a comment


LGTM, thanks!

@arkodg arkodg requested review from a team January 13, 2025 20:50