
feat: continue using and drain serving endpointslices during termination #4946

Open · wants to merge 1 commit into main
Conversation


@fogninid fogninid commented Dec 18, 2024

Continue routing to endpoints while their graceful termination is in progress

What this PR does / why we need it:
use serving condition that was defined in kubernetes/kubernetes#92968

Release Notes: Yes

@fogninid fogninid requested a review from a team as a code owner December 18, 2024 19:56
@zhaohuabing zhaohuabing (Member) left a comment


LGTM thanks!

@arkodg
Contributor

arkodg commented Dec 31, 2024

@fogninid why should traffic be routed to a pod during graceful termination? Imo with the right rolling update strategy in place, traffic should be shifted to the newer pod

@fogninid
Author

fogninid commented Jan 7, 2025

> @fogninid why should traffic be routed to a pod during graceful termination? Imo with the right rolling update strategy in place, traffic should be shifted to the newer pod

@arkodg the {Serving=true,Terminating=true} state was defined by kubernetes in the PR I linked in the cover message.
My understanding is that this state is necessary so that backend applications can signal that traffic can still be routed to terminating pods, instead of being dropped and causing disruptions.
The clearest example is a rolling update of a StatefulSet with a single replica: the one terminating pod now has time to stop gracefully while still handling all incoming requests.

My expectation is for envoy-gateway to behave the same way as other networking solutions. The most explicit description I found of this logic is close to what cilium is doing here.

envoy-gateway could do something similar: use all endpoints with state {Serving=true,Terminating=false} if any are available, and fall back to including those with {Serving=true,Terminating=true} if none are found.
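A minimal sketch of that fallback selection, assuming the discovery/v1 EndpointSlice API; the helper and package names here are illustrative, not actual envoy-gateway code:

```go
package endpoints

import discoveryv1 "k8s.io/api/discovery/v1"

// isServing and isTerminating read the pointer-typed EndpointSlice
// conditions, treating nil as false for brevity.
func isServing(ep discoveryv1.Endpoint) bool {
	return ep.Conditions.Serving != nil && *ep.Conditions.Serving
}

func isTerminating(ep discoveryv1.Endpoint) bool {
	return ep.Conditions.Terminating != nil && *ep.Conditions.Terminating
}

// selectEndpoints prefers {Serving=true,Terminating=false} endpoints and
// falls back to {Serving=true,Terminating=true} only when none exist.
func selectEndpoints(eps []discoveryv1.Endpoint) []discoveryv1.Endpoint {
	var active, terminating []discoveryv1.Endpoint
	for _, ep := range eps {
		switch {
		case isServing(ep) && !isTerminating(ep):
			active = append(active, ep)
		case isServing(ep) && isTerminating(ep):
			terminating = append(terminating, ep)
		}
	}
	if len(active) > 0 {
		return active
	}
	return terminating
}
```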

Would this be the right place to introduce this logic?

@arkodg
Contributor

arkodg commented Jan 7, 2025

thanks for sharing the link @fogninid, I found this blog https://kubernetes.io/blog/2022/12/30/advancements-in-kubernetes-traffic-engineering/#traffic-loss-from-load-balancers-during-rolling-updates which highlights how it can be used by Ingress Controllers:

> Consumers of the EndpointSlice API, such as Kube-proxy and Ingress Controllers, can now use these conditions to coordinate connection draining events, by continuing to forward traffic for existing connections but rerouting new connections to other non-terminating endpoints.

This would translate to: if an endpoint is not Ready but Serving, we can set its lb endpoint status to Draining, so as not to disrupt the existing connections right away, while newer connections go to other endpoints.

wdyt @envoyproxy/gateway-maintainers / @envoyproxy/gateway-reviewers

@fogninid
Author

fogninid commented Jan 8, 2025

> thanks for sharing the link @fogninid, I found this blog https://kubernetes.io/blog/2022/12/30/advancements-in-kubernetes-traffic-engineering/#traffic-loss-from-load-balancers-during-rolling-updates which highlights how it can be used by Ingress Controllers:
>
> > Consumers of the EndpointSlice API, such as Kube-proxy and Ingress Controllers, can now use these conditions to coordinate connection draining events, by continuing to forward traffic for existing connections but rerouting new connections to other non-terminating endpoints.
>
> This would translate to: if an endpoint is not Ready but Serving, we can set its lb endpoint status to Draining, so as not to disrupt the existing connections right away, while newer connections go to other endpoints.
>
> wdyt @envoyproxy/gateway-maintainers / @envoyproxy/gateway-reviewers

The description of the load balancer behavior in that link is too simplified for my taste. There is no 1:1 mapping from a single field of the EndpointConditions to the expected lb state of the matching endpoint; rather, the full list of endpoints and the combination of Serving and Terminating should map to a set of lb endpoints with appropriate states.

Again, my main reasoning is that new connections should not be dropped if any Serving=true endpoint is currently available (irrespective of its termination state); this is the same behavior described for kube-proxy in this quote from that link:

> Starting in Kubernetes v1.26, kube-proxy enables the ProxyTerminatingEndpoints feature by default, which adds automatic failover and routing to terminating endpoints in scenarios where the traffic would otherwise be dropped. More specifically, when there is a rolling update and a Node only contains terminating Pods, kube-proxy will route traffic to the terminating Pods based on their readiness. In addition, kube-proxy will actively fail the health check NodePort if there are only terminating Pods available. By doing so, kube-proxy alerts the external load balancer that new connections should not be sent to that Node but will gracefully handle requests for existing connections.

Based on the envoy config link you provided, I believe the k8s endpoints => envoy config mapping should look something like this:

| k8s endpoints | envoy config | notes |
|---|---|---|
| [{Address1,Serving=true,Terminating=false}] | [{Address1,HealthStatus=HEALTHY}] | |
| [{Address1,Serving=false,Terminating=false}] | [{Address1,HealthStatus=UNHEALTHY}] | |
| [{Address1,Serving=false,Terminating=true}] | [{Address1,HealthStatus=UNHEALTHY}] | |
| [{Address1,Serving=true,Terminating=true}] | [{Address1,HealthStatus=HEALTHY}] | Endpoint1 enters termination phase (preStop running) |
| [{Address1,Serving=true,Terminating=true}, {Address2,Serving=false,Terminating=false}] | [{Address1,HealthStatus=HEALTHY}, {Address2,HealthStatus=UNHEALTHY}] | Endpoint2 appears, but is not ready yet => allow full use of the terminating endpoint |
| [{Address1,Serving=true,Terminating=true}, {Address2,Serving=true,Terminating=false}] | [{Address1,HealthStatus=DRAINING}, {Address2,HealthStatus=HEALTHY}] | Endpoint2 becomes ready => drain the terminating endpoint |
| [{Address1,Serving=false,Terminating=true}, {Address2,Serving=true,Terminating=false}] | [{Address1,HealthStatus=DRAINING}, {Address2,HealthStatus=HEALTHY}] | Endpoint1 enters final termination phase (preStop done) |
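A hedged sketch of that mapping as a function, using the go-control-plane HealthStatus enum; `hasActive` (whether any {Serving=true,Terminating=false} endpoint exists in the same set) is an assumed parameter, and the names are illustrative:

```go
package endpoints

import corev3 "github.com/envoyproxy/go-control-plane/envoy/config/core/v3"

// healthStatus maps one endpoint's conditions to an envoy health status,
// following the table above. hasActive reports whether the endpoint set
// contains any {Serving=true,Terminating=false} endpoint.
func healthStatus(serving, terminating, hasActive bool) corev3.HealthStatus {
	switch {
	case serving && !terminating:
		return corev3.HealthStatus_HEALTHY
	case serving && terminating && !hasActive:
		// No ready replacement yet: keep fully using the terminating endpoint.
		return corev3.HealthStatus_HEALTHY
	case terminating && hasActive:
		// A ready endpoint exists: drain the terminating one.
		return corev3.HealthStatus_DRAINING
	default:
		return corev3.HealthStatus_UNHEALTHY
	}
}
```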

@arkodg
Contributor

arkodg commented Jan 10, 2025

thanks for creating the detailed table, it looks good except for this row:

> [{Address1,Serving=false,Terminating=true}, {Address2,Serving=true,Terminating=false}]
> which maps to [{Address1,HealthStatus=DRAINING}, {Address2,HealthStatus=HEALTHY}]

if Ready is false and so is Serving, we should not be including this endpoint at all; also, the translator doesn't rely on the previous state to decipher state transitions

draining is an important distinction: if it is computed for each endpoint as draining := !ready && serving, we can retain these endpoints (which is not done today) and maintain existing connections, and also send new requests to these draining endpoints in case there are no ready endpoints. (the default panic threshold in envoy is 50% https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/load_balancing/panic_threshold, which envoy gateway doesn't change, so once health drops below 50% traffic is sent to all endpoints, healthy and non-healthy; draining is treated as non-healthy)
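As a sketch, the flag described here could be computed per endpoint like this (assuming discovery/v1 conditions; names illustrative):

```go
package endpoints

import discoveryv1 "k8s.io/api/discovery/v1"

// isDraining implements draining := !ready && serving on the
// pointer-typed EndpointSlice conditions (nil treated as false
// for brevity).
func isDraining(ep discoveryv1.Endpoint) bool {
	ready := ep.Conditions.Ready != nil && *ep.Conditions.Ready
	serving := ep.Conditions.Serving != nil && *ep.Conditions.Serving
	return !ready && serving
}
```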

to implement this:

* we'd need to introduce a draining field in the IR https://github.com/envoyproxy/gateway/blob/8b4cf17ef7d8ef860a361d49014ffcaf003154bb/internal/ir/xds.go#L1327
* set this in https://github.com/envoyproxy/gateway/blob/8b4cf17ef7d8ef860a361d49014ffcaf003154bb/internal/gatewayapi/route.go#L1623
* use the value to set the status to `Draining` in xds https://github.com/envoyproxy/gateway/blob/8b4cf17ef7d8ef860a361d49014ffcaf003154bb/internal/xds/translator/cluster.go#L418

@fogninid
Author

> [..] to implement this:
>
> * we'd need to introduce a draining field in the IR https://github.com/envoyproxy/gateway/blob/8b4cf17ef7d8ef860a361d49014ffcaf003154bb/internal/ir/xds.go#L1327
> * set this in https://github.com/envoyproxy/gateway/blob/8b4cf17ef7d8ef860a361d49014ffcaf003154bb/internal/gatewayapi/route.go#L1623
> * use the value to set the status to `Draining` in xds https://github.com/envoyproxy/gateway/blob/8b4cf17ef7d8ef860a361d49014ffcaf003154bb/internal/xds/translator/cluster.go#L418

thanks for the hints, I will give it a shot implementing it that way.
One question though: I would have put the complete HealthStatus in the IR (as an int32 type matching the relevant values from the envoy config). Or do you have a specific reason to prefer only a Draining boolean there?
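For reference, a rough sketch of how the three quoted steps could fit together; the types, fields, and function names here are illustrative stand-ins, not the actual envoy-gateway code:

```go
package sketch

import corev3 "github.com/envoyproxy/go-control-plane/envoy/config/core/v3"

// Step 1 (internal/ir/xds.go): a Draining flag on the IR endpoint.
// This is a trimmed-down stand-in for the real DestinationEndpoint type.
type DestinationEndpoint struct {
	Host     string
	Port     uint32
	Draining bool // true when the endpoint is Serving but not Ready
}

// Step 2 (internal/gatewayapi/route.go): set the flag from the
// EndpointSlice conditions when building IR endpoints.
func newIREndpoint(host string, port uint32, ready, serving bool) *DestinationEndpoint {
	return &DestinationEndpoint{
		Host:     host,
		Port:     port,
		Draining: !ready && serving,
	}
}

// Step 3 (internal/xds/translator/cluster.go): map the flag to the
// envoy lb endpoint health status.
func lbHealthStatus(ep *DestinationEndpoint) corev3.HealthStatus {
	if ep.Draining {
		return corev3.HealthStatus_DRAINING
	}
	return corev3.HealthStatus_HEALTHY
}
```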

@arkodg
Contributor

arkodg commented Jan 10, 2025

thanks @fogninid, prefer a bool to reduce the memory footprint (at scale the number of endpoints is in the thousands, and the IR is cached in memory)

@fogninid fogninid changed the title feat: continue routing to serving endpointslices during termination feat: continue using and drain serving endpointslices during termination Jan 10, 2025

codecov bot commented Jan 11, 2025

Codecov Report

Attention: Patch coverage is 83.33333% with 5 lines in your changes missing coverage. Please review.

Project coverage is 66.66%. Comparing base (271a697) to head (1848fb3).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/ir/xds.go | 0.00% | 4 Missing ⚠️ |
| internal/gatewayapi/route.go | 94.11% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4946      +/-   ##
==========================================
- Coverage   66.73%   66.66%   -0.07%     
==========================================
  Files         210      210              
  Lines       32609    32615       +6     
==========================================
- Hits        21761    21743      -18     
- Misses       9537     9554      +17     
- Partials     1311     1318       +7     


@fogninid
Copy link
Author

@arkodg can you have a look at the latest changes? I think they now match your suggestions.

I am hacking together an e2e test to convince myself that the implementation, together with envoyproxy, is correct.
You can find it here if you want to have a look, with this being the source for the custom image for echo-basic. Eventually I would also like to create PRs for them, but for now I would rather focus on completing the implementation changes currently on this branch.

@arkodg
Contributor

arkodg commented Jan 13, 2025

thanks @fogninid! the PR looks good; I added some comments to improve the maintainability of the code.
it would be great if you could also enhance or edit an xds translator test https://github.com/envoyproxy/gateway/blob/main/internal/xds/translator/testdata/in/xds-ir/http-route.yaml and add an endpoint with draining set.
you may have already figured this out :) running `make testdata` can help automate updating the test output yaml files

@arkodg arkodg (Contributor) left a comment


LGTM, thanks!

@arkodg arkodg requested review from a team January 13, 2025 20:50