-
Notifications
You must be signed in to change notification settings - Fork 902
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve xDS health check #5785
Improve xDS health check #5785
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some nits 😄
* in {@param newAttributes} has higher precedence. | ||
*/ | ||
@SuppressWarnings("unchecked") | ||
public Endpoint mergeAttrs(Attributes newAttributes) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe unrelated, but I noticed that the current withAttrs
behavior is somewhat confusing. withAttr(AttributeKey, ...)
doesn't replace the entire attributes but withAttrs()
does, and yet they both use with
prefix. I feel like we need this change first:
- Change the semantics of
withAttrs()
so it doesn't replace all attributes. - Introduce
replaceAttrs()
that provides the original behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed this, but note that this PR will act as an umbrella PR to let reviewers know the overall direction of the health check changes.
This PR will be split into multiple PRs for more thorough/readable changes as usual
core/src/main/java/com/linecorp/armeria/client/endpoint/healthcheck/HealthCheckerParams.java
Outdated
Show resolved
Hide resolved
* A dummy implementation to avoid breaking changes in {@link AbstractHealthCheckedEndpointGroupBuilder}. | ||
*/ | ||
Function<Endpoint, HealthCheckerParams> paramsFactory() { | ||
return endpoint -> new HealthCheckerParamsAdapter() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about just always using the same HealthCheckerParams
implementation class (or factory method) so we don't generate multiple classes? It looks like a simple value class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, did a cast instead of the previous method chaining.
0ba8906
to
c2b2403
Compare
Motivation: It seems inconsistent that `Endpoint#withAttr` merges the attribute to the existing attributes, whereas `Endpoint#withAttrs` simply replaces the previous attributes. I propose that we modify the behavior of `Endpoint#withAttrs` to be aligned with `Endpoint#withAttr`. Additionally, in order to support the previous behavior, I propose a new API `Endpoint#replaceAttrs` be added. I've confirmed that there are no uses of this API at least internally. ref: #5785 (comment) Modifications: - Modified the behavior of `Endpoint#withAttrs` to merge attributes - Added a new API `Endpoint#replaceAttrs` to support the previous behavior Result: - More consistent APIs - Preparation for #5785 <!-- Visit this URL to learn more about how to write a pull request description: https://armeria.dev/community/developer-guide#how-to-write-pull-request-description -->
* }</pre> | ||
*/ | ||
@UnstableApi | ||
public SELF healthCheckedEndpointPredicate(Predicate<Endpoint> healthCheckedEndpointPredicate) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this API is used in xDS now, how about adding it after #5741 is merged?
Because I think the health status needs to be exposed explicitly to users to implement this healthCheckedEndpointPredicate
.
public SELF healthCheckedEndpointPredicate(BiPredicate<Endpoint, HealthStatus> healthCheckedEndpointPredicate) { ... }
By the way, sorry for the late review for #5741 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this API is used in xDS now, how about adding it after #5741 is merged?
I didn't think #5741 is related to this PR since I didn't think DEGRADED
is not really used internally at the moment. I thought the more important benefit of this PR was that UNHEALTHY
endpoints are now considered for priority calculation.
Because I think the health status needs to be exposed explicitly to users to implement this healthCheckedEndpointPredicate.
I see. Am I understanding correctly that you prefer this API is not exposed publicly unless HealthCheckedEndpoint
is introduced?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the more important benefit of this PR was that UNHEALTHY endpoints are now considered for priority calculation.
I looked over quicky your PR. Now I understood your purpose.
you prefer this API is not exposed publicly unless HealthCheckedEndpoint is introduced?
I thought at least an utility API should be provided to check the health status of an Endpoint
if it takes some time to design HealthCheckedEndpoint
.
It seems like no one will be using this API right now, so it is okay to add it. But changes for HealthCheckedEndpoint
would be needed in a f/u work to make this API useful to normal users.
@@ -89,6 +94,8 @@ final class DefaultHealthCheckerContext | |||
this.clientOptions = clientOptions; | |||
this.retryBackoff = retryBackoff; | |||
this.onUpdateHealth = onUpdateHealth; | |||
endpointAttributes = Attributes.of(HEALTHY_ATTR, false, | |||
DEGRADED_ATTR, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we cache 4 Attributes.of(HEALTHY_ATTR, false|true, DEGRADED_ATTR, false|true)
in a static map and reuse it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done 👍
*/ | ||
@UnstableApi | ||
public SELF initialStateResolver( | ||
Function<? super List<Endpoint>, ? extends List<Endpoint>> initialStateResolver) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional) What do you think of changing this method to take an Endpoint at once? I think this method resembles a combination of map + filter
in a role.
initialStateResolver(endpoint -> endpoint)
initialStateResolver(endpoint -> null)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, since the xDS implementation needs the health check attribute to be retained from filtered initialStateResolver
, an AttributeRetainingEndpointGroup
has been introduced
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spent way too long debugging an issue related to this. I realized that initialStateResolver
can have a race condition with the first update of healthy candidates.
I've just removed initialStateResolver
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @jrhee17 🙇♂️🙇♂️
* }</pre> | ||
*/ | ||
@UnstableApi | ||
public SELF healthCheckedEndpointPredicate(Predicate<Endpoint> healthCheckedEndpointPredicate) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the more important benefit of this PR was that UNHEALTHY endpoints are now considered for priority calculation.
I looked over quicky your PR. Now I understood your purpose.
you prefer this API is not exposed publicly unless HealthCheckedEndpoint is introduced?
I thought at least an utility API should be provided to check the health status of an Endpoint
if it takes some time to design HealthCheckedEndpoint
.
It seems like no one will be using this API right now, so it is okay to add it. But changes for HealthCheckedEndpoint
would be needed in a f/u work to make this API useful to normal users.
78d9d71
to
ee7512c
Compare
1bfc0f4
to
c1b45e5
Compare
This PR is now back to reliably passing the CI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fantastic. 👍 👍 👍
.../linecorp/armeria/client/endpoint/healthcheck/AbstractHealthCheckedEndpointGroupBuilder.java
Outdated
Show resolved
Hide resolved
.../src/main/java/com/linecorp/armeria/client/endpoint/healthcheck/HealthCheckContextGroup.java
Outdated
Show resolved
Hide resolved
...c/main/java/com/linecorp/armeria/client/endpoint/healthcheck/HealthCheckedEndpointGroup.java
Outdated
Show resolved
Hide resolved
core/src/main/java/com/linecorp/armeria/internal/client/endpoint/EndpointAttributeKeys.java
Outdated
Show resolved
Hide resolved
core/src/main/java/com/linecorp/armeria/internal/client/endpoint/EndpointAttributeKeys.java
Outdated
Show resolved
Hide resolved
xds/src/main/java/com/linecorp/armeria/xds/client/endpoint/XdsEndpointUtil.java
Outdated
Show resolved
Hide resolved
xds/src/main/java/com/linecorp/armeria/xds/client/endpoint/XdsEndpointUtil.java
Show resolved
Hide resolved
...main/java/com/linecorp/armeria/xds/client/endpoint/XdsHealthCheckedEndpointGroupBuilder.java
Outdated
Show resolved
Hide resolved
...main/java/com/linecorp/armeria/xds/client/endpoint/XdsHealthCheckedEndpointGroupBuilder.java
Outdated
Show resolved
Hide resolved
...main/java/com/linecorp/armeria/xds/client/endpoint/XdsHealthCheckedEndpointGroupBuilder.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 👍 👍 👍 👍
Motivation:
It is recommended to review #5802 prior to this PR
This changeset attempts to solve several problems:
Custom filter logic
xDS considers all endpoints when computing whether a
PrioritySet
is in panic state. For instance, if the percentage of unhealthy endpoints exceeds a preconfigured panic threshold, the endpoint selection includes all endpoints regardless of the degraded status. While armeria supports anHealthCheckedEndpointGroup
out of the box, it filters out healthy endpoints automatically.armeria/xds/src/main/java/com/linecorp/armeria/xds/client/endpoint/DefaultLoadBalancer.java
Lines 99 to 105 in 72ebfe1
In order to resolve this, I propose that a
AbstractHealthCheckedEndpointGroupBuilder#healthCheckedEndpointPredicate
API is addedPer-cluster health check configuration
Per-cluster member health check is difficult with the current API since a single parameter set is statically defined for an entire health checked endpoint group.
We already have an abstraction
AbstractHealthCheckedEndpointGroupBuilder#newCheckerFactory
. I propose that this API be used for the purpose of xDS. In order to support the parameters xDS allows configuring, I propose that parameters are passed toHttpHealthChecker
via the constructor instead of theHealthCheckerContext
. In order to support this change,HttpHealthChecker
has been moved to an internal package so thexds
module can also access it.Modifications:
AbstractHealthCheckedEndpointGroupBuilder#healthCheckedEndpointPredicate
and modifiedHealthCheckedEndpointGroup
to filter endpoints based on the predicate.Endpoint
s now have attributesHEALTHY
andDEGRADED
set.HealthCheckedEndpointGroup#setEndpoints
is now called on every invocation ofupdateHealth
, not just when a health status changes.Endpoint
, so modified to receive health checked parameters from theHttpHealthChecker
constructor instead ofHealthCheckerContext
.HttpHealthChecker
has been moved to an internal packageHealthCheckerContext#originalEndpoint
has been added to retrieve the original endpoint. The other APIs haven't been deprecated since it is potentially useful information that otherHealthChecker
s may be using.XdsHealthCheckedEndpointGroupBuilder
which implements its owncheckerFactory
.EndpointUtil#coarseHealth
has been updated to considerHEALTHY
,DEGRADED
attributes when determining health.Result:
XdsEndpointGroup
is more aligned with envoy in terms of health checking