Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

orca: update the orca validation annotation #109

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

wbpcode
Copy link
Contributor

@wbpcode wbpcode commented Dec 4, 2024

No description provided.

Signed-off-by: wangbaiping(wbpcode) <[email protected]>
double mem_utilization = 2 [(validate.rules).double.gte = 0, (validate.rules).double.lte = 1];
double mem_utilization = 2 [(validate.rules).double.gte = 0];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value may be larger than 1.0 when the usage exceeds the reporter dependent notion of soft limits, like the application_utilization did.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, to the best of my knowledge, the PGV validation rules are not actually used for validation at runtime, but just as a guideline to the implementers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, the validation only works if the call the validate method 🤔

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I this there should be a limit that is enforced in the code, regardless of whether the thresholds are explicitly stated here or not.
I would suggest that the code caps the range to something like [0, 1], just to prevent bad upstream actors from the ability to impact the load-balancing decision too much (either intentionally or unintentionally).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that as per the xDS spec, the PGV annotations are really not a first-class part of the API:

https://www.envoyproxy.io/docs/envoy/latest/api-docs/xds_protocol#protoc-gen-validate-annotations

I believe that Envoy uses these annotations as part of validating resources it receives from the control plane, but I don't know if it does the same thing for ORCA protos. gRPC does not use PGV at all, so changing the PGV annotations does not affect gRPC.

With regard to the utilization values sometimes being greater than 1, I agree that this can sometimes happen. We made this exact same change for the cpu_utilization field in #60, for exactly the same reason. So this change seems fine to me.


// Resource utilization values. Each value is expressed as a fraction of total resources
// available, derived from the latest sample or measurement.
map<string, double> utilization = 5
[(validate.rules).map.values.double.gte = 0, (validate.rules).map.values.double.lte = 1];
map<string, double> utilization = 5 [(validate.rules).map.values.double.gte = 0];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

Signed-off-by: wangbaiping(wbpcode) <[email protected]>
@wbpcode
Copy link
Contributor Author

wbpcode commented Dec 4, 2024

By the way, I think we may also need to add a validation to the request_cost to ensure every cost value is larger than or equal to 0?

@wbpcode
Copy link
Contributor Author

wbpcode commented Dec 4, 2024

@adisuissa

Signed-off-by: wangbaiping(wbpcode) <[email protected]>
Copy link

@adisuissa adisuissa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion here, as I'm not sure how the fields are used.
I think it would be good to hear from @markdroth and @efimki as I think they are aware of systems that use ORCA.

double mem_utilization = 2 [(validate.rules).double.gte = 0, (validate.rules).double.lte = 1];
double mem_utilization = 2 [(validate.rules).double.gte = 0];

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, to the best of my knowledge, the PGV validation rules are not actually used for validation at runtime, but just as a guideline to the implementers.

@adisuissa
Copy link

@efimki @markdroth any reservations about modifying the upper bound limits?

Copy link
Contributor

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, I think we may also need to add a validation to the request_cost to ensure every cost value is larger than or equal to 0?

Technically, increasing the strictness of a PGV annotation (as opposed to decreasing it) is considered a breaking API change. The API versioning doc lists the following as a breaking change:

Increasing the strictness of protoc-gen-validate annotations. Exceptions may be granted for scenarios in which these stricter conditions model behavior already implied structurally or by documentation.

In this particular case, though, if we know that Envoy doesn't actually use the PGV annotations for ORCA, then this change is probably safe. As I mentioned in my other comment, gRPC does not use PGV at all, so this doesn't affect us.

double mem_utilization = 2 [(validate.rules).double.gte = 0, (validate.rules).double.lte = 1];
double mem_utilization = 2 [(validate.rules).double.gte = 0];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that as per the xDS spec, the PGV annotations are really not a first-class part of the API:

https://www.envoyproxy.io/docs/envoy/latest/api-docs/xds_protocol#protoc-gen-validate-annotations

I believe that Envoy uses these annotations as part of validating resources it receives from the control plane, but I don't know if it does the same thing for ORCA protos. gRPC does not use PGV at all, so changing the PGV annotations does not affect gRPC.

With regard to the utilization values sometimes being greater than 1, I agree that this can sometimes happen. We made this exact same change for the cpu_utilization field in #60, for exactly the same reason. So this change seems fine to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants