-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
orca: update the orca validation annotation #109
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: wangbaiping(wbpcode) <[email protected]>
double mem_utilization = 2 [(validate.rules).double.gte = 0, (validate.rules).double.lte = 1]; | ||
double mem_utilization = 2 [(validate.rules).double.gte = 0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The value may be larger than 1.0 when the usage exceeds the reporter dependent notion of soft limits, like the application_utilization
did.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, to the best of my knowledge, the PGV validation rules are not actually used for validation at runtime, but just as a guideline to the implementers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, the validation only works if the call the validate method 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I this there should be a limit that is enforced in the code, regardless of whether the thresholds are explicitly stated here or not.
I would suggest that the code caps the range to something like [0, 1], just to prevent bad upstream actors from the ability to impact the load-balancing decision too much (either intentionally or unintentionally).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that as per the xDS spec, the PGV annotations are really not a first-class part of the API:
https://www.envoyproxy.io/docs/envoy/latest/api-docs/xds_protocol#protoc-gen-validate-annotations
I believe that Envoy uses these annotations as part of validating resources it receives from the control plane, but I don't know if it does the same thing for ORCA protos. gRPC does not use PGV at all, so changing the PGV annotations does not affect gRPC.
With regard to the utilization values sometimes being greater than 1, I agree that this can sometimes happen. We made this exact same change for the cpu_utilization
field in #60, for exactly the same reason. So this change seems fine to me.
|
||
// Resource utilization values. Each value is expressed as a fraction of total resources | ||
// available, derived from the latest sample or measurement. | ||
map<string, double> utilization = 5 | ||
[(validate.rules).map.values.double.gte = 0, (validate.rules).map.values.double.lte = 1]; | ||
map<string, double> utilization = 5 [(validate.rules).map.values.double.gte = 0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
Signed-off-by: wangbaiping(wbpcode) <[email protected]>
By the way, I think we may also need to add a validation to the |
Signed-off-by: wangbaiping(wbpcode) <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong opinion here, as I'm not sure how the fields are used.
I think it would be good to hear from @markdroth and @efimki as I think they are aware of systems that use ORCA.
double mem_utilization = 2 [(validate.rules).double.gte = 0, (validate.rules).double.lte = 1]; | ||
double mem_utilization = 2 [(validate.rules).double.gte = 0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, to the best of my knowledge, the PGV validation rules are not actually used for validation at runtime, but just as a guideline to the implementers.
@efimki @markdroth any reservations about modifying the upper bound limits? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, I think we may also need to add a validation to the request_cost to ensure every cost value is larger than or equal to 0?
Technically, increasing the strictness of a PGV annotation (as opposed to decreasing it) is considered a breaking API change. The API versioning doc lists the following as a breaking change:
Increasing the strictness of protoc-gen-validate annotations. Exceptions may be granted for scenarios in which these stricter conditions model behavior already implied structurally or by documentation.
In this particular case, though, if we know that Envoy doesn't actually use the PGV annotations for ORCA, then this change is probably safe. As I mentioned in my other comment, gRPC does not use PGV at all, so this doesn't affect us.
double mem_utilization = 2 [(validate.rules).double.gte = 0, (validate.rules).double.lte = 1]; | ||
double mem_utilization = 2 [(validate.rules).double.gte = 0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that as per the xDS spec, the PGV annotations are really not a first-class part of the API:
https://www.envoyproxy.io/docs/envoy/latest/api-docs/xds_protocol#protoc-gen-validate-annotations
I believe that Envoy uses these annotations as part of validating resources it receives from the control plane, but I don't know if it does the same thing for ORCA protos. gRPC does not use PGV at all, so changing the PGV annotations does not affect gRPC.
With regard to the utilization values sometimes being greater than 1, I agree that this can sometimes happen. We made this exact same change for the cpu_utilization
field in #60, for exactly the same reason. So this change seems fine to me.
No description provided.