-
Notifications
You must be signed in to change notification settings - Fork 74
feat: merge two metric servers #728
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: nayihz The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
✅ Deploy Preview for gateway-api-inference-extension ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
23553ed
to
555aa6b
Compare
CC: @JeffLuoo PTAL |
Subsystem: InferenceModelComponent, | ||
Name: "request_total", | ||
Help: "Counter of inference model requests broken out for each model and target model.", | ||
StabilityLevel: compbasemetrics.ALPHA, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you keep the metric stage for all metrics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prometheus.CounterOpts
do not support StabilityLevel
field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stability level is just added to the help message of the metric: https://github.com/kubernetes/component-base/blob/88ca0abd83492f6f11c502946dd2657f8b3374b4/metrics/opts.go#L149-L153
To align with other k8s metrics https://kubernetes.io/docs/reference/instrumentation/metrics/ can you add this back manually to add back the stability level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per discussed in other comments, can you please just add the stability level to the help message of the metric? Thanks!
Signed-off-by: nayihz <[email protected]>
555aa6b
to
c2fb015
Compare
Signed-off-by: nayihz <[email protected]>
This pr is ready for review. |
Signed-off-by: nayihz <[email protected]>
e4d1ca6
to
559fe48
Compare
What will be the purpose of merging this two metrics server? If the metrics handler is from controller-manager then would it be better to only add controller-level metrics to it, and leave the application-level handler as it? By using the controller-manager I see that the stability is dropped because it uses the prometheus client-go. cc: @richabanker from SIGs Instrumentation. Hi Richa, I have a few questions about supporting metrics in a SIG project:
|
As far as I know, all projects using the controller-manager have only one metric server. |
I'll be honest I am not much familiar with controller-runtime, but it does make sense to reuse the provided capability for metrics in the manager for a custom controller to simplify development and ensuring consistency across controllers
While not strictly enforced across all kubernetes-sigs projects, the stability levels for metrics are primarily critical for core Kubernetes (k/k) components to ensure guarantees for control plane observability. That said, defining and clearly communicating metric stability (e.g., Alpha, Beta, Stable) is highly beneficial for any project. Maybe you could still add that info in the HELP for a metric? It greatly improves user understanding of metric reliability and potential changes, making their adoption and use much easier. I'll add @dgrisonnet to add his thoughts as well |
I don't have a lot of knowledge about controller-manager either, but the benefit of merging metrics handlers in a single endpoint is that you have the guarantee that your users will get the full metrics experience that they might need to debug the application as long as they scrape the usual Having separate endpoints is the most useful when you are exposing metrics about completely different things (e.g kubelet container metrics & kubelet health metrics) or when you want to expose a subset of the metrics exposed in the main endpoint for higher resolution of your application SLIs. It is worth noting that it is possible to serve multiple metrics handler in the same server and from different sources. You can find an example of that in https://github.com/kubernetes-sigs/metrics-server/blob/master/pkg/server/config.go#L115-L119. You can even register Prometheus metrics / collector to the Kubernetes Registry and they will have a stability level to Alpha by default.
The only thing I would add to Richa's point is that today the static analysis we have in k/k to enforce the stability framework is not easily exportable to other projects, so you wouldn't get the full benefits of the framework. However, we can probably take an action item as a SIG to look into that in the future. |
@richabanker @dgrisonnet Thank you both for the information! Given that it's not required in registry outside |
also cc: @rramkumar1 to review the BBR metrics. |
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
@nayihz thanks again for the PR. Can you rebase and resolve review feedback, e.g. metric stability, to move this PR forward? cc: @rramkumar1 regarding BBR metrics. |
Fix #699