
feat: merge two metric servers #728

Open: nayihz wants to merge 3 commits into main from feat_migrate_metric

Conversation

Contributor

@nayihz nayihz commented Apr 23, 2025

Fix #699

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 23, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: nayihz
Once this PR has been reviewed and has the lgtm label, please assign kfswain for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

netlify bot commented Apr 23, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit 559fe48
🔍 Latest deploy log https://app.netlify.com/sites/gateway-api-inference-extension/deploys/6809e427c55fe30008ea2182
😎 Deploy Preview https://deploy-preview-728--gateway-api-inference-extension.netlify.app

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 23, 2025
@nayihz nayihz changed the title from "{WIP} feat: migrate two metric servers" to "{WIP} feat: merge two metric servers" Apr 23, 2025
@nayihz nayihz force-pushed the feat_migrate_metric branch 2 times, most recently from 23553ed to 555aa6b April 23, 2025 02:41
@kfswain
Collaborator

kfswain commented Apr 23, 2025

CC: @JeffLuoo PTAL

Subsystem: InferenceModelComponent,
Name: "request_total",
Help: "Counter of inference model requests broken out for each model and target model.",
StabilityLevel: compbasemetrics.ALPHA,
Contributor

Can you keep the stability level for all metrics?

Contributor Author

prometheus.CounterOpts does not support a StabilityLevel field.

Contributor

The stability level is just added to the metric's help message: https://github.com/kubernetes/component-base/blob/88ca0abd83492f6f11c502946dd2657f8b3374b4/metrics/opts.go#L149-L153

To align with other Kubernetes metrics (https://kubernetes.io/docs/reference/instrumentation/metrics/), can you add the stability level back manually?

Contributor

As discussed in the other comments, can you please just add the stability level to the metric's help message? Thanks!
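
For illustration, a minimal sketch of what "adding the stability level to the help message" could look like once the metrics are plain Prometheus metrics. It assumes the prefix format is `[LEVEL] help text`, which is how the linked component-base code appears to annotate help strings; the `StabilityLevel` type and `annotateHelp` helper are hypothetical names, not part of this PR or of any library. The resulting string would then be passed as the `Help` field of the metric options.

```go
package main

import "fmt"

// StabilityLevel is a hypothetical stand-in for the stability level
// that component-base attaches to a metric.
type StabilityLevel string

const (
	Alpha  StabilityLevel = "ALPHA"
	Beta   StabilityLevel = "BETA"
	Stable StabilityLevel = "STABLE"
)

// annotateHelp prefixes the help text with the stability level,
// assuming the "[LEVEL] help" convention.
func annotateHelp(level StabilityLevel, help string) string {
	return fmt.Sprintf("[%s] %s", level, help)
}

func main() {
	help := annotateHelp(Alpha,
		"Counter of inference model requests broken out for each model and target model.")
	// The annotated string would be used as the Help value of the counter.
	fmt.Println(help)
}
```

Keeping the prefix in one helper means every metric gets the same format, and a later stability change is a one-word edit at the call site.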

@nayihz nayihz force-pushed the feat_migrate_metric branch from 555aa6b to c2fb015 April 24, 2025 01:10
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 24, 2025
@nayihz nayihz changed the title from "{WIP} feat: merge two metric servers" to "feat: merge two metric servers" Apr 24, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 24, 2025
@nayihz
Contributor Author

nayihz commented Apr 24, 2025

This PR is ready for review.
/kind feature

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Apr 24, 2025
@nayihz nayihz force-pushed the feat_migrate_metric branch from e4d1ca6 to 559fe48 April 24, 2025 07:11
@JeffLuoo
Contributor

JeffLuoo commented Apr 24, 2025

What is the purpose of merging these two metrics servers? If the metrics handler comes from controller-manager, would it be better to add only controller-level metrics to it and leave the application-level handler as is?

I see that with the controller-manager the stability level is dropped, because it uses the Prometheus Go client.

cc: @richabanker from SIG Instrumentation. Hi Richa, I have a few questions about supporting metrics in a SIG project:

  1. Is it suggested to use controller-manager for all metrics in the project if the project uses controller-manager?
  2. Is stability level for metrics required for projects under kubernetes-sigs?

@nayihz
Contributor Author

nayihz commented Apr 25, 2025

  1. Is it suggested to use controller-manager for all metrics in the project if the project uses controller-manager?

As far as I know, all projects using the controller-manager have only one metrics server.

@richabanker

richabanker commented Apr 25, 2025

  1. Is it suggested to use controller-manager for all metrics in the project if the project uses controller-manager?

I'll be honest, I am not very familiar with controller-runtime, but it does make sense to reuse the metrics capability the manager provides for a custom controller, both to simplify development and to ensure consistency across controllers.

  2. Is stability level for metrics required for projects under kubernetes-sigs?

While not strictly enforced across all kubernetes-sigs projects, the stability levels for metrics are primarily critical for core Kubernetes (k/k) components to ensure guarantees for control plane observability. That said, defining and clearly communicating metric stability (e.g., Alpha, Beta, Stable) is highly beneficial for any project. Maybe you could still add that info in the HELP for a metric? It greatly improves user understanding of metric reliability and potential changes, making their adoption and use much easier.

I'll add @dgrisonnet so he can share his thoughts as well.

@dgrisonnet
Member

dgrisonnet commented Apr 28, 2025

What is the purpose of merging these two metrics servers? If the metrics handler comes from controller-manager, would it be better to add only controller-level metrics to it and leave the application-level handler as is?

I don't have a lot of knowledge about controller-manager either, but the benefit of merging metrics handlers into a single endpoint is that your users are guaranteed to get all the metrics they might need to debug the application, as long as they scrape the usual :9090/metrics endpoint.

Having separate endpoints is most useful when you are exposing metrics about completely different things (e.g., kubelet container metrics and kubelet health metrics) or when you want to expose a subset of the main endpoint's metrics to get higher resolution on your application SLIs.

It is worth noting that it is possible to serve multiple metrics handlers in the same server, from different sources. You can find an example of that in https://github.com/kubernetes-sigs/metrics-server/blob/master/pkg/server/config.go#L115-L119. You can even register Prometheus metrics and collectors with the Kubernetes registry, and they will get a stability level of Alpha by default.

  2. Is stability level for metrics required for projects under kubernetes-sigs?

The only thing I would add to Richa's point is that today the static analysis we have in k/k to enforce the stability framework is not easily exportable to other projects, so you wouldn't get the full benefits of the framework. However, we can probably take an action item as a SIG to look into that in the future.

@JeffLuoo
Contributor

JeffLuoo commented Apr 28, 2025

@richabanker @dgrisonnet Thank you both for the information! Given that it's not required for registries outside k/k but is a best practice to follow, I think we can try to align on it here in the repo for better management of metrics stability.

@JeffLuoo
Contributor

also cc: @rramkumar1 to review the BBR metrics.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 29, 2025
@k8s-ci-robot
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@danehans
Contributor

danehans commented May 1, 2025

@nayihz thanks again for the PR. Can you rebase and resolve review feedback, e.g. metric stability, to move this PR forward?

cc: @rramkumar1 regarding BBR metrics.

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Merge two metrics http server
7 participants