Multi-Cloud Metrics Server #281

Open
Iceber opened this issue Jul 25, 2022 · 8 comments

@Iceber
Member

Iceber commented Jul 25, 2022

What would you like to be added?

The clusterpedia apiserver will have a built-in multi-cluster metrics server that will provide at least the following features:

  • Use kubectl --cluster <cluster> top node/pod to view the resource consumption of a node or pod in a given cluster
  • Users can choose to have resource consumption appended to a resource's annotations when they List/Get resources
  • It should provide the same CPU and memory metrics as the metrics server, plus other metrics such as network-related information via external metrics and custom metrics
  • It should also support watching metrics, so that users can observe resource consumption changes without polling
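
As a rough illustration of the annotation item above, the sketch below attaches usage figures to a resource's annotations before it is returned from Get/List. The annotation prefix and the `with_usage_annotations` helper are hypothetical; this issue does not define a key format.

```python
import copy

# Hypothetical annotation prefix; the issue does not specify one.
ANNOTATION_PREFIX = "metrics.example.io/"


def with_usage_annotations(obj, usage):
    """Return a copy of a Kubernetes-style object dict with usage added as annotations.

    `usage` maps a metric name (e.g. "cpu") to a quantity string (e.g. "250m").
    """
    out = copy.deepcopy(obj)  # never mutate the stored object
    metadata = out.setdefault("metadata", {})
    annotations = metadata.get("annotations") or {}  # tolerate a missing or null field
    for name, quantity in usage.items():
        annotations[ANNOTATION_PREFIX + name] = str(quantity)
    metadata["annotations"] = annotations
    return out


pod = {"kind": "Pod", "metadata": {"name": "web-0", "namespace": "default"}}
annotated = with_usage_annotations(pod, {"cpu": "250m", "memory": "128Mi"})
```

Working on a copy keeps the stored resource unchanged, so the usage data only appears when the user opts in.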

We will refer to the following project implementation

The multi-cloud metrics server will be implemented in two ways:

  1. the same way as the metrics server: directly access the cluster's /api/v1/nodes/<node>/proxy/metrics/resource and /api/v1/nodes/<node>/proxy/metrics/cadvisor interfaces, polling them at a regular interval
  2. get metrics information through OpenTelemetry, which can also be used for a future Agent mode
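
The first approach could be sketched roughly as follows. This is only an illustration, not the planned implementation: `fetch` is a hypothetical caller-supplied function that performs an authenticated GET against one cluster's apiserver, and the interval and round count are arbitrary.

```python
import time
from urllib.parse import quote

RESOURCE_PATH = "/api/v1/nodes/{node}/proxy/metrics/resource"
CADVISOR_PATH = "/api/v1/nodes/{node}/proxy/metrics/cadvisor"


def node_proxy_paths(node):
    """Build the two kubelet proxy paths for one node."""
    safe = quote(node, safe="")  # escape anything unusual in the node name
    return {
        "resource": RESOURCE_PATH.format(node=safe),
        "cadvisor": CADVISOR_PATH.format(node=safe),
    }


def scrape_once(fetch, nodes):
    """Fetch both endpoints for every node; fetch(path) -> str is supplied by the caller."""
    return {
        node: {kind: fetch(path) for kind, path in node_proxy_paths(node).items()}
        for node in nodes
    }


def poll(fetch, nodes, interval_seconds, rounds, sleep=time.sleep):
    """Scrape all nodes `rounds` times, waiting `interval_seconds` between scrapes."""
    snapshots = []
    for i in range(rounds):
        snapshots.append(scrape_once(fetch, nodes))
        if i < rounds - 1:
            sleep(interval_seconds)
    return snapshots


# Usage with a stand-in fetch that just echoes the requested path.
requested = []


def fake_fetch(path):
    requested.append(path)
    return "# metrics for " + path


snapshots = poll(fake_fetch, ["node-1"], interval_seconds=15, rounds=2, sleep=lambda s: None)
```

Passing `fetch` in keeps the polling logic independent of how each cluster's credentials and transport are handled.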

We plan to implement the first approach in Q3.

Why is this needed?

Users can easily get resource consumption through Clusterpedia.

Iceber added the kind/feature New feature label Jul 25, 2022
@grzesuav

Will it focus on CPU/memory (resource metrics) plus network ones (as external/custom metrics), or will it allow exposing others via, for example, Prometheus?

@Iceber
Member Author

Iceber commented Jul 26, 2022

We should expose CPU and memory by default, and possibly network and file I/O.

or will it allow exposing others via, for example, Prometheus?

This looks like a custom metrics resource based on PromQL or some other syntax. It is a good idea, and we can consider supporting it after we implement the basic features.

@grzesuav

@Iceber which API would you like to implement first?

  • metrics.k8s.io
  • custommetrics
  • externalmetrics

in the first implementation? metrics.k8s.io only covers CPU/memory, so the other two would be for network/IO, right?

@Iceber
Member Author

Iceber commented Jul 28, 2022

@grzesuav We will first implement a minimal, usable set of features, and the first stage will not involve Prometheus or OpenTelemetry.

We will first implement the resource metrics (metrics.k8s.io), i.e. CPU and memory, which can be obtained directly from /api/v1/nodes/<node>/proxy/metrics/resource (similar to the metrics server's logic).
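
To make the first stage more concrete, here is a minimal sketch of turning two scrapes of that endpoint into node usage figures. It assumes the endpoint returns Prometheus text format with the kubelet's `node_cpu_usage_seconds_total` and `node_memory_working_set_bytes` series and millisecond timestamps; the helper names and sample values are made up for illustration.

```python
WANTED = ("node_cpu_usage_seconds_total", "node_memory_working_set_bytes")


def parse_node_sample(text):
    """Extract the unlabeled node-level series as {name: (value, timestamp_ms or None)}."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        parts = line.split()
        if parts[0] in WANTED and len(parts) >= 2:
            timestamp_ms = int(parts[2]) if len(parts) >= 3 else None
            values[parts[0]] = (float(parts[1]), timestamp_ms)
    return values


def node_usage(earlier, later):
    """CPU is a cumulative counter, so usage is the rate between two scrapes."""
    cpu0, ts0 = earlier["node_cpu_usage_seconds_total"]
    cpu1, ts1 = later["node_cpu_usage_seconds_total"]
    if ts0 is None or ts1 is None or ts1 <= ts0:
        raise ValueError("need two timestamped samples in increasing order")
    window_seconds = (ts1 - ts0) / 1000.0
    return {
        "cpu_cores": (cpu1 - cpu0) / window_seconds,
        "memory_bytes": later["node_memory_working_set_bytes"][0],
    }


# Made-up samples taken 15 seconds apart.
FIRST = """# TYPE node_cpu_usage_seconds_total counter
node_cpu_usage_seconds_total 100.0 1000000
node_memory_working_set_bytes 2.0e+09 1000000
"""
SECOND = """# TYPE node_cpu_usage_seconds_total counter
node_cpu_usage_seconds_total 103.0 1015000
node_memory_working_set_bytes 2.5e+09 1015000
"""

usage = node_usage(parse_node_sample(FIRST), parse_node_sample(SECOND))
```

Memory is a gauge and can be read from the latest scrape directly, while CPU needs two scrapes because only the cumulative seconds are exposed.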

Then we'll use /api/v1/nodes/<node>/proxy/metrics/cadvisor to get more information, including network and I/O.
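
For the second stage, a sketch of aggregating cAdvisor series per pod might look like the following. It assumes Prometheus text output with `namespace` and `pod` labels and the cAdvisor counters named in `WANTED`; the regular expressions are a simplification of the exposition format, and the sample lines are invented.

```python
import re
from collections import defaultdict

# Simplified patterns for `name{labels} value [timestamp]` lines.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$')
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

WANTED = {
    "container_network_receive_bytes_total",
    "container_network_transmit_bytes_total",
    "container_fs_reads_bytes_total",
    "container_fs_writes_bytes_total",
}


def sum_by_pod(text):
    """Sum the wanted counters per (namespace, pod), across interfaces and devices."""
    totals = defaultdict(lambda: defaultdict(float))
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if not match or match.group(1) not in WANTED:
            continue
        labels = dict(LABEL_RE.findall(match.group(2) or ""))
        pod = labels.get("pod")
        if not pod:
            continue  # skip series that are not attributed to a pod
        totals[(labels.get("namespace", ""), pod)][match.group(1)] += float(match.group(3))
    return {key: dict(value) for key, value in totals.items()}


SAMPLE = """# TYPE container_network_receive_bytes_total counter
container_network_receive_bytes_total{interface="eth0",namespace="default",pod="web-0"} 1500 1658736000000
container_network_receive_bytes_total{interface="eth1",namespace="default",pod="web-0"} 500 1658736000000
container_network_transmit_bytes_total{interface="eth0",namespace="default",pod="web-0"} 700 1658736000000
container_fs_reads_bytes_total{container="app",device="/dev/sda",namespace="default",pod="web-0"} 4096 1658736000000
container_cpu_usage_seconds_total{container="app",namespace="default",pod="web-0"} 12.5 1658736000000
"""

per_pod = sum_by_pod(SAMPLE)
```

These are cumulative counters, so a real server would still need to take the rate between two scrapes, as with CPU.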

In the third phase, we will start to implement more features based on opentelemetry or prometheus.

More suggestions are welcome.

Iceber pinned this issue Aug 4, 2022
@Iceber
Member Author

Iceber commented Aug 31, 2022

/assign

@Iceber
Member Author

Iceber commented Sep 1, 2022

[Image: Clusterpedia & Multi Metrics Server]
[Image: Multi Metrics Server]

@grzesuav

grzesuav commented Sep 2, 2022

@Iceber how do you distinguish which cluster the metrics come from?

@Iceber
Member Author

Iceber commented Sep 5, 2022

@grzesuav In the early stage, the two implementations of the multicluster metrics server would be two separate modules, and users would have to choose between them to deploy.

Later we would consider combining the two implementations and using a CRD to decide, per cluster, whether the metrics API information comes from thanos/otel or from the node proxy.
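
One way to picture that combined design is a router that looks up a per-cluster policy and tags every result with the cluster name. Everything here is hypothetical: the `ClusterMetricsPolicy` fields stand in for whatever the CRD would contain, and the backends are stubs.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A backend takes (cluster, node) and returns a usage dict.
Backend = Callable[[str, str], dict]


@dataclass(frozen=True)
class ClusterMetricsPolicy:
    """Hypothetical stand-in for a CRD object: which source to use for one cluster."""
    cluster: str
    source: str  # e.g. "node-proxy" or "otel"


class MetricsRouter:
    def __init__(self, backends: Dict[str, Backend]):
        self._backends = backends
        self._policies: Dict[str, ClusterMetricsPolicy] = {}

    def apply(self, policy: ClusterMetricsPolicy) -> None:
        """Register or update the policy for one cluster."""
        if policy.source not in self._backends:
            raise ValueError("unknown metrics source: " + policy.source)
        self._policies[policy.cluster] = policy

    def node_usage(self, cluster: str, node: str) -> dict:
        """Dispatch to the cluster's configured backend and label the result."""
        policy = self._policies.get(cluster)
        if policy is None:
            raise KeyError("no metrics policy for cluster: " + cluster)
        usage = self._backends[policy.source](cluster, node)
        return {"cluster": cluster, "node": node, "source": policy.source, **usage}


# Stub backends standing in for the two implementations.
router = MetricsRouter({
    "node-proxy": lambda cluster, node: {"cpu_cores": 0.2},
    "otel": lambda cluster, node: {"cpu_cores": 0.5},
})
router.apply(ClusterMetricsPolicy(cluster="cluster-a", source="node-proxy"))
router.apply(ClusterMetricsPolicy(cluster="cluster-b", source="otel"))

result_a = router.node_usage("cluster-a", "node-1")
result_b = router.node_usage("cluster-b", "node-1")
```

Because the cluster name is part of every request and every result, two nodes with the same name in different clusters stay distinguishable.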

Iceber added this to the v0.6.0 milestone Sep 8, 2022