Describe the bug
Following the discussion around very slow target registration in #1834, PR #3941 was crafted by @zac-nixon, adding a metric about the latency (`podReadinessFlipSeconds`) of the readiness gate. This cool new feature was merged by @wweiwei-li and @shraddhabang and then released with https://github.com/kubernetes-sigs/aws-load-balancer-controller/releases/tag/v2.10.1.

Unfortunately, the buckets used for the histogram are unsuitable for the latency observed (and realistic) with AWS NLB target registration. As it stands, the now improved registration time is about 60 to 70 seconds, while the default buckets are `{.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10}` (see https://pkg.go.dev/github.com/prometheus/client_golang/prometheus#pkg-variables). This causes all readiness flips to end up in the catch-all bucket, e.g. `awslbc_readiness_gate_ready_seconds_bucket{le="+Inf"}`.
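For illustration only, here is a minimal standalone sketch of the problem (the metric name, help text, and registration details are assumptions, not the controller's actual code): with `prometheus.DefBuckets`, a ~65 s observation leaves every finite bucket at 0 and only shows up in the implicit `+Inf` bucket.

```go
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
	dto "github.com/prometheus/client_model/go"
)

func main() {
	// Histogram using client_golang's default buckets (prometheus.DefBuckets):
	// {.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10} seconds.
	// Metric name and help text are illustrative only.
	h := prometheus.NewHistogram(prometheus.HistogramOpts{
		Name:    "readiness_gate_ready_seconds",
		Help:    "Time until the readiness gate flips to ready.",
		Buckets: prometheus.DefBuckets,
	})

	// A realistic NLB target registration latency of ~65 seconds.
	h.Observe(65)

	// Every finite bucket stays at 0; only the implicit +Inf bucket
	// (equal to the sample count) reflects the observation.
	m := &dto.Metric{}
	_ = h.Write(m)
	for _, b := range m.GetHistogram().GetBucket() {
		fmt.Printf("le=%g count=%d\n", b.GetUpperBound(), b.GetCumulativeCount())
	}
	fmt.Printf("le=+Inf count=%d\n", m.GetHistogram().GetSampleCount())
}
```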
Linear buckets (https://pkg.go.dev/github.com/prometheus/client_golang/prometheus#LinearBuckets) covering a range of, say, 30s to 5m, which is the latency that can be expected from the API and the processes behind the health check and readiness gate mechanism, would likely make more sense.
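A rough sketch of what that could look like with client_golang; the exact start/width/count values below are only an assumption for illustration, not a concrete proposal for the controller:

```go
import "github.com/prometheus/client_golang/prometheus"

// LinearBuckets(start, width, count): 19 buckets of 15s width, covering 30s to 5m.
// The concrete values are illustrative; the upstream change may pick different ones.
var readinessFlipBuckets = prometheus.LinearBuckets(30, 15, 19) // 30, 45, ..., 300 seconds

var podReadinessFlipSeconds = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "awslbc_readiness_gate_ready_seconds",
	Help:    "Time from pod creation until the readiness gate flips to ready.", // assumed help text
	Buckets: readinessFlipBuckets,
})
```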
Steps to reproduce
Expected outcome
Readiness flip latencies land in histogram buckets that cover realistic AWS NLB target registration times (tens of seconds up to a few minutes), rather than all observations ending up in the +Inf catch-all bucket.
Environment
Additional Context: