-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: metrics for services and checks #519
base: master
Are you sure you want to change the base?
Conversation
Some investigation into the default metrics that come with the Prometheus Go client: 1 List of Metrics from Go
|
A PoC to add a new type of identity ( 1 Manually Add an Identity from a YAML File$ cat identity.yaml
identities:
bob:
access: read
basicauth:
username: foo
password: bar
$ ./pebble add-identities --from ./identity.yaml
Added 1 new identity. 2 Start Pebble$ ./pebble run --http=:4000
2024-11-26T14:27:53.682Z [pebble] HTTP API server listening on ":4000".
2024-11-26T14:27:53.682Z [pebble] Started daemon.
2024-11-26T14:27:53.686Z [pebble] POST /v1/services 63.667µs 400
2024-11-26T14:27:53.686Z [pebble] Cannot start default services: no default services 3 Access the Metrics Endpoint with the Newly Created Identity$ curl -u foo:bar localhost:4000/metrics
# HELP my_counter A simple counter
# TYPE my_counter counter
my_counter 4 4 Access without Identity or with an Invalid Username/Password$ curl localhost:4000/metrics
{"type":"error","status-code":401,"status":"Unauthorized","result":{"message":"access denied","kind":"login-required"}}
$ curl -u invalid:invalid localhost:4000/metrics
{"type":"error","status-code":401,"status":"Unauthorized","result":{"message":"access denied","kind":"login-required"}} |
According to the last spec review, the following changes have been made:
After the first round of refactoring, here are some results: 1 Baisc Identity Name with Special Characters$ cat identity.yaml
identities:
"bob:asdf":
access: read
basic:
password: bar
$ ./pebble add-identities --from ./identity.yaml
error: identity "bob:asdf" invalid: identity name "bob:asdf" contains invalid characters (only
alphanumeric, underscore, and hyphen allowed) 2 Baisc Identity without Username$ cat identity.yaml
identities:
bob:
access: read
basic:
password: bar
ubuntu@primary:~/work/pebble2$ ./pebble add-identities --from ./identity.yaml
Added 1 new identity. 3 Basic Identity Type "metrics"$ # access type: read
$ cat identity.yaml
identities:
bob:
access: read
basic:
password: bar
$ ./pebble add-identities --from ./identity.yaml
Added 1 new identity.
$ # open access is fine
$ curl -u bob:bar localhost:4000/v1/health
{"type":"sync","status-code":200,"status":"OK","result":{"healthy":true}}
$ # no access on the metrics endpoint
$ curl -u bob:bar localhost:4000/metrics
{"type":"error","status-code":401,"status":"Unauthorized","result":{"message":"access denied","kind":"login-required"}} $ # access type: metrics
$ cat identity.yaml
identities:
bob:
access: metrics
basic:
password: bar
$ ./pebble update-identities --from ./identity.yaml
Updated 1 identity.
$ # open access is fine
$ curl -u bob:bar localhost:4000/v1/health
{"type":"sync","status-code":200,"status":"OK","result":{"healthy":true}}
$ # accessing metrics
$ curl -u bob:bar localhost:4000/metrics
# HELP my_counter Total number of something processed.
# TYPE my_counter counter
my_counter{operation=read,status=success} 11
my_counter{operation=write,status=success} 22
my_counter{operation=read,status=failed} 11
# HELP my_gauge Current value of something.
# TYPE my_gauge gauge
my_gauge{sensor=temperature} 28.12
$ # no access on other endpoints
$ curl -u bob:bar localhost:4000/v1/changes
{"type":"error","status-code":401,"status":"Unauthorized","result":{"message":"access denied","kind":"login-required"}} $ # access type: admin
$ cat identity.yaml
identities:
bob:
access: admin
basic:
password: bar
$ ./pebble update-identities --from ./identity.yaml
Updated 1 identity.
$ # admin can read metrics
$ curl -u bob:bar localhost:4000/v1/metrics
# HELP my_counter Total number of something processed.
# TYPE my_counter counter
my_counter{operation=read,status=success} 176
my_counter{operation=write,status=success} 352
my_counter{operation=read,status=failed} 176
# HELP my_gauge Current value of something.
# TYPE my_gauge gauge
my_gauge{sensor=temperature} 24.48 TODO: hashing password. |
Notes: We need to handle the memory usage issue in the future, to make this easier, after discussion, we decided not to use a self-implemented Prometheus-like module to store the metrics centrally, but rather, store the metrics on existing structs like |
Take some notes before the holiday season: Currently, the check metrics only work when the check is successful. When it fails, both counters are reset to 0 and never increase. We need more debugging. Maybe it has something to do with |
In the latest commits,
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was curious about this change listening to the mid-cycle roadmap presentation. Nobody asked for this review, please feel free to ignore it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly minor comments, though a few structural things. Main thing that needs discussion is whether the perform_check and recover_check counts are actually going to provide the monitoring we want -- let's discuss.
@IronCore864 Can you please update the PR description to match our new approach? Also, it probably goes without saying, but let's be sure not to merge this before the underlying identities PR that this builds on (#563) is reviewed for security and merged. |
Check metrics updated, tests also updated. |
Add simple metrics for services and health checks in OpenTelemetry exposition format.
Metrics include:
pebble_service_active{service="foo"}
pebble_service_starts_total{service="foo"}
pebble_check_up{check="bar"}
pebble_perform_check_count{check="bar"}
pebble_recover_check_count{check="bar"}
More details:
serviceData
andCheckInfo
).CheckManager.checks
is updated frommap[string]CheckInfo
to pointersmap[string]*CheckInfo
.