Zookeeper duration #541
base: master
| @@ -0,0 +1,16 @@ | |||
| module: zookeeper | |||
| name: zookeeper-health | |||
| name: zookeeper-health | |
| name: health |
| @@ -0,0 +1,22 @@ | |||
| module: zookeeper | |||
| name: zookeeper-latency | |||
| name: zookeeper-latency | |
| name: latency |
| lasting_duration: "5m" | ||
| latency_disabled: "false" | ||
| major: | ||
| threshold: 250000 |
The critical and major thresholds seem to be too close to each other.
| name: zookeeper-health | ||
| transformation: false | ||
| aggregation: true | ||
| exclude_not_running_vm: true |
Is this necessary?
| name: zookeeper-latency | ||
| transformation: false | ||
| aggregation: true | ||
| exclude_not_running_vm: true |
Is this necessary?
| } | ||
|
|
||
| resource "signalfx_detector" "zookeeper_health" { | ||
| /*resource "signalfx_detector" "zookeeper_health" { |
This commented-out block should be removed.
| } | ||
|
|
||
| # zookeeper_health detector | ||
| /*# zookeeper_health detector |
This commented-out block should be removed.
| @@ -0,0 +1,16 @@ | |||
| module: zookeeper | |||
This detector should aggregate over all servers in the cluster and trigger a major alert when part of the servers is lost (half? a third?) and a critical alert when more than that is lost.
For example:
```
signal = data('gauge.zk_service_health', filter=filter('env', 'preprod') and filter('sfx_monitored', 'true')).mean(by=['plugin_instance']).publish('signal')
detect(when(signal < 0.66, lasting='5m', at_least=1)).publish('CRIT')
detect(when(signal < 1, lasting='5m', at_least=1)).publish('MAJ')
```
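To make the two thresholds in the example above concrete, here is a small Python sketch of the same severity logic, evaluated outside SignalFx. It assumes each server reports 1 when healthy and 0 otherwise, so that the mean is the fraction of healthy servers; that reading of `gauge.zk_service_health` is an assumption, the function name `cluster_severity` is hypothetical, and the `lasting='5m'` condition is not modeled. Unlike the two `detect` calls, which can both fire at once, the sketch returns only the highest severity.

```python
from typing import Optional, Sequence


def cluster_severity(health: Sequence[float],
                     crit_below: float = 0.66,
                     maj_below: float = 1.0) -> Optional[str]:
    """Return 'CRIT', 'MAJ', or None for one evaluation of the cluster signal.

    Assumes each value is 1 for a healthy server and 0 otherwise, so the
    mean is the fraction of healthy servers (an assumption, not confirmed
    by the pull request).
    """
    if not health:
        raise ValueError("at least one server value is required")
    signal = sum(health) / len(health)
    if signal < crit_below:
        return "CRIT"
    if signal < maj_below:
        return "MAJ"
    return None
```

Under these assumptions, a three-server cluster that loses one server has a signal of about 0.667, which stays above 0.66 and yields a major alert; losing two servers drops the signal to about 0.333 and yields a critical alert.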
| signal: | ||
| metric: "gauge.zk_avg_latency" | ||
| rules: | ||
| critical: |
I'm not sure that high latency on a single server should trigger a critical alert.
Maybe use 2 detectors:
- one that triggers major and critical if all servers in a cluster have high latency
- one that triggers major for high latency on a single server
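The split proposed above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function `latency_alerts`, its parameters, and the threshold values in the usage note are hypothetical, and the detector names `cluster-latency` and `server-latency` are borrowed from later revisions of this pull request. The cluster check uses the minimum latency, because every server exceeds a threshold exactly when the lowest value does; a mean across servers would be a looser alternative.

```python
from typing import Dict, List, Tuple


def latency_alerts(latency_by_server: Dict[str, float],
                   major: float,
                   critical: float) -> List[Tuple[str, str, str]]:
    """Return (detector, scope, severity) tuples for one evaluation."""
    alerts: List[Tuple[str, str, str]] = []
    if not latency_by_server:
        return alerts

    # Cluster detector: fires only when ALL servers are above a threshold,
    # which is equivalent to the lowest latency being above it.
    lowest = min(latency_by_server.values())
    if lowest > critical:
        alerts.append(("cluster-latency", "cluster", "CRIT"))
    elif lowest > major:
        alerts.append(("cluster-latency", "cluster", "MAJ"))

    # Server detector: a single slow server raises at most a major alert.
    for server, value in sorted(latency_by_server.items()):
        if value > major:
            alerts.append(("server-latency", server, "MAJ"))
    return alerts
```

For instance, with hypothetical thresholds `major=250` and `critical=400`, a single slow server among three produces one `server-latency` major alert and no cluster alert, while a cluster in which every server is above 400 also produces a `cluster-latency` critical alert.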
|
Work in progress: it still needs some cleanup, but the detector was split as recommended and behaves as expected.
|
The cleanup is done. Please check our latest changes and let us know whether everything looks right now.
|
We also split the server-health detector: one critical detector for the cluster and one major detector for a single server.
|
Hello, |
| @@ -0,0 +1,14 @@ | |||
| module: zookeeper | |||
| name: server-health | |||
| disabled: false | |||
Not needed, this is the default value
| @@ -0,0 +1,14 @@ | |||
| module: zookeeper | |||
| name: cluster-health | |||
| disabled: false | |||
Not needed, this is the default value
| module: zookeeper | ||
| name: server-latency | ||
| aggregation: false | ||
| disabled: false |
Not needed, this is the default value
| module: zookeeper | ||
| name: cluster-latency | ||
| aggregation: ".mean(by=['kubernetes_cluster'])" | ||
| disabled: false |
Not needed, this is the default value
| comparator: ">" | ||
| description: "Zookeeper cluster latency is too high" | ||
| lasting_duration: "5m" | ||
| latency_disabled: "false" No newline at end of file |
The latency_disabled variable does not exist; I think this line should be deleted.
| comparator: ">" | ||
| description: "Zookeeper server latency is too high" | ||
| lasting_duration: "5m" | ||
| latency_disabled: "false" No newline at end of file |
The latency_disabled variable does not exist; I think this line should be deleted.
| comparator: "==" | ||
| description: "Zookeeper cluster is not running" | ||
| lasting_duration: "5m" | ||
| health_disabled: "false" No newline at end of file |
The health_disabled variable does not exist; I think this line should be deleted.
| comparator: "!=" | ||
| description: "Zookeeper server is not running" | ||
| lasting_duration: "5m" | ||
| health_disabled: "false" No newline at end of file |
The health_disabled variable does not exist; I think this line should be deleted.
|
Hello, |
|
Hello, |
|
Any update, please?
Add zookeeper-health and zookeeper-latency parameters