UPGRADE: Metric Monitoring Stack (#711)
* Upgrade metric monitoring stack
* Remove temporary fix deploying custom Grafana dashboards to replace some w/Angular components
* Disable default deployment of Node Exporter on Mac and AIX
* Bump data source plugin, update log-enabled dashboards and logging datasource definition
gsmith-sas authored Feb 3, 2025
1 parent 36333f8 commit 4446df1
Showing 12 changed files with 75 additions and 7,626 deletions.
21 changes: 21 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,26 @@
 # SAS Viya Monitoring for Kubernetes
 
+## unreleased
+* **Metrics**
+* [CHANGE] Removed temporary fix (added w/1.2.27) replacing a small number of Grafana dashboards
+inherited from the Kube-Prometheus Stack Helm chart that did not work with Grafana 11.x due to Angular
+migration or other issues. Current version of these dashboards all appear to be compatible with Grafana 11.x.
+* [CHANGE] Node Exporter deployment onto Mac and AIX nodes disabled by default. This also suppresses
+deployment of the (generally unused) `Node Exporter/MacOS` dashboard within Grafana.
+* [UPGRADE] Kube-Prometheus Stack Helm chart has been upgraded from 62.7.0 to 68.3.0.
+* [UPGRADE] Grafana Helm Chart (for OpenShift deployments) has been upgraded from 8.5.1 to 8.8.4.
+* [UPGRADE] Prometheus Pushgateway Helm chart has been upgraded from 2.14.0 to 2.17.0.
+* [UPGRADE] Alertmanager has been upgraded from 0.27.0 to 0.28.0.
+* [UPGRADE] The config-reloader has been upgraded from 0.76.1 to 0.79.2.
+* [UPGRADE] Grafana has been upgraded from 11.2.0 to 11.4.0.
+* [UPGRADE] The k8s-sidecar has been upgraded from 1.27.4 to 1.28.0.
+* [UPGRADE] Kube-State-Metrics has been upgraded from 2.13.0 to 2.14.0.
+* [UPGRADE] Prometheus has been upgraded from 2.54.1 to 3.1.0.
+* [UPGRADE] Prometheus Operator has been upgraded from 0.76.1 to 0.79.2.
+* [UPGRADE] Prometheus Pushgateway has been upgraded from 1.9.0 to 1.11.0.
+
+
+
 ## Version 1.2.33 (14JAN2025)
 * **Logging**
 * [SECURITY] Fluent Bit log collecting pods no longer run as `root` user. In addition, the database used to
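
Review note: several of the upgrades listed in this changelog hunk cross a major version boundary (for example Prometheus 2.54.1 to 3.1.0), which is usually where upstream release notes deserve a closer read. A minimal sketch for flagging such pairs follows. The `is_major_bump` helper is hypothetical and not part of this repository, and the comparison assumes plain `MAJOR.MINOR.PATCH` version strings with an optional leading `v`.

```shell
#!/usr/bin/env bash
# Sketch only: flag upgrades that cross a major version boundary.

# Hypothetical helper (not part of the repository). Strips an optional
# leading "v", then compares the text before the first "." of each version.
is_major_bump() {
  local from="${1#v}" to="${2#v}"
  [ "${from%%.*}" != "${to%%.*}" ]
}

# Version pairs copied from the changelog entries in this commit.
while read -r name from to; do
  if is_major_bump "$from" "$to"; then
    echo "MAJOR       $name $from -> $to"
  else
    echo "same-major  $name $from -> $to"
  fi
done <<'EOF'
prometheus 2.54.1 3.1.0
kube-prometheus-stack 62.7.0 68.3.0
grafana 11.2.0 11.4.0
alertmanager 0.27.0 0.28.0
EOF
```

The check is purely textual, so it says nothing about whether a given major bump actually contains breaking changes for a particular deployment.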
26 changes: 13 additions & 13 deletions component_versions.env
@@ -37,34 +37,34 @@ OSD_FULL_IMAGE="docker.io/opensearchproject/opensearch-dashboards:2.17.1"
 #Grafana (when deployed on OpenShift)
 OPENSHIFT_GRAFANA_CHART_REPO=grafana
 OPENSHIFT_GRAFANA_CHART_NAME=grafana
-OPENSHIFT_GRAFANA_CHART_VERSION=8.5.1
+OPENSHIFT_GRAFANA_CHART_VERSION=8.8.4
 OPENSHIFT_OAUTHPROXY_FULL_IMAGE="registry.redhat.io/openshift4/ose-oauth-proxy:latest"
 
 #Grafana (everywhere)
-GRAFANA_FULL_IMAGE="docker.io/grafana/grafana:11.2.0"
-GRAFANA_SIDECAR_FULL_IMAGE="quay.io/kiwigrid/k8s-sidecar:1.27.4"
-GRAFANA_DATASOURCE_PLUGIN_VERSION="2.21.1"
+GRAFANA_FULL_IMAGE="docker.io/grafana/grafana:11.4.0"
+GRAFANA_SIDECAR_FULL_IMAGE="quay.io/kiwigrid/k8s-sidecar:1.28.0"
+GRAFANA_DATASOURCE_PLUGIN_VERSION="2.22.3"
 
 #Kube-Prometheus Stack
 KUBE_PROM_STACK_CHART_REPO=prometheus-community
 KUBE_PROM_STACK_CHART_NAME=kube-prometheus-stack
-KUBE_PROM_STACK_CHART_VERSION=62.7.0
-ALERTMANAGER_FULL_IMAGE="quay.io/prometheus/alertmanager:v0.27.0"
+KUBE_PROM_STACK_CHART_VERSION=68.3.0
+ALERTMANAGER_FULL_IMAGE="quay.io/prometheus/alertmanager:v0.28.0"
 ADMWEBHOOK_FULL_IMAGE="registry.k8s.io/ingress-nginx/kube-webhook-certgen:v20221220-controller-v1.5.1-58-g787ea74b6"
-KSM_FULL_IMAGE="registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.13.0"
+KSM_FULL_IMAGE="registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.14.0"
 NODEXPORT_FULL_IMAGE="quay.io/prometheus/node-exporter:v1.8.2"
-PROMETHEUS_FULL_IMAGE="quay.io/prometheus/prometheus:v2.54.1"
-PROMOP_FULL_IMAGE="quay.io/prometheus-operator/prometheus-operator:v0.76.1"
-CONFIGRELOAD_FULL_IMAGE="quay.io/prometheus-operator/prometheus-config-reloader:v0.76.1"
+PROMETHEUS_FULL_IMAGE="quay.io/prometheus/prometheus:v3.1.0"
+PROMOP_FULL_IMAGE="quay.io/prometheus-operator/prometheus-operator:v0.79.2"
+CONFIGRELOAD_FULL_IMAGE="quay.io/prometheus-operator/prometheus-config-reloader:v0.79.2"
 
 #Pushgateway
 PUSHGATEWAY_CHART_REPO=prometheus-community
 PUSHGATEWAY_CHART_NAME=prometheus-pushgateway
-PUSHGATEWAY_CHART_VERSION=2.14.0
-PUSHGATEWAY_FULL_IMAGE="quay.io/prometheus/pushgateway:v1.9.0"
+PUSHGATEWAY_CHART_VERSION=2.17.0
+PUSHGATEWAY_FULL_IMAGE="quay.io/prometheus/pushgateway:v1.11.0"
 
 #Prometheus Operator CRD
-PROM_OPERATOR_CRD_VERSION=v0.76.1
+PROM_OPERATOR_CRD_VERSION=v0.79.2
 
 #Tempo
 TEMPO_CHART_REPO=grafana
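
Review note: because `component_versions.env` consists of plain `NAME=value` assignments, it can be sourced by a shell to confirm which image tags a checkout pins. The sketch below is one possible way to do that, not how the project's own scripts consume the file. The `image_tag` and `report_versions` helpers are hypothetical, and the tag extraction assumes every image reference ends in `:tag`, as the `*_FULL_IMAGE` values in this hunk do.

```shell
#!/usr/bin/env bash
# Sketch only: report pinned image tags from a component_versions.env-style file.

# Hypothetical helper (not part of the repository): drops everything up to
# the last ":" so "quay.io/prometheus/prometheus:v3.1.0" yields "v3.1.0".
# Assumes the reference carries a tag.
image_tag() {
  printf '%s\n' "${1##*:}"
}

# Hypothetical helper: sources the file in a subshell so its variables do not
# leak into the caller, then prints a few of the tags it pins.
report_versions() {
  local versions_file="${1:-./component_versions.env}"
  [ -f "$versions_file" ] || { echo "missing: $versions_file" >&2; return 1; }
  (
    # shellcheck disable=SC1090
    . "$versions_file"
    echo "grafana=$(image_tag "$GRAFANA_FULL_IMAGE")"
    echo "prometheus=$(image_tag "$PROMETHEUS_FULL_IMAGE")"
    echo "alertmanager=$(image_tag "$ALERTMANAGER_FULL_IMAGE")"
  )
}
```

From the repository root, `report_versions ./component_versions.env` would print one `name=tag` line per image. Passing a path that contains a slash avoids the `PATH` lookup that `.` performs for bare file names.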
21 changes: 0 additions & 21 deletions monitoring/bin/deploy_monitoring_cluster.sh
@@ -326,27 +326,6 @@ fi
 echo ""
 monitoring/bin/deploy_dashboards.sh
 
-# 01JUL24 Temporary Fix
-# Some Grafana dashboards inherited from the Kube-Prometheus Stack Helm
-# chart do not work with Grafana 11 due to Angular migration or other
-# issues. As a **temporary** fix, we will remove these dashboards and
-# replace them with our versions of them. This fix will be removed
-# when these issues have been resolved.
-V4M_TEMP_REPLACE_PROBLEMATIC_MIXIN_DASHBOARDS="${V4M_TEMP_REPLACE_PROBLEMATIC_MIXIN_DASHBOARDS:-true}"
-if [ "$V4M_TEMP_REPLACE_PROBLEMATIC_MIXIN_DASHBOARDS" == "true" ]; then
-log_info "Replacing some Kube-Prometheus Stack-supplied Grafana dashboards with our own versions due to incompatabilities."
-
-# remove configMaps definining exising Grafana dashboards
-kubectl -n $MON_NS delete configmap v4m-cluster-total --ignore-not-found
-kubectl -n $MON_NS delete configmap v4m-namespace-by-pod --ignore-not-found
-kubectl -n $MON_NS delete configmap v4m-namespace-by-workload --ignore-not-found
-kubectl -n $MON_NS delete configmap v4m-prometheus --ignore-not-found
-
-# deploy our versions of these dashboards
-monitoring/bin/deploy_dashboards.sh monitoring/dashboards/mixinfixes
-
-fi
-
 set +e
 # call function to get HTTP/HTTPS ports from ingress controller
 get_ingress_ports