UPGRADE: Metric Monitoring Stack #711

Merged: 4 commits, Feb 3, 2025
21 changes: 21 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,26 @@
# SAS Viya Monitoring for Kubernetes

+## unreleased
+* **Metrics**
+  * [CHANGE] Removed the temporary fix (added in 1.2.27) that replaced a small number of Grafana dashboards
+    inherited from the Kube-Prometheus Stack Helm chart that did not work with Grafana 11.x due to Angular
+    migration or other issues. The current versions of these dashboards all appear to be compatible with Grafana 11.x.
+  * [CHANGE] Node Exporter deployment onto Mac and AIX nodes is now disabled by default. This also suppresses
+    deployment of the (generally unused) `Node Exporter/MacOS` dashboard within Grafana.
+  * [UPGRADE] Kube-Prometheus Stack Helm chart has been upgraded from 62.7.0 to 68.3.0.
+  * [UPGRADE] Grafana Helm chart (for OpenShift deployments) has been upgraded from 8.5.1 to 8.8.4.
+  * [UPGRADE] Prometheus Pushgateway Helm chart has been upgraded from 2.14.0 to 2.17.0.
+  * [UPGRADE] Alertmanager has been upgraded from 0.27.0 to 0.28.0.
+  * [UPGRADE] The config-reloader has been upgraded from 0.76.1 to 0.79.2.
+  * [UPGRADE] Grafana has been upgraded from 11.2.0 to 11.4.0.
+  * [UPGRADE] The k8s-sidecar has been upgraded from 1.27.4 to 1.28.0.
+  * [UPGRADE] Kube-State-Metrics has been upgraded from 2.13.0 to 2.14.0.
+  * [UPGRADE] Prometheus has been upgraded from 2.54.1 to 3.1.0.
+  * [UPGRADE] Prometheus Operator has been upgraded from 0.76.1 to 0.79.2.
+  * [UPGRADE] Prometheus Pushgateway has been upgraded from 1.9.0 to 1.11.0.
+
+
+
## Version 1.2.33 (14JAN2025)
* **Logging**
* [SECURITY] Fluent Bit log collecting pods no longer run as `root` user. In addition, the database used to
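Most of the upgrades listed in the changelog stay within one major version, but Prometheus moves from 2.54.1 to 3.1.0. The following is a minimal sketch, not part of this PR, of how such a jump could be flagged automatically. The helper names `major_of` and `is_major_bump` are hypothetical; the version pairs are copied from the changelog entries.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the major component of a version string,
# tolerating an optional leading "v" (as used in the image tags).
major_of() {
  local v="${1#v}"
  printf '%s\n' "${v%%.*}"
}

# Hypothetical helper: succeed (exit status 0) when the major versions differ.
is_major_bump() {
  [ "$(major_of "$1")" != "$(major_of "$2")" ]
}

# Version pairs taken from the changelog entries.
if is_major_bump "2.54.1" "3.1.0"; then
  echo "Prometheus: major version change, review the upstream migration notes"
fi
if ! is_major_bump "11.2.0" "11.4.0"; then
  echo "Grafana: same major version"
fi
```

A check like this only compares the leading numeric component, so it says nothing about breaking changes within a major line (for example, Helm chart majors such as 62.x to 68.x would also be flagged and may need their own review).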
26 changes: 13 additions & 13 deletions component_versions.env
@@ -37,34 +37,34 @@ OSD_FULL_IMAGE="docker.io/opensearchproject/opensearch-dashboards:2.17.1"
#Grafana (when deployed on OpenShift)
OPENSHIFT_GRAFANA_CHART_REPO=grafana
OPENSHIFT_GRAFANA_CHART_NAME=grafana
-OPENSHIFT_GRAFANA_CHART_VERSION=8.5.1
+OPENSHIFT_GRAFANA_CHART_VERSION=8.8.4
OPENSHIFT_OAUTHPROXY_FULL_IMAGE="registry.redhat.io/openshift4/ose-oauth-proxy:latest"

#Grafana (everywhere)
-GRAFANA_FULL_IMAGE="docker.io/grafana/grafana:11.2.0"
-GRAFANA_SIDECAR_FULL_IMAGE="quay.io/kiwigrid/k8s-sidecar:1.27.4"
-GRAFANA_DATASOURCE_PLUGIN_VERSION="2.21.1"
+GRAFANA_FULL_IMAGE="docker.io/grafana/grafana:11.4.0"
+GRAFANA_SIDECAR_FULL_IMAGE="quay.io/kiwigrid/k8s-sidecar:1.28.0"
+GRAFANA_DATASOURCE_PLUGIN_VERSION="2.22.3"

#Kube-Prometheus Stack
KUBE_PROM_STACK_CHART_REPO=prometheus-community
KUBE_PROM_STACK_CHART_NAME=kube-prometheus-stack
-KUBE_PROM_STACK_CHART_VERSION=62.7.0
-ALERTMANAGER_FULL_IMAGE="quay.io/prometheus/alertmanager:v0.27.0"
+KUBE_PROM_STACK_CHART_VERSION=68.3.0
+ALERTMANAGER_FULL_IMAGE="quay.io/prometheus/alertmanager:v0.28.0"
ADMWEBHOOK_FULL_IMAGE="registry.k8s.io/ingress-nginx/kube-webhook-certgen:v20221220-controller-v1.5.1-58-g787ea74b6"
-KSM_FULL_IMAGE="registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.13.0"
+KSM_FULL_IMAGE="registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.14.0"
NODEXPORT_FULL_IMAGE="quay.io/prometheus/node-exporter:v1.8.2"
-PROMETHEUS_FULL_IMAGE="quay.io/prometheus/prometheus:v2.54.1"
-PROMOP_FULL_IMAGE="quay.io/prometheus-operator/prometheus-operator:v0.76.1"
-CONFIGRELOAD_FULL_IMAGE="quay.io/prometheus-operator/prometheus-config-reloader:v0.76.1"
+PROMETHEUS_FULL_IMAGE="quay.io/prometheus/prometheus:v3.1.0"
+PROMOP_FULL_IMAGE="quay.io/prometheus-operator/prometheus-operator:v0.79.2"
+CONFIGRELOAD_FULL_IMAGE="quay.io/prometheus-operator/prometheus-config-reloader:v0.79.2"

#Pushgateway
PUSHGATEWAY_CHART_REPO=prometheus-community
PUSHGATEWAY_CHART_NAME=prometheus-pushgateway
-PUSHGATEWAY_CHART_VERSION=2.14.0
-PUSHGATEWAY_FULL_IMAGE="quay.io/prometheus/pushgateway:v1.9.0"
+PUSHGATEWAY_CHART_VERSION=2.17.0
+PUSHGATEWAY_FULL_IMAGE="quay.io/prometheus/pushgateway:v1.11.0"

#Prometheus Operator CRD
-PROM_OPERATOR_CRD_VERSION=v0.76.1
+PROM_OPERATOR_CRD_VERSION=v0.79.2

#Tempo
TEMPO_CHART_REPO=grafana
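In this file the Prometheus Operator image, the config-reloader image, and the CRD version all move together from v0.76.1 to v0.79.2. As a rough sketch of how that alignment could be checked, the snippet below compares the three values. The helper names `image_tag` and `operator_versions_aligned` are hypothetical, and the values are inlined from the new side of the diff; loading them with `. component_versions.env` instead assumes the file is shell-sourceable.

```shell
#!/usr/bin/env bash
# Values copied from the new side of the diff (inlined so the sketch is
# self-contained; sourcing component_versions.env is an assumption).
PROMOP_FULL_IMAGE="quay.io/prometheus-operator/prometheus-operator:v0.79.2"
CONFIGRELOAD_FULL_IMAGE="quay.io/prometheus-operator/prometheus-config-reloader:v0.79.2"
PROM_OPERATOR_CRD_VERSION="v0.79.2"

# Hypothetical helper: print everything after the last ":" of an image
# reference. It does not handle digests or untagged references that
# include a registry port.
image_tag() {
  printf '%s\n' "${1##*:}"
}

# Hypothetical check: succeed only when both image tags match the CRD version.
operator_versions_aligned() {
  [ "$(image_tag "$PROMOP_FULL_IMAGE")" = "$PROM_OPERATOR_CRD_VERSION" ] &&
    [ "$(image_tag "$CONFIGRELOAD_FULL_IMAGE")" = "$PROM_OPERATOR_CRD_VERSION" ]
}

if operator_versions_aligned; then
  echo "Operator image, config-reloader, and CRDs agree on $PROM_OPERATOR_CRD_VERSION"
else
  echo "Version mismatch between Prometheus Operator components" >&2
fi
```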
21 changes: 0 additions & 21 deletions monitoring/bin/deploy_monitoring_cluster.sh
@@ -326,27 +326,6 @@ fi
echo ""
monitoring/bin/deploy_dashboards.sh

-# 01JUL24 Temporary Fix
-# Some Grafana dashboards inherited from the Kube-Prometheus Stack Helm
-# chart do not work with Grafana 11 due to Angular migration or other
-# issues. As a **temporary** fix, we will remove these dashboards and
-# replace them with our versions of them. This fix will be removed
-# when these issues have been resolved.
-V4M_TEMP_REPLACE_PROBLEMATIC_MIXIN_DASHBOARDS="${V4M_TEMP_REPLACE_PROBLEMATIC_MIXIN_DASHBOARDS:-true}"
-if [ "$V4M_TEMP_REPLACE_PROBLEMATIC_MIXIN_DASHBOARDS" == "true" ]; then
-  log_info "Replacing some Kube-Prometheus Stack-supplied Grafana dashboards with our own versions due to incompatibilities."
-
-  # remove configMaps defining existing Grafana dashboards
-  kubectl -n $MON_NS delete configmap v4m-cluster-total --ignore-not-found
-  kubectl -n $MON_NS delete configmap v4m-namespace-by-pod --ignore-not-found
-  kubectl -n $MON_NS delete configmap v4m-namespace-by-workload --ignore-not-found
-  kubectl -n $MON_NS delete configmap v4m-prometheus --ignore-not-found
-
-  # deploy our versions of these dashboards
-  monitoring/bin/deploy_dashboards.sh monitoring/dashboards/mixinfixes
-
-fi
-
set +e
# call function to get HTTP/HTTPS ports from ingress controller
get_ingress_ports
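The block removed from `deploy_monitoring_cluster.sh` was gated by a variable that defaulted to `true` through the `${VAR:-true}` expansion, so the dashboard replacement ran unless a user explicitly opted out. With the block gone, this script no longer reads that variable. The sketch below illustrates only the default-on pattern; `EXAMPLE_TOGGLE` and `toggle_enabled` are hypothetical names, not identifiers from the project.

```shell
#!/usr/bin/env bash
# EXAMPLE_TOGGLE is a hypothetical stand-in for the removed variable.
toggle_enabled() {
  # "${1:-true}" yields "true" when the argument is unset or empty, so the
  # feature stays on unless the caller passes some other value.
  local value="${1:-true}"
  [ "$value" = "true" ]
}

unset EXAMPLE_TOGGLE
if toggle_enabled "${EXAMPLE_TOGGLE:-}"; then
  echo "unset: enabled by default"
fi

EXAMPLE_TOGGLE=false
if ! toggle_enabled "$EXAMPLE_TOGGLE"; then
  echo "false: explicitly disabled"
fi
```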