Add monitoring support for managed EKS WC #2832
Comments
Started by upgrading the I now have a cluster running on grizzly |
Monitoring:
🟡 Scraping targets: 🟡 Alert status in Prometheus |
Status Update
Monitoring:
🔴 Scraping targets:
🟡 Alerting |
Related giantswarm/default-apps-eks#10 |
Status Update
🟠 Scraping targets:
🔴 Prometheus-operator and Kube-state-metrics are unstable
🔴 Alerting |
Status Update
🟢 Prometheus operator and kube-state-metrics keep getting evicted by VPA because the recommender is not able to get metrics from metrics server. Relevant PM: https://github.com/giantswarm/giantswarm/issues/28252
🔴 Alerting |
What to do about metrics that rely on etcd_kubernetes_resources_count? |
they will have to go away as we do not manage ETCD anymore in this scenario @QuentinBisson |
I know @T-Kukawka but I'm not sure how those could be replaced. |
ah I see :( yeah, then we have to align with BigMac and Shield on how else this could be monitored, or whether it is even relevant |
True but then if we can replace those 2 alerts with something else, we can probably remove the |
true, I believe it should be removed for EKS at least (we still use it in CAPI and Vintage, especially for incident monitoring when ETCD is overflowing) |
oh sure :) |
🟢 Scraping targets:
Once it makes it into default-apps-eks, we can release and have everything running. 🟢 Created issues to get rid of etcd |
🟢 Alerts are green. The pending/firing alerts are related to either this PM https://github.com/giantswarm/giantswarm/issues/28252 or https://github.com/giantswarm/giantswarm/issues/27558 |
@TheoBrigitte We're all done for now. This is now blocked by the 2 issues linked in #2832 (comment). I added them at the top as well |
@TheoBrigitte do we still need this issue, given that we ensured monitoring is working and issues exist for the other teams? |
All our issues are fixed and the rest is distributed to the other teams. Closing |
Create a managed EKS cluster from an EKS-based CAPA MC (grizzly and golem) using the following guide.
Add support for managed EKS WC in our monitoring and alerting infrastructure.
Checks
Related issues: