Track the total kubelet metrics outage durations with autodl framework #30593
base: main
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: dgoodwin
|
Scheduling required tests: |
|
/label acknowledge-critical-fixes-only |
|
@dgoodwin: This PR has been marked as verified by DetailsIn response to this:
|
Couldn't repro in the PR but the files are there. |
Only generate metrics endpoint down intervals if they do not overlap with node reboots or updates. Add a new generic monitortest that sums the total time the metrics endpoint was down on any node. Also sum high CPU intervals. This will let us track whether our changes are improving things and compare against past releases.
aa24d89 to c3bdb3e
|
/retest |
|
/pipeline required |
|
@dgoodwin: The following tests failed, say
|
Risk analysis has seen new tests most likely introduced by this PR. New tests seen in this PR at sha: c3bdb3e
|
This is being done to track whether we get better or worse and to compare against past releases. It will be backported.
Also stopped generating metrics endpoint down intervals if they overlap with node reboots. This should allow for more accurate tracking of the total.