Track the total kubelet metrics outage durations with autodl framework #30593

dgoodwin · 2025-12-10T14:06:13Z

This is being done so we can track whether we are getting better or worse and compare against past releases. Will be backporting.

Also stopped generating metric endpoint down intervals if they overlap with node reboots. This should allow for more accurate tracking of this total.
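To illustrate the overlap rule described above, here is a minimal Python sketch. It is not the code from this PR: the `Interval` type, the `overlaps` helper, and `drop_overlapping_reboots` are hypothetical names, and the assumption that a down interval is only suppressed by a reboot on the same node is mine.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Interval:
    """Hypothetical stand-in for a monitor interval on a single node."""
    node: str
    start: datetime
    end: datetime


def overlaps(a: Interval, b: Interval) -> bool:
    # Two half-open time ranges overlap when each starts before the other ends.
    return a.start < b.end and b.start < a.end


def drop_overlapping_reboots(down, reboots):
    """Keep only the down intervals that do not overlap a reboot on the same node.

    Assumption: matching is per node; the PR may match differently.
    """
    return [
        d for d in down
        if not any(r.node == d.node and overlaps(d, r) for r in reboots)
    ]
```

With this shape, an endpoint-down interval that starts during a reboot of the same node is discarded, while one on a different node or at a different time is kept.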

openshift-ci-robot · 2025-12-10T14:06:16Z

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

openshift-ci · 2025-12-10T14:06:32Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgoodwin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [dgoodwin]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci-robot · 2025-12-10T14:27:33Z

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

dgoodwin · 2025-12-10T20:15:21Z

/label acknowledge-critical-fixes-only
/verified by dgoodwin

openshift-ci-robot · 2025-12-10T20:15:32Z

@dgoodwin: This PR has been marked as verified by dgoodwin.

Details

In response to this:

/label acknowledge-critical-fixes-only
/verified by dgoodwin

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

dgoodwin · 2025-12-10T20:15:35Z

Couldn't repro in the PR but the files are there.

Only generate metrics down intervals if they do not overlap with node reboots or updates. Sum the total time we were in metrics endpoint down on any node with a new generic monitortest for this purpose. Also sum high cpu intervals. This will allow us to track if we're making things better with changes and compare to past releases.
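The summing step described above can be sketched roughly as follows. This is an illustration in Python, not the monitortest from this PR: `total_duration_seconds` is a hypothetical helper, and it assumes intervals on different nodes are simply added together, even when they overlap in time.

```python
from datetime import datetime, timedelta


def total_duration_seconds(intervals):
    """Sum the length of each (start, end) pair, in seconds.

    Assumption: intervals are added independently, so time that is
    down on two nodes at once is counted twice.
    """
    total = timedelta(0)
    for start, end in intervals:
        total += end - start
    return total.total_seconds()


t0 = datetime(2025, 12, 18, 16, 0, 0)
metrics_down = [
    (t0, t0 + timedelta(seconds=30)),
    (t0 + timedelta(minutes=5), t0 + timedelta(minutes=5, seconds=45)),
]
print(total_duration_seconds(metrics_down))
```

A single number per job run like this is what makes it practical to compare one release against another.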

dgoodwin · 2025-12-18T13:12:17Z

/retest

dgoodwin · 2025-12-18T15:21:34Z

/pipeline required

openshift-ci-robot · 2025-12-18T15:21:38Z

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

openshift-ci · 2025-12-18T19:14:04Z

@dgoodwin: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-vsphere-ovn-upi | `c3bdb3e` | link | true | `/test e2e-vsphere-ovn-upi` |
| ci/prow/e2e-gcp-ovn | `c3bdb3e` | link | true | `/test e2e-gcp-ovn` |
| ci/prow/e2e-vsphere-ovn | `c3bdb3e` | link | true | `/test e2e-vsphere-ovn` |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-trt · 2025-12-18T19:57:52Z

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New tests seen in this PR at sha: c3bdb3e

"[Monitor:interval-duration-sum][Jira:"Test Framework"] monitor test interval-duration-sum cleanup" [Total: 12, Pass: 12, Fail: 0, Flake: 0]
"[Monitor:interval-duration-sum][Jira:"Test Framework"] monitor test interval-duration-sum collection" [Total: 12, Pass: 12, Fail: 0, Flake: 0]
"[Monitor:interval-duration-sum][Jira:"Test Framework"] monitor test interval-duration-sum interval construction" [Total: 12, Pass: 12, Fail: 0, Flake: 0]
"[Monitor:interval-duration-sum][Jira:"Test Framework"] monitor test interval-duration-sum preparation" [Total: 12, Pass: 12, Fail: 0, Flake: 0]
"[Monitor:interval-duration-sum][Jira:"Test Framework"] monitor test interval-duration-sum setup" [Total: 12, Pass: 12, Fail: 0, Flake: 0]
"[Monitor:interval-duration-sum][Jira:"Test Framework"] monitor test interval-duration-sum test evaluation" [Total: 12, Pass: 12, Fail: 0, Flake: 0]
"[Monitor:interval-duration-sum][Jira:"Test Framework"] monitor test interval-duration-sum writing to storage" [Total: 12, Pass: 12, Fail: 0, Flake: 0]

dgoodwin · 2025-12-19T14:53:53Z

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/30593/pull-ci-openshift-origin-main-e2e-gcp-ovn/2001674236535508992/artifacts/e2e-gcp-ovn/openshift-e2e-test/artifacts/junit/interval-duration-sum_20251218-163442-autodl.json

Sample file.

openshift-ci bot requested review from deads2k and p0lyn0mial December 10, 2025 14:06

openshift-ci bot added the `approved` label (Indicates a PR has been approved by an approver from all required OWNERS files.) Dec 10, 2025

openshift-ci bot added the `acknowledge-critical-fixes-only` label (Indicates if the issuer of the label is OK with the policy.) Dec 10, 2025

openshift-ci-robot added the `verified` label (Signifies that the PR passed pre-merge verification criteria) Dec 10, 2025

dgoodwin force-pushed the kubelet-metrics-total-outage branch from `aa24d89` to `c3bdb3e` December 16, 2025 19:25
