[release-4.19] OCPBUGS-62670: Fix EgressIP stale GARP post reboot + pod restart #2774

martinkennelly · 2025-10-02T12:08:22Z

/hold

Depends on #2767

Currently, the trap forces an exit before the background processes can finish. The container is then removed and the orphaned processes end early, which leaves our config in an unknown state because we don't shut down in an orderly manner. Wait until the pid file for ovnkube controller with node is removed, which shows that the process has completed.

Signed-off-by: Martin Kennelly <[email protected]>
(cherry picked from commit 8b29419)
(cherry picked from commit d65ec5c)
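The wait described in this commit message can be sketched as a simple poll on the pid file. This is a minimal illustration, not the script changed by the commit: the function name, the timeout, and the poll interval are all assumptions, and Python is used here only for readability.

```python
import os
import time


def wait_for_pid_file_removal(pid_file: str, timeout_s: float = 30.0,
                              poll_s: float = 0.1) -> bool:
    """Block until pid_file no longer exists, or until timeout_s elapses.

    Returns True if the file disappeared (the process finished cleanly)
    and False if we gave up waiting. Timeout and poll values are
    hypothetical defaults chosen for the example.
    """
    deadline = time.monotonic() + timeout_s
    while os.path.exists(pid_file):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
    return True
```

A shutdown handler would call this after signalling the background process and only then let the container exit, so the process is not orphaned mid-cleanup.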

Prevent ovn-controller from sending stale GARPs by adding drop flows on the external bridge patch ports until ovnkube-controller synchronizes the southbound database (henceforth known as "drop flows"). This addresses race conditions where ovn-controller processes outdated SB DB state before ovnkube-controller updates it, particularly affecting EIP SNAT configurations attached to logical router ports.

Fixes: https://issues.redhat.com/browse/FDP-1537

ovnkube-controller controls the lifecycle of the drop flows. OVS and ovn-controller must be running to configure the external bridge. Downstream, the external bridge may be precreated, and ovn-controller will use it.

This fix considers three primary scenarios: node, container and pod restart.

Node restart: the OVS flows installed prior to the reboot are cleared but the external bridge exists. Add the flows before ovnkube controller with node starts. The reason to add them here is that our gateway code depends on ovn-controller being started and running... There is now a race between ovn-controller starting (and GARPing) and us setting this flow. I think the risk is low; however, it needs serious testing. The reason I did not add the drop flows before ovn-controller starts is that I have no way to detect whether it is a node reboot or a pod reboot, and I don't want to inject drop flows for a simple ovn-controller container restart, which could disrupt traffic. When ovnkube-controller starts, we create a new gateway and apply the same flows in order to ensure we always drop GARPs while ovnkube-controller hasn't synced. Remove the flows when ovnkube-controller has synced. There is also a race here between ovnkube-controller removing the flows and ovn-controller GARPing with stale SB DB info. There is no easy way to detect what SB DB data ovn-controller has consumed.

Pod restart: we add the drop flows before exit. ovnkube-controller-with-node will also add them before it starts the Go code.

Container restart:
- ovnkube-controller: adds flows upon start and exit
- ovn-controller: no changes

While the drop flows are set, OVN may not be able to resolve IPs it doesn't know about in its Logical Router pipelines generation. Following removal of the drop flows, OVN may resolve the IPs using GARP requests. OVN-Controller always sends out GARPs with op code 1 on startup.

Signed-off-by: Martin Kennelly <[email protected]>
(cherry picked from commit 82fc3bf)
(cherry picked from commit 50a94e1)
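To make the idea of a drop flow concrete, here is a rough sketch that builds OpenFlow match strings for ARP requests (op code 1) arriving from a patch port. It is not the flow this PR installs: `in_port`, `arp`, `arp_op` and `actions=drop` are standard OVS flow syntax, but the table, the priority and the patch port name below are assumptions made for the example.

```python
from typing import Iterable, List


def garp_drop_flow(patch_port: str, priority: int = 499, table: int = 0) -> str:
    """Return a flow spec that drops ARP op 1 packets entering from patch_port.

    priority and table are hypothetical values; the real flows may use
    different ones and may match on additional fields.
    """
    return (f"table={table},priority={priority},in_port={patch_port},"
            f"arp,arp_op=1,actions=drop")


def garp_drop_flows(patch_ports: Iterable[str]) -> List[str]:
    """Build one drop flow per external bridge patch port."""
    return [garp_drop_flow(port) for port in patch_ports]
```

Each resulting string is in the form accepted by `ovs-ofctl add-flow <bridge> "<flow>"`. In the lifecycle described above, such flows would be added before ovnkube-controller starts and on exit, and removed once it has synced.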

PR 5373 to drop the GARP flows didn't consider that we set the default network controller first and only later set the gateway obj. Between those two steps, ovnkube node may receive a stop signal, and we do not guard against accessing the gateway if it's not yet set. OVNKube controller may have synced before the gateway obj is set. There is nothing to reconcile if the gateway is not set.

Signed-off-by: Martin Kennelly <[email protected]>
(cherry picked from commit e60220a)
(cherry picked from commit a7869b2)
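The guard this commit describes can be sketched as follows. The class and method names are hypothetical and the real code is Go; the point is only that a sync or stop callback must tolerate the gateway not having been assigned yet.

```python
import threading
from typing import Optional


class Gateway:
    """Stand-in for the gateway object (hypothetical)."""

    def __init__(self) -> None:
        self.reconciled = 0

    def reconcile(self) -> None:
        self.reconciled += 1


class NodeController:
    """Holds a gateway that is assigned some time after construction."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._gateway: Optional[Gateway] = None

    def set_gateway(self, gateway: Gateway) -> None:
        with self._lock:
            self._gateway = gateway

    def on_controller_synced(self) -> bool:
        """Reconcile the gateway if it exists; report whether anything ran."""
        with self._lock:
            gateway = self._gateway
        if gateway is None:
            # Nothing to reconcile before the gateway object is set.
            return False
        gateway.reconcile()
        return True
```

Returning early when the gateway is unset mirrors the statement that there is nothing to reconcile in that window.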

openshift-ci-robot · 2025-10-02T12:08:29Z

@martinkennelly: This pull request references Jira Issue OCPBUGS-62670, which is invalid:

expected dependent Jira Issue OCPBUGS-62671 to be in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but it is New instead
expected dependent Jira Issue OCPBUGS-62671 to target a version in 4.20.0, but it targets "4.18.z" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

/hold

Depends on #2767

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci · 2025-10-02T12:32:15Z

@martinkennelly: This PR was included in a payload test run from openshift/machine-config-operator#5324
trigger 11 job(s) of type blocking for the nightly release of OCP 4.19

periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-azure-aks-ovn-conformance
periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-aws-ovn-conformance
periodic-ci-openshift-release-master-nightly-4.19-e2e-aws-ovn-serial
periodic-ci-openshift-release-master-ci-4.19-e2e-aws-upgrade-ovn-single-node
periodic-ci-openshift-release-master-ci-4.19-e2e-aws-ovn-techpreview
periodic-ci-openshift-release-master-ci-4.19-e2e-aws-ovn-techpreview-serial
periodic-ci-openshift-release-master-nightly-4.19-e2e-aws-ovn-upgrade-fips
periodic-ci-openshift-release-master-ci-4.19-e2e-azure-ovn-upgrade
periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-gcp-ovn-rt-upgrade
periodic-ci-openshift-release-master-nightly-4.19-e2e-metal-ipi-ovn-bm
periodic-ci-openshift-release-master-nightly-4.19-e2e-metal-ipi-ovn-ipv6

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/a89d3170-9f8b-11f0-93fd-ee70f1d60e20-0

Ensure ovn-controller has processed the SB DB updates before removing the GARP drop flows by using the hv_cfg field in NB_Global [1].

OVNKube controller increments the nb_cfg value post sync, and northd copies it to the SB DB. Each ovn-controller copies this nb_cfg value from the SB DB and writes it to the nb_cfg field of its Chassis_Private record after it has processed the SB DB changes. Northd then looks at the nb_cfg value of every Chassis_Private record and sets the hv_cfg value in the NB DB's NB_Global table to the minimum integer found.

Since IC currently only supports one node per zone, we can be sure ovn-controller is running locally, and therefore it's OK to block removing the GARP drop flows.

[1] https://man7.org/linux/man-pages/man5/ovn-nb.5.html

Signed-off-by: Martin Kennelly <[email protected]>
(cherry picked from commit 3b5da01)
(cherry picked from commit a4776fb)
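The nb_cfg / hv_cfg handshake described in this commit message can be modelled in a few lines. This is a simplified sketch under stated assumptions: the function names, the reader callback, and the timeout values are hypothetical, and the real implementation reads these values from the OVN databases rather than from Python callables.

```python
import time
from typing import Callable, Iterable


def compute_hv_cfg(chassis_nb_cfgs: Iterable[int]) -> int:
    """Model northd: hv_cfg is the minimum nb_cfg reported by any chassis.

    Returning 0 when no chassis exists is an assumption for this sketch.
    """
    values = list(chassis_nb_cfgs)
    return min(values) if values else 0


def wait_for_hv_cfg(read_hv_cfg: Callable[[], int], target_nb_cfg: int,
                    timeout_s: float = 5.0, poll_s: float = 0.05) -> bool:
    """Poll hv_cfg until it reaches target_nb_cfg or the timeout expires.

    Only when this returns True would the caller remove the drop flows.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if read_hv_cfg() >= target_nb_cfg:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

With one node per zone there is a single local chassis, so the minimum is simply that chassis's nb_cfg, which is why waiting on hv_cfg is a usable signal that the local ovn-controller has caught up.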

openshift-ci · 2025-10-08T11:56:36Z

@martinkennelly: This PR was included in a payload test run from openshift/machine-config-operator#5324
trigger 11 job(s) of type blocking for the nightly release of OCP 4.19

periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-azure-aks-ovn-conformance
periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-aws-ovn-conformance
periodic-ci-openshift-release-master-nightly-4.19-e2e-aws-ovn-serial
periodic-ci-openshift-release-master-ci-4.19-e2e-aws-upgrade-ovn-single-node
periodic-ci-openshift-release-master-ci-4.19-e2e-aws-ovn-techpreview
periodic-ci-openshift-release-master-ci-4.19-e2e-aws-ovn-techpreview-serial
periodic-ci-openshift-release-master-nightly-4.19-e2e-aws-ovn-upgrade-fips
periodic-ci-openshift-release-master-ci-4.19-e2e-azure-ovn-upgrade
periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-gcp-ovn-rt-upgrade
periodic-ci-openshift-release-master-nightly-4.19-e2e-metal-ipi-ovn-bm
periodic-ci-openshift-release-master-nightly-4.19-e2e-metal-ipi-ovn-ipv6

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/c3464cd0-a43d-11f0-8c73-68fbceb6751c-0

martinkennelly · 2025-10-08T15:38:33Z

/test e2e-aws-ovn-windows

It failed during the installation phase. Unrelated to this. This fix doesn't run on Windows.

time="2025-10-08T14:16:09Z" level=info msg="  Found ClusterServiceVersion \"openshift-windows-machine-config-operator/windows-machine-config-operator.v10.19.1\" phase: Installing"
E1008 14:20:59.180542     181 request.go:1075] Unexpected error when reading response body: context deadline exceeded
time="2025-10-08T14:20:59Z" level=fatal msg="Failed to run packagemanifests: error waiting for CSV to install: deployment windows-machine-config-operator has error: client rate limiter Wait returned an error: context deadline exceeded\n\n"

martinkennelly · 2025-10-08T17:08:01Z

/test 4.19-upgrade-from-stable-4.18-e2e-gcp-ovn-rt-upgrade

Hit overall job timeout:

: Job run should complete before timeout	5h2m33s
{  {"component":"entrypoint","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:169","func":"sigs.k8s.io/prow/pkg/entrypoint.Options.ExecuteProcess","level":"error","msg":"Process did not finish before 5h0m0s timeout","severity":"error","time":"2025-10-08T16:55:46Z"}
}

martinkennelly · 2025-10-08T17:08:59Z

/test e2e-metal-ipi-ovn-dualstack-bgp-local-gw

Probably unrelated nmstate image pull issue:

[sig-arch] events should not repeat pathologically	0s
{  1 events happened too frequently

event happened 107 times, something is wrong: namespace/openshift-nmstate node/worker-2 pod/nmstate-console-plugin-5964f557cb-krqsm hmsg/d5bf9afefc - reason/Failed Error: ImagePullBackOff (16:13:34Z) result=reject }

jluhrsen · 2025-10-08T21:10:44Z

/test e2e-aws-ovn-windows

martinkennelly · 2025-10-09T15:27:07Z

/test e2e-aws-ovn-windows

Unrelated:

error: image "quay-proxy.ci.openshift.org/openshift/ci@sha256:e685e0585d33b380dccbfb5bc189ab233cff0d688221cda8a691d38c7d45fc4a" not found: manifest unknown: manifest unknown

martinkennelly · 2025-10-09T15:29:03Z

/test e2e-metal-ipi-ovn-dualstack-bgp-local-gw

Unrelated issues including the CNI add error because of mismatch in UID:

[sig-arch] events should not repeat pathologically	0s
{  1 events happened too frequently

event happened 108 times, something is wrong: namespace/openshift-nmstate node/worker-2 pod/nmstate-console-plugin-5964f557cb-v6kg6 hmsg/d5bf9afefc - reason/Failed Error: ImagePullBackOff (20:57:59Z) result=reject }
: [Unknown][invariant] alert/KubePodNotReady should not be at or above info in all the other namespaces	0s
{  KubePodNotReady was at or above info for at least 11m10s on platformidentification.JobType{Release:"4.19", FromRelease:"", Platform:"metal", Architecture:"amd64", Network:"ovn", Topology:"ha"} (maxAllowed=0s): pending for 1m56s, firing for 11m10s:

Oct 08 20:48:30.280 - 670s  W namespace/openshift-nmstate pod/nmstate-console-plugin-5964f557cb-v6kg6 alert/KubePodNotReady alertstate/firing severity/warning ALERTS{alertname="KubePodNotReady", alertstate="firing", namespace="openshift-nmstate", pod="nmstate-console-plugin-5964f557cb-v6kg6", prometheus="openshift-monitoring/k8s", severity="warning"}}
: [sig-network] pods should successfully create sandboxes by adding pod to network	0s
{  2 failures to create the sandbox

namespace/e2e-statefulset-6590 node/worker-1.ostest.test.metalkube.org pod/ss2-1 hmsg/1ec66151d1 - 58.97 seconds after deletion - firstTimestamp/2025-10-08T19:57:56Z interesting/true lastTimestamp/2025-10-08T19:57:56Z reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_ss2-1_e2e-statefulset-6590_0646a45b-0660-41d5-9721-5f3742b7a18e_0(6bb402da740bbdcef790c15b93641e6f108cf4e753e52373eefc1b243112ef66): error adding pod e2e-statefulset-6590_ss2-1 to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: 'ContainerID:"6bb402da740bbdcef790c15b93641e6f108cf4e753e52373eefc1b243112ef66" Netns:"/var/run/netns/b08be8ba-3464-46bf-8dac-af197d9ba7b9" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=e2e-statefulset-6590;K8S_POD_NAME=ss2-1;K8S_POD_INFRA_CONTAINER_ID=6bb402da740bbdcef790c15b93641e6f108cf4e753e52373eefc1b243112ef66;K8S_POD_UID=0646a45b-0660-41d5-9721-5f3742b7a18e" Path:"" ERRORED: error configuring pod [e2e-statefulset-6590/ss2-1] networking: [e2e-statefulset-6590/ss2-1/0646a45b-0660-41d5-9721-5f3742b7a18e:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[e2e-statefulset-6590/ss2-1 6bb402da740bbdcef790c15b93641e6f108cf4e753e52373eefc1b243112ef66 network default NAD default] [e2e-statefulset-6590/ss2-1 6bb402da740bbdcef790c15b93641e6f108cf4e753e52373eefc1b243112ef66 network default NAD default] pod deleted before sandbox ADD operation began. Request Pod UID 0646a45b-0660-41d5-9721-5f3742b7a18e is different from the Pod UID (a212bd99-3276-443b-bead-620c9df9cddc) retrieved from the informer/API
'
': StdinData: {"auxiliaryCNIChainName":"vendor-cni-chain","binDir":"/var/lib/cni/bin","clusterNetwork":"/host/run/multus/cni/net.d/10-ovn-kubernetes.conf","cniVersion":"0.3.1","daemonSocketDir":"/run/multus/socket","globalNamespaces":"default,openshift-multus,openshift-sriov-network-operator,openshift-cnv","logLevel":"verbose","logToStderr":true,"name":"multus-cni-network","namespaceIsolation":true,"type":"multus-shim"}
namespace/e2e-statefulset-3799 node/worker-1.ostest.test.metalkube.org pod/ss3-2 hmsg/4363992b66 - 2.63 seconds after deletion - firstTimestamp/2025-10-08T19:49:39Z interesting/true lastTimestamp/2025-10-08T19:49:39Z reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_ss3-2_e2e-statefulset-3799_677625ee-61a3-4eb5-bb94-b93308fa7c0f_0(390a991332fef5d57b2fb79f5aeb366219939860ea46c3a75538a8ba36269f5b): error adding pod e2e-statefulset-3799_ss3-2 to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: 'ContainerID:"390a991332fef5d57b2fb79f5aeb366219939860ea46c3a75538a8ba36269f5b" Netns:"/var/run/netns/ec540669-a79c-45c3-a01a-580332ea7255" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=e2e-statefulset-3799;K8S_POD_NAME=ss3-2;K8S_POD_INFRA_CONTAINER_ID=390a991332fef5d57b2fb79f5aeb366219939860ea46c3a75538a8ba36269f5b;K8S_POD_UID=677625ee-61a3-4eb5-bb94-b93308fa7c0f" Path:"" ERRORED: error configuring pod [e2e-statefulset-3799/ss3-2] networking: [e2e-statefulset-3799/ss3-2/677625ee-61a3-4eb5-bb94-b93308fa7c0f:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[e2e-statefulset-3799/ss3-2 390a991332fef5d57b2fb79f5aeb366219939860ea46c3a75538a8ba36269f5b network default NAD default] [e2e-statefulset-3799/ss3-2 390a991332fef5d57b2fb79f5aeb366219939860ea46c3a75538a8ba36269f5b network default NAD default] pod deleted before sandbox ADD operation began. Request Pod UID 677625ee-61a3-4eb5-bb94-b93308fa7c0f is different from the Pod UID (a2d5b9fe-934e-4a83-89bc-9254f4054544) retrieved from the informer/API
'
': StdinData: {"auxiliaryCNIChainName":"vendor-cni-chain","binDir":"/var/lib/cni/bin","clusterNetwork":"/host/run/multus/cni/net.d/10-ovn-kubernetes.conf","cniVersion":"0.3.1","daemonSocketDir":"/run/multus/socket","globalNamespaces":"default,openshift-multus,openshift-sriov-network-operator,openshift-cnv","logLevel":"verbose","logToStderr":true,"name":"multus-cni-network","namespaceIsolation":true,"type":"multus-shim"}}

martinkennelly · 2025-10-09T15:34:14Z

Payload is looking good.

openshift-ci · 2025-10-09T17:16:42Z

@martinkennelly: This PR was included in a payload test run from openshift/machine-config-operator#5324
trigger 11 job(s) of type blocking for the nightly release of OCP 4.19

periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-azure-aks-ovn-conformance
periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-aws-ovn-conformance
periodic-ci-openshift-release-master-nightly-4.19-e2e-aws-ovn-serial
periodic-ci-openshift-release-master-ci-4.19-e2e-aws-upgrade-ovn-single-node
periodic-ci-openshift-release-master-ci-4.19-e2e-aws-ovn-techpreview
periodic-ci-openshift-release-master-ci-4.19-e2e-aws-ovn-techpreview-serial
periodic-ci-openshift-release-master-nightly-4.19-e2e-aws-ovn-upgrade-fips
periodic-ci-openshift-release-master-ci-4.19-e2e-azure-ovn-upgrade
periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-gcp-ovn-rt-upgrade
periodic-ci-openshift-release-master-nightly-4.19-e2e-metal-ipi-ovn-bm
periodic-ci-openshift-release-master-nightly-4.19-e2e-metal-ipi-ovn-ipv6

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/abaa8680-a533-11f0-9725-55c07ace1ad6-0

openshift-ci · 2025-10-09T17:16:57Z

@martinkennelly: This PR was included in a payload test run from openshift/machine-config-operator#5324
trigger 5 job(s) of type blocking for the ci release of OCP 4.19

periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-aws-ovn-upgrade
periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-azure-ovn-upgrade
periodic-ci-openshift-release-master-ci-4.19-e2e-gcp-ovn-upgrade
periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-aks
periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-aws-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/b7685ab0-a533-11f0-95ec-ed831b73fdca-0

jechen0648 · 2025-10-09T19:45:58Z

/verified by 'pre-merge testing'

openshift-ci-robot · 2025-10-09T19:46:11Z

@jechen0648: This PR has been marked as verified by 'pre-merge testing'.

In response to this:

/verified by 'pre-merge testing'

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

jluhrsen · 2025-10-09T20:14:14Z

/test e2e-metal-ipi-ovn-dualstack-bgp-local-gw

jechen0648 · 2025-10-10T03:29:57Z

/retest

tssurya · 2025-10-13T10:53:48Z

Forgoing process since this is an urgent escalation
Expectation is straight -X merges not cherry-picks moving forward for merged code into 4.20.

tssurya · 2025-10-13T10:54:57Z

/retest-required

martinkennelly · 2025-10-13T13:19:33Z

Still no bug for the k-nmstate-console image pull backoff seen on the bgp job. We are trying to find out what's wrong and therefore who's responsible. Not clear. It's clearly unrelated to this PR, but it is unclear whose problem it is. See the slack thread with art: https://redhat-internal.slack.com/archives/CJARLA942/p1760354263237459

That job seems to be using images from a QE source (enable-qe-catalogsource for operators) for this console, and there's some issue.

jechen0648 · 2025-10-13T15:10:41Z

/test e2e-metal-ipi-ovn-dualstack-bgp-local-gw

martinkennelly · 2025-10-13T15:37:03Z

@jechen0648 thanks Jean, but that job is borked for 4.19 for the k-nmstate operator image - see previous comment.
I've a PR to test the removal of qe-sources (can be used for pre-release operators), which may not be needed:
openshift/release#70228
Only testing so far, and if it works, I'll get Jaime to review, as he added this job and may know why enable-qe-catalogsource was added.

martinkennelly · 2025-10-13T15:42:47Z

For the latest comments on the supportability of the step we use in our CI job e2e-metal-ipi-ovn-dualstack-bgp-local-gw, see:

https://redhat-internal.slack.com/archives/CJARLA942/p1760365510760059?thread_ts=1760354263.237459&cid=CJARLA942

openshift-ci-robot · 2025-10-13T19:53:26Z

/retest-required

Remaining retests: 0 against base HEAD 48ed843 and 2 for PR HEAD f7c67b7 in total

tssurya · 2025-10-13T20:05:04Z

/override ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw

given the timeline of this escalation, going to override CI for BGP w/o a bug open. But this is clearly unrelated to this PR

openshift-ci · 2025-10-13T20:05:20Z

@tssurya: Overrode contexts on behalf of tssurya: ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw

In response to this:

/override ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw

given the timeline of this escalation, going to override CI for BGP w/o a bug open. But this is clearly unrelated to this PR

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

tssurya · 2025-10-13T20:07:28Z

/tide refresh

tssurya · 2025-10-13T20:07:50Z

/shrug

martinkennelly · 2025-10-14T08:40:35Z

/test e2e-aws-ovn-upgrade

Passes 99.6% of the time. No bug. Unrelated.

 [sig-ci] [Early] prow job name should match feature set [Suite:openshift/conformance/parallel]	3s
{  fail [github.com/openshift/origin/test/extended/util/client.go:332]: Unexpected error:
    <*errors.StatusError | 0xc0023e23c0>: 
    project.project.openshift.io "e2e-test-job-names-zc8d7" already exists
    {
        ErrStatus: 
            code: 409
            details:
              group: project.openshift.io
              kind: project
              name: e2e-test-job-names-zc8d7
            message: project.project.openshift.io "e2e-test-job-names-zc8d7" already exists
            metadata: {}
            reason: AlreadyExists
            status: Failure,
    }
occurred
Ginkgo exit error 1: exit with code 1}

martinkennelly · 2025-10-14T08:42:15Z

/test e2e-metal-ipi-ovn-dualstack

Unrelated. Passes 99% of the time. No bug.

[sig-auth][Feature:ProjectAPI] TestProjectWatch should succeed [apigroup:project.openshift.io][apigroup:authorization.openshift.io][apigroup:user.openshift.io] [Suite:openshift/conformance/parallel]
Run #0: Failed	5m21s
{  fail [github.com/openshift/origin/test/extended/project/project.go:239]: timeout: e2e-test-project-api-d2qql
Ginkgo exit error 1: exit with code 1}

martinkennelly · 2025-10-14T08:43:02Z

/tide refresh

martinkennelly · 2025-10-14T09:50:56Z

/test e2e-aws-ovn-upgrade

Unrelated. Failed to create a release image to test.

Create the release image "latest" containing all images built by this job

:)))

martinkennelly · 2025-10-14T10:44:09Z

Same error as previous comment for e2e-metal-ipi-ovn-dualstack. Seeing if release knows something about this. It's unrelated to my PR.

martinkennelly · 2025-10-14T10:47:13Z

Bug for dualstack-bgp-local-gw https://issues.redhat.com/browse/OCPBUGS-63027
Mat K from k-nmstate came up with a hack to remove the k-nmstate console from our job and move on.
The process we were using before isn't supported anymore and no one wants to dig into what happened since we are told to move to a new process. The bug is on us because of this. There's a PR up to overcome it. See the bug comment.

martinkennelly · 2025-10-14T10:55:18Z

https://redhat-internal.slack.com/archives/CBN38N3MW/p1760439494041289

Asking test platform folks regarding the payload build errors.

martinkennelly · 2025-10-14T11:08:29Z

/test e2e-metal-ipi-ovn-dualstack

Test platform team says it's either that the image is not present or that what we pulled was corrupted.

tssurya · 2025-10-14T12:02:12Z

/override ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw

https://issues.redhat.com/browse/OCPBUGS-63027

openshift-ci · 2025-10-14T12:03:00Z

@tssurya: Overrode contexts on behalf of tssurya: ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw

In response to this:

/override ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw

https://issues.redhat.com/browse/OCPBUGS-63027

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

openshift-ci · 2025-10-14T14:47:48Z

@martinkennelly: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/e2e-aws-ovn-serial-ipsec	`2ac68e4`	link	false	`/test e2e-aws-ovn-serial-ipsec`
ci/prow/e2e-openstack-ovn	`2ac68e4`	link	false	`/test e2e-openstack-ovn`
ci/prow/e2e-aws-ovn-single-node-techpreview	`2ac68e4`	link	false	`/test e2e-aws-ovn-single-node-techpreview`
ci/prow/e2e-aws-ovn-hypershift-kubevirt	`2ac68e4`	link	false	`/test e2e-aws-ovn-hypershift-kubevirt`
ci/prow/e2e-aws-ovn-techpreview	`2ac68e4`	link	false	`/test e2e-aws-ovn-techpreview`
ci/prow/e2e-aws-ovn-hypershift-conformance-techpreview	`2ac68e4`	link	false	`/test e2e-aws-ovn-hypershift-conformance-techpreview`
ci/prow/security	`f7c67b7`	link	false	`/test security`
ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw	`f7c67b7`	link	true	`/test e2e-metal-ipi-ovn-dualstack-bgp-local-gw`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci-robot · 2025-10-14T14:51:01Z

@martinkennelly: Jira Issue OCPBUGS-62670: Some pull requests linked via external trackers have merged:

openshift/ovn-kubernetes#2774

The following pull request, linked via external tracker, has not merged:

openshift/machine-config-operator#5324 is open

All associated pull requests must be merged or unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with /jira refresh.

Jira Issue OCPBUGS-62670 has not been moved to the MODIFIED state.

This PR is marked as verified. If the remaining PRs listed above are marked as verified before merging, the issue will automatically be moved to VERIFIED after all of the changes from the PRs are available in an accepted nightly payload.

In response to this:

/hold

Depends on #2767

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

martinkennelly added 3 commits October 2, 2025 13:04

openshift-ci-robot added the jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. label Oct 2, 2025

openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 2, 2025

openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Oct 2, 2025

openshift-ci bot requested review from jcaamano and kyrtapz October 2, 2025 12:09

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 2, 2025

This was referenced Oct 2, 2025

[release-4.18] OCPBUGS-62671: Fix EgressIP stale GARP post reboot + pod restart #2775

Merged

OCPBUGS-62670: [release-4.19] Networking: reset ovn-remote config and allow ovnkube controller to set it openshift/machine-config-operator#5324

Merged

openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Oct 9, 2025

openshift-ci bot assigned Meina-rh, mffiedler, qiowang721 and rbbratta Oct 13, 2025

openshift-ci-robot mentioned this pull request Oct 13, 2025

[release-4.19] OCPBUGS-62670: Add drop flows for GARP openshift/cluster-network-operator#2809

Merged

openshift-ci bot added the ¯\_(ツ)_/¯ ¯\\\_(ツ)_/¯ label Oct 13, 2025

openshift-merge-bot bot merged commit 1c04cc3 into openshift:release-4.19 Oct 14, 2025
28 of 29 checks passed
