
Conversation

@martinkennelly

/hold

Depends on #2767

Currently, we force exit via the trap before the background
processes can end; the container is removed and the orphaned processes
terminate early, leaving our config in an unknown state because we
do not shut down in an orderly manner.

Wait until the pid file for ovnkube-controller-with-node is removed,
which shows the process has completed.
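A minimal sketch of such a wait, assuming a hypothetical pid file path, function name, and timeout (the actual script in ovnkube.sh may differ):

```shell
#!/bin/sh
# Hypothetical sketch: block until the ovnkube-controller-with-node pid
# file disappears, signalling an orderly shutdown. Path, function name,
# and default timeout are illustrative assumptions.
wait_for_pidfile_removal() {
  pidfile=$1
  timeout=${2:-30}   # seconds
  elapsed=0
  while [ -f "$pidfile" ]; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out waiting for $pidfile to be removed" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

# Demo: a pid file that a background "process" removes one second later.
demo_pid=$(mktemp)
( sleep 1; rm -f "$demo_pid" ) &
wait_for_pidfile_removal "$demo_pid" 10 && echo "process exited cleanly"
```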

Signed-off-by: Martin Kennelly <[email protected]>
(cherry picked from commit 8b29419)
(cherry picked from commit d65ec5c)
Prevent ovn-controller from sending stale GARP by adding
drop flows on the external bridge patch ports
until ovnkube-controller synchronizes the southbound database (henceforth
known as "drop flows").

This addresses race conditions where ovn-controller processes outdated
SB DB state before ovnkube-controller updates it, particularly affecting
EIP SNAT configurations attached to logical router ports.
Fixes: https://issues.redhat.com/browse/FDP-1537

ovnkube-controller controls the lifecycle of the drop flows.
OVS and ovn-controller must be running to configure the external bridge.
Downstream, the external bridge may be pre-created, and ovn-controller
will use it.

This fix considers three primary scenarios: node, container and pod restart.

On node restart, the OVS flows installed prior to reboot are cleared,
but the external bridge persists. Add the flows before ovnkube-controller-with-node
starts. The reason to add them here is that our gateway code depends
on ovn-controller being started and running.
There is now a race between ovn-controller starting
(and GARPing) before we set this flow; I think the risk is low, but
it needs serious testing. The reason I did not add the drop
flows before ovn-controller starts is that I have no way to detect
whether it is a node reboot or a pod reboot, and I don't want to inject drop
flows for a simple ovn-controller container restart, which could disrupt traffic.
When ovnkube-controller starts, we create a new gateway and apply the same
flows in order to ensure we always drop GARP when ovnkube-controller
hasn't synced.
Remove the flows when ovnkube-controller has synced. There is also a race here
between ovnkube-controller removing the flows and ovn-controller GARPing with
stale SB DB info; there is no easy way to detect what SB DB data ovn-controller
has consumed.

On pod restart, we add the drop flows before exit. ovnkube-controller-with-node
will also add them before it starts the Go code.

Container restart:
- ovnkube-controller: adds flows upon start and exit
- ovn-controller: no changes
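As an illustration, a drop flow of this kind might be built like the sketch below; the cookie, priority, bridge name, and patch-port name are assumptions, not the values ovnkube actually uses. It matches ARP requests (op code 1, which is how ovn-controller sends GARP on startup per the commit message above) arriving from the patch port and drops them:

```shell
#!/bin/sh
# Hypothetical sketch of a GARP drop flow. Cookie, priority, bridge, and
# patch-port names are illustrative assumptions.
GARP_COOKIE="0xdeadbeef"

garp_drop_flow() {
  # ovn-controller's startup GARP is an ARP request (arp_op=1); drop it
  # at the external bridge until ovnkube-controller has synced.
  patch_port=$1
  echo "cookie=${GARP_COOKIE},priority=1010,in_port=${patch_port},arp,arp_op=1,actions=drop"
}

# Dry run: print the command that would install the flow on br-ex.
flow=$(garp_drop_flow patch-br-ex_node1-to-br-int)
echo "ovs-ofctl add-flow br-ex \"${flow}\""
```

Removing the flows after sync would then be a matter of deleting by cookie, e.g. `ovs-ofctl del-flows br-ex "cookie=${GARP_COOKIE}/-1"`.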

While the drop flows are set, OVN may not be able to resolve IPs
it doesn't know about when generating its logical router pipelines. Following
removal of the drop flows, OVN may resolve the IPs using GARP requests.

OVN-Controller always sends out GARPs with op code 1
on startup.

Signed-off-by: Martin Kennelly <[email protected]>
(cherry picked from commit 82fc3bf)
(cherry picked from commit 50a94e1)
PR 5373, which drops the GARP flows, didn't consider that we
set the default network controller and only later set
the gateway obj. In between, ovnkube node
may receive a stop signal, and we do not guard against
accessing the gateway if it is not yet set.

OVNKube controller may have synced before the gateway
obj is set.

There is nothing to reconcile if the gateway is not set.

Signed-off-by: Martin Kennelly <[email protected]>
(cherry picked from commit e60220a)
(cherry picked from commit a7869b2)
@openshift-ci-robot openshift-ci-robot added the jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. label Oct 2, 2025
@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 2, 2025
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Oct 2, 2025
@openshift-ci-robot

@martinkennelly: This pull request references Jira Issue OCPBUGS-62670, which is invalid:

  • expected dependent Jira Issue OCPBUGS-62671 to be in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but it is New instead
  • expected dependent Jira Issue OCPBUGS-62671 to target a version in 4.20.0, but it targets "4.18.z" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

/hold

Depends on #2767

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci

openshift-ci bot commented Oct 2, 2025

@martinkennelly: This PR was included in a payload test run from openshift/machine-config-operator#5324
trigger 11 job(s) of type blocking for the nightly release of OCP 4.19

  • periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-azure-aks-ovn-conformance
  • periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-aws-ovn-conformance
  • periodic-ci-openshift-release-master-nightly-4.19-e2e-aws-ovn-serial
  • periodic-ci-openshift-release-master-ci-4.19-e2e-aws-upgrade-ovn-single-node
  • periodic-ci-openshift-release-master-ci-4.19-e2e-aws-ovn-techpreview
  • periodic-ci-openshift-release-master-ci-4.19-e2e-aws-ovn-techpreview-serial
  • periodic-ci-openshift-release-master-nightly-4.19-e2e-aws-ovn-upgrade-fips
  • periodic-ci-openshift-release-master-ci-4.19-e2e-azure-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-gcp-ovn-rt-upgrade
  • periodic-ci-openshift-release-master-nightly-4.19-e2e-metal-ipi-ovn-bm
  • periodic-ci-openshift-release-master-nightly-4.19-e2e-metal-ipi-ovn-ipv6

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/a89d3170-9f8b-11f0-93fd-ee70f1d60e20-0

Ensure ovn-controller has processed the SB DB updates before
removing the GARP drop flows by utilizing the hv_cfg field
in NB_Global [1]

OVNKube controller increments the nb_cfg value post-sync, which is copied
to the SB DB by northd. Each ovn-controller copies this nb_cfg value from the
SB DB and writes it to its chassis_private table's nb_cfg field after
it has processed the SB DB changes. Northd then looks
at all the chassis_private nb_cfg values and sets the
NB DB's NB_Global hv_cfg value to the minimum integer found.

Since IC currently supports only one node per zone, we
can be sure ovn-controller is running locally, and therefore
it's OK to block on removing the drop GARP flows.

[1] https://man7.org/linux/man-pages/man5/ovn-nb.5.html
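The mechanism above can be sketched as a polling loop. The `ovn-nbctl get NB_Global` invocations use real columns from the NB schema, but the loop, timeout, and stub override are illustrative assumptions, not ovnkube's actual implementation:

```shell
#!/bin/sh
# Hypothetical sketch: wait until hv_cfg catches up to nb_cfg in NB_Global,
# i.e. until every chassis (here, the single local ovn-controller) has
# processed the SB DB changes for the current nb_cfg generation.
wait_for_hv_cfg() {
  nbctl=${NBCTL:-ovn-nbctl}   # NBCTL override is for testing with a stub
  target=$("$nbctl" get NB_Global . nb_cfg)
  for _ in $(seq 1 30); do
    current=$("$nbctl" get NB_Global . hv_cfg)
    if [ "$current" -ge "$target" ]; then
      return 0   # safe to remove the GARP drop flows
    fi
    sleep 1
  done
  echo "timed out waiting for hv_cfg to reach $target" >&2
  return 1
}
```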

Signed-off-by: Martin Kennelly <[email protected]>
(cherry picked from commit 3b5da01)
(cherry picked from commit a4776fb)
@openshift-ci

openshift-ci bot commented Oct 8, 2025

@martinkennelly: This PR was included in a payload test run from openshift/machine-config-operator#5324
trigger 11 job(s) of type blocking for the nightly release of OCP 4.19

  • periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-azure-aks-ovn-conformance
  • periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-aws-ovn-conformance
  • periodic-ci-openshift-release-master-nightly-4.19-e2e-aws-ovn-serial
  • periodic-ci-openshift-release-master-ci-4.19-e2e-aws-upgrade-ovn-single-node
  • periodic-ci-openshift-release-master-ci-4.19-e2e-aws-ovn-techpreview
  • periodic-ci-openshift-release-master-ci-4.19-e2e-aws-ovn-techpreview-serial
  • periodic-ci-openshift-release-master-nightly-4.19-e2e-aws-ovn-upgrade-fips
  • periodic-ci-openshift-release-master-ci-4.19-e2e-azure-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-gcp-ovn-rt-upgrade
  • periodic-ci-openshift-release-master-nightly-4.19-e2e-metal-ipi-ovn-bm
  • periodic-ci-openshift-release-master-nightly-4.19-e2e-metal-ipi-ovn-ipv6

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/c3464cd0-a43d-11f0-8c73-68fbceb6751c-0

@martinkennelly

/test e2e-aws-ovn-windows

It failed during the installation phase. Unrelated to this. This fix doesn't run on Windows.

time="2025-10-08T14:16:09Z" level=info msg="  Found ClusterServiceVersion \"openshift-windows-machine-config-operator/windows-machine-config-operator.v10.19.1\" phase: Installing"
E1008 14:20:59.180542     181 request.go:1075] Unexpected error when reading response body: context deadline exceeded
time="2025-10-08T14:20:59Z" level=fatal msg="Failed to run packagemanifests: error waiting for CSV to install: deployment windows-machine-config-operator has error: client rate limiter Wait returned an error: context deadline exceeded\n\n"

@martinkennelly

/test 4.19-upgrade-from-stable-4.18-e2e-gcp-ovn-rt-upgrade

Hit overall job timeout:

: Job run should complete before timeout	5h2m33s
{  {"component":"entrypoint","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:169","func":"sigs.k8s.io/prow/pkg/entrypoint.Options.ExecuteProcess","level":"error","msg":"Process did not finish before 5h0m0s timeout","severity":"error","time":"2025-10-08T16:55:46Z"}
}

@martinkennelly

/test e2e-metal-ipi-ovn-dualstack-bgp-local-gw

Probably unrelated nmstate image pull issue:

[sig-arch] events should not repeat pathologically	0s
{  1 events happened too frequently

event happened 107 times, something is wrong: namespace/openshift-nmstate node/worker-2 pod/nmstate-console-plugin-5964f557cb-krqsm hmsg/d5bf9afefc - reason/Failed Error: ImagePullBackOff (16:13:34Z) result=reject }

@jluhrsen

jluhrsen commented Oct 8, 2025

/test e2e-aws-ovn-windows

@martinkennelly

/test e2e-aws-ovn-windows

Unrelated:

error: image "quay-proxy.ci.openshift.org/openshift/ci@sha256:e685e0585d33b380dccbfb5bc189ab233cff0d688221cda8a691d38c7d45fc4a" not found: manifest unknown: manifest unknown

@martinkennelly

/test e2e-metal-ipi-ovn-dualstack-bgp-local-gw

Unrelated issues, including the CNI add error caused by a UID mismatch:

[sig-arch] events should not repeat pathologically	0s
{  1 events happened too frequently

event happened 108 times, something is wrong: namespace/openshift-nmstate node/worker-2 pod/nmstate-console-plugin-5964f557cb-v6kg6 hmsg/d5bf9afefc - reason/Failed Error: ImagePullBackOff (20:57:59Z) result=reject }
: [Unknown][invariant] alert/KubePodNotReady should not be at or above info in all the other namespaces	0s
{  KubePodNotReady was at or above info for at least 11m10s on platformidentification.JobType{Release:"4.19", FromRelease:"", Platform:"metal", Architecture:"amd64", Network:"ovn", Topology:"ha"} (maxAllowed=0s): pending for 1m56s, firing for 11m10s:

Oct 08 20:48:30.280 - 670s  W namespace/openshift-nmstate pod/nmstate-console-plugin-5964f557cb-v6kg6 alert/KubePodNotReady alertstate/firing severity/warning ALERTS{alertname="KubePodNotReady", alertstate="firing", namespace="openshift-nmstate", pod="nmstate-console-plugin-5964f557cb-v6kg6", prometheus="openshift-monitoring/k8s", severity="warning"}}
: [sig-network] pods should successfully create sandboxes by adding pod to network	0s
{  2 failures to create the sandbox

namespace/e2e-statefulset-6590 node/worker-1.ostest.test.metalkube.org pod/ss2-1 hmsg/1ec66151d1 - 58.97 seconds after deletion - firstTimestamp/2025-10-08T19:57:56Z interesting/true lastTimestamp/2025-10-08T19:57:56Z reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_ss2-1_e2e-statefulset-6590_0646a45b-0660-41d5-9721-5f3742b7a18e_0(6bb402da740bbdcef790c15b93641e6f108cf4e753e52373eefc1b243112ef66): error adding pod e2e-statefulset-6590_ss2-1 to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: 'ContainerID:"6bb402da740bbdcef790c15b93641e6f108cf4e753e52373eefc1b243112ef66" Netns:"/var/run/netns/b08be8ba-3464-46bf-8dac-af197d9ba7b9" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=e2e-statefulset-6590;K8S_POD_NAME=ss2-1;K8S_POD_INFRA_CONTAINER_ID=6bb402da740bbdcef790c15b93641e6f108cf4e753e52373eefc1b243112ef66;K8S_POD_UID=0646a45b-0660-41d5-9721-5f3742b7a18e" Path:"" ERRORED: error configuring pod [e2e-statefulset-6590/ss2-1] networking: [e2e-statefulset-6590/ss2-1/0646a45b-0660-41d5-9721-5f3742b7a18e:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[e2e-statefulset-6590/ss2-1 6bb402da740bbdcef790c15b93641e6f108cf4e753e52373eefc1b243112ef66 network default NAD default] [e2e-statefulset-6590/ss2-1 6bb402da740bbdcef790c15b93641e6f108cf4e753e52373eefc1b243112ef66 network default NAD default] pod deleted before sandbox ADD operation began. Request Pod UID 0646a45b-0660-41d5-9721-5f3742b7a18e is different from the Pod UID (a212bd99-3276-443b-bead-620c9df9cddc) retrieved from the informer/API
'
': StdinData: {"auxiliaryCNIChainName":"vendor-cni-chain","binDir":"/var/lib/cni/bin","clusterNetwork":"/host/run/multus/cni/net.d/10-ovn-kubernetes.conf","cniVersion":"0.3.1","daemonSocketDir":"/run/multus/socket","globalNamespaces":"default,openshift-multus,openshift-sriov-network-operator,openshift-cnv","logLevel":"verbose","logToStderr":true,"name":"multus-cni-network","namespaceIsolation":true,"type":"multus-shim"}
namespace/e2e-statefulset-3799 node/worker-1.ostest.test.metalkube.org pod/ss3-2 hmsg/4363992b66 - 2.63 seconds after deletion - firstTimestamp/2025-10-08T19:49:39Z interesting/true lastTimestamp/2025-10-08T19:49:39Z reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_ss3-2_e2e-statefulset-3799_677625ee-61a3-4eb5-bb94-b93308fa7c0f_0(390a991332fef5d57b2fb79f5aeb366219939860ea46c3a75538a8ba36269f5b): error adding pod e2e-statefulset-3799_ss3-2 to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: 'ContainerID:"390a991332fef5d57b2fb79f5aeb366219939860ea46c3a75538a8ba36269f5b" Netns:"/var/run/netns/ec540669-a79c-45c3-a01a-580332ea7255" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=e2e-statefulset-3799;K8S_POD_NAME=ss3-2;K8S_POD_INFRA_CONTAINER_ID=390a991332fef5d57b2fb79f5aeb366219939860ea46c3a75538a8ba36269f5b;K8S_POD_UID=677625ee-61a3-4eb5-bb94-b93308fa7c0f" Path:"" ERRORED: error configuring pod [e2e-statefulset-3799/ss3-2] networking: [e2e-statefulset-3799/ss3-2/677625ee-61a3-4eb5-bb94-b93308fa7c0f:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[e2e-statefulset-3799/ss3-2 390a991332fef5d57b2fb79f5aeb366219939860ea46c3a75538a8ba36269f5b network default NAD default] [e2e-statefulset-3799/ss3-2 390a991332fef5d57b2fb79f5aeb366219939860ea46c3a75538a8ba36269f5b network default NAD default] pod deleted before sandbox ADD operation began. Request Pod UID 677625ee-61a3-4eb5-bb94-b93308fa7c0f is different from the Pod UID (a2d5b9fe-934e-4a83-89bc-9254f4054544) retrieved from the informer/API
'
': StdinData: {"auxiliaryCNIChainName":"vendor-cni-chain","binDir":"/var/lib/cni/bin","clusterNetwork":"/host/run/multus/cni/net.d/10-ovn-kubernetes.conf","cniVersion":"0.3.1","daemonSocketDir":"/run/multus/socket","globalNamespaces":"default,openshift-multus,openshift-sriov-network-operator,openshift-cnv","logLevel":"verbose","logToStderr":true,"name":"multus-cni-network","namespaceIsolation":true,"type":"multus-shim"}}

@martinkennelly

Payload is looking good.

@openshift-ci

openshift-ci bot commented Oct 9, 2025

@martinkennelly: This PR was included in a payload test run from openshift/machine-config-operator#5324
trigger 11 job(s) of type blocking for the nightly release of OCP 4.19

  • periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-azure-aks-ovn-conformance
  • periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-aws-ovn-conformance
  • periodic-ci-openshift-release-master-nightly-4.19-e2e-aws-ovn-serial
  • periodic-ci-openshift-release-master-ci-4.19-e2e-aws-upgrade-ovn-single-node
  • periodic-ci-openshift-release-master-ci-4.19-e2e-aws-ovn-techpreview
  • periodic-ci-openshift-release-master-ci-4.19-e2e-aws-ovn-techpreview-serial
  • periodic-ci-openshift-release-master-nightly-4.19-e2e-aws-ovn-upgrade-fips
  • periodic-ci-openshift-release-master-ci-4.19-e2e-azure-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-gcp-ovn-rt-upgrade
  • periodic-ci-openshift-release-master-nightly-4.19-e2e-metal-ipi-ovn-bm
  • periodic-ci-openshift-release-master-nightly-4.19-e2e-metal-ipi-ovn-ipv6

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/abaa8680-a533-11f0-9725-55c07ace1ad6-0

@openshift-ci

openshift-ci bot commented Oct 9, 2025

@martinkennelly: This PR was included in a payload test run from openshift/machine-config-operator#5324
trigger 5 job(s) of type blocking for the ci release of OCP 4.19

  • periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-aws-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-azure-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.19-e2e-gcp-ovn-upgrade
  • periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-aks
  • periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-aws-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/b7685ab0-a533-11f0-95ec-ed831b73fdca-0

@jechen0648

/verified by 'pre-merge testing'

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Oct 9, 2025
@openshift-ci-robot

@jechen0648: This PR has been marked as verified by 'pre-merge testing'.

In response to this:

/verified by 'pre-merge testing'

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jluhrsen

jluhrsen commented Oct 9, 2025

/test e2e-metal-ipi-ovn-dualstack-bgp-local-gw

@jechen0648

/retest

@tssurya

tssurya commented Oct 13, 2025

Forgoing process since this is an urgent escalation.
The expectation is straight -X merges, not cherry-picks, moving forward for merged code into 4.20.

@tssurya

tssurya commented Oct 13, 2025

/retest-required

@martinkennelly

martinkennelly commented Oct 13, 2025

Still no bug for the k-nmstate-console image pull backoff seen on the bgp job. We are trying to find out what's wrong and therefore who's responsible. Not clear. It's clear it's unrelated to this PR, but unsure whose problem it is. See the Slack thread with ART: https://redhat-internal.slack.com/archives/CJARLA942/p1760354263237459

That job seems to be using images from a QE source (enable-qe-catalogsource for operators) for this console, and there's some issue.

@jechen0648

/test e2e-metal-ipi-ovn-dualstack-bgp-local-gw

@martinkennelly

martinkennelly commented Oct 13, 2025

@jechen0648 thanks Jean, but that job is borked for 4.19 for the k-nmstate operator image - see previous comment.
I've a PR up to test the removal of qe-sources (can be used for pre-release operators), which may not be needed:
openshift/release#70228
Only testing so far; if it works, I'll get Jaime to review, as he added this job and may know why enable-qe-catalogsource was added.

@martinkennelly

For latest comments on the supportability of that step we use in our CI job e2e-metal-ipi-ovn-dualstack-bgp-local-gw , see:

https://redhat-internal.slack.com/archives/CJARLA942/p1760365510760059?thread_ts=1760354263.237459&cid=CJARLA942

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 48ed843 and 2 for PR HEAD f7c67b7 in total

@tssurya

tssurya commented Oct 13, 2025

/override ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw

given the timeline of this escalation, going to override CI for BGP w/o a bug open. But this is clearly unrelated to this PR

@openshift-ci

openshift-ci bot commented Oct 13, 2025

@tssurya: Overrode contexts on behalf of tssurya: ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw

In response to this:

/override ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw

given the timeline of this escalation, going to override CI for BGP w/o a bug open. But this is clearly unrelated to this PR

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@tssurya

tssurya commented Oct 13, 2025

/tide refresh

@tssurya

tssurya commented Oct 13, 2025

/shrug

@openshift-ci openshift-ci bot added the ¯\_(ツ)_/¯ ¯\\\_(ツ)_/¯ label Oct 13, 2025
@martinkennelly

/test e2e-aws-ovn-upgrade

Passes 99.6% of the time. No bug. Unrelated.

[sig-ci] [Early] prow job name should match feature set [Suite:openshift/conformance/parallel]	3s
{  fail [github.com/openshift/origin/test/extended/util/client.go:332]: Unexpected error:
    <*errors.StatusError | 0xc0023e23c0>: 
    project.project.openshift.io "e2e-test-job-names-zc8d7" already exists
    {
        ErrStatus: 
            code: 409
            details:
              group: project.openshift.io
              kind: project
              name: e2e-test-job-names-zc8d7
            message: project.project.openshift.io "e2e-test-job-names-zc8d7" already exists
            metadata: {}
            reason: AlreadyExists
            status: Failure,
    }
occurred
Ginkgo exit error 1: exit with code 1}

@martinkennelly

/test e2e-metal-ipi-ovn-dualstack

Unrelated. Passes 99% of the time. No bug.

[sig-auth][Feature:ProjectAPI] TestProjectWatch should succeed [apigroup:project.openshift.io][apigroup:authorization.openshift.io][apigroup:user.openshift.io] [Suite:openshift/conformance/parallel]
Run #0: Failed	5m21s
{  fail [github.com/openshift/origin/test/extended/project/project.go:239]: timeout: e2e-test-project-api-d2qql
Ginkgo exit error 1: exit with code 1}

@martinkennelly

/tide refresh

@martinkennelly

martinkennelly commented Oct 14, 2025

/test e2e-aws-ovn-upgrade

Unrelated. Failed to create a release image to test.

Create the release image "latest" containing all images built by this job 

:)))

@martinkennelly

Same error as the previous comment for e2e-metal-ipi-ovn-dualstack. Seeing if the release folks know something about this. It's unrelated to my PR.

@martinkennelly

martinkennelly commented Oct 14, 2025

Bug for dualstack-bgp-local-gw: https://issues.redhat.com/browse/OCPBUGS-63027
Mat K from k-nmstate came up with a hack to remove the k-nmstate console from our job and move on.
The process we were using before isn't supported anymore, and no one wants to dig into what happened since we are told to move to a new process. The bug is on us because of this. There's a PR up to overcome it. See the bug comment.

@martinkennelly

martinkennelly commented Oct 14, 2025

https://redhat-internal.slack.com/archives/CBN38N3MW/p1760439494041289

Asking test platform folks regarding the payload build errors.

@martinkennelly

/test e2e-metal-ipi-ovn-dualstack

Test platforum team says its either image not present or what we pulled was corrupted.

@tssurya

tssurya commented Oct 14, 2025

/override ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw

https://issues.redhat.com/browse/OCPBUGS-63027

@openshift-ci

openshift-ci bot commented Oct 14, 2025

@tssurya: Overrode contexts on behalf of tssurya: ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw

In response to this:

/override ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw

https://issues.redhat.com/browse/OCPBUGS-63027

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci

openshift-ci bot commented Oct 14, 2025

@martinkennelly: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-ovn-serial-ipsec 2ac68e4 link false /test e2e-aws-ovn-serial-ipsec
ci/prow/e2e-openstack-ovn 2ac68e4 link false /test e2e-openstack-ovn
ci/prow/e2e-aws-ovn-single-node-techpreview 2ac68e4 link false /test e2e-aws-ovn-single-node-techpreview
ci/prow/e2e-aws-ovn-hypershift-kubevirt 2ac68e4 link false /test e2e-aws-ovn-hypershift-kubevirt
ci/prow/e2e-aws-ovn-techpreview 2ac68e4 link false /test e2e-aws-ovn-techpreview
ci/prow/e2e-aws-ovn-hypershift-conformance-techpreview 2ac68e4 link false /test e2e-aws-ovn-hypershift-conformance-techpreview
ci/prow/security f7c67b7 link false /test security
ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw f7c67b7 link true /test e2e-metal-ipi-ovn-dualstack-bgp-local-gw

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit 1c04cc3 into openshift:release-4.19 Oct 14, 2025
28 of 29 checks passed
@openshift-ci-robot

@martinkennelly: Jira Issue OCPBUGS-62670: Some pull requests linked via external trackers have merged:

The following pull request, linked via external tracker, has not merged:

All associated pull requests must be merged or unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with /jira refresh.

Jira Issue OCPBUGS-62670 has not been moved to the MODIFIED state.

This PR is marked as verified. If the remaining PRs listed above are marked as verified before merging, the issue will automatically be moved to VERIFIED after all of the changes from the PRs are available in an accepted nightly payload.

In response to this:

/hold

Depends on #2767

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
