
Bump OVN to ovn-24.03.2-19 to fix multicast bug #4457

Merged
merged 3 commits into ovn-org:master from ricky-rav:OCPBUGS-34778_upstream on Jun 25, 2024

Conversation

ricky-rav
Contributor

@ricky-rav ricky-rav commented Jun 19, 2024

Bumps OVN to 24.03.2-19, which reverts the multicast-related commits that introduced a regression.
Extends the unit test to cover the scenario that was broken: adding an additional receiver on the same node as the sender.
https://issues.redhat.com/browse/OCPBUGS-34778
https://issues.redhat.com/browse/FDP-656
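
For context, the broken scenario boils down to a multicast receiver co-located on the same node as the sender no longer getting the group traffic. A minimal sketch of that traffic pattern in plain Go (not the actual e2e helper; the group address and port are made up for illustration):

```go
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// Hypothetical multicast group and port, chosen only for illustration.
	group := &net.UDPAddr{IP: net.ParseIP("228.0.0.45"), Port: 4321}

	// Receiver: join the group. With the regression, a receiver co-located
	// with the sender never saw these datagrams, which is exactly the case
	// the extended unit test now covers.
	in, err := net.ListenMulticastUDP("udp4", nil, group)
	if err != nil {
		panic(err)
	}
	defer in.Close()
	go func() {
		buf := make([]byte, 1500)
		for {
			n, src, err := in.ReadFromUDP(buf)
			if err != nil {
				return
			}
			fmt.Printf("received %q from %v\n", buf[:n], src)
		}
	}()

	// Sender: write a few datagrams to the group address.
	out, err := net.DialUDP("udp4", nil, group)
	if err != nil {
		panic(err)
	}
	defer out.Close()
	for i := 0; i < 3; i++ {
		out.Write([]byte("multicast ping"))
		time.Sleep(100 * time.Millisecond)
	}
	time.Sleep(200 * time.Millisecond) // give the receiver time to drain
}
```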

@ricky-rav ricky-rav requested a review from a team as a code owner June 19, 2024 13:05
@ricky-rav ricky-rav requested a review from maiqueb June 19, 2024 13:05
@coveralls

Coverage Status

coverage: 52.76% (+0.01%) from 52.749%
when pulling 8594725 on ricky-rav:OCPBUGS-34778_upstream
into 9f1f3f2 on ovn-org:master.

@ricky-rav ricky-rav force-pushed the OCPBUGS-34778_upstream branch 4 times, most recently from 3e4ea4b to 9c4179a on June 20, 2024 12:35
@ricky-rav ricky-rav changed the title from "[WIP] Bump OVN to ovn-24.03.2-19.fc39 to fix multicast bug" to "Bump OVN to ovn-24.03.2-19 to fix multicast bug" on Jun 20, 2024
@coveralls

Coverage Status

Changes unknown
when pulling 9c4179a on ricky-rav:OCPBUGS-34778_upstream
into ovn-org:master.

martinkennelly previously approved these changes Jun 20, 2024
tssurya previously approved these changes Jun 20, 2024
Member

@tssurya tssurya left a comment

waiting for CI

@tssurya
Member

tssurya commented Jun 20, 2024

@qinqon oh both kv-migrations failed... looks like we need to look into this

@tssurya
Member

tssurya commented Jun 20, 2024

2024-06-20T13:47:05.8175952Z Multicast should be able to send multicast UDP traffic between nodes
2024-06-20T13:47:05.8177373Z /home/runner/work/ovn-kubernetes/ovn-kubernetes/test/e2e/multicast.go:79
2024-06-20T13:47:05.8178804Z   STEP: Creating a kubernetes client @ 06/20/24 13:47:05.817
2024-06-20T13:47:05.8179917Z   Jun 20 13:47:05.817: INFO: >>> kubeConfig: /home/runner/ovn.conf
2024-06-20T13:47:05.8183639Z   STEP: Building a namespace api object, basename multicast @ 06/20/24 13:47:05.818
2024-06-20T13:47:05.8221785Z   Jun 20 13:47:05.821: INFO: Skipping waiting for service account
2024-06-20T13:47:05.8421587Z   STEP: creating a pod as a multicast source in node ovn-worker @ 06/20/24 13:47:05.841
2024-06-20T13:47:05.8487617Z   W0620 13:47:05.848044   74981 warnings.go:70] would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "pod-client" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "pod-client" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "pod-client" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "pod-client" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
2024-06-20T13:47:07.8552911Z   STEP: creating first multicast listener pod in node ovn-worker2 @ 06/20/24 13:47:07.854
2024-06-20T13:47:07.8605049Z   W0620 13:47:07.859650   74981 warnings.go:70] would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "pod-server1" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "pod-server1" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "pod-server1" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "pod-server1" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
2024-06-20T13:47:09.8687194Z   STEP: creating second multicast listener pod in node ovn-worker2 @ 06/20/24 13:47:09.868
2024-06-20T13:47:09.8735850Z   W0620 13:47:09.872877   74981 warnings.go:70] would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "pod-server2" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "pod-server2" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "pod-server2" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "pod-server2" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
2024-06-20T13:47:11.8815733Z   STEP: creating first multicast listener pod in node ovn-worker @ 06/20/24 13:47:11.881
2024-06-20T13:47:11.8867196Z   W0620 13:47:11.886000   74981 warnings.go:70] would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "pod-server3" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "pod-server3" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "pod-server3" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "pod-server3" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
2024-06-20T13:47:13.8938814Z   STEP: checking if pod server1 received multicast traffic @ 06/20/24 13:47:13.893
2024-06-20T13:47:13.9057278Z   STEP: checking if pod server2 does not received multicast traffic @ 06/20/24 13:47:13.905
2024-06-20T13:47:13.9089851Z   STEP: checking if pod server3 received multicast traffic @ 06/20/24 13:47:13.908
2024-06-20T13:47:13.9194687Z   STEP: Destroying namespace "multicast-8182" for this suite. @ 06/20/24 13:47:13.919
2024-06-20T13:47:13.9221258Z • [8.105 seconds]

test passes.
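
(Aside: the PodSecurity "would violate" lines above are only admission warnings from the restricted profile, not failures. For reference, a container securityContext that satisfies all four quoted checks would look roughly like this client-go sketch; it is not taken from the e2e code itself:)

```go
package e2e

import corev1 "k8s.io/api/core/v1"

// restrictedSecurityContext sketches a securityContext satisfying the
// "restricted:latest" PodSecurity checks quoted in the warnings above.
func restrictedSecurityContext() *corev1.SecurityContext {
	allowPrivEsc := false
	runAsNonRoot := true
	return &corev1.SecurityContext{
		AllowPrivilegeEscalation: &allowPrivEsc, // allowPrivilegeEscalation=false
		Capabilities: &corev1.Capabilities{
			Drop: []corev1.Capability{"ALL"}, // drop all capabilities
		},
		RunAsNonRoot: &runAsNonRoot, // runAsNonRoot=true
		SeccompProfile: &corev1.SeccompProfile{
			Type: corev1.SeccompProfileTypeRuntimeDefault, // seccompProfile.type=RuntimeDefault
		},
	}
}
```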

@tssurya
Member

tssurya commented Jun 20, 2024

Given it's an OVN bump and both live-migration jobs failed, I cannot merge with a red CI:

2024-06-20T13:19:20.8836054Z   Latency metrics for node ovn-worker3
2024-06-20T13:19:20.8837410Z   STEP: Destroying namespace "kv-live-migration-1853" for this suite. @ 06/20/24 13:19:20.883
2024-06-20T13:19:20.8884104Z • [FAILED] [214.253 seconds]
2024-06-20T13:19:20.8885911Z Kubevirt Virtual Machines with default pod network when live migration [It] with pre-copy succeeds, should keep connectivity
2024-06-20T13:19:20.8887975Z /home/runner/work/ovn-kubernetes/ovn-kubernetes/test/e2e/kubevirt.go:1093
2024-06-20T13:19:20.8888816Z
2024-06-20T13:19:20.8889269Z   [FAILED] worker1: Expose tcpServer as a service
2024-06-20T13:19:20.8889948Z   Unexpected error:
2024-06-20T13:19:20.8890539Z       <*fmt.wrapError | 0xc000e8e160>:
2024-06-20T13:19:20.8891567Z       failed DialTCP: dial tcp 172.18.0.2:32485: connect: connection refused
2024-06-20T13:19:20.8892420Z       {
2024-06-20T13:19:20.8893456Z           msg: "failed DialTCP: dial tcp 172.18.0.2:32485: connect: connection refused",
2024-06-20T13:19:20.8894626Z           err: <*net.OpError | 0xc000d9bf90>{
2024-06-20T13:19:20.8895290Z               Op: "dial",
2024-06-20T13:19:20.8896407Z               Net: "tcp",
2024-06-20T13:19:20.8897012Z               Source: nil,
2024-06-20T13:19:20.8897757Z               Addr: <*net.TCPAddr | 0xc001210000>{
2024-06-20T13:19:20.8898718Z                   IP: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 172, 18, 0, 2],
2024-06-20T13:19:20.8899401Z                   Port: 32485,
2024-06-20T13:19:20.8900010Z                   Zone: "",
2024-06-20T13:19:20.8900430Z               },
2024-06-20T13:19:20.8901008Z               Err: <*os.SyscallError | 0xc000e8e140>{
2024-06-20T13:19:20.8901670Z                   Syscall: "connect",
2024-06-20T13:19:20.8902173Z                   Err: <syscall.Errno>0x6f,
2024-06-20T13:19:20.8902787Z               },
2024-06-20T13:19:20.8903209Z           },
2024-06-20T13:19:20.8903610Z       }
2024-06-20T13:19:20.8904012Z   occurred
2024-06-20T13:19:20.8905263Z   In [It] at: /opt/hostedtoolcache/go/1.21.11/x64/src/runtime/asm_amd64.s:1650 @ 06/20/24 13:19:19.612
2024-06-20T13:19:20.8906000Z

I see this might not be related, but we can't risk a regression. My gut says to see at least one lane green, so I've triggered a re-run of the failed lanes.
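
For reference, the failing step exposes the VM's TCP server as a NodePort Service and then dials the node IP on the allocated port (172.18.0.2:32485 above). A minimal client-go sketch of that kind of step, with hypothetical names rather than the actual e2e helper, might be:

```go
package e2e

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"k8s.io/client-go/kubernetes"
)

// exposeTCPServer creates a NodePort Service in front of pods labeled
// app=tcp-server; the test then dials the node IP on the allocated NodePort,
// which is the dial that fails with "connection refused" in the log above.
func exposeTCPServer(ctx context.Context, c kubernetes.Interface, ns string) (*corev1.Service, error) {
	svc := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "tcp-server"},
		Spec: corev1.ServiceSpec{
			Type:     corev1.ServiceTypeNodePort,
			Selector: map[string]string{"app": "tcp-server"},
			Ports: []corev1.ServicePort{{
				Port:       8080,
				TargetPort: intstr.FromInt(8080),
			}},
		},
	}
	return c.CoreV1().Services(ns).Create(ctx, svc, metav1.CreateOptions{})
}
```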

@tssurya
Member

tssurya commented Jun 20, 2024

Live migration has failed again. We need to investigate the CI failure before I can merge this; @ricky-rav FYI.

@qinqon
Contributor

qinqon commented Jun 21, 2024

@tssurya, a few weeks ago we tested some OVN changes related to arp_proxy and they were working all right; maybe it's still problematic and we didn't test it well enough:

ovn-org/ovn@cc4187b

@ricky-rav ricky-rav dismissed stale reviews from tssurya and martinkennelly via 6e18c80 June 24, 2024 12:53
@github-actions github-actions bot added the feature/kubevirt-live-migration label on Jun 24, 2024
@tssurya tssurya added the feature/multicast, kind/bug, and area/core-networking labels on Jun 24, 2024
@coveralls

Coverage Status

coverage: 52.734% (+0.03%) from 52.707%
when pulling 935951b on ricky-rav:OCPBUGS-34778_upstream
into ebf2c68 on ovn-org:master.

Contains a revert for a known multicast bug.
(https://issues.redhat.com/browse/OCPBUGS-34778)

Signed-off-by: Riccardo Ravaioli <[email protected]>
The kubevirt e2e tests were testing network policy incorrectly, by using an already active connection; after bumping OVN this no longer works, and the dial still does not fail after the deny-all policy is created. This change skips the network policy check for now.

Signed-off-by: Enrique Llorente <[email protected]>
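
(The pitfall this commit message describes can be sketched under the assumption that conntrack keeps established flows alive: a deny-all policy is enforced on new connections, so probing over an already-open socket proves nothing. Hypothetical helper, not the e2e code:)

```go
package e2e

import (
	"net"
	"time"
)

// dialFresh opens a brand-new TCP connection for every probe. A deny-all
// NetworkPolicy check must do this: traffic on a connection established
// before the policy was applied may keep flowing via conntrack, so reusing
// that connection cannot show that the policy is actually enforced.
func dialFresh(addr string) error {
	conn, err := net.DialTimeout("tcp", addr, 5*time.Second)
	if err != nil {
		return err // expected outcome once deny-all is in place
	}
	return conn.Close()
}
```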
@tssurya tssurya merged commit 798eb14 into ovn-org:master Jun 25, 2024
36 checks passed