Frequent timeouts for connections to external services via IPv4 #2037
Comments
I couldn't reproduce this issue in my environment. Could you check whether the traffic is at least correctly forwarded outside of the node when you run the test from the pod?
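One way to do that check (a sketch; the interface names are assumptions and depend on the setup) is to run the `nc` test in the pod while capturing on the node:

```sh
# Run each capture in a node shell while executing
# `nc -4 -zv -w1 google.com 443` inside a pod.
sudo tcpdump -ni flannel.1 'tcp port 443'   # traffic leaving the pod network (flannel VXLAN interface)
sudo tcpdump -ni eth0 'tcp port 443'        # traffic on the node's uplink; replace eth0 with your interface
```

If SYNs show up on the uplink but no SYN-ACKs come back, the packets are being dropped somewhere outside the node.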
Thank you for looking into this! A successful execution of
A failed execution of
If I understand it correctly, 142.250.186.78 is one of the IP addresses of the target server (here, Google). And for some reason, sometimes no TCP connection can be established at all? 🤔
Where are you running the tcpdump, from the pod or from the node? Could you do it from the node? You can redact your public IP address if needed.
You are right, that was from the pod. Here are the results from the node. I made two adjustments, however: the main network interface is enp5s0, and since there is quite a lot of traffic on port 443, I switched to port 587 and smtp.gmail.com (I originally noticed these issues when sending emails). There is no other traffic on this port on the server. A successful execution of
A failed execution of
Not sure if this is relevant, but the server is a dedicated server hosted by Hetzner.
I don't know about Hetzner, but the traffic seems to be correctly forwarded outside the node by flannel. You aren't getting any reply from the internet. Is there any configuration on the provider's network that could drop the traffic? Perhaps some rules meant to prevent a DDoS attack.
Hmm... nothing that I'm aware of. What bugs me the most is that it is 100% reliable when running directly from the node. Anything on the provider side would affect that as well, right? It only starts to fail when running from within a pod. It also only fails from the pod when using IPv4; IPv6 is 100% reliable there too. 🤔 I did a small test with 100 attempts. From the pod:
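A test along these lines could be scripted as follows (a hypothetical sketch, not the author's exact commands):

```sh
# Count successes and failures over 100 IPv4 connection attempts.
ok=0; fail=0
for i in $(seq 1 100); do
  if nc -4 -zv -w1 google.com 443 >/dev/null 2>&1; then
    ok=$((ok+1))
  else
    fail=$((fail+1))
  fi
done
echo "ok=$ok fail=$fail"
```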
Could you try this from the node?
I think I found the reason; however, I can't fully explain it yet. There is a firewall on the provider side that by default does not filter IPv6. This explains why it always works for IPv6, both on the node and from the pod. Besides rules such as only allowing incoming 80 and 443, it also contains by default an entry called "TCP established" with
When I temporarily replace this with something like
Is a different ephemeral port range than 32768-65535 used when the traffic comes from the pod via flannel? 🤔
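A quick way to compare the ephemeral port ranges involved (a sketch; the pod name and namespace are placeholders):

```sh
# On the node; the Linux default is usually "32768 60999".
sysctl net.ipv4.ip_local_port_range
# Inside a pod (hypothetical pod name); pods usually inherit the same default.
kubectl exec -n default some-pod -- cat /proc/sys/net/ipv4/ip_local_port_range
```

Note that this only shows the range used for locally originated connections; the source port a NAT rule picks when rewriting a pod's traffic is not necessarily confined to it, which would match the 1024-65535 observation below.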
For the traffic from the pods, flannel only configures basic NAT with iptables and MASQUERADE.
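What that looks like on a k3s node can be checked directly (a sketch; 10.42.0.0/16 is the k3s default cluster CIDR, and the commented rule is illustrative rather than a verbatim copy of flannel's output):

```sh
# List the NAT POSTROUTING rules and look for flannel's MASQUERADE entry.
sudo iptables -t nat -S POSTROUTING | grep -i masquerade
# A typical rule looks roughly like:
# -A POSTROUTING -s 10.42.0.0/16 ! -d 10.42.0.0/16 -j MASQUERADE
```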
In that context I found canonical/microk8s#3909, which describes the same issue with Calico. It looks like, for some reason, a wider port range is used for the ephemeral ports; 1024-65535 seems to work, matching https://datatracker.ietf.org/doc/html/rfc6056 🤔. I currently don't know where to follow up on this, but at least it does not seem to be an issue exclusive to flannel. Thanks a lot @rbrtbnfgl for the support! 🙏
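To see which source ports the node's SNAT actually picks (and thus what the provider firewall must allow), the conntrack table can be watched while the test runs from a pod; a sketch, assuming the conntrack CLI is installed and port 587 as in the thread:

```sh
# List tracked connections to port 587; in each entry, the reply tuple's
# dport field shows the source port MASQUERADE chose for the outgoing connection.
sudo conntrack -L -p tcp --dport 587
```

If ports below 32768 show up there, a firewall rule limited to 32768-65535 would intermittently drop the return traffic, exactly as observed.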
Hi,
I installed k3s on top of Ubuntu 24.04 using flannel vxlan (k3s config below). When connecting to external services over IPv4 from within a pod, the connections sometimes succeed and sometimes time out. Over IPv6, they always succeed. The same connections made directly from the host also always succeed (both IPv4 and IPv6).
Unfortunately, my knowledge of networking is quite limited. Do you have any idea what could cause this behavior?
Thanks,
Fabian
Connecting to google.com from a pod using IPv4 sometimes fails:
Connecting to google.com from a pod using IPv6 always works:
Connecting to google.com from the host using IPv4 and IPv6 always works:
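The commands behind these three checks presumably resembled the following (a sketch based on the nc invocation in the steps below; the captured outputs are omitted here):

```sh
nc -4 -zv -w1 google.com 443   # IPv4 — from a pod this sometimes times out
nc -6 -zv -w1 google.com 443   # IPv6 — works reliably from both pod and host
```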
Expected Behavior
Reliable connectivity from the cluster to external services
Current Behavior
Frequent timeouts when connecting to external services using IPv4
Steps to Reproduce (for bugs)
nc -4 -zv -w1 google.com 443
from within a pod

Context
Your Environment