Pod Networking Issues #136
-
Hey @a-voitov-mitgo, we will try Cilium with the configs you mentioned and see if and where the problem occurs. Meanwhile, do you have any nodes stuck in recycling? We can take a look immediately.
-
@a-voitov-mitgo It seems there was a problem recycling the nodes. While we get to the root of it, we have resolved the stuck nodes for you. We're also looking into the Cilium networking problem and will get back to you ASAP.
-
@a-voitov-mitgo The ability to preserve the client IP in Gen1 load balancers should be available in an upcoming release; see #109.
-
@a-voitov-mitgo I tried to replicate this by installing Cilium with the config you shared (thanks again for that).
-
@sahil-lakhwani I realized that with this setting, Cilium is supposed to replace kube-proxy. But in my case, it seems Cilium is conflicting with kube-proxy. That is (supposedly) why I sometimes hit the bug where the network disappears: Cilium starts handling the traffic. I then tried disabling kube-proxy, but in that case ClusterIP routing doesn't work. In the Cilium documentation, in the section "Kubernetes Without kube-proxy: Troubleshooting", I found that there can be issues with BPF cgroup program attachment. To check this, you need to run a couple of commands, and in my case the output doesn't match what it should be. I don't yet know why; maybe it's a containerd configuration issue, or something else.
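In case it helps others following along, the checks I mean look roughly like this (paths are Cilium's defaults and may differ depending on how the agent is deployed):

```bash
# Check that a cgroup v2 filesystem is mounted where Cilium expects it
mount | grep cgroup2

# On the node, list the BPF programs attached to Cilium's cgroup root.
# With socket-LB working you should see the connect4/6, sendmsg4/6, etc.
# programs attached; in my case the output doesn't look like that.
bpftool cgroup tree /run/cilium/cgroupv2/
```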
-
Hi everyone!
First of all, we want to say that we really enjoy using Rackspace Spot — the pricing is great and the support team has been wonderful. Below is a description of our current setup and a few issues we’re running into. We’d really appreciate any advice or shared experience!
🚀 Our Architecture
🔄 Custom NAT via Cilium + IPSec
We use Cilium's egressGateway feature so we can route outbound traffic through dedicated OnDemand nodes via CiliumEgressGatewayPolicy.
The hostFirewall feature should help here, but in our case it conflicted with IPSec, so this remains unresolved.
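For context, our egress policy looks roughly like the sketch below (labels, CIDRs, and the egress IP are placeholders rather than our exact values; egress gateway also assumes BPF masquerading and kube-proxy replacement are enabled):

```yaml
apiVersion: cilium.io/v2
kind: CiliumEgressGatewayPolicy
metadata:
  name: egress-via-ondemand
spec:
  # Pods whose outbound traffic should be routed through the gateway
  selectors:
    - podSelector:
        matchLabels:
          app: example-workload          # placeholder label
  # Traffic to these destinations is SNATed on the gateway node
  destinationCIDRs:
    - "0.0.0.0/0"
  egressGateway:
    # Placeholder label marking our dedicated OnDemand egress nodes
    nodeSelector:
      matchLabels:
        node-role.example.com/egress: "true"
    # Optionally pin the source IP used for SNAT
    # egressIP: 203.0.113.10
```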
🌐 Preserving Client IP for Ingress
To keep the original client IP, we run our ingress in hostPort mode on OnDemand nodes.
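Roughly, the ingress side looks like the sketch below (image, labels, and namespace are placeholders, not our exact manifest):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ingress-controller
  namespace: ingress
spec:
  selector:
    matchLabels:
      app: ingress-controller
  template:
    metadata:
      labels:
        app: ingress-controller
    spec:
      # Placeholder label pinning the controller to OnDemand nodes
      nodeSelector:
        node-role.example.com/ingress: "true"
      containers:
        - name: controller
          image: nginx:1.27              # stand-in for the real ingress controller image
          ports:
            # hostPort binds directly on the node, so the client source IP
            # reaches the controller without an extra SNAT hop
            - containerPort: 80
              hostPort: 80
            - containerPort: 443
              hostPort: 443
```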
🔍 Problems We're Facing
🚫 Intermittent Network Loss in Pods
Symptom:
Occasionally, newly created pods lose all in-cluster networking: DNS fails and they can't reach other cluster services. Internet access still works, but only if we replace the DNS address in resolv.conf with an external one.
The node itself and other pods on that node work normally. The problem only occurs in new pods on this node.
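To illustrate the symptom, a quick check from inside an affected pod looks roughly like this (pod name and external resolver are placeholders, and the image needs nslookup available):

```bash
# In-cluster DNS fails from the affected pod...
kubectl exec -it <affected-pod> -- nslookup kubernetes.default.svc.cluster.local

# ...while an external resolver still answers, which is why swapping the
# nameserver in resolv.conf restores internet access
kubectl exec -it <affected-pod> -- nslookup example.com 1.1.1.1
```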
Workaround:
Recycling the node (cordon → drain → recycle) usually fixes the issue, but the root cause remains unclear.
It might be related to our Cilium configuration, but we also can't rule out a problem on the Rackspace side.
A regular reboot of the node doesn't help, and other nodes in the cluster continue to work just fine.
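For completeness, the workaround amounts to roughly this (node name is a placeholder; the recycle itself is triggered from the Rackspace Spot side rather than kubectl):

```bash
# Stop new pods from being scheduled onto the affected node
kubectl cordon <node-name>

# Evict the existing workloads
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

# Then recycle the node via the Spot console/API and wait for the replacement
```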
🌀 Nodes Stuck in “Recycling”
Sometimes nodes get stuck in the Recycling state and remain there for days, with no further actions possible.
🤔 Our Questions
What can we do when a node gets stuck in Recycling?
Thank you so much for your time and help!
We really appreciate this community and Rackspace’s ongoing support 💙