
systemd-coredump[4211]: Process 4209 (flannel) of user 0 dumped core #2131

Open
eliassal opened this issue Dec 10, 2024 · 6 comments

Comments

@eliassal

eliassal commented Dec 10, 2024

I have a 2-node K8s cluster up and running with the Flannel CNI; the master node runs Fedora 41. When I check the pods, all of them are up and running, but I noticed that the kube-flannel pod was restarted 129 times:

(screenshot of the pod list showing the restart count)

So I started investigating. I ran

journalctl -b -l -p err | cat > journal.txt

and in the journal I see several entries like the following:

Dec 09 18:03:53 puppetmaster29 systemd-coredump[4211]: Process 4209 (flannel) of user 0 dumped core.
                                                       
                                                       Module /opt/cni/bin/flannel without build-id.
                                                       Stack trace of thread 4209:
                                                       #0  0x0000000000461ec1 runtime.vdsoFindVersion (/opt/cni/bin/flannel + 0x61ec1)
                                                       #1  0x0000000000462456 runtime.vdsoauxv (/opt/cni/bin/flannel + 0x62456)
                                                       #2  0x000000000043104d runtime.sysauxv (/opt/cni/bin/flannel + 0x3104d)
                                                       #3  0x0000000000430da5 runtime.sysargs (/opt/cni/bin/flannel + 0x30da5)
                                                       #4  0x0000000000446e3f runtime.args (/opt/cni/bin/flannel + 0x46e3f)
                                                       #5  0x0000000000473a25 runtime.args.abi0 (/opt/cni/bin/flannel + 0x73a25)
                                                       #6  0x000000000046f3b2 runtime.rt0_go.abi0 (/opt/cni/bin/flannel + 0x6f3b2)
                                                       ELF object binary architecture: AMD x86-64
Dec 09 18:03:54 puppetmaster29 systemd-coredump[4197]: Process 4195 (flannel) of user 0 dumped core.
                                                       
                                                       Module /opt/cni/bin/flannel without build-id.
                                                       Stack trace of thread 4195:
                                                       #0  0x00000000004729c0 _rt0_amd64_linux (/opt/cni/bin/flannel + 0x729c0)
                                                       ELF object binary architecture: AMD x86-64
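
For reference, each of these dumps can also be inspected directly with systemd's coredumpctl (a quick sketch, assuming the dumps are still retained on the node):

coredumpctl list flannel        # list recorded crashes of the flannel CNI binary
coredumpctl info 4209           # metadata and stack trace for the PID from the journal
coredumpctl debug 4209          # open the dump in gdb (requires gdb to be installed)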

Current Behavior

Described above.

Possible Solution

Can't suggest one.

Steps to Reproduce (for bugs)

  1. Set up K8s
  2. Apply the Flannel CNI (see the command below)
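
For step 2, the standard single-manifest install would look like this (assuming the usual flannel-io release URL):

kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml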

Your Environment

  • Flannel version: latest version
  • Backend used (e.g. vxlan or udp):
  • Etcd version: 3.5.15-0
  • Kubernetes version (if used): 1.31
  • Operating System and version: fedora 41
@thomasferrandiz
Contributor

We need more information to help.
Can you share logs from the flannel pod and the content of the flannel configmap?

@eliassal
Author

eliassal commented Dec 10, 2024

Hi Thomas, enclosed are logs for the 2 flannel pods, one running on the master and the second on the worker node.
When I issue the command

kubectl logs kube-flannel-ds-fptps -n kube-flannel > flannellog1.txt

I get:

Defaulted container "kube-flannel" out of: kube-flannel, install-cni-plugin (init), install-cni (init)
flannelPodOnMaster.txt
flannelPodLog.txt
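
The "Defaulted container" message above is only informational; a specific container (including the init containers) can be requested with -c, for example:

kubectl logs kube-flannel-ds-fptps -n kube-flannel -c kube-flannel
kubectl logs kube-flannel-ds-fptps -n kube-flannel -c install-cni-plugin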

Here is the configmap for the pod in question:
$ kubectl get configmaps kube-flannel-cfg -n kube-flannel -o yaml

apiVersion: v1
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "EnableNFTables": false,
      "Backend": {
        "Type": "vxlan"
      }
    }
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"cni-conf.json":"{\n  \"name\": \"cbr0\",\n  \"cniVersion\": \"0.3.1\",\n  \"plugins\": [\n    {\n      \"type\": \"flannel\",\n      \"delegate\": {\n        \"hairpinMode\": true,\n        \"isDefaultGateway\": true\n      }\n    },\n    {\n      \"type\": \"portmap\",\n      \"capabilities\": {\n        \"portMappings\": true\n      }\n    }\n  ]\n}\n","net-conf.json":"{\n  \"Network\": \"10.244.0.0/16\",\n  \"EnableNFTables\": false,\n  \"Backend\": {\n    \"Type\": \"vxlan\"\n  }\n}\n"},"kind":"ConfigMap","metadata":{"annotations":{},"labels":{"app":"flannel","k8s-app":"flannel","tier":"node"},"name":"kube-flannel-cfg","namespace":"kube-flannel"}}
  creationTimestamp: "2024-11-16T17:41:27Z"
  labels:
    app: flannel
    k8s-app: flannel
    tier: node
  name: kube-flannel-cfg
  namespace: kube-flannel
  resourceVersion: "581"
  uid: 4efe798e-9d71-442a-bc82-419a5d3fdc76
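
With this config, the vxlan backend can also be cross-checked on each node (a sketch, assuming flannel's default paths):

cat /run/flannel/subnet.env     # subnet lease written by flanneld at startup
ip -d link show flannel.1       # vxlan device created by the vxlan backend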

@eliassal
Author

eliassal commented Dec 10, 2024

Hi Thomas, any feedback please?

@thomasferrandiz
Contributor

Hi @eliassal
I don't see anything wrong in your flannel configuration.

However, based on these lines in your log:

pkg/subnet/kube/kube.go:470: failed to list *v1.Node: Get "https://10.96.0.1:443/api/v1/nodes?resourceVersion=547538": dial tcp 10.96.0.1:443: connect: no route to host

it looks like you have a network issue between the nodes.

Can they ping each other?
Is there a firewall active?

Based on the coredump log in your original message, it's the flannel CNI binary called by the kubelet that is crashing, not the flanneld daemon running in the pod. So I think the most likely explanation is an issue on the host itself, not in flannel or the Kubernetes deployment.
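
To check the connectivity concretely, something like this from each node might help (a sketch; 10.96.0.1:443 is taken from the error line above):

ip route get 10.96.0.1                                        # does the service VIP route anywhere?
curl -k --connect-timeout 5 https://10.96.0.1:443/healthz     # is the API service reachable at all?
systemctl is-active firewalld                                 # confirm the firewall really is off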

@eliassal
Author

Yes, ping works fine between both.
The firewall is in an inactive state on both.
What possible issue could there be on the host itself? I really have no clue. When I create deployments or replicasets, they work fine.

@thomasferrandiz
Contributor

I don't know exactly, but if you get "no route to host" errors in the flannel logs, it means there is some kind of connectivity issue between the two nodes.

You could check for other errors or warnings with journalctl, or the kernel logs with dmesg, for example:
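
journalctl -b -p warning        # everything at warning priority or more severe since boot
dmesg --level=err,warn          # kernel errors and warnings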
