
systemd-coredump[4211]: Process 4209 (flannel) of user 0 dumped core #2131

Open
eliassal opened this issue Dec 10, 2024 · 6 comments

Comments

@eliassal

eliassal commented Dec 10, 2024

I have a 2-node K8s cluster up and running with the Flannel CNI; the master node runs Fedora 41. When I check the pods, all of them are up and running, but I noticed that the kube-flannel pod was restarted 129 times:

(screenshot of the pod list showing the restart count)

So I started investigating. I ran

journalctl -b -l -p err | cat > journal.txt

and in the journal I see several entries like the following:

Dec 09 18:03:53 puppetmaster29 systemd-coredump[4211]: Process 4209 (flannel) of user 0 dumped core.
                                                       
                                                       Module /opt/cni/bin/flannel without build-id.
                                                       Stack trace of thread 4209:
                                                       #0  0x0000000000461ec1 runtime.vdsoFindVersion (/opt/cni/bin/flannel + 0x61ec1)
                                                       #1  0x0000000000462456 runtime.vdsoauxv (/opt/cni/bin/flannel + 0x62456)
                                                       #2  0x000000000043104d runtime.sysauxv (/opt/cni/bin/flannel + 0x3104d)
                                                       #3  0x0000000000430da5 runtime.sysargs (/opt/cni/bin/flannel + 0x30da5)
                                                       #4  0x0000000000446e3f runtime.args (/opt/cni/bin/flannel + 0x46e3f)
                                                       #5  0x0000000000473a25 runtime.args.abi0 (/opt/cni/bin/flannel + 0x73a25)
                                                       #6  0x000000000046f3b2 runtime.rt0_go.abi0 (/opt/cni/bin/flannel + 0x6f3b2)
                                                       ELF object binary architecture: AMD x86-64
Dec 09 18:03:54 puppetmaster29 systemd-coredump[4197]: Process 4195 (flannel) of user 0 dumped core.
                                                       
                                                       Module /opt/cni/bin/flannel without build-id.
                                                       Stack trace of thread 4195:
                                                       #0  0x00000000004729c0 _rt0_amd64_linux (/opt/cni/bin/flannel + 0x729c0)
                                                       ELF object binary architecture: AMD x86-64
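
For reference, each of these dumps can also be inspected directly with systemd's coredumpctl (a quick sketch, assuming the dumps are still retained on the node):

coredumpctl list flannel        # list recorded crashes of the flannel CNI binary
coredumpctl info 4209           # metadata and stack trace for the PID from the journal
coredumpctl debug 4209          # open the dump in gdb (requires gdb to be installed)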

Current Behavior

Described above.

Possible Solution

Can't suggest one.

Steps to Reproduce (for bugs)

  1. Set up K8s
  2. Apply the Flannel CNI (see the command below)
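
For step 2, the standard single-manifest install would look like this (assuming the usual flannel-io release URL):

kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml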

Your Environment

  • Flannel version: latest version
  • Backend used (e.g. vxlan or udp):
  • Etcd version: 3.5.15-0
  • Kubernetes version (if used): 1.31
  • Operating System and version: fedora 41
@thomasferrandiz
Contributor

We need more information to help.
Can you share logs from the flannel pod and the content of the flannel configmap?

@eliassal
Author

eliassal commented Dec 10, 2024

Hi Thomas, enclosed are logs for the 2 flannel pods, one running on the master and the second on the worker node.
When I issue the command

kubectl logs kube-flannel-ds-fptps -n kube-flannel > flannellog1.txt

I get:

Defaulted container "kube-flannel" out of: kube-flannel, install-cni-plugin (init), install-cni (init)
flannelPodOnMaster.txt
flannelPodLog.txt
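
The "Defaulted container" message above is only informational; a specific container (including the init containers) can be requested with -c, for example:

kubectl logs kube-flannel-ds-fptps -n kube-flannel -c kube-flannel
kubectl logs kube-flannel-ds-fptps -n kube-flannel -c install-cni-plugin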

Here is the configmap for the pod in question:
$ kubectl get configmaps kube-flannel-cfg -n kube-flannel -o yaml

apiVersion: v1
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "EnableNFTables": false,
      "Backend": {
        "Type": "vxlan"
      }
    }
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"cni-conf.json":"{\n  \"name\": \"cbr0\",\n  \"cniVersion\": \"0.3.1\",\n  \"plugins\": [\n    {\n      \"type\": \"flannel\",\n      \"delegate\": {\n        \"hairpinMode\": true,\n        \"isDefaultGateway\": true\n      }\n    },\n    {\n      \"type\": \"portmap\",\n      \"capabilities\": {\n        \"portMappings\": true\n      }\n    }\n  ]\n}\n","net-conf.json":"{\n  \"Network\": \"10.244.0.0/16\",\n  \"EnableNFTables\": false,\n  \"Backend\": {\n    \"Type\": \"vxlan\"\n  }\n}\n"},"kind":"ConfigMap","metadata":{"annotations":{},"labels":{"app":"flannel","k8s-app":"flannel","tier":"node"},"name":"kube-flannel-cfg","namespace":"kube-flannel"}}
  creationTimestamp: "2024-11-16T17:41:27Z"
  labels:
    app: flannel
    k8s-app: flannel
    tier: node
  name: kube-flannel-cfg
  namespace: kube-flannel
  resourceVersion: "581"
  uid: 4efe798e-9d71-442a-bc82-419a5d3fdc76
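
With this config, the vxlan backend can also be cross-checked on each node (a sketch, assuming flannel's default paths):

cat /run/flannel/subnet.env     # subnet lease written by flanneld at startup
ip -d link show flannel.1       # vxlan device created by the vxlan backend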

@eliassal
Author

eliassal commented Dec 10, 2024

Hi Thomas, any feedback please?

@thomasferrandiz
Contributor

Hi @eliassal
I don't see anything wrong in your flannel configuration.

However, based on these lines in your log:

pkg/subnet/kube/kube.go:470: failed to list *v1.Node: Get "https://10.96.0.1:443/api/v1/nodes?resourceVersion=547538": dial tcp 10.96.0.1:443: connect: no route to host

it looks like you have a network issue between the nodes.

Can they ping each other?
Is there a firewall active?

Based on the coredump log in your original message, it's the flannel CNI binary called by the kubelet that is crashing, not the flanneld daemon running in the pod. So I think the most likely explanation is an issue on the host itself, not in flannel or the Kubernetes deployment.
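
To check the connectivity concretely, something like this from each node might help (a sketch; 10.96.0.1:443 is taken from the error line above):

ip route get 10.96.0.1                                        # does the service VIP route anywhere?
curl -k --connect-timeout 5 https://10.96.0.1:443/healthz     # is the API service reachable at all?
systemctl is-active firewalld                                 # confirm the firewall really is off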

@eliassal
Author

Yes, ping works fine between both.
The firewall is in an inactive state on both.
What possible issue could there be on the host itself? I really have no clue. When I create deployments or replicasets, they work fine.

@thomasferrandiz
Contributor

I don't know exactly, but if you get "no route to host" errors in the flannel logs, it means there is some kind of connectivity issue between the two nodes.

You could check for other errors or warnings with journalctl, or the kernel logs with dmesg, for example:
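
journalctl -b -p warning        # everything at warning priority or more severe since boot
dmesg --level=err,warn          # kernel errors and warnings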
