docker.io/flannel/flannel-cni-plugin:v1.5.1-flannel1 cannot run #2031

Open
bencyq opened this issue Aug 13, 2024 · 8 comments

bencyq commented Aug 13, 2024

Expected Behavior

k8s pod kube-flannel-ds-vjhqf is ready; docker.io/flannel/flannel-cni-plugin:v1.5.1-flannel1 is functioning properly

Current Behavior

k8s pod kube-flannel-ds-vjhqf is stuck in state Init:RunContainerError.
The pod fails in the install-cni-plugin init container:

Init Containers:
  install-cni-plugin:
    Container ID:  
    Image:         docker.io/flannel/flannel-cni-plugin:v1.5.1-flannel1
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
    Args:
      -f
      /flannel
      /opt/cni/bin/flannel
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       ContainerCannotRun
      Message:      cannot join network of a non running container: 0aae74cae93531c157f438f6f3aed81ca48f5fff388e3a6b6ddabc6d69837884
      Exit Code:    128
      Started:      Tue, 13 Aug 2024 10:53:59 +0800
      Finished:     Tue, 13 Aug 2024 10:53:59 +0800
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /opt/cni/bin from cni-plugin (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xpgnc (ro)

Possible Solution

Steps to Reproduce (for bugs)

  1. $ kubeadm join ...
  2. $ kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
  3. $ kubectl get pods --all-namespaces
NAMESPACE      NAME                                 READY   STATUS                   RESTARTS           AGE
kube-flannel   kube-flannel-ds-vjhqf                0/1     Init:RunContainerError   0                  17d
kube-flannel   kube-flannel-ds-x2xhk                1/1     Running                  0                  19d
kube-flannel   kube-flannel-ds-zzhls                1/1     Running                  0                  19d
kube-system    coredns-7db6d8ff4d-m4282             1/1     Running                  0                  27d
kube-system    coredns-7db6d8ff4d-tchbs             1/1     Running                  0                  27d
kube-system    etcd-k8s-master                      1/1     Running                  1 (27d ago)        27d
kube-system    kube-apiserver-k8s-master            1/1     Running                  1 (27d ago)        27d
kube-system    kube-controller-manager-k8s-master   1/1     Running                  1 (27d ago)        27d
kube-system    kube-proxy-nb2jc                     1/1     Running                  0                  20d
kube-system    kube-proxy-xx8vt                     0/1     CrashLoopBackOff         5170 (2m13s ago)   20d
kube-system    kube-proxy-zc9r8                     1/1     Running                  0                  27d
kube-system    kube-scheduler-k8s-master            1/1     Running                  1 (27d ago)        27d
  4. $ kubectl describe pods kube-flannel-ds-vjhqf -n kube-flannel
Name:                 kube-flannel-ds-vjhqf
Namespace:            kube-flannel
Priority:             2000001000
Priority Class Name:  system-node-critical
Service Account:      flannel
Node:                 hd-ascend/10.90.1.237
Start Time:           Fri, 26 Jul 2024 10:57:59 +0800
Labels:               app=flannel
                      controller-revision-hash=bb4dc6cbf
                      k8s-app=flannel
                      pod-template-generation=2
                      tier=node
Annotations:          <none>
Status:               Pending
IP:                   10.90.1.237
IPs:
  IP:           10.90.1.237
Controlled By:  DaemonSet/kube-flannel-ds
Init Containers:
  install-cni-plugin:
    Container ID:  
    Image:         docker.io/flannel/flannel-cni-plugin:v1.5.1-flannel1
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
    Args:
      -f
      /flannel
      /opt/cni/bin/flannel
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       ContainerCannotRun
      Message:      cannot join network of a non running container: 0aae74cae93531c157f438f6f3aed81ca48f5fff388e3a6b6ddabc6d69837884
      Exit Code:    128
      Started:      Tue, 13 Aug 2024 10:53:59 +0800
      Finished:     Tue, 13 Aug 2024 10:53:59 +0800
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /opt/cni/bin from cni-plugin (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xpgnc (ro)
  install-cni:
    Container ID:  
    Image:         docker.io/flannel/flannel:v0.25.5
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
    Args:
      -f
      /etc/kube-flannel/cni-conf.json
      /etc/cni/net.d/10-flannel.conflist
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /etc/cni/net.d from cni (rw)
      /etc/kube-flannel/ from flannel-cfg (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xpgnc (ro)
Containers:
  kube-flannel:
    Container ID:  
    Image:         docker.io/flannel/flannel:v0.25.5
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Command:
      /opt/bin/flanneld
    Args:
      --ip-masq
      --kube-subnet-mgr
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:     100m
      memory:  50Mi
    Environment:
      POD_NAME:           kube-flannel-ds-vjhqf (v1:metadata.name)
      POD_NAMESPACE:      kube-flannel (v1:metadata.namespace)
      EVENT_QUEUE_DEPTH:  5000
    Mounts:
      /etc/kube-flannel/ from flannel-cfg (rw)
      /run/flannel from run (rw)
      /run/xtables.lock from xtables-lock (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xpgnc (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   False 
  Initialized                 False 
  Ready                       False 
  ContainersReady             False 
  PodScheduled                True 
Volumes:
  run:
    Type:          HostPath (bare host directory volume)
    Path:          /run/flannel
    HostPathType:  
  cni-plugin:
    Type:          HostPath (bare host directory volume)
    Path:          /opt/cni/bin
    HostPathType:  
  cni:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/cni/net.d
    HostPathType:  
  flannel-cfg:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kube-flannel-cfg
    Optional:  false
  xtables-lock:
    Type:          HostPath (bare host directory volume)
    Path:          /run/xtables.lock
    HostPathType:  FileOrCreate
  kube-api-access-xpgnc:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 :NoSchedule op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason          Age                        From     Message
  ----     ------          ----                       ----     -------
  Warning  BackOff         41m (x698821 over 17d)     kubelet  Back-off restarting failed container install-cni-plugin in pod kube-flannel-ds-vjhqf_kube-flannel(75d80e9d-5ae7-4287-9184-ace04731cd62)
  Normal   SandboxChanged  6m27s (x1403921 over 17d)  kubelet  Pod sandbox changed, it will be killed and re-created.
  Normal   Pulled          87s (x702562 over 17d)     kubelet  Container image "docker.io/flannel/flannel-cni-plugin:v1.5.1-flannel1" already present on machine

Context

The node can never become Ready in the k8s cluster.
I'm using an arm64 machine as a node joining an x86 cluster; does that matter?
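
(A quick check for the architecture question: confirm the node's reported architecture and that the image manifest list contains a matching entry. These are standard kubectl/docker commands, nothing flannel-specific; the node name is taken from the describe output above.)

$ kubectl get node hd-ascend -o jsonpath='{.status.nodeInfo.architecture}'
$ docker manifest inspect docker.io/flannel/flannel-cni-plugin:v1.5.1-flannel1 | grep architecture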

Your Environment

  • Flannel version: flannel:v0.25.5
  • Backend used (e.g. vxlan or udp):
  • Etcd version:
  • Kubernetes version (if used): v1.30.1
  • Operating System and version: CentOS 7
  • Link to your project (optional):

rbrtbnfgl commented Aug 14, 2024

So the only node that is failing is the arm64 one? I'll check whether the container image for arm64 was built correctly.
Could you check the logs of the failing pod with kubectl?


bencyq commented Aug 16, 2024

So the only node that is failing is the arm64 one? I'll check whether the container image for arm64 was built correctly. Could you check the logs of the failing pod with kubectl?

Thank you for your reply.
Here are the logs.

$ kubectl logs kube-flannel-ds-vjhqf -n kube-flannel
Defaulted container "kube-flannel" out of: kube-flannel, install-cni-plugin (init), install-cni (init)
Error from server (BadRequest): container "kube-flannel" in pod "kube-flannel-ds-vjhqf" is waiting to start: PodInitializing

$ kubectl logs kube-proxy-xx8vt -n kube-system
failed to try resolving symlinks in path "/var/log/pods/kube-system_kube-proxy-xx8vt_7758f284-f039-4aeb-bbf5-11da20d35c8f/kube-proxy/6043.log": lstat /var/log/pods/kube-system_kube-proxy-xx8vt_7758f284-f039-4aeb-bbf5-11da20d35c8f/kube-proxy/6043.log: no such file or directory

rbrtbnfgl (Contributor) commented:

What is the output of:
kubectl logs kube-flannel-ds-vjhqf -n kube-flannel -c install-cni-plugin
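
If the init container has already crashed, adding the --previous flag (a standard kubectl option, nothing flannel-specific) may retrieve logs from the last terminated attempt:

$ kubectl logs kube-flannel-ds-vjhqf -n kube-flannel -c install-cni-plugin --previous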

zhangguanzhang (Contributor) commented:

crictl ps -a
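
If kubectl yields nothing, drilling down on the node with crictl might help; a rough sketch (the container ID below is a placeholder for whatever the first command returns):

$ crictl ps -a | grep install-cni-plugin
$ crictl inspect <container-id>
$ crictl logs <container-id>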


x3nb63 commented Sep 4, 2024

I face a very similar looking problem: the Flannel DaemonSet pod fails to come up.

It fails on the install-cni-plugin init container, which gets into state CreateContainerConfigError pretty much immediately.

NAME↑               PF IMAGE                                                 READY  STATE                       INIT   RESTARTS PROBES(L:R) CPU/R:L MEM/R:L PORTS AGE
install-cni         ●  docker.io/flannel/flannel:v0.25.6                     true   Completed                   true          0 off:off         0:0     0:0       6h22
install-cni-plugin  ●  docker.io/flannel/flannel-cni-plugin:v1.5.1-flannel2  false  CreateContainerConfigError  true          0 off:off         0:0     0:0       6h22
kube-flannel        ●  docker.io/flannel/flannel:v0.25.6                     false  Unknown                     false         0 off:off       100:0    50:0       6h22

It does not give any output:

unable to retrieve container logs for containerd://f00f752cb6d46e4b2f866d5f6ec5ca3be330353121177d7480db396ecace6904

As I understand CreateContainerConfigError, no output is to be expected, since the error happens before any binary/entrypoint/... from the image gets started.

kubectl describe pod kube-flannel-ds-59hwh shows this error:

Warning  Failed          16m (x12 over 18m)    kubelet          Error: services have not yet been read at least once, cannot construct envvars
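
To see the kubelet's side of that error on the affected node, something like this might help (assuming a systemd-managed kubelet, as on Flatcar):

$ journalctl -u kubelet --since "1 hour ago" | grep -i envvars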

This comes up while upgrading Kubernetes from v1.30.3 to v1.31.0, as in "it happens with all nodes I reboot into the later version".

Looking into CHANGELOG-1.31, I am at a loss as to what could be related.

I somehow guess it may be related to the use of the Downward API for two env: variables, which get filled via fieldRef: -> fieldPath: metadata.XYZ. That's more guessing than knowing.
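
For what it's worth, one way to confirm which variables use the Downward API in the deployed manifest (plain kubectl; the DaemonSet name is taken from the describe output above):

$ kubectl get ds kube-flannel-ds -n kube-flannel -o yaml | grep -B 3 -A 2 fieldRef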


x3nb63 commented Sep 5, 2024

I reverted my Kubernetes version from v1.31.0 to v1.30.3, and the flannel-cni-plugin:v1.5.1-flannel2 init container now succeeds, so flannel:v0.25.6 starts up fine as a consequence.

... so I think there is clearly a problem coming from the changes in Kubernetes v1.31.0.

thomasferrandiz (Contributor) commented:

Hi
I tested flannel with k8s 1.31 on both amd64 and arm64 and I had no issue.
My test was on Ubuntu 24.04.

Can you show the kernel version that you're using and the kernel logs?
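
For reference, something like this should capture both (standard tooling, nothing flannel-specific):

$ uname -r
$ journalctl -k -b | tail -n 100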


x3nb63 commented Sep 6, 2024

The system is:

$ uname -a
Linux kc04  6.6.43-flatcar #1 SMP PREEMPT_DYNAMIC Mon Aug  5 20:36:27 -00 2024 x86_64 Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz GenuineIntel GNU/Linux

$ cat /etc/os-release
NAME="Flatcar Container Linux by Kinvolk"
ID=flatcar
ID_LIKE=coreos
VERSION=3975.2.0
VERSION_ID=3975.2.0
BUILD_ID=2024-08-05-2103
SYSEXT_LEVEL=1.0
PRETTY_NAME="Flatcar Container Linux by Kinvolk 3975.2.0 (Oklo)"
ANSI_COLOR="38;5;75"
HOME_URL="https://flatcar.org/"
BUG_REPORT_URL="https://issues.flatcar.org"
FLATCAR_BOARD="amd64-usr"
CPE_NAME="cpe:2.3:o:flatcar-linux:flatcar_linux:3975.2.0:*:*:*:*:*:*:*"

To switch the K8s version, I toggle the /etc/extensions/kubernetes.raw symlink between /opt/extensions/kubernetes/kubernetes-v1.30.3-x86-64.raw and /opt/extensions/kubernetes/kubernetes-v1.31.0-x86-64.raw.

(This is Flatcar's way of "blending in" software, utilizing systemd-sysext with their sysext-bakery.)
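
In practice the toggle is roughly this (a sketch based on the paths above, not an official Flatcar procedure):

$ ln -sfn /opt/extensions/kubernetes/kubernetes-v1.31.0-x86-64.raw /etc/extensions/kubernetes.raw
$ systemctl reboot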

As for kernel logs, I can't get those right now, as I would need to take down a node to get a clean one, and they are all busy.

Note that the kernel version and OS version do not change here; I really only toggle the K8s binaries and reboot to have it all start properly after "blending".
