When using flannel, the CoreDNS remains "pending" #2064

Open
lengcangche-gituhub opened this issue Sep 23, 2024 · 6 comments

@lengcangche-gituhub

lengcangche-gituhub commented Sep 23, 2024

When I use flannel as the pod network for Kubernetes, the coredns pods stay in the "Pending" status.

Expected Behavior

The coredns pods should be in the "Running" status.

Current Behavior

NAME                                   READY   STATUS    RESTARTS   AGE
coredns-66f779496c-8ddns               0/1     Pending   0          4m19s
coredns-66f779496c-b54jf               0/1     Pending   0          4m19s
etcd-davinci-mini                      1/1     Running   6          4m36s
kube-apiserver-davinci-mini            1/1     Running   6          4m32s
kube-controller-manager-davinci-mini   1/1     Running   7          4m32s
kube-proxy-g7l22                       1/1     Running   0          4m19s
kube-scheduler-davinci-mini            1/1     Running   7          4m32s

Steps to Reproduce (for bugs)

1. kubeadm init --pod-network-cidr=100.100.0.0/16 --image-repository=registry.aliyuncs.com/google_containers --apiserver-advertise-address=192.168.1.122 (192.168.1.122 is my private IP address, which is reachable by the cluster nodes)
2. mkdir -p $HOME/.kube
3. sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
4. sudo chown $(id -u):$(id -g) $HOME/.kube/config (steps 2–4 follow the instructions printed by the kubeadm init output)
5. kubectl apply -f kube-flannel.yml (this file is attached below)
6. kubectl get pod -n kube-system
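
For reference, the follow-up checks that should show why the pods stay Pending (a sketch; pod and node names will differ per cluster):

kubectl get nodes                          # coredns typically stays Pending while the node is still NotReady
kubectl get pods -n kube-flannel -o wide   # the flannel DaemonSet pods run in their own kube-flannel namespace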

Your Environment

  • Backend used: vxlan
  • Kubernetes version: 1.28
  • Operating System and version: Linux davinci-mini 5.10.0+ (on Huawei's NPU Atlas)

Attached: the content of "kube-flannel.yml"

---
kind: Namespace
apiVersion: v1
metadata:
  name: kube-flannel
  labels:
    k8s-app: flannel
    pod-security.kubernetes.io/enforce: privileged
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  labels:
    k8s-app: flannel
  name: flannel
rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - nodes/status
  verbs:
  - patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  labels:
    k8s-app: flannel
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
- kind: ServiceAccount
  name: flannel
  namespace: kube-flannel
---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: flannel
  name: flannel
  namespace: kube-flannel
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-flannel
  labels:
    tier: node
    k8s-app: flannel
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "100.100.0.0/16",
      "EnableNFTables": false,
      "Backend": {
        "Type": "vxlan"
      }
    }
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-flannel-ds
  namespace: kube-flannel
  labels:
    tier: node
    app: flannel
    k8s-app: flannel
spec:
  selector:
    matchLabels:
      app: flannel
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values:
                - linux
      hostNetwork: true
      priorityClassName: system-node-critical
      tolerations:
      - operator: Exists
        effect: NoSchedule
      serviceAccountName: flannel
      initContainers:
      - name: install-cni-plugin
        image: rancher/mirrored-flannelcni-flannel-cni-plugin:v1.1.0
        command:
        - cp
        args:
        - -f
        - /flannel
        - /opt/cni/bin/flannel
        volumeMounts:
        - name: cni-plugin
          mountPath: /opt/cni/bin
      - name: install-cni
        image: rancher/mirrored-flannelcni-flannel:v0.18.1
        command:
        - cp
        args:
        - -f
        - /etc/kube-flannel/cni-conf.json
        - /etc/cni/net.d/10-flannel.conflist
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      containers:
      - name: kube-flannel
        image: rancher/mirrored-flannelcni-flannel:v0.18.1
        command:
        - /opt/bin/flanneld
        args:
        - --ip-masq
        - --kube-subnet-mgr
        resources:
          requests:
            cpu: "100m"
            memory: "50Mi"
        securityContext:
          privileged: false
          capabilities:
            add: ["NET_ADMIN", "NET_RAW"]
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: EVENT_QUEUE_DEPTH
          value: "5000"
        volumeMounts:
        - name: run
          mountPath: /run/flannel
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
        - name: xtables-lock
          mountPath: /run/xtables.lock
      volumes:
      - name: run
        hostPath:
          path: /run/flannel
      - name: cni-plugin
        hostPath:
          path: /opt/cni/bin
      - name: cni
        hostPath:
          path: /etc/cni/net.d
      - name: flannel-cfg
        configMap:
          name: kube-flannel-cfg
      - name: xtables-lock
        hostPath:
          path: /run/xtables.lock
          type: FileOrCreate
@rbrtbnfgl
Contributor

If you run kubectl describe on the coredns pod, what do you get?
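
For example (the pod name is taken from the kubectl get pod output above and will differ per cluster):

kubectl -n kube-system describe pod coredns-66f779496c-8ddns

The Events section at the end of that output usually shows why the scheduler is keeping the pod Pending.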

@lengcangche-gituhub
Author

If you run kubectl describe on the coredns pod, what do you get?

I'm sorry. There is no flannel pod.

@rbrtbnfgl
Contributor

Flannel creates its own namespace; you should check the kube-flannel namespace.
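
For example, using the labels from the attached manifest:

kubectl get pods -n kube-flannel -o wide     # the flannel DaemonSet pods
kubectl logs -n kube-flannel -l app=flannel  # flannel logs, selected by the app=flannel label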

@f5-jay

f5-jay commented Nov 25, 2024

I'm running into the same issue. Nodes never go into "ready" state after deploying flannel.

root@master # kubectl get nodes
NAME         STATUS     ROLES           AGE   VERSION
master.lab   NotReady   control-plane   37m   v1.31.3
node1.lab    NotReady   <none>          36m   v1.31.3
node2.lab    NotReady   <none>          36m   v1.31.3

root@master # kubectl get pods -A
NAMESPACE      NAME                                  READY   STATUS    RESTARTS   AGE
kube-flannel   kube-flannel-ds-98rss                 1/1     Running   0          10m
kube-flannel   kube-flannel-ds-bdc7p                 1/1     Running   0          10m
kube-flannel   kube-flannel-ds-d8dvd                 1/1     Running   0          10m
kube-system    coredns-7c65d6cfc9-phvrv              0/1     Pending   0          22m
kube-system    coredns-7c65d6cfc9-z7rd8              0/1     Pending   0          22m
kube-system    etcd-master.lab                       1/1     Running   1          22m
kube-system    kube-apiserver-master.lab             1/1     Running   1          22m
kube-system    kube-controller-manager-master.lab    1/1     Running   1          22m
kube-system    kube-proxy-4j6wx                      1/1     Running   0          22m
kube-system    kube-proxy-4kdlb                      1/1     Running   0          22m
kube-system    kube-proxy-gmh8t                      1/1     Running   0          22m
kube-system    kube-scheduler-master.lab             1/1     Running   1          22m

Conditions:
  Type                 Status   LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------   -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False    Sun, 24 Nov 2024 23:26:33 -0700   Sun, 24 Nov 2024 23:26:33 -0700   FlannelIsUp                  Flannel is running on this node
  MemoryPressure       False    Sun, 24 Nov 2024 23:31:21 -0700   Sun, 24 Nov 2024 22:58:25 -0700   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False    Sun, 24 Nov 2024 23:31:21 -0700   Sun, 24 Nov 2024 22:58:25 -0700   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False    Sun, 24 Nov 2024 23:31:21 -0700   Sun, 24 Nov 2024 22:58:25 -0700   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                False    Sun, 24 Nov 2024 23:31:21 -0700   Sun, 24 Nov 2024 22:58:25 -0700   KubeletNotReady              container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: no CNI configuration file in /etc/cni/net.d/. Has your network provider started?

root@master # ll /etc/cni/net.d/
total 4
-rw-r--r-- 1 root root 292 Nov 24 23:26 10-flannel.conflist

root@master # cat /etc/cni/net.d/10-flannel.conflist
{
  "name": "cbr0",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}

root@master # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:20:e2:c7 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.80/22 brd 192.168.3.255 scope global noprefixroute enp1s0
       valid_lft forever preferred_lft forever
    inet6 ::17f:ac00::80/64 scope global noprefixroute
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe20:e2c7/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:f7:2c:81 brd ff:ff:ff:ff:ff:ff
4: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:8f:e1:b2 brd ff:ff:ff:ff:ff:ff
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether aa:73:de:d6:05:59 brd ff:ff:ff:ff:ff:ff
    inet 10.68.0.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::a873:deff:fed6:559/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever
6: flannel-v6.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1430 qdisc noqueue state UNKNOWN group default
    link/ether b2:e9:8b:31:87:c8 brd ff:ff:ff:ff:ff:ff
    inet6 fc00:10:68::/128 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::b0e9:8bff:fe31:87c8/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever

Fedora 41
kernel 6.11.8-300.fc41
kubelet 1.31.3-150500.1.1
cri-o 1.31.2-150500.1.1

@f5-jay

f5-jay commented Nov 25, 2024

This might be a clue. I observed this error when running: journalctl -u crio -f

Nov 24 23:43:08 master.lab crio[26766]: time="2024-11-24 23:43:08.876402310-07:00" level=warning msg="Error validating CNI config file /etc/cni/net.d/10-flannel.conflist: [failed to find plugin "portmap" in path [/opt/cni/bin/]]"
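
That warning means CRI-O is only searching /opt/cni/bin/ for plugins. A quick way to see the mismatch on a node (a sketch; /usr/libexec/cni is the directory discussed below):

ls /opt/cni/bin        # only the flannel binary copied in by the init container
ls /usr/libexec/cni    # where Fedora's CNI plugins package installs bridge, portmap, etc.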

@f5-jay

f5-jay commented Nov 25, 2024

Ok, just figured it out. I had to add this to /etc/crio/crio.conf and restart the crio service on each node. Everything is working now.

[crio.network]
plugin_dir = "/usr/libexec/cni"

For some reason Fedora uses a different plugin directory, /usr/libexec/cni. I'm not sure of the background on why it's different on Fedora (and likely other Red Hat-based distros). Ubuntu uses /opt/cni/bin.
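
Roughly, on each node (a sketch; adjust if /etc/crio/crio.conf already contains a [crio.network] section):

cat <<'EOF' | sudo tee -a /etc/crio/crio.conf
[crio.network]
plugin_dir = "/usr/libexec/cni"
EOF
sudo systemctl restart crio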
