Skip to content

Commit

Permalink
Add health check including the storage network
Browse files Browse the repository at this point in the history
Signed-off-by: Jian Wang <[email protected]>
  • Loading branch information
w13915984028 committed Aug 30, 2024
1 parent 6d509ff commit 2423253
Showing 1 changed file with 89 additions and 0 deletions.
89 changes: 89 additions & 0 deletions kb/2024-07-22/harvester_cluster_shutdown_and_restart.md
Original file line number Diff line number Diff line change
Expand Up @@ -485,6 +485,95 @@ harv43 Ready control-plane,etcd,master 54d v1.27.10+rke2r1
```

#### Healthy Check

##### Basic Components

Harvester deploys some basic components on the following namespaces. When a bare-metal server is powered on, it may take upto around 15 minutes for the Harvester OS to be running and all the deployments on this node to be ready.

If any of them does not have a `Running` status and continues to fail, a troubleshooting is needed to confirm the root cause.

```
NAMESPACE NAME READY STATUS RESTARTS AGE
cattle-fleet-local-system fleet-agent-645766877f-bt424 1/1 Running 0 11m
cattle-fleet-system fleet-controller-57f78dcd48-5tkkj 1/1 Running 4 (14m ago) 42h
cattle-fleet-system gitjob-d5bb7b548-jscgk 1/1 Running 2 (14m ago) 42h
cattle-system harvester-cluster-repo-6c6458bd46-7jcrl 1/1 Running 2 (14m ago) 42h
cattle-system system-upgrade-controller-6f86d6d4df-f8jg7 1/1 Running 2 (14m ago) 42h
cattle-system rancher-7bc9d94b87-g4k4v 1/1 Running 3 (14m ago) 42h // note: if embedded Rancher was stopped in the above steps, it is not Running now
cattle-system rancher-webhook-6c5c6fbb65-2cbbs 1/1 Running 2 (14m ago) 42h
harvester-system harvester-787b467f4-qlfwt 1/1 Running 2 (14m ago) 39h
harvester-system harvester-load-balancer-56d9c8758c-cvcmk 1/1 Running 2 (14m ago) 42h
harvester-system harvester-load-balancer-webhook-6b4d4d9d6b-4tsgl 1/1 Running 2 (14m ago) 42h
harvester-system harvester-network-controller-9pzxh 1/1 Running 2 (14m ago) 42h
harvester-system harvester-network-controller-manager-69bcf67c7f-44zqj 1/1 Running 2 (14m ago) 42h
harvester-system harvester-network-webhook-6c5d48bdf5-8kn9r 1/1 Running 2 (14m ago) 42h
harvester-system harvester-node-disk-manager-c4c5k 1/1 Running 3 (14m ago) 42h
harvester-system harvester-node-manager-qbvbr 1/1 Running 2 (14m ago) 42h
harvester-system harvester-node-manager-webhook-6d8b48f559-m5shk 1/1 Running 2 (14m ago) 42h
harvester-system harvester-webhook-87dc4cdd8-jg2q6 1/1 Running 2 (14m ago) 39h
harvester-system kube-vip-n4s8l 1/1 Running 3 (14m ago) 42h
harvester-system virt-api-799b99fb65-g8wgq 1/1 Running 2 (14m ago) 42h
harvester-system virt-controller-86b84c8f8f-4hhlg 1/1 Running 2 (14m ago) 42h
harvester-system virt-controller-86b84c8f8f-krq4f 1/1 Running 3 (14m ago) 42h
harvester-system virt-handler-j9gwn 1/1 Running 2 (14m ago) 42h
harvester-system virt-operator-7585847fbc-hvs26 1/1 Running 2 (14m ago) 42h
kube-system cloud-controller-manager-harv41 1/1 Running 5 (14m ago) 42h
kube-system etcd-harv41 1/1 Running 2 42h
kube-system harvester-snapshot-validation-webhook-8594c5f8f8-8mk57 1/1 Running 2 (14m ago) 42h
kube-system harvester-snapshot-validation-webhook-8594c5f8f8-dkjmf 1/1 Running 2 (14m ago) 42h
kube-system harvester-whereabouts-cpqvl 1/1 Running 2 (14m ago) 42h
kube-system kube-apiserver-harv41 1/1 Running 2 42h
kube-system kube-controller-manager-harv41 1/1 Running 4 (14m ago) 42h
kube-system kube-proxy-harv41 1/1 Running 2 (14m ago) 42h
kube-system kube-scheduler-harv41 1/1 Running 2 (14m ago) 42h
kube-system rke2-canal-d5kmc 2/2 Running 4 (14m ago) 42h
kube-system rke2-coredns-rke2-coredns-84b9cb946c-qbwnb 1/1 Running 2 (14m ago) 42h
kube-system rke2-coredns-rke2-coredns-autoscaler-b49765765-6bjsk 1/1 Running 2 (14m ago) 42h
kube-system rke2-ingress-nginx-controller-cphgw 1/1 Running 2 (14m ago) 42h
kube-system rke2-metrics-server-655477f655-gsnsc 1/1 Running 2 (14m ago) 42h
kube-system rke2-multus-8nqg4 1/1 Running 2 (14m ago) 42h
kube-system snapshot-controller-5fb6d65787-nmjdh 1/1 Running 2 (14m ago) 42h
kube-system snapshot-controller-5fb6d65787-phvq7 1/1 Running 3 (14m ago) 42h
longhorn-system backing-image-manager-5c32-ea70 1/1 Running 0 13m
longhorn-system csi-attacher-749459cf65-2x792 1/1 Running 6 (13m ago) 42h
longhorn-system csi-attacher-749459cf65-98tj4 1/1 Running 5 (13m ago) 42h
longhorn-system csi-attacher-749459cf65-nwglq 1/1 Running 5 (13m ago) 42h
longhorn-system csi-provisioner-775b4f76f4-h9mwd 1/1 Running 5 (13m ago) 42h
longhorn-system csi-provisioner-775b4f76f4-nvjzt 1/1 Running 5 (13m ago) 42h
longhorn-system csi-provisioner-775b4f76f4-zvd6w 1/1 Running 5 (13m ago) 42h
longhorn-system csi-resizer-68867d54f5-4hf5j 1/1 Running 5 (13m ago) 42h
longhorn-system csi-resizer-68867d54f5-fs9ht 1/1 Running 5 (13m ago) 42h
longhorn-system csi-resizer-68867d54f5-ht5hj 1/1 Running 6 (13m ago) 42h
longhorn-system csi-snapshotter-8469656cc7-6c47f 1/1 Running 6 (13m ago) 42h
longhorn-system csi-snapshotter-8469656cc7-9kk2v 1/1 Running 5 (13m ago) 42h
longhorn-system csi-snapshotter-8469656cc7-vf9z4 1/1 Running 5 (13m ago) 42h
longhorn-system engine-image-ei-94d5ee6c-pqx9h 1/1 Running 2 (14m ago) 42h
longhorn-system instance-manager-beb75434e263a2aa9eedc0609862fed2 1/1 Running 0 13m
longhorn-system longhorn-csi-plugin-85qm7 3/3 Running 14 (13m ago) 42h
longhorn-system longhorn-driver-deployer-6448498bc6-sv857 1/1 Running 2 (14m ago) 42h
longhorn-system longhorn-loop-device-cleaner-bqg9v 1/1 Running 2 (14m ago) 42h
longhorn-system longhorn-manager-nhxbl 2/2 Running 6 (14m ago) 42h
longhorn-system longhorn-ui-7f56fcf5ff-clc8b 1/1 Running 6 (13m ago) 42h
longhorn-system longhorn-ui-7f56fcf5ff-m95sh 1/1 Running 7 (13m ago) 42h
```

:::note

If Longhorn PODs continue to fail to start due to any reasons, do not execute the following steps as many of them rely on the Longhorn to provision persistant volumes for running.

:::

##### Storage Network

When the [Storage Network](https://docs.harvesterhci.io/v1.3/advanced/storagenetwork) has been enabled on the cluster, follow [those steps] (https://docs.harvesterhci.io/v1.3/advanced/storagenetwork#verify-configuration-is-completed) to check if the Longhorn PODs have the correct second IP assigned to them.

### 5.3 Enable Addons

Enable those previously disabled addons, wait until they are `DepoloySuccessful`.
Expand Down

0 comments on commit 2423253

Please sign in to comment.