Python and Shell scripts packaged as Docker images to clean up disks after an ODF (OpenShift Data Foundation) uninstall.
An OpenShift Data Foundation (ODF) installation can fail with one or more nodes in the cluster never entering the `Ready` state, while the associated Ceph OSD and MON pods get stuck in `CrashLoopBackOff`. The suspected root cause is improper or incomplete cleanup of the node's local disks, which breaks the Ceph OSD/MON pods and keeps the node from becoming ready.

The following steps clean up the node and re-add it to the cluster:
- Remove the node from the Local Storage Operator:
  - Edit the `LocalVolumeDiscovery` and `LocalVolumeSet` CRs to remove the affected node (a hypothetical patch sketch follows)
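  For example, a minimal sketch using `oc patch`. The CR name (`local-block`), the namespace, and the JSON-patch array index are assumptions; inspect your CRs first and adjust the path to their actual layout (repeat for the `LocalVolumeDiscovery` CR):

  ```sh
  # Inspect the CRs to find where the node's hostname is listed
  oc -n openshift-local-storage get localvolumeset,localvolumediscovery -o yaml

  # Drop the node's hostname from the LocalVolumeSet node selector
  # (assumes it is the second entry, index 1, of the values array)
  oc -n openshift-local-storage patch localvolumeset local-block --type json \
    -p '[{"op":"remove","path":"/spec/nodeSelector/nodeSelectorTerms/0/matchExpressions/0/values/1"}]'
  ```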
- Cordon and drain the node:

  ```sh
  oc adm cordon <node-name>
  oc adm drain <node-name> --ignore-daemonsets --delete-emptydir-data
  ```
- Remove `PersistentVolume` objects related to the node that back storage in the `openshift-storage` namespace:

  ```sh
  oc get pv | grep <node-name>
  oc delete pv <pv-name>
  ```
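  To remove several PVs at once, a hypothetical convenience loop (assumes the node's hostname appears in the `oc get pv` listing):

  ```sh
  for pv in $(oc get pv --no-headers | grep "<node-name>" | awk '{print $1}'); do
    oc delete pv "$pv"
  done
  ```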
- Perform the manual disk cleanup on the node by running the scripts:
  - Authenticate against the OpenShift cluster
  - Start a debug pod:

    ```sh
    oc debug node/<node>
    ```

  - Change root to the host to access all of its binaries and files:

    ```sh
    chroot /host
    ```

  - Run the script as a container:

    ```sh
    podman run ghcr.io/stakater/odf-disk-cleaner:vX.Y.Z --disks "/path/to/disk1 /path/to/disk2 ..."
    ```
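  For context, a minimal sketch of the kind of per-disk cleanup such an image typically performs. This illustrates standard Ceph disk-teardown steps and is not the published script; device paths are placeholders:

  ```sh
  #!/bin/sh
  # Hypothetical cleanup sketch - pass disks as arguments, e.g.
  #   ./cleanup.sh /dev/sdb /dev/sdc
  set -eu

  # Tear down leftover ceph-volume device-mapper entries first
  dmsetup ls | awk '/ceph--/ {print $1}' | xargs -r -n1 dmsetup remove

  for disk in "$@"; do
    wipefs --all "$disk"     # drop filesystem/LVM signatures
    sgdisk --zap-all "$disk" # destroy GPT and MBR partition tables
    # zero the first 100 MiB to clear BlueStore labels and other metadata
    dd if=/dev/zero of="$disk" bs=1M count=100 oflag=direct,dsync
  done
  ```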
- Reboot the node:

  ```sh
  sudo reboot
  ```
- Re-add the node to the Local Storage Operator:
  - Revert the PR or update the `LocalVolumeDiscovery` and `LocalVolumeSet` CRs to add the node back (see the sketch below)
  - This will cause discovery pods to start running on the node again
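  For example, mirroring the earlier removal sketch (the CR name and selector layout are again assumptions):

  ```sh
  # Append the node's hostname back onto the LocalVolumeSet node selector
  oc -n openshift-local-storage patch localvolumeset local-block --type json \
    -p '[{"op":"add","path":"/spec/nodeSelector/nodeSelectorTerms/0/matchExpressions/0/values/-","value":"<node-name>"}]'
  ```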
- Delete all pods in the `openshift-storage` namespace to force recreation:

  ```sh
  oc delete pod --all -n openshift-storage
  ```
- Verify ODF health:
  - Check Ceph cluster health:

    ```sh
    oc get cephcluster -n openshift-storage
    ```

  - Confirm all pods are running and the node is back in the `Ready` state:

    ```sh
    oc get nodes
    oc get pods -n openshift-storage
    ```
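  A few optional extra checks (the `oc wait` timeout is arbitrary, and the last command assumes the Rook toolbox deployment `rook-ceph-tools` is enabled):

  ```sh
  # Wait for the node to report Ready
  oc wait --for=condition=Ready node/<node-name> --timeout=10m

  # If the Rook toolbox is deployed, query Ceph directly
  oc -n openshift-storage rsh deploy/rook-ceph-tools ceph status
  ```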
Expected outcome:

- Node successfully rejoined the cluster
- All OSD and MON pods are stabilized
- ODF cluster health returned to healthy