Problem
When a VM crashes or the parent bbox process is killed (SIGKILL, OOM, etc.), WithCleanDataDir() never runs. The COW-cloned rootfs at ~/.config/broodbox/vms/<name>/data/rootfs-work/ survives — potentially ~800MB per orphaned VM.
Brood-box already has stale cleanup for two similar cases:
- infravm.CleanupStaleLogs() removes old VM log directories using PID sentinel files
- infraws.CleanupStaleSnapshots() removes old workspace snapshots
But orphaned rootfs-work/ dirs inside VM data directories are not covered.
Proposal
Extend the existing stale cleanup to also handle orphaned VM data directories (which contain the rootfs-work/ clone).
Approach
go-microvm already persists VM state in <dataDir>/state.json with the runner PID and an active flag. The cleanup logic should:
- On startup, scan ~/.config/broodbox/vms/*/data/state.json
- For each entry where active: true, check if the PID is still alive
- If the PID is dead, the VM was orphaned: remove the entire data directory (including rootfs-work/)
go-microvm's terminateStaleRunner() already performs the first two steps when killing orphaned processes. Removing the data directory is the missing third step: cleanDataDir handles it for the current VM's data dir, but not for data dirs left behind by other crashed VMs.
Where this lives
This could be:
- A new CleanupStaleVMData() function in internal/infra/vm/, alongside the existing cleanup helpers
- Called from the composition root (cmd/bbox/main.go) on startup, next to the existing CleanupStaleLogs and CleanupStaleSnapshots calls
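The wiring at the call site might then look like the fragment below. The existing helper names come from the issue; their signatures are not shown there, so the arguments are elided, and the vmsDir variable is a placeholder for however the composition root resolves ~/.config/broodbox/vms.

```go
// cmd/bbox/main.go (sketch; arguments and error handling are assumptions)
infravm.CleanupStaleLogs(/* ... */)      // existing: stale VM log dirs
infraws.CleanupStaleSnapshots(/* ... */) // existing: stale workspace snapshots
infravm.CleanupStaleVMData(vmsDir)       // new: orphaned VM data dirs (rootfs-work/)
```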
Notes
- The PID-liveness check is already battle-tested in the existing stale cleanup code
- This is safe for concurrent VMs: each VM has its own named data directory, and we only clean dirs whose PID is confirmed dead
- This pairs with the image cache GC work — cleaning both the rootfs cache and orphaned rootfs clones covers both sides of the disk waste