Clean up orphaned rootfs-work dirs from crashed VMs #87

@JAORMX

Problem

When a VM crashes or the parent bbox process is killed (SIGKILL, OOM, etc.), WithCleanDataDir() never runs. The COW-cloned rootfs at ~/.config/broodbox/vms/<name>/data/rootfs-work/ is left on disk, potentially ~800MB per orphaned VM.

Brood-box already has stale cleanup for two similar cases:

  • infravm.CleanupStaleLogs() removes old VM log directories using PID sentinel files
  • infraws.CleanupStaleSnapshots() removes old workspace snapshots

But orphaned rootfs-work/ dirs inside VM data directories are not covered.

Proposal

Extend the existing stale cleanup to also handle orphaned VM data directories (which contain the rootfs-work/ clone).

Approach

go-microvm already persists VM state in <dataDir>/state.json with the runner PID and an active flag. The cleanup logic should:

  1. On startup, scan ~/.config/broodbox/vms/*/data/state.json
  2. For each entry where active: true, check if the PID is still alive
  3. If the PID is dead, the VM was orphaned — remove the entire data directory (including rootfs-work/)

go-microvm's terminateStaleRunner() already does steps 1-2 in order to kill orphaned runner processes. The data dir cleanup is the missing step 3: cleanDataDir handles it for the current VM's data dir, but not for data dirs left behind by other crashed VMs.

Where this lives

One possible layout:

  • A new CleanupStaleVMData() function in internal/infra/vm/ alongside the existing cleanup helpers
  • Called from the composition root (cmd/bbox/main.go) on startup, next to the existing CleanupStaleLogs and CleanupStaleSnapshots calls

Notes

  • The PID-liveness check is already battle-tested in the existing stale cleanup code
  • This is safe for concurrent VMs: each VM has its own named data directory, and we only clean dirs whose PID is confirmed dead
  • This pairs with the image cache GC work: cleaning both the rootfs cache and orphaned rootfs clones covers both sides of the disk waste

Labels: enhancement
