Description
Problem
Every distinct image digest gets its own fully-flattened rootfs directory in the cache (~/.cache/broodbox/images/sha256-*/). The claude-code, codex, and opencode images all share the same wolfi-base layer (~500MB+), but each stores a complete copy. With 3 agent images cached, the shared base content is duplicated 3x on disk.
Additionally, when an image is rebuilt with only a small top-layer change (e.g., updating Claude Code), the entire flattened rootfs is re-extracted and stored as a new entry — even though 95%+ of the content is identical layers already in the layer cache.
Proposal
Instead of flattening all layers into a single rootfs directory at pull time, compose the rootfs at boot time from individually-cached layers. go-microvm already has a LayerCache that stores extracted layers by DiffID — the building blocks are there.
Approach options
Option A — Overlayfs composition (Linux only):
Mount the cached layers as an overlayfs lowerdir stack with a tmpfs upperdir. The VM gets a composed view without any copying. Fastest boot, lowest disk usage, but requires overlayfs support and appropriate privileges.
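A minimal sketch of how the mount options for Option A could be built. The function name `overlayOptions` and the example paths are hypothetical and not part of go-microvm. Overlayfs lists `lowerdir` entries top-most first, so layers held in bottom-to-top order have to be reversed. The sketch only builds the option string; it does not perform the mount or escape `:` and `,` in paths.

```go
package main

import (
	"fmt"
	"strings"
)

// overlayOptions builds the data string for an overlayfs mount.
// layers is ordered bottom-to-top (base layer first), as in an image manifest.
// overlayfs expects lowerdir entries top-most first, so the order is reversed.
// upper and work must be directories on the same filesystem (for example one tmpfs).
func overlayOptions(layers []string, upper, work string) string {
	lower := make([]string, len(layers))
	for i, l := range layers {
		lower[len(layers)-1-i] = l
	}
	return "lowerdir=" + strings.Join(lower, ":") +
		",upperdir=" + upper +
		",workdir=" + work
}

func main() {
	// Hypothetical layer directories, base layer first.
	layers := []string{"/cache/layers/base", "/cache/layers/agent"}
	fmt.Println(overlayOptions(layers, "/run/vm/upper", "/run/vm/work"))
}
```

The resulting string would be passed as the data argument of a `mount(2)` call with filesystem type `overlay`, which is where the privilege requirement comes from.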
Option B — Reflink-based composition:
At boot time, compose layers bottom-to-top into rootfs-work/ using reflinks (FICLONE). Similar to today's CloneDir of the flattened rootfs, but starting from individual cached layers. Disk usage is reduced because the layer cache is shared, even though the composed result is a full directory tree.
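As a rough illustration of the per-file step in Option B, the sketch below tries a FICLONE reflink and falls back to a plain copy when the filesystem does not support it. It is Linux-only, uses the raw ioctl through the standard `syscall` package, and the names `cloneFile` and `demo` are hypothetical. It is not how go-microvm's `CloneDir` is implemented.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
	"syscall"
)

// ficlone is the Linux FICLONE ioctl request number, _IOW(0x94, 9, int).
// This value applies to amd64 and arm64; a few architectures encode it differently.
const ficlone = 0x40049409

// cloneFile makes dst share src's data blocks via a reflink when possible.
// If the ioctl fails (ext4, tmpfs, cross-device), it falls back to a byte copy.
// The returned bool reports whether the reflink path was taken.
func cloneFile(src, dst string) (bool, error) {
	in, err := os.Open(src)
	if err != nil {
		return false, err
	}
	defer in.Close()
	info, err := in.Stat()
	if err != nil {
		return false, err
	}
	out, err := os.OpenFile(dst, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, info.Mode().Perm())
	if err != nil {
		return false, err
	}
	if _, _, errno := syscall.Syscall(syscall.SYS_IOCTL, out.Fd(), ficlone, in.Fd()); errno == 0 {
		return true, out.Close()
	}
	if _, err := io.Copy(out, in); err != nil {
		out.Close()
		return false, err
	}
	return false, out.Close()
}

// demo clones one file inside a temp directory and returns the copy's content.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "reflink-demo-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	src, dst := filepath.Join(dir, "layer-file"), filepath.Join(dir, "rootfs-file")
	if err := os.WriteFile(src, []byte("layer content"), 0o644); err != nil {
		return "", err
	}
	if _, err := cloneFile(src, dst); err != nil {
		return "", err
	}
	b, err := os.ReadFile(dst)
	return string(b), err
}

func main() {
	got, err := demo()
	fmt.Println(got, err)
}
```

A full composer would walk each cached layer bottom-to-top and apply this per file, with whiteouts handled the way the existing layered extraction does.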
Option C — Lazy composition with hardlinks:
Hardlink files from cached layers into the composed rootfs. Faster than copying, same disk blocks. Caveat: rootfs hooks that modify files in-place would modify the cached layer too — need COW semantics or hook-awareness.
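The caveat in Option C can be shown with a small sketch. An in-place write through a hardlink changes the cached layer file as well, while writing a temporary file and renaming it over the path leaves the cached inode untouched. The helper names `replaceFile` and `demo` are hypothetical, and this is only one possible way for a hook to get copy-on-write behaviour.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// replaceFile writes data to a temp file next to path and renames it over path.
// The rename swaps the directory entry, so other hardlinks keep the old inode.
func replaceFile(path string, data []byte) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), ".tmp-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // fails harmlessly once the rename has succeeded
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// demo returns the cached file's content after an in-place write through the
// hardlink, and again after a write-and-rename through the hardlink.
func demo() (afterInPlace, afterReplace string, err error) {
	dir, err := os.MkdirTemp("", "hardlink-demo-")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)
	cached, rootfs := filepath.Join(dir, "cached"), filepath.Join(dir, "rootfs")
	if err = os.WriteFile(cached, []byte("v1"), 0o644); err != nil {
		return "", "", err
	}
	if err = os.Link(cached, rootfs); err != nil {
		return "", "", err
	}
	// In-place write: same inode, so the cached layer file changes too.
	if err = os.WriteFile(rootfs, []byte("v2"), 0o644); err != nil {
		return "", "", err
	}
	b, err := os.ReadFile(cached)
	if err != nil {
		return "", "", err
	}
	afterInPlace = string(b)
	// Write-and-rename: rootfs gets a new inode, the cached file is untouched.
	if err = replaceFile(rootfs, []byte("v3")); err != nil {
		return "", "", err
	}
	b, err = os.ReadFile(cached)
	return afterInPlace, string(b), err
}

func main() {
	a, b, err := demo()
	fmt.Println(a, b, err) // prints "v2 v2 <nil>"
}
```

The first value shows the cache being corrupted by the in-place write (`v1` became `v2`); the second shows that the rename-based write did not propagate `v3` into the cache.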
Expected savings
| Scenario | Current | With layer dedup |
|---|---|---|
| 3 agent images (shared base) | ~2.4 GB | ~1.0 GB (base stored once) |
| Image rebuild (top layer only) | +800 MB new entry | +50 MB new top layer |
Where this lives
This is primarily a go-microvm change:
- `image.Cache` or a new `rootfs.Composer` would handle layer-to-rootfs composition
- `microvm.Run()` would compose from layers instead of cloning the flattened cache entry
- The flattened rootfs cache entries (`sha256-*`) could become optional or be replaced entirely by the layer-only cache
Notes
- The layer cache (`~/.cache/broodbox/images/layers/`) already works and is populated during layered extraction
- Layer ordering and whiteout handling during composition already exist in `applyLayerToDir()` in go-microvm's `image/pull.go`
- This pairs well with layer-aware GC: only layers referenced by at least one cached image config need to be retained
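The layer-aware GC rule from the last note could be sketched as a simple mark-and-sweep over DiffIDs. The function name, the map shape, and the example IDs below are assumptions for illustration, not existing go-microvm types.

```go
package main

import (
	"fmt"
	"sort"
)

// unreferencedLayers returns the cached DiffIDs that no image config references.
// images maps an image digest to the DiffIDs listed in its config.
// The result is sorted so that the output is deterministic.
func unreferencedLayers(cached []string, images map[string][]string) []string {
	// Mark: collect every DiffID referenced by at least one image.
	live := make(map[string]bool)
	for _, diffIDs := range images {
		for _, id := range diffIDs {
			live[id] = true
		}
	}
	// Sweep: anything cached but not marked can be removed.
	var dead []string
	for _, id := range cached {
		if !live[id] {
			dead = append(dead, id)
		}
	}
	sort.Strings(dead)
	return dead
}

func main() {
	// Hypothetical cache state: two images share the base layer,
	// and "old-top" is left over from a rebuilt image.
	cached := []string{"base", "claude-top", "codex-top", "old-top"}
	images := map[string][]string{
		"img-claude": {"base", "claude-top"},
		"img-codex":  {"base", "codex-top"},
	}
	fmt.Println(unreferencedLayers(cached, images)) // prints "[old-top]"
}
```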