Compose rootfs from cached layers at boot instead of storing flattened copies #85

@JAORMX

Description

Problem

Every distinct image digest gets its own fully flattened rootfs directory in the cache (~/.cache/broodbox/images/sha256-*/). The claude-code, codex, and opencode images all share the same wolfi-base layer (~500MB+), but each cache entry stores a complete copy of it. With all three agent images cached, the shared base content is stored three times on disk.

Additionally, when an image is rebuilt with only a small top-layer change (e.g., updating Claude Code), the entire flattened rootfs is re-extracted and stored as a new entry, even though 95%+ of its content comes from layers that are already in the layer cache.

Proposal

Instead of flattening all layers into a single rootfs directory at pull time, compose the rootfs at boot time from individually cached layers. go-microvm already has a LayerCache that stores extracted layers by DiffID, so the building blocks are already in place.
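Composing from layers means replaying them bottom to top while honoring OCI whiteouts: a `.wh.<name>` entry deletes `<name>` from lower layers, and a `.wh..wh..opq` entry makes its directory opaque, hiding everything lower layers put there. go-microvm's applyLayerToDir() already handles this on disk; the sketch below is not that code. It is a minimal in-memory model of the same rules, assuming a hypothetical `Layer` type (path to content, directories implicit) and a hypothetical `compose` function.

```go
package main

import (
	"fmt"
	"path"
	"sort"
	"strings"
)

const (
	whiteoutPrefix = ".wh."          // ".wh.<name>" deletes <name> from lower layers
	opaqueMarker   = ".wh..wh..opq" // hides all lower-layer entries in its directory
)

// Layer is a hypothetical in-memory model of one extracted layer:
// relative file path -> content. Directories are implicit.
type Layer map[string]string

// compose replays layers bottom to top and returns the merged file set.
func compose(layers []Layer) map[string]string {
	result := make(map[string]string)
	for _, layer := range layers {
		// Whiteouts only affect content from lower layers, so apply them
		// before adding this layer's own entries.
		for p := range layer {
			dir, base := path.Split(p)
			switch {
			case base == opaqueMarker: // must be checked before the generic prefix
				removePrefix(result, dir)
			case strings.HasPrefix(base, whiteoutPrefix):
				target := dir + strings.TrimPrefix(base, whiteoutPrefix)
				delete(result, target)
				removePrefix(result, target+"/") // a whited-out directory takes its subtree
			}
		}
		for p, content := range layer {
			if !strings.HasPrefix(path.Base(p), whiteoutPrefix) {
				result[p] = content
			}
		}
	}
	return result
}

// removePrefix deletes every path starting with prefix ("" matches all).
func removePrefix(m map[string]string, prefix string) {
	for p := range m {
		if strings.HasPrefix(p, prefix) {
			delete(m, p)
		}
	}
}

func main() {
	layers := []Layer{
		{"etc/os-release": "base", "usr/bin/tool": "v1", "opt/a": "1"},
		{"usr/bin/.wh.tool": "", "opt/.wh..wh..opq": "", "opt/b": "2"},
	}
	merged := compose(layers)
	paths := make([]string, 0, len(merged))
	for p := range merged {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	fmt.Println(paths) // [etc/os-release opt/b]
}
```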

Approach options

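Option A: Overlayfs composition (Linux only)
Mount the cached layers as an overlayfs lowerdir stack, with the upperdir (and the workdir overlayfs also requires) on a tmpfs. The VM gets a composed view without any copying. This gives the fastest boot and lowest disk usage, but requires overlayfs support and the privileges to mount it (CAP_SYS_ADMIN, or an unprivileged user namespace on Linux 5.11+).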
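Two details of Option A are easy to get wrong: overlayfs lists `lowerdir` entries top layer first (the reverse of image order), and the legacy mount(2) call copies at most one page of option data, which caps how many layer paths fit. A minimal sketch, assuming a hypothetical `overlayOptions` helper and illustrative paths:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// overlayOptions builds the data string for an overlayfs mount (hypothetical
// helper). layerDirs is ordered bottom to top (image order); overlayfs wants
// lowerdir entries top first, so the order is reversed. upper and work must
// be on the same filesystem (e.g., one tmpfs).
func overlayOptions(layerDirs []string, upper, work string) (string, error) {
	if len(layerDirs) == 0 {
		return "", fmt.Errorf("overlayfs needs at least one lower layer")
	}
	all := append(append([]string{}, layerDirs...), upper, work)
	for _, d := range all {
		// ':' and ',' are separators in the option string; this sketch
		// rejects such paths instead of escaping them.
		if strings.ContainsAny(d, ":,") {
			return "", fmt.Errorf("path %q contains ':' or ','", d)
		}
	}
	lowers := make([]string, len(layerDirs))
	for i, d := range layerDirs {
		lowers[len(layerDirs)-1-i] = d
	}
	opts := fmt.Sprintf("lowerdir=%s,upperdir=%s,workdir=%s",
		strings.Join(lowers, ":"), upper, work)
	// Legacy mount(2) copies at most one page of option data (incl. NUL).
	if len(opts)+1 > os.Getpagesize() {
		return "", fmt.Errorf("mount options are %d bytes; too many layers", len(opts))
	}
	return opts, nil
}

func main() {
	opts, err := overlayOptions(
		[]string{"/cache/layers/base", "/cache/layers/agent"},
		"/run/vm/upper", "/run/vm/work")
	if err != nil {
		panic(err)
	}
	fmt.Println(opts)
	// On Linux, with sufficient privileges, the mount itself would be:
	//   syscall.Mount("overlay", "/run/vm/rootfs", "overlay", 0, opts)
}
```

Rejecting rather than escaping separator characters keeps the sketch short; cache paths derived from DiffIDs should not contain them anyway.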

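Option B: Reflink-based composition
At boot time, compose the layers bottom-to-top into rootfs-work/ using reflinks (FICLONE). This is similar to today's CloneDir of the flattened rootfs, but starts from the individual cached layers. Disk usage drops because the layer cache is shared across images; the composed result is still a full directory tree, but reflinked files share data blocks with the cached layers, so it costs mostly metadata. Reflinks need a filesystem that supports them (e.g., Btrfs or XFS); on other filesystems composition would have to fall back to a full copy.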

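Option C: Lazy composition with hardlinks
Hardlink files from the cached layers into the composed rootfs. This is faster than copying and uses the same disk blocks. Caveat: a hardlink shares the inode, so rootfs hooks that modify a file (or its permissions or ownership) in place would modify the cached layer too. This needs copy-on-write semantics (break the link before writing) or hook awareness. Hardlinks also require the layer cache and the composed rootfs to be on the same filesystem.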

Expected savings

| Scenario | Current | With layer dedup |
|---|---|---|
| 3 agent images (shared base) | ~2.4 GB | ~1.0 GB (base stored once) |
| Image rebuild (top layer only) | +800 MB (new entry) | +50 MB (new top layer) |
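The arithmetic behind the table: flattened storage pays for every image's full content, while a layer cache pays for each distinct DiffID once. A minimal sketch with placeholder sizes in arbitrary units (not measurements), assuming hypothetical `flattenedBytes` and `dedupBytes` helpers. Summing layer sizes slightly overstates the flattened cost, since files overwritten by upper layers appear only once in a flattened tree.

```go
package main

import "fmt"

// LayerRef identifies a layer by DiffID together with its extracted size.
type LayerRef struct {
	DiffID string
	Bytes  int64
}

// flattenedBytes approximates today's cost: each cached image stores its own
// copy of the content of all its layers (an upper bound, see above).
func flattenedBytes(images map[string][]LayerRef) int64 {
	var total int64
	for _, layers := range images {
		for _, l := range layers {
			total += l.Bytes
		}
	}
	return total
}

// dedupBytes approximates the cost with a shared layer cache: each distinct
// DiffID is stored once, however many images reference it.
func dedupBytes(images map[string][]LayerRef) int64 {
	seen := make(map[string]bool)
	var total int64
	for _, layers := range images {
		for _, l := range layers {
			if !seen[l.DiffID] {
				seen[l.DiffID] = true
				total += l.Bytes
			}
		}
	}
	return total
}

func main() {
	// Placeholder digests and sizes, for illustration only.
	base := LayerRef{"sha256:base", 5}
	images := map[string][]LayerRef{
		"claude-code": {base, {"sha256:claude", 1}},
		"codex":       {base, {"sha256:codex", 1}},
		"opencode":    {base, {"sha256:opencode", 1}},
	}
	fmt.Println("flattened:", flattenedBytes(images), "dedup:", dedupBytes(images))
	// flattened: 18 dedup: 8
}
```

Adding a rebuilt image that shares the base grows the flattened total by the whole image but the deduplicated total only by its new top layer, which is the second row of the table.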

Where this lives

This is primarily a go-microvm change:

  • image.Cache or a new rootfs.Composer would handle layer-to-rootfs composition
  • microvm.Run() would compose from layers instead of cloning the flattened cache entry
  • The flattened rootfs cache entries (sha256-*) could become optional or be replaced entirely by the layer-only cache

Notes

  • The layer cache (~/.cache/broodbox/images/layers/) already works and is populated during layered extraction
  • Layer ordering and whiteout handling during composition already exists in applyLayerToDir() in go-microvm's image/pull.go
  • This pairs well with layer-aware GC: only layers referenced by at least one cached image config need to be retained
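The layer-aware GC from the last note is a mark-and-sweep: mark every DiffID listed in a retained image config, then sweep the cached layers that were never marked. A minimal sketch with a hypothetical `unreferencedLayers` helper and placeholder digests:

```go
package main

import (
	"fmt"
	"sort"
)

// unreferencedLayers marks every DiffID referenced by a retained image config
// (imageDiffIDs maps image name -> rootfs.diff_ids) and returns the cached
// DiffIDs that were never marked, sorted for deterministic output.
func unreferencedLayers(cached []string, imageDiffIDs map[string][]string) []string {
	live := make(map[string]bool)
	for _, ids := range imageDiffIDs {
		for _, id := range ids {
			live[id] = true
		}
	}
	var dead []string
	for _, id := range cached {
		if !live[id] {
			dead = append(dead, id)
		}
	}
	sort.Strings(dead)
	return dead
}

func main() {
	cached := []string{"sha256:base", "sha256:claude-v1", "sha256:claude-v2"}
	images := map[string][]string{
		"claude-code": {"sha256:base", "sha256:claude-v2"},
	}
	fmt.Println(unreferencedLayers(cached, images)) // [sha256:claude-v1]
}
```

A real GC would also need to guard against deleting a layer that a concurrent pull or boot is about to reference, e.g., by holding a cache lock during the sweep.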

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
