fix: remove ~48 transitive dependencies from asset packaging #1654
The zip tool used to package CDK file assets relied on `archiver`, which pulls ~50 transitive packages into the CLI and toolkit libraries. Swap it for `yazl`, a focused write-only ZIP library, reducing that to 2 packages (`yazl` + `buffer-crc32`). The change is behaviour-preserving: archives remain byte-for-byte deterministic (entry dates pinned to the 1980 epoch), Unix file modes are preserved, symlinks are followed, and output is still streamed to disk. Both libraries delegate compression to Node's native zlib, and `yazl` adds less pipeline overhead on top of it, so asset packaging is slightly faster (8-11% in local benchmarks) at comparable memory. Existing zip unit tests pass unchanged.
Benchmark Report: Replacing
| Library | Transitive deps | Compressor | Streaming | Mode support | Notes |
|---|---|---|---|---|---|
| archiver (current) | ~50 | native zlib | ✅ | mode option | general-purpose, heavy dep tree |
| yazl | 2 (buffer-crc32) | native zlib | ✅ (outputStream) | native mode option | write-only, mature/stable, MIT |
| fflate | 1 (zero deps) | pure-JS | ✅ (streaming API) | attrs: mode<<16 + os:3 | tiny, modern, MIT, but JS compressor |
| jszip | several | pure-JS (pako) | partial | unixPermissions | already a devDep; heavier |
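The `mode<<16` and `os:3` entries in the Mode support column reflect the general ZIP convention: when an entry's "made by" host is 3 (Unix), the upper 16 bits of its external file attributes carry the Unix mode. A minimal sketch of that packing follows; the helper names are hypothetical and not taken from any of the libraries above.

```typescript
// Host system code for Unix in the ZIP "version made by" field.
const OS_UNIX = 3;

// Pack a Unix mode (file type + permission bits) into the upper 16 bits
// of the 32-bit ZIP external file attributes field.
// `>>> 0` keeps the result an unsigned 32-bit number.
function unixExternalAttrs(mode: number): number {
  return ((mode & 0xffff) << 16) >>> 0;
}

// Recover the Unix mode from external file attributes written as above.
function modeFromExternalAttrs(attrs: number): number {
  return (attrs >>> 16) & 0xffff;
}

// True if any of the owner/group/other execute bits is set.
function isExecutable(mode: number): boolean {
  return (mode & 0o111) !== 0;
}

// Example: a regular file with rwxr-xr-x permissions.
const attrs = unixExternalAttrs(0o100755);
console.log(OS_UNIX, attrs.toString(16), isExecutable(modeFromExternalAttrs(attrs)));
```

A library with a native `mode` option performs this packing internally; with a library that exposes raw attributes, the caller has to do it.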
Methodology
A self-contained harness (`packages/@aws-cdk/private-tools/bench/`, isolated `node_modules`, not part of the build) measures each implementation through an identical code path: `fast-glob` the directory, read each file, append serially with the epoch date + file mode, and write the zip.
- Fixtures (both ~24 MB, deterministic seeded content via `gen-fixture.mjs`):
  - mixed: 2009 files (2000 small source-like files + 8 × 2 MB incompressible blobs + 1 executable).
  - compressible: 6001 small source-like files (resembles a JS/Lambda bundle).
- Timing: `hyperfine --warmup 3 --runs 12` (wall-clock, end-to-end `node` process).
- Memory: `/usr/bin/time -l` peak resident set size (RSS).
- Correctness: `verify.mjs` loads each zip with `jszip` and asserts the same guarantees as the unit test: byte-identical content, a single unique date of `1980-01-01T00:00:00.000Z`, byte-identical output across two runs (determinism), and a preserved executable bit. All implementations passed.
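The pinned date of `1980-01-01T00:00:00.000Z` is the earliest timestamp the ZIP format can represent, because entry headers store time in the 16-bit MS-DOS date and time fields. The sketch below shows that encoding. The function name is hypothetical, and it reads the fields in UTC as a simplifying assumption; real writers may use local time.

```typescript
// MS-DOS date/time fields as stored in ZIP entry headers.
interface DosDateTime {
  date: number; // bits 15-9: year - 1980, bits 8-5: month (1-12), bits 4-0: day
  time: number; // bits 15-11: hour, bits 10-5: minute, bits 4-0: seconds / 2
}

// Assumption: fields are read in UTC; this is an illustration, not a library API.
function toDosDateTime(d: Date): DosDateTime {
  const year = d.getUTCFullYear();
  // The year field is 7 bits wide, so only 1980..2107 can be stored.
  if (year < 1980 || year > 2107) {
    throw new RangeError(`year ${year} is outside the DOS range 1980-2107`);
  }
  const date = ((year - 1980) << 9) | ((d.getUTCMonth() + 1) << 5) | d.getUTCDate();
  // Seconds are stored with two-second resolution.
  const time = (d.getUTCHours() << 11) | (d.getUTCMinutes() << 5) | (d.getUTCSeconds() >> 1);
  return { date, time };
}

// The "1980 epoch" used to pin every entry's timestamp.
const ZIP_EPOCH = new Date("1980-01-01T00:00:00.000Z");
const epochFields = toDosDateTime(ZIP_EPOCH);
console.log(epochFields.date.toString(16), epochFields.time);
```

Because every entry gets the same encoded timestamp, the archive bytes depend only on file names, modes, and contents, which is what makes the two-run determinism check meaningful.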
Environment
| Component | Details |
|---|---|
| OS | macOS 26.5.1 (arm64) |
| CPU | Apple M4 Pro, 14 cores |
| Node | v24.14.1 |
| hyperfine | 1.19.0 |
Results
Wall-clock — mixed fixture (24 MB, 2009 files)
| Implementation | Mean | vs archiver |
|---|---|---|
| yazl | 722 ms | 0.92× (8% faster) |
| archiver (current) | 789 ms | 1.00× |
| fflate (async / workers) | 825 ms ± 227 ms | 1.05× (high variance) |
| fflate (zipSync, in-memory) | 904 ms | 1.15× |
| fflate (streaming) | 1.028 s | 1.30× |
Wall-clock — compressible fixture (24 MB, 6001 files)
| Implementation | Mean | vs archiver |
|---|---|---|
| yazl | 1.722 s | 0.89× (11% faster) |
| fflate (zipSync, in-memory) | 1.912 s | 0.98× |
| fflate (async / workers) | 1.934 s | 1.00× |
| archiver (current) | 1.942 s | 1.00× |
| fflate (streaming) | 2.136 s | 1.10× |
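The "vs archiver" column appears to be each implementation's mean divided by archiver's mean, with the percentage derived from the same ratio. A small sketch of that arithmetic, using a hypothetical helper name and the means reported above:

```typescript
// Result of comparing one mean wall-clock time against a baseline.
interface Relative {
  ratio: number;         // mean / baseline, rounded to two decimals
  percentFaster: number; // positive when faster than the baseline, rounded to whole percent
}

// Both arguments must use the same unit (milliseconds here).
function relativeToBaseline(meanMs: number, baselineMs: number): Relative {
  const raw = meanMs / baselineMs;
  return {
    ratio: Math.round(raw * 100) / 100,
    percentFaster: Math.round((1 - raw) * 100),
  };
}

// yazl vs archiver on the mixed fixture (722 ms vs 789 ms).
console.log(relativeToBaseline(722, 789));
// yazl vs archiver on the compressible fixture (1722 ms vs 1942 ms).
console.log(relativeToBaseline(1722, 1942));
```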
Peak memory (RSS) — mixed fixture
| Implementation | Peak RSS | Streaming? |
|---|---|---|
| archiver (current) | 133 MB | ✅ |
| yazl | 146 MB | ✅ |
| fflate (zipSync) | 162 MB | ❌ (buffers whole archive) |
| fflate (streaming) | 164 MB | ✅ |
| fflate (async / workers) | 312 MB | ❌ (workers + buffering) |
Analysis
Why yazl wins
yazl and archiver both delegate compression to Node's native zlib,
which runs on the libuv threadpool and overlaps compression with file I/O.
yazl is a focused, write-only library with far less pipeline overhead than the
general-purpose archiver, so it is consistently a little faster while using a
comparable amount of memory — and it carries 48 fewer transitive packages.
Why fflate was rejected (despite zero deps)
fflate is an excellent pure-JavaScript compressor, but that is exactly the
problem on Node:
- Single-threaded JS deflate can't overlap with I/O the way native zlib does, so wall-clock is slower than yazl in every mode.
- The streaming mode, the one we'd actually need to keep memory bounded, is the slowest variant and saves no memory (164 MB vs archiver's 133 MB).
- `zipSync` is faster but buffers the entire archive (and all inputs) in RAM.
- The async / multi-threaded mode (fflate's headline feature) spawns a worker per entry; on many small files (typical CDK assets) the overhead dominates, producing unstable wall-clock (±227 ms) and 2.4× the memory (312 MB). It only pays off for a handful of very large files.
fflate's real strengths — tiny bundle size and browser support — don't apply
to a Node-only CLI tool that already has native zlib available.
Implementation note: in fflate's streaming API, the per-entry `mtime`, `attrs`, and `os` must be assigned as properties on the `ZipDeflate` instance, not passed via the constructor options (the constructor options only reach the deflater). Passing them as options is silently ignored and the archive falls back to `Date.now()`, breaking determinism. This was found and fixed during testing.
Maintenance
| Package | Latest | Published | Weekly downloads |
|---|---|---|---|
| yazl | 3.3.1 | 2024-11-23 | ~3.0M |
| fflate | 0.8.3 | 2026-05-16 | ~52M |
yazl has a slower release cadence, but it is a small, feature-complete ZIP
writer; the ZIP format is stable, so "no recent release" reflects maturity
rather than abandonment. (The @indutny/yazl fork is also stale — 2024-02 — and
still depends on buffer-crc32, so it offers no advantage.)
Security
Authoritative advisory databases (OSV.dev and npm's GitHub Advisory DB) report
zero advisories for any version of either yazl or buffer-crc32.
A note on relevance: the well-known ZIP vulnerability classes (zip bombs,
malformed-header DoS, path traversal) live in readers/parsers that consume
untrusted archives — e.g. yauzl (the separate un-zip companion library) has
a DoS advisory. yazl is a writer fed our own files, so its attack surface
is structurally smaller. Dropping ~48 transitive packages also shrinks the
overall surface relative to archiver.
Decision
Adopt yazl. It satisfies every requirement and improves on the status quo:
- ✅ Fewer dependencies: ~50 → 2.
- ✅ Performance unchanged — in fact 8–11% faster wall-clock.
- ✅ Streaming / low-memory model preserved (comparable RSS).
- ✅ Functionally identical: deterministic, epoch dates, mode preserved.
- ✅ Clean security history; smaller attack surface.
Zero-dependency alternative (not adopted): a hand-rolled ~150-line ZIP
writer over native zlib would match yazl's performance with no runtime
dependency at all (a ~15-line CRC-32 is needed because zlib.crc32 requires
Node ≥ 20 and the repo supports Node ≥ 18). It was deemed not worth owning the
ZIP-format code when yazl provides it in two well-audited packages.
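For scale, the small CRC-32 mentioned above could look roughly like the following table-driven sketch. It is an illustration of the standard CRC-32 used by ZIP (reflected polynomial 0xEDB88320), not code from this PR, and the names are hypothetical.

```typescript
// Precompute the 256-entry lookup table for the reflected polynomial 0xEDB88320.
const CRC_TABLE: Uint32Array = (() => {
  const table = new Uint32Array(256);
  for (let n = 0; n < 256; n++) {
    let c = n;
    for (let k = 0; k < 8; k++) {
      c = (c & 1) !== 0 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
    }
    table[n] = c >>> 0;
  }
  return table;
})();

// Compute the CRC-32 of `data`. Pass the previous result as `seed`
// to continue a checksum across several chunks of a streamed file.
function crc32(data: Uint8Array, seed = 0): number {
  let crc = (seed ^ 0xffffffff) >>> 0;
  for (let i = 0; i < data.length; i++) {
    crc = CRC_TABLE[(crc ^ data[i]!) & 0xff]! ^ (crc >>> 8);
  }
  // Final XOR, forced back to an unsigned 32-bit value.
  return (crc ^ 0xffffffff) >>> 0;
}

// ASCII "123456789", the conventional CRC check input.
const check = Uint8Array.from([0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38, 0x39]);
console.log(crc32(check).toString(16));
```

The checksum itself is short; the ZIP-specific work in a hand-rolled writer would be the local headers, central directory, and data descriptors, which is the part the Decision prefers not to own.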
Dependency Review
The following issues were found:
License Issues
- packages/@aws-cdk/cdk-assets-lib/package.json
- packages/@aws-cdk/private-tools/package.json
- packages/@aws-cdk/toolkit-lib/package.json
- packages/aws-cdk/package.json
Total lines changed 9790 is greater than 1000. Please consider breaking this PR down.
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #1654 +/- ##
==========================================
+ Coverage 88.73% 88.80% +0.06%
==========================================
Files 77 77
Lines 11365 11354 -11
Branches 1588 1584 -4
==========================================
- Hits 10085 10083 -2
+ Misses 1250 1241 -9
Partials 30 30
Whenever the CDK Toolkit packages file assets (Lambda function code, S3 file assets and the like), it builds a ZIP archive. That work was previously done by `archiver`, a dependency that brings roughly fifty transitive packages along with it into every install of the CDK CLI and `@aws-cdk/toolkit-lib`. This pull request replaces it with `yazl`, a focused, write-only ZIP library, which brings the footprint for this functionality down to just two packages (`yazl` and `buffer-crc32`).
The reason for the change is customer experience. A smaller dependency graph means a faster `npm install` and a lighter footprint on disk and in CI for everyone who installs the CLI or builds on the toolkit library. It also shrinks the set of third-party code that has to be audited and patched when security advisories are published, which is a recurring cost for our users. Because the CDK only ever creates archives and never parses untrusted ones, a write-only library is all we need, so this also drops a large amount of reader and format-handling code that was never exercised in our use case.
Asset packaging gets a little faster as a side effect. Both libraries hand compression to Node's native zlib, and in local benchmarks across representative asset trees `yazl` produced archives 8–11% faster than `archiver` at comparable memory use. The existing streaming behaviour is kept, so large assets are still written to disk without buffering the whole archive in memory.
The change is otherwise invisible to users. Archives stay byte-for-byte deterministic: entry timestamps remain pinned to the 1980 epoch, so identical content continues to produce an identical hash. Unix file modes such as the executable bit are still preserved, and symbolic links are still followed. The existing unit tests for the zip tool pass unchanged.
A full benchmark and security report is attached as a separate comment on this PR.
Checklist
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license