fix(producer): localize remote media sources + strip audio crossorigin#1146
Conversation
Two bugs affecting compositions that use remote S3 URLs for video/audio. Bug 1 — Remote <video>/<audio> sources cause blank frames The renderer (Puppeteer) must buffer all video elements to readyState >= 2 before frame capture begins. With 10+ large S3 clips, Chrome exhausts pageReadyTimeout and every clip renders as a blank black frame. Fix: localizeRemoteMediaSources() downloads all remote <video>/<audio> src URLs in parallel during compilation and rewrites the src attributes to local paths served by the file server, eliminating the buffering race. Bug 2 — crossorigin on <audio> elements not stripped htmlCompiler.ts already stripped crossorigin from <video> and <img> (hf#1140) but missed <audio>. Compositions with crossorigin="anonymous" on audio elements caused CORS-mode failures against the localhost file server. Extended the strip to cover <audio>. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
jrusso1020
left a comment
There was a problem hiding this comment.
Reviewed at 63b8f1dd. Two real fixes (remote-media localization + <audio crossorigin> strip), each well-motivated. One real correctness issue on Windows; one notable test-coverage gap; rest is minor. PR body has the description (Home's "no description" framing was stale — same pattern as hf#1129/1140).
Diagnosis verified
Both root causes check out:
- Remote media → blank frames. Renderer waits for
<video readyState >= 2>before capture; with 10+ S3 clips the network buffering blowspageReadyTimeout;frameCapture.tswarns + emits black rectangles. Localizing the sources before render eliminates the network race. ✓ <audio crossorigin>not stripped.htmlCompiler.tshad<video>(pre-#1140) and<img>(#1140) strips but no<audio>. S3 buckets without CORS headers reject the localhost preflight; audio element ends in failed network state. ✓ The added strip mirrors the existing pattern exactly. Comment makes the orthogonality with the FFmpeg audio path explicit, which is nice — clears up "why isn't this a regression?"
Real blocker — absPath.split("/").at(-1) is non-portable
// htmlCompiler.ts (new code in localizeRemoteMediaSources)
const relPath = `${REMOTE_MEDIA_SUBDIR}/${absPath.split("/").at(-1)}`;downloadToTemp builds localPath via Node's path.join, which is OS-aware. On Windows that produces backslash-separated paths like C:\tmp\foo\_remote_media\download_abc.mp4. Splitting on / returns the whole path as one element, so relPath ends up as _remote_media/C:\tmp\foo\_remote_media\download_abc.mp4 — wrong URL for the file server + garbage when written into the HTML src attribute.
CI runs Tests on windows-latest. Currently green because the new code path doesn't have a Windows-platform regression test (more on this below). Production renders on Windows would silently regress.
Fix is one line:
import { basename } from "path";
// ...
const relPath = `${REMOTE_MEDIA_SUBDIR}/${basename(absPath)}`;The / separator in relPath is correct as-is — that's a URL path, not a filesystem path, and HTML src attributes always use / regardless of OS. The issue is just the basename extraction.
Test coverage gap on the main fix
localizeRemoteMediaSources is ~80 lines of substantive logic — download, dedup by URL, fallback on download failure, HTML rewrite across both quote styles, basename derivation, externalAssets registration. The PR adds one new test (strips crossorigin from <audio> elements), which is fine for the audio-CORS half, but the localize half has zero unit test coverage. Manual smoke on skybound_trailer_index.html (the bug-report composition) is mentioned in the PR body but doesn't pin the behaviors for the next person.
A handful of cheap tests with fetch mocked would pin:
- Multiple
<video>tags pointing at the SAME URL → one download, both rewritten. (Dedup correctness.) - Download failure (
fetchrejects) → original URL preserved in HTML + warning logged + render continues. (Graceful-degradation contract.) - Mixed quote styles (
src="https://..."vssrc='https://...') → both rewritten. (replaceAllcovers both.) - Mix of remote + local sources → only remote get rewritten; locals untouched.
- (Windows guard) basename extraction works on a path with backslashes — easiest if it's an explicit unit on the basename derivation.
The dedup behavior in particular is non-obvious from the diff (uses a Set + a per-URL urlToLocal map); without a test it can easily regress to per-tag downloads on a future refactor.
Echo of Magi's hf#1145 catch — regex [^>] truncation
The new regexes both use [^>] spans:
const REMOTE_MEDIA_TAG_RE =
/<(?:video|audio)\b[^>]*?\bsrc\s*=\s*["'](https?:\/\/[^"']+)["'][^>]*>/gi;
// ...and the existing crossorigin-strip pattern:
compiledHtml = compiledHtml.replace(/(<audio\b[^>]*)\s+crossorigin(?:=["'][^"']*["'])?/gi, "$1");A literal > inside a quoted attribute value on the same tag (e.g. data-title="A > B") would truncate the regex match early → tag missed → fallback to remote URL. Composition <video> / <audio> roots don't typically carry prose-bearing attributes so this isn't a practical bite, but it'd be worth a one-line comment recording the assumption — same suggestion Magi made on hf#1145's detection regex; the codebase is accumulating [^>]-span regexes and the documented gotcha would scale.
Severity verification
Walking the next user action from the broken state:
- Localize bug: user renders a cinematic with 10+ remote sources → render "succeeds" (no error code) → MP4 emitted → user downloads + plays → black frames + silent audio with no explanation. Destructive output — the asset looks complete but isn't usable. User can't tell whether their composition is wrong or the renderer is. This is the more serious of the two halves; "blank video silently delivered" is the exact thing API customers can't tolerate.
- Audio CORS strip: same shape as hf#1140's video/img strips. Compositions with remote
<audio>(background music, voiceover stems on S3) would have silently dropped audio at the engine layer until they hit the FFmpeg out-of-band path. Net effect: also production asset loss for the audio half of any composition with remote audio sources. Equally P1-ish.
Both halves grade as real production bugs; the PR's "fix" framing is right.
Other observations
fallow-ignore-next-line complexityoncompileForRender— fair use; the function is doing genuine orchestration with several conditional branches. Not a blocker, but the function is getting unwieldy. Future PRs might benefit from extracting the per-feature compilation steps into named helpers (compileWithExternalAssets,compileWithPositionScript,compileWithLocalizedMedia) so each step is reviewable on its own.downloadToTempalready has process-local caching (downloadPathCache+inFlightDownloadsin the engine helper). Subsequent renders within the same worker reuse downloads transparently. Nice — no per-render network thrash.- No interaction with hf#1140 — that PR fixed the audio mixer + image strip; this one extends image strip to audio. Orthogonal. No shared lines.
Verdict
Materially LGTM on diagnosis + fix shape. Real blocker on the Windows basename bug — needs to land before merge or CI's Windows job will quietly regress. Test coverage gap on the main fix is the next thing; non-blocking but worth folding in. Echo-Magi comment on the regex span is cosmetic.
Stamp held — same protocol. James gates.
— Rames Jusso (hyperframes)
Addresses Rames' review on hf#1146:
- Replace `absPath.split('/').at(-1)` with `path.basename(absPath)`. On
Windows, path.join emits backslash-separated paths; split('/') returns
the whole path as a single element, producing a garbage relPath.
path.basename delegates to the OS separator on the current platform.
- Export `localizeRemoteMediaSources` for unit testing. Tests verify:
- Successful download rewrites src to _remote_media/ path
- Download failure preserves original URL without throwing
- Duplicate src URL across two tags → single fetch call (dedup)
- Local (non-HTTP) src paths are not rewritten
- Both double-quoted and single-quoted src attributes are rewritten
- basename extraction is correct on POSIX paths
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Correction to my earlier review: I flagged "would be worth a one-line comment recording the assumption" for the I scanned the diff in chunks and missed the comment lines. Same sampling-miss pattern I hit on hf#1123 — the lesson on grepping raw artifacts before asserting "X isn't in the diff" didn't catch this time because I framed it as "would be worth adding," which sounded additive but was actually a false claim about what wasn't there. Filing as a memory refinement: trigger phrases like "worth a comment noting" or "would benefit from documenting X" require the same The other two findings stand — Magi's commits address them cleanly:
— Rames Jusso (hyperframes) |
vanceingalls
left a comment
There was a problem hiding this comment.
The fix is correct and well-reasoned. Pre-downloading remote media before the file server starts is the right solution to the readyState >= 2 race. Traced the full path resolution chain: downloads/_remote_media/ → externalAssets → writeCompiledArtifacts → compiled/_remote_media/ → resolveProjectRelativeSrc picking up compiledDir first. Solid end-to-end.
Magi's follow-up commits resolve my two blockers:
localizeRemoteMediaSourcesnow has 6 unit tests covering success, fail-fallback, dedup, local-src passthrough, quote styles, and Windows basename — same bar asinlineExternalScriptsabove it. ✓path.basename(absPath)replaces the split-based extraction. ✓
Nit (clean up in follow-up)
// fallow-ignore-next-line complexity is a no-op — fallow is not a recognized linting directive in this repo (oxlint uses oxlint-disable-next-line). If the complexity rule isn't active, remove it; if it is, swap in the correct directive.
Nit
\bsrc in REMOTE_MEDIA_TAG_RE will match data-src because - is a non-word character and \b fires between - and s. Not a current bug (no data-src in this codebase), but a silent correctness landmine if it ever appears. Fix: (?<![\w-])src\b.
Nit
The replaceAll('"..."', ...) rewrite operates on the full HTML string — same URL in an inlined JS literal, CSS url(), or JSON block would also get rewritten. Currently harmless for the composition format, but brittle. A comment explaining why DOM-based rewrite isn't used (DOM round-trip perturbs the position-reapply <script> injected just above) would close the gap.
What's solid
downloadToTempmodule-level cache is safe: rechecksexistsSyncon hits, so files deleted withworkDiron render cleanup don't produce stale paths. ✓recompileWithResolutionsis clean: starts from already-rewrittencompiled.html, spreadsexternalAssetsvia...compiled. ✓- Fail-open fallback on download error is the right default for a render system. ✓
- Crossorigin strip regex is correct for all attribute positions and both value forms. ✓
LGTM. Core fix is sound, path chain is correct, blockers are addressed.
— Vai
Root causes
Compositions using remote S3 URLs for
<video>and<audio>(e.g. generated cinematic trailers) rendered with blank black frames and silent audio.Bug 1 — Remote
<video>/<audio>sources → blank framesThe renderer (Puppeteer) waits for all
<video>elements to reachreadyState >= 2(HAVE_CURRENT_DATA — a frame is actually decoded) before beginning capture. With 10+ large S3 clips the browser must buffer all of them over the network simultaneously. This consistently exhaustspageReadyTimeout;frameCapture.tslogs a warning and continues with every clip as a black rectangle.Fix:
localizeRemoteMediaSources()incompileForRender— scans<video>and<audio>opening tags for HTTPsrcURLs, downloads all unique URLs in parallel during compilation, rewrites the src attributes to local relative paths (_remote_media/<file>), and registers the downloaded files asexternalAssetssowriteCompiledArtifactscopies them intocompiledDir. The file server serves them from localhost; the browser reads local files and reachesreadyState >= 2immediately.Download failures (404, timeout, DNS) warn and fall back to the original remote URL so the render degrades gracefully rather than crashing.
Bug 2 —
crossorigin="anonymous"on<audio>not strippedhtmlCompiler.tsalready strippedcrossoriginfrom<video>(pre-#1140) and<img>(#1140) but not<audio>. When the browser's preload path tries to fetch audio from S3 with CORS mode from localhost, S3 may reject the request, leaving the element in a failed network state. Extended the strip to<audio>.Changes
packages/producer/src/services/htmlCompiler.tslocalizeRemoteMediaSources()function (downloads + rewrites)compileForRenderafter external asset collection<audio>crossorigin strip incompileHtmlFilepackages/producer/src/services/htmlCompiler.test.tsit("strips crossorigin from <audio> elements")testTesting
bun test packages/producer/src/services/htmlCompiler.test.ts— 44/44 pass (all crossorigin tests green, download fallback warnings expected for example.com URLs)skybound_trailer_index.html(10 S3 video clips, 9 S3 audio tracks) — video and audio confirmed in rendered output (see thread)