fix(producer): audio drops + blank images on FFmpeg 4.x/CORS-restricted origins#1140
Conversation
amix's normalize=0 option is absent from many FFmpeg builds (e.g. FFmpeg 4.2 on Ubuntu 20.04). When the option is not recognized, FFmpeg fails the entire filter graph initialization, processCompositionAudio returns success:false, and the assembled video has no audio stream. Replace normalize=0 + weights='1...' with the amix default behavior (normalize=true, divides by track count) and multiply the master output gain by the track count to restore the original per-track volumes. The net volume is identical across all FFmpeg versions. Fixes #1136-adjacent: reported as 'audio doesn't play' in rendered MP4.
jrusso1020
left a comment
There was a problem hiding this comment.
Reviewed end-to-end. (Home flagged "no description" but the PR body is detailed — root cause, repro, fix, test plan all present.) Diagnosis verified, fix math checks out, but I found a sister-bug surface in audioExtractor.ts that has the same FFmpeg-4.x incompatibility and isn't touched here.
Diagnosis verified
amixnormalizeoption was added in FFmpeg 5.0; Ubuntu 20.04 ships 4.2.7, plenty of 6.x packages omit it. ✓- Confirmed the team already knows about this —
packages/producer/src/utils/audioRegression.ts:137-146has an existing comment: "Avoids amix'snormalizeoption (not available on ffmpeg 4.x) — we use volume=-1 + amix to" — institutional knowledge that didn't propagate whenaudioMixer.tswas last touched.
Fix math
The PR replaces amix=...:normalize=0 (raw sum) + volume=masterGain with amix=... (sum/N default) + volume=masterGain × N. Net:
- Old:
Σ inputs × masterGain - New:
(Σ inputs / N) × (masterGain × N) = Σ inputs × masterGain
Mathematically identical at the output. The weights='1 1 1' clause being removed is correct — uniform weights are the default. ✓
Sister-bug — audioExtractor.ts has the SAME pattern
// packages/producer/src/services/audioExtractor.ts
const mixFilter = `${mixInputs}amix=inputs=${tracks.length}:duration=longest:normalize=0[out]`;Same :normalize=0 clause, same FFmpeg-4.x abort, same silent-failure path (mixer returns success: false → assembled MP4 has no audio). This file is in the production rendering path (extracting audio from video tracks for compositions that have data-has-audio="true" on <video> elements). A user with a composition that uses video audio (per yesterday's hf#1140-adjacent work in this area) on FFmpeg 4.x hits the same bug.
This isn't a sample I had to dig for — audioExtractor.ts is in the same package and on the same code path that handles audio compositing. Worth folding into the same PR rather than shipping a half-fix.
Severity discipline (per the retro pattern)
Walking next-user-action from the broken state:
- User renders a composition with audio on FFmpeg 4.x
- Mixer silently fails → MP4 emitted with video only
- User downloads, plays back, hears no audio
- No error surfaced anywhere — user thinks they authored the composition wrong
This is production asset loss, not "minor compatibility." The audio data was assembled correctly upstream; it just never made it into the final MP4. PR body's "silent failure" framing is accurate. Severity should be treated as a P1-ish bug, not a P3 compatibility nit.
Test coverage
PR claims existing producer regression tests pass (same net volume). No new test specifically pinning the FFmpeg-4.x compatibility. A minimal hermetic test would be:
- Construct a filter graph via
mixAudioTrackswith mockedrunFfmpeg - Assert the filter string does NOT include
normalize=orweights= - Assert the post-mix
volume=filter equalsmasterOutputGain * tracks.length
That pins the contract Vai's review will look for tomorrow if someone re-introduces the option. Non-blocking, but cheap to add.
Other observations
audioExtractor.ts(production) — definitely needs the same fix, per above.examples/aws-lambda/scripts/eval.sh— also usesnormalize=0but it's an eval script for sample-cancellation testing, not the production path. If someone runseval.shon FFmpeg 4.x for a metrics check, it'd fail there too. Minor.audioRegression.ts— already correct. Good precedent the audioMixer fix could have cited.
Verdict
Materially LGTM on the audioMixer.ts math + the fix shape. Real blocker: the sibling fix in audioExtractor.ts should be in the same PR. Same root cause, same blast radius, two-line diff. Right move is to extend this PR rather than leave the half-fix on main and follow up with another.
Holding the stamp — same protocol. James gates.
— Rames Jusso (hyperframes)
Two follow-up fixes: 1. htmlCompiler: strip crossorigin attribute from <img> elements during compilation. External images (e.g. S3) with crossorigin='anonymous' force CORS-mode requests against the renderer's localhost file server, which S3 rejects → images render blank. Matches the existing video strip at line 261. 2. audioExtractor: same amix normalize=0 bug as audioMixer.ts. The audioExtractor path is used for <video data-has-audio='true'> mixing in the CLI's local render pipeline; on FFmpeg 4.x it would also drop audio silently. Fix: remove normalize=0, compensate with volume=N.
vanceingalls
left a comment
There was a problem hiding this comment.
Verdict: APPROVE
The fix is correct and the root cause analysis is solid. Here's the full breakdown.
Math
Old: amix normalize=0 passes samples through at full amplitude → output = Σ(track[i]) → volume=masterGain brings it to the intended level.
New: amix default (normalize=true) divides by N → output = Σ(track[i]) / N → volume=masterGain×N exactly restores it.
The compensation is exact because apad=whole_dur=totalDuration is preserved unchanged — all N inputs remain active for the same full duration, so amix always divides by a fixed N, not by the number of inputs still active at a given sample. If apad were ever dropped, this equivalence would drift. That's fine for this PR but worth keeping in mind if apad gets revisited.
Per-track volume= filters (the buildVolumeExpression path) are not touched — individual track gains are unchanged.
Scope
packages/producer/src/services/audioMixer.ts is a pure re-export from engine — there's no duplicate implementation to update. The fix covers both consumers.
formatFilterNumber on the compensated gain
Using formatFilterNumber here is correct. Without it, a 3-track composition with masterOutputGain=1 would emit volume=3 which is fine, but a non-unit PRODUCER_AUDIO_GAIN (e.g. 0.7) would produce volume=2.0999999999999996 — a malformed filter string on some FFmpeg builds. Good catch.
What's missing (nit only)
nit: The test at line 65 ([mixed]volume=1[out]) passes by coincidence — single track → 1 * 1 = 1. There's no multi-track test that exercises compensatedGain = masterOutputGain * tracks.length with N>1. A 2-track case asserting [mixed]volume=2[out] (or volume=1.4 with a custom gain) would lock in the semantics. Not a blocker — the math is simple and the comment is clear — but it's the only novel logic in this commit.
nit: Commit message says "Fixes #1136-adjacent" — if there's a linked issue, a clean Fixes #XXXX reference is preferable. If not, drop the -adjacent.
Test plan assessment
The manual repro (ffprobe showing video-only before, video+AAC after) and the note that "existing producer regression audio correlation tests pass" are both meaningful signals. The unit test suite does exercise processCompositionAudio with the runFfmpeg mock, and the filter-string assertions for the single-track default case pass correctly.
Ship it.
— Vai
|
Correction to my earlier review: I claimed Root cause of my mistake: I sourced the sister-bug finding from Net review: math correct on both files, FFmpeg 2.5+ syntax (works everywhere realistic), — Rames Jusso (hyperframes) |
jrusso1020
left a comment
There was a problem hiding this comment.
Re-reviewed at 8f7f69d6. Three files, 16+/5-, all three changes verified. Materially LGTM; holding the stamp per protocol — James gates.
What changed since my first pass
audioExtractor.ts— applies the samenormalize=0→ default-amix+volume=Npattern (resolves what I incorrectly called a "missing sister-bug"; see my correction comment).htmlCompiler.ts— new: stripscrossoriginfrom<img>, mirroring the existing<video>strip.
audioExtractor.ts — verified
const mixFilter = `${mixInputs}amix=inputs=${tracks.length}:duration=longest[mixed]`;
const postMixGain = `[mixed]volume=${tracks.length}[out]`;Math: pre-fix output was Σ inputs (raw sum, via normalize=0). Post-fix output is (Σ inputs / N) × N = Σ inputs. Identical. ✓ No master gain in this path (consistent with the original code's normalize=0[out] having no separate volume filter).
Small consistency nit, non-blocking: audioMixer.ts runs the compensated gain through formatFilterNumber(); audioExtractor.ts interpolates tracks.length raw. For pure integers it's fine — FFmpeg's volume filter parses volume=3 unambiguously — but you might want to mirror the helper for symmetry. Pure style, take or leave.
htmlCompiler.ts — verified
compiledHtml = compiledHtml.replace(/(<img\b[^>]*)\s+crossorigin(?:=["'][^"']*["'])?/gi, "$1");Regex is the same shape as the <video> strip three lines above (<video\b[^>]* … crossorigin(?:=…)?). Matches both quoted and unquoted forms, optional attribute value. ✓
Rationale checks out: the renderer captures via headless Chrome's BeginFrame screenshot path (compositor-level), not canvas readback (toDataURL/getImageData). CORS-tainted images render fine to the compositor; they just can't be read back into JS. The fix unblocks the actual render path.
One edge case worth knowing — not a blocker: if a composition uses WebGL textures (gl.texImage2D(img)), it DOES need CORS-clean images, and stripping crossorigin would silently break those textures. Blast radius is small (HTML/CSS compositions don't hit this) and the workaround for any future WebGL author is to host the asset on the localhost file server. Worth a one-line comment in the code if WebGL is ever a supported path, otherwise fine to leave.
Tests
Echo of Vai's nit, expanded:
- Audio path: no multi-track unit test pins the
compensatedGain = masterOutputGain × Nmath. With N=1 the existing test passes either formulation; would only catch a regression on N≥2. - CORS path: matching strip-
crossorigin-from-<video>likely has a test;<img>should have a parallel one. Two-line addition to whatever the existing video-strip test asserts.
Both non-blocking — if Miguel wants to fold them in here vs. follow-up is a judgment call, but pinning the multi-track math seems cheap enough to do now.
Verdict
Math correct on both audio files. Both production paths covered (engine + producer). CORS-img strip is safe under the current render model. Test coverage is the only remaining open item. Stamp still held — James gates.
— Rames Jusso (hyperframes)
vanceingalls
left a comment
There was a problem hiding this comment.
Second-pass review covering both commits in full.
Commit 1 (22d33de) — audioMixer.ts: Already covered in prior round. Fix is correct.
Commit 2 (8f7f69d) — audioExtractor.ts + htmlCompiler.ts:
audioExtractor.ts (mixTracks): same pattern as audioMixer. normalize=0 dropped, replaced with amix default (divides by N) + volume=${tracks.length} post-mix gain. Math is correct — per-track volume is applied earlier in filterParts via volume=${track.volume}, so the chain is track.volume × (1/N) × N = track.volume, identical to the pre-patch intent. No masterOutputGain concept here, so no multiplier needed beyond N — correct.
htmlCompiler.ts (<img crossorigin> strip): Regex is identical in shape to the existing <video> strip immediately above. The renderer captures DOM frames via compositor screenshot (BeginFrame), not canvas readback, so crossorigin on <img> is not needed for rendering correctness. I traced the WebGL texture path in hyper-shader.ts — textures are sourced from captured frame blobs (blobToTextureSource), not from raw <img> elements, so stripping crossorigin from <img> does not affect shader transitions. Change is safe.
normalize=0 audit: Grepped the full codebase — only these two files used amix ... normalize=0. Both are fixed. No other sites.
CI: All required checks pass (CLI smoke, Build, Test, Typecheck, Lint, Format, Test: runtime contract, Semantic PR title, etc.). Pending checks are regression shards and Windows tests, which are non-required.
Clean across both commits. No blockers.
— Vai
…strip - audioMixer.test.ts: assert filter has no normalize=/weights=; add 3-track test confirming compensatedGain = masterGain × N = 3 - htmlCompiler.test.ts: parallel tests for img and video crossorigin strip (covers both elements, not just video)
Issues fixed
1. Audio silently dropped on FFmpeg 4.x/some 6.x builds
mixAudioTracksandmixTracks(audioExtractor) both usedamix=...:normalize=0. Thenormalizeoption was added to FFmpeg'samixfilter in FFmpeg 5.0 and is absent from many system packages (Ubuntu 20.04 ships 4.2.7). When unrecognized, FFmpeg aborts filter-graph initialization —processCompositionAudioreturnssuccess:false,runAudioStagesetshasAudio=false, and the assembled MP4 has no audio stream. No error surfaced to the user.2. Images with
crossorigin="anonymous"render blankExternal
<img>elements from CORS-restricted origins (e.g. S3 buckets without anAccess-Control-Allow-Originheader for localhost) fail to load whencrossorigin="anonymous"forces CORS mode against the renderer's localhost file server. The same strip was already applied to<video>elements (htmlCompiler.ts:261) but not images.Changes
engine/audioMixer.tsnormalize=0+weights=fromamix, multiply master gain bytracks.lengthto compensateproducer/audioExtractor.tsdata-has-audiomixing pathproducer/htmlCompiler.tscrossoriginfrom<img>elements, matching the existing<video>stripMath
amixwithnormalize=true(default) divides output by track count. Addingvolume=Nafter the mix restores the net level — identical tonormalize=0on FFmpeg versions that support it.Verification
Rendered the composition from the bug report (3 audio tracks + CORS-restricted S3 images) locally:
ffprobeconfirms:video h264 28.0s+audio aac 28.0s