fix(producer,lint): id-less media renders blank wash instead of footage#1790
Conversation
A timed <video>/<audio> identified only by a Studio-stamped `data-hf-id` (no real `id`) rendered as a flat white/grey wash with dropped audio, and lint stayed silent so it surfaced only at render. Root cause, two layers: - lint `readAttr(tag, "id")` used a `\b` boundary, which treats the hyphen in `data-hf-id="…"` as a word break — so reading "id" matched the trailing `id="…"` inside `data-hf-id` and returned a phantom id. `media_missing_id` therefore never fired for media carrying only a data-hf-id. Switched to a `(?<![\w-])` lookbehind so a short name can't match the tail of a longer hyphenated attribute (also fixes "width" matching `data-width`, etc.). - the render pipeline identifies media by the real `el.id`: frame extraction keys injected stills as `__render_frame_<id>__`, the runtime frame-swap matches on `el.id`, and the audio mixer selects `audio[id][src]`. An empty `el.id` meant injected frames/audio never matched. compileForRender now assigns a stable positional id to every id-less timed media element before any stage parses or serves the HTML. Adds a producer regression fixture (video with data-hf-id, no id) and a lint test covering the data-hf-id/id collision. Baseline mp4 generated separately.
Golden compiled.html + output.mp4 (generated on linux/amd64 in the Dockerfile.test image). Compare-mode passes: compilation, visual (0 failed frames), and audio (correlation 1.000). A regression to the blank-wash behaviour fails the visual check.
vanceingalls
left a comment
There was a problem hiding this comment.
Stack: single PR against
main(no parents, no descendants). HEAD17e35f9e. Author Miguel.Verdict: ✅ LGTM — root cause confirmed at both layers; defenses in depth correctly composed; regression fixture nails the symptom.
The case as presented
A <video> carrying only Studio's data-hf-id (no real id) rendered as a flat wash and any sibling <audio> went silent. hyperframes lint was somehow asleep at the wheel — media_missing_id never fired. The user reports of "grey/white background video" and "black video, dropped audio" trace back through this same locked door.
Investigation — the prints on the brass
Layer 1 — the false witness in readAttr. The original regex anchored its attribute name with \b. A word boundary, of course, sits between any word-character and any non-word-character — and the hyphen in data-hf-id="…" is a non-word-character. Asking readAttr(tag, "id") therefore happily matched the trailing id="…" inside data-hf-id, returning a phantom value. The media_missing_id rule, satisfied by the phantom, packed up and went home. The fix swaps \b for (?<![\w-]) — a lookbehind that refuses both word-characters and hyphens, demanding the attribute begin fresh. The same surgery is applied to readJsonAttr. I checked every readAttr caller in packages/lint/** (eleven files); none relied on the prior tail-matching misbehaviour. Positive tests across composition.test.ts, slideshow.test.ts, etc., would have screamed if hyphenated attribute reads broke — they don't, because the lookbehind permits a fresh start after < or whitespace.
Layer 2 — the missing identity in the producer. The render pipeline keys every timed media element off el.id: __render_frame_<id>__ for injected stills, el.id for the runtime frame-swap (packages/core/src/runtime/media.ts), and the audio mixer's audio[id][src] selector (packages/engine/src/services/audioMixer.ts). Empty el.id → no key → no match → blank surface, silent audio. The new assignMissingMediaIds walks video[data-start], audio[data-start], stamps a stable positional hf-media-N on any id-less element, and — critically — runs on the same HTML object that flows into parseVideoElements / parseAudioElements and out to the file server. I read compileForRender around line 1510 at HEAD: the call is wedged between collectExternalAssets and parseVideoElements/parseAudioElements, so the parse, the server, and the runtime all see the same identity. No split-brain.
Verifications
- ✅ Root-cause confirmed. Both layers match the symptom; neither is decorative.
- ✅ Reachable under prod defaults. No flag, no opt-in —
assignMissingMediaIdsruns unconditionally insidecompileForRender. (feedback_verify_new_path_reachablecleared.) - ✅ Auto-heal precedes parse + serve. Same
htmlvalue flows to every consumer; no "fix one, serve the other" split. (feedback_decorative_gate_patterncleared — populate path is the same object as the read path.) - ✅
data-hf-iddeliberately NOT reused asid. Author calls this out: Studio edit handle ≠ render identity. Correct — round-tripping would couple two namespaces with different lifetimes. - ✅ Lint rule actually exercises the bug. The added
media.test.tscase hasdata-hf-idand noid, and asserts exactly twomedia_missing_iderrors. Without the regex fix this test would fail; with it, it passes. - ✅ Regression fixture is the symptom, not a strawman.
video-hfid-no-id/src/index.htmlis a full-bleed background<video>withdata-hf-idonly — exactly the Studio-stamped shape — and PSNR will catch a return to the blank wash. - ✅ CI green where it matters. Lint, Tests, Build, Fallow, Preview parity, runtime contract, CodeQL all pass at
17e35f9e. Regression shards 1/2/4/5/7/8 in-flight at review time; nothing has flipped red. - ✅
media_missing_idleft as error. Author flags the slight pessimism (auto-heal means it's no longer strictly "will be FROZEN") and leaves message-tuning for a follow-up. Defensible — a realidremains the contract for Studio targeting and GSAP selectors; the auto-assign is a safety net, not an invitation.
Minor observations (non-blocking)
- 💭 The autogenerated id is
hf-media-${seq++}keyed on DOM traversal order. If a future Studio commit reorders nodes, persisted artefacts that referenced the oldhf-media-Nwould drift. Today nothing persists these — they're derived per-compile — so it's fine. Worth a one-line comment if anyone gets clever later. - 💭 The
media_missing_idmessage refinement the author flags is genuinely worth filing as a tiny follow-up — once a developer relies on the auto-heal, "FROZEN" reads as crying wolf.
CI + context snapshot
- HEAD:
17e35f9e269d6014aa500afc68efe0c5db9cecb0 - Base:
main· single PR · no parents, no descendants - Required gates green; regression shards in progress
- Prior reviewers: none at review time
The chain of evidence holds. Ship it.
Review by Via
james-russo-rames-d-jusso
left a comment
There was a problem hiding this comment.
Layering on Via's LGTM (HEAD
17e35f9e). Producer-side ground she covered: regex fix anatomy (\bvs(?<![\w-])),assignMissingMediaIdsplacement betweencollectExternalAssetsandparseVideoElements/parseAudioElements, regression fixture PSNR coverage, lint-callers-don't-rely-on-tail-match audit. All confirmed independently. Taking the lint + cross-cutting + observability angles she didn't dwell on.
Verdict: lgtm-with-followups. One genuine concern (#1 below — root-cause-vs-band-aid in compileTimingAttrs); the rest are non-blocking but worth filing.
🟠 The same \b-style bug lives in compileTimingAttrs.getAttr — this PR band-aids the symptom, not the root cause
packages/core/src/compiler/timingCompiler.ts:62:
function getAttr(tag: string, attr: string): string | null {
const match = tag.match(new RegExp(`${attr}=["']([^"']+)["']`));
return match ? (match[1] ?? null) : null;
}No boundary at all — getAttr(tag, "id") matches the trailing id="…" inside data-hf-id="…" (same root cause as readAttr, slightly worse — it doesn't even need \b to misfire). Verified in a Node REPL:
getAttr(`<video data-hf-id="hf-v1a2b3" data-start="0">`, "id") === "hf-v1a2b3" // wrong
getAttr(`<div data-composition-id="main" data-width="1920">`, "id") === "main" // wrong
getAttr(`<div data-composition-id="main" data-width="1920">`, "width") === "1920" // wrong
This matters because compileTag at line 84-89 already has an auto-id-on-missing-id branch — exactly the same intent as the new assignMissingMediaIds:
let id = getAttr(result, "id");
if (!id) {
id = `${isVideo ? "hf-video" : "hf-audio"}-${generateId()}`;
result = injectAttr(result, "id", id);
}That branch should already have caught every id-less media element coming into compileForRender (it runs at htmlCompiler.ts:214 via compileHtmlFile, before assignMissingMediaIds). The reason it didn't fire on Studio-stamped <video data-hf-id="…"> is that getAttr(result, "id") was wrongly returning the data-hf-id value, so if (!id) was false. Same boundary bug, exact same shape.
Net: this PR's assignMissingMediaIds band-aids the symptom downstream. The root cause survives, and so does a parallel auto-id system using a different prefix (hf-video-N vs hf-media-N) keyed on a now-broken getAttr. Suggest either:
- Apply the same
(?<![\w-])lookbehind totimingCompiler.ts'sgetAttrand letcompileTag's pre-existing auto-id fire, then dropassignMissingMediaIdsas redundant; OR - Keep both, but file a follow-up to fix
getAttrso unrelated callsites (compileTag,extractResolvedMediaat line 224,clampDurations, the unresolved-div branch at line 163) aren't operating on phantom ids.
The fact that things accidentally still work today is partly because injectDurations matches <[^>]*id=["']${id}["'][^>]*> against the tag substring — a data-hf-id="x" value still hits because id="x" is a substring of data-hf-id="x". The whole chain has been operating on conflated identifiers and getting lucky.
🟠 Behavior change — lint may start firing on previously-silent compositions
Every readAttr(tag.raw, "id") callsite in packages/lint/** was previously sometimes returning the data-hf-id value. After the fix it returns null for id-less elements. Two of those callsites gate whether a rule fires:
core.ts:357—studio_missing_editable_idwarning (if (readAttr(tag.raw, "id")) continue)media.ts:441—media_missing_iderror (the PR's intended new behavior)
Before the fix, both rules silently no-op'd on compositions stamped with only data-hf-id. After the fix, they fire as designed — which is correct, but means any project currently passing hyperframes lint and using data-hf-id-only authoring (i.e. anything coming out of Studio's preview server) may go red. Worth a release-note line + a heads-up to consumers; this PR ships with v0.7.19 already cut.
Other readAttr(tag.raw, "id") callsites are message-formatting only (elementId: readAttr(...) || undefined for finding payloads) — those were producing phantom-id strings in lint output, harmless but confusing. Now correct.
🟡 Test coverage gaps
- No unit test for
assignMissingMediaIds. The function is covered only by the heavy regression fixture (PSNR comparison on baselineoutput.mp4). A 5-line unit test inhtmlCompiler.test.tsasserting that<video data-hf-id="x">getsid="hf-media-0"and<audio data-hf-id="y">getsid="hf-media-1", plus negative cases (realidpreserved,<video>withoutdata-startskipped), would be a much sharper signal when this regresses. - Audio side of bug is not exercised by the regression fixture. PR body says the symptom is "blank wash + dropped audio";
meta.jsonhasminAudioCorrelation: 0.0and the fixture's<video>ismuted playsinlinewith no<audio>element. The audio-dropped-because-audio[id][src]-no-match path is the part Miguel cites from user reports — and the test wouldn't catch a regression on that side because there's no audio in the baseline to fail against. Consider adding a sibling<audio data-hf-id="…">with a tone clip and raisingminAudioCorrelationto a meaningful threshold, or split into a second fixture. - Lint regex test only covers
data-hf-id. The fix's comment also calls outwidthmatchingdata-width. No test pins the latter — a future regex tweak could regress only that case and unit tests would still pass.
🟡 Cross-repo lint enforcement gap (carry-over from #1779)
@hyperframes/lint@0.7.19 is published but heygen-cli (Go) and hyperframes-gemini-agent neither depend on nor invoke it. Verified — no @hyperframes/lint reference in either repo's manifests. So Claude/Gemini agents that generate HF compositions still can't catch media_missing_id at generation time; the lint fix only helps authors running hyperframes lint locally or in CI. The producer-side assignMissingMediaIds is the actual safety net for agent-generated content.
Worth a follow-up: either wire @hyperframes/lint into the agent-side compose pipeline, or make assignMissingMediaIds emit a one-line warning when it fires (see next item) so the LINT-gap is observable from the render telemetry.
🟡 Observability gap — silent auto-heal
assignMissingMediaIds mutates the HTML silently. When it fires, it's papering over either (a) a Studio preview-server bug (preview should stamp id, not just data-hf-id), (b) an agent-generated composition that bypassed lint, or (c) a hand-authored composition that needs an id. In all three cases we'd want to know — both for the originating system to be fixed and for the resulting hf-media-N ids to be predictable for downstream consumers. A single log.warn / counter on the changed branch ("auto-assigned id on N id-less timed media elements") would make the gap visible without forcing it to be a hard error. Per the failure-path-emission rubric, the auto-heal is the failure path of the lint rule; not emitting from it loses the signal that the lint didn't prevent the issue.
🟡 Pre-existing src violator
packages/producer/tests/sub-composition-video/src/compositions/polaroid.html:10 has <video src="…" data-start="0" data-track-index="0" crossorigin="anonymous"> — no id, no data-hf-id. The compiled output at output/compiled.html:279 shows it's been getting id="hf-video-0" via the existing videoFrameExtractor.ts:134 auto-id (the third parallel auto-id system in this codebase, fwiw). That fixture has been quietly testing a path that goes through three different "give it an id" branches in sequence. Worth either adding an explicit id to the source or documenting that the fixture intentionally exercises the auto-heal cascade.
🟡 Nits
hf-media-Nis keyed ondocument.querySelectorAlltraversal order. Today nothing persists these ids across compiles so it's fine; if anything ever caches a compiled artifact and referenceshf-media-N, reorder = drift. Worth a single-line doc on the function. (Via flagged the same — agreeing.)media_missing_idmessage ("will be FROZEN") is now overstated given the auto-heal. Author already flagged this for follow-up; agreeing.- ES2022 lookbehind is fine for the Node entry; the
@hyperframes/lint/browserentry runs in whatever consumers ship to (Studio is Electron-Chromium, fine). Worth a sanity check that no downstream still ships to Safari < 16.4.
Producer fix: 🟢 correct surface, well-placed in the pipeline, regression fixture (video side) does its job.
Lint regex fix: 🟢 the lookbehind is the right anchor; explanatory comment is exactly right.
Cross-cutting: the bug's siblings in getAttr and the absent observability are the parts that turn this from "fix" to "fix-and-prevent-recurrence"; everything above is the file-it-as-follow-up version of that.
Not blocking on any of this — the user-facing symptom is genuinely fixed and the CI green is meaningful. But the getAttr parallel bug (item #1) is the one I'd want eyes on before this pattern recurs in a third place.
Reviewed at SHA 17e35f9e269d6014aa500afc68efe0c5db9cecb0.
— Rames D Jusso
vanceingalls
left a comment
There was a problem hiding this comment.
Addendum to my R1 — converging with Rames's #1
Rames's first followup is correct, and my R1 missed it. Posting this as an addendum, not a verdict change — still LGTM, but the underlying regex bug deserves a follow-up.
Verified at 17e35f9e
The same attribute-boundary bug lives in the compiler. packages/core/src/compiler/timingCompiler.ts:62-65:
function getAttr(tag: string, attr: string): string | null {
const match = tag.match(new RegExp(`${attr}=["']([^"']+)["']`));
return match ? (match[1] ?? null) : null;
}No \b, no lookbehind, no boundary at all — naked interpolation. getAttr(tag, "id") against <video data-hf-id="abc-123"> substring-matches the trailing id="abc-123" inside data-hf-id and returns "abc-123". That's the same false-witness shape the lint readAttr fix patched at packages/lint/src/utils.ts:119 with (?<![\w-]) — except the compiler version is even more permissive (matches anywhere in the tag, not just after a word/hyphen boundary).
The dependent auto-id system is in the same file, not videoFrameExtractor.ts (that file doesn't exist under packages/core/src/compiler/; nearest is packages/engine/src/services/videoFrameExtractor.ts, unrelated). The actual auto-id lives at packages/core/src/compiler/timingCompiler.ts:85-88:
let id = getAttr(result, "id");
if (!id) {
id = `${isVideo ? "hf-video" : "hf-audio"}-${generateId()}`;
result = injectAttr(result, "id", id);
}For a <video data-hf-id="…"> with no real id, getAttr(result, "id") returns the phantom — id is truthy — the hf-video-N injection silently skips. The element flows out of compileTimingAttrs still id-less. That's why assignMissingMediaIds was necessary as a producer-side band-aid.
Dispatch chain confirms the band-aid framing. packages/producer/src/services/htmlCompiler.ts:214 calls compileTimingAttrs(html) first (inside compileHtmlFile); assignMissingMediaIds runs later at line 1521 in compileForRender. The producer-side stamp papers over the symptom by assigning hf-media-N before parse + serve, but the compiler's own hf-video-N / hf-audio-N auto-id remains dead code for any element carrying data-hf-id. Anyone else calling compileTimingAttrs directly (lint, browser-side preview, tests, future callers) hits the same misbehaviour.
Audit gap I owe
My R1 said "audited all eleven readAttr callers in packages/lint/**; nobody relied on the prior tail-matching misbehaviour." That audit was scoped strictly to lint. The compiler package has a sibling getAttr with the same shape (worse, even — no boundary at all) and the same callers-depend-on-a-real-id pattern. I should have widened the audit to every attribute-reading helper in the monorepo. cf. feedback_verify_full_dispatch_chain.
Verdict
Still LGTM on the PR as shipped — the producer-side fix is reachable, defended in depth, and the regression fixture nails the symptom. But Rames's followup is the right framing: the root cause survives at timingCompiler.ts:62, and a follow-up PR should swap that getAttr regex to the same (?<![\w-]) lookbehind shape as readAttr. Otherwise the next pipeline that wires compileTimingAttrs ahead of assignMissingMediaIds (or skips the band-aid entirely) reintroduces the bug.
R1 addendum by Via
…rop the band-aid (#1792) * fix(core): root-cause the id-less media wash in getAttr, drop the band-aid The blank-wash/dropped-audio fix in #1790 added assignMissingMediaIds in the producer to stamp ids onto id-less timed media. That was a band-aid: the real cause is timingCompiler's getAttr, whose regex had no name boundary at all, so getAttr(tag, "id") matched the trailing id="…" inside data-hf-id="…". compileTag saw a phantom id and skipped its existing hf-video-N/hf-audio-N injection, leaving the element with no real el.id — which the render pipeline keys off of. Fix getAttr with the same (?<![\w-]) lookbehind used for the lint readAttr fix. compileTag's auto-id injection now fires for data-hf-id-only media, in both the main composition and sub-compositions (parseSubCompositions runs the same compileTimingAttrs pass), so assignMissingMediaIds is removed entirely. Extends the regression fixture with a standalone id-less <audio> (the dropped- audio side, previously untested) and raises minAudioCorrelation to 0.9. Adds a timingCompiler test for the data-hf-id/id boundary. * test(producer): use seeded pink noise (not a pure sine) for fixture audio A continuous sine anti-aligns under the audio cross-correlation (correlation -1.0 from a sub-period offset). Broadband seeded noise correlates robustly. * test: cover audio-side id injection via unit test; keep render fixture video-only The audio render-baseline used synthetic sine/noise, which anti-aligns under the harness audio cross-correlation (deterministic -1.0). Real audio fixtures are unaffected. Cover the audio side of the boundary fix with a deterministic timingCompiler unit test (id-less <audio> gets hf-audio-N) instead, and keep the render fixture video-only. * test(producer): regenerate baseline under the root fix (hf-video-N from compileTag)
Problem
A timed
<video>/<audio>identified only by a Studio-stampeddata-hf-id(no realid) rendered as a flat white/grey wash with dropped audio — andhyperframes lintstayed silent, so it surfaced only at render. This is the root cause behind both user-reported "grey/white wash on background video" and "black video + dropped audio" feedback. Studio's preview server stampsdata-hf-id(notid) into the source, so opening a composition in Studio is enough to plant the trigger.It reproduces on current
main: a<video>with a realidrenders perfectly; the same composition with only adata-hf-idrenders blank.Root cause (two layers)
lint —
readAttr(tag, "id")matched with a\bboundary. The-indata-hf-id="…"is a word boundary, so reading"id"matched the trailingid="…"insidedata-hf-idand returned a phantom id.media_missing_idtherefore never fired for media carrying only adata-hf-id. Switched to a(?<![\w-])lookbehind so a short attribute name can't match the tail of a longer hyphenated one (also fixes"width"matchingdata-width, etc.).render — the pipeline identifies media by the real
el.id: frame extraction keys injected stills as__render_frame_<id>__, the runtime frame-swap matches onel.id, and the audio mixer selectsaudio[id][src]. An emptyel.idmeant the injected frames/audio never matched → blank wash + dropped audio.compileForRendernow assigns a stable positionalidto every id-less timed media element before any stage parses or serves the HTML, so the whole pipeline shares one identity.Tests
packages/producer/tests/video-hfid-no-id/— a full-bleed<video>withdata-hf-idand noid. Baseline mp4 (generated on linux/amd64 in Docker) shows the footage; a regression to the blank wash fails PSNR.data-hf-idbut no realidnow raisesmedia_missing_id.Notes
media_missing_idis left as-is (error). With the render auto-heal it is now a touch pessimistic ("will be FROZEN"), but a realidis still the contract for stable Studio targeting and GSAP selectors; the auto-assigned id is a render-time safety net, not a license to omit ids. Message refinement can follow if desired.