TypeAgent Studio: replay fidelity rungs + Impact Report controls#2569
Open
TalZaccai wants to merge 15 commits into
Pull request overview
This PR advances TypeAgent Studio’s “find a regression” workflow by making Impact Report replay fidelity explicit (including two deterministic replay modes and optional wildcard validation) and by simplifying/standardizing Studio UX infrastructure (shared tree provider scaffolding, shared tooltip model, shared backoff + ws RPC-channel adapters, and shared webview nonce generation).
Changes:
- Add two-mode replay (`nfa-grammar` vs `completionBased-cache`) plus optional wildcard-match validation and per-side fidelity reporting (core runtime + Studio UI protocol).
- Refactor Studio tree views to use a shared `BaseStudioTreeProvider` and a structured `TooltipModel`, plus shared text formatting helpers.
- Consolidate shared infra across packages: ws→RPC channel adapter, exponential backoff helper, and a shared webview CSP nonce helper; add host-side git ref pickers for Impact Report.
Show a summary per file
| File | Description |
|---|---|
| ts/pnpm-lock.yaml | Adds new workspace and external deps used by Studio/vscode-shell/websocket-utils. |
| ts/packages/vscode-shell/src/chatViewProvider.ts | Uses shared createWebviewNonce from core instead of local nonce generator. |
| ts/packages/vscode-shell/src/agentServerBridge.ts | Switches reconnect logic to shared exponential backoff implementation. |
| ts/packages/vscode-shell/package.json | Adds @typeagent/websocket-utils dependency. |
| ts/packages/utils/webSocketUtils/src/rpcChannel.ts | Introduces shared ws→RpcChannel adapter with robust frame decoding. |
| ts/packages/utils/webSocketUtils/src/backoff.ts | Adds reusable exponential backoff helper. |
| ts/packages/utils/webSocketUtils/package.json | Exports new backoff/rpcChannel entrypoints and depends on @typeagent/agent-rpc. |
| ts/packages/typeagent-studio/src/webviewKit/webviewHtml.ts | Centralizes nonce generation via core webview helper. |
| ts/packages/typeagent-studio/src/webviewKit/protocol.ts | Extends host↔webview protocol for replay mode, validation, provenance, connection state, and version picking. |
| ts/packages/typeagent-studio/src/webviewKit/host.ts | Adds per-instance webview panels and optional retain-context behavior. |
| ts/packages/typeagent-studio/src/tooltipModel.ts | Adds structured tooltip model for consistent, testable hover cards. |
| ts/packages/typeagent-studio/src/textFormatting.ts | Adds shared collapse+truncate helper for consistent row text formatting. |
| ts/packages/typeagent-studio/src/test/webviewProtocol.spec.ts | Updates protocol parsing tests for new message shapes and defaults. |
| ts/packages/typeagent-studio/src/test/studioServiceConnection.spec.ts | Updates tests for new backoff configuration shape. |
| ts/packages/typeagent-studio/src/test/studioRuntimeCore.spec.ts | Adds coverage for replay mode gating of construction cache consult. |
| ts/packages/typeagent-studio/src/test/sandboxTreePresentation.spec.ts | Updates tests for new icon-driven health + tooltip model. |
| ts/packages/typeagent-studio/src/test/sandboxSource.spec.ts | Updates tests for new connection backoff option naming. |
| ts/packages/typeagent-studio/src/test/replayViewModel.spec.ts | Updates/expands Impact Report view-model tests (filters, diffs, fidelity matrix, provenance). |
| ts/packages/typeagent-studio/src/test/gitRefProvider.spec.ts | Adds tests for host-side git ref enumeration and provenance pinning. |
| ts/packages/typeagent-studio/src/test/eventLogSource.spec.ts | Updates tests for new backoff option naming. |
| ts/packages/typeagent-studio/src/test/corpusTreePresentation.spec.ts | Updates tests for feedback icon handling and tooltip model. |
| ts/packages/typeagent-studio/src/test/collisionsSource.spec.ts | Updates tests for new backoff option naming. |
| ts/packages/typeagent-studio/src/test/collisionsPresentation.spec.ts | Updates tests for new tooltip model structure. |
| ts/packages/typeagent-studio/src/test/backoff.spec.ts | Adds unit tests for shared backoff helper. |
| ts/packages/typeagent-studio/src/studioServiceConnection.ts | Replaces fixed backoff array with shared exponential backoff + exposes next retry time. |
| ts/packages/typeagent-studio/src/studioServiceClient.ts | Uses shared ws→RPC channel adapter and re-exports it. |
| ts/packages/typeagent-studio/src/sandboxTreeProvider.ts | Refactors to shared BaseStudioTreeProvider connection gating + common item plumbing. |
| ts/packages/typeagent-studio/src/sandboxTreePresentation.ts | Moves to structured tooltips and icon-driven health presentation. |
| ts/packages/typeagent-studio/src/replayPresentation.ts | Uses shared collapse+truncate helper. |
| ts/packages/typeagent-studio/src/impactReportView.ts | Implements per-agent panels, native version picking, provenance pinning, and connection mirroring. |
| ts/packages/typeagent-studio/src/gitRefProvider.ts | Adds host-side git ref listing, validation, and provenance resolution with option-guarding. |
| ts/packages/typeagent-studio/src/extension.ts | Adds corpus file watcher, improves status bar reconnect countdown, and per-agent Impact Report command behavior. |
| ts/packages/typeagent-studio/src/eventLogTreeProvider.ts | Refactors to shared base tree provider and connection gating behavior. |
| ts/packages/typeagent-studio/src/eventLogPresentation.ts | Switches event hover to structured tooltip model; uses shared truncation. |
| ts/packages/typeagent-studio/src/corpusTreeProvider.ts | Refactors to shared base tree provider; shows feedback via icons. |
| ts/packages/typeagent-studio/src/corpusTreePresentation.ts | Switches entry hover to structured tooltip model; uses shared truncation; adds feedback rating field. |
| ts/packages/typeagent-studio/src/collisionsTreeProvider.ts | Refactors to shared base tree provider and connection gating behavior. |
| ts/packages/typeagent-studio/src/collisionsPresentation.ts | Switches hover content to structured tooltip model; tweaks exemplar row formatting. |
| ts/packages/typeagent-studio/src/baseTreeProvider.ts | Adds shared provider scaffold (connection gate, refresh, tooltip rendering). |
| ts/packages/typeagent-studio/README.md | Removes obsolete “Hello (skeleton)” command mention. |
| ts/packages/typeagent-studio/package.json | Updates commands/menus and adds deps (codicons + default-agent-provider). |
| ts/packages/typeagent-studio/esbuild.mjs | Copies codicon font to media and marks default-agent-provider external in service bundle. |
| ts/packages/typeagent-studio/.gitignore | Ignores generated codicon font asset. |
| ts/packages/typeagent-core/test/wildcardValidator.spec.ts | Adds tests for wildcard validation allowlist, fail-open diagnostics, and loader behavior. |
| ts/packages/typeagent-core/test/sideFidelity.spec.ts | Adds tests for per-side fidelity derivation. |
| ts/packages/typeagent-core/test/repoAgentLoader.spec.ts | Extends tests for agent-name resolution using configured agent roots. |
| ts/packages/typeagent-core/test/grammarReplayResolver.spec.ts | Adds tests for wildcard validation in grammar replay and validated match selection. |
| ts/packages/typeagent-core/src/webview/index.ts | Adds shared crypto-based nonce generator for webviews. |
| ts/packages/typeagent-core/src/sandbox/repoAgentLoader.ts | Uses shared agent-ref resolver and exports it. |
| ts/packages/typeagent-core/src/sandbox/inMemorySandboxManager.ts | Uses shared agent-ref resolver for name derivation. |
| ts/packages/typeagent-core/src/sandbox/agentRef.ts | Introduces robust agent name resolution with optional agentRoots. |
| ts/packages/typeagent-core/src/runtime/studioRuntimeCore.ts | Adds replay mode, wildcard validation plumbing, and per-side fidelity reporting. |
| ts/packages/typeagent-core/src/runtime/index.ts | Re-exports wildcard validation APIs from runtime surface. |
| ts/packages/typeagent-core/src/replay/wildcardValidator.ts | Implements L4a wildcard validation with allowlist + fail-open diagnostics. |
| ts/packages/typeagent-core/src/replay/grammarResolver.ts | Adds validated wildcard match selection and guards git show with --end-of-options. |
| ts/packages/typeagent-core/src/replay/constructionCacheResolver.ts | Uses shared schema-file hash implementation to avoid drift. |
| ts/packages/typeagent-core/src/index.ts | Exposes new webview module via top-level index. |
| ts/packages/typeagent-core/package.json | Exports new ./webview entrypoint. |
| ts/packages/studio-service/src/wildcardValidation.ts | Wires wildcard validation via lazy dynamic import of default-agent-provider (externalized). |
| ts/packages/studio-service/src/studioServiceServer.ts | Uses shared ws→RPC channel adapter. |
| ts/packages/studio-service/src/runtime.ts | Wires the default wildcard validator resolver into Studio runtime creation. |
| ts/packages/dispatcher/dispatcher/test/actionSchemaFileCache.spec.ts | Updates tests to use shared schema-file hash helper. |
| ts/packages/dispatcher/dispatcher/src/translation/actionSchemaFileCache.ts | Switches schema-file hashing to shared helper to prevent cache-key drift. |
| ts/packages/cache/src/index.ts | Exports the shared schema-file hash helper. |
| ts/packages/cache/src/explanation/schemaInfoProvider.ts | Implements and documents schema-file hash helper (single source of truth). |
| ts/docs/plans/vscode-devx/STATUS.md | Updates status doc for replay fidelity rungs/modes and wildcard validation. |
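
For orientation, the shared exponential backoff helper listed in the table (`backoff.ts`) and the "next retry time" exposed by `studioServiceConnection.ts` might look roughly like the sketch below. The option names (`initialMs`, `maxMs`, `factor`) and both function names are assumptions made for illustration, not the PR's actual API.

```typescript
// Hypothetical option shape; the real helper's option names may differ.
interface BackoffOptions {
    initialMs: number; // delay before the first retry
    maxMs: number; // upper bound on any single delay
    factor: number; // multiplier applied per failed attempt
}

// Delay for a zero-based attempt number, capped at maxMs.
function backoffDelayMs(attempt: number, opts: BackoffOptions): number {
    const raw = opts.initialMs * Math.pow(opts.factor, attempt);
    return Math.min(opts.maxMs, raw);
}

// Absolute timestamp of the next retry, e.g. to drive a status bar countdown.
function nextRetryAt(
    nowMs: number,
    attempt: number,
    opts: BackoffOptions,
): number {
    return nowMs + backoffDelayMs(attempt, opts);
}
```

Computing the delay from the attempt number, rather than indexing a fixed array, keeps the schedule unbounded in length while the cap keeps it bounded in duration.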
Review details
Files not reviewed (1)
- ts/pnpm-lock.yaml: Generated file
- Files reviewed: 68/69 changed files
- Comments generated: 4
- Review effort level: Low
…ache)

Make the replay's deterministic dispatch model explicit instead of implicitly mixing two paths that never coexist in a real dispatcher config.

Level A (core gating): add StudioReplayMode = "nfa-grammar" | "completionBased-cache" and mode? on StudioReplayRequest (default nfa-grammar); gate the live construction-cache consult behind completionBased-cache. Default runs are now grammar-only and A/B-symmetric.

Level B (plumbing + UI + test):
- Thread mode through the webview run message; parseWebviewMessage validates it (unknown/missing -> nfa-grammar). The host forwards it into replayCorpus; the channel/RPC layers ride on StudioReplayRequest unchanged.
- Add a two-state Grammar/Cache toggle to the Impact Report action bar with explanatory tooltips, persisted/restored with the version selection.
- Add an injectable resolveConstructionCache seam to CreateStudioRuntimeOptions so the gating is testable without a live cache.
- Tests: 3 runtime gating tests over a scaffolded agent (cache skipped in nfa-grammar / when mode omitted, consulted in completionBased-cache); update webview protocol run-message expectations to include mode.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
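
The defaulting and gating rules in this commit can be sketched as follows. The two mode names come from the commit message; `parseReplayMode` and `shouldConsultConstructionCache` are hypothetical helper names used only for illustration and are not claimed to match the PR's code.

```typescript
// Mode names as given in the commit message.
type StudioReplayMode = "nfa-grammar" | "completionBased-cache";

const DEFAULT_REPLAY_MODE: StudioReplayMode = "nfa-grammar";

// Hypothetical helper: coerce an untrusted webview value into a valid mode.
// Unknown or missing values fall back to the grammar-only default.
function parseReplayMode(value: unknown): StudioReplayMode {
    if (value === "completionBased-cache") {
        return "completionBased-cache";
    }
    return DEFAULT_REPLAY_MODE;
}

// Hypothetical gate: the live construction cache is consulted only in
// completionBased-cache mode, so an omitted mode stays grammar-only.
function shouldConsultConstructionCache(mode?: StudioReplayMode): boolean {
    return (mode ?? DEFAULT_REPLAY_MODE) === "completionBased-cache";
}
```

Falling back to `nfa-grammar` on any unrecognized input means a stale or malformed webview message can only make a run more conservative, never silently enable the cache.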
Replay now optionally runs each agent's real validateWildcardMatch over the working-tree side's wildcard grammar matches -- the dispatcher's only beyond-grammar determinism (getValidatedMatches) -- dropping a match the agent rejects, exactly as the dispatcher does. Working-tree side only; git-ref side stays grammar-only.

Opt-in (default off) and fail-open: only an explicit false rejects; a missing/throwing validator or unloadable agent accepts and records a diagnostic, so replay never fabricates a lost match from infrastructure noise.

Core (@typeagent/core, dependency-light):
- replay/wildcardValidator.ts: createWildcardMatchValidator with an INJECTED ReplayAppAgentLoader (no dispatcher dep), empty-object stub SessionContext, allowlist (timer/list/player), fail-open diagnostics, dispose->unloadAppAgent.
- grammarResolver.ts: selectValidatedMatchAction walks the ranked MatchResult[] (wildcardCharCount===0 auto-accepts, first accepted wins, all-rejected => needs-explanation); exposes wildcardValidationApplied. Working-tree only.
- studioRuntimeCore.ts: StudioReplayRequest.validateWildcards opt-in, resolveWildcardValidator injectable seam, StudioReplayResult.wildcardValidation summary, validator build/dispose lifecycle. Re-exported the API from @typeagent/core/runtime so the host can wire a real loader.

Host (studio-service):
- wildcardValidation.ts: a default loader that lazily imports default-agent-provider (marked external in the service bundle), so it resolves on the in-repo dev path and cleanly no-ops in the packaged .vsix. Gated on the allowlist so the import only fires for an allowlisted wildcard match.

Webview (typeagent-studio):
- A lit Validate toggle (mirrors the mode toggle), threaded through the run protocol message, with an honest sub-bar indicator: wildcard-validated, or a warning-tinted unavailable/skipped/no-validator/degraded from the diagnostics.

Tests: core +40 (wildcardValidator + grammarReplayResolver L4a), studio protocol expectations. core 189, studio 184, studio-service 21 green.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
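
The selection rule described for `selectValidatedMatchAction` (zero-wildcard matches auto-accept, first accepted match wins, all rejected yields needs-explanation, and only an explicit `false` rejects) can be sketched as below. This is an illustration under assumptions: the `RankedMatch`, `Validator`, and `Selection` shapes, the function name, and the diagnostic strings are all hypothetical, and the real validator takes richer arguments than a single match.

```typescript
// Hypothetical, simplified shapes; the real MatchResult carries more fields.
interface RankedMatch {
    action: string;
    wildcardCharCount: number;
}
type Validator = (match: RankedMatch) => Promise<boolean | undefined>;
type Selection =
    | { kind: "matched"; action: string }
    | { kind: "needs-explanation" };

// Walk matches in rank order and return the first one that is not rejected.
async function selectValidatedMatchSketch(
    matches: RankedMatch[],
    validate: Validator | undefined,
    diagnostics: string[],
): Promise<Selection> {
    for (const match of matches) {
        // No wildcard characters: nothing to validate, accept immediately.
        if (match.wildcardCharCount === 0) {
            return { kind: "matched", action: match.action };
        }
        let verdict: boolean | undefined;
        if (validate === undefined) {
            diagnostics.push("no-validator"); // fail open
        } else {
            try {
                verdict = await validate(match);
            } catch {
                diagnostics.push("errored"); // fail open on a throwing validator
            }
        }
        // Only an explicit false rejects; undefined or true accepts.
        if (verdict !== false) {
            return { kind: "matched", action: match.action };
        }
    }
    // Every candidate was explicitly rejected (or there were none).
    return { kind: "needs-explanation" };
}
```

Because a missing or throwing validator accepts the match and only records a diagnostic, infrastructure failures can never show up in the report as a lost match.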
A live integration smoke against the real built agent modules (via
default-agent-provider) surfaced that player's validateWildcardMatch throws
during replay: player runs execMode "separate", so the loader returns an RPC
proxy whose child process reconstructs its own SessionContext where agentContext
is undefined (our agentContext:{} stub is never serialized across the wire).
player then throws "Cannot read properties of undefined (reading 'spotify')"
before reaching its no-client self-degrade guard. Even without the throw it can
only ever return true without a live Spotify client, so it adds no fidelity --
it would just fail open with an `errored` diagnostic and a misleading "degraded"
indicator.
timer and list, by contrast, ignore the context and validate correctly over RPC
(timer even produces real rejects), so the allowlist is now timer/list only.
Updated the stub-context rationale, the webview toggle tooltip, and the tests.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Update the capability matrix, long-pole narrative, and next-slice list to reflect shipped two-mode replay (grammar/cache) and L4a live wildcard validation, with L4b (build-from-ref) deferred to P-7 (post-Gate-C). The live priority is now player corpus capture -> Gate C measurement. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Surface a per-side fidelity matrix in the Impact Report so a replay is honest about which deterministic layers actually ran (grammar, schema enrichment, construction cache, wildcard validation) and what building from a git ref would add, instead of over-claiming fidelity. Core: add FidelityLayer/SideFidelity types + sideFidelity on StudioReplayResult, populated by a pure, unit-tested deriveSideFidelity() in both the success and aborted paths. View model: pure toFidelityMatrix() with a build-from-ref preflight hint. Client+CSS: a collapsible fidelity panel with per-layer status icons and hover reasons. Also break a build cycle introduced by L4a: studio-service no longer declares default-agent-provider (it is aggregated by studio-agent, which depends on studio-service). The optional, externally-bundled dynamic import now resolves from the bundling extension (typeagent-studio), which owns the dependency; the specifier is indirected so tsc does not statically resolve it. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
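
A pure derivation of the kind described here might look like the sketch below. It covers only the three layers whose rules are stated elsewhere in this PR (grammar on both sides, construction cache and wildcard validation on the working-tree side only); the type shapes, layer keys, reason strings, and the function name are assumptions, and the real `deriveSideFidelity` also accounts for schema enrichment and for whether a validator was actually available.

```typescript
// Hypothetical shapes; the PR's FidelityLayer/SideFidelity types may differ.
type LayerKey = "grammar" | "construction-cache" | "wildcard-validation";
type Side = "working-tree" | "git-ref";
interface LayerStatus {
    ran: boolean;
    reason: string; // shown on hover in the matrix
}
type SideFidelitySketch = Record<LayerKey, LayerStatus>;
interface ReplayOptions {
    mode: "nfa-grammar" | "completionBased-cache";
    validateWildcards: boolean;
}

// Pure function: the same inputs always give the same matrix column.
function deriveSideFidelitySketch(
    side: Side,
    opts: ReplayOptions,
): SideFidelitySketch {
    const isWorkingTree = side === "working-tree";
    const cacheRan = isWorkingTree && opts.mode === "completionBased-cache";
    const wildcardRan = isWorkingTree && opts.validateWildcards;
    const notOnRef = "only applies to the working-tree side";
    return {
        grammar: { ran: true, reason: "compiled grammar matched" },
        "construction-cache": {
            ran: cacheRan,
            reason: cacheRan
                ? "cache mode on"
                : isWorkingTree
                  ? "cache mode off"
                  : notOnRef,
        },
        "wildcard-validation": {
            ran: wildcardRan,
            reason: wildcardRan
                ? "validation on"
                : isWorkingTree
                  ? "validation off"
                  : notOnRef,
        },
    };
}
```

Keeping the derivation free of I/O is what makes it straightforward to unit test and to call from both the success and aborted paths.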
…cleanup

Impact Report:
- Move the Cache and Wildcard validation toggles behind a gear 'Replay options' popover with VS Code-themed on/off pill switches (closes on outside-click/Escape, disabled mid-run). Add the settings-gear codicon glyph to the curated set so the icon-only button renders.
- Relabel the old Grammar/Cache mode toggle to a Cache on/off switch and Validate to Wildcard validation, with concise one-line tooltips.
- Remove the verbose source-side preflight hint from the fidelity matrix (and its CSS + tests) per UI 'avoid heavy text' guidance.

Corpora view:
- Add a FileSystemWatcher on **/*.utterances.jsonl so the tree refreshes on corpus create/change/delete instead of only on manual refresh (fixes the seed-then-save stale tree).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Impact Report panel was created with retainContextWhenHidden:false, so switching away from the tab tore down and reloaded the webview. On reload the client repainted its defaults and fired a single, no-retry 'ready' handshake to re-pull connection + agent + result from the host; when that reply failed to land on the reveal, the panel was stranded on 'Connecting...' with the agent/branch context blank. Add an opt-in retainContextWhenHidden option to WebviewKitPanelOptions (default false, no change for other panels) and enable it for the Impact Report so the webview keeps its DOM, live connection, selection, and rendered result while hidden. Navigating away and back no longer reloads the panel or depends on the reveal handshake. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
555c7e3 to
1a5fbb6
… and resolver doc

- Corpus watcher: scope the FileSystemWatcher to <repoRoot>/corpus (or a corpus-rooted glob when the repo root is unknown) instead of **/*.utterances.jsonl, so it no longer churns on unrelated utterance files across a large workspace.
- Event log and collisions trees: only await the connection while their buffer/cache is empty (initial load). Once events or scan results have been collected, render them immediately so a mid-session disconnect keeps the data visible instead of falling back to an indefinite loading bar on reconnect.
- grammarResolver: drop an accidental duplicate JSDoc block above selectValidatedMatchAction, keeping the single accurate one.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
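
The connection gating described for the event log and collisions trees can be illustrated with a small sketch. `getRowsToRender` and `waitForConnection` are hypothetical names, and the real providers work with VS Code tree items rather than a bare array.

```typescript
// Hypothetical helper: block on the connection only while nothing has been
// collected yet; once data exists, return it without waiting.
async function getRowsToRender<T>(
    buffer: readonly T[],
    waitForConnection: () => Promise<void>,
): Promise<readonly T[]> {
    if (buffer.length === 0) {
        // Initial load: nothing to show yet, so wait for the service.
        await waitForConnection();
    }
    // Mid-session disconnect: previously collected rows stay visible.
    return buffer;
}
```

With this shape a reconnect in progress never hides rows that were already collected, which is the behavior the commit describes.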
…explicit UI actions Drop the FileSystemWatcher on *.utterances.jsonl. The Corpora tree now refreshes only on in-extension actions (seeding an in-repo file, adding an external source, recording feedback) and the manual Refresh command -- each already calls corpusTree.refresh() directly -- rather than auto-updating on out-of-band edits to corpus files on disk. This supersedes the earlier watcher-scoping change for the same review thread. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Agents now opt their validateWildcardMatch into replay via a new optional AppAgentManifest field (replaySafeWildcardValidator); timer and list set it, preserving current behavior. The host gates on the manifest flag (reading it lazily, failing closed to grammar-only), so core becomes pure load-and-run mechanism: the hardcoded allowlist, its option, and the agent-not-in-allowlist diagnostic are removed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…y probe The Impact Report's wildcard-validation toggle previously keyed off a self-certified manifest flag (replaySafeWildcardValidator), which was an unclear tri-state config for agent authors. Replace it with a runtime capability probe: the host loads the agent and checks whether it exposes a validateWildcardMatch method (canValidateWildcards). - Drop the replaySafeWildcardValidator field from AppAgentManifest and the two manifests that set it (list, timer). - Rename the RPC and runtime plumbing isReplaySafe -> canValidateWildcards end to end (core runtime/protocol, studio-service host/resolver/handler, client, stub, spec). The default validator resolver no longer gates on the manifest. - Webview: disable the toggle (with a neutral 'no validator to run' note) when the agent has no validator; otherwise enable it, default off, and show a caution warning only when it is turned on (the real validator may have side effects or be non-deterministic, so replay results may not reproduce). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
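
A runtime capability probe of this kind reduces to checking whether the loaded agent exposes a callable `validateWildcardMatch`. The sketch below is illustrative only; the parameter type and the way the host obtains the loaded agent are assumptions.

```typescript
// Hypothetical probe: true only when the loaded agent object exposes a
// validateWildcardMatch method. A missing agent or a non-function property
// reports false, which the webview shows as a disabled toggle.
function canValidateWildcards(agent: unknown): boolean {
    if (typeof agent !== "object" || agent === null) {
        return false;
    }
    const candidate = (agent as { validateWildcardMatch?: unknown })
        .validateWildcardMatch;
    return typeof candidate === "function";
}
```

Probing the loaded object removes the need for agent authors to self-certify anything in the manifest; the capability is simply observed.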
Add a single 'Roadmap at a glance' section to STATUS.md (gate spine A-E plus a tagged off-critical-path backlog) so depth (L4b) and breadth (multi-variant/multi-agent) are backlog rows under the gates rather than a parallel plan. Promote the L4b sandbox-convergence and multi-variant compare design docs into the plan folder and index them in README. Refresh the stale top-of-file branch note. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…me corpus capture as agent-agnostic The per-side fidelity matrix (L4b Step 1) is already built and shipped, so split that backlog row out as done and leave only the optional Sandbox A/B relabel. Reword 'player corpus capture' to the general 'corpus capture' capability (works for any agent); player is only the anchor set Gate C is scored on. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Advances the "find a regression" journey by making replay's fidelity explicit and
the Impact Report controls easier to reason about — without adding heavy UI text.
Critical changes
Two-mode replay. The Impact Report can replay in `nfa-grammar` mode (both sides match the compiled grammar only, kept A/B-symmetric) or `completionBased-cache` mode (the working-tree side consults the live construction cache first, the way the dispatcher would).
Wildcard validation. An opt-in pass that runs the agent's real `validateWildcardMatch` over wildcard matches and drops the ones it rejects, for allowlisted agents only (player removed from the allowlist).
Fidelity transparency matrix. Each run reports a per-side matrix of which deterministic layers (grammar, schema enrichment, construction cache, wildcard validation, dispatch) actually ran on A vs B, with a status icon and a hover reason, so the report is honest about exactly what it exercised. Backed by a new `deriveSideFidelity` in the core runtime.
Replay-options popover. The Cache and Wildcard-validation toggles move behind a gear "Replay options" popover with VS Code-themed on/off pill switches (closes on outside-click/Escape, disabled mid-run). Tooltips trimmed to one concise line each.
Corpora auto-refresh. A `FileSystemWatcher` on `**/*.utterances.jsonl` refreshes the Corpora tree on corpus create/change/delete, fixing the stale tree after "Seed in-repo corpus…" + save.
Supporting changes
- Break a build cycle (`studio-service → default-agent-provider → studio-agent → studio-service`) by moving the `default-agent-provider` dependency to the bundling extension and resolving it via a tolerant, variable-indirected dynamic `import()` (kept external in the bundle).
- Update `STATUS.md` to reflect two-mode replay and wildcard validation.
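
A tolerant, variable-indirected dynamic import can be sketched as follows. `loadOptionalModule` is a hypothetical name; the point is only that passing the specifier as a variable keeps `tsc` from resolving it statically, and that a failed load resolves to `undefined` instead of throwing.

```typescript
// Hypothetical helper: try to load an optional module at runtime.
// Because the specifier is a variable rather than a string literal, the
// compiler does not try to resolve the module at build time.
async function loadOptionalModule(specifier: string): Promise<unknown> {
    try {
        return await import(specifier);
    } catch {
        // Module not installed or not resolvable in this packaging: no-op.
        return undefined;
    }
}
```

A caller would treat an `undefined` result as "feature unavailable" and continue grammar-only, which matches the clean no-op behavior the PR describes for the packaged extension.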
Testing
Studio suite green (187 tests); core fidelity spec added (16 tests); full `pnpm build` clean; typecheck and Prettier clean.
22 files changed (+2,492 / −40), 6 commits.