Skip to content

Run safe-outputs MCP in the gh-aw node container#39100

Open
Copilot wants to merge 4 commits into
mainfrom
copilot/configure-safe-outputs-mcp-server
Open

Run safe-outputs MCP in the gh-aw node container#39100
Copilot wants to merge 4 commits into
mainfrom
copilot/configure-safe-outputs-mcp-server

Conversation

Copilot AI commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

This changes safe-outputs from a separately started HTTP sidecar into a first-class MCP server entry. The generated workflow now runs safe-outputs inside the gh-aw-node container with write access to the workspace, safe-outputs runtime files, and the safe-outputs log directory.

  • What changed

    • Replaced the generated safe-outputs HTTP server config with a containerized stdio MCP server definition.
    • Added a dedicated published container constant for ghcr.io/github/gh-aw-node.
    • Mounted the required writable paths directly into the MCP server:
      • workspace
      • ${RUNNER_TEMP}/gh-aw/safeoutputs
      • /tmp/gh-aw/mcp-logs/safeoutputs
  • Workflow generation

    • Removed the dedicated “generate safe-outputs server config” / “start safe-outputs HTTP server” flow.
    • Kept safe-outputs runtime file generation (config.json, tools.json) but made those files inputs to the MCP container instead of a host-side server process.
    • Dropped the now-unused safe-outputs port/API-key wiring from generated env and gateway setup.
  • MCP server model

    • Safe-outputs is now emitted alongside the rest of the MCP server list instead of being bootstrapped out-of-band.
    • The generated MCP entry passes through the runtime env needed by the JS implementation (workspace, runner temp, safe-outputs config/tools paths, log dir, repo context).
  • Generated shape

    {
      "safeoutputs": {
        "container": "ghcr.io/github/gh-aw-node",
        "mounts": [
          "${GITHUB_WORKSPACE}:${GITHUB_WORKSPACE}:rw",
          "${RUNNER_TEMP}/gh-aw/safeoutputs:${RUNNER_TEMP}/gh-aw/safeoutputs:rw",
          "/tmp/gh-aw/mcp-logs/safeoutputs:/tmp/gh-aw/mcp-logs/safeoutputs:rw"
        ],
        "args": ["-w", "${GITHUB_WORKSPACE}"]
      }
    }
  • Related updates

    • Updated renderer and setup-generation expectations to reflect stdio container transport instead of HTTP transport.
    • Refreshed the affected golden outputs and focused MCP/safe-outputs workflow expectations.

Changeset\n\n- Type: patch\n- Description: Run safe-outputs as a containerized MCP server inside the runtime instead of a separate HTTP sidecar.

Generated by 📋 Changeset Generator for issue #39100 · 13.6 AIC · ⊞ 14.8K ·



✨ PR Review Safe Output Test - Run 27471836462

Warning

Firewall blocked 6 domains

The following domains were blocked by the firewall during workflow execution:

  • accounts.google.com
  • android.clients.google.com
  • clients2.google.com
  • contentautofill.googleapis.com
  • safebrowsingohttpgateway.googleapis.com
  • www.google.com

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "accounts.google.com"
    - "android.clients.google.com"
    - "clients2.google.com"
    - "contentautofill.googleapis.com"
    - "safebrowsingohttpgateway.googleapis.com"
    - "www.google.com"

See Network Configuration for more information.

💥 [THE END] — Illustrated by Smoke Claude · 84.9 AIC · ⌖ 7.94 AIC · ⊞ 8.1K ·

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title Configure safe-outputs MCP in gh-aw node container Run safe-outputs MCP in the gh-aw node container Jun 13, 2026
Copilot AI requested a review from pelikhan June 13, 2026 15:42
@pelikhan pelikhan marked this pull request as ready for review June 13, 2026 16:04
Copilot AI review requested due to automatic review settings June 13, 2026 16:04
@pelikhan pelikhan added the smoke label Jun 13, 2026
@github-actions

github-actions Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

💫 TO BE CONTINUED... Smoke Claude failed to deliver outputs! Our hero faces unexpected challenges...

@github-actions

github-actions Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

🚀 Smoke Antigravity MISSION COMPLETE! Antigravity has spoken. ✨

@github-actions

github-actions Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

Design Decision Gate 🏗️ failed to deliver outputs during design decision gate check.

@github-actions

github-actions Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

🧪 Test Quality Sentinel completed test quality analysis.

@github-actions

github-actions Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

@github-actions

github-actions Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

✅ All tools validated successfully! Agent Container Smoke Test confirms agent container is ready.

@github-actions

github-actions Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

⚠️ Smoke Gemini failed. Gemini encountered unexpected challenges...

@github-actions

github-actions Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

🚀 Smoke Pi MISSION COMPLETE! Pi delivered. 🥧

@github-actions

github-actions Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

🌑 The shadows whisper... Smoke Codex failed to deliver outputs. The oracle requires further meditation...

@github-actions

Copy link
Copy Markdown
Contributor

📰 BREAKING: Smoke Copilot is now investigating this pull request. Sources say the story is developing...

@github-actions

Copy link
Copy Markdown
Contributor

📰 BREAKING: Smoke Copilot - AOAI (Entra) is now investigating this pull request. Sources say the story is developing...

@github-actions

Copy link
Copy Markdown
Contributor

📰 BREAKING: Smoke Copilot - AOAI (apikey) is now investigating this pull request. Sources say the story is developing...

@github-actions

Copy link
Copy Markdown
Contributor

Agent Container Tool Check

Tool Status Version
bash 5.2.21
sh available
git 2.54.0
jq 1.7
yq 4.53.3
curl 8.5.0
gh 2.93.0
node 22.22.3
python3 3.11.15 (PyPy 7.3.23)
go 1.24.13
java 21.0.11 (Temurin)
dotnet 10.0.300

Result: 12/12 tools available ✅

Overall Status: PASS

🔧 Tool validation by Agent Container Smoke Test · 40.1 AIC · ⌖ 12.6 AIC · ⊞ 28.1K ·

@github-actions

Copy link
Copy Markdown
Contributor

Warning

The push_to_pull_request_branch operation failed: SECURITY: Post-apply file-protection check failed. The patch applied files that were not detected by the pre-apply parser: .changeset/patch-containerize-safe-outputs-mcp-server.md. This may indicate a patch-parser bypass attempt. Aborting push.. The code changes were not applied.

🏗️ Design Decision Gate — ADR Required

This PR makes significant changes to core business logic (233 new lines across pkg/ business directories) but does not have a linked Architecture Decision Record (ADR).

📄 Draft ADR committed: docs/adr/39100-run-safe-outputs-mcp-as-containerized-stdio-server.md — review and complete it before merging.

🔒 This PR cannot merge until an ADR is linked in the PR body.

📋 What to do next
  1. Review the draft ADR committed to your branch — it was generated from the PR diff
  2. Complete the missing sections — resolve the [TODO: verify] note on Alternative 2, refine the decision rationale, and confirm the real alternatives you considered
  3. Commit the finalized ADR to docs/adr/ on your branch
  4. Reference the ADR in this PR body by adding a line such as:

    ADR: ADR-39100: Run safe-outputs MCP as a containerized stdio server

Once an ADR is linked in the PR body, this gate will re-run and verify the implementation matches the decision.

❓ Why ADRs Matter

"AI made me procrastinate on key design decisions. Because refactoring was cheap, I could always say I will deal with this later. Deferring decisions corroded my ability to think clearly."

ADRs create a searchable, permanent record of why the codebase looks the way it does. Future contributors (and your future self) will thank you.

📋 Michael Nygard ADR Format Reference

An ADR must contain these four sections to be considered complete:

  • Context — What is the problem? What forces are at play?
  • Decision — What did you decide? Why?
  • Alternatives Considered — What else could have been done?
  • Consequences — What are the trade-offs (positive and negative)?

All ADRs are stored in docs/adr/ as Markdown files numbered by PR number (e.g., 0042-use-postgresql.md for PR #42).

🔒 This gate blocks merge until an ADR is linked in the PR body.

🏗️ ADR gate enforced by Design Decision Gate 🏗️ · 79 AIC · ⌖ 9.78 AIC · ⊞ 13.8K ·

@github-actions

Copy link
Copy Markdown
Contributor

Smoke Test: Codex - 27471836485

Warning

Firewall blocked 6 domains

The following domains were blocked by the firewall during workflow execution:

  • accounts.google.com
  • android.clients.google.com
  • clients2.google.com
  • contentautofill.googleapis.com
  • safebrowsingohttpgateway.googleapis.com
  • www.google.com

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "accounts.google.com"
    - "android.clients.google.com"
    - "clients2.google.com"
    - "contentautofill.googleapis.com"
    - "safebrowsingohttpgateway.googleapis.com"
    - "www.google.com"

See Network Configuration for more information.

🔮 The oracle has spoken through Smoke Codex · 85.6 AIC · ⊞ 10.6K ·

@github-actions

Copy link
Copy Markdown
Contributor

Comment Memory

Silent caches wake
Builds and browser lights align
Smoke test leaves a trail

Note

This comment is managed by comment memory.

It stores persistent context for this thread in the code block at the top of this comment.
Edit only the text inside the backtick fences; workflow metadata and the footer are regenerated automatically.

Learn more about comment memory

Warning

Firewall blocked 6 domains

The following domains were blocked by the firewall during workflow execution:

  • accounts.google.com
  • android.clients.google.com
  • clients2.google.com
  • contentautofill.googleapis.com
  • safebrowsingohttpgateway.googleapis.com
  • www.google.com

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "accounts.google.com"
    - "android.clients.google.com"
    - "clients2.google.com"
    - "contentautofill.googleapis.com"
    - "safebrowsingohttpgateway.googleapis.com"
    - "www.google.com"

See Network Configuration for more information.

🔮 The oracle has spoken through Smoke Codex · 85.6 AIC · ⊞ 10.6K ·

@github-actions

Copy link
Copy Markdown
Contributor

💥 Smoke Test: Claude — Run 27471836462

Core #1-12: ✅✅✅✅✅✅✅✅✅✅✅✅
PR Review #13-19: 13✅ 14✅ 15✅ 16✅ 17✅ 18✅ 19⚠️(skipped)

Overall: PARTIAL (all run; #19 close-PR skipped — no safe test PR). Claude engine nominal! 🚀

Warning

Firewall blocked 6 domains

The following domains were blocked by the firewall during workflow execution:

  • accounts.google.com
  • android.clients.google.com
  • clients2.google.com
  • contentautofill.googleapis.com
  • safebrowsingohttpgateway.googleapis.com
  • www.google.com

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "accounts.google.com"
    - "android.clients.google.com"
    - "clients2.google.com"
    - "contentautofill.googleapis.com"
    - "safebrowsingohttpgateway.googleapis.com"
    - "www.google.com"

See Network Configuration for more information.

💥 [THE END] — Illustrated by Smoke Claude · 84.9 AIC · ⌖ 7.94 AIC · ⊞ 8.1K ·

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💥 Automated smoke test review - all systems nominal!

Warning

Firewall blocked 6 domains

The following domains were blocked by the firewall during workflow execution:

  • accounts.google.com
  • android.clients.google.com
  • clients2.google.com
  • contentautofill.googleapis.com
  • safebrowsingohttpgateway.googleapis.com
  • www.google.com

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "accounts.google.com"
    - "android.clients.google.com"
    - "clients2.google.com"
    - "contentautofill.googleapis.com"
    - "safebrowsingohttpgateway.googleapis.com"
    - "www.google.com"

See Network Configuration for more information.

💥 [THE END] — Illustrated by Smoke Claude · 84.9 AIC · ⌖ 7.94 AIC · ⊞ 8.1K

// Using node:lts-alpine provides the latest LTS version with minimal footprint
const DefaultNodeAlpineLTSImage = "node:lts-alpine"

// DefaultGhAwNodeImage is the published gh-aw Node container image used for

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice — documenting DefaultGhAwNodeImage with the rationale for needing Node.js + git is helpful for future maintainers. Consider noting the expected tag/version policy here too.

const DefaultWorkspaceMount = "\\${GITHUB_WORKSPACE}:\\${GITHUB_WORKSPACE}:rw"

// DefaultSafeOutputsMount is the mount path for safe-outputs runtime files
// (config.json, tools.json, outputs.jsonl, upload-artifacts/) in the runner temp directory.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The safe-outputs mount constants are clear. Good use of GhAwRootDirShell for consistency across mounts.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skills-Based Review 🧠

Applied /grill-with-docs, /improve-codebase-architecture, /tdd, and /zoom-out — commenting with improvement suggestions; no blocking correctness issues found.

📋 Key Themes & Highlights

Key Themes

  • Stale comments: The renderSafeOutputsTOML function comment still says "HTTP transport instead of stdio" — the exact opposite of what the PR implements.
  • Test name drift: TestSafeOutputsAPIKeyImmediateMasking was not renamed despite its body and doc comment being completely rewritten to test the absence of an API key.
  • Dual sources of truth for env vars: TOML renderer hardcodes a raw env_vars string; JSON renderer uses a typed struct slice. They agree today across 14 vars, but will silently diverge on the next addition.
  • Untagged container image: ghcr.io/github/gh-aw-node with no version tag contrasts with the gateway pin (v0.3.25). Worth aligning policy or documenting the intent.
  • Unexplained escaping asymmetry in TOML: args uses $GITHUB_WORKSPACE while mounts (via DefaultWorkspaceMount) uses \${GITHUB_WORKSPACE} — a brief comment would prevent future confusion.

Positive Highlights

  • ✅ Clean removal of ~53 lines of HTTP server bootstrapping (port, API key, bash export block) — significant complexity reduction.
  • RUNNER_TEMP is now correctly threaded through to the gateway docker command (golden file updates confirm this).
  • buildSafeOutputsConfigRuntimeEnvVars / buildSafeOutputsConfigRuntimeData are still used for config.json generation — the regression concern (dynamic GH_AW_INPUT_* vars) is correctly handled by pre-resolving into config.json before the container starts.
  • ✅ New mount constants (DefaultSafeOutputsMount, DefaultSafeOutputsLogMount) are well-named and follow existing conventions.
  • ✅ Tests properly assert the new behaviour (container image, mounts, absence of HTTP artefacts) and the renamed TestSafeOutputsMCPContainerUsesRuntimePaths is a good improvement.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 585.4 AIC · ⌖ 40.9 AIC · ⊞ 29.4K

yaml.WriteString(" \n")
yaml.WriteString(" [mcp_servers." + constants.SafeOutputsMCPServerID.String() + ".headers]\n")
yaml.WriteString(" Authorization = \"$GH_AW_SAFE_OUTPUTS_API_KEY\"\n")
yaml.WriteString(" container = \"" + constants.DefaultGhAwNodeImage + "\"\n")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/grill-with-docs] Stale function comment one line above: // Now uses HTTP transport instead of stdio, similar to mcp-scripts now says the exact opposite of what the function does.

💡 Suggested fix

Change the comment to:

// renderSafeOutputsTOML generates Safe Outputs MCP configuration in TOML format.
// Uses containerized stdio transport running in the published gh-aw node image.

Left unchanged, this misleads the next reader into thinking this function configures HTTP, which is the old approach being replaced.

// is masked immediately after generation, before any other operations.
// TestSafeOutputsNoLongerGeneratesHTTPAPIKey verifies that safe-outputs no longer
// creates an HTTP server API key now that it runs as a stdio MCP container.
func TestSafeOutputsAPIKeyImmediateMasking(t *testing.T) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] Test name TestSafeOutputsAPIKeyImmediateMasking no longer describes what it tests. The updated doc comment above says it now verifies the absence of an HTTP API key, but the function name still implies it is testing masking behaviour.

💡 Suggested rename
func TestSafeOutputsNoLongerGeneratesHTTPAPIKey(t *testing.T) {

Test names are the first thing a failing CI shows. A reader hitting a failure in TestSafeOutputsAPIKeyImmediateMasking will look for API-key masking logic that no longer exists.

yaml.WriteString(" container = \"" + constants.DefaultGhAwNodeImage + "\"\n")
yaml.WriteString(" mounts = [\"" + constants.DefaultWorkspaceMount + "\", \"" + constants.DefaultSafeOutputsMount + "\", \"" + constants.DefaultSafeOutputsLogMount + "\"]\n")
yaml.WriteString(" args = [\"-w\", \"$GITHUB_WORKSPACE\"]\n")
yaml.WriteString(" env_vars = [\"DEBUG\", \"DEFAULT_BRANCH\", \"GH_AW_ASSETS_ALLOWED_EXTS\", \"GH_AW_ASSETS_BRANCH\", \"GH_AW_ASSETS_MAX_SIZE_KB\", \"GH_AW_MCP_LOG_DIR\", \"GH_AW_SAFE_OUTPUTS\", \"GH_AW_SAFE_OUTPUTS_CONFIG_PATH\", \"GH_AW_SAFE_OUTPUTS_TOOLS_PATH\", \"GITHUB_REPOSITORY\", \"GITHUB_SERVER_URL\", \"GITHUB_TOKEN\", \"GITHUB_WORKSPACE\", \"RUNNER_TEMP\"]\n")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/improve-codebase-architecture] The TOML renderer hardcodes env_vars as a raw string literal, while the JSON renderer (renderSafeOutputsMCPConfigWithOptions) drives the same list from a typed struct slice. These two sources currently agree on 14 vars, but they will drift the moment someone adds a var to one and forgets the other.

💡 Suggested approach

Extract a shared slice in mcp_renderer_builtin.go:

var safeOutputsEnvVarNames = []string{
    "DEBUG", "DEFAULT_BRANCH", "GH_AW_ASSETS_ALLOWED_EXTS",
    "GH_AW_ASSETS_BRANCH", "GH_AW_ASSETS_MAX_SIZE_KB",
    "GH_AW_MCP_LOG_DIR", "GH_AW_SAFE_OUTPUTS",
    "GH_AW_SAFE_OUTPUTS_CONFIG_PATH", "GH_AW_SAFE_OUTPUTS_TOOLS_PATH",
    "GITHUB_REPOSITORY", "GITHUB_SERVER_URL", "GITHUB_TOKEN",
    "GITHUB_WORKSPACE", "RUNNER_TEMP",
}

The TOML renderer builds its env_vars line from this slice; the JSON renderer iterates the same slice. A future add is then a single-line change with no risk of format-specific drift.

// DefaultGhAwNodeImage is the published gh-aw Node container image used for
// JavaScript-based MCP servers that need Node.js and git plus workspace-mounted
// repository scripts at runtime.
const DefaultGhAwNodeImage = "ghcr.io/github/gh-aw-node"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/zoom-out] ghcr.io/github/gh-aw-node has no version tag, implying latest. The gateway image is pinned (ghcr.io/github/gh-aw-mcpg:v0.3.25). For a safety-critical MCP server that handles write operations (issues, PRs, assets), floating on latest means a silent image change could alter tool behaviour between runs.

💡 Options
  • Pin a version tag (preferred for production stability): "ghcr.io/github/gh-aw-node:v1.2.3" — same pattern as the gateway.
  • Document the intent if latest is deliberate (e.g. the image is always forward-compatible and the team controls both the image and this constant together).

Either way, a comment explaining the versioning policy would help reviewers of future PRs understand why this differs from the gateway pin.

yaml.WriteString(" Authorization = \"$GH_AW_SAFE_OUTPUTS_API_KEY\"\n")
yaml.WriteString(" container = \"" + constants.DefaultGhAwNodeImage + "\"\n")
yaml.WriteString(" mounts = [\"" + constants.DefaultWorkspaceMount + "\", \"" + constants.DefaultSafeOutputsMount + "\", \"" + constants.DefaultSafeOutputsLogMount + "\"]\n")
yaml.WriteString(" args = [\"-w\", \"$GITHUB_WORKSPACE\"]\n")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/grill-with-docs] The TOML args field uses unescaped $GITHUB_WORKSPACE, while the JSON args at line 241 always uses \${GITHUB_WORKSPACE} (backslash-escaped) regardless of engine. Meanwhile, TOML mounts uses the backslash-escaped DefaultWorkspaceMount constant. The asymmetry within TOML (escaped mounts, unescaped args) is not explained and could mislead the next person editing either renderer.

💡 Suggestion

Add a short inline comment documenting the intent:

// TOML: args expand shell vars directly; mounts use \${...} for gateway-side expansion.
yaml.WriteString("          args = [\"-w\", \"$GITHUB_WORKSPACE\"]\n")

Or if the two renderers should use identical escaping for args, align them so the behaviour is consistent and auditable from a single place.

@github-actions

Copy link
Copy Markdown
Contributor

Run safe-outputs MCP in the gh-aw node container
PASS — §27471858644
1✅ 2✅ 3✅ 4✅ 5✅
6✅ 7✅ 8✅ 9✅ 10✅
11✅ 12✅ 13✅ 14✅ 15✅
Author: app/copilot-swe-agent
Assignees: pelikhan, Copilot

Warning

Firewall blocked 5 domains

The following domains were blocked by the firewall during workflow execution:

  • accounts.google.com
  • clients2.google.com
  • contentautofill.googleapis.com
  • safebrowsingohttpgateway.googleapis.com
  • www.google.com

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "accounts.google.com"
    - "clients2.google.com"
    - "contentautofill.googleapis.com"
    - "safebrowsingohttpgateway.googleapis.com"
    - "www.google.com"

See Network Configuration for more information.

📰 BREAKING: Report filed by Smoke Copilot · 250.4 AIC · ⌖ 23.4 AIC · ⊞ 20.4K ·

@github-actions

Copy link
Copy Markdown
Contributor

🧪 Test Quality Sentinel Report

Test Quality Score: 83/100 — Excellent

Analyzed 12 test(s): 12 design, 0 implementation, 0 guideline violation(s). This PR migrates safe-outputs from HTTP transport to containerized stdio MCP; all tests faithfully track the new behavioral contracts.

📊 Metrics & Test Classification (12 tests analyzed)
Metric Value
New/modified tests analyzed 12
✅ Design tests (behavioral contracts) 12 (100%)
⚠️ Implementation tests (low value) 0 (0%)
Tests with error/edge cases 9 (75%)
Duplicate test clusters 0
Test inflation detected Yes (context: migration/simplification PR — see notes)
🚨 Coding-guideline violations 0

Test Classification Details

Test File Classification Notes
TestSafeOutputsAPIKeyImmediateMasking mcp_api_key_masking_test.go ✅ Design Negative: no HTTP config/server; positive: container present
TestRenderSafeOutputsMCPConfigWithOptions mcp_config_refactor_test.go ✅ Design Table-driven (Copilot & Claude); updated expected/unexpected strings
TestRenderSafeOutputsMCPConfigTOML mcp_config_refactor_test.go ✅ Design TOML rendering contract; asserts container fields, negates HTTP fields
TestRenderSafeOutputsMCPConfigTOMLStableAcrossSandboxModes (new) mcp_config_refactor_test.go ✅ Design 3 subtests (nil, agent-enabled, agent-disabled); sandbox-invariant container config
TestRenderSafeOutputsMCP_JSON_Copilot mcp_renderer_test.go ✅ Design JSON contract: stdio type, container image, mounts, args, env vars
TestRenderSafeOutputsMCP_JSON_Claude mcp_renderer_test.go ✅ Design JSON contract: no type field, container, direct shell vars
TestRenderSafeOutputsMCP_TOML mcp_renderer_test.go ✅ Design TOML contract: container, mounts, args, env_vars; rejects HTTP fields
TestSafeOutputsMCPContainerUsesRuntimePaths (new) mcp_setup_generator_test.go ✅ Design Regression test (compile → read YAML); 9 assert/notContains with descriptive messages
TestSafeOutputsDynamicAllowedRepos safe_outputs_dynamic_allowed_repos_test.go ✅ Design Count threshold relaxed 2→1; env placeholder contract preserved
TestSafeOutputsDynamicAllowedReposSecrets safe_outputs_dynamic_allowed_repos_test.go ✅ Design Same relaxation for secrets env var propagation
TestSafeOutputsMCPServerIntegration safe_outputs_mcp_integration_test.go ✅ Design E2E integration: container, mounts, no dedicated HTTP server step, config file written
TestSafeOutputsMCPServerWithCodexEngine safe_outputs_mcp_integration_test.go ✅ Design Same contract for Codex/TOML engine variant

Language Support

Tests analyzed:

  • 🐹 Go (*_test.go): 12 tests — unit (//go:build !integration) and integration (//go:build integration)
  • 🟨 JavaScript (*.test.cjs, *.test.js): 0 tests changed in this PR
i️ Notes — No Action Required (2 items)

Test Inflation (mechanical penalty applied, migration context)

mcp_config_refactor_test.go (+39) vs mcp_config_builtin.go (+13) = 3.0×; mcp_setup_generator_test.go (+22) vs mcp_setup_generator.go (+5) = 4.4×. Both exceed the 2:1 threshold, triggering the −10 pt mechanical penalty. However, this is expected for a migration PR where production code is heavily simplified (mcp_setup_generator.go net: +5/−58) while tests update all assertions from HTTP to container format. Not a genuine quality concern.

TestRenderSafeOutputsMCPConfigTOMLStableAcrossSandboxModes — Consider a Clarifying Comment

All 3 subtests assert the same positive invariant (container present, mounts stable, no host rewriting). A brief inline comment explaining that sandbox-mode host rewriting used to vary this output would make the test's intent clearer for future readers.

Verdict

Check passed. 0% of new tests are implementation tests (threshold: 30%). No coding-guideline violations detected. All 12 tests enforce behavioral contracts from the user's perspective.

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

  • Assert on observable outputs, return values, or state changes
  • Cover error paths and boundary conditions
  • Would catch a behavioral regression if deleted
  • Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

  • Assert on internal function calls (mocking internals)
  • Only test the happy path with typical inputs
  • Break during legitimate refactoring even when behavior is correct
  • Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

References: §27471832444

🧪 Test quality analysis by Test Quality Sentinel · 484.1 AIC · ⌖ 28.7 AIC · ⊞ 28.3K ·

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Test Quality Sentinel: 83/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). All 12 tests enforce behavioral contracts. No coding-guideline violations detected.

@github-actions

Copy link
Copy Markdown
Contributor

@copilot review all comments and address unresolved review feedback.
Fix the safe-outputs container env passthrough for dynamic placeholders, then rerun the smoke tests.

Generated by 👨‍🍳 PR Sous Chef · 50.2 AIC · ⌖ 1.01 AIC · ⊞ 17.4K ·

@pelikhan

Copy link
Copy Markdown
Collaborator

@copilot recompile all lock files

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI commented Jun 13, 2026

Copy link
Copy Markdown
Contributor Author

@copilot recompile all lock files

Done — recompiled all 246 lock files. Also fixed a strings.Indexstrings.Cut lint issue in mcp_scripts_http_codex_test.go that was blocking the pre-push validation.

@github-actions

Copy link
Copy Markdown
Contributor

@copilot review all comments and address unresolved review feedback.

Please resolve the open TOML escaping thread, then rebase/update from main and rerun checks.

Generated by 👨‍🍳 PR Sous Chef · 52.3 AIC · ⌖ 1.07 AIC · ⊞ 17.4K ·

@pelikhan

Copy link
Copy Markdown
Collaborator

@copilot merge main
Run pr-finisher skill

…e-outputs-mcp-server

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI commented Jun 13, 2026

Copy link
Copy Markdown
Contributor Author

@copilot merge main
Run pr-finisher skill

Merged main into the branch (clean, no conflicts). All local checks pass (fmt, lint, test-unit). CI needs a human re-trigger before merge.

@github-actions

Copy link
Copy Markdown
Contributor

``
CI needs a human re-trigger after the main merge; once it passes, this looks ready.

Generated by 👨‍🍳 PR Sous Chef · 42.1 AIC · ⌖ 1.02 AIC · ⊞ 17.4K ·

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants