Skip to content

MCP stdio server respawned in an unbounded tight loop (no backoff / no max-retry) in 1.0.61 #3782

@carlosmayol

Description

@carlosmayol

Summary

After updating to 1.0.61, a stdio MCP server began being spawned repeatedly in a tight loop — hundreds to thousands of child processes — without the CLI awaiting the prior process''s initialize handshake. There is no backoff, no max-retry cap, and no configuration knob to bound the behavior. This did not occur on the prior version.

Environment

  • Copilot CLI: 1.0.61 (Windows, WinGet GitHub.Copilot)
  • OS: Windows 11
  • Affected server type: stdio MCP server with a slow cold start

Reproduction

  1. Register a stdio MCP server whose startup is slow (multi-second cold start). In our case the launch shape was:
    { "command": "cmd", "args": ["/c", "npx", "-y", "@scope/some-mcp-server"] }
    (cmd -> npx -> node resolver -> node server, plus an npm "resolve latest" network round-trip on every spawn.)
  2. Start a session that exercises this server early/often.
  3. Observe the CLI spawn a new server process roughly every ~60 ms without waiting for the previous one to complete its handshake.

Observed

  • ~8,300 spawn events in a single session; log file grew to 72 MB.
  • Deep process trees accumulate because each attempt is itself a multi-process tree that never gets a chance to finish initializing before the next attempt fires.
  • Began immediately after the 1.0.61 update (binary timestamped the evening before the loop first appeared); not present on the prior release.

Suspected root cause

A retry-on-MCP-startup behavior appears to have been introduced or changed in 1.0.61 that:

  • does not await the in-flight process''s initialize response before retrying,
  • has no exponential backoff,
  • has no maximum retry count,
  • exposes no configuration to tune any of the above (nothing in the changelog or MCP config docs).

When a server''s spawn latency exceeds the (apparently very short / zero) retry interval, retries pile up unbounded.

Suggested fix

  1. Await the prior spawn''s handshake (or a per-attempt timeout) before issuing a retry.
  2. Add exponential backoff with a sane cap.
  3. Add a max-retry ceiling, then surface a clear error instead of looping.
  4. Optionally expose maxRetries / retryDelayMs / backoff in the MCP server config so operators can tune it.

Workaround

Eliminated the slow spawn so the handshake wins the race: globally install + pin the package and launch its entry point directly (node <path>/bin/server.js) instead of cmd /c npx -y .... Drops the launch from a 4-process tree + network check to a single process and stops the explosion — but this only masks the underlying unbounded-retry defect.

Evidence available

72 MB session log with ~8,300 timestamped spawn events (redacted excerpt available on request).

Metadata

Metadata

Assignees

No one assigned

    Labels

    area:mcpMCP server configuration, discovery, connectivity, OAuth, policy, and registry

    Type

    No fields configured for Bug.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions