
MCP server debugging

Debugging an MCP server is harder than debugging a REST API because failures happen at four independent layers — transport (TCP/TLS), HTTP, MCP protocol, and tool execution — and a failure at any layer looks similar from the outside: the AI agent can't complete its request. The standard REST debugging workflow (open Chrome DevTools, look at the network tab) doesn't apply to a JSON-RPC-over-SSE protocol. You need layer-specific tools: the MCP Inspector for local development, protocol-level message logging for session diagnosis, structured log queries for production, and the AliveMCP probe history to understand which layer failed and when.

TL;DR

For local development: use MCP Inspector (npx @modelcontextprotocol/inspector) to interactively call your server's tools and inspect the raw JSON-RPC messages. For protocol-layer debugging: enable the MCP SDK's built-in transport logging with DEBUG=mcp:*. For production: query structured logs by session_id to trace a full session, filter by error_code to find failure patterns, and read the AliveMCP probe history to identify which layer failed first. The probe history gives you the first-failure timestamp behind your MTTD (when did it break?) and layer attribution (transport vs HTTP vs initialize vs tools/list), neither of which logs alone can provide.

MCP Inspector for local debugging

The official MCP Inspector is the fastest way to test your server locally. It's a browser-based UI that connects to your server, runs the full MCP session lifecycle, and lets you call tools interactively with a form interface:

# Install and run against your local server
npx @modelcontextprotocol/inspector http://localhost:3001/mcp

# For stdio servers:
npx @modelcontextprotocol/inspector node /path/to/your/server.js

The Inspector shows you the raw JSON-RPC messages for every request and response, which is essential for debugging protocol compliance issues. If your server returns an initialize response that's missing the capabilities field, you'll see it immediately in the message pane — the same field that AliveMCP's production probe validates. The Inspector's tool call form also shows you the exact inputSchema your server advertises, which is useful for catching schema drift before running a tools/list snapshot comparison.

Three things to check with the Inspector before considering a local server debugged:

  1. The initialize response has protocolVersion, capabilities, and serverInfo.name.
  2. The tools/list response lists all expected tools with correct input schemas.
  3. Each tool call with minimal valid input returns a result, not a JSON-RPC error.
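The three checks above can also be scripted against responses you copy out of the Inspector's message pane. The following is a minimal sketch, not part of the Inspector or the MCP SDK: the function names are hypothetical, and it assumes you pass in the `result` object of each response (or the whole JSON-RPC response for the tool call check).

```typescript
// Hypothetical helpers; only the field names come from the checklist above.
type Json = Record<string, unknown>;

function isJsonObject(value: unknown): value is Json {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Check 1: initialize result has protocolVersion, capabilities, serverInfo.name.
// Returns a list of problems; an empty list means the check passed.
function checkInitializeResult(result: unknown): string[] {
  if (!isJsonObject(result)) return ["initialize result is not an object"];
  const problems: string[] = [];
  if (typeof result["protocolVersion"] !== "string") problems.push("missing protocolVersion");
  if (!isJsonObject(result["capabilities"])) problems.push("missing capabilities");
  const serverInfo = result["serverInfo"];
  if (!isJsonObject(serverInfo) || typeof serverInfo["name"] !== "string") {
    problems.push("missing serverInfo.name");
  }
  return problems;
}

// Check 2: tools/list result contains every expected tool, each with an inputSchema.
function checkToolsList(result: unknown, expectedTools: string[]): string[] {
  const tools = isJsonObject(result) ? result["tools"] : undefined;
  if (!Array.isArray(tools)) return ["tools/list result has no tools array"];
  const byName = new Map<string, Json>();
  for (const tool of tools as unknown[]) {
    if (!isJsonObject(tool)) continue;
    const name = tool["name"];
    if (typeof name === "string") byName.set(name, tool);
  }
  const problems: string[] = [];
  for (const name of expectedTools) {
    const tool = byName.get(name);
    if (!tool) problems.push(`missing tool: ${name}`);
    else if (!isJsonObject(tool["inputSchema"])) problems.push(`tool ${name} has no inputSchema`);
  }
  return problems;
}

// Check 3: a full JSON-RPC response to a tool call carries a result, not an error.
function checkToolCallResponse(response: unknown): string[] {
  if (!isJsonObject(response)) return ["response is not an object"];
  if ("error" in response) return ["tool call returned a JSON-RPC error"];
  if (!("result" in response)) return ["response has neither result nor error"];
  return [];
}
```

This sketch only checks that each tool advertises an inputSchema; comparing the schema contents against what you expect is left to a snapshot test.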

Protocol-level message logging

When the Inspector doesn't reproduce the issue (production-only or client-specific bugs), enable the MCP SDK's transport-level debug logging. This logs every raw JSON-RPC message to stderr:

# Node.js MCP SDK — enable transport debug logging
DEBUG=mcp:* node dist/index.js

# This logs every message: initialize requests, tools/list responses,
# tool call requests, results, and errors — before any application
# logic runs. Useful for diagnosing:
# - Malformed JSON from the client
# - Missing fields in your server's responses
# - Unexpected method names sent by specific clients
# - Session establishment failures

# WARNING: This logs the full JSON-RPC message body,
# including tool call arguments and results.
# Never enable it in production: tool call arguments and
# results may contain PII or sensitive user data.
# Use only in local or staging environments with synthetic data.

The DEBUG=mcp:* output shows the exact JSON-RPC message the client sends and the exact message your server returns. This is the right tool for diagnosing "works with Claude Desktop but not with the Anthropic Agent SDK": two clients can send slightly different initialize parameters, and seeing both raw messages side by side makes the difference obvious.
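When the two captured messages are long, a small structural diff is easier to read than two log lines. The sketch below is illustrative only: `diffJson` is a hypothetical helper, and the two parameter objects are made-up examples rather than what any real client sends.

```typescript
function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Recursively collect the paths at which two JSON values differ.
// Objects are compared key by key; arrays and primitives are compared
// by their serialized form, so array element order matters.
function diffJson(a: unknown, b: unknown, path: string = "$"): string[] {
  if (isPlainObject(a) && isPlainObject(b)) {
    const keys = Array.from(new Set(Object.keys(a).concat(Object.keys(b)))).sort();
    const differences: string[] = [];
    for (const key of keys) {
      for (const line of diffJson(a[key], b[key], `${path}.${key}`)) {
        differences.push(line);
      }
    }
    return differences;
  }
  if (JSON.stringify(a) === JSON.stringify(b)) return [];
  return [`${path}: ${JSON.stringify(a)} vs ${JSON.stringify(b)}`];
}

// Made-up initialize params for two hypothetical clients.
const clientA = {
  protocolVersion: "2024-11-05",
  capabilities: { roots: { listChanged: true } },
  clientInfo: { name: "client-a", version: "1.0.0" },
};
const clientB = {
  protocolVersion: "2025-03-26",
  capabilities: {},
  clientInfo: { name: "client-b", version: "1.0.0" },
};

const initializeDifferences = diffJson(clientA, clientB);
```

Paste the `params` objects from your own debug output in place of `clientA` and `clientB`; each entry in `initializeDifferences` names one path where the clients disagree.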

Production diagnosis by failure layer

AliveMCP categorizes every probe failure by the layer where it occurred. The layer tells you immediately what class of problem you're debugging:

Layer 1: Transport failure (TCP/TLS)

The probe gets ECONNREFUSED, ETIMEDOUT, or a TLS certificate error when it tries to connect. The server is unreachable at the network level.

Layer 2: HTTP failure (4xx / 5xx)

The TCP connection succeeds, but the server returns an HTTP error code instead of a JSON-RPC response.

Layer 3: Protocol failure (initialize returns error)

HTTP 200 but the initialize response is malformed or returns a JSON-RPC error. This is almost always a code bug introduced by a recent deploy.

Layer 4: Tool surface failure (tools/list empty or error)

Initialize succeeds but tools/list returns an empty array or a JSON-RPC error.
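The four layers form an ordered decision: a failure is attributed to the first layer that did not pass. The sketch below illustrates that ordering only. The `ProbeOutcome` shape and `classifyProbe` function are hypothetical and are not AliveMCP's implementation or API.

```typescript
// Hypothetical summary of one probe run; field names are assumptions.
type ProbeOutcome = {
  networkError?: string;     // e.g. "ECONNREFUSED", "ETIMEDOUT", or a TLS error code
  httpStatus?: number;       // set once the TCP/TLS connection succeeded
  initializeError?: boolean; // malformed initialize response or JSON-RPC error
  toolsListError?: boolean;  // tools/list returned a JSON-RPC error
  toolCount?: number;        // number of tools returned by tools/list
};

type FailureLayer = "transport" | "http" | "protocol" | "tool-surface" | "none";

// Walk the layers from the outside in and report the first one that failed.
function classifyProbe(outcome: ProbeOutcome): FailureLayer {
  if (outcome.networkError) return "transport";
  if (outcome.httpStatus !== undefined && outcome.httpStatus >= 400) return "http";
  if (outcome.initializeError) return "protocol";
  if (outcome.toolsListError || outcome.toolCount === 0) return "tool-surface";
  return "none";
}
```

Checking the layers in this order matters: a server that refuses connections also "fails" initialize, but the transport failure is the one to debug first.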

Structured log queries for production diagnosis

When AliveMCP reports a failure, the first thing to check is your structured logs. A session that failed mid-flight leaves a trail:

# Query pattern for diagnosing a session failure (Loki LogQL, adapt for your aggregator)

# All events for a specific session:
{app="mcp-server"} | json | session_id="sess_abc123"

# All errors (set the query time range to the last hour in your UI or API call):
{app="mcp-server"} | json | level="error"

# Tool calls slower than 2 seconds (likely downstream timeout):
{app="mcp-server"} | json | tool_name!="" | duration_ms > 2000

# Failed initializations (probe or real client).
# A missing error_code label compares as "", so !="" keeps only lines that carry a code:
{app="mcp-server"} | json | msg="mcp.initialize" | error_code!=""

# Find the session corresponding to an AliveMCP alert
# (AliveMCP probe has clientInfo.name = "AliveMCP"):
{app="mcp-server"} | json | client_name="AliveMCP" | level="error"

Cross-reference the timestamp of an AliveMCP alert with your server's log events at that same timestamp. If there are no log events at that timestamp, the server was down and producing no logs — the absence of logs is itself diagnostic. If there are error-level log events, read the stack traces.
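That cross-referencing step can be expressed as a small function over exported log events. This is a sketch under assumptions: the `LogEvent` shape (ISO 8601 `time`, `level`, `msg`) and the 60-second window are illustrative choices, not fields or defaults defined by AliveMCP or your log aggregator.

```typescript
// Assumed shape of an exported log event; adapt to your logger's fields.
type LogEvent = { time: string; level: string; msg: string };

type Correlation =
  | { kind: "no-logs" }                       // server was likely down: nothing logged
  | { kind: "errors"; events: LogEvent[] }    // error-level events to read
  | { kind: "no-errors"; eventCount: number }; // server was logging, but no errors

// Find log events within windowMs of the alert timestamp and summarize them.
function correlateAlert(
  alertTime: string,
  logs: LogEvent[],
  windowMs: number = 60000,
): Correlation {
  const center = Date.parse(alertTime);
  const nearby = logs.filter(
    (event) => Math.abs(Date.parse(event.time) - center) <= windowMs,
  );
  if (nearby.length === 0) return { kind: "no-logs" };
  const errors = nearby.filter((event) => event.level === "error");
  if (errors.length > 0) return { kind: "errors", events: errors };
  return { kind: "no-errors", eventCount: nearby.length };
}
```

A `no-errors` result is worth a second look: the server was up and logging, so the failure may sit in front of it (load balancer, TLS termination) rather than in your code.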

Debugging TypeScript MCP servers with the Node.js debugger

For TypeScript servers, attach the Node.js debugger to step through tool handler code instead of scattering console.log calls:

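# Start the server in debug mode (pauses before the first line runs).
# Note: --loader is an experimental Node.js flag. If it fails on your
# Node.js version, a TypeScript runner such as tsx is one alternative:
#   npx tsx --inspect-brk src/index.ts
node --inspect-brk --loader ts-node/esm src/index.ts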

# Or for a built server:
node --inspect dist/index.js

# Open Chrome and navigate to chrome://inspect
# Click "Open dedicated DevTools for Node"
# Set breakpoints in the Sources tab

To debug a live server without pausing it at startup, use the --inspect flag (not --inspect-brk) and connect only when actively debugging. Never expose the inspector port (default 9229) publicly: anyone who can reach it can execute code in your process. Keep it bound to 127.0.0.1, which is the default, and use SSH tunneling if you need to connect to a remote server. See MCP server TypeScript for TypeScript-specific tooling and type safety patterns for tool handlers.

Related questions

How do I reproduce a production issue locally?

Production-only issues are usually caused by: missing environment variables, production-specific data shapes, or different client behavior (production uses Claude Desktop; local dev uses MCP Inspector). Reproduce by: (1) confirming all production env vars are set in your local .env; (2) using the exact same request payload from production logs; (3) running MCP Inspector with the same protocol version as the failing client. If the issue is client-specific, add a test that mimics that client's initialize parameters. See MCP server testing for the client-simulation pattern.
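One way to mimic a specific client is to rebuild its initialize request from the values you see in production logs. The sketch below assumes the standard initialize parameter names (protocolVersion, capabilities, clientInfo); the `ClientProfile` type, the `buildInitializeRequest` helper, and the example values are hypothetical placeholders to replace with what your logs show.

```typescript
// Parameters a client sends in its initialize request.
type ClientProfile = {
  protocolVersion: string;
  capabilities: Record<string, unknown>;
  clientInfo: { name: string; version: string };
};

type InitializeRequest = {
  jsonrpc: "2.0";
  id: number;
  method: "initialize";
  params: ClientProfile;
};

// Build the JSON-RPC initialize request a given client profile would send.
function buildInitializeRequest(profile: ClientProfile, id: number = 1): InitializeRequest {
  return { jsonrpc: "2.0", id, method: "initialize", params: profile };
}

// Placeholder profile: copy the real values from the failing session's logs.
const failingClient: ClientProfile = {
  protocolVersion: "2024-11-05",
  capabilities: {},
  clientInfo: { name: "example-client", version: "0.0.0" },
};

const initializeBody = JSON.stringify(buildInitializeRequest(failingClient));
```

Send `initializeBody` to your local server from a test, then assert on the response the same way you would for any other client.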

How do I debug a server that works locally but fails in production?

The most common cause is a missing environment variable that's set locally in .env but not in the production secret store. Enable startup validation that throws with a specific missing-variable message — the error will appear in your deploy logs. Second most common: a port binding to 127.0.0.1 instead of 0.0.0.0 inside a container. Third: a Node.js version mismatch between local and production that causes subtle behavior differences. Check node --version locally against the version in your Dockerfile or platform's Node.js image tag.
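Startup validation of this kind can be a few lines. The following is a minimal sketch: `requireEnv` is a hypothetical helper, and the variable names in the usage comment are placeholders for whatever your server actually needs.

```typescript
// Return the requested variables, or throw one error naming every missing one.
// Empty strings are treated as missing, since an empty secret is rarely valid.
function requireEnv(
  env: Record<string, string | undefined>,
  names: string[],
): Record<string, string> {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  const values: Record<string, string> = {};
  for (const name of names) {
    values[name] = env[name] as string;
  }
  return values;
}

// Usage at the top of your entry point, before the server starts listening
// (placeholder variable names):
//   const config = requireEnv(process.env, ["DATABASE_URL", "UPSTREAM_API_KEY"]);
```

Listing every missing variable in one message avoids the deploy, fail, fix one variable, redeploy loop.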

My server passes the initialize probe but tool calls fail. How do I diagnose this?

Tool call failures after a passing initialize are almost always application-layer issues: a wrong API key, a database connection failure, or a bug in the tool handler. The AliveMCP probe only verifies initialize and tools/list; it doesn't call your tools with real arguments. Check tool.error log lines for the error message and error code. If error_code = "ECONNREFUSED" on a tool that calls an external API, the downstream API is unreachable from production. Run a connectivity test from inside your production container, for example on Fly.io: flyctl ssh console --app your-app -C "curl -I https://api.openai.com".
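For tool.error lines to carry a usable error_code, the handler has to record one. Below is a sketch of a wrapper that does this, assuming your logger accepts a plain object; `withToolLogging`, `extractErrorCode`, and the `ToolLog` shape are hypothetical, with field names chosen to match the log queries used in this guide.

```typescript
type ToolLog = {
  msg: "tool.ok" | "tool.error";
  tool_name: string;
  duration_ms: number;
  error_code?: string;
  error_message?: string;
};

// Node.js network errors expose a string `code` (e.g. "ECONNREFUSED").
// Some HTTP clients wrap the original error in `cause`, so look there too.
function extractErrorCode(err: unknown): string {
  let current: unknown = err;
  for (let depth = 0; depth < 3 && typeof current === "object" && current !== null; depth++) {
    const candidate = current as { code?: unknown; cause?: unknown };
    if (typeof candidate.code === "string") return candidate.code;
    current = candidate.cause;
  }
  return "UNKNOWN";
}

// Run a tool handler, log its duration and outcome, and rethrow any failure.
async function withToolLogging<T>(
  toolName: string,
  handler: () => Promise<T>,
  log: (entry: ToolLog) => void,
): Promise<T> {
  const started = Date.now();
  try {
    const result = await handler();
    log({ msg: "tool.ok", tool_name: toolName, duration_ms: Date.now() - started });
    return result;
  } catch (err) {
    log({
      msg: "tool.error",
      tool_name: toolName,
      duration_ms: Date.now() - started,
      error_code: extractErrorCode(err),
      error_message: err instanceof Error ? err.message : String(err),
    });
    throw err;
  }
}
```

The wrapper rethrows so the caller still decides how the failure is reported to the client; it only guarantees that a log line exists for every failed call.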

Further reading