MCP server debugging
Debugging an MCP server is harder than debugging a REST API because failures happen at four independent layers — transport (TCP/TLS), HTTP, MCP protocol, and tool execution — and a failure at any layer looks similar from the outside: the AI agent can't complete its request. The standard REST debugging workflow (open Chrome DevTools, look at the network tab) doesn't apply to a JSON-RPC-over-SSE protocol. You need layer-specific tools: the MCP Inspector for local development, protocol-level message logging for session diagnosis, structured log queries for production, and the AliveMCP probe history to understand which layer failed and when.
TL;DR
For local development: use MCP Inspector (npx @modelcontextprotocol/inspector) to interactively call your server's tools and inspect the raw JSON-RPC messages. For protocol-layer debugging: enable the MCP SDK's built-in transport logging with DEBUG=mcp:*. For production: query structured logs by session_id to trace a full session, filter by error_code to find failure patterns, and read the AliveMCP probe history to identify which layer failed first. The probe history gives you MTTD (when did it break?) and layer attribution (transport vs HTTP vs initialize vs tools/list) that logs alone can't provide.
MCP Inspector for local debugging
The official MCP Inspector is the fastest way to test your server locally. It's a browser-based UI that connects to your server, runs the full MCP session lifecycle, and lets you call tools interactively with a form interface:
# Install and run against your local server
npx @modelcontextprotocol/inspector http://localhost:3001/mcp
# For stdio servers:
npx @modelcontextprotocol/inspector node /path/to/your/server.js
The Inspector shows you the raw JSON-RPC messages for every request and response, which is essential for debugging protocol compliance issues. If your server returns an initialize response that's missing the capabilities field, you'll see it immediately in the message pane — the same field that AliveMCP's production probe validates. The Inspector's tool call form also shows you the exact inputSchema your server advertises, which is useful for catching schema drift before running a tools/list snapshot comparison.
Three things to check with the Inspector before considering a local server debugged:
- The initialize response has protocolVersion, capabilities, and serverInfo.name.
- The tools/list response lists all expected tools with correct input schemas.
- Each tool call with minimal valid input returns a result, not a JSON-RPC error.
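The first of these checks can also live in a unit test, so a regression shows up before you ever open the Inspector. Below is a minimal sketch. It assumes the initialize response has already been parsed from JSON, and `checkInitializeResponse` is a hypothetical helper name. It checks only the three fields listed above, not the full MCP schema.

```typescript
// Only the fields this check reads are typed; everything else is ignored.
interface JsonRpcResponse {
  jsonrpc?: string;
  id?: number | string | null;
  result?: Record<string, unknown>;
  error?: { code: number; message: string };
}

// Returns human-readable problems; an empty array means the three
// checked fields are present with plausible types.
function checkInitializeResponse(resp: JsonRpcResponse): string[] {
  if (resp.error) {
    return [`JSON-RPC error ${resp.error.code}: ${resp.error.message}`];
  }
  const result = resp.result;
  if (!result) return ["missing result object"];

  const problems: string[] = [];
  if (typeof result.protocolVersion !== "string") {
    problems.push("result.protocolVersion is missing or not a string");
  }
  const caps = result.capabilities;
  if (typeof caps !== "object" || caps === null) {
    problems.push("result.capabilities is missing or not an object");
  }
  const serverInfo = result.serverInfo as { name?: unknown } | undefined;
  const name = serverInfo?.name;
  if (typeof name !== "string" || name === "") {
    problems.push("result.serverInfo.name is missing or empty");
  }
  return problems;
}

// Example: a response that forgot the capabilities field.
const problems = checkInitializeResponse({
  jsonrpc: "2.0",
  id: 1,
  result: {
    protocolVersion: "2024-11-05",
    serverInfo: { name: "demo-server", version: "1.0.0" },
  },
});
// problems contains a single entry about result.capabilities
```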
Protocol-level message logging
When the Inspector doesn't reproduce the issue (production-only or client-specific bugs), enable the MCP SDK's transport-level debug logging. This logs every raw JSON-RPC message to stderr:
# Node.js MCP SDK — enable transport debug logging
DEBUG=mcp:* node dist/index.js
# This logs every message: initialize requests, tools/list responses,
# tool call requests, results, and errors — before any application
# logic runs. Useful for diagnosing:
# - Malformed JSON from the client
# - Missing fields in your server's responses
# - Unexpected method names sent by specific clients
# - Session establishment failures
# WARNING: This logs the full JSON-RPC message body.
# Tool call arguments and results will appear in the output.
# Never enable in production — debug tool call arguments and
# results may contain PII or sensitive user data.
# Use only in local or staging environments with synthetic data.
The DEBUG=mcp:* output will show you the exact byte sequence the client sends and the exact byte sequence your server returns. This is the correct tool for diagnosing "works with Claude Desktop but not with the Anthropic Agent SDK" — the two clients send slightly different initialize parameters, and seeing both raw messages side-by-side makes the difference obvious.
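Once you have captured the raw initialize parameters from both clients, a small structural diff can list exactly where they differ, so you don't have to rely on eyeballing. Below is a minimal sketch. The two parameter objects are hypothetical placeholders. They are not the actual messages sent by Claude Desktop or the Agent SDK, so paste in the ones from your own debug output.

```typescript
// Minimal JSON value type for the comparison.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Returns the paths (e.g. "$.clientInfo.name") where two JSON values differ.
// A path is reported when primitives differ, when the value types differ,
// or when a key exists on only one side.
function diffJson(a: Json | undefined, b: Json | undefined, path = "$"): string[] {
  if (a === b) return [];
  const bothContainers =
    typeof a === "object" && a !== null &&
    typeof b === "object" && b !== null &&
    Array.isArray(a) === Array.isArray(b);
  if (!bothContainers) return [path];

  const left = a as Record<string, Json>;
  const right = b as Record<string, Json>;
  // Union of keys from both sides, without duplicates.
  const keys = Object.keys(left).concat(Object.keys(right).filter((k) => !(k in left)));
  const diffs: string[] = [];
  for (const key of keys) {
    diffs.push(...diffJson(left[key], right[key], `${path}.${key}`));
  }
  return diffs;
}

// Hypothetical initialize params captured from two clients' debug output.
const fromClientA: Json = {
  protocolVersion: "2024-11-05",
  capabilities: {},
  clientInfo: { name: "client-a", version: "1.0.0" },
};
const fromClientB: Json = {
  protocolVersion: "2025-03-26",
  capabilities: { roots: { listChanged: true } },
  clientInfo: { name: "client-b", version: "1.0.0" },
};
const differences = diffJson(fromClientA, fromClientB);
// → ["$.protocolVersion", "$.capabilities.roots", "$.clientInfo.name"]
```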
Production diagnosis by failure layer
AliveMCP categorizes every probe failure by the layer where it occurred. The layer tells you immediately what class of problem you're debugging:
Layer 1: Transport failure (TCP/TLS)
Probe gets connect ECONNREFUSED, ETIMEDOUT, or a TLS certificate error. The server is unreachable at the network level.
- Check that the container/process is running: flyctl status --app your-app or equivalent.
- Check that the port binding is correct: inside a container, the process must listen on 0.0.0.0, not 127.0.0.1.
- Check TLS certificate validity and expiry: curl -v https://your-domain.com/mcp 2>&1 | grep -E 'SSL|certificate|expire'.
- Check that your CDN/reverse proxy is passing traffic through. A Cloudflare misconfiguration can produce transport failures without the origin server changing at all.
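If you run your own smoke test from Node, you can map the error codes Node reports onto the checklist above. Below is a minimal sketch; `transportHint` and `tcpCheck` are hypothetical names. Node's built-in fetch (undici) reports socket and TLS failures as `TypeError: fetch failed` and puts the original error in `cause`, so the sketch checks both places. The list of codes is illustrative, not exhaustive.

```typescript
import { connect } from "node:net";

// Maps a Node.js network error to a Layer 1 hint.
function transportHint(err: unknown): string {
  const e = err as { code?: string; cause?: { code?: string } } | null | undefined;
  const code = e?.code ?? e?.cause?.code ?? "UNKNOWN";
  switch (code) {
    case "ECONNREFUSED":
      return "Nothing listening: process down, wrong port, or bound to 127.0.0.1 in a container";
    case "ETIMEDOUT":
    case "UND_ERR_CONNECT_TIMEOUT":
      return "Connect timed out: firewall, proxy, or host unreachable";
    case "ENOTFOUND":
      return "DNS lookup failed: check the hostname";
    case "CERT_HAS_EXPIRED":
      return "TLS certificate has expired";
    case "ERR_TLS_CERT_ALTNAME_INVALID":
      return "TLS certificate does not cover this hostname";
    case "DEPTH_ZERO_SELF_SIGNED_CERT":
    case "SELF_SIGNED_CERT_IN_CHAIN":
    case "UNABLE_TO_VERIFY_LEAF_SIGNATURE":
      return "TLS chain not trusted: self-signed or missing intermediate certificate";
    default:
      return `Unclassified transport error (${code})`;
  }
}

// Plain TCP connect check: resolves if the port accepts a connection.
function tcpCheck(host: string, port: number, timeoutMs = 3000): Promise<void> {
  return new Promise((resolve, reject) => {
    const socket = connect({ host, port });
    socket.setTimeout(timeoutMs);
    socket.once("connect", () => {
      socket.end();
      resolve();
    });
    socket.once("timeout", () => {
      socket.destroy();
      reject(Object.assign(new Error(`connect to ${host}:${port} timed out`), { code: "ETIMEDOUT" }));
    });
    socket.once("error", reject);
  });
}
```

Usage: `tcpCheck("your-domain.com", 443).then(() => console.log("TCP ok"), (err) => console.log(transportHint(err)))`. A successful TCP connect with a failing HTTPS request points at TLS or the proxy rather than the process.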
Layer 2: HTTP failure (4xx / 5xx)
TCP connects but the server returns an HTTP error code instead of a JSON-RPC response.
- 401/403: authentication is required but the probe's credentials are wrong or missing. Check your server's auth middleware and the probe's authentication configuration.
- 404: the MCP endpoint path is wrong. Check your routing configuration. AliveMCP probes the path you registered; if you changed /mcp to /api/mcp, update the monitoring config.
- 502/503/504: the reverse proxy (Caddy, nginx, Fly.io's proxy layer) can't reach the upstream server. The container may have crashed or may be taking too long to start. Check the container logs: flyctl logs --app your-app.
- 500: the server started, but the HTTP request handler threw an unhandled exception. Check the server's error logs for the stack trace.
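If a deploy check or smoke test of your own hits the endpoint, you can encode the status-code triage above so the failure message already names the likely cause. Below is a minimal sketch. `httpHint` is a hypothetical name, and the hint strings simply restate the list above.

```typescript
// Maps an HTTP status from the MCP endpoint to the Layer 2 checklist.
function httpHint(status: number): string {
  if (status === 401 || status === 403) {
    return "Auth: check the auth middleware and the probe's credentials";
  }
  if (status === 404) {
    return "Routing: the MCP endpoint path moved or is misconfigured";
  }
  if (status === 502 || status === 503 || status === 504) {
    return "Proxy cannot reach upstream: container crashed or is slow to start";
  }
  if (status === 500) {
    return "Unhandled exception in the request handler: read the stack trace";
  }
  if (status >= 400) return `Other HTTP error ${status}: check server and proxy logs`;
  return "No HTTP-layer failure: move on to the protocol layer";
}
```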
Layer 3: Protocol failure (initialize returns error)
HTTP 200 but the initialize response is malformed or returns a JSON-RPC error. This is almost always a code bug introduced by a recent deploy.
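- Run curl -X POST https://your-domain.com/mcp -H 'Content-Type: application/json' -H 'Accept: application/json, text/event-stream' -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"debug","version":"1"}}}' to see the raw response. Servers on the Streamable HTTP transport expect the Accept header to list both content types and may reject the request without it.
- Compare the response against the MCP spec: does it have result.protocolVersion, result.capabilities, and result.serverInfo.name?
- Check whether the response is an error object ({"error":{"code":...}}) instead of a result. The standard JSON-RPC codes narrow the cause down:
  - -32700 (parse error): the server could not parse the request body.
  - -32601 (method not found): no initialize handler is registered.
  - -32602 (invalid params): the server rejected the initialize parameters.
  - -32603 (internal error): the handler itself threw.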
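The curl check above can be scripted. The one wrinkle is that a Streamable HTTP server may answer the POST either with plain JSON or with an SSE stream whose `data:` lines carry the JSON-RPC message. Below is a minimal sketch that handles both. `parseRpcBody` and `probeInitialize` are hypothetical names, and it assumes Node 18+ for the global `fetch`.

```typescript
// Minimal JSON-RPC message shape for this sketch.
interface JsonRpcMessage {
  jsonrpc: "2.0";
  id?: number | string | null;
  result?: Record<string, unknown>;
  error?: { code: number; message: string; data?: unknown };
}

// Extracts the JSON-RPC response from a plain JSON or SSE response body.
function parseRpcBody(contentType: string, body: string): JsonRpcMessage {
  if (!contentType.includes("text/event-stream")) {
    return JSON.parse(body) as JsonRpcMessage;
  }
  // SSE events are separated by a blank line; an event's data may span
  // several `data:` lines, joined with "\n" (one leading space is dropped).
  for (const event of body.split(/\r?\n\r?\n/)) {
    const data = event
      .split(/\r?\n/)
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, ""))
      .join("\n");
    if (!data) continue;
    const msg = JSON.parse(data) as JsonRpcMessage;
    if (msg.result !== undefined || msg.error !== undefined) return msg;
  }
  throw new Error("no JSON-RPC response found in SSE body");
}

// Sends the same initialize request as the curl command above.
async function probeInitialize(url: string): Promise<JsonRpcMessage> {
  const res = await fetch(url, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Accept: "application/json, text/event-stream",
    },
    body: JSON.stringify({
      jsonrpc: "2.0",
      id: 1,
      method: "initialize",
      params: {
        protocolVersion: "2024-11-05",
        capabilities: {},
        clientInfo: { name: "debug", version: "1" },
      },
    }),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}: a Layer 2 failure, not Layer 3`);
  return parseRpcBody(res.headers.get("content-type") ?? "", await res.text());
}
```

Usage: `probeInitialize("https://your-domain.com/mcp").then((msg) => console.log(msg.error ?? msg.result))`.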
Layer 4: Tool surface failure (tools/list empty or error)
Initialize succeeds but tools/list returns an empty array or a JSON-RPC error.
- Check startup logs for tool registration errors: a tool whose handler threw during registration may not appear in the list.
- Check whether a dependency injection issue is causing tools to be registered conditionally, based on environment variables that aren't set in production.
- Run the MCP Inspector against production (if accessible) to see exactly what tools/list returns live.
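An empty or partial tool list is easiest to spot by comparing it against the tools you expect to be registered. Below is a minimal sketch. It assumes `tools` is the `result.tools` array from a `tools/list` response and that you maintain the list of expected names yourself. The tool names shown are hypothetical.

```typescript
// Only the field this check needs from each tools/list entry.
interface ToolDescriptor {
  name: string;
}

// Compares the live tools/list result against the tool names you expect.
function checkToolList(tools: ToolDescriptor[], expected: string[]) {
  const live = tools.map((t) => t.name);
  return {
    empty: live.length === 0,
    missing: expected.filter((name) => !live.includes(name)),
    unexpected: live.filter((name) => !expected.includes(name)),
  };
}

// Hypothetical example: "create_ticket" failed to register in production.
const report = checkToolList(
  [{ name: "search_docs" }, { name: "get_status" }],
  ["search_docs", "get_status", "create_ticket"],
);
// report.missing → ["create_ticket"], report.unexpected → [], report.empty → false
```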
Structured log queries for production diagnosis
When AliveMCP reports a failure, the first thing to check is your structured logs. A session that failed mid-flight leaves a trail:
# Query pattern for diagnosing a session failure (Loki LogQL, adapt for your aggregator)
# All events for a specific session:
{app="mcp-server"} | json | session_id="sess_abc123"
# All errors (set the time range, e.g. the last 1 hour, in Grafana's time picker or the query API's start/end parameters):
{app="mcp-server"} | json | level="error"
# Tool calls slower than 2 seconds (likely downstream timeout):
{app="mcp-server"} | json | tool_name!="" | duration_ms > 2000
# Failed initializations (probe or real client):
{app="mcp-server"} | json | msg="mcp.initialize" | error_code!="null"
# Find the session corresponding to an AliveMCP alert
# (AliveMCP probe has clientInfo.name = "AliveMCP"):
{app="mcp-server"} | json | client_name="AliveMCP" | level="error"
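These queries only work if every log line is a single JSON object carrying the fields they filter on. Below is a minimal sketch of such a logger. The field names mirror the queries above. They are this guide's convention, not something the MCP SDK emits on its own, and `formatLogLine` and `log` are hypothetical names.

```typescript
// Field names mirror the LogQL queries above.
interface LogFields {
  level: "debug" | "info" | "warn" | "error";
  msg: string;
  session_id?: string;
  client_name?: string;
  tool_name?: string;
  duration_ms?: number;
  error_code?: string | null;
}

// One JSON object per line, so `| json` in LogQL can extract every field.
function formatLogLine(fields: LogFields, now: Date = new Date()): string {
  return JSON.stringify({ ts: now.toISOString(), ...fields });
}

// Write to stderr: a stdio MCP server uses stdout for JSON-RPC messages,
// so log output there would corrupt the protocol stream.
function log(fields: LogFields): void {
  process.stderr.write(formatLogLine(fields) + "\n");
}

// Example: a slow tool call that would match the duration_ms > 2000 query.
log({
  level: "info",
  msg: "tool.call",
  session_id: "sess_abc123",
  client_name: "example-client",
  tool_name: "search_docs",
  duration_ms: 2350,
});
```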
Cross-reference the timestamp of an AliveMCP alert with your server's log events at that same timestamp. If there are no log events at that timestamp, the server was down and producing no logs — the absence of logs is itself diagnostic. If there are error-level log events, read the stack traces.
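The cross-referencing step above can be sketched as a filter over exported log events. Below is a minimal sketch. It assumes events carry an ISO-8601 `ts` field and that you have fetched them for a period around the alert. The default one-minute window is an assumption meant to absorb clock skew and probe timing, so tune it to your setup.

```typescript
interface LogEvent {
  ts: string; // ISO-8601 timestamp, as written by your logger
  level: string;
  msg: string;
}

// Returns log events within ±windowMs of the alert time.
function eventsAround(events: LogEvent[], alertAt: Date, windowMs = 60_000): LogEvent[] {
  const t = alertAt.getTime();
  return events.filter((e) => Math.abs(Date.parse(e.ts) - t) <= windowMs);
}

// Turns the window into the first diagnostic step described above.
function triage(window: LogEvent[]): string {
  if (window.length === 0) return "No logs at alert time: the server was likely down";
  const errors = window.filter((e) => e.level === "error");
  if (errors.length > 0) return `${errors.length} error event(s): read their stack traces`;
  return "Logs present but no errors: compare with the layer AliveMCP reported";
}
```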
Debugging TypeScript MCP servers with the Node.js debugger
For TypeScript servers, attach the Node.js debugger to step through tool handler code without resorting to console.log printf debugging:
# Start the server in debug mode (breaks before first line)
node --inspect-brk --loader ts-node/esm src/index.ts
# Or for a built server:
node --inspect dist/index.js
# Open Chrome and navigate to chrome://inspect
# Click "Open dedicated DevTools for Node"
# Set breakpoints in the Sources tab
To debug a live server without pausing it at startup, use the --inspect flag (not --inspect-brk) and attach only while actively debugging. Never expose the inspector port (default 9229) publicly. --inspect already binds to 127.0.0.1 by default, so keep that binding (or state it explicitly with --inspect=127.0.0.1:9229) and use SSH tunneling to connect to a remote server. See MCP server TypeScript for TypeScript-specific tooling and type safety patterns for tool handlers.
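If restarting a remote server with --inspect is inconvenient, Node's `inspector` module can open the debugger from inside the running process. Below is a minimal sketch that opens it on loopback only, and only when explicitly requested. `ENABLE_INSPECTOR` is a hypothetical variable name chosen for this example.

```typescript
import * as inspector from "node:inspector";

// Opens the V8 inspector on 127.0.0.1:9229 when ENABLE_INSPECTOR=1.
// Returns the inspector's WebSocket URL, or undefined if not enabled.
function maybeOpenInspector(
  env: Record<string, string | undefined> = process.env,
): string | undefined {
  if (env.ENABLE_INSPECTOR !== "1") return undefined;
  if (!inspector.url()) {
    // port 9229, host 127.0.0.1, do not block waiting for a debugger
    inspector.open(9229, "127.0.0.1", false);
  }
  return inspector.url();
}

const wsUrl = maybeOpenInspector();
if (wsUrl) process.stderr.write(`Inspector listening at ${wsUrl}\n`);
```

To reach it from your workstation, forward the port with ssh -L 9229:127.0.0.1:9229 user@your-server and open chrome://inspect. On Linux and macOS, Node also activates the inspector (on 127.0.0.1:9229 by default) when the process receives SIGUSR1, which achieves the same result without a code change.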
Related questions
How do I reproduce a production issue locally?
Production-only issues are usually caused by: missing environment variables, production-specific data shapes, or different client behavior (production uses Claude Desktop; local dev uses MCP Inspector). Reproduce by: (1) confirming all production env vars are set in your local .env; (2) using the exact same request payload from production logs; (3) running MCP Inspector with the same protocol version as the failing client. If the issue is client-specific, add a test that mimics that client's initialize parameters. See MCP server testing for the client-simulation pattern.
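Capturing the failing client's initialize parameters as a fixture makes the client-specific case repeatable in a test. Below is a minimal sketch. The profile values are hypothetical placeholders, so copy the real ones from your production logs. It leaves open how you send the request (Inspector, curl, or your test harness).

```typescript
// Everything a client sends in initialize that could change server behaviour.
interface ClientProfile {
  clientName: string;
  clientVersion: string;
  protocolVersion: string;
  capabilities: Record<string, unknown>;
}

// Hypothetical profile: replace with the failing client's real values.
const failingClient: ClientProfile = {
  clientName: "example-client",
  clientVersion: "1.4.0",
  protocolVersion: "2024-11-05",
  capabilities: { roots: { listChanged: true } },
};

// Builds the initialize request that client would send.
function initializeRequestFor(profile: ClientProfile, id: number = 1) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "initialize" as const,
    params: {
      protocolVersion: profile.protocolVersion,
      capabilities: profile.capabilities,
      clientInfo: { name: profile.clientName, version: profile.clientVersion },
    },
  };
}

const request = initializeRequestFor(failingClient);
```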
How do I debug a server that works locally but fails in production?
The most common cause is a missing environment variable that's set locally in .env but not in the production secret store. Enable startup validation that throws with a specific missing-variable message — the error will appear in your deploy logs. Second most common: a port binding to 127.0.0.1 instead of 0.0.0.0 inside a container. Third: a Node.js version mismatch between local and production that causes subtle behavior differences. Check node --version locally against the version in your Dockerfile or platform's Node.js image tag.
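The startup validation described above can be a few lines. It should run before the server starts listening, so a missing variable fails the deploy with a clear message. Below is a minimal sketch. `requireEnv` and `MCP_LISTEN_HOST` are hypothetical names, and empty strings are treated as missing.

```typescript
// Fails fast at startup, naming every missing (or empty) variable, so the
// deploy log says exactly what the production secret store lacks.
function requireEnv(
  names: string[],
  env: Record<string, string | undefined> = process.env,
): Record<string, string> {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  const values: Record<string, string> = {};
  for (const name of names) values[name] = env[name] as string;
  return values;
}

// Inside a container, listen on all interfaces; 127.0.0.1 is reachable
// only from within the container itself.
const LISTEN_HOST = process.env.MCP_LISTEN_HOST ?? "0.0.0.0";
```

Usage: call `const config = requireEnv(["DATABASE_URL", "UPSTREAM_API_KEY"])` (hypothetical names) as the first line of your entry point, then pass `LISTEN_HOST` to your HTTP server's listen call.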
My server passes the initialize probe but tool calls fail. How do I diagnose this?
Tool call failures after a passing initialize are almost always application-layer issues: wrong API key, database connection failure, or a bug in the tool handler. The AliveMCP probe only verifies initialize and tools/list; it doesn't call your tools with real arguments. Check tool.error log lines for the error message and error code. If error_code = "ECONNREFUSED" on a tool that calls an external API, the downstream API is unreachable from production. Run a connectivity test from inside your production container (this assumes curl is installed in the image): flyctl ssh console --app your-app -C "curl -I https://api.openai.com".
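The tool.error lines described above are only useful if every handler failure is logged with its code. Below is a minimal sketch of a wrapper for a tool handler. It assumes handlers are async functions of their arguments, which is a simplification of any real SDK signature. `toolErrorFields` and `withToolErrorLog` are hypothetical names, and the field names follow this guide's log convention.

```typescript
// Fields for a tool.error log line. Node network errors carry a string
// `code` such as "ECONNREFUSED"; Node's fetch wraps the original error in `cause`.
function toolErrorFields(toolName: string, err: unknown) {
  const e = err as { message?: string; code?: string; cause?: { code?: string } } | null | undefined;
  return {
    level: "error" as const,
    msg: "tool.error",
    tool_name: toolName,
    error_code: e?.code ?? e?.cause?.code ?? "UNKNOWN",
    error_message: e?.message ?? String(err),
  };
}

// Logs every failure with its error code, then rethrows so the caller
// (e.g. the SDK) still turns it into a JSON-RPC error.
function withToolErrorLog<A, R>(
  toolName: string,
  handler: (args: A) => Promise<R>,
  write: (line: string) => void = (line) => {
    process.stderr.write(line + "\n");
  },
): (args: A) => Promise<R> {
  return async (args: A) => {
    try {
      return await handler(args);
    } catch (err) {
      write(JSON.stringify(toolErrorFields(toolName, err)));
      throw err;
    }
  };
}
```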
Further reading
- MCP server logging — structured log format and session context propagation
- MCP server testing — protocol compliance and schema snapshot tests
- MCP server TypeScript — type safety and SDK tooling
- MCP server tracing — distributed traces across the four protocol layers
- MCP server deployment — post-deploy verification checklist
- MCP server health checks — what the probe validates at each layer
- JSON-RPC health checks vs HTTP probes — why HTTP-only monitoring misses protocol failures
- AliveMCP — probe history with per-layer failure attribution