Troubleshooting · Outages

MCP endpoint not responding — diagnostic checklist

Your MCP endpoint is answering (or not), an agent is timing out, and you need to know where the failure is. Walk this checklist from the outside in — transport, then TLS, then JSON-RPC, then MCP semantics, then auth.

TL;DR

Test in this order: DNS → TCP → TLS → HTTP status → JSON-RPC parse → initialize → tools/list → auth on a real tool call. The first layer that fails is your problem. Most "not responding" reports resolve at the JSON-RPC or auth layer, not the transport layer — which is why HTTP-only monitors miss them.

Step 1 — DNS and TCP

Run dig +short your-server.com and nc -vz your-server.com 443. If DNS fails or TCP is refused, the server's host is down or firewalled. This is the only layer a classical uptime monitor reliably catches — everything past here requires protocol awareness.

Step 2 — TLS

openssl s_client -connect your-server.com:443 -servername your-server.com. Look for Verify return code: 0 (ok). Common failures: expired cert (especially around auto-renewal boundaries), wrong SNI, or a cert that doesn't include your-server.com in SANs. This is frequently the culprit after a migration or a registrar change.

Step 3 — HTTP surface

curl -sv -X POST https://your-server.com/mcp -H 'content-type: application/json' -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-06-18","capabilities":{},"clientInfo":{"name":"diag","version":"1"}}}'. If you get HTTP 4xx on POST with a correct body: routing or method filter is wrong. If you get HTTP 200 with an empty or non-JSON body: your server is accepting the request but crashing before writing the response.

Step 4 — JSON-RPC parse

Pipe the response through jq. You should see a result key with protocolVersion, serverInfo, and capabilities. If you see error.code -32700 (Parse error): your server can't deserialize its own inputs. If error.code -32601 (Method not found): the server doesn't implement initialize, which means it isn't an MCP server in any meaningful sense.

Step 5 — MCP semantics

With the initialize response in hand, send tools/list. If this returns an error: a backend dependency (DB, API key, memory cache) is broken in a way the initialize handler happened to survive. If it returns with fewer tools than expected: a deploy silently removed one — this is the classic "schema drift" failure and agents will fail in confusing ways downstream. See our health check reference for what to hash and alert on.

Step 6 — Auth on a real call

Invoke one low-impact tool end-to-end (something idempotent and cheap). A common failure mode is that initialize and tools/list are unauthenticated but real calls require a bearer token; the server looks healthy in any naive probe, and every downstream call fails on auth. A working auth path is part of "responsive."

How to not find out from a user next time

If the above walk surfaced the problem, it means you didn't have a continuous probe that does this. That's what AliveMCP exists for — we run steps 3–5 against every public MCP endpoint every 60 seconds, and we run all six steps against private endpoints on the Author and Team tiers with Slack + webhook alerts. Claim your listing on the AliveMCP home page, and the next "not responding" report comes from the monitor, not from a customer.

Get early access