MCP server health check

A correct MCP health check verifies three things: the protocol handshake succeeds, the tool registry responds without error, and the response envelope conforms to the MCP spec. Anything less is false-positive-prone.

TL;DR

Run this sequence every 60 seconds: initialize → verify protocolVersion and capabilities → tools/list → hash the returned schema → compare with last known hash. Alert on any JSON-RPC error, missing keys, or schema drift. AliveMCP does exactly this for every public MCP server, for free. Join the waitlist to add your private endpoints.

Why "HTTP 200" isn't a health check

MCP speaks JSON-RPC 2.0 over HTTP, SSE, or stdio. The transport layer and the protocol layer can fail independently: the transport can be fine while the protocol is broken, or the protocol can be fine while one tool is misconfigured. A health check that only inspects the transport is blind to the most common real-world failures — which is why MCP authors keep discovering their server has been silently broken for a week.
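To make the layering concrete, here is a minimal sketch (a hypothetical helper, not the AliveMCP implementation) that classifies a probe response. The key case is an HTTP 200 whose body carries a JSON-RPC error object: the transport is healthy, the protocol is not, and an HTTP-only check reports green.

```python
import json

def classify(status: int, body: bytes) -> str:
    """Classify a probe response: transport and protocol health are separate."""
    if status != 200:
        return "transport_failure"      # HTTP layer broken
    try:
        msg = json.loads(body)
    except ValueError:
        return "protocol_failure"       # 200 OK, but not JSON-RPC at all
    if "error" in msg:
        return "protocol_failure"       # 200 OK, but the method failed
    if msg.get("jsonrpc") != "2.0" or "result" not in msg:
        return "protocol_failure"       # malformed envelope
    return "healthy"

# A 200 carrying a JSON-RPC error is exactly what an HTTP-only check misses:
print(classify(200, b'{"jsonrpc":"2.0","id":1,"error":{"code":-32601,"message":"Method not found"}}'))
# -> protocol_failure
```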

The probe sequence

  1. initialize request. Send a POST with body {"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-06-18","capabilities":{},"clientInfo":{"name":"healthcheck","version":"1"}}}. Expect a result with protocolVersion, serverInfo.name, and capabilities.
  2. notifications/initialized notification. Per spec, the client signals it's ready; the message carries no id and expects no response. A server that behaves differently after this signal reveals state-dependent bugs.
  3. tools/list request. Returns an array of {name, description, inputSchema}. Empty array is legal only if the server advertised no tools capability; otherwise it's a failure.
  4. Hash the schema. SHA-256 over the sorted, canonicalized list of (name, inputSchema). Store it. Compare next run — if it's different, emit a schema drift event (not always an alert; sometimes it's an intentional deploy).
  5. Measure latency. Record time-to-first-byte and time-to-complete. Keep p50/p95 rolling windows.
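The steps above can be sketched as follows. This is an illustrative outline, not the AliveMCP probe itself: the `post` callable is an injected transport (a thin wrapper over urllib or httpx) so the protocol logic stays testable, and latency measurement (step 5) is left to that wrapper.

```python
import hashlib
import json
from typing import Callable

PROTOCOL_VERSION = "2025-06-18"  # pin the version you test against

def canonical_tool_hash(tools: list[dict]) -> str:
    """SHA-256 over the sorted, canonicalized (name, inputSchema) pairs."""
    canon = sorted((t["name"], json.dumps(t.get("inputSchema"), sort_keys=True)) for t in tools)
    return hashlib.sha256(json.dumps(canon).encode()).hexdigest()

def probe(post: Callable[[dict], dict]) -> dict:
    # 1. initialize: verify the handshake result has the expected keys.
    init = post({"jsonrpc": "2.0", "id": 1, "method": "initialize",
                 "params": {"protocolVersion": PROTOCOL_VERSION, "capabilities": {},
                            "clientInfo": {"name": "healthcheck", "version": "1"}}})
    result = init.get("result", {})
    for key in ("protocolVersion", "serverInfo", "capabilities"):
        if key not in result:
            return {"ok": False, "reason": f"initialize missing {key}"}
    # 2. notifications/initialized: no id, no response expected.
    post({"jsonrpc": "2.0", "method": "notifications/initialized"})
    # 3. tools/list: an empty list is a failure if tools capability was advertised.
    listed = post({"jsonrpc": "2.0", "id": 2, "method": "tools/list"})
    if "error" in listed:
        return {"ok": False, "reason": "tools/list returned JSON-RPC error"}
    tools = listed.get("result", {}).get("tools", [])
    if not tools and "tools" in result.get("capabilities", {}):
        return {"ok": False, "reason": "tools capability advertised but list is empty"}
    # 4. hash the schema; the caller compares it with the last stored hash
    #    and emits a schema drift event on mismatch.
    return {"ok": True, "schema_hash": canonical_tool_hash(tools)}
```

In production, `post` would serialize the envelope over HTTP, SSE, or stdio and return the parsed response; injecting a fake transport also lets you unit-test the probe against canned server responses.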

Alert signals ranked by urgency

Probe frequency

For agent-facing infrastructure, 60-second probes are the floor. Agents retry fast, and a five-minute-old cached status is a lifetime in conversation time. For cold internal tooling, 5-minute intervals are fine. Don't probe faster than 15 seconds unless you coordinate with the server author — it looks like abuse and can trigger rate limits.

How AliveMCP implements this

The public dashboard runs exactly the sequence above, every 60 seconds, against every MCP endpoint it discovers on MCP.so, Glama, PulseMCP, Smithery, the Official Registry, and GitHub. Private endpoints go on the Team tier at $49/mo with Slack + webhook alerts and a public status-page subdomain. Enterprise teams running 5-30 servers get SAML SSO, an audit log, and monthly SLA PDFs. See the live feed on the AliveMCP home page or review what's in each tier.

Related questions

What HTTP method should my MCP accept for probes?

The spec uses POST for JSON-RPC envelopes. A monitor that uses GET will not correctly test the protocol — it'll hit whatever the server decides to serve on GET, which may be a landing page, a docs redirect, or a 405.
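A minimal sketch of the point, using only stdlib names (the endpoint URL is a placeholder): the probe must build an explicit POST carrying the JSON-RPC envelope, since a GET never reaches the protocol layer at all.

```python
import json
import urllib.request

def jsonrpc_probe_request(url: str) -> urllib.request.Request:
    """Build a POST with a JSON-RPC body; a GET would test only the transport."""
    body = json.dumps({"jsonrpc": "2.0", "id": 1, "method": "tools/list"}).encode()
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )

req = jsonrpc_probe_request("https://example.com/mcp")  # placeholder endpoint
```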

Should the probe execute a real tool call?

Usually no — tools/list exercises the same code paths as a call without the side effects. If a specific tool is your critical path (e.g. a payment tool), add a synthetic check for it, but keep it isolated from the main liveness probe.
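If you do add a synthetic check, a sketch of the envelope it would send (the tool name and arguments here are hypothetical; the `params` shape follows the spec's tools/call method):

```python
import json

def synthetic_tool_call(tool_name: str, arguments: dict, request_id: int = 99) -> dict:
    """tools/call envelope for a synthetic check, kept separate from the liveness probe."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/call",
            "params": {"name": tool_name, "arguments": arguments}}

# Hypothetical critical-path tool; run it on its own cadence and alert channel
# so a flaky dependency doesn't flap the main liveness status.
envelope = synthetic_tool_call("create_payment_intent", {"amount": 1, "dry_run": True})
```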

How do I avoid rate-limiting my own server?

60s probes from a fixed set of AliveMCP IPs are designed to stay below any sane rate limit. If you use internal monitoring too, coordinate cadences or allowlist the probe's source by IP or JWT claim.

Further reading