
Monitoring an MCP server

Monitoring an MCP server means continuously answering four questions: is the protocol handshake succeeding, is the tool surface stable, are latencies inside your envelope, and are callers getting clean responses? Everything else is cosmetic.

TL;DR

The four signals worth monitoring on an MCP server: handshake success rate, tool-surface stability (count + schema hash), p95 response latency, and client-facing error rate. Anything you measure beyond those is usually noise. You can self-host a probe in an afternoon with cron and curl, or let AliveMCP do it across every public MCP server for free.

Why MCP monitoring is different from API monitoring

A conventional REST API has one natural health signal — an HTTP 2xx on a known route — and monitoring tools can piggyback on that. MCP doesn't. An MCP server over HTTP is a JSON-RPC endpoint where every call is a POST to the same URL, the method is in the body, and the meaning of "healthy" depends on which capabilities the server advertised. You can't monitor an MCP server by GETing its root and checking for 200. You have to speak the protocol.
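"Speaking the protocol" starts with the handshake. A minimal sketch of the request a probe has to send, assuming the standard JSON-RPC envelope from the MCP specification (the protocolVersion value and clientInfo fields here are illustrative):

```python
import json

# No GET, no health route: the probe POSTs a JSON-RPC "initialize"
# request to the single MCP endpoint. The method lives in the body.
initialize_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-03-26",   # example version string
        "capabilities": {},
        "clientInfo": {"name": "health-probe", "version": "0.1"},
    },
}

body = json.dumps(initialize_request)
```

A 200 on this POST is still not "healthy" — the probe must also parse the response and check that it carries a result rather than a JSON-RPC error.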

The consequence: pointing a generic uptime tool like UptimeRobot or Pingdom at an MCP endpoint tells you only that the HTTPS server is up. It tells you nothing about whether agents talking to it can initialize, list tools, or get valid responses. In our April 2026 audit of 2,181 public MCP endpoints, 91% failed at the protocol or tool layer while passing a generic HTTP probe.

The four signals

  1. Handshake success rate. Every probe runs an initialize request. Record success / failure. Alert when the rolling 5-minute success rate drops below 99% — most auth or deploy-breakage failures show up here first.
  2. Tool-surface stability. Call tools/list on every probe. Hash the sorted list of (tool_name, input_schema). Alert on a shrinking count (a registration crashed) and on unexpected hash changes outside release windows (a deploy broke a contract).
  3. p95 response latency. Track two latencies: time-to-first-byte on initialize, and total round-trip on tools/list. Baseline the 7-day rolling p95; alert on 3× deviation sustained over 3+ consecutive probes.
  4. Client-facing error rate. If you have access to the server's own logs (self-hosted), count the JSON-RPC error responses it emits per minute. A server that probes clean but is emitting errors to real clients is the worst failure mode — your dashboard says green while users see nothing but red.
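The hash in signal 2 can be sketched as follows. This is one reasonable construction, not the only one; it assumes `tools` is the parsed tool array from a tools/list result:

```python
import hashlib
import json

def surface_hash(tools: list) -> str:
    """Hash the sorted (name, inputSchema) pairs from a tools/list result.

    Sorting by tool name makes the hash independent of listing order, so
    it changes only when a tool is added, removed, or its schema changes.
    """
    canonical = sorted(
        (t["name"], json.dumps(t.get("inputSchema", {}), sort_keys=True))
        for t in tools
    )
    return hashlib.sha256(json.dumps(canonical).encode()).hexdigest()

def surface_alert(prev_count: int, prev_hash: str, tools: list):
    """Apply the two alert rules from signal 2; return a reason or None."""
    count, digest = len(tools), surface_hash(tools)
    if count < prev_count:
        return "shrinking tool count"      # a registration crashed
    if digest != prev_hash:
        return "surface hash changed"      # a deploy may have broken a contract
    return None
```

In practice you would suppress the hash-change alert inside declared release windows, as described above, and page only on the shrinking count.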


The minimum self-hosted stack

If you want to roll your own, here's the shortest path that actually works:
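The core of that stack is a single probe script on a cron. A sketch in Python rather than raw curl (the endpoint URL is a placeholder; the Accept header follows MCP's Streamable HTTP transport, and some servers additionally require a session header after initialize, which this sketch omits):

```python
import json
import time
import urllib.request

ENDPOINT = "https://example.com/mcp"  # placeholder: your MCP endpoint

def rpc(method: str, params: dict, req_id: int):
    """POST one JSON-RPC request; return (parsed response, seconds taken)."""
    payload = json.dumps(
        {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}
    ).encode()
    req = urllib.request.Request(
        ENDPOINT,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
    )
    start = time.monotonic()
    with urllib.request.urlopen(req, timeout=10) as resp:
        body = json.loads(resp.read())
    return body, time.monotonic() - start

def classify(response: dict) -> str:
    """Map a JSON-RPC response to a probe verdict (signal 1 and 4)."""
    if "error" in response:
        return "protocol-error"
    if "result" not in response:
        return "malformed"
    return "ok"

def probe():
    """One probe cycle: handshake verdict plus latency (signals 1 and 3)."""
    init, latency = rpc("initialize", {
        "protocolVersion": "2025-03-26",
        "capabilities": {},
        "clientInfo": {"name": "probe", "version": "0.1"},
    }, 1)
    print(classify(init), f"initialize {latency:.3f}s")
```

Append each line of output to a log file, and the 7-day p95 baseline is one awk invocation away.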

Budget: 3–6 hours to ship, 30 minutes a week to keep running. Fine for one or two internal servers. Past that, the hosted option is cheaper once you count your time.

When hosted makes sense

AliveMCP already runs the probe against every public MCP server in MCP.so, Glama, PulseMCP, Smithery, the Official Registry, and the GitHub mcp topic — so if you're monitoring third-party MCPs, there's nothing to build. For your own servers, the Author tier ($9/mo) adds webhook + Slack alerts, 90-day response-time history, a public status badge, and a verified-author mark on the public dashboard. The Team tier ($49/mo) adds 10 private endpoints, per-environment status pages, and SSO — a feature set Datadog charges a 100× premium for, minus the dashboards-within-dashboards.


Related questions

Should I monitor from one region or many?

For MCPs aimed at end users, probe from at least two geographically distant regions; a single region is blind to routing-layer issues. For internal MCPs used only by colocated agents, one region is fine.

What about monitoring resource and prompt surfaces, not just tools?

If your server advertises resources or prompts capabilities, extend the probe: resources/list and prompts/list go in the same rotation. The hash-the-surface discipline is identical — shrinking counts are the red flag.
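Extending the hash-the-surface discipline across all three surfaces can be sketched like this (a sketch under the assumption that each argument is the parsed item array from the corresponding list call; tools and prompts are keyed by name, resources by uri, per the MCP schema):

```python
import hashlib
import json

def full_surface_hash(tools, resources=(), prompts=()):
    """Hash the tools, resources, and prompts surfaces together.

    Each entry is reduced to its stable identifier, so the hash changes
    only when the advertised surface itself changes, not when listing
    order or unrelated metadata does.
    """
    surface = {
        "tools": sorted(t["name"] for t in tools),
        "resources": sorted(r["uri"] for r in resources),
        "prompts": sorted(p["name"] for p in prompts),
    }
    return hashlib.sha256(
        json.dumps(surface, sort_keys=True).encode()
    ).hexdigest()
```

Track the per-surface counts alongside the combined hash so an alert can say which surface shrank.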

Does MCP monitoring need agent-behavior simulation?

Rarely. A real tool-call synthetic (like exercising your search tool end-to-end) adds confidence but triples cost and complexity. Reserve it for one or two critical-path tools, not a blanket policy.

Further reading