Monitoring an MCP server
Monitoring an MCP server means answering four questions continuously: is the protocol handshake succeeding, is the tool surface stable, are latencies inside your envelope, and are callers getting clean responses? Everything else is cosmetics.
TL;DR
The four signals worth monitoring on an MCP server: handshake success rate, tool-surface stability (count + schema hash), p95 response latency, and client-facing error rate. Anything you measure beyond those is usually noise. You can self-host a probe in under an hour with a cron and curl, or let AliveMCP do it across every public MCP server for free.
Why MCP monitoring is different from API monitoring
A conventional REST API has one natural health signal — an HTTP 2xx on a known route — and monitoring tools can piggyback on that. MCP doesn't. An MCP server over HTTP is a JSON-RPC endpoint where every call is a POST to the same URL, the method is in the body, and the meaning of "healthy" depends on which capabilities the server advertised. You can't monitor an MCP server by GETing its root and checking for 200. You have to speak the protocol.
The consequence: pointing a generic uptime tool like UptimeRobot or Pingdom at an MCP endpoint tells you only that the HTTPS server is up. It tells you nothing about whether agents talking to it can initialize, list tools, or get valid responses. In our April 2026 audit of 2,181 public MCP endpoints, 91% failed at the protocol or tool layer while passing a generic HTTP probe.
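To make "speak the protocol" concrete, here's a minimal sketch of a protocol-level probe for Node 18+ (global `fetch`). The URL, client name, and `protocolVersion` value are placeholders to adjust for your server, and the sketch assumes a plain-JSON response rather than an SSE stream:

```javascript
// A healthy MCP server returns a JSON-RPC result, not merely HTTP 200.
function isHealthyInit(httpOk, body) {
  return (
    httpOk === true &&
    body != null &&
    body.jsonrpc === "2.0" &&
    body.result !== undefined &&
    body.error === undefined
  );
}

// Assumes a plain-JSON response; streamable-HTTP servers may answer
// with text/event-stream instead, which would need SSE parsing.
async function probeInitialize(url) {
  const started = Date.now();
  try {
    const res = await fetch(url, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Accept: "application/json, text/event-stream",
      },
      body: JSON.stringify({
        jsonrpc: "2.0",
        id: 1,
        method: "initialize",
        params: {
          protocolVersion: "2025-03-26", // adjust to what your server supports
          capabilities: {},
          clientInfo: { name: "probe", version: "0.1.0" },
        },
      }),
    });
    const body = await res.json();
    return { ok: isHealthyInit(res.ok, body), ms: Date.now() - started };
  } catch (err) {
    return { ok: false, ms: Date.now() - started, error: String(err) };
  }
}
```

A plain `curl -X POST` with the same JSON body works just as well if your probe is a shell script.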
The four signals
- Handshake success rate. Every probe runs an `initialize` request. Record success / failure. Alert when the rolling 5-minute success rate drops below 99% — most auth or deploy-breakage failures show up here first.
- Tool-surface stability. Call `tools/list` on every probe. Hash the sorted list of `(tool_name, input_schema)` pairs. Alert on a shrinking count (a registration crashed) and on unexpected hash changes outside release windows (a deploy broke a contract).
- p95 response latency. Track two latencies: time-to-first-byte on `initialize`, and total round-trip on `tools/list`. Baseline the 7-day rolling p95; alert on 3× deviation sustained over 3+ consecutive probes.
- Client-facing error rate. If you have access to the server's own logs (self-hosted), count the JSON-RPC error responses it emits per minute. A server that probes clean but is emitting errors to real clients is the worst failure mode — your dashboard says green while users see nothing but red.
Four monitoring mistakes we see constantly
- Only probing HTTP. Your dashboard says up, your users say down, your Slack is quiet. Root cause: your monitor never spoke JSON-RPC.
- Probing too fast. Sub-10-second probes from a single IP trigger rate limiters, get you IP-banned by Cloudflare, and poison your own latency metrics. 60 seconds is the sweet spot.
- No schema baseline. You catch outages but not drift. An MCP server that quietly renames `search` → `find` looks identical on every other metric; only schema hashing surfaces it.
- One alert tier for everything. A 200ms latency spike and a full `initialize` failure should not page the same person the same way. Tiered alerts (critical → Slack → daily digest) are the difference between a useful signal and a team that ignores its monitoring.
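The tiered routing from the last point can be a single function. Everything here is illustrative, not from any spec: the field names are hypothetical probe-result fields, and the thresholds mirror the 3× latency rule from the four-signals list:

```javascript
// Map a probe result to an alert tier: "page" for hard failures,
// "slack" for contract or latency breaches, "digest" for mild drift.
function alertTier(result) {
  if (!result.initializeOk) return "page"; // full handshake failure
  if (result.toolCountDropped || result.schemaHashChanged) return "slack";
  if (result.p95Ratio >= 3) return "slack"; // 3× the 7-day baseline
  if (result.p95Ratio >= 1.5) return "digest"; // mild drift, daily digest
  return null; // healthy, no alert
}
```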
The minimum self-hosted stack
If you want to roll your own, here's the shortest path that actually works:
- Probe: a shell script or Node program that runs the five-gate sequence from the liveness check guide and writes one row per probe to SQLite or Postgres.
- Schedule: cron every 60 seconds (`* * * * *` with a wrapper that staggers if you have multiple servers).
- Storage: SQLite is fine up to a few hundred servers at 60s cadence.
- Alerting: at the end of each probe, check the last 3 rows for that server; if 2+ failed, POST to a Slack webhook.
- Dashboard: a 30-line static HTML page that reads the latest row per server. Don't bother with Grafana for < 50 servers.
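The "2+ of the last 3 rows failed" rule and the Slack webhook call from the alerting step might look like this sketch (the SQL in the comment and the webhook URL are placeholders for your own schema and workspace):

```javascript
// Alert when 2+ of the last 3 probes for a server failed.
// rows: most-recent-first probe rows for one server, e.g. from
//   SELECT ok FROM probes WHERE server = ? ORDER BY ts DESC LIMIT 3
function shouldAlert(rows) {
  const failures = rows.slice(0, 3).filter((r) => !r.ok).length;
  return failures >= 2;
}

// Fire a Slack incoming webhook (Node 18+, global fetch).
async function notifySlack(webhookUrl, server) {
  await fetch(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text: `MCP probe failing: ${server}` }),
  });
}
```

Requiring two failures out of three trades a one-probe delay in alerting for immunity to single flaky probes — the same reason the latency rule above wants 3+ consecutive deviations.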
Budget: 3–6 hours to ship, 30 minutes a week to keep running. Fine for one or two internal servers. Past that, the hosted option is cheaper once you count your time.
When hosted makes sense
AliveMCP already runs the probe against every public MCP server in MCP.so, Glama, PulseMCP, Smithery, the Official Registry, and the GitHub mcp topic — so if you're monitoring third-party MCPs, there's nothing to build. For your own servers, the Author tier ($9/mo) adds webhook + Slack alerts, 90-day response-time history, a public status badge, and a verified-author mark on the public dashboard. The Team tier ($49/mo) adds 10 private endpoints, per-environment status pages, and SSO — comparable to what Datadog charges a 100× premium for, minus the dashboards-within-dashboards.
Related questions
Should I monitor from one region or many?
For MCPs aimed at end users, probe from at least two geographically distant regions — the single-region blind spot is routing-layer issues. For internal MCPs used only by colocated agents, one region is fine.
What about monitoring resource and prompt surfaces, not just tools?
If your server advertises resources or prompts capabilities, extend the probe: resources/list and prompts/list go in the same rotation. The hash-the-surface discipline is identical — shrinking counts are the red flag.
Does MCP monitoring need agent-behavior simulation?
Rarely. A real tool-call synthetic (like exercising your search tool end-to-end) adds confidence but triples cost and complexity. Reserve it for one or two critical-path tools, not a blanket policy.