Prometheus MCP monitoring

Prometheus is the right tool for measuring what's happening inside your MCP server: handler latency, tool call counts, error rates, memory pressure. It's the wrong tool for measuring whether your MCP server looks alive from the outside, because Prometheus scrapes from inside your infrastructure — it can't see what your users' agents see.

TL;DR

Use Prometheus for in-process observability (request rates, handler durations, memory, queues). Use AliveMCP for external protocol-level uptime (does the server respond to a real MCP handshake from outside your network?). The two are complementary — one watches the inside, the other watches the outside. Join the waitlist to add your private endpoints to AliveMCP alongside your existing Prometheus stack.

What Prometheus does well for MCP servers

If you're already running Prometheus in your infrastructure, instrumenting your MCP server is straightforward. Expose a /metrics endpoint with standard histogram and counter metrics, such as a histogram of tool-handler durations and a counter of tool calls labeled by outcome.
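The sketch below is a dependency-free illustration of what such an endpoint might serve. It renders one histogram and one labeled counter in the Prometheus text exposition format. A real server would more likely use the official prometheus_client library (Python) or prom-client (Node.js). The metric names, label names, and bucket boundaries here are hypothetical.

```python
import bisect
from collections import defaultdict

# Hypothetical latency buckets in seconds; tune to your handlers' real profile.
BUCKETS = (0.01, 0.05, 0.1, 0.5, 1.0, 5.0)


def _escape(value):
    """Escape a label value per the text exposition format."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


class Histogram:
    """Cumulative histogram rendered as Prometheus text."""

    def __init__(self, name, help_text, buckets=BUCKETS):
        self.name, self.help_text, self.buckets = name, help_text, buckets
        self.counts = [0] * len(buckets)  # per-bucket, non-cumulative
        self.total = 0
        self.sum = 0.0

    def observe(self, value):
        # First bucket whose upper bound is >= value ("le" semantics);
        # values above the largest bound only count toward +Inf.
        i = bisect.bisect_left(self.buckets, value)
        if i < len(self.buckets):
            self.counts[i] += 1
        self.total += 1
        self.sum += value

    def render(self):
        out = [f"# HELP {self.name} {self.help_text}", f"# TYPE {self.name} histogram"]
        cumulative = 0
        for bound, count in zip(self.buckets, self.counts):
            cumulative += count
            out.append(f'{self.name}_bucket{{le="{bound}"}} {cumulative}')
        out.append(f'{self.name}_bucket{{le="+Inf"}} {self.total}')
        out.append(f"{self.name}_sum {self.sum}")
        out.append(f"{self.name}_count {self.total}")
        return out


class LabeledCounter:
    """Monotonic counter keyed by a fixed set of label names."""

    def __init__(self, name, help_text, label_names):
        self.name, self.help_text, self.label_names = name, help_text, label_names
        self.values = defaultdict(int)

    def inc(self, **labels):
        self.values[tuple(labels[n] for n in self.label_names)] += 1

    def render(self):
        out = [f"# HELP {self.name} {self.help_text}", f"# TYPE {self.name} counter"]
        for key, value in sorted(self.values.items()):
            pairs = ",".join(f'{n}="{_escape(v)}"' for n, v in zip(self.label_names, key))
            out.append(f"{self.name}{{{pairs}}} {value}")
        return out


# Hypothetical metric names for an MCP server.
latency = Histogram("mcp_tool_call_duration_seconds", "Tool handler latency in seconds.")
calls = LabeledCounter("mcp_tool_calls_total", "Tool calls by tool and outcome.", ("tool", "status"))


def render_metrics():
    """Body to serve at /metrics (content type text/plain; version=0.0.4)."""
    return "\n".join(latency.render() + calls.render()) + "\n"
```

Call latency.observe(seconds) and calls.inc(tool=..., status=...) from your handlers. Then return render_metrics() from whatever HTTP route your server maps to /metrics.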

With these, a Grafana dashboard gives you a rich real-time view of your server's internal behavior.

What Prometheus misses for MCP monitoring

External protocol verification

Prometheus scrapes your server's own /metrics endpoint, so a successful scrape only proves that the server is reachable from inside your infrastructure. It cannot verify that the MCP endpoint itself is reachable and responding correctly from the internet, which is where your users' agents make their calls. A misconfigured reverse proxy, an expired TLS certificate, or a firewall rule change can make your MCP endpoint unreachable from outside while your internal Prometheus scrape keeps succeeding.
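To check reachability the way a user's agent experiences it, an external probe sends the same JSON-RPC 2.0 initialize request an MCP client would, from a host outside your network. The sketch below only builds that request. The protocol version string, client name, and example URL are assumptions, so match them to the spec revision your server implements.

```python
import json
import urllib.request

# Assumed MCP spec revision; use the one your server actually supports.
PROTOCOL_VERSION = "2025-03-26"


def build_initialize_request(request_id=1, client_name="external-probe"):
    """Return a JSON-RPC 2.0 'initialize' request body as a dict."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {},
            "clientInfo": {"name": client_name, "version": "0.1.0"},
        },
    }


def make_probe_request(url, request_id=1):
    """Build (but do not send) an HTTP POST carrying the initialize request."""
    body = json.dumps(build_initialize_request(request_id)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP servers may reply with JSON or an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
    )
```

Sending it is one call, for example urllib.request.urlopen(make_probe_request("https://mcp.example.com/mcp"), timeout=10), where the URL is a placeholder. The point is to run it from outside your own network.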

Protocol-level validation

Even with the Prometheus blackbox_exporter, you'd be writing a TCP or HTTP probe, not an MCP protocol probe. The blackbox exporter can send an HTTP POST and regex-match the response body. It can't parse the JSON-RPC response to verify that it contains a valid protocolVersion, that tools/list returns the expected schemas, or that the response envelope conforms to the MCP spec. Protocol conformance failures are invisible to HTTP-level probes.
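The sketch below shows the kind of check an HTTP-level probe can't express: it decodes the JSON-RPC envelope and inspects the result. It assumes the response has already been parsed from plain JSON rather than an SSE stream. The rules are a small subset of the MCP spec chosen for illustration, not a full conformance test.

```python
def validate_initialize_response(response, expected_id):
    """Return a list of problems in a decoded initialize response (empty means OK)."""
    if not isinstance(response, dict):
        return ["response is not a JSON object"]
    problems = []
    if response.get("jsonrpc") != "2.0":
        problems.append("missing or wrong 'jsonrpc' version")
    if response.get("id") != expected_id:
        problems.append("response id does not match request id")
    if "error" in response:
        problems.append(f"server returned JSON-RPC error: {response['error']}")
        return problems
    result = response.get("result")
    if not isinstance(result, dict):
        problems.append("missing 'result' object")
        return problems
    if not isinstance(result.get("protocolVersion"), str) or not result["protocolVersion"]:
        problems.append("missing or empty 'protocolVersion'")
    if not isinstance(result.get("capabilities"), dict):
        problems.append("missing 'capabilities' object")
    server_info = result.get("serverInfo")
    if not isinstance(server_info, dict) or not server_info.get("name"):
        problems.append("missing 'serverInfo.name'")
    return problems


def validate_tools_list(response):
    """Check that a decoded tools/list response has named tools with object schemas."""
    result = response.get("result") if isinstance(response, dict) else None
    if not isinstance(result, dict) or not isinstance(result.get("tools"), list):
        return ["missing 'result.tools' array"]
    problems = []
    for i, tool in enumerate(result["tools"]):
        if not isinstance(tool, dict) or not isinstance(tool.get("name"), str):
            problems.append(f"tool #{i} has no string 'name'")
            continue
        schema = tool.get("inputSchema")
        if not isinstance(schema, dict) or schema.get("type") != "object":
            problems.append(f"tool '{tool['name']}' has no object 'inputSchema'")
    return problems
```

Returning a list of problems rather than a boolean lets an alert say which part of the handshake broke.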

Third-party MCPs you don't own

If your product depends on third-party MCP servers (from public registries or vendor MCPs), Prometheus has nothing to say about their health. You can't instrument a server you don't control. You need external probing for the dependency graph, and that probing has to speak the same protocol the servers use.

Cross-region visibility

Your Prometheus scraper runs from a fixed location inside your infrastructure. It doesn't see whether your MCP endpoint is accessible from EU, APAC, or us-west-2 independently. Regional failures — a CDN misconfiguration, a split-brain DNS entry, a region-specific TLS certificate — are invisible unless you run scrapers in each region, which adds complexity most teams skip.
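When probes run from several vantage points, telling a regional failure apart from a global one is a simple comparison of per-region results. The sketch below assumes one boolean probe result per region. The region names and the three-way classification are hypothetical.

```python
def classify_regions(results):
    """results maps region name -> True if the MCP probe from that region succeeded.

    Returns (status, failed_regions) where status is one of
    'no-data', 'up', 'regional-outage', or 'global-outage'.
    """
    if not results:
        return ("no-data", [])
    failed = sorted(region for region, ok in results.items() if not ok)
    if not failed:
        return ("up", [])
    if len(failed) == len(results):
        return ("global-outage", failed)
    return ("regional-outage", failed)
```

A single in-infrastructure scraper can only ever see one entry in this mapping, which is why regional failures stay invisible to it.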

Combining Prometheus and AliveMCP

The two tools cover different angles of the same question. A practical setup:

Signal | Source
Handler latency (p50/p95/p99) | Prometheus histogram
Tool call error rate | Prometheus counter
Memory / CPU pressure | Prometheus + node_exporter
External protocol liveness | AliveMCP (60s probes)
Tool schema drift detection | AliveMCP (hash comparison)
Third-party dependency uptime | AliveMCP public dashboard
Public status page for users | AliveMCP status subdomain
90-day uptime history | AliveMCP probe archive

Run them together. They don't overlap; they cover each other's blind spots.
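One of the signals above is tool schema drift detection by hash comparison. AliveMCP's exact hashing scheme isn't described here, so the sketch below shows one plausible approach under stated assumptions. It serializes the tools/list array canonically (sorted keys, compact separators, tools ordered by name) and hashes it with SHA-256, so that key order or tool order alone never registers as drift.

```python
import hashlib
import json


def tools_fingerprint(tools):
    """Hash a tools/list 'tools' array independent of key order and tool order."""
    canonical = json.dumps(
        sorted(tools, key=lambda t: t.get("name", "")),
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def detect_drift(previous_fingerprint, tools):
    """Return (changed, new_fingerprint); the first observation never counts as drift."""
    current = tools_fingerprint(tools)
    return (previous_fingerprint is not None and current != previous_fingerprint, current)
```

Sorting tools by name is a design choice. If tool order is meaningful for your clients, drop the sort.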

Alerting across both systems

The simplest routing: Prometheus alerts go to your team's primary engineering channel (they signal internal infrastructure behavior); AliveMCP alerts go to a customer-facing on-call channel (they signal what users actually experience). A memory-pressure alert from Prometheus is informational. A "3 consecutive probe failures" alert from AliveMCP means users are affected right now.
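A "3 consecutive probe failures" rule can be written as a small state machine. The sketch below assumes one boolean result per probe interval. It illustrates the rule and is not AliveMCP's implementation: it fires once when the streak reaches the threshold and resolves on the next success.

```python
class ConsecutiveFailureAlert:
    """Fire once after `threshold` consecutive probe failures; resolve on success."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0
        self.firing = False

    def record(self, probe_ok):
        """Feed one probe result; return 'fire', 'resolve', or None."""
        if probe_ok:
            self.failures = 0
            if self.firing:
                self.firing = False
                return "resolve"
            return None
        self.failures += 1
        if self.failures >= self.threshold and not self.firing:
            self.firing = True
            return "fire"
        return None
```

Requiring a streak rather than a single failure keeps one dropped packet from paging anyone.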

For small teams, both channels can be the same Slack channel — the important thing is to label them clearly so you can distinguish an internal performance regression from an external availability incident at a glance.
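The sketch below shows one way to keep the two kinds of alert distinguishable. It adds a source label to every message and optionally collapses both routes into one shared channel. The channel names and the INTERNAL/EXTERNAL labels are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical routes: source -> (channel, label shown in the message).
ROUTES = {
    "prometheus": ("#eng-infra", "INTERNAL"),
    "alivemcp": ("#oncall-customer", "EXTERNAL"),
}


@dataclass
class Alert:
    source: str  # "prometheus" or "alivemcp"
    summary: str


def route(alert, single_channel=None):
    """Return (channel, message); the label survives even in a shared channel."""
    channel, label = ROUTES[alert.source]
    if single_channel is not None:
        channel = single_channel
    return channel, f"[{label}] {alert.summary}"
```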

Getting started

If you already have Prometheus running, exposing MCP-specific metrics from your server is a 30-minute task; most Node.js and Python MCP server frameworks have a middleware hook that's the right integration point. AliveMCP requires nothing from your server: it probes your public endpoint from outside and works with any MCP implementation. Check if your server is already on the public dashboard, or join the waitlist to add private endpoints alongside your Prometheus stack. See pricing for the Team tier ($49/mo), which includes private endpoint monitoring and Slack alert integration.
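The exact middleware hook differs between frameworks, so the sketch below stands in for it with a plain Python decorator. It handles synchronous handlers only; async handlers would need an async wrapper. It records handler latency and tool-call outcome into in-memory stores that a real server would replace with a metrics library. The decorator name and the two example tools are hypothetical.

```python
import functools
import time
from collections import defaultdict

# In-memory stores for illustration; a real server would feed a metrics library.
CALL_COUNTS = defaultdict(int)  # (tool, status) -> number of calls
DURATIONS = defaultdict(list)   # tool -> list of durations in seconds


def instrumented(tool_name):
    """Hypothetical decorator standing in for a framework's middleware hook."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "error"  # stays "error" if the handler raises
            try:
                result = handler(*args, **kwargs)
                status = "ok"
                return result
            finally:
                DURATIONS[tool_name].append(time.perf_counter() - start)
                CALL_COUNTS[(tool_name, status)] += 1
        return wrapper
    return decorator


@instrumented("echo")
def echo_tool(text):
    return text


@instrumented("fail")
def failing_tool():
    raise ValueError("boom")
```

The try/finally ensures that failed calls are still timed and counted, with status "error", before the exception propagates.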

Related questions

Can I forward AliveMCP alert webhooks into my Alertmanager?

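Yes. Author and Team tier webhooks send a standard JSON payload that includes the server slug, the probe result, the previous status, and the UTC timestamp. Alertmanager's built-in webhook configuration only sends notifications outward. To ingest these alerts, run a small webhook receiver of your own that translates the payload and posts it to Alertmanager's /api/v2/alerts endpoint. The full payload spec is in the Author tier documentation.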
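The sketch below shows the translation step such a receiver might perform. The payload keys (server_slug, probe_result, previous_status, timestamp), the "up" value, and the alert name MCPProbeFailure are assumptions rather than the documented AliveMCP schema; check them against the Author tier payload spec. The timestamp is assumed to be RFC 3339. The output follows Alertmanager's v2 alert shape (labels, annotations, startsAt, endsAt).

```python
import json
import urllib.request


def to_alertmanager_alerts(payload):
    """Translate one webhook payload into an Alertmanager v2 alert list.

    Payload keys used here are assumed names, not a documented schema.
    """
    slug = payload["server_slug"]
    alert = {
        "labels": {"alertname": "MCPProbeFailure", "source": "alivemcp", "server": slug},
        "annotations": {
            "summary": f"{slug}: {payload['previous_status']} -> {payload['probe_result']}",
        },
    }
    if payload["probe_result"] == "up":
        # An endsAt timestamp marks the matching alert as resolved.
        alert["endsAt"] = payload["timestamp"]
    else:
        alert["startsAt"] = payload["timestamp"]
    return [alert]


def build_post(alertmanager_url, alerts):
    """Build (but do not send) the POST to Alertmanager's v2 alerts API."""
    return urllib.request.Request(
        alertmanager_url.rstrip("/") + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Because the labels stay identical between the failure and the recovery, Alertmanager treats both posts as the same alert. Your existing inhibition and grouping rules then apply to it.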

Does AliveMCP expose a Prometheus-compatible /metrics endpoint?

Not yet; that's on the Enterprise tier roadmap. For now, the API (Author tier) returns uptime data in JSON. To pull it into Grafana alongside your own metrics, you can poll it with a custom exporter, or fetch it with curl and write it out for node_exporter's textfile collector.
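The sketch below follows the textfile-collector route in Python instead of a shell one-liner. It turns an already-fetched JSON uptime document into the text format that node_exporter's textfile collector reads from *.prom files. The JSON shape ({"servers": [{"slug": ..., "uptime_ratio": ...}]}) and the metric name alivemcp_uptime_ratio are hypothetical. The write goes to a temporary file in the same directory and is then renamed, so the collector never reads a half-written file.

```python
import json
import os
import tempfile


def to_textfile(uptime_json):
    """Render hypothetical uptime JSON as Prometheus text exposition format."""
    data = json.loads(uptime_json)
    lines = [
        "# HELP alivemcp_uptime_ratio External MCP probe uptime ratio (0 to 1).",
        "# TYPE alivemcp_uptime_ratio gauge",
    ]
    for server in data["servers"]:
        # Escape the label value per the exposition format.
        slug = str(server["slug"]).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        lines.append(f'alivemcp_uptime_ratio{{server="{slug}"}} {float(server["uptime_ratio"])}')
    return "\n".join(lines) + "\n"


def write_atomically(directory, filename, content):
    """Write to a temp file (ignored by the collector), then rename over the target."""
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(content)
    os.replace(tmp_path, os.path.join(directory, filename))
```

Point directory at the path passed to node_exporter's --collector.textfile.directory flag and give filename a .prom suffix.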

What about OpenTelemetry?

If you're running OTel instead of Prometheus, the same split applies: OTel traces and spans give you internal call-path visibility; AliveMCP gives you external protocol-level uptime. OTel doesn't have a concept of "does the endpoint look healthy from outside your infrastructure" — that's a fundamentally external check.

Further reading