MCP server metrics

Metrics are the cheapest form of observability: a few hundred bytes of counter and histogram data exported every 15 seconds can alert you to a rising error rate, a latency spike, or a resource saturation condition minutes before users notice. For MCP servers, the key metrics are: total tool call count (by tool name and outcome), call duration distribution, active session count, and circuit-breaker state. This guide walks through adding prom-client to an MCP server, exposing a /metrics endpoint, building a Grafana dashboard, and writing alert rules that wake you up when something is actually wrong — not just noisy.

TL;DR

Install prom-client. Create a Registry and five instruments: mcp_tool_calls_total (counter, labels: tool_name, status, transport), mcp_tool_duration_seconds (histogram, labels: tool_name, status), mcp_active_sessions (gauge), mcp_circuit_breaker_open (gauge per dependency), and mcp_bulkhead_running (gauge per dependency). Expose GET /metrics on a separate port so Prometheus can scrape it without going through your MCP transport. Build three Grafana panels: tool call rate, P99 latency by tool, error rate by tool. Set three alerts: error rate > 5% for 5 minutes, P99 latency > 2s for 5 minutes, circuit breaker open for 1 minute. Pair with AliveMCP external probes for failures that happen before a single metric is emitted.

The four golden signals for MCP servers

Google SRE's four golden signals (latency, traffic, errors, saturation) map directly to MCP server concerns:

Signal	MCP metric	What it catches
Traffic	`mcp_tool_calls_total`	Unexpected traffic spikes, zero traffic (silent failure), per-tool usage patterns
Latency	`mcp_tool_duration_seconds` P50/P99	Slow downstream APIs, cold start latency, degraded dependencies
Errors	`mcp_tool_calls_total{status="error"}`	Dependency failures, invalid parameters, open circuit breakers
Saturation	`mcp_active_sessions`, `mcp_bulkhead_running`	Connection pool exhaustion, approaching session limits, bulkhead saturation

prom-client setup

Create a singleton metrics registry in a dedicated module. Using prom-client's default global registry (the exported register, which is also where collectDefaultMetrics() writes unless told otherwise) is fine for a single-process server; if you run multiple servers in the same process, use a custom Registry to avoid name collisions. The module below uses a custom Registry:

// metrics.ts — prom-client setup
import { Registry, Counter, Histogram, Gauge, collectDefaultMetrics } from 'prom-client';

export const registry = new Registry();

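// Node.js process metrics: event loop lag, heap size, GC pause, file descriptors.
// No prefix option here: the default metric names already start with nodejs_ or process_,
// so prefix: 'nodejs_' would produce names like nodejs_nodejs_eventloop_lag_seconds.
collectDefaultMetrics({ register: registry });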

export const toolCallsCounter = new Counter({
  name: 'mcp_tool_calls_total',
  help: 'Total number of MCP tool calls',
  labelNames: ['tool_name', 'status', 'transport'] as const,
  registers: [registry],
});

export const toolDurationHistogram = new Histogram({
  name: 'mcp_tool_duration_seconds',
  help: 'Duration of MCP tool calls in seconds',
  labelNames: ['tool_name', 'status'] as const,
  buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10],
  registers: [registry],
});

export const activeSessionsGauge = new Gauge({
  name: 'mcp_active_sessions',
  help: 'Number of currently active MCP sessions',
  registers: [registry],
});

export const circuitBreakerOpenGauge = new Gauge({
  name: 'mcp_circuit_breaker_open',
  help: 'Circuit breaker state per dependency: 0 = closed, 0.5 = half-open, 1 = open',
  labelNames: ['dependency'] as const,
  registers: [registry],
});

export const bulkheadRunningGauge = new Gauge({
  name: 'mcp_bulkhead_running',
  help: 'Number of concurrent calls running through each dependency bulkhead',
  labelNames: ['dependency'] as const,
  registers: [registry],
});

Wiring metrics into tool handlers

Increment the counter and record the histogram in every tool handler. The cleanest approach is a wrapper function so the metric logic never clutters the tool's business logic:

// instrumented-tool.ts — wrapper that adds metrics to any tool handler
import { toolCallsCounter, toolDurationHistogram } from './metrics';

type ToolHandler<P, R> = (params: P, context: unknown) => Promise<R>;

export function withMetrics<P, R>(
  toolName: string,
  handler: ToolHandler<P, R>,
): ToolHandler<P, R> {
  return async (params, context) => {
    const end = toolDurationHistogram.startTimer({ tool_name: toolName });
    try {
      const result = await handler(params, context);
      toolCallsCounter.inc({ tool_name: toolName, status: 'ok', transport: 'sse' });
      end({ status: 'ok' });
      return result;
    } catch (err) {
      toolCallsCounter.inc({ tool_name: toolName, status: 'error', transport: 'sse' });
      end({ status: 'error' });
      throw err;
    }
  };
}

// server.ts — use withMetrics when registering tools
import { withMetrics } from './instrumented-tool';

server.tool('search', searchSchema, withMetrics('search', async (params) => {
  // pure business logic, no metrics code here
  return deps.searchBulkhead.execute(() => callSearchApi(params.query, deps.searchAgent));
}));

server.tool('notify', notifySchema, withMetrics('notify', async (params) => {
  return deps.notificationService.send(params.message);
}));

Track session lifecycle separately from tool calls. Increment mcp_active_sessions when a session is established and decrement when it closes:

// Session tracking — increment/decrement active sessions gauge
import { activeSessionsGauge } from './metrics';

// On session open (in your transport's session handler)
activeSessionsGauge.inc();

// On session close (in your transport's cleanup handler)
activeSessionsGauge.dec();
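
A gauge that is incremented and decremented by hand drifts if a cleanup path runs twice, for example once from an error handler and once from a close handler. One possible guard, sketched below, is to hand each session a release function that decrements at most once. The names GaugeLike, trackSession, and FakeGauge are hypothetical and not part of prom-client; GaugeLike only mirrors the inc()/dec() calls used above, so activeSessionsGauge should be usable in its place.

```typescript
// Minimal shape of the gauge used here (only inc/dec are needed).
interface GaugeLike {
  inc(): void;
  dec(): void;
}

// Hypothetical helper: counts the session immediately and returns a
// release function that decrements the gauge at most once.
function trackSession(gauge: GaugeLike): () => void {
  gauge.inc();
  let released = false;
  return () => {
    if (released) return; // second and later calls are no-ops
    released = true;
    gauge.dec();
  };
}

// Stand-in gauge for demonstration only.
class FakeGauge implements GaugeLike {
  value = 0;
  inc(): void {
    this.value += 1;
  }
  dec(): void {
    this.value -= 1;
  }
}

const demoGauge = new FakeGauge();
const releaseSession = trackSession(demoGauge); // session opened, gauge is 1
releaseSession(); // close handler runs, gauge is 0
releaseSession(); // a later error handler does not push the gauge below 0
```

In a real server, trackSession(activeSessionsGauge) would be called where the session is established, and the returned function would be passed to every close and error path for that session.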

Exposing the /metrics endpoint

The /metrics scrape endpoint should be on a separate port from your MCP transport. This keeps Prometheus's frequent scrape requests out of the MCP transport's request path, access logs, and any transport-level request metrics, and allows you to firewall the metrics port so it's only reachable from your monitoring infrastructure:

// metrics-server.ts — /metrics on a separate port
import http from 'http';
import { registry } from './metrics';

export function startMetricsServer(port: number): void {
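  const server = http.createServer(async (req, res) => {
    if (req.url !== '/metrics' || req.method !== 'GET') {
      res.writeHead(404);
      res.end();
      return;
    }
    try {
      // Render first, so a failing collector produces a 500 instead of a half-written 200
      const body = await registry.metrics();
      res.writeHead(200, { 'Content-Type': registry.contentType });
      res.end(body);
    } catch {
      res.writeHead(500);
      res.end();
    }
  });

  server.listen(port, () => {
    // console is used here so the module has no undeclared logger dependency;
    // swap in your structured logger if you have one
    console.info(`metrics server listening on port ${port}`);
  });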
}

// config.ts — add METRICS_PORT
export interface Config {
  PORT: number;
  METRICS_PORT: number;
  // ...
}

// server.ts — start metrics server alongside MCP server
startMetricsServer(config.METRICS_PORT); // e.g., 9090

Configure Prometheus to scrape this endpoint:

# prometheus.yml — scrape config for MCP server
scrape_configs:
  - job_name: 'mcp-server'
    static_configs:
      - targets: ['mcp-server:9090']
    scrape_interval: 15s
    metrics_path: /metrics

Circuit-breaker and bulkhead metrics

Circuit-breaker state changes (CLOSED → OPEN → HALF_OPEN) are discrete transitions between a small set of states, so they don't fit naturally into histograms. Export the current state as a gauge that is updated on state-change events:

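// wire circuit-breaker state into metrics in createDeps()
import { circuitBreakerOpenGauge, bulkheadRunningGauge } from './metrics';

export async function createDeps(config: Config): Promise<Deps> {
  const searchBreaker = new CircuitBreaker(searchFn, cbOptions);
  // Placeholder: construct the bulkhead however your bulkhead setup does it
  const searchBulkhead = new Bulkhead(bulkheadOptions);

  // Start at 0 so the series exists before the first state change
  circuitBreakerOpenGauge.set({ dependency: 'search' }, 0);
  searchBreaker.on('open',     () => circuitBreakerOpenGauge.set({ dependency: 'search' }, 1));
  searchBreaker.on('close',    () => circuitBreakerOpenGauge.set({ dependency: 'search' }, 0));
  searchBreaker.on('halfOpen', () => circuitBreakerOpenGauge.set({ dependency: 'search' }, 0.5));

  // Export bulkhead running count on a schedule (gauges need refreshing).
  // Read from the local variable: a `deps` object does not exist yet inside createDeps().
  const refreshTimer = setInterval(() => {
    bulkheadRunningGauge.set({ dependency: 'search' }, searchBulkhead.stats.running);
  }, 5_000);
  refreshTimer.unref(); // don't let the refresh timer keep the process alive on shutdown

  return { searchBreaker, searchBulkhead, /* ... */ };
}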

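An mcp_circuit_breaker_open{dependency="search"} == 1 expression matches on the first scrape after the search API circuit opens, with no waiting for enough failed calls to move the error-rate ratio. The alert itself fires once the condition has held for the rule's for: duration. Note that the half-open value of 0.5 does not match == 1. See the circuit breaker guide for the full half-open probe and reset logic.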
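
The gauge encoding used by the event handlers above (0, 0.5, 1) can be written down as a small lookup, which makes it easier to see which states an == 1 comparison matches. This is an illustrative sketch; BreakerState, breakerGaugeValue, and matchesOpenAlert are hypothetical names, not part of any circuit-breaker library.

```typescript
type BreakerState = 'closed' | 'halfOpen' | 'open';

// Encoding used in this guide: 0 = closed, 0.5 = half-open, 1 = open.
const BREAKER_GAUGE_VALUE: Record<BreakerState, number> = {
  closed: 0,
  halfOpen: 0.5,
  open: 1,
};

// Value to write to the mcp_circuit_breaker_open gauge for a given state.
function breakerGaugeValue(state: BreakerState): number {
  return BREAKER_GAUGE_VALUE[state];
}

// Mirrors the comparison `mcp_circuit_breaker_open == 1`:
// only a fully open breaker matches, a half-open one does not.
function matchesOpenAlert(state: BreakerState): boolean {
  return breakerGaugeValue(state) === 1;
}
```

If half-open breakers should also show up on a dashboard or in an alert, a comparison such as > 0 would match both non-closed states under this encoding.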

Grafana dashboard

Three panels cover the essentials for an MCP server Grafana dashboard:

Panel 1 — Tool call rate by outcome

sum by (tool_name, status) (
  rate(mcp_tool_calls_total[5m])
)

Panel 2 — P99 tool call latency by tool

histogram_quantile(0.99,
  sum by (tool_name, le) (
    rate(mcp_tool_duration_seconds_bucket[5m])
  )
)
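
The P99 that histogram_quantile reports is an estimate derived from cumulative bucket counts, not a measured value, so its precision depends on where the bucket edges sit. The sketch below is a simplified model of the calculation for classic bucketed histograms (find the bucket that contains the target rank, then interpolate linearly inside it). It is not Prometheus's actual code; Bucket, estimateQuantile, and the sample counts are made up for illustration.

```typescript
// One cumulative bucket: `count` is the number of observations <= `le`.
interface Bucket {
  le: number; // upper bound; use Infinity for the +Inf bucket
  count: number;
}

// Simplified quantile estimate. `buckets` must be sorted by ascending `le`
// and end with the +Inf bucket, as in Prometheus exposition output.
function estimateQuantile(q: number, buckets: Bucket[]): number {
  const last = buckets[buckets.length - 1];
  if (buckets.length < 2 || last === undefined || last.count === 0) return NaN;
  if (q < 0) return -Infinity;
  if (q > 1) return Infinity;

  const rank = q * last.count; // how many observations lie at or below the quantile
  const i = buckets.findIndex((b) => b.count >= rank);
  const bucket = buckets[i];
  if (bucket === undefined) return NaN;
  const prev = i > 0 ? buckets[i - 1] : undefined;

  // Rank lands in the +Inf bucket: the best available answer is the highest finite bound.
  if (bucket.le === Infinity) return prev?.le ?? NaN;

  // The first bucket's lower bound is taken as 0, which suits non-negative durations.
  const lower = prev?.le ?? 0;
  const below = prev?.count ?? 0;
  return lower + (bucket.le - lower) * ((rank - below) / (bucket.count - below));
}

// Illustrative counts: 90 calls <= 0.5s, 96 <= 1s, all 100 <= 2.5s.
const sampleBuckets: Bucket[] = [
  { le: 0.5, count: 90 },
  { le: 1, count: 96 },
  { le: 2.5, count: 100 },
  { le: Infinity, count: 100 },
];
const p99 = estimateQuantile(0.99, sampleBuckets); // about 2.125, interpolated inside the 1 to 2.5 bucket
```

With the bucket list from metrics.ts, a 2-second threshold falls inside the 1 to 2.5 bucket, so a P99 compared against 2 is an interpolated value. If that threshold matters, adding a bucket edge at 2 makes the comparison sharper.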

Panel 3 — Error rate as a percentage

sum by (tool_name) (rate(mcp_tool_calls_total{status="error"}[5m]))
/
sum by (tool_name) (rate(mcp_tool_calls_total[5m]))
* 100

Add a fourth panel for mcp_active_sessions (stat panel, single value) and a fifth for mcp_circuit_breaker_open (stat panel per dependency, threshold colour red when == 1). Together these two panels give you a health overview at a glance.

Alert rules

Three alerts cover the most common failure modes:

# prometheus-alerts.yml — MCP server alert rules
groups:
  - name: mcp-server
    rules:

      - alert: MCPToolHighErrorRate
        expr: |
          sum by (tool_name) (rate(mcp_tool_calls_total{status="error"}[5m]))
          /
          sum by (tool_name) (rate(mcp_tool_calls_total[5m]))
          > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "MCP tool {{ $labels.tool_name }} error rate above 5%"
          description: "{{ $value | humanizePercentage }} of {{ $labels.tool_name }} calls are failing."

      - alert: MCPToolHighLatency
        expr: |
          histogram_quantile(0.99,
            sum by (tool_name, le) (
              rate(mcp_tool_duration_seconds_bucket[5m])
            )
          ) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "MCP tool {{ $labels.tool_name }} P99 latency above 2s"
          description: "P99 latency is {{ $value | humanizeDuration }}."

      - alert: MCPCircuitBreakerOpen
        expr: mcp_circuit_breaker_open == 1
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "MCP circuit breaker open for {{ $labels.dependency }}"

The for: 5m delay on error-rate and latency alerts suppresses false positives from brief transient spikes. The circuit-breaker alert fires after just 1 minute — a breaker that stays open for more than a minute indicates a real dependency outage, not a transient blip.
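
How a for: clause filters out short spikes can be modelled in a few lines. The sketch below assumes one rule evaluation every 60 seconds and treats the alert as pending until its condition has held on every evaluation for the full duration. alertStates and its parameters are hypothetical names, and real rule evaluation in Prometheus has more moving parts, so read this as an approximation of the behaviour rather than a description of the implementation.

```typescript
type AlertState = 'inactive' | 'pending' | 'firing';

// Simplified model of `for:`. conditionAtEval[i] says whether the alert
// expression matched on the i-th evaluation.
function alertStates(
  conditionAtEval: boolean[],
  evalIntervalSeconds: number,
  forSeconds: number,
): AlertState[] {
  const states: AlertState[] = [];
  let activeSince: number | null = null;

  for (let i = 0; i < conditionAtEval.length; i++) {
    const now = i * evalIntervalSeconds;
    if (!conditionAtEval[i]) {
      activeSince = null; // any non-matching evaluation resets the clock
      states.push('inactive');
      continue;
    }
    if (activeSince === null) activeSince = now;
    states.push(now - activeSince >= forSeconds ? 'firing' : 'pending');
  }
  return states;
}

// Sustained problem: pending for five evaluations, firing from the sixth (t = 300s).
const sustained = alertStates([true, true, true, true, true, true, true], 60, 300);

// Two-minute blip: never leaves pending, so nobody is paged.
const blip = alertStates([true, true, false, false], 60, 300);
```

The same model explains the 1-minute circuit-breaker rule: with for: 1m, a breaker that opens and closes again before the next evaluation never reaches firing.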

Metrics and AliveMCP together

Prometheus metrics are pull-based: Prometheus scrapes your /metrics endpoint every 15 seconds. A crashed server cannot report its own crash; Prometheus only sees scrapes that fail (the built-in up series drops to 0) or, if the server restarts before the next scrape, counters that have reset to zero. A scrape gap of 15–30 seconds looks identical to a network partition in the Prometheus data model.

AliveMCP is active probing from the outside: it attempts a full MCP session (connect, initialize, list tools, call a tool) and reports the outcome regardless of whether the server's internal metrics pipeline is working. A server that crashes and restarts between Prometheus scrapes still gets caught by AliveMCP because the probe runs continuously, not on a 15-second pull cycle. Use Prometheus for capacity planning, latency profiling, and trend analysis; use AliveMCP for uptime alerting and the initial failure detection that wakes up on-call. See the observability overview for how all the layers fit together.