Synthetic Monitoring for MCP Servers — external probes, canary checks, and uptime detection

Synthetic monitoring sends scripted probes from outside your system on a schedule, recording whether the server responds correctly and how long it takes. For MCP servers, this means automating the same protocol handshake an AI agent performs (initialize, tools/list, and optionally a specific tools/call) and alerting when any step fails or slows. Unlike logs and traces, which only record requests that actually reach the server, it can tell you the server is broken before a real agent tries to use it. This guide covers how to design synthetic probes for MCP, how to extend them to application-layer canary checks, when to run them from multiple regions, and how AliveMCP automates the full process.

TL;DR

A minimal MCP synthetic probe: connect via SSE or stdio, send initialize, verify the server returns a valid capabilities response, send tools/list, verify the expected tools are present. Run this every 60 seconds from an external vantage point. Extend with a canary tool call — a known input with a known correct output — to detect application-layer failures that the protocol probe misses. AliveMCP automates this entire probe cycle, tracks P95 response time, exposes failure_reason on every incident, and alerts on Slack, PagerDuty, or OpsGenie without you writing a single line of probe code.

Why synthetic monitoring is the right baseline for MCP servers

MCP server observability has three layers. Log-based monitoring reads what the server wrote after something happened. APM tracing instruments the code to capture spans as requests flow through. Synthetic monitoring probes the server from the outside as if it were a client, catching failures that the other two layers miss entirely.

Log-based monitoring has a fundamental gap: if the server process crashes, it stops writing logs. You discover the outage only when someone checks the log aggregator or a user complains. APM tracing has a related gap: it captures requests that reach the instrumented code, but misses failures at the network or protocol layer — a firewall rule that blocks new connections, a TLS certificate that expired, a process that is running but not accepting connections.

Synthetic monitoring catches all of these because it approaches the server as a client would. If a connection cannot be established, the probe records a connection_refused failure. If the TLS handshake fails, the probe records a tls_error. If the server accepts the connection but the MCP initialize handshake stalls, the probe records a timeout. None of this requires a log line or an instrumented span — it's observable from outside.
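As a rough illustration of that mapping, the sketch below turns a low-level Node.js error into one of the failure_reason labels used in this guide. The function name classifyFailure, the 'unknown' fallback, and the exact list of error codes are assumptions made for this example; a real probe would likely need a longer list.

```javascript
// Maps a low-level Node.js error to a failure_reason label.
// The code lists are illustrative assumptions, not an exhaustive catalogue.
const TLS_ERROR_CODES = new Set([
  'CERT_HAS_EXPIRED',
  'DEPTH_ZERO_SELF_SIGNED_CERT',
  'SELF_SIGNED_CERT_IN_CHAIN',
  'UNABLE_TO_VERIFY_LEAF_SIGNATURE',
  'ERR_TLS_CERT_ALTNAME_INVALID',
]);
const TIMEOUT_CODES = new Set(['ETIMEDOUT', 'UND_ERR_CONNECT_TIMEOUT']);

function classifyFailure(err) {
  // fetch() wraps the socket error in err.cause, so check both levels
  const code = err.code ?? err.cause?.code;
  if (code === 'ECONNREFUSED') return 'connection_refused';
  if (TLS_ERROR_CODES.has(code)) return 'tls_error';
  if (TIMEOUT_CODES.has(code) || err.name === 'AbortError' || err.name === 'TimeoutError') {
    return 'timeout';
  }
  // Hypothetical fallback label for anything this sketch does not recognise
  return 'unknown';
}
```

Keeping the classification in one function means every probe step reports failures with the same vocabulary, which makes incidents easier to group later.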

For MCP specifically, the protocol handshake is a useful liveness signal. An MCP server that completes initialize and responds to tools/list has at least started, registered its tools, and is serving the protocol. A server in a degraded state (process running but stuck in initialization, a dependency required at startup unavailable, tool schema validation failing) will typically fail the probe even though the process is alive and a shallow health check endpoint returns 200.

The MCP synthetic probe protocol

A synthetic MCP probe consists of three sequential steps. Each step has a maximum allowed duration; failure or timeout at any step terminates the probe and records the failure at that layer.
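The "sequential steps, each with its own time budget" structure can be sketched as a small runner. This is a minimal illustration, not a prescribed implementation: runProbeSteps, the step object shape ({ name, timeoutMs, run }), and the result field names are all assumptions introduced here.

```javascript
// Runs probe steps in order; each step has its own time budget.
// `steps` is an array of { name, timeoutMs, run } where run() returns a promise.
async function runProbeSteps(steps) {
  const timings = {};
  for (const step of steps) {
    const started = Date.now();
    let timer;
    const deadline = new Promise((_, reject) => {
      timer = setTimeout(() => reject(new Error('timeout')), step.timeoutMs);
    });
    try {
      await Promise.race([step.run(), deadline]);
      timings[step.name] = Date.now() - started;
    } catch (err) {
      // Stop at the first failing step and report which layer failed
      return { ok: false, failed_step: step.name, failure_reason: err.message, timings };
    } finally {
      // Always clear the timer so a finished step leaves no pending timeout
      clearTimeout(timer);
    }
  }
  return { ok: true, failed_step: null, failure_reason: null, timings };
}
```

A caller would pass three steps (connect, initialize, tools/list), each wrapping the real network call. Later steps never run once an earlier one fails, so the recorded failure always points at the first broken layer.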

Step 1: Transport connection

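For HTTP/SSE transport, establish a TCP connection to the server's address and port and, for HTTPS endpoints, complete the TLS handshake. For stdio transport, spawn the server process and attach to its stdin/stdout. Record the connection establishment time. If the connection is refused, record failure_reason: connection_refused; if it times out, record failure_reason: timeout; if the TLS handshake fails, record failure_reason: tls_error. In each case, terminate the probe.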

Step 2: Protocol handshake (initialize)

Send an initialize request with the probe client's capabilities. A minimal initialize body:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "initialize",
  "params": {
    "protocolVersion": "2024-11-05",
    "capabilities": {},
    "clientInfo": {
      "name": "synthetic-probe",
      "version": "1.0.0"
    }
  }
}

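Verify the response includes a valid protocolVersion and a capabilities object. If the response is malformed, record failure_reason: protocol_error. If the response takes longer than the timeout threshold (typically 5–10 seconds), record failure_reason: timeout. After a successful response, send the notifications/initialized notification, which the MCP lifecycle requires the client to send before normal operation begins.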
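For a probe that speaks raw JSON-RPC rather than using an SDK, the structural check on the initialize response might look like the sketch below. The function name validateInitializeResponse and its error strings are assumptions for illustration; it checks only the fields named above and does not validate the full schema.

```javascript
// Returns null when the initialize response looks valid, otherwise a short
// description of the problem to store alongside failure_reason: protocol_error.
function validateInitializeResponse(response, requestId) {
  if (response === null || typeof response !== 'object') return 'response is not a JSON object';
  if (response.jsonrpc !== '2.0') return 'missing or wrong jsonrpc version';
  if (response.id !== requestId) return 'response id does not match request id';
  if (response.error) return `server returned error: ${response.error.message}`;

  const result = response.result;
  if (result === null || typeof result !== 'object') return 'missing result';
  if (typeof result.protocolVersion !== 'string' || result.protocolVersion === '') {
    return 'missing protocolVersion';
  }
  const caps = result.capabilities;
  if (caps === null || typeof caps !== 'object' || Array.isArray(caps)) {
    return 'missing capabilities object';
  }
  return null;
}
```

Returning a description rather than a boolean lets the probe attach the specific defect to the incident, which shortens the first investigation step.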

Step 3: Tools enumeration (tools/list)

Send a tools/list request and verify the response contains the tools the server is expected to expose. For a production probe, maintain a tool manifest — a list of tool names the server must provide — and alert if any expected tool is absent (schema drift, deployment regression) or if an unexpected tool appears (unauthorized deployment).

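// Node.js synthetic probe using the MCP SDK
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { SSEClientTransport } from '@modelcontextprotocol/sdk/client/sse.js';

// Reject with `label` if `promise` has not settled within `ms`; always clear the timer
function withTimeout(promise, ms, label) {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(label)), ms);
  });
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}

export async function probeMcpServer(serverUrl, expectedTools, timeout = 8000) {
  const start = Date.now();

  const transport = new SSEClientTransport(new URL(serverUrl));
  const client = new Client({ name: 'synthetic-probe', version: '1.0' }, { capabilities: {} });

  try {
    // connect() opens the transport and performs the initialize handshake
    await withTimeout(client.connect(transport), timeout, 'connect_timeout');
    const connectMs = Date.now() - start;

    // Bound tools/list as well, so a stalled server cannot hang the probe
    const toolsResponse = await withTimeout(client.listTools(), timeout, 'tools_list_timeout');
    const totalMs = Date.now() - start;

    const presentTools = new Set(toolsResponse.tools.map(t => t.name));
    const missingTools = expectedTools.filter(name => !presentTools.has(name));

    return {
      ok: missingTools.length === 0,
      reason: missingTools.length === 0 ? null : 'missing_tools',
      connect_ms: connectMs,
      total_ms: totalMs,
      missing_tools: missingTools,
      tool_count: toolsResponse.tools.length,
    };
  } finally {
    // Close even when a step fails so failed probes do not leak connections
    await client.close().catch(() => {});
  }
}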

Run this probe on a cron schedule from an external host. The key constraint is "external" — a probe running on the same machine as the server cannot detect host-level failures (network partition, load balancer misconfiguration, port exhaustion at the host level). Run it from a separate machine, a separate VPC, or a monitoring service.
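The probe function above reports only missing tools. The manifest check described in Step 3 also calls for flagging unexpected tools, which could be sketched as a two-way diff. The helper name diffToolManifest and its return shape are assumptions for this example.

```javascript
// Compares the tool names a server reports against the expected manifest.
// Returns the names that are absent and the names that were not expected.
function diffToolManifest(expectedTools, reportedTools) {
  const expected = new Set(expectedTools);
  const reported = new Set(reportedTools);
  const missing = [...expected].filter(name => !reported.has(name));
  const unexpected = [...reported].filter(name => !expected.has(name));
  return { ok: missing.length === 0 && unexpected.length === 0, missing, unexpected };
}
```

Inside the probe, the second argument would be toolsResponse.tools.map(t => t.name). Whether an unexpected tool should page someone or only open a ticket is a policy decision the diff leaves to the caller.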

Extending to application-layer canary checks

The protocol probe tells you the server is reachable and structurally sound. It does not tell you the server is producing correct results. A server that always returns {"results": []} from a search tool, or a server whose database connection is silently returning stale data, passes the protocol probe while delivering degraded quality to users.

A canary tool call extends the probe to the application layer: call a specific tool with a known input and verify the output meets minimum correctness criteria.

// Canary check: verify a known tool produces a non-empty result
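async function canaryCheck(client, toolName, input, verifyFn) {
  const start = Date.now();

  const result = await client.callTool({ name: toolName, arguments: input });
  const latencyMs = Date.now() - start;

  // A tool-level failure is reported in the result rather than thrown
  if (result.isError) {
    return { ok: false, reason: 'tool_error', latency_ms: latencyMs, tool: toolName };
  }

  let parsed;
  try {
    // Assumes the tool returns JSON in its first text content item
    parsed = JSON.parse(result.content[0].text);
  } catch {
    return { ok: false, reason: 'unparseable_response', latency_ms: latencyMs, tool: toolName };
  }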

  const verification = verifyFn(parsed);
  return {
    ok: verification.ok,
    reason: verification.ok ? null : verification.reason,
    latency_ms: latencyMs,
    tool: toolName,
  };
}

// Example: search tool must return at least one result for a known query
const canaryResult = await canaryCheck(
  client,
  'search_documents',
  { query: 'MCP server health check canary' },
  (parsed) => ({
    ok: parsed.total_results > 0,
    reason: parsed.total_results === 0 ? 'empty_results_for_known_query' : null,
  })
);

Design canary inputs carefully. The input should be stable (always produces the same class of output) and lightweight (not expensive to execute). For a search tool, use a query for a document you control and keep in the index permanently. For a database tool, query a sentinel row you write at deploy time. For an external API tool, use a deterministic API endpoint that always returns a known response.

Probe frequency and alert thresholds

Probe frequency determines your detection latency — the gap between when a failure occurs and when you know about it. The right frequency depends on your availability target and the tolerance for downtime.

Target availability	Allowed downtime / month	Recommended probe interval	Detection latency
99.9% (three nines)	~43 minutes	60 seconds	≤ 60s
99.5%	~3.6 hours	5 minutes	≤ 5m
99.0%	~7.2 hours	10 minutes	≤ 10m
Casual / dev server	Not tracked	30 minutes	≤ 30m

Alert on consecutive failures, not single failures. A single probe miss can be caused by transient network congestion between the probe host and the server rather than a real outage. Two consecutive failures at 60-second intervals (between one and two minutes of unresponsiveness, depending on when the outage began relative to the probe schedule) is a reliable signal that warrants an alert. This waiting period adds to the detection latency in the table above, which counts only the first failed probe. Configure your alerting threshold to match your tolerance: alert after N consecutive failures where N = 2 for production, N = 3–5 for less critical servers.
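The downtime budgets in the table and the effect of the consecutive-failure threshold reduce to two small calculations. The sketch below assumes a 30-day month, probes that start exactly on schedule, and a fixed per-probe timeout; both function names are made up for this example.

```javascript
// Minutes of downtime allowed per period for a given availability target.
function allowedDowntimeMinutes(availability, days = 30) {
  return (1 - availability) * days * 24 * 60;
}

// Upper bound on seconds between the start of an outage and the alert:
// the outage can begin just after a passing probe, then `threshold` probes
// must fail, and the last one may take up to `timeoutSeconds` to give up.
function worstCaseAlertDelaySeconds(intervalSeconds, threshold, timeoutSeconds) {
  return intervalSeconds * threshold + timeoutSeconds;
}
```

For example, allowedDowntimeMinutes(0.999) is about 43.2 minutes, matching the first table row, and worstCaseAlertDelaySeconds(60, 2, 8) is 128 seconds. Comparing the two numbers shows how much of the monthly budget a single slow detection consumes.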

Track P95 response time across probe runs. Latency degradation often precedes complete failure: a server whose connection pool is saturating shows rising P95 before it starts timing out entirely. Alert on P95 exceeding 2× the baseline measured during a healthy window, in addition to alerting on outright failures.
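One way to compute the P95 and apply the 2× rule is the nearest-rank method over a window of recent probe latencies. This is a sketch under assumptions: the window contents and the baseline value come from your own time-series store, and the names percentile and latencyDegraded are invented here.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of samples are less than or equal to it. Returns null for an empty window.
function percentile(samples, p) {
  if (samples.length === 0) return null;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Flags degradation when the window's P95 exceeds `factor` times the baseline.
function latencyDegraded(recentLatenciesMs, baselineP95Ms, factor = 2) {
  const p95 = percentile(recentLatenciesMs, 95);
  return { p95_ms: p95, degraded: p95 !== null && p95 > factor * baselineP95Ms };
}
```

With 60-second probes, a window of 20 samples covers 20 minutes; a single slow probe in that window does not move the P95, but two do, which gives some resistance to one-off spikes.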

Multi-region synthetic probing

A single-region probe has a blind spot: network issues between the probe host and the server may appear as server failures when the server is actually healthy. Conversely, a regional routing failure (DNS misconfiguration, CDN edge outage, regional load balancer failure) affects users in that region but not the probe if it runs from a different region.

Multi-region probing runs the same probe from two or more geographic vantage points and correlates results. The failure classification logic:

Probe A result	Probe B result	Classification	Alert priority
Fail	Fail	Global outage — server unreachable everywhere	P1 — page immediately
Fail	Pass	Regional routing issue — affects users in region A	P2 — investigate routing/CDN
Pass	Fail	Regional routing issue — affects users in region B	P2 — investigate routing/CDN
Pass (slow)	Pass (fast)	Regional latency degradation	P3 — investigate inter-region routing
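The classification table translates fairly directly into code. In the sketch below, the result shape ({ ok, total_ms }), the label strings, and the slowFactor threshold used to decide that one region is "slow" are assumptions for illustration.

```javascript
// a and b are probe results of the form { ok: boolean, total_ms: number }.
// slowFactor is a hypothetical threshold for calling one region "slow".
function classifyRegions(a, b, slowFactor = 2) {
  if (!a.ok && !b.ok) return { classification: 'global_outage', priority: 'P1' };
  if (!a.ok) return { classification: 'regional_issue_a', priority: 'P2' };
  if (!b.ok) return { classification: 'regional_issue_b', priority: 'P2' };

  // Both passed: compare latencies to spot a regional slowdown
  const fast = Math.min(a.total_ms, b.total_ms);
  const slow = Math.max(a.total_ms, b.total_ms);
  if (slow > slowFactor * fast) {
    return { classification: 'regional_latency_degradation', priority: 'P3' };
  }
  return { classification: 'healthy', priority: null };
}
```

The aggregator would call this once per probe cycle with the latest result from each region, and apply the consecutive-failure rule to the classification rather than to each region separately.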

For a self-hosted multi-region setup, run the probe script on small VMs in two separate cloud regions (e.g., us-east-1 and eu-west-1) and send results to a shared aggregator. This costs roughly $5–10/month in compute. For managed multi-region probing, AliveMCP handles regional probe distribution automatically.

How AliveMCP implements synthetic monitoring

AliveMCP is purpose-built synthetic monitoring for MCP servers. It executes the full three-step probe (transport connection → initialize handshake → tools/list verification) from external infrastructure every 60 seconds. You configure your server URL once; AliveMCP handles the probe loop, failure detection, and alerting.

Every probe result is recorded with a structured failure_reason field that maps directly to the failure layer:

failure_reason	What it means	First investigation step
connection_refused	TCP connection rejected: process not listening on expected port	Check if process is running; check if port binding changed
timeout	Connection or handshake exceeded threshold	Check CPU/memory utilization; check for connection pool saturation
protocol_error	Server connected but returned malformed MCP response	Check recent deployments; check for partial startup (tools not loaded yet)
tls_error	TLS certificate invalid, expired, or hostname mismatch	Check certificate expiry; check reverse proxy TLS configuration
schema_drift	tools/list returned different tools than the last stable baseline	Check deployment for tool additions/removals; verify intentional vs accidental change

For application-layer monitoring beyond the protocol probe, configure a custom health check URL in AliveMCP. AliveMCP polls this URL alongside the protocol probe and alerts if it returns a non-2xx status. Use this endpoint to expose your canary tool call result:

// Express health endpoint combining a dependency check with the canary check.
// app, db, and runCanaryToolCall are assumed to be defined elsewhere in your server.
app.get('/health', async (req, res) => {
  const checks = {};

  // Database / store connectivity
  try {
    await db.query('SELECT 1');
    checks.database = 'ok';
  } catch (err) {
    checks.database = 'unreachable';
    return res.status(503).json({ status: 'unhealthy', checks, reason: 'database_unreachable' });
  }

  // Canary tool check (if applicable)
  try {
    const result = await runCanaryToolCall();
    checks.canary = result.ok ? 'ok' : 'degraded';
    if (!result.ok) {
      return res.status(503).json({ status: 'degraded', checks, reason: result.reason });
    }
  } catch (err) {
    checks.canary = 'error';
    return res.status(503).json({ status: 'unhealthy', checks, reason: err.message });
  }

  res.json({ status: 'ok', checks });
});

Point AliveMCP's custom health check URL at https://your-server.com/health. AliveMCP treats any non-2xx response as a failure and reports it alongside the protocol probe result, giving you both layers in a single alert.

Building a self-hosted synthetic monitoring stack

If you prefer to run synthetic monitoring yourself rather than using a managed service, here is a minimal production-grade stack:

// probe-runner.ts — runs on a cron schedule from an external host
import cron from 'node-cron';
import { probeMcpServer } from './probe.js';
import { alertOn } from './alerting.js';

const servers = [
  {
    name: 'production-mcp',
    url: 'https://mcp.yourapp.com/sse',
    expectedTools: ['search_documents', 'get_user', 'create_ticket'],
    timeoutMs: 8000,
  },
];

const consecutiveFailures: Record<string, number> = {};
const ALERT_THRESHOLD = 2;

cron.schedule('* * * * *', async () => {  // every minute
  // Probe servers concurrently so one slow or timing-out server cannot delay the rest
  await Promise.all(servers.map(async (server) => {
    const result = await probeMcpServer(server.url, server.expectedTools, server.timeoutMs)
      .catch(err => ({ ok: false, reason: err.message }));

    if (!result.ok) {
      consecutiveFailures[server.name] = (consecutiveFailures[server.name] || 0) + 1;
      // Fire once, when the threshold is first reached, not on every later failure
      if (consecutiveFailures[server.name] === ALERT_THRESHOLD) {
        await alertOn({
          server: server.name,
          reason: result.reason || 'probe_failed',
          consecutive: consecutiveFailures[server.name],
        });
      }
    } else {
      // Send a resolution notice only if an alert was previously raised
      if (consecutiveFailures[server.name] >= ALERT_THRESHOLD) {
        await alertOn({ server: server.name, type: 'resolved', consecutive: 0 });
      }
      consecutiveFailures[server.name] = 0;
    }

    // Record to time-series store for P95 tracking
    // (recordProbeResult is not defined in this guide; supply your own writer)
    await recordProbeResult(server.name, result);
  }));
});

The infrastructure cost for self-hosted synthetic monitoring is low — a $5/month VPS running Node.js is sufficient for monitoring up to 50 servers at 60-second intervals. The operational cost (maintaining the probe logic, keeping the probe runner healthy, building alerting integrations) is where managed services like AliveMCP provide leverage. Self-host if you have specialized probing requirements (private VPC servers, custom tool call canaries, proprietary alerting integrations); use a managed service for everything standard.

Frequently asked questions

What is the difference between synthetic monitoring and real user monitoring (RUM)?

Synthetic monitoring sends scripted probes on a schedule, independent of real traffic. Real user monitoring instruments actual user sessions to capture their experience. For MCP servers, RUM is not practical: the "users" are AI agents that don't run JavaScript in a browser and don't emit the telemetry that RUM SDKs collect. Synthetic monitoring is the correct baseline for MCP uptime detection. APM tracing on the server side complements it by capturing request-level details once you know a failure is happening.

How do I monitor an MCP server that is not publicly reachable (behind a VPN or firewall)?

Private MCP servers cannot be probed from external infrastructure. The two options are: (1) run a synthetic probe inside the same network (a small VM or container in the same VPC on a cron schedule) and push probe results to an external monitoring endpoint; or (2) use a probe agent that establishes an outbound tunnel to a monitoring service, avoiding the need to open inbound ports. AliveMCP's private agent approach works for networks that allow outbound HTTPS but block inbound connections. The probe logic is identical — the only difference is the network path the probe takes to reach the server.

Should I alert on the first probe failure or wait for consecutive failures?

Wait for two consecutive failures before alerting for production MCP servers. Single probe failures are frequently caused by transient network issues between the probe infrastructure and your server (a router hiccup, a brief TCP connection queue backup) that usually resolve before the next probe runs. Alerting on every single failure produces alert fatigue and trains you to ignore alerts, which is worse than missing a brief outage. Alert on the very first failure only for critical infrastructure where any gap in availability is unacceptable and you have verified that your probe network path is extremely reliable.

How is synthetic monitoring different from a Kubernetes liveness probe?

A Kubernetes liveness probe runs inside the same cluster as your server and typically hits an HTTP health endpoint like /health. It detects when the process should be restarted but cannot detect network-layer failures outside the cluster (load balancer misconfiguration, DNS failure, ingress controller outage). Synthetic monitoring from an external vantage point detects the full failure path — including every component between a remote client and your server. Run both: the liveness probe handles container-level restart automation; external synthetic monitoring handles customer-facing availability detection.

How many canary tool calls should I run in each probe cycle?

One per probe cycle is sufficient for most MCP servers. A single canary call with a known-good input tests that the core tool execution path is working end-to-end. Adding more canary calls increases probe duration (adding latency to each cycle) and the chance of the probe itself causing load on a degraded server. If you have multiple independent tool types (search tools, write tools, external-API tools), consider a single "health probe" tool that internally checks all of them and returns a structured health status, then call that one tool from the probe. This keeps the probe surface minimal while giving you application-layer coverage across all tool types.
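The aggregated "health probe" tool described above could be built around a helper like the following sketch. It is only an illustration: runHealthChecks, the check names, and the status strings are assumptions, and wiring the result into an actual MCP tool handler is left to your server code.

```javascript
// `checks` maps a check name to an async function that resolves when that tool
// path is healthy and rejects otherwise. Names and checks are hypothetical.
async function runHealthChecks(checks) {
  const names = Object.keys(checks);
  // Promise.resolve().then(...) converts synchronous throws into rejections
  const settled = await Promise.allSettled(
    names.map(name => Promise.resolve().then(() => checks[name]()))
  );

  const results = {};
  settled.forEach((outcome, i) => {
    results[names[i]] = outcome.status === 'fulfilled'
      ? { ok: true }
      : { ok: false, reason: String(outcome.reason?.message ?? outcome.reason) };
  });

  const ok = Object.values(results).every(r => r.ok);
  return { status: ok ? 'ok' : 'degraded', checks: results };
}
```

A tool handler would return JSON.stringify of this object as its text content, and the probe's canary verification would then assert that status is 'ok'. Because the checks run concurrently, the probe pays roughly the cost of the slowest check rather than the sum of all of them.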