Guide · Production Quality Engineering
Four Golden Signals for MCP Servers — latency, traffic, errors, and saturation
Google's Site Reliability Engineering book introduced the four golden signals as the minimum instrumentation any service needs: latency, traffic, errors, and saturation. Every other metric is a derivative. For MCP servers, these four signals map directly to the tool call lifecycle — but each has MCP-specific nuances that make naive application of the framework miss the most important failure modes. This guide translates the four golden signals to the MCP server context, defines what to measure for each, identifies the alert thresholds that matter, and shows how AliveMCP covers the latency and error signals automatically, complementing your server-side instrumentation for traffic and saturation.
TL;DR
Map the four golden signals to MCP: Latency = tool call duration P50/P95 per tool, measured at the protocol layer and inside the server. Traffic = tool calls per minute per tool, tracked server-side. Errors = protocol-layer failures (connection_refused, timeout, protocol_error) + application-layer errors (tool handler exceptions, schema validation failures). Saturation = connection pool utilization, memory headroom, and CPU headroom. AliveMCP covers latency (P95 tracking on every probe) and protocol errors (failure_reason on every incident) automatically. Wire your own instrumentation for traffic and saturation.
Why the four golden signals work for MCP servers
The four golden signals are powerful because they are causally complete: any user-visible degradation of an MCP server manifests in at least one of the four signals. A server that is slow shows in latency. A server under heavy load shows in traffic before showing in saturation and then in latency. A server with a broken tool shows in errors. A server about to run out of connection pool capacity shows in saturation before the requests start failing.
If your monitoring covers all four signals with appropriate thresholds, you will usually detect degradation before users do. If you skip any signal, there is a class of failure your monitoring will miss. This is the framework's promise: not that monitoring is easy, but that the four signals together form a complete baseline for observability.
| Signal | MCP server mapping | Where to measure | AliveMCP covers? |
|---|---|---|---|
| Latency | Tool call duration (P50, P95, P99 per tool) | Protocol layer (external) + server middleware (internal) | Yes — P95 on every probe |
| Traffic | Tool calls per minute per tool; active sessions | Server middleware; session lifecycle hooks | No — measure server-side |
| Errors | Protocol failures + tool handler exceptions + schema errors | Protocol layer (external) + server error handlers | Partially — protocol errors only |
| Saturation | Connection pool usage; memory headroom; CPU headroom | Server metrics endpoint; process monitoring | No — measure server-side |
Signal 1: Latency
Latency for an MCP server has two measurement points: external (from the client's perspective, including network) and internal (inside the server, excluding network). Track both.
External latency: what clients experience
AliveMCP measures external latency on every 60-second probe cycle: the time from initiating the TCP connection through completing the tools/list response. This is the protocol overhead an AI agent pays before any tool call. It catches network degradation, load balancer overhead, TLS handshake cost, and server initialization time.
Internal latency: per-tool handler duration
Measure the duration of each tool handler execution separately from protocol overhead. A tool that averages 50ms internally but shows 800ms externally has 750ms of overhead to investigate (network round trip, serialization, connection pooling). A tool that averages 800ms internally is the slow code path to optimize.
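// latency-middleware.ts: wraps all tool calls with timing and per-tool metrics
// MetricsCollector and NextFn are minimal assumed shapes; adapt them to your metrics library and SDK.
interface MetricsCollector {
  histogram(name: string, value: number, labels?: Record<string, string>): void;
}
type NextFn = (toolName: string, args: unknown) => Promise<unknown>;

function createLatencyMiddleware(metrics: MetricsCollector) {
  return async function latencyMiddleware(toolName: string, args: unknown, next: NextFn) {
    const start = performance.now(); // monotonic clock, unaffected by wall-clock adjustments
    let success = true;
    try {
      return await next(toolName, args);
    } catch (err) {
      success = false;
      throw err;
    } finally {
      const durationMs = performance.now() - start;
      // Record per-tool latency histogram
      metrics.histogram('mcp_tool_duration_ms', durationMs, {
        tool: toolName,
        success: String(success),
      });
      // Individual tool calls over 5s are always logged
      if (durationMs > 5000) {
        // Log argument names only, and guard against null or non-object args
        const argsKeys = args !== null && typeof args === 'object' ? Object.keys(args) : [];
        console.warn(`[SLOW_TOOL] ${toolName}: ${Math.round(durationMs)}ms`, { args_keys: argsKeys });
      }
    }
  };
}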
Latency percentiles: P50 vs P95 vs P99
Alert on P95, not P50. The median (P50) hides the tail experience that a significant fraction of users encounter. P99 is too noisy for alerting — it reflects extreme outliers that are often caused by scheduled tasks, garbage collection pauses, or network transients, not real degradation. P95 represents "what does the 95th-percentile user experience?" — a useful signal that 1 in 20 requests is slow, which is usually worth investigating.
| Percentile | Use for | Alert threshold |
|---|---|---|
| P50 (median) | Baseline health indicator; performance dashboards | Track; don't alert on |
| P95 | User experience quality; regression detection | Alert at 2× baseline, sustained 5+ minutes |
| P99 | Outlier investigation; capacity planning | Track; alert only at extreme values (>30s) |
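To make the difference between the percentiles concrete, here is a minimal sketch that computes them from a window of recorded durations using the nearest-rank method. The `percentile` helper and the sample data are illustrative assumptions; a production setup would normally rely on the histogram support in its metrics library instead.

```typescript
// Nearest-rank percentile over a window of latency samples (milliseconds).
// Illustrative helper, not part of any MCP SDK or of AliveMCP.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  // 1-based rank; integer arithmetic first to avoid floating-point surprises
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Hypothetical window of 20 tool calls: 18 fast, 2 slow.
const durationsMs: number[] = [...new Array<number>(18).fill(50), 3000, 4000];

const p50 = percentile(durationsMs, 50); // 50: the median hides the slow calls
const p95 = percentile(durationsMs, 95); // 3000: the tail becomes visible
const p99 = percentile(durationsMs, 99); // 4000: the single worst call
```

With only 20 samples, P99 is simply the single slowest call, which is one reason it is noisy as an alert signal.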
Signal 2: Traffic
Traffic for an MCP server means two things: the number of active sessions and the number of tool calls per minute, broken down by tool name. Both matter for capacity planning and for understanding what's happening during an incident.
Session traffic
Each connected MCP client represents one session. Track the current active session count, the session creation rate (new sessions per minute), and session duration distribution. A sudden drop in new sessions may indicate the server is refusing connections. A sudden spike may indicate a runaway agent loop or an upstream service sending retries.
// session-metrics.ts: track active sessions and session lifecycle
// Assumes a module-level `metrics` collector exposing gauge, counter, and histogram.
class SessionMetrics {
  private activeSessions = new Set<string>();
  private sessionCreateCount = 0;
  private sessionEndCount = 0;

  // Current number of open sessions, e.g. for the /metrics endpoint
  get activeCount(): number {
    return this.activeSessions.size;
  }

  onSessionStart(sessionId: string) {
    this.activeSessions.add(sessionId);
    this.sessionCreateCount++;
    metrics.gauge('mcp_active_sessions', this.activeSessions.size);
    metrics.counter('mcp_sessions_total', 1, { event: 'start' });
  }

  onSessionEnd(sessionId: string, durationMs: number) {
    this.activeSessions.delete(sessionId);
    this.sessionEndCount++;
    metrics.gauge('mcp_active_sessions', this.activeSessions.size);
    metrics.histogram('mcp_session_duration_ms', durationMs);
    metrics.counter('mcp_sessions_total', 1, { event: 'end' });
  }
}
Tool call traffic
Track calls per minute per tool. This reveals which tools are hot (most frequently called), which are cold (rarely used, potential candidates for deprecation), and whether traffic distribution changes before an incident (an external system may start hammering one tool before the server saturates).
// Tool call rate tracking
metrics.counter('mcp_tool_calls_total', 1, {
  tool: toolName,
  success: String(!error),
});
Traffic as a leading indicator
Traffic spikes often precede saturation and latency degradation. A well-instrumented MCP server shows this sequence: traffic rate doubles → connection pool utilization rises → P95 latency increases → timeouts begin. Alerting on traffic spikes (e.g., >3× rolling average for 2 minutes) gives you an earlier warning than alerting on latency alone.
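The spike rule described above (more than 3× the rolling average, sustained for 2 minutes) can be sketched as a small in-process detector fed with per-minute call counts. The class name, the 60-minute baseline window, and the choice to exclude spike minutes from the baseline are all assumptions made for illustration.

```typescript
// Flags a traffic spike when per-minute call counts exceed a multiple of the
// rolling average for several consecutive minutes. Illustrative sketch only.
class TrafficSpikeDetector {
  private history: number[] = []; // per-minute call counts, oldest first
  private consecutiveBreaches = 0;
  private readonly windowMinutes: number;
  private readonly multiplier: number;
  private readonly sustainMinutes: number;

  constructor(windowMinutes = 60, multiplier = 3, sustainMinutes = 2) {
    this.windowMinutes = windowMinutes;
    this.multiplier = multiplier;
    this.sustainMinutes = sustainMinutes;
  }

  // Record one minute of traffic; returns true once the spike is sustained.
  record(callsThisMinute: number): boolean {
    const baseline =
      this.history.length > 0
        ? this.history.reduce((a, b) => a + b, 0) / this.history.length
        : 0;
    const breach = baseline > 0 && callsThisMinute > this.multiplier * baseline;
    if (breach) {
      this.consecutiveBreaches++;
    } else {
      this.consecutiveBreaches = 0;
      // Only non-spike minutes feed the baseline, so a spike cannot normalize itself.
      this.history.push(callsThisMinute);
      if (this.history.length > this.windowMinutes) this.history.shift();
    }
    return this.consecutiveBreaches >= this.sustainMinutes;
  }
}

// Four normal minutes (average 100 calls), then two minutes above 3x.
const spikeDetector = new TrafficSpikeDetector();
const spikeResults = [100, 110, 90, 100, 350, 400].map((n) => spikeDetector.record(n));
// Only the final minute reports a sustained spike.
```

Keeping spike minutes out of the baseline is a design choice: it prevents a long-running spike from raising its own threshold, at the cost of never adapting to a genuine, permanent traffic increase without a reset.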
Signal 3: Errors
MCP server errors occur at two distinct layers. Tracking only one layer misses entire failure classes.
Protocol-layer errors
Protocol errors are observable from outside the server, without any server-side instrumentation. AliveMCP captures these automatically on every probe:
- connection_refused: TCP connection rejected. Server not listening or port blocked.
- timeout: Connection or handshake exceeded threshold. Server alive but not responding.
- protocol_error: Connected but returned malformed MCP response. Partial startup, version mismatch.
- tls_error: TLS certificate invalid, expired, or hostname mismatch.
- schema_drift: tools/list returned unexpected tool set. Unintended deployment or tool removal.
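As an illustration of the schema_drift case, the check amounts to comparing the tool names returned by tools/list against a known baseline. The sketch below shows the general idea under that assumption; it is not AliveMCP's implementation, and the function name and tool names are hypothetical.

```typescript
interface ToolSetDiff {
  added: string[];   // tools present now but not in the baseline
  removed: string[]; // tools in the baseline that have disappeared
  drifted: boolean;
}

// Compare a baseline tool list against the one returned by the latest probe.
function diffToolSets(expected: string[], actual: string[]): ToolSetDiff {
  const expectedSet = new Set(expected);
  const actualSet = new Set(actual);
  const added = Array.from(actualSet).filter((t) => !expectedSet.has(t)).sort();
  const removed = Array.from(expectedSet).filter((t) => !actualSet.has(t)).sort();
  return { added, removed, drifted: added.length > 0 || removed.length > 0 };
}

// Hypothetical tool names for illustration.
const baselineTools = ["search_docs", "get_ticket", "create_ticket"];
const probedTools = ["search_docs", "get_ticket", "delete_ticket"];
const toolDrift = diffToolSets(baselineTools, probedTools);
// toolDrift.added is ["delete_ticket"], toolDrift.removed is ["create_ticket"]
```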
Application-layer errors
Application errors occur inside the server and require server-side instrumentation to capture. They arrive as valid MCP responses with an error payload: the protocol probe passes, but the tool call failed.
// error-tracking-middleware.ts: track tool errors with structured context
function createErrorTrackingMiddleware(metrics: MetricsCollector, logger: Logger) {
  return async function errorMiddleware(toolName: string, args: unknown, next: NextFn) {
    try {
      return await next(toolName, args);
    } catch (err: unknown) {
      // Thrown values are not guaranteed to be Error instances
      const error = err instanceof Error ? err : new Error(String(err));
      // Classify error type
      const errorType = classifyError(error);
      metrics.counter('mcp_tool_errors_total', 1, {
        tool: toolName,
        error_type: errorType, // 'validation', 'database', 'external_api', 'unknown'
      });
      logger.error('tool_error', {
        tool: toolName,
        error_type: errorType,
        message: error.message,
        // Do NOT log args in full; they may contain sensitive data
        args_keys: args !== null && typeof args === 'object' ? Object.keys(args) : [],
      });
      throw err; // Re-throw so the MCP SDK returns a proper error response
    }
  };
}

// Heuristic classification by message substring; prefer error classes or
// error codes where your drivers expose them.
function classifyError(error: Error): string {
  if (error.message.includes('validation')) return 'validation';
  if (error.message.includes('ECONNREFUSED') || error.message.includes('pool')) return 'database';
  if (error.message.includes('ETIMEDOUT') || error.message.includes('fetch')) return 'external_api';
  return 'unknown';
}
Error rate alerting
Alert on error rate (errors as a fraction of total tool calls) rather than raw error count. A server processing 1000 tool calls per minute with 10 errors has a 1% error rate, which is acceptable. A server processing 20 calls per minute with 10 errors has a 50% error rate, which is critical. Alert when the error rate exceeds 1% for a sustained period, or when any error type that was previously at zero (like database) starts occurring.
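The two alert conditions above can be sketched as follows. The 1% threshold comes from the text; the `minCalls` guard (which avoids alerting on a handful of calls) and all function names are assumptions added for illustration.

```typescript
type ErrorRateVerdict = "ok" | "alert" | "insufficient_data";

// Errors as a fraction of total calls in the same window.
function errorRate(errors: number, calls: number): number {
  return calls === 0 ? 0 : errors / calls;
}

// Alert when the rate exceeds the threshold; minCalls is an assumed guard
// so that one failure out of three calls does not page anyone.
function evaluateErrorRate(
  errors: number,
  calls: number,
  threshold = 0.01,
  minCalls = 10,
): ErrorRateVerdict {
  if (calls < minCalls) return "insufficient_data";
  return errorRate(errors, calls) > threshold ? "alert" : "ok";
}

// Error types whose count was zero in the previous window but is non-zero now.
function newlySeenErrorTypes(
  previous: Record<string, number>,
  current: Record<string, number>,
): string[] {
  return Object.keys(current)
    .filter((t) => current[t] > 0 && (previous[t] ?? 0) === 0)
    .sort();
}

const busyServer = evaluateErrorRate(10, 1000); // "ok": exactly 1%, not above it
const quietServer = evaluateErrorRate(10, 20);  // "alert": 50%
const firstSeen = newlySeenErrorTypes(
  { validation: 4 },
  { validation: 5, database: 2 },
); // ["database"]
```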
Signal 4: Saturation
Saturation measures how "full" your server's critical resources are. For MCP servers, the three resources that saturate before the server fails are: the database connection pool, memory, and CPU.
Connection pool saturation
The database connection pool is the resource that saturates first under load for most MCP servers. When the pool is full, new tool calls queue and wait for an available connection — this is when P95 latency starts rising before errors occur.
// /metrics endpoint: expose saturation metrics as JSON for custom dashboards or a simple scraper
// (Prometheus expects its own text exposition format; use a client library or exporter for that)
app.get('/metrics', (req, res) => {
  const poolStats = db.pool.stats(); // varies by ORM/driver
  const memUsage = process.memoryUsage();
  res.json({
    // Connection pool
    pool_total: poolStats.total,
    pool_idle: poolStats.idle,
    pool_waiting: poolStats.waiting,
    // Guard against division by zero before the pool has opened any connections
    pool_utilization: poolStats.total > 0 ? (poolStats.total - poolStats.idle) / poolStats.total : 0,
    // Memory
    heap_used_mb: Math.round(memUsage.heapUsed / 1024 / 1024),
    heap_total_mb: Math.round(memUsage.heapTotal / 1024 / 1024),
    // heapTotal is the heap V8 has allocated so far and grows on demand; compare heapUsed with
    // v8.getHeapStatistics().heap_size_limit if you need the distance to the hard limit
    heap_utilization: memUsage.heapUsed / memUsage.heapTotal,
    rss_mb: Math.round(memUsage.rss / 1024 / 1024),
    // Active sessions (traffic × saturation crossover metric)
    active_sessions: sessionMetrics.activeCount,
    at: new Date().toISOString(),
  });
});
Saturation thresholds
| Resource | Warning threshold | Critical threshold | What happens when exceeded |
|---|---|---|---|
| Connection pool utilization | > 70% for 2+ minutes | > 90% for 1+ minute | New tool calls queue → P95 rises → timeouts |
| Heap utilization (Node.js) | > 75% | > 90% | GC pressure → latency spikes → OOM crash |
| RSS memory growth | 10% increase per hour | Exceeds system limit | Memory leak → eventual OOM kill |
| CPU utilization | > 70% for 5+ minutes | > 90% for 2+ minutes | Event loop lag → latency rise → queue depth grows |
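The "for N+ minutes" conditions in the table require tracking how long a value has stayed above its threshold, not just whether it is above it right now. A minimal sketch of that bookkeeping, using the connection pool row as the example; the class and type names are assumptions, and in a Prometheus setup the `for:` clause of an alerting rule plays this role instead.

```typescript
type SaturationLevel = "ok" | "warning" | "critical";

interface SustainedRule {
  above: number; // threshold the value must exceed
  forMs: number; // how long it must stay above before the rule fires
}

// Tracks how long a metric has continuously exceeded each threshold.
class SustainedThreshold {
  private warnSince: number | null = null;
  private critSince: number | null = null;
  private readonly warning: SustainedRule;
  private readonly critical: SustainedRule;

  constructor(warning: SustainedRule, critical: SustainedRule) {
    this.warning = warning;
    this.critical = critical;
  }

  observe(nowMs: number, value: number): SaturationLevel {
    // Start (or keep) the breach timer while above threshold; reset when below.
    this.warnSince = value > this.warning.above ? (this.warnSince ?? nowMs) : null;
    this.critSince = value > this.critical.above ? (this.critSince ?? nowMs) : null;
    if (this.critSince !== null && nowMs - this.critSince >= this.critical.forMs) return "critical";
    if (this.warnSince !== null && nowMs - this.warnSince >= this.warning.forMs) return "warning";
    return "ok";
  }
}

const MINUTE_MS = 60000;
// Connection pool row: warning > 70% for 2+ minutes, critical > 90% for 1+ minute.
const poolRule = new SustainedThreshold(
  { above: 0.7, forMs: 2 * MINUTE_MS },
  { above: 0.9, forMs: 1 * MINUTE_MS },
);
const poolLevels = [
  poolRule.observe(0 * MINUTE_MS, 0.75), // above warning threshold, timer starts
  poolRule.observe(1 * MINUTE_MS, 0.8),  // still above, not yet 2 minutes
  poolRule.observe(2 * MINUTE_MS, 0.95), // 2 minutes above 70%: warning
  poolRule.observe(3 * MINUTE_MS, 0.96), // 1 minute above 90%: critical
  poolRule.observe(4 * MINUTE_MS, 0.5),  // recovered
];
```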
Saturation as a leading indicator
Saturation metrics predict future failures. Assuming utilization scales roughly linearly with traffic, a connection pool at 80% utilization will reach 100% if traffic increases by 25%. This gives you time to scale horizontally, reduce load, or increase pool size before the failure occurs. Review saturation trends weekly: a pool that runs at 40% normally but reaches 70% on Mondays shows how close your recurring peak is to the capacity ceiling, so you can address it proactively.
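The headroom arithmetic above can be written down directly. This sketch assumes utilization grows linearly with traffic, which is a simplification (queueing effects usually make things worse near the ceiling); the function names are illustrative.

```typescript
// Fraction by which traffic can grow before utilization reaches 100%,
// assuming utilization scales linearly with traffic.
function trafficHeadroom(utilization: number): number {
  if (utilization <= 0 || utilization > 1) {
    throw new RangeError("utilization must be in (0, 1]");
  }
  return 1 / utilization - 1;
}

// Utilization after traffic grows by the given fraction (0.25 means +25%).
function projectedUtilization(utilization: number, trafficGrowth: number): number {
  return utilization * (1 + trafficGrowth);
}

const headroomAt80 = trafficHeadroom(0.8);           // about 0.25: 25% more traffic fills the pool
const headroomAt40 = trafficHeadroom(0.4);           // about 1.5: traffic can grow by 150%
const mondayPeak = projectedUtilization(0.4, 0.75);  // about 0.7: a 75% traffic increase takes 40% to 70%
```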
Combining the four signals into an alert strategy
The four golden signals have a natural causal order under load: traffic increases → saturation rises → latency degrades → errors appear. Monitoring all four gives you alerts at each stage of the degradation cascade, progressively escalating:
# alert-rules.yaml (Prometheus alerting rules, or adapt to an equivalent system)
# Assumes the metrics above are exported to Prometheus, with pool_total and pool_idle as gauges.
groups:
  - name: mcp-server-golden-signals
    rules:
      # Latency: P95 above a fixed 1000ms threshold (set it to 2x your measured baseline), sustained 5 minutes
      - alert: MCPHighLatencyP95
        expr: histogram_quantile(0.95, sum by (le) (rate(mcp_tool_duration_ms_bucket[5m]))) > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "MCP server P95 latency above 1000ms"
      # Traffic: 3x the rate seen at the same time yesterday, sustained 2 minutes
      - alert: MCPTrafficSpike
        expr: sum(rate(mcp_tool_calls_total[2m])) > 3 * sum(rate(mcp_tool_calls_total[1h] offset 1d))
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "MCP tool call rate is 3x normal"
      # Errors: error rate above 1% for 2 minutes (sum() removes labels so the two series match)
      - alert: MCPHighErrorRate
        expr: sum(rate(mcp_tool_errors_total[2m])) / sum(rate(mcp_tool_calls_total[2m])) > 0.01
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "MCP tool error rate above 1%"
      # Saturation: connection pool above 80%
      - alert: MCPConnectionPoolSaturating
        expr: (pool_total - pool_idle) / pool_total > 0.8
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "MCP database connection pool above 80% utilization"
The MCP-specific addition to the four golden signals is the protocol availability signal that AliveMCP covers: can a client even connect to the server? This is a prerequisite to all four signals — you can't have latency, traffic, errors, or saturation data from a server that refuses connections. Run AliveMCP alongside your server-side instrumentation; together they cover all five dimensions.
Frequently asked questions
Do I need all four golden signals from day one?
No. Start with errors and latency, which are the most impactful and easiest to instrument. Error tracking requires adding a middleware wrapper to your tool handlers. Latency tracking requires the same middleware plus a histogram. AliveMCP covers both from the external perspective without any code changes. Add traffic metrics when you need to understand load patterns, and saturation metrics when you approach capacity limits. The four signals are a completeness checklist, not a day-one requirement — a server with error and latency monitoring is dramatically better observed than one with nothing.
How does the golden signals framework apply to stdio transport MCP servers?
For stdio transport servers (one server process per client), traffic means the number of concurrent server processes rather than sessions on a shared server. Latency, errors, and saturation apply at the process level. The key difference: stdio servers don't share a connection pool (each process has its own), so connection pool saturation manifests as host-level resource saturation — too many concurrent processes competing for memory and CPU. Monitor at the process orchestration layer (number of live child processes, aggregate memory usage across all processes) rather than at the connection pool level within a single process.
Which metric is most important to alert on for an MCP server?
Errors, specifically protocol-layer errors. A server that isn't reachable provides zero value to any AI agent that depends on it. Protocol errors (connection_refused, timeout, protocol_error) are the highest-priority signal because they indicate complete unavailability: not degraded performance, but zero capability. AliveMCP's failure_reason field captures this. After protocol-layer error coverage, prioritize latency P95 (tool calls are succeeding, but the slowest 5% take too long) and then saturation (resource limits approaching). Traffic monitoring is useful for capacity planning but rarely the signal that triggers an emergency response.
How should I handle the "missing" signals that AliveMCP doesn't cover (traffic and saturation)?
The simplest approach: expose a /metrics endpoint from your MCP server that returns JSON with pool_utilization, active_sessions, and heap_utilization. Then poll it with a simple cron job that writes each sample to a time-series database such as InfluxDB, or appends it to a CSV file for a simple server. You don't need a full Prometheus deployment: a 20-line script that fetches /metrics every minute and appends to a file gives you enough history to spot saturation trends. For production servers with real load, invest in Prometheus + Grafana (Prometheus pulls metrics on its own schedule rather than accepting pushes from a cron job); for small servers, a simple metrics endpoint with periodic logging is sufficient.
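A minimal sketch of the polling script described above. The snapshot fields match the ones named in this answer; the CSV layout and the function names are assumptions. The fetch and file-append steps are passed in as functions so the sketch stays independent of any particular runtime.

```typescript
// Shape of the JSON returned by the /metrics endpoint (fields named above).
interface MetricsSnapshot {
  at: string; // ISO timestamp
  pool_utilization: number;
  active_sessions: number;
  heap_utilization: number;
}

const CSV_HEADER = "at,pool_utilization,active_sessions,heap_utilization";

// Format one snapshot as a CSV row; ratios are rounded to three decimals.
function toCsvRow(s: MetricsSnapshot): string {
  return [
    s.at,
    s.pool_utilization.toFixed(3),
    String(s.active_sessions),
    s.heap_utilization.toFixed(3),
  ].join(",");
}

// One polling cycle: fetch a snapshot and append it as a line.
// Both dependencies are injected; wire them to your HTTP client and file API.
async function scrapeOnce(
  fetchSnapshot: () => Promise<MetricsSnapshot>,
  appendLine: (line: string) => Promise<void>,
): Promise<void> {
  const snapshot = await fetchSnapshot();
  await appendLine(toCsvRow(snapshot));
}

// Illustrative sample values.
const exampleRow = toCsvRow({
  at: "2024-01-01T00:00:00.000Z",
  pool_utilization: 0.8,
  active_sessions: 12,
  heap_utilization: 0.4123,
});
// exampleRow is "2024-01-01T00:00:00.000Z,0.800,12,0.412"
```

In Node.js, `fetchSnapshot` could wrap the global `fetch` against your server's /metrics URL and `appendLine` could wrap `appendFile` from `node:fs/promises`; run `scrapeOnce` from cron or a `setInterval` once per minute.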
What is the difference between the golden signals and RED (Rate, Errors, Duration)?
RED is a simplified version of the golden signals optimized for services where saturation is handled at the infrastructure level (e.g., Kubernetes automatically scales pods when CPU is high). RED keeps traffic (as Rate), errors, and latency (as Duration), and omits explicit saturation monitoring. For MCP servers where you control the server process directly, the full four golden signals framework is more appropriate because saturation (especially connection pool utilization) is your responsibility to monitor and act on; it doesn't automatically self-heal. Use RED as a quick-start framework and add saturation monitoring once you've hit a resource limit in production.
Further reading
- Synthetic Monitoring for MCP Servers — external probes complement server-side metrics
- MCP Server Regression Testing — P95 baselines and performance regression detection
- MCP Server Observability — metrics, tracing, and structured logging
- MCP Server Alerting — thresholds, routing, and on-call integration
- MCP Server SLOs — defining and measuring service level objectives
- MCP Server Connection Pooling — pool sizing and saturation management