MCP server tracing

Distributed tracing gives you a causally linked timeline across every component involved in a single operation. For MCP servers, that timeline runs from the AI agent that initiated the session through the MCP protocol layers (transport, initialize, tools/list, tool call) into every downstream service your tools invoke. Without tracing, a slow agent session is a black box: you know something took too long, but not whether the bottleneck was the network, the initialize handshake, a specific tool call, or a downstream API that tool depends on. With tracing, you have the full causal chain as a timeline with durations attached to each segment.

TL;DR

Use OpenTelemetry to instrument your MCP server. The trace structure has one root span per agent session (mcp.session), with child spans per protocol operation: mcp.initialize, mcp.tools_list, and one mcp.tool_call span per tool invocation. Each tool call span has child spans for downstream API calls. Propagate W3C traceparent via HTTP headers for HTTP/SSE MCP servers, or via the _meta field of the JSON-RPC params for stdio-based servers. Never record tool call arguments as span attributes, because they may contain user PII. External probe monitoring from AliveMCP complements tracing by covering the gap where the server is completely down and generating no traces at all.

Why standard distributed tracing needs MCP adaptation

Standard distributed tracing frameworks assume a request-response model: one request comes in, one response goes out, and the trace covers that lifecycle. MCP has a different shape:

Session-scoped: an agent session involves multiple protocol operations (initialize, tools/list, one or more tool calls) over a persistent connection. The "request" that matters to the end user spans all of them, not just one.
Four protocol layers: each with independent failure modes. A trace that only covers the application layer misses transport, HTTP, and initialization failures — precisely the failure modes that make MCP servers unreliable. See MCP server observability for the full four-layer model.
Tool call amplification: one agent session generates N tool call spans, each of which may generate M downstream service spans. A session that calls 10 tools where each tool hits 3 APIs produces 10 tool call spans plus 30 downstream spans, more than 40 spans once the session-level spans are counted. Your trace sampling strategy must account for this fanout.
Stdio vs HTTP transport: for stdio-based MCP servers, there are no HTTP headers to carry traceparent. Propagation requires injecting context into the JSON-RPC message structure via the _meta field extension point.
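
To make the fanout concrete, here is a back-of-the-envelope estimate of spans per session. It is a sketch that assumes the span hierarchy used in this guide (one root span, one initialize span, one tools/list span) and a fixed number of downstream calls per tool; the function name spansPerSession is hypothetical.

```javascript
// Rough span count for one agent session.
// Assumes: 1 root span + 1 initialize span + 1 tools/list span,
// one span per tool call, and the same number of downstream spans under each tool call.
function spansPerSession(toolCalls, downstreamPerCall) {
  const protocolSpans = 3; // root, initialize, tools/list
  const toolCallSpans = toolCalls;
  const downstreamSpans = toolCalls * downstreamPerCall;
  return {
    toolCallSpans,
    downstreamSpans,
    total: protocolSpans + toolCallSpans + downstreamSpans,
  };
}

// 10 tool calls, each hitting 3 downstream APIs.
const estimate = spansPerSession(10, 3);
console.log(estimate); // { toolCallSpans: 10, downstreamSpans: 30, total: 43 }
```

Multiplying the total by your sessions per hour gives a first estimate of span volume before any sampling is applied.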

Trace structure for MCP

The recommended span hierarchy for a complete MCP server trace:

mcp.session (root span, one per agent session)
  ├── mcp.initialize
  │     └── (optional: auth validation span if SSO/OAuth involved)
  ├── mcp.tools_list
  │     └── (optional: tool registry fetch if dynamic tools)
  ├── mcp.tool_call [tool_name="weather.get"]
  │     ├── downstream.http [url="https://api.weather.example/v1/current"]
  │     └── downstream.cache [operation="redis.get", key="weather:lat:long"]
  └── mcp.tool_call [tool_name="calendar.list"]
        └── downstream.http [url="https://calendar.google.com/api/v3/events"]

Span attribute naming conventions:

mcp.session_id: stable session identifier, generated at initialize. Use this to correlate all spans in the same agent session.
mcp.operation: one of initialize, tools_list, tool_call.
mcp.tool_name: the tool name on tool call spans. Required for per-tool latency and error rate breakdown.
mcp.error_code: JSON-RPC error code on error spans (e.g., -32601 method not found, -32603 internal error).
mcp.client_id: optional identifier for the AI agent making the request, if your server tracks client identity.
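http.status_code: OTel HTTP semantic convention attribute for the transport layer. Current stable versions of the HTTP semantic conventions name this attribute http.response.status_code; use whichever name your instrumentation version emits, and use it consistently.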
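
As an illustration of these conventions, the sketch below builds the attribute set for a tool call span from a JSON-RPC response. The function name toolCallAttributes and its parameters are hypothetical; only the attribute keys come from the list above.

```javascript
// Build span attributes for a tool call using the naming conventions above.
// `sessionId` and `toolName` are assumed to come from your own dispatch layer.
function toolCallAttributes(sessionId, toolName, response) {
  const attributes = {
    'mcp.session_id': sessionId,
    'mcp.operation': 'tool_call',
    'mcp.tool_name': toolName,
  };
  // JSON-RPC 2.0 error responses carry { error: { code, message } }.
  // Only the numeric code is recorded; the message is left out here on the
  // assumption that it may echo user-provided input.
  if (response && response.error && Number.isInteger(response.error.code)) {
    attributes['mcp.error_code'] = response.error.code;
  }
  return attributes;
}

const attrs = toolCallAttributes('sess-123', 'weather.get', {
  jsonrpc: '2.0',
  id: 1,
  error: { code: -32603, message: 'Internal error' },
});
console.log(attrs['mcp.error_code']); // -32603
```

Keeping attribute construction in one place makes it harder for individual tools to drift from the naming scheme.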

PII rule: never include tool call arguments as span attributes. Tool inputs frequently contain user-provided data (names, addresses, queries, API keys passed as parameters). Log tool call argument shapes in structured logs instead, stripped to their schema (key names only, not values). Traces flow to observability backends where the retention and access control model may differ from your primary data store — arguments in spans will persist longer than intended and in more places than expected.
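
One way to reduce arguments to their shape before logging is sketched below. The helper name argumentShape is hypothetical, and recording value types alongside key names is an assumption that goes slightly beyond "key names only"; drop the types if even those are sensitive in your setting.

```javascript
// Reduce a tool call's arguments to their shape: key names and value types, never values.
function argumentShape(value) {
  if (Array.isArray(value)) {
    // Describe arrays by the shape of their first element only.
    return value.length > 0 ? [argumentShape(value[0])] : [];
  }
  if (value !== null && typeof value === 'object') {
    const shape = {};
    for (const key of Object.keys(value).sort()) {
      shape[key] = argumentShape(value[key]);
    }
    return shape;
  }
  return value === null ? 'null' : typeof value;
}

const shape = argumentShape({
  location: 'Berlin',
  days: 3,
  user: { email: 'someone@example.com' },
});
console.log(JSON.stringify(shape));
// {"days":"number","location":"string","user":{"email":"string"}}
```

Note that key names can themselves be sensitive when a tool accepts maps keyed by user data (for example, objects keyed by email address); such tools need a stricter redaction rule than this sketch provides.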

W3C traceparent propagation

Distributed tracing requires each component to pass context to the next so that spans from different services can be assembled into a single trace tree. W3C Trace Context defines the standard propagation format.
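
To show what "assembled into a single trace tree" means in practice, the sketch below rebuilds a tree from a flat list of spans using their parent IDs, which is roughly what a tracing backend does at query time. The span objects and the function name buildTraceTree are simplified, hypothetical stand-ins, not an OpenTelemetry data structure.

```javascript
// Assemble a flat list of spans into a tree using parent span IDs.
// Each input span is assumed to look like { spanId, parentSpanId, name }.
function buildTraceTree(spans) {
  const nodes = new Map(spans.map((s) => [s.spanId, { ...s, children: [] }]));
  const roots = [];
  for (const node of nodes.values()) {
    const parent = node.parentSpanId ? nodes.get(node.parentSpanId) : undefined;
    if (parent) {
      parent.children.push(node);
    } else {
      // Either a true root, or a span whose parent was never reported.
      roots.push(node);
    }
  }
  return roots;
}

const roots = buildTraceTree([
  { spanId: 'a', parentSpanId: null, name: 'mcp.session' },
  { spanId: 'b', parentSpanId: 'a', name: 'mcp.initialize' },
  { spanId: 'c', parentSpanId: 'a', name: 'mcp.tool_call' },
  { spanId: 'd', parentSpanId: 'c', name: 'downstream.http' },
]);
console.log(roots.length); // 1
```

When context is not propagated, a downstream span arrives with a parent ID that no other component reported, and it shows up as a disconnected root instead of a child. That is the symptom to look for when a trace appears fragmented.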

HTTP/SSE MCP servers

For MCP servers that communicate over HTTP (SSE transport or streamable HTTP), propagation uses standard HTTP headers:

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
tracestate: alivemcp=probe,vendor=otlp

The AI agent that initiates the session generates a root traceparent and includes it in the HTTP request to your MCP server. Your server reads it via the OpenTelemetry SDK's HTTP propagator, creates a child span, and passes the updated context to any downstream HTTP calls your tools make. This produces a trace tree that spans from the agent through your MCP server into your backend services — a single timeline for the full operation.
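
In production the OpenTelemetry propagator parses this header for you. The dependency-free sketch below only illustrates what the four fields of a version 00 traceparent contain; it is a simplification that rejects anything not in that exact four-field layout, and the function name parseTraceparent is hypothetical.

```javascript
// Parse a W3C traceparent header: version-traceid-parentid-flags (lowercase hex).
const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header) {
  if (typeof header !== 'string') return null;
  const match = TRACEPARENT_RE.exec(header.trim());
  if (!match) return null;
  const [, version, traceId, parentId, flags] = match;
  if (version === 'ff') return null; // version ff is invalid
  // All-zero trace IDs and parent IDs are invalid.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return {
    version,
    traceId,
    parentId,
    sampled: (parseInt(flags, 16) & 0x01) === 1, // lowest bit is the sampled flag
  };
}

const parsed = parseTraceparent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01');
console.log(parsed.traceId); // 4bf92f3577b34da6a3ce929d0e0e4736
console.log(parsed.sampled); // true
```

The trace ID stays the same across every service in the trace, while the parent ID changes at each hop to identify the calling span.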

stdio-based MCP servers

Stdio MCP servers communicate over stdin/stdout with JSON-RPC messages, not HTTP. There are no headers. Use MCP's _meta field inside the JSON-RPC params to carry trace context:

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "weather.get",
    "arguments": { "location": "..." },
    "_meta": {
      "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
    }
  },
  "id": 1
}

The _meta field is defined in the MCP specification as an extension point for non-semantic metadata. Your server reads params._meta.traceparent at the JSON-RPC layer and extracts context before processing the request. This requires a thin wrapper around your MCP SDK's tool dispatch that runs the OTel propagator on the _meta field before entering tool-specific code.
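
The client side of this exchange can be sketched as follows: the agent attaches its traceparent to the request before writing it to the server's stdin, and the server reads it back out. Both helper names (withTraceContext, readTraceContext) are hypothetical and not part of any MCP SDK.

```javascript
// Attach a traceparent to a JSON-RPC request via params._meta without mutating the input.
function withTraceContext(request, traceparent) {
  const params = request.params ?? {};
  return {
    ...request,
    params: {
      ...params,
      // Preserve any _meta fields that are already present.
      _meta: { ...(params._meta ?? {}), traceparent },
    },
  };
}

// Read the traceparent back out on the server side; returns null if absent.
function readTraceContext(request) {
  const value = request?.params?._meta?.traceparent;
  return typeof value === 'string' ? value : null;
}

const outgoing = withTraceContext(
  { jsonrpc: '2.0', id: 1, method: 'tools/call', params: { name: 'weather.get', arguments: {} } },
  '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01'
);
console.log(readTraceContext(outgoing));
// 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
```

Merging into an existing _meta object rather than replacing it matters because other metadata, such as a progress token, may already be there.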

See JSON-RPC health checks vs HTTP probes for deeper discussion of the protocol-layer differences between HTTP and JSON-RPC MCP transports.

OpenTelemetry SDK implementation

Minimal instrumentation for a Node.js MCP server using the OTel SDK:

import { trace, context, propagation, SpanStatusCode } from '@opentelemetry/api';
import { W3CTraceContextPropagator } from '@opentelemetry/core';

// Spans are no-ops until an SDK tracer provider is registered in your startup code.
const tracer = trace.getTracer('mcp-server', '1.0.0');
propagation.setGlobalPropagator(new W3CTraceContextPropagator());

// In your tool dispatch handler:
async function handleToolCall(req) {
  // For stdio transports the carrier is params._meta; for HTTP, pass the request headers instead.
  const carrier = req.params?._meta ?? {};
  const parentCtx = propagation.extract(context.active(), carrier);

  // startActiveSpan makes the new span the active one, so spans created by
  // downstream instrumentation inside executeTool become its children.
  return tracer.startActiveSpan(
    'mcp.tool_call',
    {
      attributes: {
        'mcp.tool_name': req.params.name,
        'mcp.session_id': req.sessionId,
        'mcp.operation': 'tool_call',
        // Deliberately no req.params.arguments here (see the PII rule above).
      },
    },
    parentCtx,
    async (span) => {
      try {
        const result = await executeTool(req.params.name, req.params.arguments);
        span.setStatus({ code: SpanStatusCode.OK });
        return result;
      } catch (err) {
        span.recordException(err);
        span.setStatus({ code: SpanStatusCode.ERROR, message: err.message });
        throw err;
      } finally {
        span.end();
      }
    }
  );
}

For Python MCP servers, the pattern is equivalent using opentelemetry-api and opentelemetry-sdk. Instrument each protocol operation (initialize, tools/list, tool call) with a span. Prefer the tracer.start_as_current_span() context manager over manual try/finally: by default it records exceptions, sets the error status, and ends the span for you.

Sampling strategy

At high traffic volumes, tracing every operation is expensive in both CPU overhead and storage cost. A practical sampling strategy for MCP servers:

Always sample initialize: initialize spans are cheap (one per session, not per tool call) and highly diagnostic — they're the first place to look for auth failures and protocol version mismatches. Sample at 100%.
Always sample tools/list: tools/list failures are distinct from tool call failures and must be diagnosed separately. 100% sampling here is also cheap since it fires once per session at most.
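Sample tool calls at 1/N for high-traffic servers: at >1,000 sessions/hour, tracing every tool call creates significant overhead. Use a head-based sampler at 10–20% for tool call spans on high-traffic paths.
Always sample tool calls that produce errors (tail-based sampling): a head-based sampler decides when a span starts, before any error is known, so this cannot be done in the SDK's sampler. Export spans to an OpenTelemetry Collector running the tail_sampling processor (or to a backend that supports tail sampling) and configure it to keep every trace that contains an error span. Spans dropped by a head-based sampler never reach the Collector, so if you need every error trace, move the 10–20% ratio into the tail-sampling policy as well.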
Always sample the first N calls of a new tool name: when you add a new tool or update tool logic, full sampling for the first 1,000 invocations gives you early-warning diagnostic data before switching to the reduced sample rate.
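
The head-based parts of this strategy can be expressed as a small decision function, sketched below. It is a plain function, not an implementation of the OpenTelemetry Sampler interface; createSamplingPolicy and its options are hypothetical names, the per-tool counter lives in process memory (so it resets on restart), and the ratio check assumes trace IDs are uniformly random in their low-order bits. Error-based retention is not covered because errors are unknown when a span starts.

```javascript
// Head-based sampling decision for MCP spans.
function createSamplingPolicy({ toolCallRatio = 0.1, warmupCalls = 1000 } = {}) {
  const callsSeen = new Map(); // tool name -> invocations seen so far

  return function shouldSample({ operation, toolName, traceId }) {
    // initialize and tools/list fire at most once per session: keep all of them.
    if (operation === 'initialize' || operation === 'tools_list') return true;

    // Keep every call while a tool is still inside its warm-up window.
    const seen = callsSeen.get(toolName) ?? 0;
    callsSeen.set(toolName, seen + 1);
    if (seen < warmupCalls) return true;

    // Otherwise decide deterministically from the trace ID, so every span
    // that shares a trace ID gets the same answer.
    const bucket = parseInt(traceId.slice(-8), 16) / 0x100000000; // in [0, 1)
    return bucket < toolCallRatio;
  };
}

const shouldSample = createSamplingPolicy({ toolCallRatio: 0.1, warmupCalls: 1000 });
console.log(shouldSample({ operation: 'initialize', traceId: '4bf92f3577b34da6a3ce929d0e0e4736' })); // true
```

Deriving the decision from the trace ID rather than from a random number keeps a trace from being half-recorded when several components apply the same ratio.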

How external probes complement tracing

Distributed tracing has a fundamental blind spot: it only produces data when the server is running and receiving requests. When your MCP server is completely down — TCP refused, host unreachable, process crashed — no traces are generated. The absence of traces is not itself an alert signal in most tracing backends.

External probe monitoring from AliveMCP fills this gap. The probe initiates a real MCP protocol sequence from outside your infrastructure every 60 seconds, regardless of whether any agent is actually using the server. If the server is down, the probe generates an alert immediately — without waiting for a user to experience a failed agent session. The probe also verifies that the server is reachable from the public internet, which internal health checks and traces cannot confirm.

The practical workflow: when an alert fires, check AliveMCP first to understand which protocol layer failed (transport/HTTP/initialize/tools_list). Then open your tracing backend and look for error spans in the 5-minute window before the alert timestamp. The probe alert gives you the what-layer and when; the traces give you the why-within-that-layer. See MCP server error rate for per-layer error classification and how probe data and trace data differ per layer.
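
The lookback step can be scripted so the query window is computed the same way every time. This is a minimal sketch; the function name traceQueryWindow is hypothetical, and how you pass the resulting start and end values to your tracing backend depends on that backend's query API.

```javascript
// Compute the time window to search in the tracing backend, ending at the alert timestamp.
function traceQueryWindow(alertTimestamp, minutesBefore = 5) {
  const end = new Date(alertTimestamp);
  if (Number.isNaN(end.getTime())) {
    throw new RangeError(`Invalid alert timestamp: ${alertTimestamp}`);
  }
  const start = new Date(end.getTime() - minutesBefore * 60 * 1000);
  return { start: start.toISOString(), end: end.toISOString() };
}

console.log(traceQueryWindow('2024-05-01T12:00:00Z'));
// { start: '2024-05-01T11:55:00.000Z', end: '2024-05-01T12:00:00.000Z' }
```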

Backend options for trace storage

Where to send your OTel spans:

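Jaeger (self-hosted): free, open source, runs alongside your MCP server. Good for small teams with infrastructure capacity. The all-in-one Docker image bundles the collector and a full query UI, but by default it stores traces in memory, so data is lost on restart and retention is bounded by memory rather than by a time window. Configure a persistent storage backend if you need durable retention. No license cost, but operational overhead.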
Grafana Tempo (self-hosted or cloud): designed for high-volume trace ingestion with low storage cost using object storage backends (S3, GCS). Pairs with Grafana dashboards from MCP server monitoring dashboard. Grafana Cloud free tier includes Tempo.
Managed OTLP backends: Honeycomb, Lightstep, Axiom all accept OTLP directly. Useful when you want managed retention and querying without self-hosted infrastructure. Pricing varies by event volume.
New Relic / Datadog: both accept OTLP traces. Higher cost, but useful if you're already on either platform for other signals. The integration cost and per-GB pricing can exceed the value for a small MCP server fleet — see our Datadog MCP alternative analysis.