MCP server distributed tracing

Distributed tracing tracks a single logical operation — a user's tool call — as it flows through multiple services: the LLM client, the MCP server, downstream HTTP APIs, the database, and possibly other MCP servers called via nested tool invocations. Without distributed tracing, a slow tool call appears as a single opaque number — "the search tool took 3 seconds". With tracing, you see the breakdown: 2ms for request parsing, 50ms for cache lookup (miss), 2.8s waiting for the external search API, 150ms serialising the response. The slow part is unambiguously the external search API. This guide covers W3C traceparent extraction at session start, creating child spans per tool call, propagating context to downstream HTTP calls, and exporting traces to Jaeger or Grafana Tempo.

TL;DR

At session initialize, extract the W3C traceparent header from the client's request metadata using propagation.extract(). Store that context object on the session. In each tool handler, call tracer.startActiveSpan with the session context as parent — creating a child span. In downstream HTTP calls, inject the current context into outgoing headers using propagation.inject() so downstream services continue the same trace. Export to Jaeger via OTLP. In Grafana, use trace-to-log correlation via the shared trace_id field in your Pino JSON logs. See AliveMCP for external probe spans that enter your trace from outside the cluster.

The tracing topology for MCP

A typical MCP tool call crosses multiple service boundaries:

LLM client (Claude, Cursor, etc.)
  │  HTTP POST /mcp  [traceparent: 00-abc123...-def456...-01]
  ▼
MCP server (your process)
  │  tools/call "search"
  ▼  ├─ span: mcp.tool/search
     │   HTTP POST /v2/search  [traceparent propagated]
     │   ▼
     │  Search API (external)
     │   └─ span recorded in search API's own tracer
     │
     └─ span: mcp.tool/query_db
          SQL SELECT ... [traced by @opentelemetry/instrumentation-pg]
          ▼
         PostgreSQL (no OTel, but span recorded in your process)

If the LLM client sends a traceparent header and your MCP server extracts it, all spans created during that session share the same root trace ID. In Jaeger or Grafana Tempo, you see the entire tree: the client's span as root, your tool call spans as children, and the downstream API spans as grandchildren — in a single trace view.

If the LLM client does not send a traceparent, your server starts a new root trace. The trace is still useful for internal diagnostics, just not connected to the client's larger operation.

W3C traceparent format

The W3C Trace Context specification defines the traceparent header format, which OTel uses by default and most current tracing systems understand:

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
             ── ──────────────────────────────── ──────────────── ──
             │           trace-id (128-bit)       parent-span-id   │
         version                                  (64-bit)      trace-flags
                                                                (01=sampled)

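The header has four dash-separated fields:

version: 2 hex characters, currently 00.
trace-id: 32 hex characters (128 bits), shared by every span in the trace.
parent-span-id: 16 hex characters (64 bits), the ID of the caller's span.
trace-flags: 2 hex characters; the lowest bit is the sampled flag (01 = sampled).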

OTel's propagation.extract() parses this header and returns an OTel context object. You never need to parse traceparent manually — always use the OTel propagation API.
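To make the four fields concrete, here is a minimal sketch of a parser for version 00 of the header. It is meant for illustration and for unit tests only: parseTraceparent and TraceParent are hypothetical names, not part of OTel, and a real server should keep using propagation.extract().

```typescript
// Illustrative only: in a real server, propagation.extract() does this for you.
// parseTraceparent and TraceParent are hypothetical names, not part of OTel.
interface TraceParent {
  version: string;      // 2 hex chars, "00" in this sketch
  traceId: string;      // 32 hex chars (128-bit)
  parentSpanId: string; // 16 hex chars (64-bit)
  sampled: boolean;     // lowest bit of trace-flags
}

// Only version 00 is accepted here; other versions are treated as unparseable.
const TRACEPARENT_V00 = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header: string): TraceParent | undefined {
  const match = TRACEPARENT_V00.exec(header.trim());
  if (!match) return undefined;
  const [, traceId, parentSpanId, flags] = match;
  // The spec treats all-zero trace IDs and span IDs as invalid.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentSpanId)) return undefined;
  return {
    version: '00',
    traceId,
    parentSpanId,
    sampled: (parseInt(flags, 16) & 0x01) === 0x01,
  };
}
```

Returning undefined for anything malformed mirrors how a missing header is handled: the server simply starts a new root trace.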

Extracting traceparent at session initialise

The MCP initialize request is the right place to extract the incoming trace context — it's the first message of every session and the natural place to establish session-level context:

// session-context.ts: store the OTel context per session
import { type Context, context as otelContext } from '@opentelemetry/api';
import { AsyncLocalStorage } from 'node:async_hooks';

const sessionContextStorage = new AsyncLocalStorage<{ otelCtx: Context }>();

export function withSessionContext<T>(otelCtx: Context, fn: () => Promise<T>): Promise<T> {
  return sessionContextStorage.run({ otelCtx }, fn);
}

export function getSessionOtelContext(): Context {
  return sessionContextStorage.getStore()?.otelCtx ?? otelContext.active();
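}

// server.ts: extract traceparent on initialize
// The InitializeRequestSchema import assumes the official MCP TypeScript SDK.
import { propagation, context as otelContext } from '@opentelemetry/api';
import { InitializeRequestSchema } from '@modelcontextprotocol/sdk/types.js';
import { withSessionContext } from './session-context';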

server.setRequestHandler(InitializeRequestSchema, async (request, context) => {
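  // Assumes the client forwards its trace headers under params._meta.headers.
  // With an HTTP transport, read them from the incoming request headers instead.
  const incomingHeaders = request.params._meta?.headers ?? {};
  const parentCtx = propagation.extract(otelContext.active(), incomingHeaders);

  // Run the handler inside the extracted context. AsyncLocalStorage.run() only
  // covers async work started inside this callback, so handlers for later
  // requests must re-enter withSessionContext() to see the same context.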
  return withSessionContext(parentCtx, async () => {
    return {
      protocolVersion: '2024-11-05',
      capabilities: { tools: {} },
      serverInfo: { name: 'my-mcp-server', version: '1.0.0' },
    };
  });
});

If the client does not include a traceparent header, propagation.extract() returns the context it was given with no remote parent attached, so the next span starts a new root trace. This is correct behaviour, not an error.
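When the trace headers arrive on the transport-level HTTP request rather than in _meta, they have to be flattened into a plain string map before being passed to propagation.extract(). The following is a minimal sketch, assuming Node-style headers where a value may be a string, a string array, or undefined. toTraceCarrier and RawHeaders are hypothetical names introduced for this example.

```typescript
// Hypothetical helper: builds a carrier for propagation.extract() from
// Node-style request headers. Only the W3C Trace Context headers are copied.
type RawHeaders = Record<string, string | string[] | undefined>;

function toTraceCarrier(headers: RawHeaders): Record<string, string> {
  // Header names are case-insensitive, so normalise to lower case first.
  const lower = new Map<string, string[]>();
  for (const [name, value] of Object.entries(headers)) {
    if (value === undefined) continue;
    lower.set(name.toLowerCase(), typeof value === 'string' ? [value] : value);
  }

  const carrier: Record<string, string> = {};

  // A request must carry exactly one traceparent; skip ambiguous input.
  const traceparent = lower.get('traceparent') ?? [];
  if (traceparent.length === 1) carrier.traceparent = traceparent[0];

  // Repeated tracestate headers combine into one comma-separated list.
  const tracestate = lower.get('tracestate') ?? [];
  if (tracestate.length > 0) carrier.tracestate = tracestate.join(',');

  return carrier;
}
```

With a Node IncomingMessage named req, the call would look like propagation.extract(otelContext.active(), toTraceCarrier(req.headers)). An empty carrier is fine: extraction then yields no remote parent and the server starts a new root trace.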

Creating child spans per tool call

With the session context stored in AsyncLocalStorage, each tool-call span can be parented to the context extracted at initialize, provided the tool handler runs inside the withSessionContext() scope:

// In tool handlers: create a child span using the session's OTel context
import { SpanKind, SpanStatusCode } from '@opentelemetry/api';
import { tracer } from './tracer';
import { getSessionOtelContext } from './session-context';

server.tool('search', searchSchema, async (params, context) => {
  const sessionCtx = getSessionOtelContext();

  return tracer.startActiveSpan(
    'mcp.tool/search',
    { kind: SpanKind.SERVER },
    sessionCtx,  // parent context — makes this span a child of the session's root span
    async (span) => {
      span.setAttributes({
        'mcp.tool.name': 'search',
        'mcp.session.id': context.meta?.sessionId ?? 'unknown',
      });

      try {
        const results = await callSearchApi(params.query, deps);
        span.setStatus({ code: SpanStatusCode.OK });
        span.setAttribute('mcp.result.count', results.length);
        return { content: [{ type: 'text', text: JSON.stringify(results) }] };
      } catch (err) {
        span.recordException(err as Error);
        span.setStatus({ code: SpanStatusCode.ERROR, message: (err as Error).message });
        return { isError: true, content: [{ type: 'text', text: 'search failed' }] };
      } finally {
        span.end();
      }
    }
  );
});

The third argument to startActiveSpan is the parent context. When you pass sessionCtx, the new span's parent-span-id is set to the span ID from the extracted traceparent. The trace ID is inherited automatically.
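Tool calls arrive as separate requests after initialize, so the parent context has to outlive the initialize handler. One option, sketched here under the assumption that every request exposes a stable session ID, is a small store keyed by that ID. SessionContextStore is a hypothetical class, and the type parameter C stands in for OTel's Context so that the sketch has no dependencies.

```typescript
// Hypothetical sketch: keeps one parent context per MCP session so that
// tools/call handlers can look it up after initialize has returned.
// C stands in for OTel's Context type.
class SessionContextStore<C> {
  private readonly contexts = new Map<string, C>();
  private readonly fallback: () => C;

  constructor(fallback: () => C) {
    this.fallback = fallback;
  }

  // Call from the initialize handler, after propagation.extract().
  set(sessionId: string, ctx: C): void {
    this.contexts.set(sessionId, ctx);
  }

  // Call from tool handlers; unknown or missing sessions get the fallback.
  get(sessionId: string | undefined): C {
    if (sessionId !== undefined && this.contexts.has(sessionId)) {
      return this.contexts.get(sessionId) as C;
    }
    return this.fallback();
  }

  // Call when the session closes, otherwise the map grows without bound.
  delete(sessionId: string): boolean {
    return this.contexts.delete(sessionId);
  }
}
```

In a server this would be created as new SessionContextStore<Context>(() => otelContext.active()), and the value returned by get() would be passed as the third argument to startActiveSpan.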

Propagating context to downstream HTTP calls

When your tool handler calls an external HTTP API, inject the current OTel context into the outgoing request headers. The downstream service can then extract it and create its own child spans:

// Inject traceparent into outgoing HTTP calls
import { propagation, context as otelContext } from '@opentelemetry/api';
import { fetch, type Dispatcher } from 'undici';

// Placeholder for one search hit; replace with the real response type.
type SearchResult = Record<string, unknown>;

// `agent` must be an undici Dispatcher (for example an undici Agent);
// the `dispatcher` option does not accept a node:https Agent.
async function callSearchApi(query: string, agent: Dispatcher): Promise<SearchResult[]> {
  const headers: Record<string, string> = {
    'Content-Type': 'application/json',
  };

  // Inject the active OTel context (traceparent + tracestate) into headers
  propagation.inject(otelContext.active(), headers);
  // headers now contains: { traceparent: '00-abc123...-def456...-01', ... }

  const res = await fetch('https://search.internal/v2/search', {
    method: 'POST',
    headers,
    body: JSON.stringify({ query }),
    dispatcher: agent,
  });

  if (!res.ok) throw new Error(`search API ${res.status}`);
  return (await res.json()) as SearchResult[];
}

If the downstream service is also instrumented with OTel (any language — the W3C spec is cross-language), it will extract your traceparent and create child spans that appear in the same trace in Jaeger or Tempo. If the downstream service is not instrumented, the header is silently ignored and the trace ends at your server's span — no error.
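The header value that reaches the downstream service can be illustrated without OTel. The sketch below only shows which IDs end up in the outgoing traceparent; SpanIds and buildTraceparent are hypothetical names, and the span ID in the example is an arbitrary made-up value.

```typescript
// Illustration of the value propagation.inject() writes, without using OTel.
// SpanIds and buildTraceparent are hypothetical names for this sketch.
interface SpanIds {
  traceId: string;  // 32 hex chars, inherited unchanged from the incoming trace
  spanId: string;   // 16 hex chars, the ID of the span making the outgoing call
  sampled: boolean; // carried through so downstream follows the same decision
}

function buildTraceparent({ traceId, spanId, sampled }: SpanIds): string {
  if (!/^[0-9a-f]{32}$/.test(traceId)) {
    throw new Error('traceId must be 32 lowercase hex characters');
  }
  if (!/^[0-9a-f]{16}$/.test(spanId)) {
    throw new Error('spanId must be 16 lowercase hex characters');
  }
  return `00-${traceId}-${spanId}-${sampled ? '01' : '00'}`;
}

// The tool span keeps the client's trace ID but contributes its own span ID,
// so the downstream service records its spans as children of mcp.tool/search.
const outgoing = buildTraceparent({
  traceId: '4bf92f3577b34da6a3ce929d0e0e4736',
  spanId: 'b7ad6b7169203331',
  sampled: true,
});
// outgoing === '00-4bf92f3577b34da6a3ce929d0e0e4736-b7ad6b7169203331-01'
```

The trace ID matches the one the client sent, while the parent-span-id field now names your tool span rather than the client's span. That substitution is what links the three levels of the tree.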

Jaeger backend setup

For a self-hosted tracing backend, Jaeger is the most common choice. Run it alongside your services with Docker Compose:

# docker-compose.yml — Jaeger all-in-one for development
services:
  jaeger:
    image: jaegertracing/all-in-one:1.57
    ports:
      - "16686:16686"   # Jaeger UI
      - "4317:4317"     # OTLP gRPC receiver
      - "4318:4318"     # OTLP HTTP receiver
    environment:
      - COLLECTOR_OTLP_ENABLED=true
      - SPAN_STORAGE_TYPE=memory

  mcp-server:
    build: .
    environment:
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4318
      - OTEL_SERVICE_NAME=my-mcp-server
      - NODE_ENV=development

Point your OTLPTraceExporter at http://jaeger:4318/v1/traces (HTTP) or http://jaeger:4317 (gRPC; the gRPC exporter also takes an http:// URL). Open http://localhost:16686 to see traces. Search by service=my-mcp-server and operation=mcp.tool/search.
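The tool handlers import tracer from ./tracer, a module this guide does not show. The following is a configuration sketch of what it could look like, assuming the @opentelemetry/sdk-node and @opentelemetry/exporter-trace-otlp-http packages are installed and the environment variables from the Compose file are set. The tracer name is an arbitrary label chosen for this example.

```typescript
// tracer.ts: a configuration sketch, not a definitive setup.
import { trace } from '@opentelemetry/api';
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

const sdk = new NodeSDK({
  // With no url option, the HTTP exporter falls back to
  // OTEL_EXPORTER_OTLP_ENDPOINT (http://jaeger:4318 in the Compose file)
  // and appends /v1/traces. OTEL_SERVICE_NAME is read from the environment too.
  traceExporter: new OTLPTraceExporter(),
});

// Start the SDK before the rest of the server is loaded so that
// instrumentation can hook in.
sdk.start();

// Flush buffered spans before the container stops.
process.on('SIGTERM', () => {
  sdk.shutdown().finally(() => process.exit(0));
});

// The name passed to getTracer is an arbitrary label for this sketch.
export const tracer = trace.getTracer('my-mcp-server');
```

Relying on the environment variables keeps the module identical across backends: switching from Jaeger to another OTLP receiver only changes the value of OTEL_EXPORTER_OTLP_ENDPOINT.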

For production, consider Grafana Tempo. It stores traces in object storage (S3, GCS) instead of the in-memory store used by the all-in-one image above, integrates with Grafana dashboards, and supports trace-to-log and trace-to-metrics correlation. Tempo also accepts OTLP, so the exporter configuration stays the same and only the endpoint host changes.

Sampling and trace completeness

A key property of distributed tracing is sampling consistency: if a trace is sampled at the root (client decision), every downstream service must also sample it; if it is not sampled, no downstream service should record it. This prevents partial traces — where you see only some spans for an operation — which are worse than no traces because they mislead you about latency.

OTel's ParentBasedSampler enforces this: it respects the trace-flags bit from an incoming traceparent. If the client set trace-flags=01 (sampled), ParentBasedSampler always records the trace regardless of the local ratio sampler setting. If the client set trace-flags=00, ParentBasedSampler drops it:

// Always use ParentBasedSampler to respect upstream sampling decisions
import { ParentBasedSampler, TraceIdRatioBasedSampler } from '@opentelemetry/sdk-trace-node';

const sampler = new ParentBasedSampler({
  root: new TraceIdRatioBasedSampler(0.1),  // sample 10% of new (root) traces
  // remoteParentSampled: AlwaysOnSampler  (default — respect upstream "sampled" flag)
  // remoteParentNotSampled: AlwaysOffSampler  (default — respect upstream "not sampled" flag)
});
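The decision that ParentBasedSampler makes with its default settings can be written out as a few lines of plain TypeScript. This is a conceptual sketch only: shouldSample and ParentInfo are hypothetical names, the real sampler lives in the OTel SDK and also distinguishes local from remote parents, and the root sampler here is a stand-in.

```typescript
// Conceptual sketch of the ParentBased decision with default settings.
// shouldSample and ParentInfo are hypothetical names for this example.
interface ParentInfo {
  sampled: boolean; // trace-flags bit 0 from the incoming traceparent
}

function shouldSample(
  parent: ParentInfo | undefined,
  rootSampler: () => boolean,
): boolean {
  // No incoming traceparent: this server starts the trace and decides itself.
  if (parent === undefined) return rootSampler();
  // Otherwise follow the upstream decision so the trace stays complete.
  return parent.sampled;
}

// Stand-in for TraceIdRatioBasedSampler(0.1). The real sampler derives its
// decision from the trace ID rather than from Math.random().
const tenPercent = (): boolean => Math.random() < 0.1;

shouldSample({ sampled: true }, tenPercent);  // true, the local ratio is ignored
shouldSample({ sampled: false }, tenPercent); // false, the local ratio is ignored
shouldSample(undefined, tenPercent);          // decided by the 10% root sampler
```

The local ratio therefore only affects traces that start at your server. Lowering it does not thin out traces that a client has already marked as sampled.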

Trace-to-log correlation

Every log line emitted inside a span should carry the span's trace_id and span_id. This enables the "click a log line → jump to the trace" workflow in Grafana. Use the Pino mixin pattern from the structured logging guide:

// Pino mixin injects trace_id/span_id from the active OTel span
import pino from 'pino';
import { trace } from '@opentelemetry/api';

export const logger = pino({
  mixin() {
    const span = trace.getActiveSpan();
    if (!span) return {};
    const ctx = span.spanContext();
    return { trace_id: ctx.traceId, span_id: ctx.spanId };
  },
});

In Grafana Loki, configure a derived field on the trace_id JSON key that links to your Grafana Tempo datasource. In Kibana (Elasticsearch), configure a trace link on the trace_id field in your index pattern. Both give you a clickable link from any log entry to its parent trace.

AliveMCP probe spans

AliveMCP external probes are not just uptime checks — they can carry a synthetic traceparent header on the initialize request. When your MCP server extracts it and creates child spans, the AliveMCP probe's trace appears in Jaeger or Tempo alongside your user traces. This lets you:

Use AliveMCP for continuous external availability monitoring. See the observability overview for how external probes, distributed traces, metrics, and structured logs form a layered observability system.