
MCP server OpenTelemetry

OpenTelemetry (OTel) is the CNCF standard for collecting traces, metrics, and logs from any service. A single SDK, @opentelemetry/sdk-node, sets up the export pipeline for all three signals and sends them via OTLP to any compatible backend (Jaeger, Grafana Tempo, Prometheus, Loki, Datadog, Honeycomb). For MCP servers, the three signals complement each other: traces show the path of a tool call through your server and its downstream dependencies; metrics show aggregate counts, latencies, and error rates across all tool calls; logs hold the per-session, per-call detail that neither traces nor metrics are designed to carry. Wiring them together, with a shared trace ID on every log line, turns debugging a slow or failing tool call into a matter of minutes rather than hours.

TL;DR

Install @opentelemetry/sdk-node and the OTLP exporters. Start the SDK before your server code runs. In each tool handler, open a span with tracer.startActiveSpan, set the mcp.tool.name and mcp.session.id attributes, record exceptions, and end the span in a finally block. Emit a custom mcp.tool_calls_total counter and an mcp.tool_duration_ms histogram. Inject trace_id and span_id from the active OTel context into every Pino log line so you can jump from a log entry to its parent trace in Grafana. Pair this with AliveMCP external probes to catch failures that never produce internal spans.

Why three signals, not one

Many teams start with just logs (easy to add console.log) or just metrics (easy to expose a /metrics endpoint), but each signal has blind spots that the other two fill:

Signal | Best for | Blind spot
--- | --- | ---
Traces | Per-request latency breakdown; finding which downstream call is slow | Aggregate rates; no data at all for requests that never start (e.g., after a process crash)
Metrics | Alerting on error rate, P99 latency, and saturation; cheap to store over time | No per-request detail; can't drill into which specific call was slow
Logs | Exact parameters, error messages, per-session context, debugging edge cases | Expensive to store at DEBUG level in production; no graph-friendly structure without a query engine

OTel connects all three: spans carry a traceId; log records carry the same traceId once you inject it; and exemplars, where your SDK and backend support them, attach a traceId to individual histogram measurements, such as the one behind a P99 spike. You click a slow histogram bucket, jump to the trace, find the slow span, and read the associated logs, all in one workflow in Grafana.

NodeSDK setup

The SDK must be initialised before any other require or import, because auto-instrumentation works by patching libraries as they are loaded. Create a dedicated instrumentation.ts file and import it at the very top of your server entry point:

// instrumentation.ts — initialise before server.ts
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-http';
import { PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';
import { Resource } from '@opentelemetry/resources';
import { SEMRESATTRS_SERVICE_NAME, SEMRESATTRS_SERVICE_VERSION, SEMRESATTRS_DEPLOYMENT_ENVIRONMENT } from '@opentelemetry/semantic-conventions';
import { ParentBasedSampler, TraceIdRatioBasedSampler } from '@opentelemetry/sdk-trace-node';

const sdk = new NodeSDK({
  resource: new Resource({
    [SEMRESATTRS_SERVICE_NAME]: 'my-mcp-server',
    [SEMRESATTRS_SERVICE_VERSION]: process.env.SERVICE_VERSION ?? '0.0.0',
    [SEMRESATTRS_DEPLOYMENT_ENVIRONMENT]: process.env.NODE_ENV ?? 'development',
  }),
  // With no explicit `url`, each exporter reads OTEL_EXPORTER_OTLP_ENDPOINT as a base URL
  // (e.g. http://otel-collector:4318) and appends its own path (/v1/traces, /v1/metrics).
  // When the variable is unset, they default to http://localhost:4318/v1/<signal>.
  traceExporter: new OTLPTraceExporter(),
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter(),
    exportIntervalMillis: 15_000,
  }),
  // In production sample at 10%; in development sample everything
  sampler: new ParentBasedSampler({
    root: new TraceIdRatioBasedSampler(
      process.env.NODE_ENV === 'production' ? 0.1 : 1.0
    ),
  }),
});

sdk.start();

process.on('SIGTERM', () => sdk.shutdown().finally(() => process.exit(0)));

// server.ts: import instrumentation first, before any other module
import './instrumentation';
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { createDeps } from './deps';
// ... rest of server code

The resource block attaches service.name, service.version, and deployment.environment to every span, metric, and log record exported from this process. Backends use these to filter by service and environment without you adding them manually to every event.
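The OTLP exporter specification treats OTEL_EXPORTER_OTLP_ENDPOINT as a base URL to which each exporter appends its own path (/v1/traces, /v1/metrics, /v1/logs), while a signal-specific variable such as OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is used verbatim. If you ever build exporter URLs yourself instead of letting the exporters read the environment, the rules can be sketched as follows. resolveOtlpHttpUrl is a hypothetical helper, not an SDK export, and it is simplified: it covers OTLP over HTTP only and ignores the protocol-selection variables.

```typescript
// Hypothetical helper (not an OTel export): resolves the OTLP/HTTP URL for one signal
// using the environment-variable rules from the OTLP exporter specification.
type OtlpSignal = 'traces' | 'metrics' | 'logs';

function resolveOtlpHttpUrl(
  signal: OtlpSignal,
  env: Record<string, string | undefined>,
): string {
  // A signal-specific variable, e.g. OTEL_EXPORTER_OTLP_TRACES_ENDPOINT, is used verbatim.
  const specific = env[`OTEL_EXPORTER_OTLP_${signal.toUpperCase()}_ENDPOINT`];
  if (specific) return specific;

  // The generic variable is a base URL: drop trailing slashes, then append /v1/<signal>.
  // Empty values count as unset and fall back to the local OTLP/HTTP default.
  const base = env['OTEL_EXPORTER_OTLP_ENDPOINT'] || 'http://localhost:4318';
  return `${base.replace(/\/+$/, '')}/v1/${signal}`;
}
```

Calling resolveOtlpHttpUrl('metrics', process.env) with OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318 yields http://otel-collector:4318/v1/metrics, which is why one base URL is enough for both exporters.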

Creating a span per tool call

The highest-value instrumentation point in an MCP server is the tool handler. Each tool call is a discrete unit of work that deserves its own span with the tool name, session ID, and outcome as attributes:

// tracer.ts — application-level tracer
import { trace } from '@opentelemetry/api';
export const tracer = trace.getTracer('mcp-server', '1.0.0');

// server.ts: wrap every tool handler with a span
import { SpanStatusCode } from '@opentelemetry/api';
import { tracer } from './tracer';

server.tool('search', searchSchema, async (params, extra) => {
  return tracer.startActiveSpan('mcp.tool/search', async (span) => {
    span.setAttributes({
      'mcp.tool.name': 'search',
      // extra.sessionId is set by transports that track sessions; undefined otherwise
      'mcp.session.id': extra.sessionId ?? 'unknown',
      'mcp.query.length': params.query.length,
    });

    try {
      const results = await deps.searchBulkhead.execute(() =>
        callSearchApi(params.query, deps.searchAgent)
      );
      span.setStatus({ code: SpanStatusCode.OK });
      span.setAttribute('mcp.result.count', results.length);
      return { content: [{ type: 'text', text: JSON.stringify(results) }] };
    } catch (err) {
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR, message: (err as Error).message });
      return { isError: true, content: [{ type: 'text', text: 'search failed' }] };
    } finally {
      span.end();
    }
  });
});
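With more than a handful of tools, repeating the startActiveSpan / try / catch / finally pattern in every handler invites mistakes such as a span that is never ended. A minimal sketch of a reusable wrapper follows. withToolSpan, SpanLike, and TracerLike are hypothetical names; the interfaces are local stand-ins covering only the span and tracer methods the handler above uses, so the sketch runs without the OTel packages.

```typescript
// Local stand-ins for the @opentelemetry/api types this sketch needs (assumption: kept
// minimal so the example runs without OTel installed). In a real server, use the tracer
// from tracer.ts and import SpanStatusCode from '@opentelemetry/api' instead.
const SpanStatusCode = { UNSET: 0, OK: 1, ERROR: 2 } as const;

interface SpanLike {
  setAttributes(attributes: Record<string, string | number>): void;
  setStatus(status: { code: number; message?: string }): void;
  recordException(exception: Error): void;
  end(): void;
}

interface TracerLike {
  startActiveSpan<T>(name: string, fn: (span: SpanLike) => T): T;
}

// Hypothetical helper: runs `fn` inside an `mcp.tool/<name>` span, records the outcome as
// the span status, and always ends the span. Errors are re-thrown for the caller to map.
async function withToolSpan<T>(
  tracer: TracerLike,
  toolName: string,
  attributes: Record<string, string | number>,
  fn: (span: SpanLike) => Promise<T>,
): Promise<T> {
  return tracer.startActiveSpan(`mcp.tool/${toolName}`, async (span) => {
    span.setAttributes({ 'mcp.tool.name': toolName, ...attributes });
    try {
      const result = await fn(span);
      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    } catch (err) {
      const error = err instanceof Error ? err : new Error(String(err));
      span.recordException(error);
      span.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
      throw err;
    } finally {
      span.end();
    }
  });
}
```

The search handler would then call withToolSpan(tracer, 'search', { 'mcp.session.id': extra.sessionId ?? 'unknown' }, async (span) => ...) and catch the re-thrown error to return its isError result. Re-throwing keeps the wrapper agnostic about how each tool reports failure.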

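Key attributes to set on every tool span: the tool name (mcp.tool.name), the session ID (mcp.session.id), and the outcome, recorded as the span status (OK, or ERROR with the error message). Add tool-specific sizes such as mcp.query.length and mcp.result.count where they help.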

Do not set sensitive parameters (API keys, database credentials, full user-query text) as span attributes — attributes are stored in your tracing backend and may appear in dashboards. Use a truncated or hashed version if you need to correlate by query.
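If you need to group spans by query without storing its text, one option is a keyed hash. A minimal sketch using Node's built-in crypto module; queryFingerprint is a hypothetical helper, and where the key comes from (for example, a secret environment variable) is an assumption. HMAC is used rather than a plain hash because short, guessable queries could otherwise be recovered by hashing candidate strings.

```typescript
import { createHmac } from 'node:crypto';

// Hypothetical helper: a keyed hash (HMAC-SHA256) of the query, truncated to 16 hex chars.
// A secret key stops anyone with dashboard access from recovering short queries by
// hashing guesses; identical queries still produce identical fingerprints.
function queryFingerprint(query: string, key: string): string {
  return createHmac('sha256', key).update(query, 'utf8').digest('hex').slice(0, 16);
}
```

A handler could then set a hypothetical mcp.query.hash attribute to queryFingerprint(params.query, key) next to mcp.query.length. Equal queries map to equal fingerprints, which is all correlation needs.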

Custom MCP metrics

Define instruments once at module scope, not inside tool handlers. Creating a new counter inside each tool call defeats the purpose — instruments must be singletons that accumulate across all calls:

// metrics.ts — application-level meters
import { metrics } from '@opentelemetry/api';

const meter = metrics.getMeter('mcp-server', '1.0.0');

export const toolCallsCounter = meter.createCounter('mcp.tool_calls_total', {
  description: 'Total number of MCP tool calls',
  unit: '1',
});

export const toolDurationHistogram = meter.createHistogram('mcp.tool_duration_ms', {
  description: 'Duration of MCP tool calls in milliseconds',
  unit: 'ms',
  advice: {
    explicitBucketBoundaries: [5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10000],
  },
});

export const activeSessionsGauge = meter.createUpDownCounter('mcp.active_sessions', {
  description: 'Number of active MCP sessions',
  unit: '1',
});

export const transportErrorsCounter = meter.createCounter('mcp.transport_errors_total', {
  description: 'Total number of MCP transport-level errors',
  unit: '1',
});

// Using metrics in a tool handler
import { toolCallsCounter, toolDurationHistogram } from './metrics';

server.tool('search', searchSchema, async (params, context) => {
  const start = Date.now();
  const labels = { tool_name: 'search', transport: 'sse' };

  try {
    const result = await runSearch(params, deps);
    toolCallsCounter.add(1, { ...labels, status: 'ok' });
    toolDurationHistogram.record(Date.now() - start, { ...labels, status: 'ok' });
    return result;
  } catch (err) {
    toolCallsCounter.add(1, { ...labels, status: 'error' });
    toolDurationHistogram.record(Date.now() - start, { ...labels, status: 'error' });
    throw err;
  }
});
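The handler above records the counter and histogram in two separate branches. A minimal sketch of a helper that records both exactly once in a finally block, and that also counts an MCP result with isError: true as an error (the tracing example returns such a result instead of throwing), follows. measureToolCall, CounterLike, and HistogramLike are hypothetical; the interfaces only mirror the add and record calls used above so the sketch runs without @opentelemetry/api.

```typescript
import { performance } from 'node:perf_hooks';

// Minimal structural stand-ins for OTel's Counter.add and Histogram.record (assumption:
// local types so the sketch runs without @opentelemetry/api installed).
interface CounterLike { add(value: number, attributes?: Record<string, string>): void }
interface HistogramLike { record(value: number, attributes?: Record<string, string>): void }

// MCP tool results can signal failure with `isError: true` instead of throwing.
function isErrorResult(value: unknown): boolean {
  return typeof value === 'object' && value !== null && (value as { isError?: unknown }).isError === true;
}

// Hypothetical helper: counts the call and records its duration exactly once, labelled
// status 'ok' or 'error', whether `fn` returns normally, returns isError, or throws.
async function measureToolCall<T>(
  calls: CounterLike,
  duration: HistogramLike,
  labels: Record<string, string>,
  fn: () => Promise<T>,
): Promise<T> {
  const start = performance.now(); // monotonic clock, unlike Date.now()
  let status = 'error'; // stays 'error' if fn throws
  try {
    const result = await fn();
    status = isErrorResult(result) ? 'error' : 'ok';
    return result;
  } finally {
    const attributes = { ...labels, status };
    calls.add(1, attributes);
    duration.record(performance.now() - start, attributes);
  }
}
```

Using performance.now() instead of Date.now() keeps durations correct even if the system clock is adjusted during a call.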

Session lifecycle hooks (if available in your transport) increment and decrement mcp.active_sessions. Wire them in your session management code, not in individual tool handlers. See the full metrics guide for Prometheus export, Grafana dashboard configuration, and alert rules on error rate and P99 latency.
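One way to keep mcp.active_sessions from drifting (for example, when a close event fires twice) is to track open session IDs and only adjust the counter on real state changes. A minimal sketch; ActiveSessionTracker and UpDownCounterLike are hypothetical names, and which transport events call opened() and closed() is an assumption. In the MCP TypeScript SDK's Streamable HTTP transport, the onsessioninitialized option and the transport's onclose handler are likely candidates, but check the API of the version you use.

```typescript
// Structural stand-in for OTel's UpDownCounter.add (assumption: local type so the
// sketch runs without @opentelemetry/api installed).
interface UpDownCounterLike { add(value: number, attributes?: Record<string, string>): void }

// Hypothetical helper: keeps mcp.active_sessions balanced by ignoring duplicate open
// events and close events for sessions it never saw open.
class ActiveSessionTracker {
  private readonly open = new Set<string>();
  private readonly counter: UpDownCounterLike;

  constructor(counter: UpDownCounterLike) {
    this.counter = counter;
  }

  opened(sessionId: string): void {
    if (this.open.has(sessionId)) return; // duplicate open event
    this.open.add(sessionId);
    this.counter.add(1);
  }

  closed(sessionId: string): void {
    if (!this.open.delete(sessionId)) return; // unknown or already closed
    this.counter.add(-1);
  }

  get size(): number {
    return this.open.size;
  }
}
```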

Correlating logs with traces

The most powerful OTel feature for day-to-day debugging is trace-log correlation: every log line emitted during a tool call carries the same traceId and spanId as the span. In Grafana, you click a log line, see the linked trace, and jump directly to the span — no manual correlation needed.

With Pino, inject the active OTel context into each log call via a mixin:

// logger.ts — Pino with OTel trace context injection
import pino from 'pino';
import { trace, context } from '@opentelemetry/api';

export const logger = pino({
  level: process.env.LOG_LEVEL ?? 'info',
  redact: {
    paths: ['*.password', '*.token', '*.api_key', '*.DATABASE_URL', '*.secret'],
    censor: '[REDACTED]',
  },
  mixin() {
    const span = trace.getActiveSpan();
    if (!span) return {};
    const ctx = span.spanContext();
    return {
      trace_id: ctx.traceId,
      span_id: ctx.spanId,
      trace_flags: ctx.traceFlags,
    };
  },
});

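Because Pino calls mixin on every logger.info() / logger.error() invocation, it picks up the active span automatically; you never need to pass the trace ID to log calls by hand. If no span is active (e.g., in startup code), the mixin returns {} and the log line is emitted without trace fields, which is correct. With head sampling, calls that were not sampled still run inside a non-recording span with a valid trace ID, so their log lines carry a trace_id with trace_flags 0 that points to a trace that was never exported.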

For Grafana Loki + Tempo correlation to work, the trace_id field in your log line must match the format Tempo expects (lowercase hex, 32 chars for 128-bit trace IDs). OTel's ctx.traceId returns exactly that format.
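When a log line does not link to a trace, it helps to check the fields the mixin writes. A minimal sketch of such a check, usable in a log-pipeline test or a debugging script: the IDs must be lowercase hex of the right length and not all zeros, and the sampled bit (0x01) of trace_flags must be set, because only sampled traces are exported. isLinkableToTrace and TraceFields are hypothetical names.

```typescript
// Fields written by the Pino mixin in logger.ts.
interface TraceFields {
  trace_id?: string;
  span_id?: string;
  trace_flags?: number;
}

const SAMPLED = 0x01; // W3C Trace Context "sampled" bit in trace-flags

// Hypothetical check: true when the IDs are well-formed (lowercase hex, not all zeros)
// and the trace was sampled, so its spans should exist in the tracing backend.
function isLinkableToTrace(fields: TraceFields): boolean {
  const { trace_id, span_id, trace_flags } = fields;
  if (trace_id === undefined || span_id === undefined || trace_flags === undefined) return false;
  const validTraceId = /^[0-9a-f]{32}$/.test(trace_id) && !/^0+$/.test(trace_id);
  const validSpanId = /^[0-9a-f]{16}$/.test(span_id) && !/^0+$/.test(span_id);
  return validTraceId && validSpanId && (trace_flags & SAMPLED) === SAMPLED;
}
```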

Resource attributes and environment configuration

Resource attributes are attached to every span, metric, and log exported from the process. The three most important for MCP servers in a multi-environment deployment:

Attribute | Value | Why it matters
--- | --- | ---
service.name | my-mcp-server | Groups all signals from this service in Grafana, Jaeger, etc.
service.version | 1.2.3 | Enables before/after error-rate comparison across deploys
deployment.environment | production / staging | Prevents staging noise from polluting production dashboards

OTel can also auto-detect additional resource attributes from the runtime environment and cloud provider metadata APIs (for example, the AWS EC2 instance type, the container ID, or the GCP zone). Enable this by adding detectors from the @opentelemetry/resource-detector-aws, @opentelemetry/resource-detector-gcp, or @opentelemetry/resource-detector-container packages (such as awsEc2Detector, gcpDetector, or containerDetector) to the resourceDetectors array in the NodeSDK constructor; note that setting resourceDetectors replaces the SDK's default detector list. The cloud detectors query the instance metadata endpoint at startup, which can slow startup when that endpoint is unreachable, so skip them in development.

Sampling strategies

At 100% sampling (the default), every tool call produces a trace. For a busy MCP server handling thousands of calls per minute, this generates substantial backend storage cost. Apply TraceIdRatioBasedSampler to reduce cost without losing all traces:

// Sample 10% of requests in production, 100% in development
import { ParentBasedSampler, TraceIdRatioBasedSampler } from '@opentelemetry/sdk-trace-node';

const sampler = new ParentBasedSampler({
  root: new TraceIdRatioBasedSampler(
    process.env.NODE_ENV === 'production' ? 0.1 : 1.0
  ),
});

ParentBasedSampler is important: if an upstream caller (e.g., the LLM client or another service) has already made a sampling decision and propagated it in the traceparent header, ParentBasedSampler follows that decision. A parent with trace-flags=01 (sampled) is recorded even if the ratio sampler would have dropped it, and a parent with trace-flags=00 is dropped. As long as every service uses a parent-based sampler, traces that cross service boundaries are either fully recorded or fully dropped, never partially captured.
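The decision rules can be modelled in a few lines. The sketch below is a simplified model, not the SDK's implementation: shouldSample and ParentDecision are hypothetical, and deriving the root decision from the last 13 hex digits of the trace ID is this sketch's own choice (the OTel SDK hashes the trace ID differently). What it does show is that a parent's decision always wins and that a root decision depends only on the trace ID and the ratio.

```typescript
// Simplified model of ParentBased(TraceIdRatioBased) head sampling. Illustrative only:
// the OTel SDK derives its ratio decision from the trace ID differently.
interface ParentDecision {
  sampled: boolean; // the parent's trace-flags sampled bit (traceparent flags 01 or 00)
}

function shouldSample(traceId: string, ratio: number, parent?: ParentDecision): boolean {
  // With a parent span, follow its decision regardless of this service's ratio.
  if (parent !== undefined) return parent.sampled;

  // Root span: map the last 13 hex digits (52 bits, exact in a double) onto [0, 1)
  // and keep the trace if that position falls below the ratio.
  const position = parseInt(traceId.slice(-13), 16) / 2 ** 52;
  return position < ratio;
}
```

Because the root decision in this model is a pure function of the trace ID, repeating it for the same trace always gives the same answer, which is the property ratio-based samplers rely on.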

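For more targeted sampling (always keep traces with errors, sample everything else at 10%), deploy an OpenTelemetry Collector in front of your tracing backend and configure tail-based sampling there. The Collector sees the full trace before making the sampling decision, so it can make error-aware choices. It can only choose among the traces it receives, though, so the SDK must export all spans (or at least a larger share than the Collector keeps) rather than head-sampling at 10% first.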
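A tail-sampling policy is Collector configuration, not application code, but the decision it makes can be sketched in TypeScript to show why it can be error-aware where a head sampler cannot. keepTrace and CompletedSpan are hypothetical; random is injectable so the behaviour is testable.

```typescript
// Illustrative model of a tail-sampling policy; in practice you express this in the
// Collector's configuration, not in application code.
interface CompletedSpan {
  name: string;
  isError: boolean; // span status was ERROR
}

function keepTrace(
  spans: readonly CompletedSpan[],
  baselineRatio: number,
  random: () => number = Math.random,
): boolean {
  // The whole trace is visible at decision time, so one failing span anywhere keeps it.
  if (spans.some((span) => span.isError)) return true;
  // Everything else is kept at the baseline rate.
  return random() < baselineRatio;
}
```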

OTel and AliveMCP together

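OTel traces capture what happens inside the MCP server process. They cannot capture failures that happen before a request ever reaches your application code, such as a process that crashes on startup or a connection that is never accepted.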

These failures produce no OTel data because no application code ran. AliveMCP fills this gap with external probes: it connects from outside the cluster, performs a full MCP initialize → tools/list → tools/call sequence, and reports the result alongside your internal OTel data. A deployment that looks healthy in Grafana (because internal metrics are still exporting from the old pod) can simultaneously be failing in AliveMCP (because the new pod is crashing before it can accept connections). Both views are needed.

Startup sequence with OTel

The SDK must be fully started before your MCP server begins accepting tool calls. The recommended startup order:

// server.ts — startup sequence with OTel
import './instrumentation';             // 1. start OTel SDK first
import { parseConfig } from './config'; // 2. validate config
import { createDeps } from './deps';    // 3. open connections (traced)
import { buildServer } from './app';    // 4. create MCP server + register tools
import { logger } from './logger';      // Pino logger with the trace-context mixin

async function main() {
  const config = parseConfig();
  const deps = await createDeps(config);
  const server = buildServer(deps);

  await server.listen({ port: config.PORT });
  logger.info({ port: config.PORT }, 'MCP server listening');
}

main().catch(err => {
  logger.error({ err }, 'startup failed');
  process.exit(1);
});

Connections opened inside createDeps() after the SDK starts get their own spans if the underlying client library has an OTel instrumentation package (e.g., @opentelemetry/instrumentation-pg for PostgreSQL, @opentelemetry/instrumentation-redis for Redis) and that instrumentation is registered before the client library is loaded. Add these instrumentations to the instrumentations array in the NodeSDK constructor to trace every database query and cache call automatically, without any manual span code in your application.
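In the startup sequence above, main().catch() calls process.exit(1) immediately, so spans and metrics still buffered by the SDK may never be exported. A minimal sketch of a bounded final flush; shutdownWithTimeout is hypothetical, and calling it with the SDK assumes you export the sdk instance from instrumentation.ts, which the setup code above does not do.

```typescript
type ShutdownOutcome = 'flushed' | 'failed' | 'timeout';

// Hypothetical helper: waits for `shutdown` (e.g. sdk.shutdown) to finish, but never
// longer than timeoutMs, so an unreachable collector cannot block process exit.
async function shutdownWithTimeout(
  shutdown: () => Promise<void>,
  timeoutMs: number,
): Promise<ShutdownOutcome> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<ShutdownOutcome>((resolve) => {
    timer = setTimeout(() => resolve('timeout'), timeoutMs);
  });
  // Promise.resolve().then(...) also catches a synchronous throw from `shutdown`.
  const attempt = Promise.resolve()
    .then(shutdown)
    .then(
      (): ShutdownOutcome => 'flushed',
      (): ShutdownOutcome => 'failed',
    );
  try {
    return await Promise.race([attempt, timeout]);
  } finally {
    clearTimeout(timer); // don't keep the event loop alive after a fast shutdown
  }
}
```

The catch handler could then await shutdownWithTimeout(() => sdk.shutdown(), 5_000) after logging and before process.exit(1). The timeout ensures that an unreachable collector cannot keep a failed process alive indefinitely.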

For a comprehensive picture of all four observability signals working together — traces, metrics, structured logs, and external probes — see the MCP Server Infrastructure Hardening Guide.