Guide · Debugging

Debug logging for MCP servers

MCP servers run in an unusual context: they receive JSON-RPC requests from an AI model running inside a client application, execute tool code, and return structured results — all over a transport (stdio or HTTP) that has no built-in request tracing. Without deliberate logging, a failing tool call produces a cryptic error in the AI client with no server-side context. Structured logging with a library like pino or winston gives you tool call duration, argument shape, error messages, and correlation IDs — the information you need to diagnose problems in both development and production.

TL;DR

Log to stderr, never stdout — stdout is the JSON-RPC transport in stdio-mode servers. Use a structured logger (pino is fastest). Log tool call start (tool name + argument shape), tool call result (success/error + duration), and errors with stack traces. Do not log full tool results — they may contain PII. Add a correlation ID per request to trace multi-tool agent workflows. Sample high-volume tool calls at 1–10% to avoid flooding logs.

Why stderr, not stdout

In stdio-mode MCP servers, stdout is the JSON-RPC wire — every byte written to stdout is parsed as a protocol message by the MCP client. Writing log output to stdout corrupts the JSON-RPC stream and causes the client to receive malformed JSON, which typically manifests as a silent connection failure or a parse error in the client logs.

Always write debug output to stderr. The MCP SDK captures stdin/stdout for the protocol and leaves stderr untouched — stderr output appears in terminal output during development and in log aggregation tools in production, without corrupting the protocol stream.

// WRONG — corrupts the JSON-RPC stream in stdio mode
console.log('Tool called:', toolName);

// CORRECT — stderr is safe in both stdio and HTTP modes
console.error('Tool called:', toolName);
process.stderr.write(`Tool called: ${toolName}\n`);

In HTTP-mode servers, stdout doesn't go to the transport, so console.log is technically safe — but using stderr consistently means your logging code works in both modes without changes.

Setting up pino for MCP servers

Pino is the fastest Node.js logging library and produces newline-delimited JSON to stderr by default. Install it with npm install pino and configure a server-level logger:

import pino from 'pino';

export const logger = pino({
  level: process.env.LOG_LEVEL ?? 'info',
  // Write to stderr, not stdout
  transport: process.env.NODE_ENV === 'development'
    ? { target: 'pino-pretty', options: { destination: 2 } }  // 2 = stderr
    : undefined,
  // In production, pino writes NDJSON to stderr by default
  // (fd 2 when destination is not set, or use: pino({}, pino.destination(2)))
}, pino.destination(2));

For development, install pino-pretty as a devDependency to get readable output. In production, keep the raw NDJSON output — log aggregators (Datadog, Loki, CloudWatch) parse it automatically.

Set the log level via LOG_LEVEL environment variable: trace for full argument logging during debugging, debug during development, info in production. The pino level hierarchy is trace → debug → info → warn → error → fatal.

What to log per tool call

Each tool handler should log three events: call start, result (success or error), and the error details if the call fails. This gives you a complete picture of every tool invocation:

import { logger } from './logger.js';

server.tool('search_documents', searchSchema, async (args, extra) => {
  const requestId = extra?.meta?.progressToken ?? crypto.randomUUID();
  const start = performance.now();
  const callLogger = logger.child({ requestId, tool: 'search_documents' });

  // 1. Log call start with argument shape (not full values)
  callLogger.info({
    event: 'tool_call_start',
    args: {
      query: args.query?.slice(0, 100),   // truncate long strings
      limit: args.limit,
      filter: args.filter ? '[present]' : '[absent]',
    },
  }, 'tool call start');

  try {
    const results = await searchDocuments(args.query, {
      limit: args.limit,
      filter: args.filter,
    });

    const durationMs = Math.round(performance.now() - start);

    // 2. Log result metadata (not the result content itself)
    callLogger.info({
      event: 'tool_call_success',
      durationMs,
      resultCount: results.length,
      truncated: results.length === args.limit,
    }, 'tool call success');

    return {
      content: [{ type: 'text', text: JSON.stringify(results) }],
    };
  } catch (err) {
    const durationMs = Math.round(performance.now() - start);

    // 3. Log errors with stack trace
    callLogger.error({
      event: 'tool_call_error',
      durationMs,
      errorCode: err.code,
      errorMessage: err.message,
      stack: err.stack,
    }, 'tool call error');

    return {
      content: [{ type: 'text', text: `Error: ${err.message}` }],
      isError: true,
    };
  }
});

The key discipline is logging argument shape (field names, types, presence) rather than full argument values. Query strings are truncated at 100 characters. Optional fields are logged as [present] or [absent]. This gives you enough context to diagnose problems without logging potentially sensitive user data.

What NOT to log

Tool results frequently contain user data that should not appear in logs. Documents, emails, calendar entries, database rows, API responses — all can contain PII that must not land in your log aggregator. The rule: log result metadata, not result content.

Log this	Not this
`resultCount: 5`	The 5 document objects returned
`resultSizeBytes: 4200`	The document text content
`userId: '[present]'`	The actual user ID value
`query: args.query.slice(0, 50)`	The full query (may contain PII)
`errorMessage: err.message`	The full error object (may embed user data)

For debugging a specific production issue where you need full argument values, use a short-lived LOG_LEVEL=trace environment variable with a time-limited deployment — but never leave trace-level logging on permanently in production.

Correlation IDs across multi-tool agent workflows

When an AI agent executes a workflow that calls multiple tools in sequence (search, then fetch, then summarize), each tool call arrives as a separate JSON-RPC request with no shared context. Without a correlation ID, logs from a single agent workflow are scattered across separate log lines with no way to group them.

The MCP SDK passes a meta.progressToken in the extra parameter of tool handlers when the client sets one — use this as a correlation ID. When the client doesn't set a progress token, generate a UUID for the connection and thread it through all tool calls from that session:

import { AsyncLocalStorage } from 'node:async_hooks';

const requestContext = new AsyncLocalStorage();

// Middleware-style wrapper (call this at the start of each tool handler)
function withContext(fn) {
  return async (args, extra) => {
    const correlationId = extra?.meta?.progressToken
      ?? extra?.meta?.requestId
      ?? crypto.randomUUID();

    return requestContext.run({ correlationId }, () => fn(args, extra));
  };
}

// In handlers, get the current correlation ID from AsyncLocalStorage
function getLogger(toolName) {
  const ctx = requestContext.getStore() ?? {};
  return logger.child({ ...ctx, tool: toolName });
}

server.tool('fetch_document', fetchSchema, withContext(async (args) => {
  const log = getLogger('fetch_document');
  log.info({ event: 'tool_call_start', docId: args.id }, 'tool call start');
  // ...
}));

With this pattern, all log lines from a single agent workflow share the same correlationId. You can filter your log aggregator to correlationId = "abc-123" and see every tool call in sequence, with durations, result counts, and errors — a complete trace of what the agent did.

Log sampling for high-volume servers

If your MCP server handles thousands of tool calls per minute (common for document search or database query tools used by automated agents), logging every call at info level floods your log aggregator and drives up ingestion costs. Use pino's built-in sampling or a manual approach to log only a fraction of successful calls:

// Log every call start at debug level (filtered out in production)
// Log 1% of successful calls at info level (random sample)
// Log ALL errors at error level (never sample errors)

const SAMPLE_RATE = parseFloat(process.env.LOG_SAMPLE_RATE ?? '0.01');

callLogger.debug({ event: 'tool_call_start', ... }, 'tool call start');

if (Math.random() < SAMPLE_RATE) {
  callLogger.info({ event: 'tool_call_success', durationMs, resultCount }, 'sampled call');
}

if (err) {
  callLogger.error({ event: 'tool_call_error', ... }, 'tool call error');
}

A 1% sample rate at 10,000 calls/minute produces 100 logged successes/minute — enough to detect latency regressions via p99 percentile monitoring — while reducing log volume by 99x. Always sample errors at 100% regardless of volume.

Recognizing the AliveMCP probe in logs

AliveMCP monitors your server with periodic protocol-level probes: initialize → tools/list → sentinel tool call. If you've set up debug logging, you'll see this probe in your logs every few minutes. Recognizing it prevents false-positive alerts and lets you filter it out of your analytics:

// AliveMCP probe signature in logs (pino NDJSON format):
{"level":30,"tool":"tools/list","event":"tool_call_start","args":{},"msg":"tool call start"}
{"level":30,"tool":"tools/list","event":"tool_call_success","durationMs":2,"resultCount":8,"msg":"tool call success"}
{"level":30,"tool":"alivemcp_sentinel","event":"tool_call_start","args":{"nonce":"[present]"},"msg":"tool call start"}
{"level":30,"tool":"alivemcp_sentinel","event":"tool_call_success","durationMs":1,"msg":"tool call success"}

The AliveMCP probe uses the alivemcp_sentinel tool name for its verification call. Filter these out of your dashboards with tool != "alivemcp_sentinel" to keep your analytics clean. In your log aggregator, tag them as synthetic traffic rather than ignoring them entirely — the probe's duration metrics are useful for detecting latency in your handler initialization code.

Using winston instead of pino

If your project already uses winston (common in Express-based servers converted to MCP), configure it to write JSON to stderr:

import winston from 'winston';

export const logger = winston.createLogger({
  level: process.env.LOG_LEVEL ?? 'info',
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.json(),
  ),
  transports: [
    new winston.transports.Stream({
      stream: process.stderr,  // explicit stderr
    }),
  ],
});

// In development, add a console transport with pretty formatting
if (process.env.NODE_ENV === 'development') {
  logger.add(new winston.transports.Console({
    format: winston.format.combine(
      winston.format.colorize(),
      winston.format.simple(),
    ),
  }));
}

Winston is slower than pino (JSON serialization is not as optimized) but has a larger plugin ecosystem. For new MCP server projects, pino is the default recommendation. For projects where winston is already a dependency, use the configuration above rather than introducing a second logging library.

FAQ

Can I use console.log for debug logging in an MCP server?

Only if your server uses HTTP transport (SSE or Streamable HTTP). In stdio-mode MCP servers, console.log writes to stdout, which is the JSON-RPC wire — any output corrupts the protocol stream and causes the client to fail with a parse error. Use console.error instead, which writes to stderr. Better yet, use pino or winston configured to write to stderr, so the same code works correctly in both transport modes without changes.

How do I view MCP server logs in Claude Desktop?

Claude Desktop captures stderr from MCP server processes and writes it to a log file. On macOS, the log file is at ~/Library/Logs/Claude/mcp-server-<name>.log. On Windows, it's at %APPDATA%\Claude\logs\mcp-server-<name>.log. Tail this file while developing: tail -f ~/Library/Logs/Claude/mcp-server-myserver.log. You can also open the Claude Desktop Developer Tools (Help → Developer Tools on macOS) to see live MCP protocol logs and server stderr output.

Should I log tool arguments in production?

Log argument shape (field names, types, truncated lengths) but not full argument values in production. Tool arguments from an AI model often contain user-provided text that may include PII — emails, names, document content. Log enough to diagnose a problem (which tool was called, whether optional parameters were set, string length) without logging the actual values. For temporary deep debugging, enable LOG_LEVEL=trace for a short window with appropriate access controls.

How do I correlate logs across multiple tool calls from the same agent workflow?

Use the meta.progressToken from the MCP request's extra parameter as a correlation ID — Claude and other clients set this when making tool calls in a workflow context. If the client doesn't set a progress token, generate a UUID per session using crypto.randomUUID() and thread it through tool calls via AsyncLocalStorage. Store the correlation ID in a pino child logger or winston's defaultMeta so it appears on every log line without explicit passing.

How do I filter AliveMCP probe traffic out of my tool call analytics?

The AliveMCP probe sends a tools/list request followed by a call to the alivemcp_sentinel tool (registered automatically when you use the AliveMCP SDK). In your log aggregator or analytics dashboard, filter out entries where tool == "alivemcp_sentinel". Alternatively, tag them as synthetic traffic: in your tool handler, check if args.source === 'alivemcp' and add a synthetic: true field to the log entry. Keep the probe logs — don't discard them — because their duration metrics reveal handler initialization latency.