MCP server middleware

An MCP server built on Express needs a middleware stack between the raw HTTP request and the MCP session layer. That stack handles cross-cutting concerns (correlation IDs, structured logging, authentication, rate limiting) that belong outside the MCP protocol itself. Middleware ordering matters more in MCP servers than in conventional REST APIs because requests are not independent: one session is an initialize handshake followed by many tool calls, all tied to the same server-side session state. Middleware that rejects too late (after transport.handleRequest starts) leaves the session in a half-open state; middleware that runs too early (before the body parser) can't read the JSON-RPC body it needs.

TL;DR

Apply middleware in this order on the /mcp route: (1) correlation ID injection via AsyncLocalStorage, (2) structured request logger, (3) authentication guard that returns HTTP 401 before transport.handleRequest, (4) rate limiter that returns HTTP 429 before transport.handleRequest, (5) the MCP transport handler. Use AsyncLocalStorage to propagate session_id and request_id through tool handlers without threading context manually. AliveMCP's probe hits your /mcp endpoint — middleware that incorrectly rejects the probe's unauthenticated initialize request will show up as downtime, so structure auth middleware to allow probe credentials or the specific public-initialization pattern your server supports.

Correlation ID middleware with AsyncLocalStorage

MCP sessions are long-lived: one session spans many tool calls, and each tool call may spawn database queries, outbound HTTP calls, and log lines. Without a correlation ID threaded through every operation, debugging a production incident means reconstructing a timeline from unrelated log entries. AsyncLocalStorage from node:async_hooks lets you store the request_id and session_id once, when a request enters the middleware stack, and read them anywhere downstream without passing them as function arguments:

// context.ts: shared AsyncLocalStorage store
import { AsyncLocalStorage } from 'node:async_hooks';

export interface RequestContext {
  requestId: string;
  sessionId: string | null;
}

export const contextStore = new AsyncLocalStorage<RequestContext>();

// context-middleware.ts
import { randomUUID } from 'node:crypto'; // needed here, where the fallback ID is generated
import { Request, Response, NextFunction } from 'express';
import { contextStore, RequestContext } from './context.js';

export function correlationMiddleware(req: Request, res: Response, next: NextFunction) {
  const requestId = (req.headers['x-request-id'] as string) ?? randomUUID();
  const sessionId = (req.headers['mcp-session-id'] as string) ?? null;

  const ctx: RequestContext = { requestId, sessionId };
  res.setHeader('x-request-id', requestId);

  contextStore.run(ctx, next);
}

Register the middleware first on the /mcp route. Tool handlers and any modules they call can then read contextStore.getStore() to attach request_id and session_id to every log line or outbound request header — without receiving them as parameters:

// logger.ts — reads context automatically
import { contextStore } from './context.js';

export function log(event: string, fields: Record<string, unknown> = {}) {
  const ctx = contextStore.getStore();
  process.stdout.write(JSON.stringify({
    ts: new Date().toISOString(),
    event,
    request_id: ctx?.requestId,
    session_id: ctx?.sessionId,
    ...fields,
  }) + '\n');
}

This pattern means every log line emitted during a tool call — including lines from helper modules that know nothing about MCP — carries the same correlation IDs. When AliveMCP alerts you to a latency spike, you can filter your log stream by session_id and see exactly which tool call caused it.
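To see the propagation in isolation, here is a minimal sketch that runs two overlapping simulated requests through a helper that never receives an ID as an argument. The names simulateRequest, handleToolCall, and queryDatabase are hypothetical stand-ins, and the store is redeclared locally so the snippet runs on its own.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

interface RequestContext {
  requestId: string;
  sessionId: string | null;
}

// Local stand-in for the contextStore exported from context.ts
const store = new AsyncLocalStorage<RequestContext>();

// Hypothetical helper: knows nothing about MCP or the request that triggered it
async function queryDatabase(): Promise<string | undefined> {
  await new Promise((resolve) => setTimeout(resolve, 5)); // simulate I/O
  return store.getStore()?.requestId;
}

// Hypothetical tool handler: calls the helper without passing any IDs
async function handleToolCall(): Promise<string | undefined> {
  return queryDatabase();
}

// Run one simulated request inside its own context
function simulateRequest(requestId: string): Promise<string | undefined> {
  return store.run({ requestId, sessionId: 'session-1' }, handleToolCall);
}

// Two overlapping requests: each helper call resolves to its own request's ID
const overlapping = Promise.all([
  simulateRequest('req-a'),
  simulateRequest('req-b'),
]);
```

Each promise resolves to the ID of the request that started it, even though both are in flight at the same time and the helper awaits a timer in between. Outside store.run, getStore() returns undefined, which is why the logger uses optional chaining.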

Structured request logging middleware

Log each incoming request to /mcp as a single JSON line with method, path, session ID, status code, and duration. A generic HTTP logger like Morgan is a poor fit here: its stock formats emit unstructured text, and it does not read the correlation context you injected earlier:

// request-log-middleware.ts
import { Request, Response, NextFunction } from 'express';
import { log } from './logger.js';

export function requestLogMiddleware(req: Request, res: Response, next: NextFunction) {
  const start = Date.now();

  res.on('finish', () => {
    log('http_request', {
      method: req.method,
      path: req.path,
      status: res.statusCode,
      duration_ms: Date.now() - start,
      content_length: res.getHeader('content-length'),
    });
  });

  next();
}

For SSE responses (the streaming part of StreamableHTTP), the response's finish event fires when the stream ends, potentially minutes after the session started. This is expected. The duration_ms on SSE responses measures how long the stream stayed open, which is a useful separate metric from individual tool-call latency. Split your alerting: alert on http_request entries for the initial initialize handshake (these should complete in milliseconds), and separately on the per-tool structured logs from inside your tool handlers.
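One caveat for long-lived streams: when the client drops the connection before the response ends, Node emits close on the response without emitting finish, so a finish-only logger records nothing for that stream. The sketch below is one way to cover both cases; onResponseDone and the outcome labels are hypothetical names, and the response is typed structurally so the snippet runs without Express.

```typescript
import { EventEmitter } from 'node:events';

type Outcome = 'finished' | 'aborted';

// Structural stand-in for http.ServerResponse: only the events used here
interface ResponseLike {
  once(event: 'finish' | 'close', listener: () => void): unknown;
}

// Report exactly once per response, whichever terminal event arrives first
function onResponseDone(
  res: ResponseLike,
  report: (outcome: Outcome, durationMs: number) => void,
): void {
  const start = Date.now();
  let reported = false;

  const done = (outcome: Outcome) => {
    if (reported) return; // 'close' also follows 'finish' on a normal response
    reported = true;
    report(outcome, Date.now() - start);
  };

  res.once('finish', () => done('finished'));
  res.once('close', () => done('aborted'));
}

// Demo with a plain EventEmitter standing in for a dropped SSE response
const fakeResponse = new EventEmitter();
onResponseDone(fakeResponse, (outcome) => console.log(outcome));
fakeResponse.emit('close'); // logs "aborted"
```

Inside requestLogMiddleware, the report callback would call log('http_request', ...) with the outcome as an extra field, which lets you separate clean stream shutdowns from client disconnects.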

Authentication and rate limiting middleware ordering

Authentication middleware must run before transport.handleRequest. A 401 returned after the MCP transport has started processing means the transport has already allocated session state — the session is half-open. Reject unauthenticated requests at the HTTP layer, before the transport sees them:

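// server.ts: correct middleware ordering for /mcp
import { randomUUID } from 'node:crypto';
import express from 'express';
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StreamableHTTPServerTransport } from '@modelcontextprotocol/sdk/server/streamableHttp.js';
import { correlationMiddleware } from './context-middleware.js';
import { requestLogMiddleware } from './request-log-middleware.js';
import { authMiddleware } from './auth-middleware.js';
import { rateLimitMiddleware } from './rate-limit-middleware.js';

const app = express();
app.use(express.json());

// One transport per live session, keyed by the mcp-session-id value
const transports = new Map<string, StreamableHTTPServerTransport>();

// /mcp route with ordered middleware stack
app.post('/mcp',
  correlationMiddleware,     // 1. inject correlation IDs first
  requestLogMiddleware,      // 2. start timing for the finish event
  authMiddleware,            // 3. 401 before transport sees the request
  rateLimitMiddleware,       // 4. 429 before transport sees the request
  async (req, res) => {      // 5. MCP transport handler
    const sessionId = req.headers['mcp-session-id'] as string | undefined;
    let transport = sessionId ? transports.get(sessionId) : undefined;
    if (!transport) {
      // No known session: create a server/transport pair for the initialize request
      const server = new McpServer({ name: 'my-server', version: '1.0.0' });
      // register tools here ...
      const created = new StreamableHTTPServerTransport({
        sessionIdGenerator: () => randomUUID(),
        onsessioninitialized: (id) => { transports.set(id, created); },
      });
      created.onclose = () => { if (created.sessionId) transports.delete(created.sessionId); };
      await server.connect(created);
      transport = created;
    }
    // express.json() already consumed the stream, so pass the parsed body explicitly
    await transport.handleRequest(req, res, req.body);
  }
);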

// GET /mcp for SSE (StreamableHTTP session resumption)
app.get('/mcp', correlationMiddleware, requestLogMiddleware, authMiddleware, async (req, res) => {
  // handle SSE session continuation
});
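The rate limiter is imported in the server file above but its body is not shown. A minimal sketch follows, assuming an in-memory fixed-window counter keyed by client IP; the limit, the window size, and the names createRateLimiter and createRateLimitMiddleware are illustrative, and a multi-instance deployment would need a shared store instead of a Map. The request and response are typed structurally so the snippet runs without Express.

```typescript
interface WindowState {
  windowStart: number;
  count: number;
}

interface LimitResult {
  allowed: boolean;
  retryAfterSec: number;
}

// Fixed-window counter; the clock is injectable so the logic can be tested
function createRateLimiter(limit: number, windowMs: number, now: () => number = Date.now) {
  const windows = new Map<string, WindowState>();

  return function check(key: string): LimitResult {
    const t = now();
    const state = windows.get(key);
    if (!state || t - state.windowStart >= windowMs) {
      windows.set(key, { windowStart: t, count: 1 }); // start a fresh window
      return { allowed: true, retryAfterSec: 0 };
    }
    if (state.count < limit) {
      state.count += 1;
      return { allowed: true, retryAfterSec: 0 };
    }
    const retryAfterSec = Math.ceil((state.windowStart + windowMs - t) / 1000);
    return { allowed: false, retryAfterSec };
  };
}

// Structural stand-ins for the parts of Express's Request and Response used here
interface LimiterRequest { ip?: string }
interface LimiterResponse {
  status(code: number): LimiterResponse;
  set(name: string, value: string): LimiterResponse;
  json(body: unknown): unknown;
}

// Responds with 429 and never calls next(), so the transport does not see the request
function createRateLimitMiddleware(check: (key: string) => LimitResult) {
  return (req: LimiterRequest, res: LimiterResponse, next: () => void): void => {
    const result = check(req.ip ?? 'unknown');
    if (!result.allowed) {
      res.status(429).set('Retry-After', String(result.retryAfterSec)).json({ error: 'rate_limited' });
      return;
    }
    next();
  };
}
```

Because the limiter runs after authentication in the stack, a real implementation could key on the authenticated client identity rather than the IP address, which behaves better behind shared proxies.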

One edge case: the AliveMCP probe sends an initialize request. If your auth middleware rejects all requests without a credential, configure a dedicated read-only probe API key and include it in the probe's Authorization header. Alternatively, allow unauthenticated initialize requests (which expose only the server's name and capabilities, not any tool data) and gate tool execution inside the tool handlers instead — the tradeoff is a slightly larger unauthenticated attack surface vs. easier monitoring. See the authentication guide for both patterns.
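Both options can be expressed as one decision function that the auth middleware calls before anything reaches the transport. This is a sketch under stated assumptions, not the authentication guide's implementation: decideAuth, AuthOptions, and the static token list are hypothetical, and a real server would validate tokens against its own identity provider.

```typescript
import { timingSafeEqual } from 'node:crypto';

type AuthDecision = 'allow' | 'deny';

interface AuthOptions {
  validTokens: string[];          // hypothetical: may include a dedicated probe key
  allowPublicInitialize: boolean; // true = let unauthenticated initialize through
}

// Constant-time comparison so token checks do not leak timing information
function safeEqual(a: string, b: string): boolean {
  const left = Buffer.from(a);
  const right = Buffer.from(b);
  return left.length === right.length && timingSafeEqual(left, right);
}

function decideAuth(
  authorization: string | undefined,
  body: unknown,
  opts: AuthOptions,
): AuthDecision {
  const token = authorization?.startsWith('Bearer ')
    ? authorization.slice('Bearer '.length)
    : undefined;
  if (token !== undefined && opts.validTokens.some((valid) => safeEqual(valid, token))) {
    return 'allow';
  }

  // Only a single JSON-RPC message whose method is exactly "initialize" qualifies;
  // arrays (batches) and every other method still require a credential
  const isInitialize =
    typeof body === 'object' &&
    body !== null &&
    !Array.isArray(body) &&
    (body as { method?: unknown }).method === 'initialize';

  return opts.allowPublicInitialize && isInitialize ? 'allow' : 'deny';
}
```

The middleware itself would call decideAuth(req.headers.authorization, req.body, opts) and respond with 401 on 'deny'. With allowPublicInitialize enabled, tool handlers still need their own credential check, since this function only decides whether the handshake may proceed.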

Middleware reuse across routes

A production MCP server usually has more than one route: /mcp for the MCP transport, /healthz for Kubernetes readiness probes, /metrics for Prometheus scraping, /webhook for inbound events. Avoid applying the full middleware stack to all routes — /healthz does not need auth or rate limiting, and applying them risks a failed health check pulling the pod from rotation during a credential rotation. Register middleware per route:

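// Health check: no auth, no correlation overhead
app.get('/healthz', (req, res) => {
  // A readiness probe reacts to the status code, not the body, so return 503 while draining
  res.status(isShuttingDown ? 503 : 200).json({ status: isShuttingDown ? 'shutting_down' : 'ok' });
});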

// Metrics endpoint — lightweight, IP-allowlist only
app.get('/metrics', ipAllowlistMiddleware, (req, res) => {
  res.set('Content-Type', 'text/plain');
  res.send(collectMetrics());
});

// Webhook endpoint — auth via HMAC, not Bearer token
app.post('/webhook', express.raw({ type: 'application/json' }), webhookSignatureMiddleware, webhookHandler);

// MCP transport — full stack
app.post('/mcp', correlationMiddleware, requestLogMiddleware, authMiddleware, rateLimitMiddleware, mcpHandler);
app.get('/mcp', correlationMiddleware, requestLogMiddleware, authMiddleware, mcpSseHandler);

This explicit per-route registration also makes the middleware stack auditable: a security reviewer reading server.ts can see exactly which middleware applies to the public-facing MCP endpoint without tracing app.use call order through multiple files.

Related questions

Does middleware run once per session or once per request?

Middleware runs once per HTTP request. In StreamableHTTP, the initialize handshake is one POST request, and each subsequent tool call is a separate POST request. This means auth middleware re-validates credentials on every request — which is correct behaviour, not overhead. If token validation is expensive (e.g., remote JWKS fetch), cache the JWKS at module scope with automatic rotation using createRemoteJWKSet from the jose library, not by caching the validated token itself.

Can I use Hono or Fastify instead of Express for the middleware layer?

Yes. The @modelcontextprotocol/sdk StreamableHTTPServerTransport accepts any IncomingMessage-compatible request and ServerResponse-compatible response, so it works with the raw Node.js http module. Fastify exposes the underlying Node.js objects as request.raw and reply.raw, and Hono's Node.js adapter gives access to them as well. The middleware patterns in this guide (AsyncLocalStorage, per-route registration, ordering rules) apply regardless of framework. Hono's built-in bearer-auth and logger middleware can replace the custom implementations shown above; rate limiting is not built into Hono and needs a third-party package such as hono-rate-limiter.

How do I propagate the correlation ID to outbound fetch calls inside tool handlers?

Read the request_id from contextStore.getStore() inside the tool handler and include it in the X-Request-Id header of outbound requests. If every service in your infrastructure propagates this header, a single request_id traces a tool call end-to-end across multiple services. This is the core idea behind distributed tracing: manual header propagation does not give you spans or per-hop timing, but it lets you join logs across services without adopting a tracing library.
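A small wrapper keeps that propagation out of individual tool handlers. The sketch below assumes Node 18 or later for the global fetch and Headers; withCorrelationHeaders and tracedFetch are hypothetical names, and the store is redeclared locally in place of the import from context.js so the snippet runs on its own.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

interface RequestContext {
  requestId: string;
  sessionId: string | null;
}

// Local stand-in for the contextStore exported from context.ts
const contextStore = new AsyncLocalStorage<RequestContext>();

// Copy the caller's options and add x-request-id from the current context, if any
function withCorrelationHeaders(init: RequestInit = {}): RequestInit {
  const headers = new Headers(init.headers);
  const ctx = contextStore.getStore();
  if (ctx && !headers.has('x-request-id')) {
    headers.set('x-request-id', ctx.requestId); // do not overwrite an explicit value
  }
  return { ...init, headers };
}

// Drop-in replacement for fetch inside tool handlers
function tracedFetch(url: string | URL, init?: RequestInit): Promise<Response> {
  return fetch(url, withCorrelationHeaders(init));
}
```

Tool handlers then call tracedFetch instead of fetch. Outside a request context the wrapper adds nothing, so it is safe to use from startup code and background jobs as well.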

Should error-handling middleware come before or after the MCP transport handler?

Express error-handling middleware (functions with four parameters: err, req, res, next) must be registered after all route handlers to catch unhandled errors. However, errors thrown by tool handlers during transport.handleRequest are caught by the SDK and returned as JSON-RPC error objects; they do not propagate to Express error middleware. Register Express error middleware as a fallback for errors from your own middleware (e.g., a database failure in auth middleware that you did not catch). Note that Express 4 does not forward a rejected promise from an async middleware to the error handler: call next(err) yourself, or use Express 5, which forwards rejections automatically. The SDK's internal error handling and your Express error middleware operate in different layers.
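A sketch of that fallback layer, written against structural types so it runs without Express installed. The names forwardAsyncErrors and fallbackErrorHandler are hypothetical; the wrapper matters on Express 4, where a rejected promise from an async middleware is not passed to error middleware automatically.

```typescript
type Next = (err?: unknown) => void;
type Handler<Req, Res> = (req: Req, res: Res, next: Next) => unknown;

// Wrap a (possibly async) middleware so thrown errors and rejections reach next(err)
function forwardAsyncErrors<Req, Res>(handler: Handler<Req, Res>): Handler<Req, Res> {
  return (req, res, next) => {
    Promise.resolve()
      .then(() => handler(req, res, next))
      .catch(next);
  };
}

// Structural stand-in for the parts of Express's Response used below
interface ErrorResponse {
  headersSent: boolean;
  status(code: number): ErrorResponse;
  json(body: unknown): unknown;
}

// The four-parameter signature is what marks this as error middleware for Express
function fallbackErrorHandler(
  err: unknown,
  _req: unknown,
  res: ErrorResponse,
  next: Next,
): void {
  if (res.headersSent) {
    next(err); // response already started (e.g. an SSE stream): let Express close it
    return;
  }
  res.status(500).json({ error: 'internal_error' });
}
```

In server.ts this would look like app.post('/mcp', correlationMiddleware, requestLogMiddleware, forwardAsyncErrors(authMiddleware), ...) followed by app.use(fallbackErrorHandler) after every route. The generic 500 body deliberately omits the error message; log the details through the structured logger instead.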