
MCP server bulkhead pattern

The bulkhead pattern takes its name from the watertight compartments in a ship's hull: if one compartment floods, the others stay dry. Applied to MCP servers, a bulkhead ensures that a slow or failing external dependency can exhaust only the capacity allocated to it, leaving the rest of the server's tool calls unaffected. Without bulkheads, a single slow dependency can tie up every shared connection, so tool calls that have nothing to do with that dependency end up waiting behind it.

TL;DR

Isolate each external dependency with its own connection pool or concurrency semaphore. A dedicated pool of 10 database connections means a database slowdown can tie up at most those 10 connections: tool calls that need the database wait, but the rest of the server keeps working. Pair bulkheads with circuit breakers: the breaker cuts off a failing dependency, and the bulkhead limits the blast radius while the breaker is still deciding whether to open. Use the Deps pattern so each dependency's pool is created once in createDeps() and injected, never created inside tool handlers.

The failure cascade without bulkheads

Consider an MCP server with three tools: search (calls an external search API), notify (calls a notification API), and query_db (reads from the database). Without bulkheads:

1. The search API becomes slow: each call takes 15 seconds instead of 200ms.
2. 50 concurrent sessions call search. Each holds a Node.js async context waiting for the HTTP response.
3. The event loop is not blocked (Node.js handles I/O asynchronously), but memory grows: 50 × (params + headers + response buffer) accumulates.
4. If every outbound HTTP call goes through one shared agent with a total cap (e.g., an http.Agent with maxTotalSockets: 50 passed to a shared axios instance), all 50 sockets end up busy with search calls. When a notify call needs a socket, it queues behind the search calls and waits until one of them finishes, up to 15 seconds. (maxSockets alone is a per-origin limit; the total cap is what lets one host starve calls to another.)
5. The database is healthy, and query_db calls still succeed immediately because the database driver has its own pool. But notify and every other HTTP-backed tool are stuck behind search, so users perceive the whole server as slow.

With bulkheads: the search API gets its own pool of 10 connections. A search slowdown can tie up at most those 10 sockets; extra search calls wait in the search agent's own queue. The notification API and database have their own pools, so their capacity stays fully available regardless of search API state.
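The cascade comes down to slot accounting. The sketch below models each pool as a bare counter of slots (a hypothetical SlotPool class, not a real HTTP agent; no network I/O happens) and checks whether a notify call can still get a slot after 50 stuck search calls, first with one shared 50-slot pool and then with separate 10-slot and 5-slot pools.

```typescript
// Hypothetical stand-in for a connection pool: a fixed number of slots, no real I/O.
class SlotPool {
  private inUse = 0;

  constructor(private readonly size: number) {}

  // Take a slot if one is free. There is no release in this model,
  // which mimics calls stuck on a slow dependency.
  tryAcquire(): boolean {
    if (this.inUse >= this.size) return false;
    this.inUse++;
    return true;
  }
}

// Simulate `stuckSearchCalls` slow search calls, then ask whether notify can still get a slot.
function notifyGetsSlot(search: SlotPool, notify: SlotPool, stuckSearchCalls: number): boolean {
  for (let i = 0; i < stuckSearchCalls; i++) {
    search.tryAcquire(); // once the search pool is exhausted, extra search calls get nothing
  }
  return notify.tryAcquire();
}

// Without bulkheads: search and notify share one 50-slot pool.
const shared = new SlotPool(50);
const notifyOkShared = notifyGetsSlot(shared, shared, 50);

// With bulkheads: search is capped at 10 slots, notify keeps its own 5.
const notifyOkIsolated = notifyGetsSlot(new SlotPool(10), new SlotPool(5), 50);

console.log({ notifyOkShared, notifyOkIsolated }); // { notifyOkShared: false, notifyOkIsolated: true }
```

The only difference between the two runs is whether notify draws from the same counter as search; the number of stuck search calls is identical.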

Per-dependency connection pools

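Node.js HTTP clients pool connections per agent. If every dependency goes through one shared agent, or through no explicit agent (which falls back to the process-wide https.globalAgent, or to undici's global dispatcher for the built-in fetch), all dependencies draw from the same pool. Create a separate agent per external dependency: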

// deps.ts: one connection pool per dependency, each acting as a bulkhead
import https from 'https';
import { Agent as UndiciAgent } from 'undici';
import { Pool } from 'pg';
import type { Redis } from 'ioredis';
// App-specific helpers assumed to exist elsewhere (module paths are illustrative)
import { parseConfig, type Config } from './config';
import { createRedis } from './redis';

export interface Deps {
  searchAgent: UndiciAgent;       // search uses fetch, which takes an undici dispatcher
  notificationAgent: https.Agent; // notifications use got/axios, which take an https.Agent
  db: Pool;
  cache: Redis;
  config: Config;
}

export async function createDeps(): Promise<Deps> {
  const config = parseConfig();

  // Each dependency gets its own pool, so exhausting one leaves the others untouched
  const searchAgent = new UndiciAgent({
    connections: 10,          // max 10 concurrent connections to the search API
    keepAliveTimeout: 10_000, // keep idle connections warm for up to 10 s
    headersTimeout: 6_000,    // fail if response headers take longer than 6 s
  });

  const notificationAgent = new https.Agent({
    maxSockets: 5,         // notification API has a lower concurrency limit (per origin)
    maxFreeSockets: 1,     // keep 1 idle connection warm
    timeout: 5000,         // socket inactivity timeout
    keepAlive: true,
  });

  // Database pool is its own bulkhead
  const db = new Pool({
    connectionString: config.DATABASE_URL,
    max: 20,               // max 20 concurrent database connections
    idleTimeoutMillis: 30_000,
    connectionTimeoutMillis: 5_000,
  });

  await db.query('SELECT 1'); // fail fast if the database is unreachable

  return { searchAgent, notificationAgent, db, cache: await createRedis(config), config };
}

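Pass each agent to the HTTP client that calls its dependency: got accepts it as agent: { https: notificationAgent }, and axios as httpsAgent: notificationAgent. The built-in fetch in Node.js 18+ is implemented by undici and ignores http.Agent entirely; give it an undici Agent through the dispatcher option instead: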

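// search-client.ts: route search API calls through the search bulkhead's undici Agent
import { fetch, Agent } from 'undici';

// Assumed response shape (hypothetical); adjust to the real search API
interface SearchResult { id: string; title: string; url: string; }

export async function callSearchApi(query: string, agent: Agent): Promise<SearchResult[]> {
  const res = await fetch(`https://search.internal/v2/search?q=${encodeURIComponent(query)}`, {
    dispatcher: agent,                 // use the search API's own connection pool
    signal: AbortSignal.timeout(5000), // abort the request after 5 s
  });
  if (!res.ok) throw new Error(`Search API ${res.status}`);
  return (await res.json()) as SearchResult[];
}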
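Whichever client you use, the agent's connection limit is what does the isolating. To see that limit act as a bulkhead, the sketch below starts a local HTTP server that holds each request for a while and records how many requests it is serving at once, then fires a batch of requests through a given agent. It uses node:http and a local server purely for demonstration (https.Agent takes the same maxSockets option), and the function name measurePeakConcurrency is illustrative.

```typescript
import * as http from 'node:http';
import type { AddressInfo } from 'node:net';

// Send `requests` requests at once through `agent` to a local server that holds
// each one for `delayMs`, and return the peak number the server saw concurrently.
async function measurePeakConcurrency(agent: http.Agent, requests: number, delayMs: number): Promise<number> {
  let active = 0;
  let peak = 0;
  const server = http.createServer((_req, res) => {
    active++;
    peak = Math.max(peak, active);
    setTimeout(() => {
      active--;
      res.end('ok');
    }, delayMs);
  });
  await new Promise<void>(resolve => server.listen(0, '127.0.0.1', () => resolve()));
  const { port } = server.address() as AddressInfo;

  const get = () =>
    new Promise<void>((resolve, reject) => {
      http
        .get({ host: '127.0.0.1', port, path: '/', agent }, res => {
          res.resume(); // drain the body so the socket is released
          res.on('end', () => resolve());
        })
        .on('error', reject);
    });
  await Promise.all(Array.from({ length: requests }, get));

  agent.destroy();
  await new Promise<void>(resolve => server.close(() => resolve()));
  return peak;
}
```

Called with new http.Agent({ maxSockets: 3 }) and 10 requests, the server should never see more than 3 at once; the other 7 wait in that agent's queue. With a default new http.Agent(), whose maxSockets is Infinity, all 10 arrive together.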

Semaphore-based bulkheads

Not all dependencies use HTTP connection pooling. For in-process concurrency limits, such as capping how many tool calls can use an expensive or rate-limited resource at once (an LLM API, or a worker-thread pool for CPU-heavy work), a semaphore is the right tool. The Bulkhead class below is a counting semaphore with a bounded wait queue, so it limits concurrent access regardless of the underlying transport:

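// semaphore.ts: concurrency bulkhead (counting semaphore with a bounded wait queue)
export class Bulkhead {
  private running = 0;                    // slots held, including slots handed to a waiter
  private queue: Array<() => void> = [];  // resolvers for callers waiting for a slot

  constructor(
    private readonly maxConcurrent: number,
    private readonly maxQueue: number = 50,
  ) {}

  async execute<T>(fn: () => Promise<T>): Promise<T> {
    if (this.running < this.maxConcurrent) {
      this.running++;                     // a slot is free: take it
    } else {
      if (this.queue.length >= this.maxQueue) {
        throw new Error(
          `bulkhead full: ${this.maxConcurrent} concurrent + ${this.maxQueue} queued`
        );
      }
      // Wait for a finishing call to hand over its slot. The slot is transferred
      // directly, so a newcomer cannot grab it between release and wake-up.
      await new Promise<void>(resolve => this.queue.push(resolve));
    }
    try {
      return await fn();
    } finally {
      const next = this.queue.shift();
      if (next) next();                   // hand the slot to the oldest waiter; running is unchanged
      else this.running--;                // nobody waiting: free the slot
    }
  }

  get stats() {
    return { running: this.running, queued: this.queue.length };
  }
}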

// deps.ts — bulkheads per dependency
interface Deps {
  // ...
  searchBulkhead: Bulkhead;
  llmBulkhead: Bulkhead;
}

export async function createDeps(): Promise<Deps> {
  // ...
  return {
    // ...
    searchBulkhead: new Bulkhead(10, 20),  // 10 concurrent search calls, 20 queued
    llmBulkhead: new Bulkhead(5, 10),      // LLM is expensive — lower limit
  };
}

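// In tool handler: run the call inside the bulkhead and return an MCP tool result
server.tool('search', searchSchema, async (params) => {
  try {
    const results = await deps.searchBulkhead.execute(() =>
      callSearchApi(params.query, deps.searchAgent));
    return { content: [{ type: 'text', text: JSON.stringify(results) }] };
  } catch (err) {
    // Bulkhead full or search failed: report it to the model instead of throwing
    return { isError: true, content: [{ type: 'text', text: `search failed: ${(err as Error).message}` }] };
  }
});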

When the bulkhead is full (running + queued at limit), throw immediately rather than queueing indefinitely. Return isError: true from the tool with a clear message so the LLM can retry or inform the user, rather than holding the session in a queue of unknown length.
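One way to give the model that clear message is a small helper that maps a rejection into an MCP tool result and words saturation differently from other failures. The sketch assumes the bulkhead's error message starts with "bulkhead full", as in the Bulkhead class above; the helper name toToolError and its wording are hypothetical, and ToolResult is a minimal local type standing in for the SDK's CallToolResult.

```typescript
// Minimal local stand-in for the MCP SDK's CallToolResult (text content only).
interface ToolResult {
  content: Array<{ type: 'text'; text: string }>;
  isError?: boolean;
}

// Hypothetical helper: map a failed tool call to an isError result the model can act on.
function toToolError(tool: string, err: unknown): ToolResult {
  const message = err instanceof Error ? err.message : String(err);
  // Assumes Bulkhead.execute() rejects with a message starting "bulkhead full"
  const saturated = message.startsWith('bulkhead full');
  const text = saturated
    ? `${tool} is at capacity; retry shortly or tell the user the service is busy.`
    : `${tool} failed: ${message}`;
  return { isError: true, content: [{ type: 'text', text }] };
}

// Usage: a full bulkhead becomes a clearly worded, retryable error.
console.log(toToolError('search', new Error('bulkhead full: 10 concurrent + 20 queued')).content[0].text);
// search is at capacity; retry shortly or tell the user the service is busy.
```

Separating "busy" from "broken" lets the model retry the first and report the second, rather than treating every failure the same way.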

Bulkheads and circuit breakers together

Bulkheads and circuit breakers are complementary, not alternatives:

Concern	Bulkhead	Circuit breaker
Limits impact of slowness	Yes: caps concurrent calls waiting on a slow dependency	Only indirectly: slow calls count as failures once they exceed the breaker's timeout
Stops calls to a broken dependency	No: calls still go through, up to the limit	Yes: the breaker opens after the error threshold and fails fast
Self-heals after dependency recovers	Yes: capacity frees automatically as calls return	Yes: via a HALF_OPEN probe
Works per-dependency	Yes	Yes

The typical composition: wrap the bulkhead inside the circuit breaker. The breaker decides whether to call the dependency at all; the bulkhead limits how many callers can be in-flight simultaneously when the breaker is closed:

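// Breaker wraps the bulkhead-limited function (opossum's CircuitBreaker)
import CircuitBreaker from 'opossum';

const searchWithBulkhead = (query: string) =>
  deps.searchBulkhead.execute(() => callSearchApi(query, deps.searchAgent));

const searchBreaker = new CircuitBreaker(searchWithBulkhead, {
  errorThresholdPercentage: 50, // open when roughly half the calls in the rolling window fail
  timeout: 5000,                // includes time queued in the bulkhead; a timed-out call keeps its slot until it finishes
  resetTimeout: 30000,          // after 30 s open, let a HALF_OPEN probe through
  volumeThreshold: 5,           // require at least 5 calls in the window before opening
  // A full bulkhead is local back-pressure, not a dependency failure: don't count it
  errorFilter: (err: Error) => err.message.startsWith('bulkhead full'),
});
// Returned instead of a rejection when the breaker is open or the call fails
searchBreaker.fallback(() => ({ isError: true, reason: 'search circuit open' }));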

Per-tenant bulkheads in multi-tenant servers

In a multi-tenant MCP server, one tenant's high-volume usage should not degrade service for others. A global bulkhead limits total concurrent calls but does not prevent a single tenant from consuming the entire budget. Per-tenant bulkheads cap how many concurrent calls any one tenant can have in flight:

// tenant-bulkhead.ts: per-tenant concurrency limits
import { Bulkhead } from './semaphore';
export class TenantBulkheadRegistry {
  private readonly bulkheads = new Map<string, Bulkhead>();

  constructor(
    private readonly maxPerTenant: number,
    private readonly maxQueuePerTenant: number,
  ) {}

  get(tenantId: string): Bulkhead {
    let bh = this.bulkheads.get(tenantId);
    if (!bh) {
      bh = new Bulkhead(this.maxPerTenant, this.maxQueuePerTenant);
      this.bulkheads.set(tenantId, bh);
    }
    return bh;
  }

  // Prune idle tenants periodically to avoid unbounded map growth
  pruneIdle(): void {
    for (const [id, bh] of this.bulkheads.entries()) {
      if (bh.stats.running === 0 && bh.stats.queued === 0) {
        this.bulkheads.delete(id);
      }
    }
  }
}

// Usage in tool handler:
const tenantBulkhead = deps.searchTenantBulkheads.get(sessionContext.tenantId);
return tenantBulkhead.execute(() => callSearchApi(params.query, deps.searchAgent));
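A per-tenant cap alone bounds each tenant but not the dependency as a whole: 100 tenants with 5 slots each can still open 500 concurrent calls. The sketch below enforces both a per-tenant and a global cap, taking both slots or neither. It assumes callers should be rejected immediately rather than queued (the Bulkhead above queues; these hypothetical SlotLimit and TenantAwareLimiter classes do not).

```typescript
// Non-queuing counting semaphore: a slot is either free now or the call is rejected.
class SlotLimit {
  private inUse = 0;
  constructor(private readonly max: number) {}

  tryAcquire(): boolean {
    if (this.inUse >= this.max) return false;
    this.inUse++;
    return true;
  }

  release(): void {
    this.inUse--;
  }
}

// Per-tenant caps bound any one tenant; the global cap bounds the whole dependency.
class TenantAwareLimiter {
  private readonly tenants = new Map<string, SlotLimit>(); // prune idle entries as pruneIdle() does above
  private readonly global: SlotLimit;

  constructor(globalMax: number, private readonly perTenantMax: number) {
    this.global = new SlotLimit(globalMax);
  }

  async run<T>(tenantId: string, fn: () => Promise<T>): Promise<T> {
    let tenant = this.tenants.get(tenantId);
    if (!tenant) {
      tenant = new SlotLimit(this.perTenantMax);
      this.tenants.set(tenantId, tenant);
    }
    // Check the tenant cap first so an over-limit tenant never holds a global slot.
    if (!tenant.tryAcquire()) throw new Error(`tenant ${tenantId} is at its limit`);
    if (!this.global.tryAcquire()) {
      tenant.release(); // take both slots or neither
      throw new Error('dependency is at its global limit');
    }
    try {
      return await fn();
    } finally {
      this.global.release();
      tenant.release();
    }
  }
}
```

With new TenantAwareLimiter(3, 2), tenant "a" can hold two calls, a third call from "a" is rejected at the tenant cap, and once tenant "b" takes the last global slot, a second call from "b" is rejected at the global cap even though "b" is under its own limit.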

Exposing bulkhead stats in the health tool

Bulkhead stats reveal pressure on each dependency in real time. Include them in the health_check MCP tool alongside circuit-breaker states:

server.tool('health_check', {}, async () => {
  const [dbResult, cacheResult] = await Promise.allSettled([
    deps.db.query('SELECT 1'),
    deps.cache.ping(),
  ]);

  return {
    content: [{
      type: 'text',
      text: JSON.stringify({
        db: dbResult.status === 'fulfilled' ? 'ok' : 'error',
        cache: cacheResult.status === 'fulfilled' ? 'ok' : 'error',
        bulkheads: {
          search: deps.searchBulkhead.stats,
          llm: deps.llmBulkhead.stats,
        },
        circuit_breakers: {
          search: {
            opened: deps.searchBreaker.opened,
            halfOpen: deps.searchBreaker.halfOpen,
          },
        },
      }),
    }],
  };
});

AliveMCP can call this health_check tool as a synthetic probe, detecting not just that the server is reachable but that its internal bulkheads are not saturated. A bulkhead that stays full across consecutive probes (running at max, queue at max) is a leading indicator of a dependency problem before circuit-breaker error rates catch up. See the MCP Server Resilience and Configurability Guide for how bulkheads, circuit breakers, retry logic, and configuration work together as a complete resilience stack.
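On the probe side, "stays full" needs a concrete definition. A minimal sketch, assuming the probe parses the bulkheads object from the health_check JSON on every poll and alerts only after a bulkhead has been completely full for several consecutive samples; the SaturationDetector class and its three-sample default are hypothetical choices, not part of AliveMCP.

```typescript
// Shape of one entry in the health_check tool's "bulkheads" object.
interface BulkheadStats {
  running: number;
  queued: number;
}

// Hypothetical probe-side helper: report saturation only after `samples`
// consecutive observations in which the bulkhead was completely full.
class SaturationDetector {
  private fullStreak = 0;

  constructor(
    private readonly maxConcurrent: number,
    private readonly maxQueue: number,
    private readonly samples = 3,
  ) {}

  observe(stats: BulkheadStats): boolean {
    const full = stats.running >= this.maxConcurrent && stats.queued >= this.maxQueue;
    this.fullStreak = full ? this.fullStreak + 1 : 0; // any non-full sample resets the streak
    return this.fullStreak >= this.samples;
  }
}

// Usage with the limits of new Bulkhead(10, 20) for search:
const search = new SaturationDetector(10, 20);
const polls: BulkheadStats[] = [
  { running: 10, queued: 20 },
  { running: 10, queued: 20 },
  { running: 10, queued: 20 },
  { running: 4, queued: 0 },
];
console.log(polls.map(p => search.observe(p))); // [ false, false, true, false ]
```

Requiring a streak filters out momentary bursts that fill the queue and drain again between polls, which would otherwise raise false alarms.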

Bulkheads and the Deps pattern

Bulkheads only work if the thing they limit is created once at startup and reused — not recreated per-session or per-call. If you instantiate a new HTTP agent inside each tool handler, every call gets a new pool with new limits, and the limits are never enforced across concurrent calls.

The Deps pattern is what makes bulkheads correct: create one Bulkhead per dependency in createDeps(), store it in the Deps object, and inject it into every tool handler. The same principle applies to connection pools, circuit breakers, and semaphores: each must be a single instance created once at startup and shared by every session, not a per-request instance.