MCP server DDoS protection

An MCP server differs from a standard REST API in ways that matter for abuse defense: it maintains persistent connections (SSE or WebSocket), processes long-running tool calls, and may be called by automated LLM agents that run at machine speed without the natural throttle of a human user. A connection flood, a 100MB tool argument payload, or a runaway agent loop that triggers recursive tool calls can all take down an unprotected server within seconds.

TL;DR

Apply defense in layers: a reverse proxy (Caddy/nginx) enforces rate limits and a maximum body size before the request reaches Node.js; the MCP server enforces request size caps, tool call depth limits, and global concurrency limits; a CDN (Cloudflare) absorbs volumetric attacks before they reach your origin. Application-level rate limiting is the last line of defense, not the first.

The attack surface of an MCP server

Understanding the attack surface helps prioritize defenses. MCP-specific risks differ from those of a standard stateless HTTP API.

Attack vector	MCP-specific risk	Defense layer
Connection flood	Each SSE connection holds a file descriptor and memory. 1000 connections from a single IP can exhaust file descriptors.	Reverse proxy (connection limit per IP)
Large payload attack	Tool arguments are JSON: a 100MB string that still passes `inputSchema` validation can fill the Node.js heap and cause an OOM crash.	Reverse proxy (body size limit) + Node.js (argument validation)
Slow-write attack	A client that writes 1 byte/second to a POST endpoint holds the connection open indefinitely, exhausting the connection pool.	Reverse proxy (read/write timeout)
Recursive tool calls	An LLM in a loop calls Tool A which emits a prompt instructing the LLM to call Tool A again. Prompt injection can create infinite tool call chains.	MCP server (tool call depth limit per session)
Tool amplification	One tool call triggers N fan-out calls to external APIs (e.g., bulk-query tool that makes 1000 upstream requests). 10 tool calls = 10,000 upstream requests.	MCP server (per-tool rate limit + upstream concurrency cap)
Volumetric HTTP flood	Same as standard API — many requests per second from many IPs.	CDN WAF (Cloudflare challenge/block rule)
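
The upstream concurrency cap from the tool amplification row can be sketched as follows. This is a minimal illustration, assuming a bulk tool that fans out to an external API; `mapWithConcurrency` is a hypothetical helper name, not part of the MCP SDK, and the right limit depends on what the upstream can absorb.

```typescript
// Hypothetical helper: run `fn` over `items` with at most `limit` calls in flight.
export async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer');
  }
  const results = new Array<R>(items.length);
  let next = 0;       // index of the next unclaimed item
  let failed = false; // set on the first rejection so workers stop claiming items

  // Each worker claims one item at a time until none are left.
  async function worker(): Promise<void> {
    while (!failed && next < items.length) {
      const i = next++;
      try {
        results[i] = await fn(items[i], i);
      } catch (err) {
        failed = true;
        throw err;
      }
    }
  }

  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

A bulk-query tool that calls `mapWithConcurrency(ids, 5, fetchOne)` keeps at most 5 upstream requests open per tool call, so ten simultaneous calls hold 50 requests in flight rather than 10,000. Capping the number of items accepted per call is a separate, complementary check.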

Reverse proxy: Caddy rate limiting and size limits

Caddy (the factory VPS default) can enforce request-rate limits and a maximum body size before the request reaches Node.js. Rate limiting requires the caddy-ratelimit plugin; the body size limit uses the built-in request_body directive.

# Caddyfile: MCP endpoint with DDoS mitigations
{
  # rate_limit comes from a plugin, so it has no default position in the directive order
  order rate_limit before basicauth

  # Slow-write mitigation: 10s each to read the request headers and the full request body.
  # No write timeout is set here because it would also cut off long-lived SSE streams.
  servers {
    timeouts {
      read_header 10s
      read_body 10s
    }
  }
}

your-mcp-server.com {
  # Reject request bodies over 1MB (prevents large payload attacks)
  request_body {
    max_size 1MB
  }

  # Limit each client IP to 10 requests per 1-second window
  # Requires caddy-ratelimit plugin (xcaddy build --with github.com/mholt/caddy-ratelimit)
  rate_limit {
    zone mcp_requests {
      key {remote_host}
      events 10
      window 1s
    }
  }

  @mcp_endpoint path /mcp /mcp/*
  handle @mcp_endpoint {
    reverse_proxy localhost:3000 {
      transport http {
        read_buffer 4096
        write_buffer 4096
        # Wait up to 30s for the upstream to send response headers
        # (long-running tools need headroom)
        response_header_timeout 30s
      }
    }
  }
}

If you're using nginx instead of Caddy, apply the same constraints via limit_conn, limit_req, client_max_body_size, and client_body_timeout.

Node.js server: argument size validation and depth limits

Even with reverse proxy size limits, validate tool argument sizes inside the MCP server. The reverse proxy limit is a coarse guard; per-tool validation catches oversized arguments that fit within the global body limit but are still unreasonable for a specific tool.

// src/middleware/abuse-guard.ts
import { CallToolRequestSchema } from '@modelcontextprotocol/sdk/types.js';

// Size in bytes of the JSON-serialized value (UTF-8 bytes, not UTF-16 code units)
function estimateBytes(value: unknown): number {
  return Buffer.byteLength(JSON.stringify(value) ?? '', 'utf8');
}

const MAX_ARGUMENT_BYTES = 64 * 1024; // 64KB per tool call
const MAX_STRING_LENGTH = 10_000;     // 10k chars per string argument

// Walk nested objects and arrays and reject any string over the per-string limit
function checkStrings(value: unknown, path: string): void {
  if (typeof value === 'string') {
    if (value.length > MAX_STRING_LENGTH) {
      throw new Error(`argument_string_too_long: field '${path}' is ${value.length} chars, max ${MAX_STRING_LENGTH}`);
    }
  } else if (value !== null && typeof value === 'object') {
    for (const [key, val] of Object.entries(value)) {
      checkStrings(val, path ? `${path}.${key}` : key);
    }
  }
}

// Check the total serialized size first, then every string at any nesting level
function validateArgSize(args: Record<string, unknown>, maxBytes = MAX_ARGUMENT_BYTES): void {
  const size = estimateBytes(args);
  if (size > maxBytes) {
    throw new Error(`argument_too_large: ${size} bytes exceeds ${maxBytes} byte limit`);
  }
  checkStrings(args, '');
}

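// Per-session tool call counter over a fixed time window.
// A sequential agent loop never has more than one call in flight, so the counter
// tracks calls over time instead of being decremented when each call finishes.
const sessionCalls = new Map<string, { count: number; windowStart: number }>();
const MAX_DEPTH = 10;           // max tool calls per session before requiring a pause
const DEPTH_WINDOW_MS = 60_000; // length of the pause window (assumed value, tune per workload)

export function checkDepth(sessionId: string, now = Date.now()): void {
  const entry = sessionCalls.get(sessionId);
  if (!entry || now - entry.windowStart >= DEPTH_WINDOW_MS) {
    sessionCalls.set(sessionId, { count: 1, windowStart: now });
    return;
  }
  if (entry.count >= MAX_DEPTH) {
    throw new Error(`depth_limit_exceeded: ${entry.count} tool calls in this session. Start a new session or wait.`);
  }
  entry.count += 1;
}

// Drop expired entries so the map does not grow without bound
setInterval(() => {
  const now = Date.now();
  for (const [id, entry] of sessionCalls) {
    if (now - entry.windowStart >= DEPTH_WINDOW_MS) sessionCalls.delete(id);
  }
}, DEPTH_WINDOW_MS).unref();

// Wire into the tool handler (`server` is your MCP Server instance, `dispatchTool` your own router)
server.setRequestHandler(CallToolRequestSchema, async (request, extra) => {
  // Clients that present no session ID all share the 'unknown' bucket
  const sessionId = extra.sessionId ?? 'unknown';
  const args = (request.params.arguments ?? {}) as Record<string, unknown>;

  try {
    validateArgSize(args);
    checkDepth(sessionId);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return {
      content: [{ type: 'text', text: JSON.stringify({ error: 'request_rejected', message }) }],
      isError: true,
    };
  }

  // ... normal dispatch
  return dispatchTool(request.params.name, args);
});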
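
The point about limits that are reasonable for one tool but not another can be made concrete with a per-tool limits table. The sketch below is illustrative only: the tool names (`search`, `upload_document`), the numbers, and the helpers `limitsFor` and `checkArgs` are assumptions, not part of the MCP SDK. It returns a violation string instead of throwing, and it also bounds nesting depth.

```typescript
// Hypothetical per-tool limits; tool names and numbers are illustrative.
interface ToolLimits {
  maxBytes: number;        // serialized JSON size, in UTF-8 bytes
  maxStringLength: number; // per string value, at any nesting level
  maxDepth: number;        // maximum nesting depth of objects/arrays
}

const DEFAULT_LIMITS: ToolLimits = { maxBytes: 64 * 1024, maxStringLength: 10_000, maxDepth: 8 };

const TOOL_LIMITS = new Map<string, Partial<ToolLimits>>([
  ['search', { maxBytes: 4 * 1024, maxStringLength: 512 }],
  ['upload_document', { maxBytes: 512 * 1024, maxStringLength: 500_000 }],
]);

export function limitsFor(tool: string): ToolLimits {
  return { ...DEFAULT_LIMITS, ...(TOOL_LIMITS.get(tool) ?? {}) };
}

// Returns a description of the first violation found, or null when the value is acceptable.
function walk(value: unknown, path: string, depth: number, limits: ToolLimits): string | null {
  if (depth > limits.maxDepth) return `argument_too_deep: ${path}`;
  if (typeof value === 'string') {
    return value.length > limits.maxStringLength ? `argument_string_too_long: ${path}` : null;
  }
  if (value !== null && typeof value === 'object') {
    for (const [key, child] of Object.entries(value)) {
      const problem = walk(child, `${path}.${key}`, depth + 1, limits);
      if (problem) return problem;
    }
  }
  return null;
}

export function checkArgs(tool: string, args: unknown): string | null {
  const limits = limitsFor(tool);
  const bytes = new TextEncoder().encode(JSON.stringify(args) ?? '').length;
  if (bytes > limits.maxBytes) return `argument_too_large: ${bytes} bytes exceeds ${limits.maxBytes}`;
  return walk(args, 'arguments', 0, limits);
}
```

In a handler, a non-null result from `checkArgs(request.params.name, args)` would be returned to the client as an `isError` response. Tools without an entry in the table fall back to the defaults.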

Global concurrency cap

A global concurrency cap is orthogonal to rate limiting: it limits how many tool calls are running simultaneously rather than how many have been made per unit of time. It is the backstop against a flood of concurrent requests that each individually stay within rate limits.

// src/middleware/concurrency-guard.ts
export class ConcurrencyGuard {
  private active = 0;
  private readonly max: number;

  constructor(max = 50) { this.max = max; }

  // Fails fast instead of queueing: callers over the cap get an immediate, retryable error
  async acquire(): Promise<() => void> {
    if (this.active >= this.max) {
      throw new Error(`server_overloaded: ${this.active}/${this.max} concurrent tool calls active`);
    }
    this.active++;
    let released = false;
    // Idempotent release, so calling it twice cannot undercount active calls
    return () => {
      if (!released) { released = true; this.active--; }
    };
  }

  stats() { return { active: this.active, max: this.max, utilization: this.active / this.max }; }
}

const concurrencyGuard = new ConcurrencyGuard(50);

// setRequestHandler keeps a single handler per request schema, so in a real server
// merge this with the abuse-guard checks instead of registering a second handler.
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  let release: (() => void) | undefined;
  try {
    release = await concurrencyGuard.acquire();
    return await dispatchTool(request.params.name, request.params.arguments);
  } catch (err) {
    // String(err) yields "Error: server_overloaded: ...", so match on err.message instead
    if (err instanceof Error && err.message.startsWith('server_overloaded')) {
      return {
        content: [{ type: 'text', text: JSON.stringify({ error: 'server_overloaded', retryable: true, retry_after_ms: 2000 }) }],
        isError: true,
      };
    }
    throw err;
  } finally {
    release?.();
  }
});
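
The concurrency cap bounds how many calls run at once; the per-tool rate limit named in the attack-surface table bounds how often a tool may be called. One common way to implement the latter is a token bucket. The sketch below is a minimal in-memory version under stated assumptions: `TokenBucket` and `allowToolCall` are hypothetical names, and the capacity of 5 with a refill of 1 token per second is a placeholder, not a recommendation.

```typescript
// Minimal token bucket: `capacity` bounds bursts, `refillPerSec` bounds the sustained rate.
export class TokenBucket {
  private tokens: number;
  private last: number;
  private readonly capacity: number;
  private readonly refillPerSec: number;
  private readonly now: () => number;

  // `now` is injectable so the bucket can be tested with a fake clock.
  constructor(capacity: number, refillPerSec: number, now: () => number = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.now = now;
    this.tokens = capacity;
    this.last = now();
  }

  // Takes one token if available; returns false when the caller should be rejected.
  tryTake(): boolean {
    const t = this.now();
    const elapsedSec = (t - this.last) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.last = t;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// One bucket per tool name, created on first use (illustrative defaults).
const buckets = new Map<string, TokenBucket>();

export function allowToolCall(tool: string, capacity = 5, refillPerSec = 1): boolean {
  let bucket = buckets.get(tool);
  if (!bucket) {
    bucket = new TokenBucket(capacity, refillPerSec);
    buckets.set(tool, bucket);
  }
  return bucket.tryTake();
}
```

A handler would call `allowToolCall(request.params.name)` before dispatch and return a retryable error when it yields `false`. Keying the map by session ID plus tool name instead would make the limit per client rather than global to the tool.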

Cloudflare WAF rules for MCP endpoints

If your MCP server is behind Cloudflare, custom rules let you rate limit, challenge, or block traffic patterns that look like abuse before the request reaches your origin. These rules complement the server-level defenses above; they do not replace them.

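# Cloudflare rule examples (defined in Cloudflare dashboard or Terraform)

# Rule 1 (rate limiting rule): throttle IPs making more than 60 requests/min to the MCP endpoint
# Field: http.request.uri.path starts_with "/mcp"
# Rate: 60 requests per 60 seconds per IP
# Action: Block (returns 429; a non-browser MCP client cannot solve a JS challenge)

# Rule 2: Block oversized request bodies sent to the MCP endpoint
# Field: http.request.uri.path starts_with "/mcp"
# AND: http.request.body.size gt 1048576  (1MB; check that this field is available on your plan)
# Content-Length is a string header and is absent on chunked uploads, so keep the origin limit too
# Action: Block

# Rule 3: Let known MCP registry crawlers (AliveMCP, Glama, etc.) skip the rate limit
# Field: http.user_agent contains "AliveMCP-Probe"
# Action: Skip (rate limiting rules)
# User-Agent is easy to spoof; pair it with an IP allowlist where the crawler publishes one

# Rule 4: Challenge likely-automated traffic that does not look like a real MCP SDK client
# Field: cf.bot_management.score lt 30  (low scores mean likely bot; requires Bot Management)
# AND: http.request.uri.path starts_with "/mcp"
# Action: Managed Challenge (run it in Log mode first: legitimate MCP clients are automated too)

A managed challenge is gentler than a block only for browser traffic. A browser can solve it, but a non-browser client such as an MCP SDK running in Node.js cannot execute the challenge page and simply receives an error response, so a challenge on the MCP endpoint behaves like a block for legitimate SDK clients. Reserve challenges for ambiguous traffic you are prepared to lose or for browser-facing paths, use rate limiting for the MCP endpoint itself, and keep hard blocks for known attack signatures.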

Detecting when defenses are being triggered

Defenses that fire silently are invisible to operators. Log every defense trigger with structured events so you can monitor attack patterns and tune your limits without over-blocking legitimate traffic.

// Structured log events for each defense layer
const defenseEvents = {
  connectionRateLimited: (ip: string) =>
    console.log(JSON.stringify({ event: 'defense_triggered', type: 'connection_rate', ip, ts: new Date().toISOString() })),

  argumentTooLarge: (tool: string, bytes: number) =>
    console.log(JSON.stringify({ event: 'defense_triggered', type: 'arg_size', tool, bytes, ts: new Date().toISOString() })),

  depthLimitExceeded: (sessionId: string, depth: number) =>
    console.log(JSON.stringify({ event: 'defense_triggered', type: 'depth_limit', sessionId, depth, ts: new Date().toISOString() })),

  serverOverloaded: (active: number, max: number) =>
    console.log(JSON.stringify({ event: 'defense_triggered', type: 'concurrency_cap', active, max, ts: new Date().toISOString() })),
};
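
Log lines become useful for tuning once they are counted. As a rough sketch of that step, the class below keeps a sliding-window count per defense type and reports which types crossed a threshold. `DefenseCounter`, its method names, and the thresholds are hypothetical; in production this role is usually filled by a metrics or log-aggregation system.

```typescript
// Defense types matching the `type` field of the structured log events.
type DefenseType = 'connection_rate' | 'arg_size' | 'depth_limit' | 'concurrency_cap';

export class DefenseCounter {
  private readonly hits = new Map<DefenseType, number[]>(); // timestamps (ms) per type
  private readonly windowMs: number;

  constructor(windowMs = 60_000) { this.windowMs = windowMs; }

  // Record one trigger; `now` is injectable for testing.
  record(type: DefenseType, now = Date.now()): void {
    const list = this.hits.get(type) ?? [];
    list.push(now);
    this.hits.set(type, list);
  }

  // Number of triggers of this type inside the window ending at `now`.
  // Also prunes timestamps that have fallen out of the window.
  count(type: DefenseType, now = Date.now()): number {
    const cutoff = now - this.windowMs;
    const recent = (this.hits.get(type) ?? []).filter((ts) => ts > cutoff);
    this.hits.set(type, recent);
    return recent.length;
  }

  // Types whose in-window count has reached the given threshold.
  exceeded(thresholds: Partial<Record<DefenseType, number>>, now = Date.now()): DefenseType[] {
    const over: DefenseType[] = [];
    for (const [type, limit] of Object.entries(thresholds) as [DefenseType, number][]) {
      if (this.count(type, now) >= limit) over.push(type);
    }
    return over;
  }
}
```

Calling `counter.record('arg_size')` next to each log call and checking `counter.exceeded({ arg_size: 20, depth_limit: 5 })` on a timer gives a simple signal: a steady trickle of triggers suggests a limit that is too tight for legitimate traffic, while a sudden spike suggests an attack.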

External probes such as AliveMCP see the server the way a client does, so they can detect when a defense is causing the server to return consistent errors before your internal monitoring fires. That gives you an early warning that something is wrong at the protocol level.