MCP server DDoS protection
An MCP server differs from a standard REST API in ways that matter for abuse defense: it maintains persistent connections (SSE or WebSocket), processes long-running tool calls, and may be called by automated LLM agents that run at machine speed without the natural throttle of a human user. A connection flood, a 100MB tool argument payload, or a runaway agent loop that triggers recursive tool calls can all take down an unprotected server within seconds.
TL;DR
Apply defense in layers, from the outside in: a CDN (Cloudflare) absorbs volumetric attacks before they reach your origin; a reverse proxy (Caddy/nginx) enforces request-rate limits and a maximum body size before the request reaches Node.js; the MCP server enforces argument size caps, tool call depth limits, and a global concurrency limit. Limiting inside the application is the last line of defense, not the first.
The attack surface of an MCP server
Understanding the attack surface helps prioritize defenses. The risks for an MCP server differ from those of a standard stateless HTTP API.
| Attack vector | MCP-specific risk | Defense layer |
|---|---|---|
| Connection flood | Each SSE connection holds a file descriptor and memory. 1000 connections from a single IP can exhaust file descriptors. | Reverse proxy (connection limit per IP) |
| Large payload attack | Tool arguments are JSON — a 100MB inputSchema-valid string floods the Node.js heap and causes OOM. | Reverse proxy (body size limit) + Node.js (argument validation) |
| Slow-write attack | A client that writes 1 byte/second to a POST endpoint holds the connection open indefinitely, exhausting connection pool. | Reverse proxy (read/write timeout) |
| Recursive tool calls | An LLM in a loop calls Tool A which emits a prompt instructing the LLM to call Tool A again. Prompt injection can create infinite tool call chains. | MCP server (tool call depth limit per session) |
| Tool amplification | One tool call triggers N fan-out calls to external APIs (e.g., bulk-query tool that makes 1000 upstream requests). 10 tool calls = 10,000 upstream requests. | MCP server (per-tool rate limit + upstream concurrency cap) |
| Volumetric HTTP flood | Same as standard API — many requests per second from many IPs. | CDN WAF (Cloudflare challenge/block rule) |
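The "upstream concurrency cap" named in the tool amplification row can be sketched as a bounded fan-out helper. This is a minimal illustration, not part of the MCP SDK: the name mapWithConcurrency and its signature are assumptions made for this example.

```typescript
// Hypothetical helper: run `worker` over `items` with at most `limit` calls in flight.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer');
  }
  const results = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item

  // Each runner claims one item at a time until none remain.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results; // same order as `items`
}
```

A bulk-query tool that would otherwise start 1000 upstream requests at once could route them through this helper with a limit of, say, 5, so each tool call holds at most 5 upstream connections. If any worker rejects, the returned promise rejects, while runners that are already active finish their current item.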
Reverse proxy: Caddy rate limiting and size limits
Caddy (the factory VPS default) can enforce request-rate limits and a maximum body size before the request reaches Node.js. Rate limiting requires the caddy-ratelimit plugin; the body size cap uses the built-in request_body directive.
# Caddyfile: MCP endpoint with DDoS mitigations
{
    # rate_limit comes from a plugin and is not in Caddy's default directive
    # order; give it a position so it runs before the handle block below
    order rate_limit after request_body
    # Slow-write protection: 10s to read the full request body
    servers {
        timeouts {
            read_body 10s
        }
    }
}
your-mcp-server.com {
    # Reject request bodies over 1MB (prevents large payload attacks)
    request_body {
        max_size 1MB
    }
    # Limit requests per client IP: at most 10 per 1s sliding window
    # Requires caddy-ratelimit plugin (xcaddy build --with github.com/mholt/caddy-ratelimit)
    # Behind a CDN, {remote_host} is the CDN edge address; key on the real client IP instead
    rate_limit {
        zone mcp_connections {
            key {remote_host}
            events 10
            window 1s
        }
    }
    @mcp_endpoint path /mcp /mcp/*
    handle @mcp_endpoint {
        reverse_proxy localhost:3000 {
            transport http {
                read_buffer 4096
                write_buffer 4096
                # Wait up to 30s for the upstream to start responding (long-running tools need headroom)
                response_header_timeout 30s
            }
        }
    }
}
If you're using nginx instead of Caddy, apply the same constraints via limit_conn, limit_req, and client_max_body_size, plus client_body_timeout for slow writes.
Node.js server: argument size validation and depth limits
Even with reverse proxy size limits, validate tool argument sizes inside the MCP server. The reverse proxy limit is a coarse guard; per-tool validation catches oversized arguments that fit within the global body limit but are still unreasonable for a specific tool.
// src/middleware/abuse-guard.ts
import { CallToolRequestSchema } from '@modelcontextprotocol/sdk/types.js';
// `server` (the MCP Server instance) and `dispatchTool` are defined elsewhere in the project.

// Serialized size in UTF-8 bytes (string .length would count UTF-16 code units, not bytes)
function estimateBytes(value: unknown): number {
  return Buffer.byteLength(JSON.stringify(value) ?? '', 'utf8');
}

const MAX_ARGUMENT_BYTES = 64 * 1024; // 64KB per tool call
const MAX_STRING_LENGTH = 10_000; // 10k chars per string argument

// Check the total serialized size, then each top-level string field.
// Nested strings are bounded only by the total size limit.
function validateArgSize(args: Record<string, unknown>, maxBytes = MAX_ARGUMENT_BYTES): void {
  const size = estimateBytes(args);
  if (size > maxBytes) {
    throw new Error(`argument_too_large: ${size} bytes exceeds ${maxBytes} byte limit`);
  }
  for (const [key, val] of Object.entries(args)) {
    if (typeof val === 'string' && val.length > MAX_STRING_LENGTH) {
      throw new Error(`argument_string_too_long: field '${key}' is ${val.length} chars, max ${MAX_STRING_LENGTH}`);
    }
  }
}

// Per-session counter of tool calls currently in flight.
// It is decremented when a call finishes, so it bounds nested or concurrent calls,
// not the total number of sequential calls a session makes.
const sessionDepth = new Map<string, number>();
const MAX_DEPTH = 10; // max in-flight tool calls per session

export function checkDepth(sessionId: string): void {
  const current = sessionDepth.get(sessionId) ?? 0;
  if (current >= MAX_DEPTH) {
    throw new Error(`depth_limit_exceeded: ${current} tool calls in flight for this session. Wait for running calls to finish.`);
  }
  sessionDepth.set(sessionId, current + 1);
}

export function decrementDepth(sessionId: string): void {
  const current = sessionDepth.get(sessionId) ?? 0;
  if (current > 1) sessionDepth.set(sessionId, current - 1);
  else sessionDepth.delete(sessionId); // drop idle sessions so the map does not grow without bound
}

// Wire into the tool handler
server.setRequestHandler(CallToolRequestSchema, async (request, extra) => {
  // Recent SDK versions expose the transport session ID on `extra`.
  // Requests without a session ID all share the 'unknown' counter.
  const sessionId = extra.sessionId ?? 'unknown';
  const args = (request.params.arguments ?? {}) as Record<string, unknown>;
  try {
    validateArgSize(args);
    checkDepth(sessionId);
  } catch (err) {
    return {
      content: [{ type: 'text', text: JSON.stringify({ error: 'request_rejected', message: String(err) }) }],
      isError: true,
    };
  }
  try {
    // ... normal dispatch
    return await dispatchTool(request.params.name, args);
  } finally {
    decrementDepth(sessionId); // runs only after checkDepth succeeded
  }
});
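A counter that is decremented when each call finishes cannot stop a sequential loop, where the LLM calls a tool, reads the result, and calls it again. One way to bound that pattern is a per-session sliding-window budget. The sketch below is an assumption-laden illustration: the class name SessionCallBudget, the 60-second default window, and the injectable clock are all invented for this example.

```typescript
// Hypothetical sliding-window budget: at most `maxCalls` tool calls per session per `windowMs`.
class SessionCallBudget {
  private readonly calls = new Map<string, number[]>(); // sessionId -> call timestamps (ms)
  private readonly maxCalls: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(maxCalls = 10, windowMs = 60_000, now: () => number = Date.now) {
    this.maxCalls = maxCalls;
    this.windowMs = windowMs; // assumed default; tune for your tools
    this.now = now; // injectable so the window can be tested without waiting
  }

  /** Records a call and returns true, or returns false if the budget is spent. */
  tryConsume(sessionId: string): boolean {
    const t = this.now();
    const cutoff = t - this.windowMs;
    // Keep only the calls that are still inside the window.
    const recent = (this.calls.get(sessionId) ?? []).filter((ts) => ts > cutoff);
    if (recent.length >= this.maxCalls) {
      this.calls.set(sessionId, recent);
      return false;
    }
    recent.push(t);
    this.calls.set(sessionId, recent);
    return true;
  }

  /** Drops state for a closed session so the map does not grow without bound. */
  forget(sessionId: string): void {
    this.calls.delete(sessionId);
  }
}
```

In a tool handler, a false result from tryConsume would map to the same request_rejected response used for the other guards, and forget would be called when the transport reports the session closed.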
Global concurrency cap
A global concurrency cap is orthogonal to rate limiting: it limits how many tool calls are running simultaneously rather than how many have been called per unit time. This is the last line of defense against a flood of concurrent requests that each individually stay within rate limits.
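// src/middleware/concurrency-guard.ts
import { CallToolRequestSchema } from '@modelcontextprotocol/sdk/types.js';
// `server` (the MCP Server instance) and `dispatchTool` are defined elsewhere in the project.

export class ConcurrencyGuard {
  private active = 0;
  private readonly max: number;

  constructor(max = 50) { this.max = max; }

  // Fails fast instead of queueing: callers get an immediate, retryable error
  async acquire(): Promise<() => void> {
    if (this.active >= this.max) {
      throw new Error(`server_overloaded: ${this.active}/${this.max} concurrent tool calls active`);
    }
    this.active++;
    let released = false;
    return () => {
      if (released) return; // releasing twice must not free a second slot
      released = true;
      this.active--;
    };
  }

  stats() { return { active: this.active, max: this.max, utilization: this.active / this.max }; }
}

const concurrencyGuard = new ConcurrencyGuard(50);

// The SDK keeps one handler per request schema, so this registration replaces any
// earlier CallToolRequestSchema handler. In a real server, combine these checks
// with the abuse-guard checks in a single handler.
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  let release: (() => void) | undefined;
  try {
    release = await concurrencyGuard.acquire();
    return await dispatchTool(request.params.name, request.params.arguments);
  } catch (err) {
    // String(err) would be "Error: server_overloaded: ...", so match on err.message
    if (err instanceof Error && err.message.startsWith('server_overloaded')) {
      return {
        content: [{ type: 'text', text: JSON.stringify({ error: 'server_overloaded', retryable: true, retry_after_ms: 2000 }) }],
        isError: true,
      };
    }
    throw err;
  } finally {
    release?.();
  }
});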
Cloudflare WAF rules for MCP endpoints
If your MCP server is behind Cloudflare, custom WAF rules let you challenge or block traffic patterns that look like abuse before the request reaches your origin. These rules complement — not replace — the server-level defenses above.
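# Cloudflare WAF custom rule examples (defined in Cloudflare dashboard or Terraform)
# The conditions below are pseudocode, not literal rule expressions.
# Rule 1: Challenge IPs making more than 60 requests/min to the MCP endpoint
# Field: http.request.uri.path starts_with "/mcp"
# AND: rate limit (60 requests per 60 seconds per IP)
# Action: JS Challenge
# Rule 2: Block requests with oversized bodies to MCP endpoint
# Field: http.request.uri.path starts_with "/mcp"
# AND: request body larger than 1048576 bytes (1MB). Compare numerically:
#      header values are strings, so a ">" test on content-length will not work as written.
#      Use a numeric field such as http.request.body.size where your plan exposes it.
# Action: Block
# Rule 3: Allow known MCP registry crawlers (AliveMCP, Glama, etc.)
# Field: http.user_agent contains "AliveMCP-Probe"
# Action: Allow (bypass rate limit rules)
# Note: the User-Agent header is trivially spoofed; pair this with the crawler's published IP ranges if available.
# Rule 4: Challenge likely-automated traffic that does not look like a real MCP SDK client
# Field: cf.bot_management.score < 30 (low scores mean "likely bot"; requires Bot Management)
# AND: http.request.uri.path starts_with "/mcp"
# Action: Managed Challenge
# Note: legitimate SDK clients are automated too and may score low; test in log-only mode before enforcing.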
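Cloudflare's managed challenge is gentler than a block for browser traffic, but any challenge requires a client that can run JavaScript. MCP SDK clients running in Node.js cannot solve a JS or managed challenge, so for them a challenge behaves like a block. Reserve challenge actions for traffic you can afford to turn away (for example, IPs already over the rate limit), exempt known legitimate clients with a skip rule, and hard-block only known attack signatures.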
Detecting when defenses are being triggered
Defenses that fire silently are invisible to operators. Log every defense trigger with structured events so you can monitor attack patterns and tune your limits without over-blocking legitimate traffic.
// Structured log events for each defense layer
const defenseEvents = {
  connectionRateLimited: (ip: string) =>
    console.log(JSON.stringify({ event: 'defense_triggered', type: 'connection_rate', ip, ts: new Date().toISOString() })),
  argumentTooLarge: (tool: string, bytes: number) =>
    console.log(JSON.stringify({ event: 'defense_triggered', type: 'arg_size', tool, bytes, ts: new Date().toISOString() })),
  depthLimitExceeded: (sessionId: string, depth: number) =>
    console.log(JSON.stringify({ event: 'defense_triggered', type: 'depth_limit', sessionId, depth, ts: new Date().toISOString() })),
  serverOverloaded: (active: number, max: number) =>
    console.log(JSON.stringify({ event: 'defense_triggered', type: 'concurrency_cap', active, max, ts: new Date().toISOString() })),
};
External probes such as AliveMCP can detect when defense triggers are causing the server to return consistent errors, sometimes before your internal monitoring fires, which gives you an early warning that something is wrong at the protocol level.
Related questions
Is a tool call depth limit effective against prompt injection?
A depth limit is a blunt defense against prompt injection that causes recursive tool calls. It bounds the damage rather than preventing the attack: if an injected prompt drives the LLM to call a tool 100 times, a limit of 10 stops the chain at 10, provided the counter tracks a session's calls over time and not only the calls currently in flight. Combine it with input sanitization (strip or escape suspicious patterns in tool arguments), output sanitization (filter tool outputs before they reach the LLM), and a separate trust boundary between content fetched from external sources and LLM-visible text.
What body size limit should I set?
Start at 1MB for the global request body limit. Most legitimate MCP tool calls are much smaller — a tool call with a document query or file path is typically under 10KB. The 1MB limit stops naive large payload attacks while allowing edge cases like embedding a small base64-encoded image in a tool argument. Add per-tool argument validation for any tool that is expected to receive smaller inputs (e.g., cap a query string at 2KB even though the global limit is 1MB).
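Per-tool caps under the global limit can be sketched as a lookup table with a default. The tool names (search_docs, describe_image), the numbers, and the checkToolArgs function are assumptions for illustration only. Unlike a top-level check, this version walks nested values.

```typescript
// Hypothetical per-tool limits; names and numbers are illustrative, not recommendations.
interface ToolLimits {
  maxBytes: number; // cap on the serialized arguments, in UTF-8 bytes
  maxStringLength: number; // cap on any single string, in characters
}

const DEFAULT_LIMITS: ToolLimits = { maxBytes: 64 * 1024, maxStringLength: 10_000 };

const TOOL_LIMITS = new Map<string, ToolLimits>([
  ['search_docs', { maxBytes: 4 * 1024, maxStringLength: 2_048 }], // short query strings
  ['describe_image', { maxBytes: 512 * 1024, maxStringLength: 500_000 }], // small base64 image
]);

function utf8Bytes(value: unknown): number {
  return new TextEncoder().encode(JSON.stringify(value) ?? '').length;
}

// Returns an error code, or null when the arguments are within the tool's limits.
function checkToolArgs(tool: string, args: unknown): string | null {
  const limits = TOOL_LIMITS.get(tool) ?? DEFAULT_LIMITS;
  if (utf8Bytes(args) > limits.maxBytes) return 'argument_too_large';

  // Iterative walk so deeply nested input cannot overflow the call stack.
  const stack: unknown[] = [args];
  while (stack.length > 0) {
    const value = stack.pop();
    if (typeof value === 'string') {
      if (value.length > limits.maxStringLength) return 'argument_string_too_long';
    } else if (value !== null && typeof value === 'object') {
      for (const child of Object.values(value)) stack.push(child);
    }
  }
  return null;
}
```

Because the total size is checked first, the walk never visits more data than the tool's own maxBytes allows.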
Should the concurrency limit be per-server or per-session?
Both. A global cap (50 concurrent calls) prevents the server from being overwhelmed regardless of how many sessions are active. A per-session cap (e.g., 5 concurrent calls per session) prevents a single session from holding all 50 slots simultaneously. Implement the global cap first — it is simpler and covers the most dangerous case. Add per-session caps if you see a specific pattern where one session monopolizes the server.
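A combined global and per-session cap might look like the following sketch. The class name TwoLevelConcurrencyGuard and the tryAcquire method are assumptions for this example; the defaults of 50 and 5 are the figures from the answer above.

```typescript
// Hypothetical guard enforcing a global cap and a per-session cap together.
class TwoLevelConcurrencyGuard {
  private globalActive = 0;
  private readonly perSession = new Map<string, number>();
  private readonly globalMax: number;
  private readonly sessionMax: number;

  constructor(globalMax = 50, sessionMax = 5) {
    this.globalMax = globalMax;
    this.sessionMax = sessionMax;
  }

  /** Returns a release function, or null when either cap is reached. */
  tryAcquire(sessionId: string): (() => void) | null {
    const sessionActive = this.perSession.get(sessionId) ?? 0;
    if (this.globalActive >= this.globalMax || sessionActive >= this.sessionMax) {
      return null;
    }
    this.globalActive++;
    this.perSession.set(sessionId, sessionActive + 1);

    let released = false;
    return () => {
      if (released) return; // releasing twice must not free a second slot
      released = true;
      this.globalActive--;
      const remaining = (this.perSession.get(sessionId) ?? 1) - 1;
      if (remaining <= 0) this.perSession.delete(sessionId); // forget idle sessions
      else this.perSession.set(sessionId, remaining);
    };
  }

  stats() {
    return { globalActive: this.globalActive, sessions: this.perSession.size };
  }
}
```

A handler would call tryAcquire before dispatching, return a retryable server_overloaded error on null, and invoke the release function in a finally block.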
How do MCP-specific attacks differ from standard REST attacks?
The key differences: (1) Persistent connections: an SSE or WebSocket connection holds state and resources for the session lifetime, so connection floods are more effective than against stateless REST endpoints. (2) Long-running operations: tool calls can run for 30 seconds or more, so a flood of long tool calls exhausts concurrency faster than a flood of short REST requests. (3) Automated callers: LLM agents call tools in tight loops, without the think-time pauses that throttle human-driven traffic to interactive REST APIs. Rate limits need to be calibrated for machine-speed callers, not human-speed users.
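One common way to calibrate for machine-speed callers is a token bucket, which permits short bursts while capping the sustained rate. The guide does not prescribe this algorithm. The sketch below is an illustration under assumptions: the class name TokenBucket, its parameters, and the injectable clock are invented for this example.

```typescript
// Hypothetical token bucket: bursts up to `capacity`, sustained `refillPerSecond`.
class TokenBucket {
  private tokens: number;
  private last: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private readonly now: () => number;

  constructor(capacity: number, refillPerSecond: number, now: () => number = Date.now) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now; // injectable so refill can be tested without waiting
    this.tokens = capacity; // start full so a fresh caller can burst
    this.last = now();
  }

  /** Takes `cost` tokens and returns true, or returns false if too few remain. */
  tryRemove(cost = 1): boolean {
    const t = this.now();
    const elapsedSeconds = Math.max(0, t - this.last) / 1000;
    // Refill for the elapsed time, never beyond capacity.
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.last = t;
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}
```

Keeping one bucket per session lets an agent fire a quick burst of calls while still bounding its sustained rate. The cost parameter allows an expensive fan-out tool to consume more tokens than a cheap lookup.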