MCP server zero-downtime deployment
Deploying an MCP server without downtime is harder than deploying a stateless REST API. SSE transport creates long-lived connections: a client connects once, the MCP initialize handshake establishes session state on the server, and subsequent tool calls flow through that persistent connection. When you deploy a new version and the old process receives SIGTERM, it must not immediately exit — doing so terminates every active session simultaneously. The client sees a connection drop, must reconnect to the new server, and must restart the session from scratch. This guide covers the two zero-downtime strategies for MCP servers (rolling update with session drain, and blue-green with traffic switch) and the implementation details for each: drain handler, Kubernetes configuration, PM2 reload sequence, and post-deploy verification.
TL;DR
Implement a SIGTERM handler that (1) marks the server as draining so the health check returns 503 and no new sessions arrive, (2) waits for active SSE sessions to close naturally, and (3) calls process.exit(0) once every session has closed or a configurable drain timeout expires. For Kubernetes rolling updates, set maxUnavailable: 0, maxSurge: 1, and terminationGracePeriodSeconds above the preStop delay plus your drain timeout. For blue-green, bring the new environment to healthy before switching the load balancer upstream. The switch itself interrupts no sessions, because no pod with active sessions is stopped until after traffic has moved.
Why REST deployments are easier than MCP deployments
A REST API processes each HTTP request independently. A request that arrives at the old server before SIGTERM completes normally. Requests that arrive after the load balancer switches upstream go to the new server. There is no per-session state to preserve — each request carries its own authentication and context.
An MCP server using HTTP/SSE transport is different in four ways:
| Property | REST API | MCP server (HTTP/SSE) |
|---|---|---|
| Connection lifetime | One request, one response | One SSE connection per session (minutes to hours) |
| Session state | Stateless per request | Session state bound to a specific server process |
| Interrupted request cost | Client retries one request | Client must re-initialize entire session |
| Deploy impact | Sub-second gap for in-flight requests | All sessions on old server are disrupted simultaneously |
The gap between "deploy starts" and "old server exits" is the window during which active sessions are at risk. The two strategies below close that window in different ways.
The session drain handler
Both rolling update and blue-green strategies require the old server to drain before exiting. The drain handler is the core primitive.
```typescript
import type { Server as HttpServer } from 'node:http';

// Track active SSE sessions. Each session registers itself on connect and deregisters on close.
const activeSessions = new Map<string, { close: () => void }>();
let serverState: 'starting' | 'ready' | 'draining' | 'stopped' = 'starting';

// Call once the HTTP server is listening; until then /health reports 'starting'
export function markReady(): void {
  if (serverState === 'starting') serverState = 'ready';
}

export function registerSession(id: string, closeSession: () => void): () => void {
  activeSessions.set(id, { close: closeSession });
  return () => activeSessions.delete(id);
}

// /health returns 503 while draining so the load balancer stops routing new sessions here
export function getHealthStatus() {
  if (serverState === 'draining' || serverState === 'stopped') {
    return { code: 503, body: { status: serverState, active_sessions: activeSessions.size } };
  }
  if (serverState !== 'ready') {
    return { code: 503, body: { status: 'starting' } };
  }
  return { code: 200, body: { status: 'ok', active_sessions: activeSessions.size } };
}

async function drain(signal: string, httpServer: HttpServer) {
  console.log({ signal, sessions: activeSessions.size }, 'Drain started');
  serverState = 'draining';

  // Stop accepting new TCP connections; SSE connections that are already open stay open.
  // New health probes now fail to connect, which load balancers treat as unhealthy
  // just like a 503, so this instance leaves the rotation before sessions are affected.
  httpServer.close();

  const drainTimeoutMs = parseInt(process.env.DRAIN_TIMEOUT_MS ?? '25000', 10);
  const pollIntervalMs = 500;
  const deadline = Date.now() + drainTimeoutMs;

  // Wait for sessions to close naturally (clients finish their work and disconnect)
  while (activeSessions.size > 0 && Date.now() < deadline) {
    console.log({ remaining: activeSessions.size, ms_left: deadline - Date.now() }, 'Draining');
    await new Promise(r => setTimeout(r, pollIntervalMs));
  }

  // Anything still open is force-closed; those clients see a drop and reconnect elsewhere
  if (activeSessions.size > 0) {
    console.warn({ remaining: activeSessions.size }, 'Drain timeout: force-closing remaining sessions');
    for (const session of activeSessions.values()) {
      session.close();
    }
  }

  serverState = 'stopped';
  console.log('Drain complete, exiting');
  process.exit(0);
}

// Register for both signals: SIGTERM from Docker/Kubernetes/systemd, SIGINT from PM2
let draining = false;

export function installDrainHandlers(httpServer: HttpServer) {
  const onSignal = (signal: string) => {
    if (draining) return; // prevent double-drain if both signals fire
    draining = true;
    drain(signal, httpServer).catch((err) => {
      console.error(err, 'Drain failed');
      process.exit(1);
    });
  };
  process.on('SIGTERM', () => onSignal('SIGTERM'));
  process.on('SIGINT', () => onSignal('SIGINT'));
}
```
The critical detail is that httpServer.close() stops the server from accepting new connections immediately, but does not close SSE connections that are already streaming. Once close() has been called, new health probes fail (connection refused on a fresh connection, or 503 if the probe still reaches the handler), and the load balancer removes this instance from its pool within a few health check intervals (typically 5–15 seconds). Active SSE sessions remain connected throughout this window. See MCP server graceful shutdown for the per-session close implementation.
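As a rough sketch of how a transport layer might plug into the session registry, the handler below registers each SSE response on connect and deregisters it when the client disconnects. The `handleSse` name and the inlined copy of `registerSession` are assumptions for illustration; a real server would hand the response to its MCP transport rather than write a comment frame.

```typescript
import type { IncomingMessage, ServerResponse } from 'node:http';
import { randomUUID } from 'node:crypto';

// Inlined copy of the registry from the drain handler so the sketch is self-contained
const activeSessions = new Map<string, { close: () => void }>();

function registerSession(id: string, closeSession: () => void): () => void {
  activeSessions.set(id, { close: closeSession });
  return () => activeSessions.delete(id);
}

// Hypothetical SSE handler: one registry entry per open event stream
function handleSse(req: IncomingMessage, res: ServerResponse): string {
  const id = randomUUID();
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    Connection: 'keep-alive',
  });
  // SSE comment frame; a real server would attach its MCP transport here
  res.write(': connected\n\n');

  // Force-close path: the drain handler calls this after the drain timeout
  const deregister = registerSession(id, () => res.end());
  // Natural-close path: the client disconnected, so stop counting this session
  req.on('close', deregister);
  return id;
}
```

Route your SSE path to `handleSse` from the HTTP server's request listener. Because deregistration hangs off the request's `close` event, a session leaves the registry whether the client disconnects on its own or the drain handler force-closes it.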
Strategy 1: rolling update
Rolling update deploys the new version incrementally: bring up one new pod, wait for it to pass health checks, then terminate one old pod. Repeat until all pods are replaced. The key is that the old pod is not terminated until the new pod is ready and the load balancer has started routing new sessions to it.
```yaml
# Kubernetes Deployment: rolling update strategy for MCP servers
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-server
spec:
  replicas: 3
  selector:                 # required by apps/v1; must match the template labels
    matchLabels:
      app: mcp-server
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0     # Never reduce below 3 healthy pods during the update
      maxSurge: 1           # Allow one extra pod (4 total) during the update window
  template:
    metadata:
      labels:
        app: mcp-server
    spec:
      # Must be longer than preStop sleep + DRAIN_TIMEOUT_MS + load balancer deregistration delay
      terminationGracePeriodSeconds: 60
      containers:
        - name: mcp-server
          image: your-registry/mcp-server:v2
          env:
            - name: DRAIN_TIMEOUT_MS
              value: "25000"
          # Readiness probe: fails when draining, so the load balancer stops routing here
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 3
          # Liveness probe: only fails if the process is deadlocked, not just draining
          livenessProbe:
            httpGet:
              path: /healthz   # separate endpoint that returns 200 while draining
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 10
            failureThreshold: 6
          lifecycle:
            preStop:
              # Sleep briefly before SIGTERM is delivered so the pod can be removed
              # from the endpoint list before the drain starts.
              # Without this, the LB may send new requests to a terminating pod.
              exec:
                command: ["/bin/sh", "-c", "sleep 5"]
```
The preStop sleep is important. When a pod is deleted, Kubernetes starts the termination sequence and removes the pod from the Service endpoints in parallel, so there is a race in which the load balancer may still route new traffic to a pod that has already received SIGTERM. Because SIGTERM is only sent after the preStop hook finishes, the 5-second sleep gives the endpoint update time to propagate before the drain begins. It makes the race unlikely rather than impossible. During these 5 seconds, the pod continues serving normally.
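The grace period countdown starts before the preStop hook runs, so the preStop sleep and the drain timeout both have to fit inside terminationGracePeriodSeconds. The sketch below checks that budget for the manifest values; `headroomSeconds` is a hypothetical helper and `exitMarginSeconds` is an assumed allowance for force-closing sessions and exiting.

```typescript
interface ShutdownBudget {
  gracePeriodSeconds: number; // terminationGracePeriodSeconds
  preStopSeconds: number;     // length of the preStop sleep
  drainTimeoutMs: number;     // DRAIN_TIMEOUT_MS
  exitMarginSeconds: number;  // assumed allowance for force-close and process exit
}

// Seconds left before Kubernetes sends SIGKILL; negative means the drain can be cut short
function headroomSeconds(b: ShutdownBudget): number {
  const needed = b.preStopSeconds + b.drainTimeoutMs / 1000 + b.exitMarginSeconds;
  return b.gracePeriodSeconds - needed;
}

// Values from the manifest above, plus an assumed 5-second exit margin
const manifest: ShutdownBudget = {
  gracePeriodSeconds: 60,
  preStopSeconds: 5,
  drainTimeoutMs: 25_000,
  exitMarginSeconds: 5,
};

console.log(headroomSeconds(manifest)); // 25
```

If you raise DRAIN_TIMEOUT_MS, raise terminationGracePeriodSeconds with it; a negative headroom means the pod can be killed before the drain handler finishes.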
Strategy 2: blue-green deployment
Blue-green maintains two environments — blue (current live) and green (new version). The traffic switch happens at the load balancer level after green is fully healthy. No pod with active sessions is stopped until after the switch — all sessions migrate naturally as clients reconnect after the switch completes.
```bash
#!/usr/bin/env bash
set -euo pipefail

# Blue-green with nginx upstream swap
# Blue environment: pods labelled version=blue, port 3000
# Green environment: pods labelled version=green, port 3001

# Step 1: Deploy green environment and wait for health
kubectl apply -f deployment-green.yaml
kubectl rollout status deployment/mcp-server-green

# Step 2: Verify green reports healthy (jq -e exits non-zero unless status is "ok")
curl -sf http://green-internal:3001/health | jq -e '.status == "ok"'

# Step 3: Run MCP initialize smoke test against green
# (the script reads the endpoint from its first positional argument)
node scripts/mcp-smoke-test.js http://green-internal:3001

# Step 4: Switch nginx upstream from blue to green
# Update the nginx config to point the upstream at green pods, then reload nginx.
# This assumes your nginx config is rendered from the upstream-target annotation.
kubectl annotate configmap nginx-config \
  upstream-target=green --overwrite
kubectl rollout restart deployment/nginx

# Step 5: Wait for nginx to pick up the new config
kubectl rollout status deployment/nginx

# Step 6: Verify AliveMCP sees the new version as healthy
# (AliveMCP will probe within 60s of the config change)

# Step 7: Drain and scale down blue (now receives no new traffic)
# Give sessions on blue pods time to close naturally before scaling down;
# each pod then runs its drain handler for up to DRAIN_TIMEOUT_MS
sleep 30
kubectl scale deployment/mcp-server-blue --replicas=0
```
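Blue-green has higher infrastructure cost during the transition (two full environments running simultaneously), but the traffic switch itself interrupts no sessions: existing sessions stay on blue and new sessions go to green. Sessions are only at risk when blue is scaled down. The script above waits 30 seconds and then scales blue to zero, at which point each blue pod runs its drain handler, so sessions that outlive the wait plus DRAIN_TIMEOUT_MS are still force-closed. For zero disruption with long sessions, delay the scale-down until blue reports no active sessions. The trade-off versus rolling update is cost (double infrastructure for the transition window) against user impact (none, versus possible session interruption during drain).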
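The script above waits a fixed 30 seconds before scaling blue down. One alternative is to poll blue's /health endpoint, which reports `active_sessions`, and proceed once it reaches zero or a wait budget runs out. This is a sketch under assumptions: `readyToScaleDown` and `waitForBlueToEmpty` are hypothetical helpers, the health URL is a placeholder, and the global `fetch` requires Node 18 or later.

```typescript
interface HealthBody {
  status: string;
  active_sessions?: number;
}

// Pure decision: scale down once no sessions remain or the wait budget is spent
function readyToScaleDown(health: HealthBody, elapsedMs: number, maxWaitMs: number): boolean {
  if (elapsedMs >= maxWaitMs) return true;
  return health.active_sessions === 0;
}

// Polls the health endpoint until readyToScaleDown agrees.
// Resolves with the number of sessions still open at that point.
async function waitForBlueToEmpty(
  healthUrl: string,
  maxWaitMs: number,
  pollMs = 5_000,
): Promise<number> {
  const started = Date.now();
  for (;;) {
    const res = await fetch(healthUrl);
    const body = (await res.json()) as HealthBody;
    if (readyToScaleDown(body, Date.now() - started, maxWaitMs)) {
      return body.active_sessions ?? 0;
    }
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
}
```

A deploy script might call `await waitForBlueToEmpty('http://blue-internal:3000/health', 15 * 60_000)` before the final `kubectl scale`. Any sessions still open when the budget expires fall back to the drain handler's timeout.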
Post-deploy verification gate
Both strategies benefit from an automated verification step that runs after the deploy and rolls back if the new version fails. The verification must go beyond an HTTP 200 on /health: it should validate the MCP protocol itself.
```javascript
// scripts/mcp-smoke-test.js
// Runs after deploy; exits non-zero if the new server fails protocol validation
// (ES module syntax: needs "type": "module" in package.json or an .mjs extension)
import fs from 'node:fs';
import { createHash } from 'node:crypto';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { SSEClientTransport } from '@modelcontextprotocol/sdk/client/sse.js';

const endpoint = process.argv[2] ?? 'https://your-mcp-server.example.com';

async function verify() {
  const transport = new SSEClientTransport(new URL(`${endpoint}/sse`));
  const client = new Client({ name: 'smoke-test', version: '1.0.0' }, {});
  try {
    // connect() performs the MCP initialize handshake and rejects if it fails
    await client.connect(transport);

    // getServerVersion() returns the serverInfo (name, version) from the initialize result
    const serverInfo = client.getServerVersion();
    if (!serverInfo?.name || !serverInfo?.version) {
      throw new Error('Missing name or version in server info');
    }

    // Verify tools are registered
    const { tools } = await client.listTools();
    if (tools.length === 0) {
      throw new Error('Server returned zero tools, expected at least one');
    }

    // Load the pre-deploy tool schema snapshot and compare hashes
    const snapshot = JSON.parse(fs.readFileSync('tools-snapshot.json', 'utf8'));
    const currentHash = hashTools(tools);
    if (currentHash !== snapshot.hash) {
      // A changed schema is expected on intentional updates. This sketch only warns;
      // throw here instead if unexpected changes should fail the gate.
      console.warn('Tool schema changed: review tools-snapshot.json diff');
    }

    console.log(`OK: ${serverInfo.name} ${serverInfo.version}, ${tools.length} tools`);
    await client.close();
    process.exit(0);
  } catch (err) {
    console.error('Smoke test FAILED:', err.message);
    await client.close().catch(() => {});
    process.exit(1);
  }
}

// Hashes tool names and descriptions only, sorted by name so ordering does not matter
function hashTools(tools) {
  const sorted = tools.map(t => ({ name: t.name, description: t.description }))
    .sort((a, b) => a.name.localeCompare(b.name));
  return createHash('sha256').update(JSON.stringify(sorted)).digest('hex');
}

verify();
```
Run this smoke test in your CI/CD pipeline after every deploy. If it fails, trigger the rollback immediately — before AliveMCP detects the failure from outside. See MCP server CI/CD for the full GitHub Actions pipeline with automatic rollback on smoke test failure.
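The smoke test reads tools-snapshot.json, but how that file is produced is not shown. A minimal sketch follows, assuming the snapshot is written before the deploy from the currently live tool list. `buildSnapshot`, `writeSnapshot`, and the extra `tools` field are hypothetical; only the `hash` field is read by the smoke test, and the hashing mirrors its `hashTools` helper.

```typescript
import { createHash } from 'node:crypto';
import { writeFileSync } from 'node:fs';

interface ToolSummary {
  name: string;
  description?: string;
}

// Same hashing as the smoke test: name and description, sorted by name
function hashTools(tools: ToolSummary[]): string {
  const sorted = tools
    .map((t) => ({ name: t.name, description: t.description }))
    .sort((a, b) => a.name.localeCompare(b.name));
  return createHash('sha256').update(JSON.stringify(sorted)).digest('hex');
}

// Snapshot shape is an assumption: the hash plus tool names for readable diffs
function buildSnapshot(tools: ToolSummary[]): { hash: string; tools: string[] } {
  return { hash: hashTools(tools), tools: tools.map((t) => t.name).sort() };
}

// Writes the snapshot where the smoke test expects to find it
function writeSnapshot(tools: ToolSummary[], path = 'tools-snapshot.json'): void {
  writeFileSync(path, JSON.stringify(buildSnapshot(tools), null, 2) + '\n');
}
```

Committing the snapshot alongside the code means an intentional tool change shows up as a reviewed diff, while an accidental one surfaces as a hash mismatch in the gate.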
How AliveMCP observes deploy events
AliveMCP probes your MCP endpoint every 60 seconds. A rolling update that takes 3 minutes to complete will be sampled 3 times during the deploy. During a well-implemented rolling update:
- Probes during the update hit healthy pods (old or new) — no downtime recorded
- Probes that arrive while an old pod is draining are served by a ready pod (the draining pod fails its health check and is removed from rotation before its sessions drain)
- After the update, AliveMCP confirms the new version is healthy within one probe window (60 seconds)
A misconfigured rolling update, for example maxUnavailable: 1 on a single-replica deployment, which allows the only pod to be terminated before its replacement is ready, shows as downtime in AliveMCP even if the deploy finishes successfully. AliveMCP's 90-day uptime history makes accidental deploy-caused downtime visible in retrospect. See MCP server uptime monitoring for the probe sequence details.
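Whether a rolling update can cause downtime follows from simple arithmetic on replicas, maxUnavailable, and maxSurge. The sketch below computes the bounds; `rolloutBounds` is a hypothetical helper, and it relies on Kubernetes rounding percentage values down for maxUnavailable and up for maxSurge.

```typescript
type IntOrPercent = number | `${number}%`;

// Convert an absolute or percentage setting into a pod count
function resolve(value: IntOrPercent, replicas: number, roundUp: boolean): number {
  if (typeof value === 'number') return value;
  const pods = (parseFloat(value) / 100) * replicas;
  return roundUp ? Math.ceil(pods) : Math.floor(pods);
}

// Fewest available pods and most total pods at any point during the rollout
function rolloutBounds(replicas: number, maxUnavailable: IntOrPercent, maxSurge: IntOrPercent) {
  return {
    minAvailable: replicas - resolve(maxUnavailable, replicas, false),
    maxTotal: replicas + resolve(maxSurge, replicas, true),
  };
}

console.log(rolloutBounds(3, 0, 1)); // { minAvailable: 3, maxTotal: 4 }
console.log(rolloutBounds(1, 1, 0)); // { minAvailable: 0, maxTotal: 1 }
```

A `minAvailable` of 0 means there can be a moment with no ready pod at all, which is the situation an external probe records as downtime.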
Related questions
What drain timeout should I use?
Ideally the drain timeout is longer than your longest expected MCP session, but that is rarely practical. Typical MCP sessions last 1–15 minutes (the duration of a single agent task), while a drain timeout of 20–30 seconds only covers short-lived agent interactions, so sessions longer than the drain window are interrupted. If your MCP server supports long-running autonomous agent sessions that may persist for hours, you have three options: (1) a longer drain timeout (not recommended, because it keeps old pods running for hours), (2) signal clients to gracefully migrate their sessions before the deploy, or (3) use blue-green so no sessions are interrupted. Most production MCP servers use 20–30 second drain timeouts and accept that trade-off.
How do I prevent new sessions from arriving during the drain window?
The mechanism is the readiness probe. When the drain starts, the server returns 503 from its health endpoint. The load balancer polls the health endpoint on its own schedule (every 5–30 seconds). There is a gap between SIGTERM arrival and the load balancer detecting the 503. The preStop lifecycle hook (a 5–10 second sleep before SIGTERM) reduces this gap. In Kubernetes, the preStop hook runs before SIGTERM, giving the endpoint controller time to remove the pod from the service endpoint list before the drain begins. Without preStop, new connections may arrive during the drain window and be rejected.
Does blue-green double my infrastructure cost?
Only during the transition window (typically 5–30 minutes per deploy). After the traffic switch and drain, the old environment is scaled down. For a cloud deployment where you pay per-second, the incremental cost of a 30-minute blue-green window on a typical MCP server is less than $1. For a bare-metal VPS, blue-green requires provisioning two VPS instances, which may not be practical. In that case, use PM2 reload with a drain handler (pm2 reload + wait_ready: true) — it provides near-zero-downtime on a single server. See MCP server PM2 for the pm2 reload configuration.
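For the single-server case, the pieces that matter are PM2's `wait_ready`, `listen_timeout`, and `kill_timeout` options and the `process.send('ready')` call that `wait_ready` waits for. The sketch below is a typed illustration only: the app name, script path, and instance count are assumptions, and in a real ecosystem.config.js the object would sit inside `module.exports = { apps: [...] }`.

```typescript
const drainTimeoutMs = 25_000;

// One PM2 app entry, written as a plain object for illustration
const pm2App = {
  name: 'mcp-server',          // assumed app name
  script: 'dist/server.js',    // assumed entry point
  instances: 2,                // assumed; reload replaces processes one at a time
  exec_mode: 'cluster',
  wait_ready: true,            // wait for process.send('ready') before retiring the old process
  listen_timeout: 10_000,      // ms PM2 waits for the ready message
  kill_timeout: drainTimeoutMs + 5_000, // ms between the stop signal and SIGKILL
  env: { DRAIN_TIMEOUT_MS: String(drainTimeoutMs) },
};

// Call once the HTTP server is listening. Returns false when there is no
// IPC channel, for example when the process was not started by PM2.
function notifyReady(proc: { send?: (message: string) => unknown } = process): boolean {
  if (typeof proc.send !== 'function') return false;
  proc.send('ready');
  return true;
}
```

Keeping `kill_timeout` above the drain timeout matters for the same reason terminationGracePeriodSeconds does in Kubernetes: otherwise PM2 kills the old process before the drain handler has finished.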
Should I use Kubernetes rolling update or blue-green for MCP servers?
For most MCP servers with session lifetimes under 5 minutes: rolling update with maxUnavailable: 0 and a 30-second drain window. The session interruption risk is low (clients whose sessions end naturally during the drain window simply open their next session on a new pod), and rolling update is operationally simpler. For MCP servers with long-lived sessions (agent workflows that run for 30+ minutes) or strict SLAs: blue-green. The switch interrupts no sessions; the trade-off is briefly doubling infrastructure cost and the added complexity of the traffic switch step.
Further reading
- MCP server graceful shutdown — per-session close implementation
- MCP server Kubernetes — full Deployment manifest with PodDisruptionBudget
- MCP server PM2 — pm2 reload with wait_ready for single-server zero-downtime
- MCP server CI/CD — automated deploy pipeline with rollback on smoke test failure
- MCP server Docker — container build with tini and SIGTERM handling
- MCP server health check — readiness vs. liveness probe design
- AliveMCP — uptime monitoring that detects deploy-caused downtime within 60 seconds