MCP server zero-downtime deployment

Deploying an MCP server without downtime is harder than deploying a stateless REST API. SSE transport creates long-lived connections: a client connects once, the MCP initialize handshake establishes session state on the server, and subsequent tool calls flow through that persistent connection. When you deploy a new version and the old process receives SIGTERM, it must not immediately exit — doing so terminates every active session simultaneously. The client sees a connection drop, must reconnect to the new server, and must restart the session from scratch. This guide covers the two zero-downtime strategies for MCP servers (rolling update with session drain, and blue-green with traffic switch) and the implementation details for each: drain handler, Kubernetes configuration, PM2 reload sequence, and post-deploy verification.

TL;DR

Implement a SIGTERM handler that (1) marks the server as draining (the health check returns 503 so no new sessions arrive), (2) waits for active SSE sessions to close naturally, and (3) calls process.exit(0) once they have closed or a configurable drain timeout expires. For Kubernetes rolling updates, set maxUnavailable: 0, maxSurge: 1, and terminationGracePeriodSeconds above your drain timeout plus any preStop delay. For blue-green, bring the new environment to healthy before switching the load balancer upstream; sessions are not interrupted as long as the old environment keeps running until its active sessions have closed.

Why REST deployments are easier than MCP deployments

A REST API processes each HTTP request independently. A request that arrives at the old server before SIGTERM completes normally. Requests that arrive after the load balancer switches upstream go to the new server. There is no per-session state to preserve — each request carries its own authentication and context.

An MCP server using HTTP/SSE transport is different in four ways:

Property	REST API	MCP server (HTTP/SSE)
Connection lifetime	One request, one response	One SSE connection per session (minutes to hours)
Session state	Stateless per request	Session state bound to a specific server process
Interrupted request cost	Client retries one request	Client must re-initialize entire session
Deploy impact	Sub-second gap for in-flight requests	All sessions on old server are disrupted simultaneously

The gap between "deploy starts" and "old server exits" is the window during which active sessions are at risk. The two strategies below close that window in different ways.

The session drain handler

Both rolling update and blue-green strategies require the old server to drain before exiting. The drain handler is the core primitive.

import { Server as HttpServer } from 'node:http';

// Track active SSE sessions — each session registers itself on connect and deregisters on close
const activeSessions = new Map<string, { close: () => void }>();
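let serverState: 'starting' | 'ready' | 'draining' | 'stopped' = 'starting';

// Call once the HTTP listener is up and tools are registered; until then /health returns 503
export function markReady(): void {
  if (serverState === 'starting') serverState = 'ready';
}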

export function registerSession(id: string, closeSession: () => void): () => void {
  activeSessions.set(id, { close: closeSession });
  return () => activeSessions.delete(id);
}

// /health returns 503 while draining so the load balancer stops routing new sessions here
export function getHealthStatus() {
  if (serverState === 'draining' || serverState === 'stopped') {
    return { code: 503, body: { status: serverState, active_sessions: activeSessions.size } };
  }
  if (serverState !== 'ready') {
    return { code: 503, body: { status: 'starting' } };
  }
  return { code: 200, body: { status: 'ok', active_sessions: activeSessions.size } };
}

async function drain(signal: string, httpServer: HttpServer) {
  console.log({ signal, sessions: activeSessions.size }, 'Drain started');
  serverState = 'draining';

  // Stop accepting new TCP connections. From here on, health-check connections are
  // refused (or get 503 on an already-open connection), so the load balancer removes
  // this instance from rotation. Existing SSE connections stay open.
  httpServer.close();

  const drainTimeoutMs  = parseInt(process.env.DRAIN_TIMEOUT_MS ?? '25000', 10);
  const pollIntervalMs  = 500;
  const deadline        = Date.now() + drainTimeoutMs;

  // Wait for sessions to close naturally; clients that reconnect land on another instance
  while (activeSessions.size > 0 && Date.now() < deadline) {
    console.log({ remaining: activeSessions.size, ms_left: deadline - Date.now() }, 'Draining');
    await new Promise(r => setTimeout(r, pollIntervalMs));
  }

  if (activeSessions.size > 0) {
    console.warn({ remaining: activeSessions.size }, 'Drain timeout — force-closing remaining sessions');
    for (const [id, session] of activeSessions) {
      session.close();
    }
  }

  serverState = 'stopped';
  console.log('Drain complete — exiting');
  process.exit(0);
}

// Register for both signals: SIGTERM from Docker/Kubernetes/systemd; SIGINT from PM2
let draining = false;

export function installDrainHandlers(httpServer: HttpServer) {
  const onSignal = (signal: string) => {
    if (draining) return;  // prevent double-drain if both signals fire
    draining = true;
    // drain() is async; surface unexpected failures instead of leaving a floating promise
    drain(signal, httpServer).catch((err) => {
      console.error(err, 'Drain failed');
      process.exit(1);
    });
  };
  process.on('SIGTERM', () => onSignal('SIGTERM'));
  process.on('SIGINT',  () => onSignal('SIGINT'));
}

The critical detail is that httpServer.close() stops the server from accepting new connections immediately but does not close existing SSE connections. From that point, new health-check connections are refused, and a check that arrives on an already-open connection receives 503 from getHealthStatus(). Either way the check fails, and the load balancer removes this instance from its pool after its configured number of failed checks (typically 5–15 seconds). Active SSE sessions remain connected throughout this window. See MCP server graceful shutdown for the per-session close implementation.
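
The drain handler only works if every SSE connection registers itself and deregisters when its stream closes. As a minimal sketch of that wiring, assuming the response object emits `close` when the stream ends (as Node's HTTP response does), a session could be tracked like this. The names `trackSession`, `trackedSessions`, and `Closable` are hypothetical and not part of any SDK.

```typescript
// Hypothetical minimal shape of an SSE response: it can be ended and it emits 'close'.
type Closable = {
  once(event: 'close', listener: () => void): unknown;
  end(): unknown;
};

const trackedSessions = new Map<string, { close: () => void }>();

// Register a session when its SSE stream opens and remove it when the stream closes,
// whether the client disconnected or the drain handler force-closed it.
function trackSession(id: string, res: Closable): void {
  trackedSessions.set(id, { close: () => res.end() });
  res.once('close', () => trackedSessions.delete(id));
}
```

Deregistering from the `close` event rather than from the force-close path means the session count drops on its own as clients disconnect, which is what the drain loop polls for.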

Strategy 1: rolling update

Rolling update deploys the new version incrementally: bring up one new pod, wait for it to pass health checks, then terminate one old pod. Repeat until all pods are replaced. The key is that the old pod is not terminated until the new pod is ready and the load balancer has started routing new sessions to it.

# Kubernetes Deployment — rolling update strategy for MCP servers
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-server
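spec:
  replicas: 3
  selector:
    matchLabels:
      app: mcp-server     # assumed label; must match the pod template labels below
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # Never reduce below 3 healthy pods during the update
      maxSurge: 1         # Allow one extra pod (4 total) during the update window
  template:
    metadata:
      labels:
        app: mcp-server
    spec: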
      # Must be longer than DRAIN_TIMEOUT_MS + load balancer deregistration delay
      terminationGracePeriodSeconds: 60

      containers:
      - name: mcp-server
        image: your-registry/mcp-server:v2
        env:
        - name: DRAIN_TIMEOUT_MS
          value: "25000"

        # Readiness probe: returns 503 when draining — load balancer stops routing here
        readinessProbe:
          httpGet:
            path: /health
            port: 3000
          initialDelaySeconds: 5
          periodSeconds: 5
          failureThreshold: 3

        # Liveness probe: only fails if the process is deadlocked, not just draining
        livenessProbe:
          httpGet:
            path: /healthz    # separate endpoint that returns 200 while draining
            port: 3000
          initialDelaySeconds: 10
          periodSeconds: 10
          failureThreshold: 6

        lifecycle:
          preStop:
            # Add a small sleep before SIGTERM to give the load balancer time
            # to see the pod as NotReady before connections are dropped.
            # Without this, the LB may send new requests to a terminating pod.
            exec:
              command: ["/bin/sh", "-c", "sleep 5"]
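
The manifest points the readiness probe at /health and the liveness probe at /healthz. A minimal sketch of how the two endpoints could differ follows. `probeResponse` and `ServerState` are illustrative names, the state values mirror the drain handler, and the sketch assumes the process is still answering probe connections.

```typescript
type ServerState = 'starting' | 'ready' | 'draining' | 'stopped';

// Readiness (/health): 200 only when the server should receive new sessions.
// Liveness (/healthz): 200 whenever the process can still answer, including while
// draining, so a pod that is draining normally is not restarted.
function probeResponse(path: string, state: ServerState): number {
  if (path === '/healthz') return state === 'stopped' ? 503 : 200;
  if (path === '/health') return state === 'ready' ? 200 : 503;
  return 404;
}
```

Keeping the two answers separate is what lets a draining pod leave the load balancer rotation without being treated as dead.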

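The preStop sleep is important. When a pod is deleted, Kubernetes starts the termination sequence and removes the pod from the Service endpoint list concurrently, so there is a race in which the load balancer may still route new traffic to a pod that has already received SIGTERM. The preStop hook runs before SIGTERM is sent, so the 5-second sleep gives the endpoint update time to propagate before the drain begins. During those 5 seconds the pod continues serving normally. The sleep counts against terminationGracePeriodSeconds, so the grace period must cover the preStop sleep plus the drain timeout (here 5 s + 25 s, well under 60 s).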
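
The timing relationship between the preStop sleep, the drain timeout, and the grace period can be checked with simple arithmetic. The sketch below is illustrative; `graceMarginSeconds` and `ShutdownBudget` are hypothetical names, and the numbers are the ones used in the manifest above.

```typescript
interface ShutdownBudget {
  preStopSeconds: number;      // duration of the preStop sleep
  drainTimeoutMs: number;      // DRAIN_TIMEOUT_MS
  gracePeriodSeconds: number;  // terminationGracePeriodSeconds
}

// Seconds left between the end of the drain and the point where the pod is killed.
// A negative result means the pod may be killed while sessions are still draining.
function graceMarginSeconds(b: ShutdownBudget): number {
  return b.gracePeriodSeconds - (b.preStopSeconds + b.drainTimeoutMs / 1000);
}

const margin = graceMarginSeconds({ preStopSeconds: 5, drainTimeoutMs: 25000, gracePeriodSeconds: 60 });
console.log(margin); // 30
```

A check like this could run in CI against the deployment values so that raising the drain timeout without raising the grace period is caught before it reaches production.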

Strategy 2: blue-green deployment

Blue-green maintains two environments — blue (current live) and green (new version). The traffic switch happens at the load balancer level after green is fully healthy. No pod with active sessions is stopped until after the switch — all sessions migrate naturally as clients reconnect after the switch completes.

# Blue-green with nginx upstream swap
# Blue environment: pods labelled version=blue, port 3000
# Green environment: pods labelled version=green, port 3001

# Step 1: Deploy green environment and wait for health
kubectl apply -f deployment-green.yaml
kubectl rollout status deployment/mcp-server-green

# Step 2: Verify green's health endpoint reports "ok" (jq -e exits non-zero otherwise)
curl -s http://green-internal:3001/health | jq -e '.status == "ok"'

# Step 3: Run MCP initialize smoke test against green
# The script reads the endpoint from its first positional argument
node scripts/mcp-smoke-test.js http://green-internal:3001

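# Step 4: Switch nginx upstream from blue to green
# Update the nginx config to point the upstream at green pods, then reload nginx.
# Caution: restarting the nginx pods closes any SSE connections proxied through them.
# Where the setup allows it, a graceful config reload (nginx -s reload) lets old
# worker processes keep serving their existing connections.
kubectl annotate configmap nginx-config \
  upstream-target=green --overwrite
kubectl rollout restart deployment/nginx

# Step 5: Wait for nginx to pick up the new config
kubectl rollout status deployment/nginx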

# Step 6: Verify AliveMCP sees the new version as healthy
# (AliveMCP will probe within 60s of the config change)

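# Step 7: Drain and scale down blue (now receives no new traffic)
# Give sessions on blue time to close naturally. Scaling to zero sends SIGTERM, which
# starts the drain handler; sessions still open after DRAIN_TIMEOUT_MS are force-closed.
# For long-lived sessions, wait until /health on blue reports active_sessions: 0
# instead of relying on a fixed sleep.
sleep 30
kubectl scale deployment/mcp-server-blue --replicas=0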

Blue-green has higher infrastructure cost during the transition (two full environments running simultaneously) but, provided blue stays up until its sessions have closed, no session disruption: sessions on blue pods continue undisturbed, and new sessions after the traffic switch go to green. If blue is scaled down while sessions are still open, those sessions are cut when the drain timeout expires, just as in a rolling update. The trade-off versus rolling update is cost (double infrastructure for the transition window) against user impact (no interruption versus possible interruption during drain).
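
Deciding when blue can be scaled down comes down to the active_sessions count that the health endpoint already reports. The following is a sketch of that decision for a single health sample, not a complete deploy script; `nextBlueAction` and `BlueAction` are hypothetical names, and the polling loop and HTTP call are left to the caller.

```typescript
type BlueAction = 'scale-down' | 'wait' | 'scale-down-forced';

// Decide what to do with the blue environment from one /health sample.
// activeSessions is the active_sessions field of the health response body.
function nextBlueAction(activeSessions: number, waitedMs: number, maxWaitMs: number): BlueAction {
  if (activeSessions === 0) return 'scale-down';  // nothing left to interrupt
  if (waitedMs < maxWaitMs) return 'wait';        // sessions still open, budget remains
  return 'scale-down-forced';                     // budget spent; open sessions will be cut
}
```

A deploy script could call this on every poll and log the forced case, so that interrupted sessions show up as an explicit decision rather than a side effect of a fixed sleep.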

Post-deploy verification gate

Both strategies benefit from an automated verification step that runs after the deploy and rolls back if the new version fails. The verification must go beyond an HTTP 200 on /health: it should validate the MCP protocol itself.

// scripts/mcp-smoke-test.js
// Runs after deploy; exits non-zero if the new server fails protocol validation

import fs from 'node:fs';
import { createHash } from 'node:crypto';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { SSEClientTransport } from '@modelcontextprotocol/sdk/client/sse.js';

const endpoint = process.argv[2] ?? 'https://your-mcp-server.example.com';

async function verify() {
  const transport = new SSEClientTransport(new URL(`${endpoint}/sse`));
  const client    = new Client({ name: 'smoke-test', version: '1.0.0' }, {});

  try {
    await client.connect(transport);

    // connect() performs the initialize handshake and rejects if it fails, so reaching
    // this point means a protocol version was negotiated. getServerVersion() returns
    // the server's reported name and version from that handshake.
    const serverInfo = client.getServerVersion();
    if (!serverInfo?.name || !serverInfo?.version) {
      throw new Error('Missing name or version in server info');
    }

    // Verify tools are registered
    const { tools } = await client.listTools();
    if (tools.length === 0) {
      throw new Error('Server returned zero tools, expected at least one');
    }

    // Load the pre-deploy tool schema snapshot and compare hashes
    const snapshot = JSON.parse(fs.readFileSync('tools-snapshot.json', 'utf8'));
    const currentHash  = hashTools(tools);
    if (currentHash !== snapshot.hash) {
      // A schema change is expected on intentional updates. This script only warns;
      // make it throw if unreviewed schema changes should fail the gate.
      console.warn('Tool schema changed: review tools-snapshot.json diff');
    }

    console.log(`OK: server ${serverInfo.name} ${serverInfo.version}, ${tools.length} tools`);
    await client.close();
    process.exit(0);
  } catch (err) {
    console.error('Smoke test FAILED:', err.message);
    await client.close().catch(() => {});
    process.exit(1);
  }
}

function hashTools(tools) {
  // Hash name, description, and input schema so any change to the tool contract alters the hash
  const sorted = tools.map(t => ({ name: t.name, description: t.description, inputSchema: t.inputSchema }))
    .sort((a, b) => a.name.localeCompare(b.name));
  return createHash('sha256').update(JSON.stringify(sorted)).digest('hex');
}

verify();

Run this smoke test in your CI/CD pipeline after every deploy. If it fails, trigger the rollback immediately — before AliveMCP detects the failure from outside. See MCP server CI/CD for the full GitHub Actions pipeline with automatic rollback on smoke test failure.
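
A hash mismatch says that something changed but not what. As a sketch of how a gate could report the difference, the helper below compares two tool lists by name and description. `diffTools`, `ToolSummary`, and `ToolDiff` are hypothetical names, and it assumes the snapshot stores the tool list itself rather than only a hash.

```typescript
interface ToolSummary { name: string; description?: string }
interface ToolDiff { added: string[]; removed: string[]; changed: string[] }

// Compare the pre-deploy tool list with the post-deploy one.
// "changed" lists tools present in both whose description differs.
function diffTools(before: ToolSummary[], after: ToolSummary[]): ToolDiff {
  const toEntry = (t: ToolSummary): [string, string] => [t.name, t.description ?? ''];
  const prev = new Map<string, string>(before.map(toEntry));
  const next = new Map<string, string>(after.map(toEntry));
  const nextNames = Array.from(next.keys());
  const prevNames = Array.from(prev.keys());
  return {
    added: nextNames.filter((n) => !prev.has(n)).sort(),
    removed: prevNames.filter((n) => !next.has(n)).sort(),
    changed: nextNames.filter((n) => prev.has(n) && prev.get(n) !== next.get(n)).sort(),
  };
}
```

A removed tool is usually the case worth failing the gate on, since clients that call it will break, while an added tool is typically safe.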

How AliveMCP observes deploy events

AliveMCP probes your MCP endpoint every 60 seconds, so a rolling update that takes 3 minutes to complete is sampled about three times. During a well-implemented rolling update:

Probes during the update hit healthy pods (old or new) — no downtime recorded
Probes that arrive while a pod is draining are routed to a healthy pod, because the draining pod fails its health check and is removed from rotation before its sessions finish draining
After the update, AliveMCP confirms the new version is healthy within one probe window (60 seconds)

A misconfigured rolling update shows as downtime in AliveMCP even if the deploy finishes successfully. One example is maxUnavailable: 1 on a single-replica deployment, which allows the only pod to be taken down before its replacement is ready. AliveMCP's 90-day uptime history makes accidental deploy-caused downtime visible in retrospect. See MCP server uptime monitoring for the probe sequence details.
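
Whether a rolling update configuration can reach zero available pods follows from the replica count and the two rollout parameters. The sketch below covers only the absolute-number form (Kubernetes also accepts percentages, which are ignored here); `rolloutBounds` and `RolloutBounds` are hypothetical names.

```typescript
interface RolloutBounds { minAvailable: number; maxTotal: number }

// Fewest pods guaranteed available and most pods that can exist during a rolling update.
function rolloutBounds(replicas: number, maxUnavailable: number, maxSurge: number): RolloutBounds {
  return {
    minAvailable: Math.max(0, replicas - maxUnavailable),
    maxTotal: replicas + maxSurge,
  };
}

console.log(rolloutBounds(3, 0, 1)); // { minAvailable: 3, maxTotal: 4 }
console.log(rolloutBounds(1, 1, 0)); // { minAvailable: 0, maxTotal: 1 }
```

Any configuration where minAvailable comes out as 0 permits a window with no pod serving traffic, which an external probe will record as downtime.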