MCP server graceful shutdown

Graceful shutdown is the difference between a deploy that silently drops active tool calls and one that waits for them to finish. MCP servers are stateful: each session starts with an initialize handshake and may have a tool call in flight when a shutdown signal arrives. A process that exits immediately on SIGTERM kills that tool call mid-execution, leaving the client with a broken connection. A server with graceful shutdown stops accepting new sessions, waits for active sessions to complete their current tool call, then exits cleanly. AliveMCP's probe starts a new session every 60 seconds. A probe that lands during a graceful shutdown can see the server go from up to a brief 503 (the health check reporting unhealthy) before the process exits, which is distinct from an unplanned crash.

TL;DR

The shutdown sequence is: (1) mark the health check unhealthy so the load balancer stops routing new sessions; (2) stop the HTTP listener and reject new initialize requests with 503; (3) wait up to DRAIN_TIMEOUT_MS for in-flight tool calls to finish; (4) close database connections and other resources; (5) exit. The container orchestrator's grace period, terminationGracePeriodSeconds (Kubernetes) or stop_grace_period (Docker Compose), must be longer than DRAIN_TIMEOUT_MS plus any preStop delay, or the process is killed with SIGKILL before the drain completes.

SIGTERM handler and shutdown sequence

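Outside a container, Node.js exits immediately on SIGTERM by default, without running any cleanup, so you must register a handler to drain first. Once a SIGTERM listener is registered, Node.js no longer exits on its own, which is why the handler below ends with process.exit(). In Docker, if your CMD is in shell form (CMD node dist/server.js), /bin/sh becomes PID 1 and Node.js runs as its child; sh typically does not forward SIGTERM, so Node.js may never receive it. Use exec form (CMD ["node", "dist/server.js"]) so Node.js is PID 1 and receives signals directly. As PID 1, a process with no handler ignores SIGTERM entirely, so without the handler the container simply waits for SIGKILL.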
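The default behavior is easy to observe. The sketch below is illustrative only and not part of the server: it assumes Node.js on Linux or macOS, and `runAndTerminate` and the inline child scripts are hypothetical names. It spawns two throwaway child processes, sends each SIGTERM, and reports how each one exited.

```typescript
// signal-demo.ts: compare how a Node.js child exits on SIGTERM with and without a handler.
// Illustrative only; assumes a POSIX platform (on Windows, kill() terminates unconditionally).
import { spawn } from 'node:child_process';

interface ExitInfo {
  code: number | null;           // set when the process exits on its own
  signal: NodeJS.Signals | null; // set when the process was terminated by a signal
}

// Builds the child script: optionally installs a SIGTERM handler, then prints
// "ready" and idles until it is stopped.
function childSource(withHandler: boolean): string {
  const handler = withHandler
    ? "process.on('SIGTERM', () => { setTimeout(() => process.exit(0), 50); });"
    : '';
  return `${handler} console.log('ready'); setInterval(() => {}, 1000);`;
}

function runAndTerminate(withHandler: boolean): Promise<ExitInfo> {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ['-e', childSource(withHandler)], {
      stdio: ['ignore', 'pipe', 'inherit'],
    });
    child.on('error', reject);
    // Signal only after the child printed "ready", i.e. after any handler is installed
    child.stdout.once('data', () => child.kill('SIGTERM'));
    child.on('exit', (code, signal) => resolve({ code, signal }));
  });
}
```

On Linux, `await runAndTerminate(false)` resolves with `signal: 'SIGTERM'` (the process was killed by the signal), while `await runAndTerminate(true)` resolves with `code: 0` because the handler ran and called `process.exit(0)` itself.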

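// server.ts: graceful shutdown for a stateless Streamable HTTP MCP server
import express from 'express';
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StreamableHTTPServerTransport } from '@modelcontextprotocol/sdk/server/streamableHttp.js';
import { db } from './db.js'; // your own DB client module; must expose destroy() (Knex does)

const DRAIN_TIMEOUT_MS = parseInt(process.env.DRAIN_TIMEOUT_MS ?? '30000', 10);

let isShuttingDown = false;
// One entry per in-flight POST /mcp request, e.g. a tools/call that is still running.
// A Set of transports (rather than a Map keyed by session ID) cannot collide when one
// session sends two requests concurrently. Stateful servers (sessionIdGenerator set)
// keep one transport per session instead; count their in-flight requests the same way.
const inFlight = new Set<StreamableHTTPServerTransport>();

const app = express();
app.use(express.json());

// Health check: returns 503 during shutdown so the load balancer drains traffic
app.get('/healthz', (req, res) => {
  if (isShuttingDown) {
    res.status(503).json({ status: 'shutting_down' });
    return;
  }
  res.json({ status: 'ok' });
});

app.post('/mcp', async (req, res) => {
  if (isShuttingDown) {
    // Reject new work, including new initialize requests
    res.status(503).json({ error: 'server is shutting down' });
    return;
  }
  // Stateless mode: a fresh server and transport per request, no session ID issued
  const server = new McpServer({ name: 'my-server', version: '1.0.0' });
  // ...register tools on `server` here...
  const transport = new StreamableHTTPServerTransport({ sessionIdGenerator: undefined });

  inFlight.add(transport);
  // 'close' fires when the response has been sent or the client disconnects
  res.on('close', () => {
    inFlight.delete(transport);
    transport.close().catch(() => {});
    server.close().catch(() => {});
  });

  try {
    await server.connect(transport);
    await transport.handleRequest(req, res, req.body);
  } catch (err) {
    console.error({ event: 'mcp_request_failed', err });
    if (!res.headersSent) {
      res.status(500).json({ jsonrpc: '2.0', error: { code: -32603, message: 'Internal server error' }, id: null });
    }
  }
});

const httpServer = app.listen(3001, () => {
  console.log({ event: 'server_started', port: 3001 });
});

async function shutdown(signal: NodeJS.Signals): Promise<void> {
  if (isShuttingDown) return; // ignore repeated signals
  // Step 1: /healthz and POST /mcp now answer 503
  isShuttingDown = true;
  console.log({ event: 'shutdown_started', signal, inFlight: inFlight.size });

  // Step 2: stop accepting new TCP connections. Health checks that open a new
  // connection now get a refused connection, which also fails the check; requests
  // arriving on existing keep-alive connections still reach the 503 checks above.
  httpServer.close();

  // Step 3: wait for in-flight requests to finish (up to DRAIN_TIMEOUT_MS)
  const deadline = Date.now() + DRAIN_TIMEOUT_MS;
  while (inFlight.size > 0 && Date.now() < deadline) {
    await new Promise(resolve => setTimeout(resolve, 200));
  }

  if (inFlight.size > 0) {
    console.warn({ event: 'drain_timeout', remaining: inFlight.size });
    // Force-close whatever is still running; those clients see the response end
    for (const transport of inFlight) {
      try { await transport.close(); } catch {}
    }
    inFlight.clear();
  }

  // Step 4: close DB connections and other resources
  await db.destroy();

  // Step 5: exit (required, because registering a SIGTERM listener disables the default exit)
  console.log({ event: 'shutdown_complete' });
  process.exit(0);
}

// A failed shutdown (e.g. db.destroy() rejecting) still exits, with a non-zero code
const onSignal = (signal: NodeJS.Signals) => {
  shutdown(signal).catch(err => {
    console.error({ event: 'shutdown_failed', err });
    process.exit(1);
  });
};
process.on('SIGTERM', onSignal);
process.on('SIGINT', onSignal);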
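The 200 ms polling loop in the drain step is simple and adds at most 200 ms after the last request finishes. If you prefer not to poll, a counter that wakes waiters when it reaches zero does the same job. `DrainTracker` below is a hypothetical helper, not part of the MCP SDK. In the server you would call `const end = tracker.begin()` where a request is registered, call `end()` from the `res.on('close')` handler, and replace the loop with `await tracker.waitForIdle(DRAIN_TIMEOUT_MS)`. Keep tracking the transports themselves for the force-close step, since the tracker only counts requests.

```typescript
// drain-tracker.ts: event-driven alternative to polling an in-flight collection.
// DrainTracker is a hypothetical helper, not part of the MCP SDK.
class DrainTracker {
  private active = 0;
  private waiters: Array<() => void> = [];

  /** Number of requests currently in flight. */
  get size(): number {
    return this.active;
  }

  /** Register a request; returns a callback to run once when the request ends. */
  begin(): () => void {
    this.active += 1;
    let ended = false;
    return () => {
      if (ended) return; // tolerate duplicate 'close' events for the same request
      ended = true;
      this.active -= 1;
      if (this.active === 0) {
        const toWake = this.waiters;
        this.waiters = [];
        for (const wake of toWake) wake();
      }
    };
  }

  /** Resolves true once nothing is in flight, or false if timeoutMs passes first. */
  waitForIdle(timeoutMs: number): Promise<boolean> {
    if (this.active === 0) return Promise.resolve(true);
    return new Promise<boolean>((resolve) => {
      const onIdle = (): void => {
        clearTimeout(timer);
        resolve(true);
      };
      const timer = setTimeout(() => {
        // Deadline hit: stop listening for idle and report the timeout
        this.waiters = this.waiters.filter((w) => w !== onIdle);
        resolve(false);
      }, timeoutMs);
      this.waiters.push(onIdle);
    });
  }
}
```

A `false` result from `waitForIdle` corresponds to the `drain_timeout` branch: log the remaining count and force-close what is left.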

Container orchestrator configuration

The orchestrator's termination grace period must be longer than your DRAIN_TIMEOUT_MS. If the orchestrator sends SIGKILL before the drain completes, your in-flight tool calls are killed regardless:

Platform	Setting	Set to at least
Kubernetes	`terminationGracePeriodSeconds` on Pod spec	`DRAIN_TIMEOUT_MS / 1000`s + preStop sleep + 5s buffer
Docker Compose	`stop_grace_period` on service	`DRAIN_TIMEOUT_MS / 1000`s + 5s buffer
Fly.io	`kill_timeout` in `fly.toml`	`DRAIN_TIMEOUT_MS / 1000`s + 5s buffer
Docker run	`--stop-timeout`	`DRAIN_TIMEOUT_MS / 1000`s + 5s buffer
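To catch a misconfigured grace period before it bites during a deploy, the rule in the table can be checked at startup. This is a minimal sketch: `minGracePeriodSeconds` and `gracePeriodWarning` are hypothetical helpers, and it assumes you pass the configured grace period into the process yourself (for example through an environment variable you define), because the process is not told its grace period by default.

```typescript
// grace-check.ts: compare a configured grace period with the drain it must cover.
// minGracePeriodSeconds and gracePeriodWarning are hypothetical helpers.

/** Smallest grace period (seconds) that still lets a full drain finish. */
function minGracePeriodSeconds(
  drainTimeoutMs: number,
  preStopSeconds = 0, // Kubernetes preStop sleep; 0 on platforms without one
  bufferSeconds = 5,  // margin for closing resources and exiting
): number {
  return Math.ceil(drainTimeoutMs / 1000) + preStopSeconds + bufferSeconds;
}

/** Returns a warning message, or null when the grace period is long enough. */
function gracePeriodWarning(
  gracePeriodSeconds: number,
  drainTimeoutMs: number,
  preStopSeconds = 0,
): string | null {
  const required = minGracePeriodSeconds(drainTimeoutMs, preStopSeconds);
  if (gracePeriodSeconds >= required) return null;
  return `grace period ${gracePeriodSeconds}s < ${required}s needed ` +
    `(drain ${drainTimeoutMs} ms + preStop ${preStopSeconds}s + 5s buffer); ` +
    'in-flight tool calls may be killed by SIGKILL';
}
```

For a 30-second drain with a 5-second preStop sleep, `minGracePeriodSeconds(30000, 5)` is 40, which matches the Kubernetes example below; for Docker Compose with the same drain and no preStop hook, `minGracePeriodSeconds(30000)` is 35.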

Example for a 30-second drain timeout, Kubernetes:

spec:
  terminationGracePeriodSeconds: 40  # 5s preStop + 30s drain + 5s buffer
  containers:
    - name: mcp-server
      env:
        - name: DRAIN_TIMEOUT_MS
          value: "30000"
      lifecycle:
        preStop:
          exec:
            # Give the load balancer time to remove this pod from rotation
            # before the HTTP listener stops. Prevents a brief window where
            # traffic still routes here after shutdown starts.
            command: ["/bin/sleep", "5"]

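The preStop hook with a 5-second sleep is a standard pattern for Kubernetes rolling deploys. Kubernetes starts removing a terminating pod from the Service endpoints at the same moment it begins terminating the pod, but kube-proxy and ingress controllers take time to observe that change. Without the sleep, the pod can receive SIGTERM and close its listener while traffic is still being routed to it, so some requests fail with a refused connection or a 503 during that propagation delay (often a second or two). Because the kubelet runs the preStop hook before sending SIGTERM, the sleep keeps the pod serving normally until routing has caught up.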

Health check transition during shutdown

The most important part of graceful shutdown from a monitoring perspective is the health check transition. When /healthz starts returning 503:

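Kubernetes, when /healthz is the pod's readiness probe, marks the pod not ready once the probe's failure threshold is reached and stops routing new requests to it.
AliveMCP's probe sees the 503 if its next 60-second cycle falls inside the drain window. It treats a 503 followed by a refused connection as a degraded state rather than an outage, because the server announced the shutdown instead of going straight from up to offline.
Reverse proxies with active upstream health checks (for example Caddy's health_uri or NGINX Plus's health_check) remove the backend from the pool.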

This transition is what lets AliveMCP tell a planned deploy apart from downtime. If /healthz starts failing before the drain begins, the load balancer stops sending new sessions to this instance while requests already in flight complete normally. AliveMCP sees a brief gap in probe responses, then the server returns once the new version is up. This is expected deploy behavior and does not trigger a downtime alert as long as the gap stays within the SLO threshold.
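On platforms without a preStop hook (Docker Compose, Fly.io, plain docker run), a similar window can be created inside the process: flip the health check to 503, keep the listener open for about one load-balancer health-check interval so probes actually observe the 503, and only then close the listener. This is a minimal sketch using plain node:http rather than Express; `beginShutdown` and its `delayMs` argument are hypothetical names, and the drain and cleanup steps would follow once it resolves.

```typescript
// health-first.ts: fail the health check, wait, then stop accepting connections.
// beginShutdown and delayMs are hypothetical; size delayMs to at least one
// health-check interval of whatever load balancer or proxy sits in front.
import * as http from 'node:http';

let isShuttingDown = false;

const server = http.createServer((req, res) => {
  if (req.url === '/healthz') {
    res.writeHead(isShuttingDown ? 503 : 200, { 'content-type': 'application/json' });
    res.end(JSON.stringify({ status: isShuttingDown ? 'shutting_down' : 'ok' }));
    return;
  }
  res.writeHead(404).end();
});

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function beginShutdown(delayMs: number): Promise<void> {
  isShuttingDown = true; // /healthz answers 503 from now on
  await sleep(delayMs);  // health checks in this window see 503, not a refused connection
  server.close();        // stop accepting new connections; drain in-flight work next
}
```

In a real server you would call `server.listen(...)` at startup and run `beginShutdown` from the SIGTERM handler before the drain loop.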

Drain timeout sizing

Set DRAIN_TIMEOUT_MS to your P99 tool-call duration plus 5 seconds of buffer. If your P99 tool-call duration is 15 seconds, set the drain timeout to 20 seconds (DRAIN_TIMEOUT_MS=20000). Calls that are in flight when SIGTERM arrives have usually been running for a while already, so this lets all but the slowest tail of calls finish during the drain window. You can read P99 tool-call duration from structured logs (the duration_ms field on tool-call events). AliveMCP's response-time history measures initialize latency, which is only a lower bound on tool-call latency, so treat it as a sanity check rather than a basis for sizing the timeout.
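The sizing rule can be computed directly from structured logs. This sketch assumes newline-delimited JSON logs where each tool call carries a hypothetical `event: 'tool_call'` field alongside `duration_ms`; adapt the filter to your own log schema. It uses the nearest-rank percentile, which always returns a duration that was actually observed.

```typescript
// p99-drain.ts: derive DRAIN_TIMEOUT_MS from newline-delimited JSON logs.
// Assumes tool calls are logged as {"event":"tool_call","duration_ms":1234,...};
// the 'tool_call' event name is hypothetical, so match it to your own logs.

/** Nearest-rank percentile: the smallest sample with at least p% of samples at or below it. */
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error('no samples');
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100)); // 1-based rank
  return sorted[rank - 1];
}

/** P99 tool-call duration from the logs plus a fixed buffer, in milliseconds. */
function drainTimeoutFromLogs(ndjson: string, bufferMs = 5000): number {
  const durations: number[] = [];
  for (const line of ndjson.split('\n')) {
    if (line.trim() === '') continue;
    let entry: unknown;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip lines that are not JSON
    }
    if (typeof entry !== 'object' || entry === null) continue;
    const { event, duration_ms } = entry as { event?: unknown; duration_ms?: unknown };
    if (event === 'tool_call' && typeof duration_ms === 'number') {
      durations.push(duration_ms);
    }
  }
  return percentile(durations, 99) + bufferMs;
}
```

With a P99 of 15 000 ms, `drainTimeoutFromLogs` returns 20 000, the 20-second drain timeout from the example above.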

If your longest tool calls genuinely take 60+ seconds (e.g., a tool that calls an LLM with a slow model), consider adding a per-tool timeout that returns an isError: true result after a maximum duration. This caps the drain window rather than waiting 60+ seconds per deploy cycle.
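A per-tool cap can be a small wrapper around the tool's handler. In this sketch, `ToolResult` mirrors the shape of an MCP tool result (text content plus `isError`), while `withTimeout` and the `(args, signal)` handler signature are hypothetical, not SDK APIs; with the MCP TypeScript SDK you would adapt the wrapper to the callback signature your tool registration uses. The wrapper resolves with an `isError: true` result after `maxMs` and aborts the signal, but the underlying work only stops if the handler honors that signal.

```typescript
// tool-timeout.ts: cap a tool handler's runtime and return an isError result instead.
// ToolResult mirrors the MCP tool-result shape; withTimeout is a hypothetical helper.

type ToolResult = {
  content: { type: 'text'; text: string }[];
  isError?: boolean;
};

type Handler<A> = (args: A, signal: AbortSignal) => Promise<ToolResult>;

function withTimeout<A>(handler: Handler<A>, maxMs: number): (args: A) => Promise<ToolResult> {
  return async (args: A) => {
    const controller = new AbortController();
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<ToolResult>((resolve) => {
      timer = setTimeout(() => {
        controller.abort(); // ask the handler to stop; it must check the signal itself
        resolve({
          content: [{ type: 'text', text: `Tool timed out after ${maxMs} ms` }],
          isError: true,
        });
      }, maxMs);
    });
    try {
      // Whichever settles first wins; a handler error still propagates as a rejection
      return await Promise.race([handler(args, controller.signal), timeout]);
    } finally {
      clearTimeout(timer); // do not leave the timer pending after a fast result
    }
  };
}
```

With `maxMs` set well below DRAIN_TIMEOUT_MS, no single tool call can hold a deploy open for longer than the cap.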