MCP server graceful shutdown
Graceful shutdown is the difference between a deploy that silently drops active tool calls and one that waits for them to finish. MCP servers are stateful — each session starts with an initialize handshake and may have a tool call in flight when a shutdown signal arrives. A process that exits immediately on SIGTERM kills that tool call mid-execution, leaving the client with a broken connection. A server with graceful shutdown stops accepting new sessions, waits for active sessions to complete their current tool call, then exits cleanly. AliveMCP's probe starts a new session every 60 seconds — during a graceful shutdown, the probe will see the server transition from up to a brief 503 state (health check returns unhealthy) before the process exits, which is distinct from an unplanned crash.
TL;DR
The shutdown sequence is: (1) mark health check as unhealthy so the load balancer stops routing new sessions; (2) stop the HTTP listener to reject new initialize requests; (3) wait up to DRAIN_TIMEOUT_MS for active sessions to finish their current tool call; (4) close database connections and other resources; (5) exit. The container orchestrator's terminationGracePeriodSeconds (Kubernetes) or stop_grace_period (Docker Compose) must exceed DRAIN_TIMEOUT_MS or the process will be SIGKILL'd before the drain completes.
SIGTERM handler and shutdown sequence
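With no handler registered, Node.js terminates immediately on SIGTERM without running any cleanup. When it runs as PID 1 in a container, the kernel ignores the unhandled signal entirely, so the process lingers until the orchestrator sends SIGKILL. Either way, you must register a handler. In Docker, if your CMD is in shell form (CMD node dist/server.js), sh typically becomes PID 1, Node.js runs as its child, and sh does not forward SIGTERM. Use exec form (CMD ["node", "dist/server.js"]) so Node.js is PID 1 and receives signals directly.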
// server.ts: complete graceful shutdown implementation
import { randomUUID } from 'node:crypto';
import express from 'express';
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StreamableHTTPServerTransport } from '@modelcontextprotocol/sdk/server/streamableHttp.js';
import { db } from './db.js';

const DRAIN_TIMEOUT_MS = parseInt(process.env.DRAIN_TIMEOUT_MS ?? '30000', 10);
let isShuttingDown = false;
const activeSessions = new Map<string, StreamableHTTPServerTransport>();
const app = express();

// Health check: returns 503 during shutdown so the load balancer drains traffic
app.get('/healthz', (req, res) => {
  if (isShuttingDown) {
    return res.status(503).json({ status: 'shutting_down' });
  }
  res.json({ status: 'ok' });
});

app.post('/mcp', async (req, res) => {
  if (isShuttingDown) {
    return res.status(503).json({ error: 'server is shutting down' });
  }
  const server = new McpServer({ name: 'my-server', version: '1.0.0' });
  // One transport per request. The SDK option is sessionIdGenerator (it has no
  // sessionIdHeader option); undefined selects the SDK's stateless mode.
  const transport = new StreamableHTTPServerTransport({ sessionIdGenerator: undefined });
  // Key the entry by the client's session header when present, else a fresh ID
  const sessionId = (req.headers['mcp-session-id'] as string | undefined) ?? randomUUID();
  activeSessions.set(sessionId, transport);
  // Fires when the response finishes or the client disconnects
  res.on('close', () => activeSessions.delete(sessionId));
  await server.connect(transport);
  await transport.handleRequest(req, res);
});

const httpServer = app.listen(3001, () => {
  console.log({ event: 'server_started', port: 3001 });
});

async function shutdown(signal: string) {
  if (isShuttingDown) return;
  // Step 1: flip the flag so /healthz and /mcp start returning 503
  isShuttingDown = true;
  console.log({ event: 'shutdown_started', signal, activeSessions: activeSessions.size });
  // Step 2: stop accepting new connections; in-flight requests keep running
  httpServer.close();
  // Step 3: wait for active sessions to finish (up to DRAIN_TIMEOUT_MS)
  const deadline = Date.now() + DRAIN_TIMEOUT_MS;
  while (activeSessions.size > 0 && Date.now() < deadline) {
    await new Promise(resolve => setTimeout(resolve, 200));
  }
  if (activeSessions.size > 0) {
    console.warn({ event: 'drain_timeout', remainingSessions: activeSessions.size });
    // Force-close remaining transports
    for (const [id, transport] of activeSessions) {
      try { await transport.close(); } catch {}
      activeSessions.delete(id);
    }
  }
  // Step 4: close DB connections and other resources
  await db.destroy();
  console.log({ event: 'shutdown_complete' });
  // Step 5: exit
  process.exit(0);
}

// Exit non-zero if the shutdown sequence itself fails (e.g. db.destroy() throws)
const onSignal = (signal: string) =>
  shutdown(signal).catch((err) => {
    console.error({ event: 'shutdown_failed', error: String(err) });
    process.exit(1);
  });
process.on('SIGTERM', () => onSignal('SIGTERM'));
process.on('SIGINT', () => onSignal('SIGINT'));
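One caveat with tracking by session ID: if a single session has two requests in flight, the first response to close removes the only Map entry and the drain loop may finish early. A minimal sketch of a counting tracker that avoids this follows. The InFlightTracker class and its method names are hypothetical and not part of the MCP SDK.

```typescript
// Hypothetical helper: counts in-flight requests per session ID so that a
// session only disappears once its last request has finished.
class InFlightTracker {
  private counts = new Map<string, number>();

  // Register one in-flight request; returns a callback to call when it ends.
  begin(sessionId: string): () => void {
    this.counts.set(sessionId, (this.counts.get(sessionId) ?? 0) + 1);
    let finished = false;
    return () => {
      if (finished) return; // tolerate duplicate 'close' events
      finished = true;
      const remaining = (this.counts.get(sessionId) ?? 1) - 1;
      if (remaining <= 0) this.counts.delete(sessionId);
      else this.counts.set(sessionId, remaining);
    };
  }

  // Number of sessions that still have at least one request in flight.
  get size(): number {
    return this.counts.size;
  }
}

const tracker = new InFlightTracker();
const doneA = tracker.begin('session-1');
const doneB = tracker.begin('session-1');
doneA();
console.log(tracker.size); // 1: the second request is still running
doneB();
console.log(tracker.size); // 0
```

In the handler above, this would replace the Map calls with `const done = tracker.begin(sessionId); res.on('close', done);`, and the drain loop would poll `tracker.size` instead of `activeSessions.size`.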
Container orchestrator configuration
The orchestrator's termination grace period must be longer than your DRAIN_TIMEOUT_MS. If the orchestrator sends SIGKILL before the drain completes, your in-flight tool calls are killed regardless:
| Platform | Setting | Minimum value (seconds) |
|---|---|---|
| Kubernetes | terminationGracePeriodSeconds on Pod spec | DRAIN_TIMEOUT_MS / 1000 + preStop duration + 5 s buffer |
| Docker Compose | stop_grace_period on service | DRAIN_TIMEOUT_MS / 1000 + 5 s buffer |
| Fly.io | kill_timeout in fly.toml | DRAIN_TIMEOUT_MS / 1000 + 5 s buffer |
| Docker run | --stop-timeout | DRAIN_TIMEOUT_MS / 1000 + 5 s buffer |
Example for a 30-second drain timeout, Kubernetes:
spec:
  terminationGracePeriodSeconds: 40  # 5s preStop + 30s drain + 5s buffer
  containers:
    - name: mcp-server
      env:
        - name: DRAIN_TIMEOUT_MS
          value: "30000"
      lifecycle:
        preStop:
          exec:
            # Give the load balancer time to remove this pod from rotation
            # before the HTTP listener stops. Prevents a brief window where
            # traffic still routes here after shutdown starts.
            command: ["/bin/sleep", "5"]
The preStop hook with a 5-second sleep is a standard pattern for Kubernetes rolling deploys. Without it, there is a race condition where Kubernetes removes the pod from the Service endpoints at the same time the pod starts rejecting new connections — some requests may hit a 503 during the 1-2 second propagation delay. The sleep absorbs that window.
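Kubernetes counts the time spent in the preStop hook against terminationGracePeriodSeconds, so the sleep has to be included when sizing the grace period. The arithmetic can be sketched as a small helper. The function name and its default 5-second buffer are illustrative assumptions, not part of any platform API.

```typescript
// Hypothetical helper: smallest grace period (in whole seconds) that leaves
// room for the preStop hook, the full drain window, and a safety buffer.
function minGracePeriodSeconds(
  drainTimeoutMs: number,
  preStopSeconds = 0,
  bufferSeconds = 5,
): number {
  if (drainTimeoutMs < 0 || preStopSeconds < 0 || bufferSeconds < 0) {
    throw new RangeError('durations must be non-negative');
  }
  // Round the drain up: orchestrators take the grace period in whole seconds.
  return preStopSeconds + Math.ceil(drainTimeoutMs / 1000) + bufferSeconds;
}

// 5s preStop + 30s drain + 5s buffer
console.log(minGracePeriodSeconds(30_000, 5)); // 40
```

For platforms without a preStop equivalent, leaving preStopSeconds at 0 reduces this to the drain timeout plus the buffer.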
Health check transition during shutdown
The most important part of graceful shutdown from a monitoring perspective is the health check transition. When /healthz starts returning 503:
- Kubernetes removes the pod from the Service endpoint slice — new requests stop routing here.
- AliveMCP's probe sees the 503 on the next 60-second cycle. It treats this as a degraded state, not an outage, because the status changes from 503 to offline (connection refused) rather than going directly offline.
- Caddy or nginx upstream health checks remove the backend from the pool.
This transition is what lets AliveMCP tell planned maintenance apart from downtime. If /healthz starts failing before the drain begins, the load balancer stops routing new sessions while the server keeps serving sessions that are already active, so they complete normally. AliveMCP sees a brief gap in probe responses, and then the server returns once the new version is up. This is expected deploy behavior and does not trigger a downtime alert if the gap is within the SLO threshold.
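To make the distinction concrete, here is a minimal sketch of how a monitor could classify a probe history by looking at what came immediately before the server went offline. This is an illustration only: the type names and classifyOutage function are hypothetical, and AliveMCP's actual classification logic is not documented here.

```typescript
// Hypothetical probe outcomes: 'up' (healthy response), 'unhealthy' (HTTP 503),
// 'offline' (connection refused or timed out).
type ProbeResult = 'up' | 'unhealthy' | 'offline';
type OutageKind = 'none' | 'graceful' | 'unplanned' | 'unknown';

// Classify the first offline transition in a chronological probe history.
function classifyOutage(history: readonly ProbeResult[]): OutageKind {
  const firstOffline = history.indexOf('offline');
  if (firstOffline === -1) return 'none'; // never went offline
  if (firstOffline === 0) return 'unknown'; // nothing observed before the outage
  // A 503 right before going offline suggests a drain was in progress.
  return history[firstOffline - 1] === 'unhealthy' ? 'graceful' : 'unplanned';
}

console.log(classifyOutage(['up', 'unhealthy', 'offline'])); // graceful
console.log(classifyOutage(['up', 'offline'])); // unplanned
```

With a 60-second probe interval and a shorter drain window, a probe can miss the 503 phase entirely, so an 'unplanned' verdict from a sketch like this is a hint rather than proof of a crash.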
Drain timeout sizing
Set DRAIN_TIMEOUT_MS to your P99 tool-call duration plus 5 seconds of buffer. If your slowest tool call takes 15 seconds (P99), set the drain timeout to 20 seconds. This ensures all but the most extreme tail calls finish during the drain window. The best source for P99 tool-call duration is your structured logs (the duration_ms field on tool-call events). AliveMCP's response-time history measures initialize latency, so treat it only as a lower bound on tool-call latency, not as a substitute for measured tool-call durations.
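As a sketch of that calculation, the snippet below derives a drain timeout from a list of duration_ms samples using the nearest-rank percentile method. The function names are hypothetical, and how you extract the samples from your logs is left out.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(values: readonly number[], p: number): number {
  if (values.length === 0) throw new Error('no samples');
  if (p <= 0 || p > 100) throw new RangeError('p must be in (0, 100]');
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Hypothetical helper: P99 tool-call duration plus a fixed buffer, in ms.
function drainTimeoutFromSamples(durationsMs: readonly number[], bufferMs = 5000): number {
  return percentile(durationsMs, 99) + bufferMs;
}

// 100 samples from 100 ms to 10,000 ms: P99 is 9,900 ms, plus the 5,000 ms buffer
const samples = Array.from({ length: 100 }, (_, i) => (i + 1) * 100);
console.log(drainTimeoutFromSamples(samples)); // 14900
```

Nearest-rank always returns an observed sample rather than an interpolated value, which keeps the result easy to trace back to a specific log event.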
If your longest tool calls genuinely take 60+ seconds (e.g., a tool that calls an LLM with a slow model), consider adding a per-tool timeout that returns an isError: true result after a maximum duration. This caps the drain window rather than waiting 60+ seconds per deploy cycle.
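A per-tool timeout can be sketched as a wrapper that races the tool's work against a timer and returns an isError: true result when the timer wins. The withToolTimeout name and the simplified ToolResult type are assumptions for illustration; a real tool handler would use the SDK's own result types.

```typescript
// Simplified shape of an MCP tool result (text content only).
type ToolResult = {
  content: { type: 'text'; text: string }[];
  isError?: boolean;
};

// Hypothetical wrapper: resolves with an error result if `run` takes longer
// than maxMs. The AbortSignal lets cooperative work stop early on timeout.
async function withToolTimeout(
  run: (signal: AbortSignal) => Promise<ToolResult>,
  maxMs: number,
): Promise<ToolResult> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<ToolResult>((resolve) => {
    timer = setTimeout(() => {
      controller.abort();
      resolve({
        content: [{ type: 'text', text: `Tool timed out after ${maxMs} ms` }],
        isError: true,
      });
    }, maxMs);
  });
  try {
    return await Promise.race([run(controller.signal), timeout]);
  } finally {
    clearTimeout(timer); // avoid keeping the event loop alive after a fast result
  }
}
```

With every tool capped at some maxMs, DRAIN_TIMEOUT_MS only needs to exceed the largest cap plus the buffer. Note that the wrapper stops waiting for the work but only cancels it if the tool honors the AbortSignal.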
Related questions
How does graceful shutdown interact with session affinity during rolling deploys?
During a rolling deploy, new pods start accepting sessions while old pods drain. If you use session affinity (sticky routing on mcp-session-id), clients with existing sessions stay on the old pod until the session ends or the drain timeout is hit. Clients starting new sessions route to the new pod. This is the safest pattern — in-flight tool calls complete on the old version, new sessions get the new version. Configure your load balancer to respect the mcp-session-id header for affinity routing.
What happens to the AliveMCP probe during a graceful shutdown?
AliveMCP probes fire every 60 seconds. When a probe hits your server during the drain window, it will either (1) receive a 503 from /healthz if it is a health-check probe, or (2) receive a 503 from /mcp with the "shutting down" error if it sends a new initialize request. AliveMCP records this as a failed probe cycle. If the server comes back up before the next probe cycle, no alert is sent. If the downtime extends past the alert threshold, an alert fires. Planned maintenance should use AliveMCP's maintenance-window feature to suppress alerts during expected downtime.
Should I close SSE connections immediately on SIGTERM?
No. SSE (Server-Sent Events) connections carry active MCP sessions. If a tool call is in flight over an SSE connection and you close it immediately, the client's tool call fails. The graceful shutdown pattern above lets SSE connections drain naturally — the session's res.on('close', ...) handler fires when the session ends, at which point the session is removed from activeSessions. Force-close SSE connections only after the drain timeout expires.
Further reading
- MCP server deployment — rolling deploys and session affinity configuration
- MCP server Docker — exec form CMD and stop_grace_period configuration
- MCP server Kubernetes — terminationGracePeriodSeconds and preStop lifecycle hook
- MCP server health check — health check endpoint patterns during deploy lifecycle
- MCP server connection pooling — draining DB connections during shutdown
- MCP server logging — structured events for shutdown lifecycle
- AliveMCP — uptime monitoring that distinguishes graceful shutdowns from crashes