Deployment guide · 2026-06-04 · Production MCP servers
MCP Server Deployment Guide: PM2, systemd, nginx, Fly.io, and Zero-Downtime Deployment
Deploying an MCP server to production looks similar to deploying any Node.js HTTP server — until the first time you restart the process and every active LLM session drops simultaneously. A conventional REST API server can be restarted freely because each request is independent; a dropped connection causes a retry that succeeds immediately on the new process. MCP servers using Server-Sent Events are different: each session is a long-lived SSE connection backed by session state accumulated through an initialize handshake and subsequent tool calls. Killing the process terminates every active session, and every LLM agent calling your tools must reinitialise from scratch. A complete production deployment system for an MCP server has five concerns: PM2 for process management on Linux VPS, systemd as the native Linux service layer beneath or instead of PM2, nginx as the reverse proxy that handles TLS and SSE-specific buffering, Fly.io for PaaS deployments with idle-timeout and session-affinity caveats, and zero-downtime deployment as the cross-cutting concern that makes the other four work together without dropping sessions. This guide covers them as a system — how each concern addresses a distinct part of the deployment problem, how they compose for different deployment contexts, and what remains invisible to process managers and load balancers that an external probe sees clearly.
TL;DR
- Use fork mode, not cluster mode, for most MCP servers under PM2. PM2 cluster mode spawns multiple workers and load-balances connections across them, but SSE connections are bound to a specific worker. When a worker is reloaded, every SSE session on that worker terminates. Fork mode runs a single process — simpler, correct, and sufficient for most indie and small-team MCP deployments.
- systemd's TimeoutStopSec must exceed your drain timeout. systemd sends SIGTERM, then SIGKILL after TimeoutStopSec. If your SIGTERM drain handler takes 25 seconds and TimeoutStopSec is 20, systemd kills the process mid-drain and every session in it. Set TimeoutStopSec=35, ten seconds larger than DRAIN_TIMEOUT_MS, so the drain always completes before systemd escalates.
- nginx needs two non-default settings for SSE: proxy_buffering off and proxy_read_timeout 3600s. Without proxy_buffering off, nginx buffers the SSE event stream and the client never receives events in real time. Without an extended proxy_read_timeout, nginx closes idle SSE connections after 60 seconds, the default for HTTP proxying.
- Fly.io's idle_timeout terminates SSE sessions after 60 seconds of quiet by default. Fly.io closes HTTP connections idle for 60 seconds at the load balancer layer, before your MCP server process sees them end. Set http_options.idle_timeout = 3600 in fly.toml to match the maximum realistic session length.
- A SIGTERM drain handler is the single most important piece of zero-downtime deployment. Zero-downtime deployment requires the old process to stop accepting new connections, return HTTP 503 from /health so the load balancer removes it from rotation, wait for active sessions to complete or time out, and then exit cleanly. Rolling updates, blue-green, and PM2 graceful reload all require this handler to work.
- External probes see what process managers cannot. PM2, systemd, and Fly.io know whether the process is running. They do not know whether the process is correctly responding to MCP protocol requests. AliveMCP probes from outside: it detects when a server is running but no longer responding to tool calls, when a deploy caused an elevated error rate, or when a misconfigured drain is cutting sessions short.
Why MCP Deployment Is Different
A conventional HTTP API and an MCP server differ in one deployment-critical way: state per connection.
A REST API is stateless at the connection level. Each HTTP request carries all the information needed to process it. A client that gets a connection error retries the same request against any live instance. You can kill and replace API servers freely — the worst case is one failed request that retries successfully in under a second.
An MCP server accumulates state over the lifetime of a session. The initialize handshake negotiates protocol version, registers tools, and may run expensive setup (database pool acquisition, credential validation, feature flag evaluation). Subsequent tool calls depend on that session context. If the underlying SSE connection drops, the client must reinitialise from scratch — re-running the handshake, re-establishing session state, possibly losing mid-task progress in the LLM's working context.
This distinction drives the deployment constraints that distinguish MCP servers from REST servers:
| Concern | REST server | MCP server with SSE |
|---|---|---|
| Process restart cost | One failed request, retried immediately | All active sessions terminated; each must reinitialise |
| Idle connection timeout | No cost — request is complete before timeout fires | Silent session termination mid-task if SSE connection is idle |
| Load balancing | Any replica can serve any request | SSE client must reach the same process for all tool calls in a session |
| Health check result | HTTP 200 means requests will succeed | HTTP 200 does not confirm the process correctly handles MCP protocol |
| Deploy downtime | Seconds acceptable — requests retry | Any downtime interrupts in-progress LLM tasks |
The five deployment concerns in this guide address these differences systematically. PM2 and systemd handle the process lifecycle — they restart the server on crash and keep it running through reboots. nginx and Fly.io handle the network boundary — they terminate TLS, enforce rate limits, and must be configured to not silently kill idle SSE connections. Zero-downtime deployment is the concern that ties all four together: without a drain handler that signals the load balancer and waits for active sessions to complete, even a perfect PM2 or systemd configuration will drop sessions on every deploy.
The Five Concerns and Their Roles
| Concern | Where it runs | What it provides | What it cannot do alone |
|---|---|---|---|
| PM2 | Linux VPS, bare metal | Auto-restart on crash, memory-limit restart, log rotation, startup integration, graceful reload | Does not handle TLS; does not control load balancer routing; graceful reload requires your server to implement a drain handler |
| systemd | Any Linux distribution | Service lifecycle, SIGTERM → SIGKILL escalation with configurable timeout, credential injection via EnvironmentFile, security sandboxing, journal logging | Does not know when your server is ready to accept traffic; does not handle TLS; does not perform application-level health checks |
| nginx | Reverse proxy in front of MCP server | TLS termination, SSE buffering control, per-IP rate limiting, structured access logging, certbot integration | Cannot drain application sessions — it routes connections, not MCP sessions; a reload replaces workers and may close long-lived upstream connections |
| Fly.io | PaaS deployment | Managed TLS, global anycast, rolling deploys, secrets management, volume storage for SQLite | Default idle_timeout silently terminates SSE sessions; session affinity requires explicit configuration or single-machine deployment |
| Zero-downtime deployment | Application layer, cross-cutting | Drain handler that signals the load balancer, waits for active sessions, then exits; enables PM2 reload, rolling updates, blue-green without session drops | Requires the load balancer (nginx, Fly, Kubernetes) to honour the 503 health signal and stop routing new connections |
PM2: Process Management on Linux VPS
PM2 is the most common process manager for Node.js on Linux VPS instances. It watches the process, restarts it on crash, and integrates with Linux init so the server survives reboots. For MCP servers, two configuration decisions matter: whether to use fork or cluster mode, and how to configure the drain timeout.
Fork mode vs. cluster mode
PM2 cluster mode spawns multiple worker processes using Node's cluster module and distributes incoming connections across them. The problem for MCP servers is that SSE connections are stateful — once a client opens an SSE connection to a specific worker and completes initialize, all subsequent tool calls must reach that same worker. PM2's cluster mode load balancing does not guarantee this. In a cluster of four workers, a client that gets routed to worker 2 on initialize may get routed to worker 3 on the next tool call — resulting in a session-not-found error.
Fork mode runs a single process. It does not use multiple CPU cores, but a single Node.js process on a modern VPS handles 50–200 concurrent MCP sessions comfortably within the event loop. For most indie and small-team MCP deployments, fork mode is the right choice: simpler, correct, and requires no sticky-session infrastructure.
A minimal ecosystem.config.js for an MCP server in fork mode:
module.exports = {
apps: [{
name: 'mcp-server',
script: './dist/index.js',
exec_mode: 'fork', // single process — no sticky-session problem
max_memory_restart: '512M', // contain leaks before OOM kill
kill_timeout: 30000, // wait 30s for drain before force-kill
wait_ready: true, // PM2 reload waits for process.send('ready')
listen_timeout: 10000, // startup timeout before marking unhealthy
restart_delay: 1000,
exp_backoff_restart_delay: 100,
max_restarts: 10,
min_uptime: '10s',
env: { NODE_ENV: 'production', PORT: '3000' }
}]
};
wait_ready: true is the key setting that enables graceful reload. When set, PM2 will not stop the old process until the new process emits process.send('ready'). This means you control exactly when traffic shifts: the new process signals ready only after it has opened database connections, loaded secrets, and is genuinely prepared to accept connections.
async function main() {
await initDatabase();
await loadSecrets();
app.listen(3000, () => {
if (process.send) {
process.send('ready'); // signal PM2 the new process is live
}
});
}
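// PM2 sends SIGINT by default on both reload and stop (SIGKILL follows after kill_timeout);
// systemd and most container runtimes send SIGTERM, so handle both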
async function shutdown(signal) {
await drainActiveSessions();
process.exit(0);
}
process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT'));
Note that PM2 sends SIGINT during graceful reload (not SIGTERM). A handler that only listens to SIGTERM will not drain sessions during pm2 reload. Both signals must be handled.
Cluster mode with nginx sticky sessions
If you need multi-core utilisation, cluster mode is possible but requires nginx ip_hash sticky routing to ensure each client's connections consistently reach the same worker. Each worker listens on a different port (derived from the instance index PM2 exposes to each process, NODE_APP_INSTANCE by default), and nginx hashes the client IP to a stable upstream. This adds operational complexity (a NAT'd client whose public IP changes mid-session will fail) and is generally not worth it until a single process is saturating a single core.
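The per-worker port derivation described above could be sketched as follows. This is a minimal illustration, assuming the process manager exposes a zero-based instance index as an environment variable; `portForInstance` and the base port are hypothetical names, not part of PM2.

```typescript
// Hypothetical helper: map a process-manager instance index to a listen port.
export function portForInstance(basePort: number, rawIndex: string | undefined): number {
  // A missing variable means a single-process (fork mode) run: use the base port.
  const index = rawIndex === undefined ? 0 : Number.parseInt(rawIndex, 10);
  if (!Number.isInteger(index) || index < 0) {
    throw new Error(`invalid instance index: ${rawIndex}`);
  }
  return basePort + index;
}
```

Each worker would pass its own index variable to this helper at startup, and the nginx upstream block would list one server entry per resulting port.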
Log rotation and startup integration
Install pm2-logrotate to prevent log files from growing unbounded: pm2 install pm2-logrotate. After configuring the ecosystem file, run pm2 startup (generates a systemd unit for PM2 itself) and pm2 save (writes the current process list to restore on reboot). PM2 effectively becomes a user-space process manager supervised by systemd.
systemd: The Native Linux Service Layer
systemd is the init system on every major Linux distribution. You can use it directly to manage your MCP server (instead of PM2), or use it to supervise PM2 itself. When you manage the MCP server directly with systemd, the configuration decisions that matter most are TimeoutStopSec, Type=notify, and the EnvironmentFile credential injection pattern.
Unit file essentials
[Unit]
Description=MCP Server
After=network.target
# Start rate limiting is a [Unit] setting on current systemd versions
StartLimitBurst=5
StartLimitIntervalSec=300
[Service]
Type=notify
User=mcp
Group=mcp
WorkingDirectory=/opt/mcp-server
EnvironmentFile=/etc/mcp-server/env
ExecStart=/usr/bin/node dist/index.js
Restart=on-failure
RestartSec=5s
TimeoutStopSec=35
# Security hardening
PrivateTmp=yes
NoNewPrivileges=yes
ProtectSystem=strict
ProtectHome=read-only
ReadWritePaths=/var/lib/mcp-server
PrivateDevices=yes
ProtectKernelTunables=yes
SystemCallFilter=@system-service
[Install]
WantedBy=multi-user.target
TimeoutStopSec: the most common misconfiguration
When you run systemctl stop mcp-server or deploy a new version, systemd sends SIGTERM to the process. Your drain handler then closes the HTTP listener, sets the health endpoint to return 503, and waits for active sessions to complete or time out. If systemd's TimeoutStopSec expires before the drain completes, systemd escalates to SIGKILL — which immediately terminates the process with all active sessions in it.
The rule is: TimeoutStopSec must exceed DRAIN_TIMEOUT_MS by a margin. If your drain timeout is 25 seconds (DRAIN_TIMEOUT_MS = 25000), set TimeoutStopSec=35. The ten-second margin gives the drain code time to complete its cleanup after the last session closes.
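The rule can be turned into a small pre-deploy check. The sketch below is one possible shape, assuming you can read both values from your own configuration; `checkTimeoutBudget`, `TimeoutBudget`, and the `minMarginMs` field are hypothetical names introduced for illustration.

```typescript
// Hypothetical pre-deploy check: the supervisor's kill timeout must exceed the drain timeout.
interface TimeoutBudget {
  drainTimeoutMs: number;      // DRAIN_TIMEOUT_MS in the application
  supervisorTimeoutMs: number; // e.g. TimeoutStopSec converted to milliseconds
  minMarginMs: number;         // assumed safety margin for post-drain cleanup
}

// Returns null when the budget is safe, otherwise a human-readable problem description.
export function checkTimeoutBudget(b: TimeoutBudget): string | null {
  const margin = b.supervisorTimeoutMs - b.drainTimeoutMs;
  if (margin < b.minMarginMs) {
    return `supervisor timeout leaves ${margin} ms of margin; need at least ${b.minMarginMs} ms`;
  }
  return null;
}
```

With a 25-second drain and TimeoutStopSec=35 the check passes; with TimeoutStopSec=20 it reports a negative margin, which is the mid-drain SIGKILL case described above.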
Type=notify and sd_notify
Type=notify tells systemd to wait for the process to send sd_notify(READY=1) before marking the service as started. Without it (Type=simple), systemd marks the service started as soon as the process spawns — before database connections are open or secrets are loaded. Traffic may arrive at the process before it is ready.
In Node.js, use the sd-notify npm package:
import sdNotify from 'sd-notify';
async function main() {
await initDatabase();
await loadSecrets();
app.listen(3000, () => {
sdNotify.ready(); // READY=1 — systemd marks service started
});
}
process.on('SIGTERM', async () => {
sdNotify.stopping(); // STOPPING=1 — optional but helps systemd timing
await drainActiveSessions();
process.exit(0);
});
EnvironmentFile for credential injection
Credentials should never be in the systemd unit file (which is world-readable via systemctl cat). Use EnvironmentFile=/etc/mcp-server/env, a file owned by root:mcp with mode 640. The file is not in your application repository and not readable by other users. This is equivalent to Fly.io's fly secrets set — the credentials are injected as environment variables at process start without appearing in logs or version control.
nginx: Reverse Proxy with SSE-Specific Configuration
nginx is the most common reverse proxy for MCP servers on Linux VPS instances. It handles TLS termination, HTTP-to-HTTPS redirection, per-IP rate limiting, and structured access logging. Two nginx default settings silently break SSE connections and must be changed for MCP.
proxy_buffering off: the critical SSE setting
nginx buffers proxy responses by default. For SSE, buffering means nginx accumulates events from the upstream MCP server in memory and periodically flushes them to the client in batches — breaking the real-time delivery that SSE provides. The client receives tool responses seconds after they were sent, or not until the buffer fills. Set proxy_buffering off on the SSE location block.
proxy_read_timeout: prevent idle session termination
nginx's default proxy_read_timeout is 60 seconds. It measures time since the last data was received from the upstream. For an SSE connection where the LLM is thinking between tool calls, 60 seconds of silence is normal — but nginx will close the connection and the client will need to reconnect and reinitialise. Set proxy_read_timeout 3600s (one hour) on the SSE location.
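Because the timeout measures time since the last upstream data, a server can also keep a quiet connection from looking idle by writing periodic SSE comment lines. The sketch below is a complementary technique under stated assumptions, not a replacement for the setting above: `startHeartbeat`, `SseSink`, and the interval are hypothetical, and the only protocol fact relied on is that SSE lines beginning with a colon are comments that clients ignore.

```typescript
// Minimal writable shape; Node's http.ServerResponse satisfies it.
interface SseSink {
  write(chunk: string): unknown;
}

// Lines starting with ":" are SSE comments: clients ignore them, but the bytes count as traffic.
export const HEARTBEAT_FRAME = ": keepalive\n\n";

// Hypothetical helper: write a heartbeat every intervalMs; returns a function that stops it.
export function startHeartbeat(sink: SseSink, intervalMs: number): () => void {
  const timer: ReturnType<typeof setInterval> = setInterval(() => {
    sink.write(HEARTBEAT_FRAME);
  }, intervalMs);
  return () => clearInterval(timer);
}
```

The returned stop function should be called when the connection closes, otherwise the timer keeps writing to a dead stream.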
Core nginx configuration
limit_req_zone $binary_remote_addr zone=mcp_per_ip:10m rate=30r/m;
limit_req_zone $binary_remote_addr zone=mcp_health:1m rate=5r/s;
upstream mcp_server {
server 127.0.0.1:3000;
keepalive 16; # persistent connections to Node — eliminates per-request TCP overhead
}
server {
listen 443 ssl;
server_name example.com;
# ssl_certificate / ssl_certificate_key managed by certbot
# Health check — higher rate limit, standard timeout
location /health {
limit_req zone=mcp_health burst=10 nodelay;
proxy_pass http://mcp_server;
proxy_set_header Host $host;
}
# SSE transport — buffering disabled, extended timeout
location /sse {
limit_req zone=mcp_per_ip burst=5 nodelay;
proxy_pass http://mcp_server;
proxy_http_version 1.1;
proxy_set_header Connection ""; # keepalive to upstream
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_buffering off; # must be off for SSE
proxy_read_timeout 3600s; # prevent idle session termination
proxy_cache off;
}
# General API traffic
location / {
limit_req zone=mcp_per_ip burst=20 nodelay;
proxy_pass http://mcp_server;
proxy_http_version 1.1;
proxy_set_header Connection "";
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_read_timeout 30s;
}
}
In your MCP server, set trustProxy: '127.0.0.1' (Fastify) or equivalent to trust X-Forwarded-For only from localhost. Without this, a client can set an arbitrary X-Forwarded-For header and bypass per-IP rate limiting.
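For frameworks without a built-in trust-proxy option, the same rule can be sketched by hand. This is a simplified illustration that assumes exactly one trusted proxy (nginx on localhost) appending the address it saw via $proxy_add_x_forwarded_for; `resolveClientIp` is a hypothetical helper, not a framework API.

```typescript
// Hypothetical helper: pick the client IP to use for rate limiting.
// X-Forwarded-For is honoured only when the TCP peer is a trusted proxy.
export function resolveClientIp(
  peerAddress: string,
  forwardedFor: string | undefined,
  trustedProxies: ReadonlySet<string>,
): string {
  if (!trustedProxies.has(peerAddress) || !forwardedFor) {
    return peerAddress; // direct connection, or a header sent by an untrusted peer
  }
  // The single trusted proxy appends the address it saw, so the last entry is
  // the only one the client could not have forged.
  const hops = forwardedFor.split(",").map((h) => h.trim()).filter(Boolean);
  const last = hops.pop();
  return last ?? peerAddress;
}
```

Taking the first entry instead of the last would reintroduce the bypass, because everything before the proxy-appended address is client-controlled.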
Reload nginx with nginx -t && systemctl reload nginx — reload replaces worker processes gracefully without dropping existing connections, unlike restart.
Fly.io: PaaS Deployment with MCP-Specific Caveats
Fly.io provides managed TLS, global anycast routing, rolling deploys, and integrated secrets management — everything a Linux VPS deployment needs nginx and systemd to handle, provided as platform services. Two Fly.io defaults must be changed for MCP servers.
idle_timeout: the most common Fly.io MCP failure
Fly.io's load balancer terminates HTTP connections idle for 60 seconds by default. For SSE, "idle" means no bytes have been exchanged on the connection — not that no tool calls are in flight. An LLM thinking between tool calls produces an idle SSE connection; after 60 seconds, Fly closes it at the load balancer layer. The MCP server process never sees the close; it continues waiting on a dead connection while the client must reinitialise.
Set http_options.idle_timeout = 3600 in fly.toml:
[[services]]
internal_port = 3000
protocol = "tcp"
[services.concurrency]
type = "connections"
hard_limit = 200
soft_limit = 150
[[services.http_checks]]
interval = "10s"
grace_period = "15s"
method = "get"
path = "/health"
timeout = "5s"
[services.http_options]
idle_timeout = 3600 # match maximum realistic session length
Session affinity: single machine vs. multi-machine
Fly distributes incoming connections across machines by connection count. If you run two machines and a client's SSE connection lands on machine A, subsequent HTTP requests to the same server may land on machine B — which has no record of the session. For most indie MCP deployments, the correct answer is one machine: a single Fly shared-cpu-1x instance at 512 MB RAM handles 50–200 concurrent sessions within the Node.js event loop, and there is no session-affinity problem with one machine.
If you need multi-machine for availability, externalise session state to Fly Postgres or Upstash Redis. Each machine stores session metadata in the shared store; any machine can resume a session started on another.
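One way to structure externalised session state is behind a small storage interface, so the backing store can be swapped without touching request handlers. The sketch below is illustrative only: `SessionRecord`, `SessionStore`, and `MemorySessionStore` are hypothetical names, the record fields are assumptions, and the in-memory class merely stands in for a Redis- or Postgres-backed implementation.

```typescript
// Session metadata that must survive a move between machines (fields are illustrative).
interface SessionRecord {
  sessionId: string;
  protocolVersion: string;
  createdAt: number;
}

// Hypothetical storage contract; a shared-store implementation would expose the same methods.
interface SessionStore {
  save(record: SessionRecord): Promise<void>;
  load(sessionId: string): Promise<SessionRecord | undefined>;
  remove(sessionId: string): Promise<void>;
}

// In-memory stand-in, suitable for tests and single-machine deployments only.
export class MemorySessionStore implements SessionStore {
  private readonly records = new Map<string, SessionRecord>();

  async save(record: SessionRecord): Promise<void> {
    this.records.set(record.sessionId, { ...record }); // store a copy
  }

  async load(sessionId: string): Promise<SessionRecord | undefined> {
    return this.records.get(sessionId);
  }

  async remove(sessionId: string): Promise<void> {
    this.records.delete(sessionId);
  }
}
```

A shared store lets any machine look up session metadata, but the open SSE stream itself still lives on the machine that accepted it, so this pattern covers follow-up requests rather than the stream.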
auto_stop_machines and cold starts
Fly's auto_stop_machines is cost-effective but adds 1–3 seconds of cold start latency when a machine spins up from stopped. For MCP, this appears as elevated connection time on the first probe after the machine stops — AliveMCP distinguishes this pattern from genuine slowness (a cold-start spike that resolves on the next probe versus a sustained latency increase). Set min_machines_running = 1 to keep one machine always warm if cold-start latency is unacceptable.
fly secrets for credential injection
fly secrets set DATABASE_URL="postgres://..." encrypts the secret at rest and injects it as an environment variable, triggering a rolling restart of all machines. This is the Fly.io equivalent of systemd's EnvironmentFile pattern — credentials are never in version control or fly.toml.
Zero-Downtime Deployment: The Cross-Cutting Concern
Zero-downtime deployment is not a deployment platform feature — it is a pattern you implement in your MCP server application that makes every other deployment mechanism safe. PM2 graceful reload, systemd service restart, Fly.io rolling deploy, and Kubernetes rolling update all work by starting a new process and stopping the old one. Whether active sessions survive that transition depends entirely on whether your application implements a drain handler.
The drain handler
A drain handler implements a state machine: the process transitions from ready to draining when it receives SIGTERM (or SIGINT under PM2), stops accepting new connections, signals the load balancer that it is no longer healthy, waits for active sessions to complete or time out, then exits.
type ServerState = 'starting' | 'ready' | 'draining' | 'stopped';
let state: ServerState = 'starting';
const activeSessions = new Map<string, Session>();
const DRAIN_TIMEOUT_MS = 25_000;
// Health endpoint — load balancer removes this instance when draining
app.get('/health', (_req, res) => {
if (state === 'draining' || state === 'stopped') {
res.status(503).json({ status: 'draining' });
} else {
res.json({ status: 'ok', sessions: activeSessions.size });
}
});
async function drain() {
state = 'draining';
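// Stop accepting new connections; existing connections remain open.
// Health checks arriving on new connections are refused from here on,
// which load balancers also treat as unhealthy.
httpServer.close();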
// Wait for active sessions to complete or timeout
const deadline = Date.now() + DRAIN_TIMEOUT_MS;
while (activeSessions.size > 0 && Date.now() < deadline) {
await new Promise(r => setTimeout(r, 250));
}
state = 'stopped';
process.exit(0);
}
process.on('SIGTERM', drain);
process.on('SIGINT', drain); // PM2 reload sends SIGINT
The health endpoint returning 503 during drain is the load balancer signal. Active health checkers such as Kubernetes readiness probes, Fly.io HTTP checks, and nginx active health checks (a feature of NGINX Plus; open-source nginx only marks upstreams down passively) remove the instance from rotation when they see 503, before new connections are routed to a draining instance.
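The drain loop only finishes early if sessions are removed from the map when their connections close. A minimal sketch of that bookkeeping follows, assuming sessions are keyed by a string ID; `trackSession` and `waitForDrain` are hypothetical helpers, and the 250 ms default poll interval simply mirrors the handler above.

```typescript
// Register a session and return the cleanup to call when its connection closes.
export function trackSession<T>(sessions: Map<string, T>, id: string, session: T): () => void {
  sessions.set(id, session);
  return () => {
    sessions.delete(id);
  };
}

// Resolve once the session map is empty or the deadline passes.
// Returns the number of sessions still open, so 0 means a clean drain.
export async function waitForDrain(
  sessions: ReadonlyMap<string, unknown>,
  timeoutMs: number,
  pollMs = 250,
): Promise<number> {
  const deadline = Date.now() + timeoutMs;
  while (sessions.size > 0 && Date.now() < deadline) {
    await new Promise<void>((resolve) => setTimeout(resolve, pollMs));
  }
  return sessions.size;
}
```

In an SSE handler the cleanup returned by `trackSession` would typically be attached to the response's close event. Logging the value returned by `waitForDrain` before exiting shows how many sessions each deploy cut short.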
Kubernetes rolling update
In Kubernetes, the drain handler pairs with a rolling update strategy that stops new traffic from reaching the old pod and gives it time to drain before it is killed:
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 0 # never reduce below replica count
maxSurge: 1 # allow one extra pod during update
template:
spec:
terminationGracePeriodSeconds: 60 # must exceed DRAIN_TIMEOUT_MS (25s) by margin
containers:
- name: mcp-server
readinessProbe:
httpGet:
path: /health
port: 3000
periodSeconds: 5
failureThreshold: 2
livenessProbe:
httpGet:
path: /health
port: 3000
periodSeconds: 10
failureThreshold: 3
lifecycle:
preStop:
exec:
command: ["sleep", "5"] # endpoint controller lag before SIGTERM
The preStop: sleep 5 is necessary because there is a lag between Kubernetes deregistering the pod from the Endpoints object and the load balancer actually stopping routing traffic to it. Without the pre-stop pause, a few requests may land on a pod that has already begun draining.
Post-deploy smoke test
Automate a smoke test after each deploy to verify the new process handles the full MCP protocol, not just TCP connections:
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { SSEClientTransport } from '@modelcontextprotocol/sdk/client/sse.js';
import crypto from 'crypto';
async function smokeTest(url: string) {
const client = new Client({ name: 'smoke-test', version: '1.0.0' }, {});
const transport = new SSEClientTransport(new URL(url));
await client.connect(transport);
// Verify the initialize handshake completed: getServerVersion() returns the
// server's name and version from the initialize result
const info = client.getServerVersion();
if (!info?.version) throw new Error('no server info in initialize response');
// List tools and compare schema hash against committed baseline
const { tools } = await client.listTools();
const schemaHash = crypto
.createHash('sha256')
.update(JSON.stringify(tools.sort((a, b) => a.name.localeCompare(b.name))))
.digest('hex');
const baseline = process.env.EXPECTED_TOOL_SCHEMA_HASH;
if (baseline && schemaHash !== baseline) {
throw new Error(`tool schema hash mismatch: expected ${baseline}, got ${schemaHash}`);
}
await client.close();
console.log('smoke test passed');
}
smokeTest(process.argv[2]).catch(err => { console.error(err); process.exit(1); });
Exit code 1 on failure integrates with CI/CD systems to trigger automatic rollback — a deploy that passes the HTTP health check but fails the MCP smoke test is caught before users are affected.
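A baseline hash is only useful if it is stable across runs. JSON.stringify output depends on property insertion order, so one option is to canonicalise key order before hashing. The sketch below is an assumption-labelled variant, not the smoke test's exact hashing: `hashToolSchemas`, `canonicalise`, and `ToolSummary` are hypothetical names, and its hashes are not interchangeable with hashes produced without canonicalisation.

```typescript
import { createHash } from "node:crypto";

// Illustrative subset of a tool definition.
interface ToolSummary {
  name: string;
  description?: string;
  inputSchema?: unknown;
}

// Recursively sort object keys so that key order does not change the serialised form.
function canonicalise(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonicalise);
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const key of Object.keys(source).sort()) {
      sorted[key] = canonicalise(source[key]);
    }
    return sorted;
  }
  return value;
}

// Hash a tool list independently of tool order and of key order inside each tool.
export function hashToolSchemas(tools: readonly ToolSummary[]): string {
  const ordered = [...tools].sort((a, b) => a.name.localeCompare(b.name)); // copy: input is not mutated
  return createHash("sha256").update(JSON.stringify(canonicalise(ordered))).digest("hex");
}
```

Whichever hashing scheme is used, the committed baseline has to be generated by the same function the smoke test runs.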
Composition: Which Stack for Which Context
The five concerns do not all apply equally to every deployment. Different contexts require different combinations.
| Deployment context | Process manager | Reverse proxy | Drain handler | Notes |
|---|---|---|---|---|
| Single Linux VPS, one developer | PM2 fork mode | nginx | SIGINT + SIGTERM, wait_ready: true | PM2 graceful reload for zero-downtime; systemd supervises PM2 for boot integration |
| Single Linux VPS, team deployment | systemd directly | nginx | SIGTERM drain, TimeoutStopSec=35, Type=notify | No PM2; deploy.sh runs rsync + systemctl restart + smoke test + rollback |
| PaaS, one developer, cost-sensitive | Fly.io managed | Fly.io managed | SIGTERM drain | Set idle_timeout=3600; min_machines_running=1; single machine avoids session-affinity problem |
| Kubernetes, small team | Kubernetes pod | nginx Ingress or service mesh | SIGTERM drain, preStop: sleep 5, terminationGracePeriodSeconds=60 | Rolling update with maxUnavailable=0; post-deploy smoke test; readiness probe on /health |
The drain handler is the constant across all four contexts. PM2, systemd, Fly.io, and Kubernetes all terminate the old process by sending a signal — SIGTERM or SIGINT. Without a drain handler, all four deployment approaches drop sessions. With it, all four can achieve zero session interruption.
Introduction order
If you are adding these concerns to an existing MCP server, the right order is:
- Drain handler first. This is the highest-value addition. Even without PM2 or systemd configured optimally, a drain handler means your next manual node index.js restart will be graceful.
- nginx second. Fixes proxy_buffering and idle timeout immediately, before you work on any other infrastructure.
- systemd or PM2 third. Choose systemd for simplicity and native Linux integration; choose PM2 if you want cluster mode or richer log management. Not both: PM2 as a direct supervisor and systemd as a supervisor introduce two layers of SIGTERM handling that must both be configured correctly.
- Fly.io or Kubernetes last. These replace the VPS setup entirely; add them when you need geographic distribution, managed TLS, or team-scale deployment tooling.
What External Probes See That Process Managers Cannot
PM2, systemd, nginx, and Fly.io all have health-check mechanisms. They share a common limitation: they verify that the process is running and responding to HTTP requests. They do not verify that the process is correctly handling the MCP protocol.
Consider these failure modes:
- Process running, MCP broken. A deploy introduces a bug in the initialize handler: the process starts, nginx routes connections to it, the Kubernetes readiness probe returns 200 from /health, but every MCP session fails at the initialize stage. The process manager reports healthy. Users see every LLM task fail.
- Drain misconfigured. TimeoutStopSec is smaller than DRAIN_TIMEOUT_MS. Every deploy kills the process mid-drain and sessions are lost, but the rolling update completes successfully, systemd reports success, and the new version reports healthy. The session drops are invisible to all internal monitoring.
- nginx misconfigured post-certbot renewal. Certbot renews the TLS certificate and reloads nginx. The nginx config has a syntax error introduced in the last deploy that nginx -t did not catch. nginx fails to reload; new connections get a TLS error. The MCP server process is healthy; systemd reports healthy; the failure is at the proxy layer.
- Fly.io idle_timeout not set. Sessions appear healthy for 59 seconds, then terminate. The MCP server process never sees an error, because the Fly load balancer closes the connection silently. Session drops are invisible to application-layer monitoring.
AliveMCP probes from outside the deployment stack. It connects via the full MCP protocol — SSE transport, initialize handshake, tool call — and measures whether the server correctly responds at each stage. A process that is running but not responding to MCP protocol requests appears as a probe failure, not a health-check pass. A deploy that misconfigures the drain and drops sessions during the rolling update appears as an elevated error rate in the 90-day history — visible to your team before it becomes a support incident.
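The idea of a staged probe can be sketched generically. The code below is not AliveMCP's implementation; it is a hypothetical runner (`runStagedProbe`, `ProbeResult`, and the stage names are assumptions) in which each stage is an injected async function, so the first stage that throws is reported as the point of failure.

```typescript
type StageName = "connect" | "initialize" | "toolCall";

interface ProbeResult {
  ok: boolean;
  failedStage?: StageName;
  error?: string;
  elapsedMs: number;
}

// Hypothetical staged probe: run stages in order and stop at the first failure.
export async function runStagedProbe(
  stages: ReadonlyArray<readonly [StageName, () => Promise<void>]>,
): Promise<ProbeResult> {
  const started = Date.now();
  for (const [name, run] of stages) {
    try {
      await run();
    } catch (err) {
      return {
        ok: false,
        failedStage: name,
        error: err instanceof Error ? err.message : String(err),
        elapsedMs: Date.now() - started,
      };
    }
  }
  return { ok: true, elapsedMs: Date.now() - started };
}
```

A result with failedStage set to "initialize" corresponds to the "process running, MCP broken" case above: the TCP connection succeeded, yet no session could be established.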
See also: MCP server health check patterns, MCP server uptime monitoring, and the observability stack guide for the internal instrumentation that complements external probing.
Related Guides
- MCP server PM2: fork vs. cluster mode, ecosystem.config.js, log rotation, startup integration
- MCP server systemd: unit file, TimeoutStopSec, Type=notify, security hardening
- MCP server nginx reverse proxy: proxy_buffering off, rate limiting, TLS, access logging
- MCP server Fly.io deployment: idle_timeout, session affinity, fly secrets, volume storage
- MCP server zero-downtime deployment: drain handler, rolling update, blue-green, smoke test
- MCP Server Observability Stack Guide: OpenTelemetry, Prometheus metrics, structured logging
- MCP Server Authentication and Authorization Guide: JWT validation, JWKS rotation, RBAC
- MCP Server Infrastructure Operations Guide: dependency injection, load balancing, async work
- AliveMCP: external uptime monitoring for MCP servers