Deployment guide · 2026-06-04 · Production MCP servers

MCP Server Deployment Guide: PM2, systemd, nginx, Fly.io, and Zero-Downtime Deployment

Deploying an MCP server to production looks similar to deploying any Node.js HTTP server, until the first time you restart the process and every active LLM session drops simultaneously. A conventional REST API server can be restarted freely because each request is independent; a dropped connection causes a retry that succeeds immediately on the new process. MCP servers using Server-Sent Events are different: each session is a long-lived SSE connection backed by session state accumulated through an initialize handshake and subsequent tool calls. Killing the process terminates every active session, and every LLM agent calling your tools must reinitialise from scratch.

A complete production deployment system for an MCP server has five concerns: PM2 for process management on Linux VPS, systemd as the native Linux service layer beneath or instead of PM2, nginx as the reverse proxy that handles TLS and SSE-specific buffering, Fly.io for PaaS deployments with idle-timeout and session-affinity caveats, and zero-downtime deployment as the cross-cutting concern that makes the other four work together without dropping sessions. This guide covers them as a system: how each concern addresses a distinct part of the deployment problem, how they compose for different deployment contexts, and what remains invisible to process managers and load balancers but is clearly visible to an external probe.

TL;DR

Use fork mode, not cluster mode, for most MCP servers under PM2. PM2 cluster mode spawns multiple workers and load-balances connections across them, but SSE connections are bound to a specific worker. When a worker is reloaded, every SSE session on that worker terminates. Fork mode runs a single process — simpler, correct, and sufficient for most indie and small-team MCP deployments.
systemd's TimeoutStopSec must exceed your drain timeout. systemd sends SIGTERM, then SIGKILL after TimeoutStopSec. If your SIGTERM drain handler takes 25 seconds and TimeoutStopSec is 20, systemd kills the process mid-drain, along with every session in it. Set TimeoutStopSec=35, ten seconds longer than a 25-second DRAIN_TIMEOUT_MS, so the drain always completes before systemd escalates.
nginx needs two non-default settings for SSE: proxy_buffering off and proxy_read_timeout 3600s. Without proxy_buffering off, nginx buffers the SSE event stream and the client never receives events in real time. Without an extended proxy_read_timeout, nginx closes idle SSE connections after 60 seconds — the default for HTTP proxying.
Fly.io's idle_timeout terminates SSE sessions after 60 seconds of quiet by default. Fly.io closes HTTP connections idle for 60 seconds at the load balancer layer — before your MCP server process sees them end. Set http_options.idle_timeout = 3600 in fly.toml to match the maximum realistic session length.
A SIGTERM drain handler is the single most important piece of zero-downtime deployment. Zero-downtime deployment requires the old process to stop accepting new connections, return HTTP 503 from /health so the load balancer removes it from rotation, wait for active sessions to complete or timeout, and then exit cleanly. Rolling updates, blue-green, and PM2 graceful reload all require this handler to work.
External probes see what process managers cannot. PM2, systemd, and Fly.io know whether the process is running. They do not know whether the process is correctly responding to MCP protocol requests. AliveMCP probes from outside — it detects when a server is running but no longer responding to tool calls, when a deploy caused an elevated error rate, or when a misconfigured drain is cutting sessions short.

Why MCP Deployment Is Different

A conventional HTTP API and an MCP server differ in one deployment-critical way: state per connection.

A REST API is stateless at the connection level. Each HTTP request carries all the information needed to process it. A client that gets a connection error retries the same request against any live instance. You can kill and replace API servers freely — the worst case is one failed request that retries successfully in under a second.

An MCP server accumulates state over the lifetime of a session. The initialize handshake negotiates protocol version, registers tools, and may run expensive setup (database pool acquisition, credential validation, feature flag evaluation). Subsequent tool calls depend on that session context. If the underlying SSE connection drops, the client must reinitialise from scratch — re-running the handshake, re-establishing session state, possibly losing mid-task progress in the LLM's working context.

This distinction drives the deployment constraints that distinguish MCP servers from REST servers:

Concern	REST server	MCP server with SSE
Process restart cost	One failed request, retried immediately	All active sessions terminated; each must reinitialise
Idle connection timeout	No cost — request is complete before timeout fires	Silent session termination mid-task if SSE connection is idle
Load balancing	Any replica can serve any request	SSE client must reach the same process for all tool calls in a session
Health check result	HTTP 200 means requests will succeed	HTTP 200 does not confirm the process correctly handles MCP protocol
Deploy downtime	Seconds acceptable — requests retry	Any downtime interrupts in-progress LLM tasks

The five deployment concerns in this guide address these differences systematically. PM2 and systemd handle the process lifecycle — they restart the server on crash and keep it running through reboots. nginx and Fly.io handle the network boundary — they terminate TLS, enforce rate limits, and must be configured to not silently kill idle SSE connections. Zero-downtime deployment is the concern that ties all four together: without a drain handler that signals the load balancer and waits for active sessions to complete, even a perfect PM2 or systemd configuration will drop sessions on every deploy.

The Five Concerns and Their Roles

Concern	Where it runs	What it provides	What it cannot do alone
PM2	Linux VPS, bare metal	Auto-restart on crash, memory-limit restart, log rotation, startup integration, graceful reload	Does not handle TLS; does not control load balancer routing; graceful reload requires your server to implement a drain handler
systemd	Any Linux distribution	Service lifecycle, SIGTERM → SIGKILL escalation with configurable timeout, credential injection via EnvironmentFile, security sandboxing, journal logging	Does not know when your server is ready to accept traffic; does not handle TLS; does not perform application-level health checks
nginx	Reverse proxy in front of MCP server	TLS termination, SSE buffering control, per-IP rate limiting, structured access logging, certbot integration	Cannot drain application sessions — it routes connections, not MCP sessions; a reload replaces workers and may close long-lived upstream connections
Fly.io	PaaS deployment	Managed TLS, global anycast, rolling deploys, secrets management, volume storage for SQLite	Default idle_timeout silently terminates SSE sessions; session affinity requires explicit configuration or single-machine deployment
Zero-downtime deployment	Application layer, cross-cutting	Drain handler that signals the load balancer, waits for active sessions, then exits; enables PM2 reload, rolling updates, blue-green without session drops	Requires the load balancer (nginx, Fly, Kubernetes) to honour the 503 health signal and stop routing new connections

PM2: Process Management on Linux VPS

PM2 is the most common process manager for Node.js on Linux VPS instances. It watches the process, restarts it on crash, and integrates with Linux init so the server survives reboots. For MCP servers, two configuration decisions matter: whether to use fork or cluster mode, and how to configure the drain timeout.

Fork mode vs. cluster mode

PM2 cluster mode spawns multiple worker processes using Node's cluster module and distributes incoming connections across them. The problem for MCP servers is that SSE connections are stateful — once a client opens an SSE connection to a specific worker and completes initialize, all subsequent tool calls must reach that same worker. PM2's cluster mode load balancing does not guarantee this. In a cluster of four workers, a client that gets routed to worker 2 on initialize may get routed to worker 3 on the next tool call — resulting in a session-not-found error.
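
The misrouting described above can be reproduced with a toy model. The sketch below is a simplification, not PM2's actual scheduler: RoundRobinCluster and its methods are hypothetical names, each worker is reduced to an in-memory session table, and every connection is handed to the next worker in rotation with no affinity.

```typescript
// Toy model: each worker keeps its own in-memory session table, and
// connections are handed out round-robin without any session affinity.
type SessionTable = Map<string, { initializedAt: number }>;

class RoundRobinCluster {
  private readonly tables: SessionTable[];
  private next = 0;

  constructor(workerCount: number) {
    this.tables = Array.from({ length: workerCount }, () => new Map());
  }

  // Pick the next worker in rotation, ignoring which session is calling.
  private pick(): { index: number; table: SessionTable } {
    const index = this.next;
    this.next = (this.next + 1) % this.tables.length;
    return { index, table: this.tables[index] };
  }

  // The initialize handshake stores session state on whichever worker it hits.
  initialize(sessionId: string): number {
    const { index, table } = this.pick();
    table.set(sessionId, { initializedAt: Date.now() });
    return index;
  }

  // A tool call only succeeds if it reaches the worker holding the session.
  callTool(sessionId: string): string {
    const { index, table } = this.pick();
    return table.has(sessionId)
      ? `worker ${index}: ok`
      : `worker ${index}: session-not-found`;
  }
}

const cluster = new RoundRobinCluster(4);
cluster.initialize("session-a");                   // lands on worker 0
const toolResult = cluster.callTool("session-a");  // lands on worker 1
// toolResult is "worker 1: session-not-found"
```

With a single worker, which is what fork mode amounts to, the same calls always hit the table that holds the session.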

Fork mode runs a single process. It does not use multiple CPU cores, but a single Node.js process on a modern VPS handles 50–200 concurrent MCP sessions comfortably within the event loop. For most indie and small-team MCP deployments, fork mode is the right choice: simpler, correct, and requires no sticky-session infrastructure.

A minimal ecosystem.config.js for an MCP server in fork mode:

module.exports = {
  apps: [{
    name: 'mcp-server',
    script: './dist/index.js',
    exec_mode: 'fork',            // single process — no sticky-session problem
    max_memory_restart: '512M',   // contain leaks before OOM kill
    kill_timeout: 30000,          // wait 30s for drain before force-kill
    wait_ready: true,             // PM2 reload waits for process.send('ready')
    listen_timeout: 10000,        // startup timeout before marking unhealthy
    restart_delay: 1000,
    exp_backoff_restart_delay: 100,
    max_restarts: 10,
    min_uptime: '10s',
    env: { NODE_ENV: 'production', PORT: '3000' }
  }]
};

wait_ready: true is the key setting that enables graceful reload. When set, PM2 will not stop the old process until the new process emits process.send('ready'). This means you control exactly when traffic shifts: the new process signals ready only after it has opened database connections, loaded secrets, and is genuinely prepared to accept connections.

async function main() {
  await initDatabase();
  await loadSecrets();
  app.listen(3000, () => {
    if (process.send) {
      process.send('ready');      // signal PM2 the new process is live
    }
  });
}

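// PM2 sends SIGINT by default on stop, restart, and reload; systemd, Docker,
// and Kubernetes send SIGTERM. Handle both.
let shuttingDown = false;
async function shutdown(signal) {
  if (shuttingDown) return;       // ignore repeated signals while draining
  shuttingDown = true;
  console.log(`received ${signal}, draining sessions`);
  await drainActiveSessions();
  process.exit(0);
}

process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT',  () => shutdown('SIGINT'));

main().catch(err => { console.error(err); process.exit(1); });

Note that PM2 sends SIGINT by default when it stops, restarts, or reloads a process (not SIGTERM). A handler that only listens for SIGTERM will not drain sessions during pm2 reload. Handle both signals so the same code also drains correctly under supervisors that send SIGTERM.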

Cluster mode with nginx sticky sessions

If you need multi-core utilisation, cluster mode is possible but requires nginx ip_hash sticky routing to ensure each client's connections consistently reach the same worker. Each worker listens on a different port (derived from the per-instance index that PM2 exposes, NODE_APP_INSTANCE by default), and nginx hashes the client IP to a stable upstream. This adds operational complexity (a NAT'd client whose public IP changes mid-session will fail) and is generally not worth it until a single process is saturating a single core.
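
As a rough illustration of why hashing gives stickiness, and why it breaks when the client's IP changes, the sketch below maps a client IP to one of several upstream ports. The hash function, the name pickUpstream, and the port numbers are assumptions made for illustration; nginx's ip_hash uses its own algorithm.

```typescript
// Simplified stand-in for hash-based sticky routing. NOT nginx's real hash.
function pickUpstream(clientIp: string, upstreamPorts: number[]): number {
  if (upstreamPorts.length === 0) {
    throw new Error("at least one upstream port is required");
  }
  let hash = 0;
  for (let i = 0; i < clientIp.length; i++) {
    // Multiply-and-add string hash, kept within the unsigned 32-bit range.
    hash = (hash * 31 + clientIp.charCodeAt(i)) >>> 0;
  }
  return upstreamPorts[hash % upstreamPorts.length];
}

// Hypothetical ports for four workers.
const ports = [3000, 3001, 3002, 3003];
const firstPick = pickUpstream("203.0.113.9", ports);
const secondPick = pickUpstream("203.0.113.9", ports);
// firstPick === secondPick: the same IP always reaches the same worker.
// A different IP (for example after a NAT change) may map to another port,
// where the session does not exist.
```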

Log rotation and startup integration

Install pm2-logrotate to prevent log files from growing unbounded: pm2 install pm2-logrotate. After configuring the ecosystem file, run pm2 startup (generates a systemd unit for PM2 itself) and pm2 save (writes the current process list to restore on reboot). PM2 effectively becomes a user-space process manager supervised by systemd.

systemd: The Native Linux Service Layer

systemd is the init system on most major Linux distributions. You can use it directly to manage your MCP server (instead of PM2), or use it to supervise PM2 itself. When you manage the MCP server directly with systemd, the configuration decisions that matter most are TimeoutStopSec, Type=notify, and the EnvironmentFile credential injection pattern.

Unit file essentials

[Unit]
Description=MCP Server
After=network.target
# Start-rate limiting is a [Unit] setting on current systemd versions
StartLimitBurst=5
StartLimitIntervalSec=300

[Service]
Type=notify
User=mcp
Group=mcp
WorkingDirectory=/opt/mcp-server
EnvironmentFile=/etc/mcp-server/env
ExecStart=/usr/bin/node dist/index.js
Restart=on-failure
RestartSec=5s
TimeoutStopSec=35

# Security hardening
PrivateTmp=yes
NoNewPrivileges=yes
ProtectSystem=strict
ProtectHome=read-only
ReadWritePaths=/var/lib/mcp-server
PrivateDevices=yes
ProtectKernelTunables=yes
SystemCallFilter=@system-service

[Install]
WantedBy=multi-user.target

TimeoutStopSec: the most common misconfiguration

When you run systemctl stop mcp-server or deploy a new version, systemd sends SIGTERM to the process. Your drain handler then closes the HTTP listener, sets the health endpoint to return 503, and waits for active sessions to complete or time out. If systemd's TimeoutStopSec expires before the drain completes, systemd escalates to SIGKILL — which immediately terminates the process with all active sessions in it.

The rule is: TimeoutStopSec must exceed DRAIN_TIMEOUT_MS by a margin. If your drain timeout is 25 seconds (DRAIN_TIMEOUT_MS = 25000), set TimeoutStopSec=35. The ten-second margin gives the drain code time to complete its cleanup after the last session closes.
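
The arithmetic is simple enough to check in a deploy script. The following is a minimal sketch, assuming the drain timeout is available as a number of milliseconds; the function names and the margin value are illustrative, not part of systemd or any library.

```typescript
// Assumed values: the drain timeout from the drain handler and a cleanup margin.
const DRAIN_TIMEOUT_MS = 25_000;
const MARGIN_SEC = 10;

// Smallest TimeoutStopSec that leaves the requested margin after the drain.
function timeoutStopSecFor(drainTimeoutMs: number, marginSec: number): number {
  return Math.ceil(drainTimeoutMs / 1000) + marginSec;
}

// True when systemd will wait longer than the drain can possibly take.
function drainFitsInStopTimeout(drainTimeoutMs: number, timeoutStopSec: number): boolean {
  return timeoutStopSec * 1000 > drainTimeoutMs;
}

const recommended = timeoutStopSecFor(DRAIN_TIMEOUT_MS, MARGIN_SEC); // 35
const tooShort = drainFitsInStopTimeout(DRAIN_TIMEOUT_MS, 20);       // false
```

Running a check like this in CI against the value in the unit file would catch the mismatch before a deploy kills sessions mid-drain.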

Type=notify and sd_notify

Type=notify tells systemd to wait for the process to send sd_notify(READY=1) before marking the service as started. Without it (Type=simple), systemd marks the service started as soon as the process spawns — before database connections are open or secrets are loaded. Traffic may arrive at the process before it is ready.

In Node.js, use the sd-notify npm package:

import sdNotify from 'sd-notify';

async function main() {
  await initDatabase();
  await loadSecrets();
  app.listen(3000, () => {
    sdNotify.ready();     // READY=1 — systemd marks service started
  });
}

process.on('SIGTERM', async () => {
  sdNotify.stopping();   // STOPPING=1 — optional but helps systemd timing
  await drainActiveSessions();
  process.exit(0);
});

EnvironmentFile for credential injection

Credentials should never be in the systemd unit file (which is world-readable via systemctl cat). Use EnvironmentFile=/etc/mcp-server/env, a file owned by root:mcp with mode 640. The file is not in your application repository and not readable by other users. This is equivalent to Fly.io's fly secrets set — the credentials are injected as environment variables at process start without appearing in logs or version control.

nginx: Reverse Proxy with SSE-Specific Configuration

nginx is the most common reverse proxy for MCP servers on Linux VPS instances. It handles TLS termination, HTTP-to-HTTPS redirection, per-IP rate limiting, and structured access logging. Two nginx default settings silently break SSE connections and must be changed for MCP.

proxy_buffering off: the critical SSE setting

nginx buffers proxy responses by default. For SSE, buffering means nginx accumulates events from the upstream MCP server in memory and periodically flushes them to the client in batches — breaking the real-time delivery that SSE provides. The client receives tool responses seconds after they were sent, or not until the buffer fills. Set proxy_buffering off on the SSE location block.

proxy_read_timeout: prevent idle session termination

nginx's default proxy_read_timeout is 60 seconds. It measures time since the last data was received from the upstream. For an SSE connection where the LLM is thinking between tool calls, 60 seconds of silence is normal — but nginx will close the connection and the client will need to reconnect and reinitialise. Set proxy_read_timeout 3600s (one hour) on the SSE location.
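
To make the timeout behaviour concrete, the sketch below models a proxy that closes a connection once the gap since the last upstream byte exceeds the read timeout. It is a simplified model written for this guide, not nginx code; firstIdleClose and the sample timestamps are hypothetical.

```typescript
// Returns the time (in seconds) at which a proxy with the given read timeout
// would close the connection, or null if every quiet gap stays within it.
function firstIdleClose(
  byteTimesSec: number[],   // moments at which the upstream sent data
  timeoutSec: number,       // read timeout applied between successive reads
  sessionEndSec: number     // moment the session would otherwise end
): number | null {
  const times = [...byteTimesSec, sessionEndSec];
  for (let i = 1; i < times.length; i++) {
    if (times[i] - times[i - 1] > timeoutSec) {
      return times[i - 1] + timeoutSec;
    }
  }
  return null;
}

// An LLM pauses for 80 seconds between the tool calls at t=50 and t=130.
const sampleTimes = [0, 10, 50, 130];
const withDefault = firstIdleClose(sampleTimes, 60, 140);     // 110
const withExtended = firstIdleClose(sampleTimes, 3600, 140);  // null
```

With the 60-second default the connection is cut at t=110, twenty seconds before the next tool response would have been sent; with 3600 seconds the same session survives.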

Core nginx configuration

limit_req_zone $binary_remote_addr zone=mcp_per_ip:10m rate=30r/m;
limit_req_zone $binary_remote_addr zone=mcp_health:1m rate=5r/s;

upstream mcp_server {
  server 127.0.0.1:3000;
  keepalive 16;              # persistent connections to Node — eliminates per-request TCP overhead
}

server {
  listen 443 ssl;
  server_name example.com;
  # ssl_certificate / ssl_certificate_key managed by certbot

  # Health check — higher rate limit, standard timeout
  location /health {
    limit_req zone=mcp_health burst=10 nodelay;
    proxy_pass http://mcp_server;
    proxy_set_header Host $host;
  }

  # SSE transport — buffering disabled, extended timeout
  location /sse {
    limit_req zone=mcp_per_ip burst=5 nodelay;
    proxy_pass http://mcp_server;
    proxy_http_version 1.1;
    proxy_set_header Connection "";          # keepalive to upstream
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_buffering off;                     # must be off for SSE
    proxy_read_timeout 3600s;               # prevent idle session termination
    proxy_cache off;
  }

  # General API traffic
  location / {
    limit_req zone=mcp_per_ip burst=20 nodelay;
    proxy_pass http://mcp_server;
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_read_timeout 30s;
  }
}

In your MCP server, set trustProxy: '127.0.0.1' (Fastify) or equivalent to trust X-Forwarded-For only from localhost. Without this, a client can set an arbitrary X-Forwarded-For header and bypass per-IP rate limiting.
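
The idea behind trusting the header only from localhost can be sketched as follows. This is an illustration of the principle, not Fastify's implementation; resolveClientIp and TRUSTED_PROXIES are hypothetical names, and the addresses are documentation examples.

```typescript
// Peers whose X-Forwarded-For header we are willing to believe.
const TRUSTED_PROXIES = new Set(["127.0.0.1", "::1"]);

function resolveClientIp(remoteAddr: string, xForwardedFor?: string): string {
  // A direct connection from an untrusted peer: ignore the header entirely.
  if (!TRUSTED_PROXIES.has(remoteAddr)) return remoteAddr;

  const hops = (xForwardedFor ?? "")
    .split(",")
    .map((hop) => hop.trim())
    .filter((hop) => hop.length > 0);

  // Walk right to left: the rightmost entry was appended by our own proxy,
  // so the first untrusted address from the right is the real client.
  for (let i = hops.length - 1; i >= 0; i--) {
    if (!TRUSTED_PROXIES.has(hops[i])) return hops[i];
  }
  return remoteAddr;
}

// The client sent a spoofed "1.2.3.4"; nginx appended the real peer address.
const viaProxy = resolveClientIp("127.0.0.1", "1.2.3.4, 203.0.113.9");
// A client bypassing nginx cannot choose its own rate-limit key.
const direct = resolveClientIp("198.51.100.7", "1.2.3.4");
```

Here viaProxy resolves to "203.0.113.9" and direct to "198.51.100.7", so the spoofed value is ignored in both cases.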

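Reload nginx with nginx -t && systemctl reload nginx. Unlike restart, reload starts new worker processes and lets the old ones keep serving their existing connections until those connections close, so open SSE sessions are not dropped (unless worker_shutdown_timeout is set, in which case the old workers' connections are closed after that interval).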

Fly.io: PaaS Deployment with MCP-Specific Caveats

Fly.io provides managed TLS, global anycast routing, rolling deploys, and integrated secrets management — everything a Linux VPS deployment needs nginx and systemd to handle, provided as platform services. Two Fly.io defaults must be changed for MCP servers.

idle_timeout: the most common Fly.io MCP failure

Fly.io's load balancer terminates HTTP connections idle for 60 seconds by default. For SSE, "idle" means no bytes have been exchanged on the connection — not that no tool calls are in flight. An LLM thinking between tool calls produces an idle SSE connection; after 60 seconds, Fly closes it at the load balancer layer. The MCP server process never sees the close; it continues waiting on a dead connection while the client must reinitialise.

Set http_options.idle_timeout = 3600 in fly.toml:

[[services]]
  internal_port = 3000
  protocol = "tcp"

  [services.concurrency]
    type = "connections"
    hard_limit = 200
    soft_limit = 150

  [[services.http_checks]]
    interval = "10s"
    grace_period = "15s"
    method = "get"
    path = "/health"
    timeout = "5s"

  [services.http_options]
    idle_timeout = 3600         # match maximum realistic session length

Session affinity: single machine vs. multi-machine

Fly distributes incoming connections across machines by connection count. If you run two machines and a client's SSE connection lands on machine A, subsequent HTTP requests to the same server may land on machine B — which has no record of the session. For most indie MCP deployments, the correct answer is one machine: a single Fly shared-cpu-1x instance at 512 MB RAM handles 50–200 concurrent sessions within the Node.js event loop, and there is no session-affinity problem with one machine.

If you need multi-machine for availability, externalise session state to Fly Postgres or Upstash Redis. Each machine stores session metadata in the shared store; any machine can resume a session started on another.
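
A minimal sketch of what externalised session state looks like from the application's side, assuming a hypothetical SessionStore interface. The in-memory class stands in for Redis or Postgres so the example runs on its own; a real store would be asynchronous and shared over the network.

```typescript
interface SessionRecord {
  createdAt: number;
}

// Hypothetical storage interface; a real one would return Promises.
interface SessionStore {
  save(id: string, record: SessionRecord): void;
  load(id: string): SessionRecord | undefined;
}

// In-memory stand-in for a shared store such as Redis or Postgres.
class InMemoryStore implements SessionStore {
  private readonly records = new Map<string, SessionRecord>();
  save(id: string, record: SessionRecord): void {
    this.records.set(id, record);
  }
  load(id: string): SessionRecord | undefined {
    return this.records.get(id);
  }
}

// One application instance; it keeps no session state of its own.
class Machine {
  private readonly name: string;
  private readonly store: SessionStore;

  constructor(name: string, store: SessionStore) {
    this.name = name;
    this.store = store;
  }

  initialize(sessionId: string): void {
    this.store.save(sessionId, { createdAt: Date.now() });
  }

  callTool(sessionId: string): string {
    return this.store.load(sessionId)
      ? `${this.name}: ok`
      : `${this.name}: session-not-found`;
  }
}

const shared = new InMemoryStore();
const machineA = new Machine("machine-a", shared);
const machineB = new Machine("machine-b", shared);
machineA.initialize("session-1");
const resumed = machineB.callTool("session-1"); // "machine-b: ok"
```

If each machine were given its own store instead of the shared one, the same call on machine B would report session-not-found, which is the failure described above.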

auto_stop_machines and cold starts

Fly's auto_stop_machines is cost-effective but adds 1–3 seconds of cold start latency when a machine spins up from stopped. For MCP, this appears as elevated connection time on the first probe after the machine stops — AliveMCP distinguishes this pattern from genuine slowness (a cold-start spike that resolves on the next probe versus a sustained latency increase). Set min_machines_running = 1 to keep one machine always warm if cold-start latency is unacceptable.

fly secrets for credential injection

fly secrets set DATABASE_URL="postgres://..." encrypts the secret at rest and injects it as an environment variable, triggering a rolling restart of all machines. This is the Fly.io equivalent of systemd's EnvironmentFile pattern — credentials are never in version control or fly.toml.

Zero-Downtime Deployment: The Cross-Cutting Concern

Zero-downtime deployment is not a deployment platform feature — it is a pattern you implement in your MCP server application that makes every other deployment mechanism safe. PM2 graceful reload, systemd service restart, Fly.io rolling deploy, and Kubernetes rolling update all work by starting a new process and stopping the old one. Whether active sessions survive that transition depends entirely on whether your application implements a drain handler.

The drain handler

A drain handler implements a state machine: the process transitions from ready to draining when it receives SIGTERM, stops accepting new connections, signals the load balancer that it is no longer healthy, waits for active sessions to complete or timeout, then exits.

type ServerState = 'starting' | 'ready' | 'draining' | 'stopped';
let state: ServerState = 'starting';
const activeSessions = new Map<string, Session>();
const DRAIN_TIMEOUT_MS = 25_000;

// Health endpoint — load balancer removes this instance when draining
app.get('/health', (_req, res) => {
  if (state === 'draining' || state === 'stopped') {
    res.status(503).json({ status: 'draining' });
  } else {
    res.json({ status: 'ok', sessions: activeSessions.size });
  }
});

async function drain() {
  if (state === 'draining' || state === 'stopped') return; // ignore repeated signals
  state = 'draining';

  // Stop accepting new connections; existing connections remain open.
  // Probes that open a fresh connection now get "connection refused"
  // rather than the 503, which load balancers also treat as unhealthy.
  httpServer.close();

  // Wait for active sessions to complete or timeout
  const deadline = Date.now() + DRAIN_TIMEOUT_MS;
  while (activeSessions.size > 0 && Date.now() < deadline) {
    await new Promise(r => setTimeout(r, 250));
  }

  state = 'stopped';
  process.exit(0);
}

process.on('SIGTERM', drain);
process.on('SIGINT',  drain); // PM2 reload sends SIGINT

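The health endpoint returning 503 during drain is the load balancer signal. Kubernetes readiness probes and Fly.io HTTP checks remove the instance from rotation when they see 503, before new connections are routed to a draining instance. Open-source nginx only performs passive checks (max_fails and fail_timeout on the upstream server); active health checks against /health require NGINX Plus or a third-party module.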
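
The drain loop depends on activeSessions being kept accurate, which the handler above takes for granted. One way to do that bookkeeping is sketched below under stated assumptions: Closable, trackSession, drainStatus, and FakeConnection are hypothetical names, and with a real Node.js response object the close notification would typically come from its 'close' event.

```typescript
// Anything that can tell us when its underlying connection has closed.
interface Closable {
  onClose(listener: () => void): void;
}

const trackedSessions = new Map<string, Closable>();

// Register a session and remove it automatically when the connection closes.
function trackSession(id: string, connection: Closable): void {
  trackedSessions.set(id, connection);
  connection.onClose(() => {
    trackedSessions.delete(id);
  });
}

// The decision the drain loop makes on every poll.
function drainStatus(nowMs: number, deadlineMs: number): "drained" | "waiting" | "timed-out" {
  if (trackedSessions.size === 0) return "drained";
  return nowMs < deadlineMs ? "waiting" : "timed-out";
}

// Test double standing in for an SSE response stream.
class FakeConnection implements Closable {
  private readonly listeners: Array<() => void> = [];
  onClose(listener: () => void): void {
    this.listeners.push(listener);
  }
  close(): void {
    this.listeners.forEach((listener) => listener());
  }
}

const conn = new FakeConnection();
trackSession("session-1", conn);
const whileOpen = drainStatus(1_000, 25_000);   // "waiting"
conn.close();
const afterClose = drainStatus(2_000, 25_000);  // "drained"
```

Keeping the removal inside the close listener means a client that disconnects on its own shortens the drain instead of holding it until the deadline.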

Kubernetes rolling update

In Kubernetes, the drain handler pairs with a rolling update strategy that keeps full capacity available while the old pod drains and gives that pod enough time to finish before it is killed:

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 0    # never reduce below replica count
    maxSurge: 1          # allow one extra pod during update

template:
  spec:
    terminationGracePeriodSeconds: 60  # must exceed preStop sleep (5s) + DRAIN_TIMEOUT_MS (25s) by a margin
    containers:
    - name: mcp-server
      readinessProbe:
        httpGet:
          path: /health
          port: 3000
        periodSeconds: 5
        failureThreshold: 2
      livenessProbe:
        httpGet:
          path: /health
          port: 3000
        periodSeconds: 10
        failureThreshold: 3
      lifecycle:
        preStop:
          exec:
            command: ["sleep", "5"]  # endpoint controller lag before SIGTERM

The preStop: sleep 5 is necessary because there is a lag between Kubernetes deregistering the pod from the Endpoints object and the load balancer actually stopping routing traffic to it. Without the pre-stop pause, a few requests may land on a pod that has already begun draining.
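
Because the preStop hook runs inside the termination grace period, the sleep and the drain share one budget. The sketch below adds them up; the interface and function names are illustrative, and the numbers are the ones used in this guide.

```typescript
interface ShutdownBudget {
  preStopSec: number;                 // duration of the preStop sleep
  drainTimeoutMs: number;             // DRAIN_TIMEOUT_MS in the application
  terminationGracePeriodSec: number;  // terminationGracePeriodSeconds on the pod
}

// Seconds left over after the preStop sleep and a worst-case drain.
// A negative result means the kubelet sends SIGKILL before the drain finishes.
function slackSeconds(budget: ShutdownBudget): number {
  const needed = budget.preStopSec + budget.drainTimeoutMs / 1000;
  return budget.terminationGracePeriodSec - needed;
}

const guideConfig = slackSeconds({
  preStopSec: 5,
  drainTimeoutMs: 25_000,
  terminationGracePeriodSec: 60,
}); // 30

const tooTight = slackSeconds({
  preStopSec: 5,
  drainTimeoutMs: 25_000,
  terminationGracePeriodSec: 25,
}); // -5
```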

Post-deploy smoke test

Automate a smoke test after each deploy to verify the new process handles the full MCP protocol, not just TCP connections:

import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { SSEClientTransport } from '@modelcontextprotocol/sdk/client/sse.js';
import crypto from 'crypto';

async function smokeTest(url: string) {
  const client = new Client({ name: 'smoke-test', version: '1.0.0' }, {});
  const transport = new SSEClientTransport(new URL(url));
  await client.connect(transport);

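  // Verify the initialize handshake completed and returned server info.
  // getServerVersion() reports the server's name and version, not the protocol version.
  const info = client.getServerVersion();
  if (!info?.name || !info?.version) throw new Error('no server info in initialize response');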

  // List tools and compare schema hash against committed baseline
  const { tools } = await client.listTools();
  const schemaHash = crypto
    .createHash('sha256')
    .update(JSON.stringify(tools.sort((a, b) => a.name.localeCompare(b.name))))
    .digest('hex');

  const baseline = process.env.EXPECTED_TOOL_SCHEMA_HASH;
  if (baseline && schemaHash !== baseline) {
    throw new Error(`tool schema hash mismatch: expected ${baseline}, got ${schemaHash}`);
  }

  await client.close();
  console.log('smoke test passed');
}

smokeTest(process.argv[2]).catch(err => { console.error(err); process.exit(1); });

Exit code 1 on failure integrates with CI/CD systems to trigger automatic rollback — a deploy that passes the HTTP health check but fails the MCP smoke test is caught before users are affected.
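
The baseline hash is only useful if it is stable across runs. As a hedged sketch of one way to generate it, the helper below sorts tools by name and object keys alphabetically before hashing, so that a change in key order from the server does not register as a schema change. The names canonicalize, toolSchemaHash, and ToolLike are hypothetical, and this is a stricter variant of the hashing in the smoke test, not a drop-in match for it.

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Rebuild a JSON value with object keys in sorted order, recursively.
function canonicalize(value: Json): Json {
  if (Array.isArray(value)) return value.map((item) => canonicalize(item));
  if (value !== null && typeof value === "object") {
    const sorted: { [key: string]: Json } = {};
    for (const key of Object.keys(value).sort()) {
      sorted[key] = canonicalize(value[key]);
    }
    return sorted;
  }
  return value;
}

interface ToolLike {
  name: string;
  [key: string]: Json;
}

// SHA-256 over the tool list, independent of tool order and key order.
function toolSchemaHash(tools: ToolLike[]): string {
  const ordered = [...tools].sort((a, b) => a.name.localeCompare(b.name));
  const canonical = ordered.map((tool) => canonicalize(tool));
  return createHash("sha256").update(JSON.stringify(canonical)).digest("hex");
}

const hashA = toolSchemaHash([
  { name: "search", description: "Search documents" },
  { name: "fetch", inputSchema: { type: "object", properties: {} } },
]);
const hashB = toolSchemaHash([
  { inputSchema: { properties: {}, type: "object" }, name: "fetch" },
  { description: "Search documents", name: "search" },
]);
// hashA === hashB: same tools, different ordering.
```

The resulting value is what would be committed as EXPECTED_TOOL_SCHEMA_HASH, provided the smoke test computes its hash the same way.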

Composition: Which Stack for Which Context

The five concerns do not all apply equally to every deployment. Different contexts require different combinations.

Deployment context	Process manager	Reverse proxy	Drain handler	Notes
Single Linux VPS, one developer	PM2 fork mode	nginx	SIGINT + SIGTERM, wait_ready: true	PM2 graceful reload for zero-downtime; systemd supervises PM2 for boot integration
Single Linux VPS, team deployment	systemd directly	nginx	SIGTERM drain, TimeoutStopSec=35, Type=notify	No PM2; deploy.sh runs rsync + systemctl restart + smoke test + rollback
PaaS, one developer, cost-sensitive	Fly.io managed	Fly.io managed	SIGTERM drain	Set idle_timeout=3600; min_machines_running=1; single machine avoids session-affinity problem
Kubernetes, small team	Kubernetes pod	nginx Ingress or service mesh	SIGTERM drain, preStop: sleep 5, terminationGracePeriodSeconds=60	Rolling update with maxUnavailable=0; post-deploy smoke test; readiness probe on /health

The drain handler is the constant across all four contexts. PM2, systemd, Fly.io, and Kubernetes all terminate the old process by sending a signal — SIGTERM or SIGINT. Without a drain handler, all four deployment approaches drop sessions. With it, all four can achieve zero session interruption.

Introduction order

If you are adding these concerns to an existing MCP server, the right order is:

Drain handler first. This is the highest-value addition. Even without PM2 or systemd configured optimally, a drain handler means your next manual node index.js restart will be graceful.
nginx second. Fixes proxy_buffering and idle timeout immediately, before you work on any other infrastructure.
systemd or PM2 third. Choose systemd for simplicity and native Linux integration; choose PM2 if you want cluster mode or richer log management. Do not let both supervise the MCP server process directly: that introduces two layers of signal handling that must both be configured correctly. systemd supervising the PM2 daemon for boot integration, as pm2 startup sets up, is a different arrangement and is fine.
Fly.io or Kubernetes last. These replace the VPS setup entirely; add them when you need geographic distribution, managed TLS, or team-scale deployment tooling.

What External Probes See That Process Managers Cannot

PM2, systemd, nginx, and Fly.io all have health-check mechanisms. They share a common limitation: at most, they verify that the process is running and responding to HTTP requests. They do not verify that the process is correctly handling the MCP protocol.

Consider these failure modes:

Process running, MCP broken. A deploy introduces a bug in the initialize handler — the process starts, nginx routes connections to it, the Kubernetes readiness probe returns 200 from /health, but every MCP session fails at the initialize stage. The process manager reports healthy. Users see every LLM task fail.
Drain misconfigured. TimeoutStopSec is smaller than DRAIN_TIMEOUT_MS. Every deploy kills the process mid-drain — sessions are lost — but the rolling update completes successfully, systemd reports success, and the new version reports healthy. The session drops are invisible to all internal monitoring.
nginx misconfigured post-certbot renewal. Certbot renews the TLS certificate and reloads nginx. The nginx config contains an error introduced in the last deploy that was never validated with nginx -t. nginx fails to reload and keeps running with the old configuration and certificate; once that certificate expires, new connections get a TLS error. The MCP server process is healthy; systemd reports healthy; the failure is at the proxy layer.
Fly.io idle_timeout not set. Sessions appear healthy until an SSE connection has been quiet for 60 seconds, then terminate. The MCP server process never sees an error because the Fly load balancer closes the connection silently. Session drops are invisible to application-layer monitoring.

AliveMCP probes from outside the deployment stack. It connects via the full MCP protocol — SSE transport, initialize handshake, tool call — and measures whether the server correctly responds at each stage. A process that is running but not responding to MCP protocol requests appears as a probe failure, not a health-check pass. A deploy that misconfigures the drain and drops sessions during the rolling update appears as an elevated error rate in the 90-day history — visible to your team before it becomes a support incident.

See also: MCP server health check patterns, MCP server uptime monitoring, and the observability stack guide for the internal instrumentation that complements external probing.

Related Guides

MCP server PM2 — fork vs. cluster mode, ecosystem.config.js, log rotation, startup integration
MCP server systemd — unit file, TimeoutStopSec, Type=notify, security hardening
MCP server nginx reverse proxy — proxy_buffering off, rate limiting, TLS, access logging
MCP server Fly.io deployment — idle_timeout, session affinity, fly secrets, volume storage
MCP server zero-downtime deployment — drain handler, rolling update, blue-green, smoke test
MCP Server Observability Stack Guide — OpenTelemetry, Prometheus metrics, structured logging
MCP Server Authentication and Authorization Guide — JWT validation, JWKS rotation, RBAC
MCP Server Infrastructure Operations Guide — dependency injection, load balancing, async work
AliveMCP — external uptime monitoring for MCP servers