MCP server PM2
PM2 is the most widely used process manager for Node.js servers on Linux VPS instances. It handles automatic restarts on crash, cluster mode for multi-core utilisation, memory-limit restarts to contain leaks, and startup integration so your MCP server survives reboots. PM2 has one critical interaction with MCP servers that plain Node.js REST servers do not have: SSE transport creates long-lived connections that belong to a specific worker process. In cluster mode, if a worker is reloaded, all SSE sessions on that worker are terminated. This guide covers how to configure PM2 correctly for MCP servers, including sticky sessions via nginx upstream hashing and graceful reload that drains active sessions before replacing a worker.
TL;DR
Run a single-instance MCP server with exec_mode: "fork" (not cluster) to avoid the sticky-session problem. If you need multi-core, pair cluster mode with nginx ip_hash sticky routing and implement a SIGINT drain handler in each worker. Set max_memory_restart: "512M" and kill_timeout: 30000 to give workers 30 seconds to drain before PM2 force-kills them. Run pm2 startup + pm2 save so the process survives reboots.
Fork mode vs. cluster mode for MCP servers
PM2 offers two execution modes. Fork mode runs a single Node.js process — PM2 restarts it on crash but it uses only one CPU core. Cluster mode forks N worker processes (one per core by default) and load-balances incoming TCP connections across them via Node's built-in cluster module.
| Mode | CPU cores used | SSE session affinity | When to use |
|---|---|---|---|
| fork | 1 | Guaranteed (one process) | Most MCP servers: simple, no sticky-session risk |
| cluster | All (configurable) | Not guaranteed; requires sticky routing | High-traffic servers on multi-core VPS with nginx ip_hash |
The problem with cluster mode and MCP: when PM2 reloads a worker (during pm2 reload), the old worker receives SIGINT and should drain its sessions. New connections from the load balancer immediately route to the replacement worker. But an SSE client that was connected to the old worker must either close its SSE connection and reconnect to the new worker (restarting the MCP session) or hold the connection open until the drain timeout expires. There is no transparent migration of MCP session state between workers.
For most production MCP servers on a VPS, fork mode is the right choice. The throughput ceiling of a single Node.js event loop serving MCP tool calls is well above what typical traffic demands — the bottleneck is the downstream API or database, not the event loop. Use cluster mode only when CPU-intensive tool handlers (JSON parsing of large payloads, in-process data transformation) are saturating a single core.
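One way to check whether a single core is actually saturated before moving to cluster mode is to sample event loop utilisation. The following is a minimal sketch using Node's perf_hooks performance.eventLoopUtilization() API; the helper names createEluSampler and isSaturated and the 0.8 threshold are illustrative assumptions, not PM2 features.

```javascript
const { performance } = require('node:perf_hooks');

// Hypothetical helper: each call to the returned function reports the fraction
// of time (0..1) the event loop spent doing work since the previous call.
function createEluSampler() {
  let last = performance.eventLoopUtilization();
  return function sample() {
    const current = performance.eventLoopUtilization();
    // With two readings, Node returns the delta between them (later one first)
    const delta = performance.eventLoopUtilization(current, last);
    last = current;
    // Guard against a 0/0 division when no time has elapsed between readings
    return Number.isFinite(delta.utilization) ? delta.utilization : 0;
  };
}

// Assumed threshold: sustained values near 1 suggest CPU-bound tool handlers
function isSaturated(utilization, threshold = 0.8) {
  return utilization >= threshold;
}
```

Calling sample() every few seconds from an unref'ed setInterval timer and logging the result gives a rough picture: values that stay low point to a downstream bottleneck, while values that stay near 1 under load are the case where cluster mode may help.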
ecosystem.config.js
// ecosystem.config.js
module.exports = {
  apps: [
    {
      name: 'mcp-server',
      script: 'dist/server.js',
      exec_mode: 'fork', // single process, avoids sticky-session complexity
      instances: 1,
      node_args: '--max-old-space-size=400', // cap the V8 heap at 400 MB, below the OS memory limit
      // Restart before the Linux OOM killer steps in (leak containment)
      max_memory_restart: '512M',
      // Give the process 30 seconds to drain SSE sessions before force-kill
      kill_timeout: 30000,
      // Back-off on repeated crash restarts
      // Prevents a crash loop from hammering external dependencies
      restart_delay: 1000,
      exp_backoff_restart_delay: 100,
      max_restarts: 10,
      min_uptime: '10s', // a restart counts toward max_restarts only if the app ran for less than 10s
      // Environment variables
      env: {
        NODE_ENV: 'production',
        PORT: '3000',
        LOG_LEVEL: 'info',
      },
      env_development: {
        NODE_ENV: 'development',
        LOG_LEVEL: 'debug',
      },
      // Merge stdout and stderr into a single log stream
      merge_logs: true,
      log_date_format: 'YYYY-MM-DDTHH:mm:ss.SSSZ',
      // Do not watch for file changes in production; use pm2 reload for updates
      watch: false,
      // Wait for the app to signal readiness before marking it as online
      // Requires process.send('ready') in your server code (see below)
      wait_ready: true,
      listen_timeout: 15000, // max time (ms) to wait for the 'ready' message
    },
  ],
};
The wait_ready: true option tells PM2 to wait for a process.send('ready') message from the application before it marks the new process as online during a reload. Without it, PM2 marks the process online as soon as it starts (in cluster mode, as soon as it starts listening), which can be before the database is initialised or the JWKS cache is loaded. The application must call process.send('ready') after all startup tasks complete. If the message does not arrive within listen_timeout, PM2 stops waiting and proceeds anyway.
Startup sequence with process.send('ready')
// src/server.ts
// initDatabase, loadJwksCache and registerMcpRoutes are application-specific helpers (not shown)
import Fastify from 'fastify';

const app = Fastify({ logger: { level: process.env.LOG_LEVEL ?? 'info' } });

async function start() {
  // All startup tasks before accepting traffic
  await initDatabase();
  await loadJwksCache();
  await registerMcpRoutes(app);
  await app.listen({ port: parseInt(process.env.PORT ?? '3000', 10), host: '0.0.0.0' });

  // Signal PM2 that the server is ready to receive traffic
  // process.send only exists when the process was started with an IPC channel (as under PM2)
  if (process.send) {
    process.send('ready');
    app.log.info('Sent ready signal to PM2');
  }
}

// Graceful shutdown on SIGINT (sent by PM2 during reload/stop)
process.on('SIGINT', async () => {
  app.log.info('SIGINT received, starting graceful drain');
  // Stop accepting new connections and wait for open ones to finish
  await app.close();
  // Open SSE streams keep app.close() pending until they end;
  // kill_timeout (30s) bounds how long PM2 waits before force-killing the process
  process.exit(0);
});

start().catch(err => {
  app.log.error(err, 'Startup failed');
  process.exit(1);
});
PM2 sends SIGINT (not SIGTERM) to managed processes by default. If your server is also run under Docker or systemd, those runtimes send SIGTERM. Handle both signals with the same drain logic to avoid environment-specific shutdown differences. See MCP server graceful shutdown for the full session-drain implementation with a configurable timeout.
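Handling both signals with one routine can be sketched as follows. The helper name installShutdownHandlers and its drain callback are hypothetical; drain stands for whatever shutdown logic your server already has (for example, () => app.close()).

```javascript
// Hypothetical helper: registers one drain routine for both signals.
// `proc` defaults to the real process object; passing a stand-in makes it testable.
function installShutdownHandlers(drain, proc = process) {
  let shuttingDown = false;

  const handle = async (signal) => {
    if (shuttingDown) return; // ignore repeated signals while draining
    shuttingDown = true;
    try {
      await drain(signal);
      proc.exit(0);
    } catch (err) {
      console.error(`Drain failed after ${signal}:`, err);
      proc.exit(1);
    }
  };

  // PM2 sends SIGINT by default; Docker and systemd send SIGTERM
  for (const signal of ['SIGINT', 'SIGTERM']) {
    proc.on(signal, () => handle(signal));
  }
  return handle;
}
```

In the server file this would replace the single SIGINT listener with a call such as installShutdownHandlers(() => app.close()). The shuttingDown flag keeps a second signal from starting a second drain.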
Cluster mode with nginx sticky routing
If you need cluster mode, configure nginx to route each client to the same upstream worker for the lifetime of its SSE connection. The reliable sticky mechanism for MCP is ip_hash (hash on client IP). Cookie-based sticky sessions (nginx Plus sticky cookie) work too but require the client to send cookies — not all MCP clients do.
# /etc/nginx/sites-available/mcp-server
upstream mcp_workers {
    # ip_hash ensures a client always routes to the same upstream worker
    # Required for SSE sessions to survive across multiple PM2 cluster workers
    ip_hash;
    server 127.0.0.1:3001; # worker 0
    server 127.0.0.1:3002; # worker 1
    server 127.0.0.1:3003; # worker 2
    server 127.0.0.1:3004; # worker 3
}

server {
    listen 443 ssl http2;
    server_name your-mcp-server.example.com;
    # ssl_certificate and ssl_certificate_key directives are omitted here; nginx needs them to start

    # SSE requires long-lived connections, so disable proxy buffering
    proxy_buffering off;
    proxy_read_timeout 3600s; # 1 hour, match your max session lifetime
    proxy_send_timeout 3600s;

    location / {
        proxy_pass http://mcp_workers;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection ""; # Keep-alive for SSE
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
// ecosystem.config.js, cluster mode variant (only the fields that differ from the fork config)
{
  exec_mode: 'cluster',
  instances: 'max', // one worker per CPU core; keep the nginx upstream list in sync with this count
  // Each worker listens on its own port so nginx upstream can address them individually
  // The worker computes its port as PORT_BASE + its instance index
  env: { PORT_BASE: '3001' },
}
Note: Node's built-in cluster module lets all workers share a single listening port. For nginx's ip_hash upstream to address individual workers, each worker must bind to its own port instead. To do that, the worker reads its instance index from process.env.NODE_APP_INSTANCE (PM2 sets it to 0, 1, 2, ... for each instance of an app; pm_id is PM2's process id across all managed apps, so it does not necessarily start at 0) and listens on PORT_BASE + NODE_APP_INSTANCE. This is more complex than fork mode. The simpler production setup is one MCP server per VPS, fork mode, and a load balancer at the VPS level rather than the PM2 level. See MCP server load balancing for the multi-instance architecture.
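The port computation can be sketched as below. This is an illustration under assumptions: resolveWorkerPort is a hypothetical helper, PORT_BASE is the variable from the cluster config above, and the index comes from NODE_APP_INSTANCE, the per-app instance index PM2 exposes to each worker.

```javascript
// Hypothetical helper: derive this worker's port from PORT_BASE plus the
// per-app instance index that PM2 exposes as NODE_APP_INSTANCE.
function resolveWorkerPort(env = process.env) {
  const base = Number.parseInt(env.PORT_BASE ?? '3001', 10);
  const instance = Number.parseInt(env.NODE_APP_INSTANCE ?? '0', 10);
  if (Number.isNaN(base) || Number.isNaN(instance)) {
    throw new Error('PORT_BASE and NODE_APP_INSTANCE must be integers');
  }
  return base + instance;
}
```

The server would then pass resolveWorkerPort() as the port to app.listen. With PORT_BASE set to 3001, instances 0 to 3 land on ports 3001 to 3004, matching the four entries in the nginx upstream block.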
Log rotation with pm2-logrotate
PM2 writes logs to ~/.pm2/logs/ by default. Without log rotation, these files grow unbounded. The pm2-logrotate module handles rotation at a configurable size or schedule.
# Install log rotation module (runs inside PM2's module system)
pm2 install pm2-logrotate
# Configure rotation (examples)
pm2 set pm2-logrotate:max_size 50M # rotate when file exceeds 50 MB
pm2 set pm2-logrotate:retain 7 # keep 7 rotated files
pm2 set pm2-logrotate:compress true # gzip rotated files
pm2 set pm2-logrotate:dateFormat YYYY-MM-DD_HH-mm-ss
pm2 set pm2-logrotate:rotateModule true # also rotate PM2's own log
# Verify configuration
pm2 conf pm2-logrotate
For production MCP servers, ship logs to a log aggregation backend rather than relying on local file rotation. Configure your MCP server to write JSON to stdout, and configure a log shipper (Promtail, Filebeat, or Vector) to tail the log files PM2 writes from that stdout stream and forward them to your aggregation stack. See MCP server log aggregation for Grafana Loki and Elasticsearch configurations.
Startup on boot: pm2 startup and pm2 save
# Generate a startup script for the current init system (systemd on modern Linux)
# Run as the user that owns the PM2 daemon
pm2 startup
# PM2 will print a command to run as root, e.g.:
# sudo env PATH=$PATH:/usr/bin /usr/lib/node_modules/pm2/bin/pm2 startup systemd -u ubuntu --hp /home/ubuntu
# Run that command.
# Save the current process list so PM2 restores it on boot
pm2 save
# Verify the systemd unit was installed
systemctl status pm2-ubuntu # replace 'ubuntu' with your username
# After the next reboot, verify the MCP server started
pm2 list
The pm2 startup command generates a systemd unit that starts the PM2 daemon at boot under the correct user, with the correct PATH. The daemon then starts all processes listed in the saved process list (written by pm2 save). Run pm2 save every time you add or remove a process from PM2, otherwise the saved list will not reflect the current state.
If you use systemd directly (without PM2), see MCP server systemd for a native unit file that handles session draining and automatic restart without the PM2 daemon layer.
Common PM2 operations
# Start from ecosystem file
pm2 start ecosystem.config.js --env production
# Graceful reload: in cluster mode PM2 starts a new worker, waits for 'ready',
# then sends SIGINT to the old worker. In fork mode reload falls back to a restart.
pm2 reload mcp-server
# Restart: stops the process, then starts it again (all active SSE sessions are dropped)
pm2 restart mcp-server
# Monitor in real time (CPU, memory, logs)
pm2 monit
# Display process list with status, memory, CPU
pm2 list
# Tail logs
pm2 logs mcp-server --lines 100
# Flush logs (truncate log files)
pm2 flush mcp-server
# Show detailed info including environment and metadata
pm2 show mcp-server
# Stop without removing from process list
pm2 stop mcp-server
# Remove from process list
pm2 delete mcp-server
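Use pm2 reload (not pm2 restart) for deployments. In cluster mode, with wait_ready: true and a process.send('ready') call in the server, pm2 reload is zero-downtime: each new worker starts and signals ready before the old worker is stopped. In fork mode only one process can hold the port, so PM2 stops the old process before starting the new one and there is a short window in which connections are refused; kill_timeout still gives the old process time to drain. Without wait_ready, PM2 treats the new process as online before its startup tasks are guaranteed to have finished. See MCP server zero-downtime deployment for the full reload sequence with health check gates.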
Monitoring PM2-managed MCP servers with AliveMCP
PM2 monitors the process from the inside: it knows whether the Node.js process is running, how much memory it uses, and how many times it has restarted. What PM2 cannot observe is whether the MCP server is reachable from outside the host: whether nginx is routing to it correctly, whether TLS is valid, whether the MCP initialize handshake completes successfully from a remote client's perspective.
AliveMCP probes the public endpoint every 60 seconds, validates the full MCP protocol handshake, and reports the result on a public status page. PM2 restart loops (the server crashes and restarts repeatedly) show up in AliveMCP as intermittent handshake failures: the /health endpoint may return 200 between restarts, but the MCP handshake fails whenever a probe lands in a restart window. AliveMCP's 60-second probe cadence detects restart loops within two probe windows (2 minutes) regardless of how fast PM2 restarts the process. See MCP server uptime monitoring for the probe sequence.
Related questions
Should I use PM2 or Docker for production MCP servers?
It depends on your infrastructure. PM2 is the right choice for a bare-metal or VPS Linux server where you want process management without container overhead — no Docker daemon, no container network, no image build pipeline. Docker is the right choice when you need reproducible builds, container isolation, or when you are deploying to a container orchestrator (Kubernetes, Fly.io, Railway). Many teams use both: Docker for the build and image artifact, PM2 inside the container to manage the Node.js process (though tini + a single Node process is cleaner in containers). On a bare Linux VPS, PM2 is simpler and lighter than Docker. See MCP server Docker for the container approach.
How do I deploy a new version with PM2 without downtime?
The sequence: (1) build the new version (npm run build), (2) run pm2 reload mcp-server. In cluster mode with wait_ready: true configured, PM2 starts the new worker, waits up to listen_timeout milliseconds for process.send('ready'), then sends SIGINT to the old worker. The old worker drains its SSE sessions (up to kill_timeout milliseconds) and exits, and new connections go to the new worker as soon as it signals ready. In fork mode PM2 stops the old process before starting the new one, so expect a brief gap while the new process boots. See MCP server zero-downtime deployment for the full sequence including post-deploy health check gates.
How do I pass secrets to PM2 without writing them to ecosystem.config.js?
Do not write production secrets to ecosystem.config.js, because this file is typically committed to git. Instead, set secrets as environment variables before starting PM2 (via a .env file sourced in the shell, or via systemd's EnvironmentFile directive), or load a git-ignored env file when the process starts (for example with Node's --env-file flag, available from Node 20.6, passed through node_args). The application reads secrets from process.env at runtime, so the secret values never need to appear in the PM2 config. See MCP server secrets management for the full secret injection pattern.
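Reading secrets from process.env works best when the server fails fast if one is missing, rather than discovering it on the first tool call. A minimal sketch, assuming a hypothetical requireEnv helper and illustrative variable names such as DATABASE_URL:

```javascript
// Hypothetical helper: verify required secrets are present at startup
// and return them as a plain object.
function requireEnv(names, env = process.env) {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    // Report variable names only; never log secret values
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
  return Object.fromEntries(names.map((name) => [name, env[name]]));
}
```

Called at the top of the startup function, for example requireEnv(['DATABASE_URL', 'JWT_ISSUER']), a missing secret makes the process exit before it sends the ready signal, so a misconfigured deploy is caught immediately.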
What does max_memory_restart actually do?
When a PM2-managed process exceeds the max_memory_restart threshold, PM2 sends SIGINT to the process and then starts a fresh one. This is a leak-containment mechanism, not a performance optimisation: a process whose memory plateaus below the threshold is probably healthy, while one that keeps growing until it reaches 512 MB most likely has a memory leak in a tool handler or middleware. The restart buys time until the leak is fixed. Without max_memory_restart, a leaking process eventually triggers the Linux OOM killer, which kills the process immediately with SIGKILL and no drain opportunity. Set the threshold 20–30% below the host memory limit so PM2 triggers the graceful restart before the OS forces a hard kill.
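The headroom rule is simple arithmetic. The sketch below uses a hypothetical suggestMaxMemoryRestart helper (not part of PM2) that turns a memory limit in megabytes into a value in the format max_memory_restart expects.

```javascript
// Hypothetical helper: pick a max_memory_restart value that leaves headroom
// below the memory available to the process.
function suggestMaxMemoryRestart(limitMb, headroom = 0.25) {
  if (!(limitMb > 0) || headroom < 0 || headroom >= 1) {
    throw new RangeError('limitMb must be > 0 and headroom must be in [0, 1)');
  }
  // Round down so the suggestion never exceeds the intended ceiling
  return `${Math.floor(limitMb * (1 - headroom))}M`;
}
```

For a host with 1024 MB available to the server, headroom of 0.3 and 0.2 gives 716M and 819M respectively. Whichever value you choose, keep --max-old-space-size below it so the V8 heap cap is reached before PM2's restart threshold.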
Further reading
- MCP server graceful shutdown — SIGINT drain implementation
- MCP server zero-downtime deployment — pm2 reload with health check gates
- MCP server Docker — containerised alternative to PM2
- MCP server systemd — native Linux process management without PM2
- MCP server nginx — reverse proxy configuration for SSE
- MCP server structured logging — JSON logs for PM2 log rotation
- MCP server secrets management — environment variable injection without ecosystem.config.js
- AliveMCP — external uptime monitoring for PM2-managed MCP servers