MCP server PM2

PM2 is the most widely used process manager for Node.js servers on Linux VPS instances. It handles automatic restarts on crash, cluster mode for multi-core utilisation, memory-limit restarts to contain leaks, and startup integration so your MCP server survives reboots. PM2 has one critical interaction with MCP servers that plain Node.js REST servers do not have: SSE transport creates long-lived connections that belong to a specific worker process. In cluster mode, if a worker is reloaded, all SSE sessions on that worker are terminated. This guide covers how to configure PM2 correctly for MCP servers, including sticky sessions via nginx upstream hashing and graceful reload that drains active sessions before replacing a worker.

TL;DR

Run a single-instance MCP server with exec_mode: "fork" (not cluster) to avoid the sticky-session problem. If you need multi-core, pair cluster mode with nginx ip_hash sticky routing and implement a SIGINT drain handler in each worker. Set max_memory_restart: "512M" and kill_timeout: 30000 to give workers 30 seconds to drain before PM2 force-kills them. Run pm2 startup + pm2 save so the process survives reboots.

Fork mode vs. cluster mode for MCP servers

PM2 offers two execution modes. Fork mode runs a single Node.js process: PM2 restarts it on crash, but it uses only one CPU core. Cluster mode starts N worker processes (one per core when instances is set to 'max') and load-balances incoming TCP connections across them via Node's built-in cluster module.

Mode	CPU cores used	SSE session affinity	When to use
`fork`	1	Guaranteed (one process)	Most MCP servers — simple, no sticky-session risk
`cluster`	All (configurable)	Not guaranteed — requires sticky routing	High-traffic servers on multi-core VPS with nginx `ip_hash`

The problem with cluster mode and MCP: when PM2 reloads a worker (during pm2 reload), the old worker receives SIGINT and should drain its sessions. New connections from the load balancer immediately route to the replacement worker. But an SSE client that was connected to the old worker must either close its SSE connection and reconnect to the new worker (restarting the MCP session) or hold the connection open until the drain timeout expires. There is no transparent migration of MCP session state between workers.

For most production MCP servers on a VPS, fork mode is the right choice. The throughput ceiling of a single Node.js event loop serving MCP tool calls is well above what typical traffic demands — the bottleneck is the downstream API or database, not the event loop. Use cluster mode only when CPU-intensive tool handlers (JSON parsing of large payloads, in-process data transformation) are saturating a single core.

ecosystem.config.js

// ecosystem.config.js
module.exports = {
  apps: [
    {
      name: 'mcp-server',
      script: 'dist/server.js',
      exec_mode: 'fork',          // single process — avoids sticky-session complexity
      instances: 1,
      node_args: '--max-old-space-size=400',  // V8 heap cap, kept below max_memory_restart

      // Restart on memory leak containment (before OOM kill)
      max_memory_restart: '512M',

      // Give the process 30 seconds to drain SSE sessions before force-kill
      kill_timeout: 30000,

      // Exponential back-off on repeated crash restarts
      // Prevents a crash loop from hammering external dependencies
      restart_delay: 1000,
      exp_backoff_restart_delay: 100,
      max_restarts: 10,
      min_uptime: '10s',  // a restart is only counted if the app was up for less than 10s

      // Environment variables
      env: {
        NODE_ENV: 'production',
        PORT: '3000',
        LOG_LEVEL: 'info',
      },
      env_development: {
        NODE_ENV: 'development',
        LOG_LEVEL: 'debug',
      },

      // Merge stdout and stderr into a single log stream
      merge_logs: true,
      log_date_format: 'YYYY-MM-DDTHH:mm:ss.SSSZ',

      // Do not watch for file changes in production — use pm2 reload for updates
      watch: false,

      // Wait for the app to be ready before marking it as online
      // Requires process.send('ready') in your server code (see below)
      wait_ready: true,
      listen_timeout: 15000,  // max time to wait for 'ready' event
    },
  ],
};

The wait_ready: true option tells PM2 to wait for a process.send('ready') call from the application before routing traffic to the new process during a reload. Without it, PM2 marks the process online the moment it starts — before the database is initialised or the JWKS cache is loaded. The application must call process.send('ready') after all startup tasks complete.
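The back-off settings in the ecosystem file are easier to reason about with numbers. The sketch below models how the delay between restarts might grow from exp_backoff_restart_delay: 100. It assumes a growth factor of 1.5 per restart and a 15-second ceiling; both values are assumptions about PM2's behaviour, and the function name backoffDelays is hypothetical, not a PM2 API.

```javascript
// Illustrative model only; this is not PM2's code.
// Assumptions: each restart multiplies the previous delay by 1.5,
// and the delay never exceeds 15000 ms.
function backoffDelays(initialMs, restarts, factor = 1.5, capMs = 15000) {
  const delays = [];
  let delay = initialMs;
  for (let i = 0; i < restarts; i++) {
    delays.push(delay);
    delay = Math.min(capMs, Math.floor(delay * factor));
  }
  return delays;
}

console.log(backoffDelays(100, 6)); // [ 100, 150, 225, 337, 505, 757 ]
```

Under these assumptions a crash loop backs off from 100 ms to several seconds within about a dozen restarts, which is what keeps it from hammering a failing database or upstream API.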

Startup sequence with process.send('ready')

// src/server.ts
import Fastify from 'fastify';

const app = Fastify({ logger: { level: process.env.LOG_LEVEL ?? 'info' } });

async function start() {
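  // All startup tasks before accepting traffic.
  // initDatabase, loadJwksCache and registerMcpRoutes are application-specific
  // helpers defined elsewhere in your codebase (imports omitted here).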
  await initDatabase();
  await loadJwksCache();
  await registerMcpRoutes(app);

  await app.listen({ port: parseInt(process.env.PORT ?? '3000', 10), host: '0.0.0.0' });

  // Signal PM2 that the server is ready to receive traffic
  // This unblocks pm2 reload — the old worker will be killed only after this fires
  if (process.send) {
    process.send('ready');
    app.log.info('Sent ready signal to PM2');
  }
}

// Graceful shutdown on SIGINT (PM2) and SIGTERM (Docker, systemd)
let shuttingDown = false;
async function shutdown(signal: NodeJS.Signals) {
  if (shuttingDown) return;  // ignore repeated signals
  shuttingDown = true;
  app.log.info(`${signal} received, starting graceful drain`);
  // Stop accepting new connections; close() resolves once open connections end.
  // If SSE sessions are still open after kill_timeout (30s), PM2 force-kills the process.
  await app.close();
  process.exit(0);
}
process.on('SIGINT', shutdown);
process.on('SIGTERM', shutdown);

start().catch(err => {
  app.log.error(err, 'Startup failed');
  process.exit(1);
});

PM2 sends SIGINT (not SIGTERM) to managed processes by default. If your server is also run under Docker or systemd, those runtimes send SIGTERM. Handle both signals with the same drain logic to avoid environment-specific shutdown differences. See MCP server graceful shutdown for the full session-drain implementation with a configurable timeout.
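As a rough illustration of shared drain logic with a configurable timeout, the sketch below tracks open sessions, asks each one to close, and waits until they are gone or a deadline passes. It is a minimal sketch, not the implementation from the graceful shutdown guide: createSessionDrainer, installShutdownHandlers, and the session.close() contract are hypothetical names introduced here.

```javascript
// Sketch of a session drain with a deadline.
// Assumption: each tracked session exposes close(), and the caller invokes
// the function returned by track() once the session has actually ended.
function createSessionDrainer(timeoutMs) {
  const sessions = new Set();
  let pending = null; // cached so repeated signals share one drain

  return {
    track(session) {
      sessions.add(session);
      return () => sessions.delete(session);
    },
    drain() {
      if (!pending) {
        pending = (async () => {
          for (const s of [...sessions]) s.close();
          const deadline = Date.now() + timeoutMs;
          // Poll until every session has unregistered or the deadline passes.
          while (sessions.size > 0 && Date.now() < deadline) {
            await new Promise((resolve) => setTimeout(resolve, 10));
          }
          return { drained: sessions.size === 0, remaining: sessions.size };
        })();
      }
      return pending;
    },
  };
}

// Route SIGINT (PM2) and SIGTERM (Docker, systemd) through the same drain.
function installShutdownHandlers(drainer, exit = process.exit) {
  const handler = async () => {
    const result = await drainer.drain();
    exit(result.drained ? 0 : 1);
  };
  process.once('SIGINT', handler);
  process.once('SIGTERM', handler);
}
```

Choose a timeoutMs somewhat below kill_timeout so the process can exit on its own before PM2 force-kills it.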

Cluster mode with nginx sticky routing

If you need cluster mode, configure nginx to route each client to the same upstream worker for the lifetime of its SSE connection. The reliable sticky mechanism for MCP is ip_hash (hash on client IP). Cookie-based sticky sessions (nginx Plus sticky cookie) work too but require the client to send cookies — not all MCP clients do.

# /etc/nginx/sites-available/mcp-server
upstream mcp_workers {
  # ip_hash ensures a client always routes to the same upstream worker
  # Required for SSE sessions to survive across multiple PM2 cluster workers
  ip_hash;

  server 127.0.0.1:3001;  # worker 0
  server 127.0.0.1:3002;  # worker 1
  server 127.0.0.1:3003;  # worker 2
  server 127.0.0.1:3004;  # worker 3
}

server {
  listen 443 ssl http2;
  # ssl_certificate and ssl_certificate_key must also be set here (paths omitted)
  server_name your-mcp-server.example.com;

  # SSE requires long-lived connections — disable proxy buffering
  proxy_buffering off;
  proxy_read_timeout 3600s;  # 1 hour — match your max session lifetime
  proxy_send_timeout 3600s;

  location / {
    proxy_pass http://mcp_workers;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;  # only relevant to WebSocket upgrades; SSE does not use it
    proxy_set_header Connection "";  # do not send "Connection: close" to the upstream
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
  }
}

// ecosystem.config.js (cluster mode variant)
module.exports = {
  apps: [
    {
      name: 'mcp-server',
      script: 'dist/server.js',
      exec_mode: 'cluster',
      instances: 4,  // must match the number of servers in the nginx upstream block
      // Each worker listens on its own port so nginx upstream can address them individually
      // Worker port = PORT_BASE + NODE_APP_INSTANCE (zero-based instance index set by PM2)
      env: { PORT_BASE: '3001' },
    },
  ],
};

Note: Node's built-in cluster module lets workers share a single port. For nginx's ip_hash upstream to address individual workers, each worker must instead bind to its own port. The worker computes that port as PORT_BASE plus its instance index, which PM2 exposes as process.env.NODE_APP_INSTANCE. (process.env.pm_id is PM2's global process id; it matches the instance index only when this app is the only one PM2 manages.) This is more complex than fork mode. The simpler production setup is one MCP server per VPS, fork mode, and a load balancer at the VPS level rather than the PM2 level. See MCP server load balancing for the multi-instance architecture.
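The per-worker port calculation could look like the sketch below. It reads NODE_APP_INSTANCE, which PM2 sets to the zero-based instance index within an app, rather than pm_id, which is PM2's global process id and can differ from the index when other apps are registered. resolveWorkerPort is a hypothetical helper name, and the default of 3001 mirrors PORT_BASE in the cluster configuration above.

```javascript
// Resolve this worker's listen port as PORT_BASE + instance index.
// Assumption: PM2 provides the per-app instance index in NODE_APP_INSTANCE.
function resolveWorkerPort(env = process.env) {
  const base = Number.parseInt(env.PORT_BASE ?? '3001', 10);
  const index = Number.parseInt(env.NODE_APP_INSTANCE ?? '0', 10);
  if (!Number.isInteger(base) || !Number.isInteger(index) || index < 0) {
    throw new Error(
      `Invalid PORT_BASE/NODE_APP_INSTANCE: ${env.PORT_BASE}, ${env.NODE_APP_INSTANCE}`
    );
  }
  return base + index;
}

console.log(resolveWorkerPort({ PORT_BASE: '3001', NODE_APP_INSTANCE: '2' })); // 3003
```

In the server, the result would replace the PORT lookup passed to app.listen. Failing fast on a malformed value is deliberate: a worker that silently falls back to a default port would collide with another worker.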

Log rotation with pm2-logrotate

PM2 writes logs to ~/.pm2/logs/ by default. Without log rotation, these files grow unbounded. The pm2-logrotate module handles rotation at a configurable size or schedule.

# Install log rotation module (runs inside PM2's module system)
pm2 install pm2-logrotate

# Configure rotation (examples)
pm2 set pm2-logrotate:max_size 50M      # rotate when file exceeds 50 MB
pm2 set pm2-logrotate:retain 7          # keep 7 rotated files
pm2 set pm2-logrotate:compress true     # gzip rotated files
pm2 set pm2-logrotate:dateFormat YYYY-MM-DD_HH-mm-ss
pm2 set pm2-logrotate:rotateModule true # also rotate the logs of PM2 modules

# Verify configuration
pm2 conf pm2-logrotate

For production MCP servers, ship logs to a log aggregation backend rather than relying on local file rotation alone. Configure your MCP server to write JSON to stdout, which PM2 captures into its log files, and configure a log shipper (Promtail, Filebeat, or Vector) to tail those files in ~/.pm2/logs/ and forward them to your aggregation stack. See MCP server log aggregation for Grafana Loki and Elasticsearch configurations.

Startup on boot: pm2 startup and pm2 save

# Generate a startup script for the current init system (systemd on modern Linux)
# Run as the user that owns the PM2 daemon
pm2 startup

# PM2 will print a command to run as root, e.g.:
# sudo env PATH=$PATH:/usr/bin /usr/lib/node_modules/pm2/bin/pm2 startup systemd -u ubuntu --hp /home/ubuntu
# Run that command.

# Save the current process list so PM2 restores it on boot
pm2 save

# Verify the systemd unit was installed
systemctl status pm2-ubuntu  # replace 'ubuntu' with your username

# After the next reboot, verify the MCP server started
pm2 list

The pm2 startup command generates a systemd unit that starts the PM2 daemon at boot under the correct user, with the correct PATH. The daemon then starts all processes listed in the saved process list (written by pm2 save). Run pm2 save every time you add or remove a process from PM2, otherwise the saved list will not reflect the current state.

If you use systemd directly (without PM2), see MCP server systemd for a native unit file that handles session draining and automatic restart without the PM2 daemon layer.

Common PM2 operations

# Start from ecosystem file (uses the default env block; add --env development to apply env_development)
pm2 start ecosystem.config.js

# Graceful reload (zero-downtime in cluster mode if wait_ready + SIGINT drain are implemented)
# PM2 starts a new process, waits for 'ready', then sends SIGINT to the old process
pm2 reload mcp-server

# Restart (stops the process, then starts it again; active SSE sessions are dropped
# and the server is unavailable while the new process boots)
pm2 restart mcp-server

# Monitor in real time (CPU, memory, logs)
pm2 monit

# Display process list with status, memory, CPU
pm2 list

# Tail logs
pm2 logs mcp-server --lines 100

# Flush logs (truncate log files)
pm2 flush mcp-server

# Show detailed info including environment and metadata
pm2 show mcp-server

# Stop without removing from process list
pm2 stop mcp-server

# Remove from process list
pm2 delete mcp-server

Use pm2 reload (not pm2 restart) for deployments. In cluster mode, with wait_ready: true and a process.send('ready') call in the server, pm2 reload is zero-downtime: the new process starts and becomes healthy before the old process is stopped. Without wait_ready, PM2 may stop the old process before the new one has finished its startup tasks, leaving a window in which requests fail. In fork mode a single process owns the port, so the old process has to release it before the new one can bind; expect a short gap in service there even with pm2 reload. See MCP server zero-downtime deployment for the full reload sequence with health check gates.
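A deployment script can gate on readiness after the reload instead of assuming it succeeded. The following is a minimal sketch of such a gate: it polls an injected async check until it passes or the attempts run out. waitUntilReady and its options are hypothetical names, and the check itself (for example an HTTP request to a health endpoint) is left to the caller.

```javascript
// Poll an injected async check until it returns true or attempts run out.
// Returns the number of attempts used; throws if the check never passes.
async function waitUntilReady(check, { attempts = 10, delayMs = 500 } = {}) {
  for (let i = 1; i <= attempts; i++) {
    try {
      if (await check()) return i;
    } catch {
      // A thrown error (e.g. connection refused) means "not ready yet".
    }
    if (i < attempts) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`not ready after ${attempts} attempts`);
}
```

On Node.js 18 or later, the check could be something like `async () => (await fetch('http://127.0.0.1:3000/health')).ok`, assuming the server exposes a /health route on port 3000. A non-zero exit from the deploy script when waitUntilReady throws gives the pipeline a clear signal to roll back.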

Monitoring PM2-managed MCP servers with AliveMCP

PM2 monitors the process from the inside: it knows whether the Node.js process is running, how much memory it uses, and how many times it has restarted. What PM2 cannot observe is whether the MCP server is reachable from outside the host: whether nginx is routing to it correctly, whether TLS is valid, whether the MCP initialize handshake completes successfully from a remote client's perspective.

AliveMCP probes the public endpoint every 60 seconds, validates the full MCP protocol handshake, and reports the result on a public status page. A PM2 restart loop (the server crashes and restarts repeatedly) shows up in AliveMCP as intermittent probe failures: the /health endpoint returns 200 between restarts, but the MCP handshake may fail during a restart window. At a 60-second probe cadence, a sustained restart loop surfaces within two probe windows (2 minutes). See MCP server uptime monitoring for the probe sequence.
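For a sense of what an external handshake check involves, the sketch below builds the JSON-RPC initialize request defined by the MCP specification and performs a structural check on the reply. It is an illustration, not AliveMCP's probe: the function names, the client name example-probe, and the protocol version string are example values chosen here.

```javascript
// Build a JSON-RPC 2.0 "initialize" request in the shape the MCP spec defines.
// The protocol version and clientInfo values are example assumptions.
function buildInitializeRequest(id = 1, protocolVersion = '2024-11-05') {
  return {
    jsonrpc: '2.0',
    id,
    method: 'initialize',
    params: {
      protocolVersion,
      capabilities: {},
      clientInfo: { name: 'example-probe', version: '0.0.1' },
    },
  };
}

// Minimal structural check of the server's reply to initialize.
function isInitializeResult(message, expectedId = 1) {
  return (
    message !== null &&
    typeof message === 'object' &&
    message.jsonrpc === '2.0' &&
    message.id === expectedId &&
    message.result !== null &&
    typeof message.result === 'object' &&
    typeof message.result.protocolVersion === 'string' &&
    message.result.serverInfo !== null &&
    typeof message.result.serverInfo === 'object'
  );
}
```

A probe would send the request over the server's transport and treat a JSON-RPC error, a timeout, or a reply that fails isInitializeResult as a failed check, even if the process itself is listed as online in pm2 list.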