Guide · Data Persistence

MCP server database migrations

Database migrations in a conventional REST API are run once per deploy, typically by a CI/CD step before traffic shifts to the new version. An MCP server with long-lived SSE sessions adds two complications. First, migration timing relative to the ready signal: if migrations run after the process manager marks the service as ready, tool handlers execute against an outdated schema before migration completes. The migration must complete before process.send('ready') (PM2) or sd_notify(READY=1) (systemd). Second, multi-replica migration races: on a rolling update with two replicas, both new replicas start simultaneously and both attempt to run migrations — concurrent migrations on the same table produce deadlocks or duplicate migration execution. An advisory lock or a migration-runner-as-init-container pattern solves the race. This guide covers the full migration lifecycle for both single-server and multi-replica MCP deployments.

TL;DR

Run migrations synchronously before signalling ready. For multi-replica deployments, run the migration as a separate init step (Kubernetes init container, Fly.io release_command) rather than in each replica's startup sequence. Keep every migration backward-compatible — the old code version must work against the new schema during a rolling update window.

Migration tools: comparison

Tool	Schema source	Migration file format	Best for
Prisma Migrate	`schema.prisma`	Auto-generated SQL	Teams using Prisma ORM
Drizzle Kit	TypeScript schema files	Auto-generated SQL	Teams using Drizzle ORM
node-pg-migrate	JavaScript migration files	JS/TS with up/down	PostgreSQL, raw SQL control
db-migrate	JS migration files	JS with up/down	Multi-database support
Flyway / Liquibase	SQL/XML files	Versioned SQL files	Enterprise, Java ecosystem
Raw SQL + version table	SQL files in `migrations/`	Plain SQL	Zero-dependency, SQLite

Single-server migration: before the ready signal

// src/server.ts — single-server startup sequence
import { execSync } from 'child_process';
import { runMigrations } from './migrations.js';

async function start() {
  console.log('[startup] Running database migrations...');

  try {
    // For Prisma: prisma migrate deploy applies all pending migrations
    // For Drizzle: drizzle-kit push or drizzle migrate
    // For custom: see the raw SQL runner below
    execSync('npx prisma migrate deploy', {
      stdio: 'inherit',
      timeout: 60_000,  // 60-second timeout — fail fast on stuck migrations
    });
    console.log('[startup] Migrations complete');
  } catch (err) {
    // Migration failure is fatal — do not signal ready
    // The process manager (PM2, systemd) will restart with the current deploy
    // If the migration itself is broken, fix it and redeploy
    console.error('[startup] Migration failed — aborting:', err);
    process.exit(1);
  }

  // Proceed only after migrations succeed
  const httpServer = await startMcpServer();

  // Signal ready — process manager allows traffic now
  if (process.send) process.send('ready');  // PM2 wait_ready
  // For systemd: notify('READY=1')
}

The process.exit(1) on migration failure is intentional. PM2 and systemd interpret a non-zero exit during startup as a failed start attempt and apply the restart policy. After StartLimitBurst failures, the service is marked as failed and alerts fire — which is correct, because a broken migration requires a human (or a new deploy with a fixed migration) to resolve.

Multi-replica migration: advisory lock pattern

When two replicas start simultaneously (Kubernetes rolling update with maxSurge:1, blue-green deploy), both attempt migrations at the same moment. For PostgreSQL, use an advisory lock to serialise the migration across replicas:

// src/migrate.ts — PostgreSQL advisory lock for multi-replica safety
import pg from 'pg';

const MIGRATION_LOCK_ID = 1234567890;  // Arbitrary integer, unique per application

export async function runMigrationsWithLock(databaseUrl: string): Promise {
  const client = new pg.Client({ connectionString: databaseUrl });
  await client.connect();

  try {
    // pg_try_advisory_lock returns true if lock acquired, false if already held
    // The lock is automatically released when this client disconnects
    const { rows } = await client.query(
      'SELECT pg_try_advisory_lock($1) AS acquired',
      [MIGRATION_LOCK_ID]
    );

    if (!rows[0].acquired) {
      // Another replica holds the lock and is running migrations — wait for it
      console.log('[migrations] Waiting for peer to complete migrations...');
      await client.query(
        'SELECT pg_advisory_lock($1)',  // Blocking wait
        [MIGRATION_LOCK_ID]
      );
      console.log('[migrations] Peer completed migrations, proceeding');
      // Lock acquired after peer finished — migrations already applied, proceed
      return;
    }

    // Lock acquired — we are the migration runner
    console.log('[migrations] Acquired migration lock, running migrations...');
    // Run your migrations here
    // execSync('npx prisma migrate deploy', { stdio: 'inherit' });
    console.log('[migrations] Migrations complete');

  } finally {
    // Release lock (also released automatically on disconnect)
    await client.query('SELECT pg_advisory_unlock($1)', [MIGRATION_LOCK_ID]);
    await client.end();
  }
}

For SQLite, concurrent multi-replica migrations are not possible — SQLite does not support multiple concurrent writers. SQLite deployments are inherently single-process, so the race condition does not arise.

Migrations as a separate init step

For Kubernetes, the cleanest approach is a migration init container that runs before the main container starts. The main containers start only after the init container exits successfully:

# kubernetes/deployment.yaml
spec:
  initContainers:
    - name: migrate
      image: your-mcp-server:latest
      command: ["npx", "prisma", "migrate", "deploy"]
      env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: mcp-secrets
              key: DATABASE_URL
  containers:
    - name: mcp-server
      image: your-mcp-server:latest
      # Main container starts only after migrate init container exits 0
      # No migration logic needed in the application startup sequence

For Fly.io, the equivalent is release_command in fly.toml:

[deploy]
  release_command = "npx prisma migrate deploy"
  # Fly runs this command before routing traffic to the new machines
  # If it exits non-zero, the deploy is aborted and traffic stays on the old machines

Backward-compatible migration patterns

During a rolling update, the old code version and new code version run simultaneously for a window of 10–60 seconds. Any migration that removes a column used by old code, or adds a NOT NULL column without a default, breaks old code during this window. The backward-compatible migration sequence for adding a column:

Step	Deploy	Migration	Code change
1	Deploy N	Add column with DEFAULT (nullable or default value)	Old code ignores new column
2	Deploy N+1	Backfill column values if needed	New code reads and writes new column
3	Deploy N+2	Add NOT NULL constraint (optional)	Code depends on column being present

The backward-compatible pattern for removing a column: remove code references first (Deploy N), then drop the column (Deploy N+1). Never remove a column in the same deploy that removes the last code reference — the rolling update window means old code instances hit the new schema for tens of seconds.

Raw SQL migration runner (zero dependencies)

// src/migrations.ts — minimal migration runner for SQLite (no ORM required)
import Database from 'better-sqlite3';
import fs from 'fs';
import path from 'path';

export function runMigrations(db: Database.Database, migrationsDir: string): void {
  // Track applied migrations in a dedicated table
  db.exec(`
    CREATE TABLE IF NOT EXISTS _migrations (
      id         INTEGER PRIMARY KEY AUTOINCREMENT,
      filename   TEXT    NOT NULL UNIQUE,
      applied_at INTEGER NOT NULL DEFAULT (unixepoch())
    )
  `);

  const applied = new Set(
    (db.prepare('SELECT filename FROM _migrations').all() as { filename: string }[])
      .map(r => r.filename)
  );

  const files = fs.readdirSync(migrationsDir)
    .filter(f => f.endsWith('.sql'))
    .sort();  // Alphabetical order: 001_initial.sql, 002_add_users.sql, ...

  for (const file of files) {
    if (applied.has(file)) continue;

    const sql = fs.readFileSync(path.join(migrationsDir, file), 'utf8');

    // Each migration runs in a transaction — partial migration is rolled back
    const migrate = db.transaction(() => {
      db.exec(sql);
      db.prepare('INSERT INTO _migrations (filename) VALUES (?)').run(file);
    });

    console.log(`[migrations] Applying ${file}...`);
    migrate();
    console.log(`[migrations] Applied ${file}`);
  }
}

AliveMCP and migration failures

A deploy where the migration succeeds but introduces a schema mismatch with a tool handler (e.g., a column was renamed) shows up to AliveMCP as tool call errors rather than server downtime. The MCP server is up and responding to initialize, but tools/call returns isError: true. AliveMCP detects this within the next probe cycle and alerts on tool failure rate, giving you a 60-second detection window — much faster than user reports. This pattern distinguishes a bad migration from a full deployment failure.