Data persistence guide · 2026-06-05 · Production MCP servers

MCP Server Data Persistence Guide: SQLite, Prisma, Redis, Database Migrations, and Drizzle ORM

Adding a database to an MCP server looks like adding a database to any other Node.js server — until you hit a 503 during a routine deploy because your migration ran while old tool handlers were still writing, or until your database connection pool exhausts itself after 20 concurrent sessions because you opened a connection inside each tool handler. MCP servers have two properties that change the persistence model compared to conventional REST APIs: sessions are long-lived SSE connections that may stay open for minutes or hours, and the process lifecycle has a precise ordering requirement — schema migrations and connection pool validation must complete before the server signals readiness to the process manager. Get either property wrong and you get failures that are difficult to trace: tool calls that intermittently fail when the pool is exhausted, deploys that corrupt rows because old and new schema code ran simultaneously, or crashes that leave a SQLite WAL file in an inconsistent state. A complete persistence stack for a production MCP server has five concerns: SQLite for embedded single-process storage, Prisma ORM for type-safe queries over SQLite or PostgreSQL, Redis for caching, rate limiting, and distributed coordination, database migrations for controlled schema evolution, and Drizzle ORM as a build-step-free Prisma alternative suited for edge deployments. This guide covers them as a system — how each concern addresses a distinct part of the persistence problem, how they compose, and what the correct startup and shutdown ordering looks like when all five are present.

TL;DR

Why MCP Changes the Persistence Model

A REST API and an MCP server differ in one persistence-critical way: how long a single logical unit of work takes.

A REST API processes one HTTP request at a time per connection. A database connection is acquired at the start of the handler and released at the end. The connection's entire lifetime is one request — milliseconds to low seconds. A pool of 20 connections serves hundreds of concurrent requests because each connection is free almost immediately after it is acquired.

An MCP server accumulates context over a session. An LLM agent calls initialize, then may call dozens of tools over several minutes — pausing between calls while it processes results, plans next steps, or waits for user input. The session stays open the entire time. If the server holds a database connection for the lifetime of the session, a pool of 20 connections handles exactly 20 concurrent sessions, regardless of how infrequently each session queries the database.

This drives the core rule for MCP persistence: acquire per tool call, not per session.

Pattern When pool exhausts Correct for MCP?
Hold connection per session At pool_size concurrent sessions No
Acquire per HTTP request (REST pattern) At pool_size concurrent in-flight requests Mostly — but sessions are longer than requests
Acquire per tool call At pool_size concurrent queries Yes — correct for MCP

The second MCP-specific constraint is startup ordering. A process manager like PM2 or systemd routes traffic to the new process as soon as it signals ready. If the process signals ready before migrations complete, a tool call that runs between the ready signal and the migration finishing touches the old schema. The fix is simple: run await migrate() before process.send('ready').

Embedded Database: SQLite

SQLite is the right persistence choice for most indie and small-team MCP servers: zero-dependency, single-file, no separate process to manage. The critical configuration for MCP use is WAL mode.

SQLite's default journal mode (DELETE) acquires a write lock on the entire database file for the duration of every write transaction. While that lock is held, any concurrent read — from a different SSE session calling a different read-only tool — blocks until the write completes. Under normal MCP load with a handful of concurrent sessions, this produces intermittent lock timeouts that appear in logs as SQLITE_BUSY.

WAL (Write-Ahead Log) mode allows concurrent reads alongside a single writer. Readers read from the last committed snapshot; the writer appends to the WAL file. Switching is one pragma:

import Database from 'better-sqlite3';

const db = new Database(process.env.DB_PATH ?? 'data/app.db');
db.pragma('journal_mode = WAL');
db.pragma('busy_timeout = 5000');
db.pragma('foreign_keys = ON');
db.pragma('synchronous = NORMAL');

busy_timeout = 5000 makes SQLite retry for up to five seconds before throwing SQLITE_BUSY, which handles the brief moment when two writes collide. synchronous = NORMAL is safe with WAL mode — it flushes WAL to disk at each checkpoint instead of every transaction, maintaining durability without the throughput cost of FULL mode.

Prepare all statements at module load time, not inside tool handlers:

// Module level — prepare once
const insertCache = db.prepare(
  'INSERT OR REPLACE INTO cache (key, value, expires_at) VALUES (?, ?, ?)'
);
const selectCache = db.prepare(
  'SELECT value FROM cache WHERE key = ? AND expires_at > unixepoch()'
);

// Inside a tool handler — use prepared statements
server.tool('get_cached', schema, async ({ key }) => {
  const row = selectCache.get(key);
  return { content: [{ type: 'text', text: row?.value ?? '' }] };
});

Re-preparing a statement inside a handler adds 5–20µs per call. Over thousands of tool calls per session, this accumulates. Statement preparation also performs schema validation — a failed prepare at startup catches query errors before any tool call reaches production.

The graceful shutdown ordering for SQLite is: stop the HTTP listener, drain active sessions, then call db.close(). Closing while a query is in flight produces SQLITE_INTERRUPT. See the deployment guide for the full SIGTERM drain handler implementation.

ORM Layer: Prisma

Prisma is the most popular ORM for TypeScript Node.js servers. Two patterns matter for MCP.

The singleton rule. Instantiating PrismaClient inside a tool handler creates a new connection pool per call. Each pool maintains its own idle connections, runs its own health-check queries, and consumes its own file descriptors. Three tool handlers each instantiating their own clients on every call exhausts the database's max_connections within minutes under normal load. The fix is a module-level singleton:

// deps.ts
import { PrismaClient } from '@prisma/client';

export async function createDeps() {
  const prisma = new PrismaClient({
    datasourceUrl: process.env.DATABASE_URL,
    log: [{ level: 'warn', emit: 'event' }, { level: 'error', emit: 'event' }],
  });

  // Validate connectivity before signalling ready
  await prisma.$queryRaw`SELECT 1`;

  return { prisma };
}

Migrations before the ready signal. Prisma Migrate's deploy command is idempotent — it applies any pending migrations and exits cleanly if everything is current. Running it synchronously before process.send('ready') ensures tool handlers never touch a stale schema:

import { execSync } from 'node:child_process';

// In startup sequence, before process.send('ready')
execSync('npx prisma migrate deploy', { stdio: 'inherit' });

If the migration fails, execSync throws, the process exits with a non-zero code, and the process manager restarts. Traffic never reaches a misconfigured instance.

For multi-replica deployments — two instances starting simultaneously both attempting migrations — use a PostgreSQL advisory lock to serialise the migration runner, or use Fly.io's release_command to run the migration once before any machine receives traffic.

Error handling in tool responses. Prisma throws typed errors that map cleanly to MCP's isError: true pattern:

import { PrismaClientKnownRequestError } from '@prisma/client/runtime/library';

server.tool('get_item', schema, async ({ id }, { prisma }) => {
  try {
    const item = await prisma.item.findUniqueOrThrow({ where: { id } });
    return { content: [{ type: 'text', text: JSON.stringify(item) }] };
  } catch (err) {
    if (err instanceof PrismaClientKnownRequestError && err.code === 'P2025') {
      return { isError: true, content: [{ type: 'text', text: `Item ${id} not found` }] };
    }
    throw err; // propagates as -32603, triggers global error handler
  }
});

P2025 is the "record not found" code — a recoverable application error that the LLM can handle. Unknown errors rethrow and propagate as JSON-RPC -32603, which AliveMCP surfaces as a probe failure within 60 seconds.

Graceful shutdown. Call prisma.$disconnect() after all active tool handler promises resolve, not concurrently with them. $disconnect() while a findUnique() is in flight throws PrismaClientKnownRequestError.

Alternative ORM: Drizzle

Drizzle ORM offers a different set of trade-offs from Prisma that matter in specific MCP deployment contexts.

The primary difference is the absence of a build step. Prisma generates a type-safe client from the .prisma schema file — running prisma generate is required after every schema change, which adds a step to CI/CD pipelines and complicates deployments where the client is built in a separate container from the runtime. Drizzle defines the schema in TypeScript files; types are inferred directly from the schema at compile time with no separate generation step.

// schema.ts — Drizzle schema in TypeScript
import { sqliteTable, text, integer } from 'drizzle-orm/sqlite-core';

export const cache = sqliteTable('cache', {
  key: text('key').primaryKey(),
  value: text('value').notNull(),
  expiresAt: integer('expires_at').notNull(),
});

// Infer types directly — no build step
export type Cache = typeof cache.$inferSelect;
export type NewCache = typeof cache.$inferInsert;

The query builder uses SQL-like syntax that maps closely to the SQL it generates:

import { drizzle } from 'drizzle-orm/better-sqlite3';
import { gt, eq } from 'drizzle-orm';
import * as schema from './schema.js';

const db = drizzle(sqliteConnection, { schema });

// In a tool handler
const rows = await db
  .select()
  .from(schema.cache)
  .where(gt(schema.cache.expiresAt, Math.floor(Date.now() / 1000)));

Drizzle also supports edge runtimes where Prisma has limited compatibility. Drizzle with the Neon serverless driver or Cloudflare D1 driver works in Cloudflare Workers; Prisma's edge runtime support is partial and requires separate configuration. For MCP servers deployed to Cloudflare Workers or similar edge platforms, Drizzle is the correct choice.

The migration workflow differs as well. Drizzle Kit generates migration SQL by diffing the TypeScript schema against the current database state (drizzle-kit generate), then applies pending migrations (drizzle-kit migrate). The same rule applies as with Prisma: run drizzle-kit migrate before signalling readiness.

Prisma vs. Drizzle: Choosing the Right ORM

Concern Prisma Drizzle
Schema definition .prisma DSL file TypeScript sqliteTable() / pgTable()
Type generation prisma generate required after schema change Types inferred at compile time — no generate step
Query syntax Object-based (prisma.item.findMany({ where: {} })) SQL-like (select().from().where())
Migrations prisma migrate deploy drizzle-kit generate + drizzle-kit migrate
Edge runtime (Workers, D1, Neon) Partial — separate edge client required Full — native HTTP drivers for D1, Neon, Turso
Ecosystem maturity Larger — more examples, more tooling Growing — less tooling, more flexibility
Best for VPS / container deployments, PostgreSQL focus Edge deployments, CI/CD simplicity, SQLite-first stacks

For most MCP servers running on a VPS or in a container with PostgreSQL, Prisma is the pragmatic choice — larger community, more examples in the MCP ecosystem, and a battle-tested migration workflow. For MCP servers targeting edge runtimes or where eliminating the prisma generate CI step matters, Drizzle is the better fit.

Cache and Coordination: Redis

Redis serves three distinct roles in an MCP server: tool result caching, per-session rate limiting, and distributed coordination for multi-instance deployments.

Tool result caching. The cache-aside pattern wraps tool handlers that query external APIs or perform expensive computations:

async function withCache<T>(
  redis: Redis,
  key: string,
  ttlSeconds: number,
  fn: () => Promise<T>
): Promise<T> {
  try {
    const cached = await redis.get(key);
    if (cached) return JSON.parse(cached) as T;
  } catch {
    // Redis unavailable — fall through to real data source
  }
  const value = await fn();
  try {
    await redis.setex(key, ttlSeconds, JSON.stringify(value));
  } catch {
    // Cache write failure is non-fatal
  }
  return value;
}

The try/catch around every Redis operation is intentional. Caching is a performance concern, not a correctness concern — a Redis failure must never propagate as a tool error. The LLM should receive the correct data more slowly, not an error.

Per-session rate limiting. A Lua script implements a sliding-window rate limiter atomically:

const RATE_LIMIT_SCRIPT = `
local key = KEYS[1]
local now = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
redis.call('ZREMRANGEBYSCORE', key, '-inf', now - window)
local count = redis.call('ZCARD', key)
if count >= limit then return 0 end
redis.call('ZADD', key, now, now .. math.random())
redis.call('EXPIRE', key, window)
return 1
`;

The Lua script executes atomically — the remove, count, and add happen in a single Redis roundtrip with no race condition between concurrent tool calls from the same session.

Distributed locks for singleton operations. Tool calls that must not execute concurrently — sending an email, initiating a payment — use Redis SET NX PX as a distributed lock:

const acquired = await redis.set(lockKey, lockId, 'NX', 'PX', lockTtlMs);
if (!acquired) {
  return { isError: true, content: [{ type: 'text', text: 'Operation already in progress' }] };
}
try {
  // perform singleton operation
} finally {
  // Lua ownership-check: only release if we still own the lock
  await redis.eval(RELEASE_SCRIPT, 1, lockKey, lockId);
}

The ownership check in the release script prevents a slow operation from releasing a lock acquired by a different caller after the original TTL expired.

Singleton and shutdown. Use one module-level ioredis client shared across all sessions. ioredis's built-in reconnect logic with exponential backoff handles transient Redis restarts without application code. On shutdown, call redis.quit() — it sends the QUIT command and waits for acknowledgement, ensuring in-flight commands complete before the connection closes. redis.disconnect() closes the socket immediately and drops any commands in flight.

Schema Evolution: Database Migrations

Database migrations require precise timing relative to the MCP server's startup sequence because the server signals readiness before accepting traffic, and the process manager interprets that signal as permission to route requests.

The correct sequence is:

  1. Start the process
  2. Run schema migrations (synchronously — block until complete)
  3. Open database connections and validate connectivity
  4. Open Redis connections and validate connectivity
  5. Register tools and start the HTTP listener
  6. Signal ready (process.send('ready') for PM2, sd_notify READY=1 for systemd)

Any failure at steps 2–5 should call process.exit(1). The process manager's restart policy then re-attempts startup after a delay — safer than routing traffic to a partially initialised server.

Multi-replica migration races. When two replicas start simultaneously, both attempt step 2. For SQLite, this is not a problem — only one process writes the database file at a time, and the migration runner uses a write transaction. For PostgreSQL, two replicas can run migrations concurrently, potentially attempting the same ALTER TABLE from different processes. The standard fixes:

Backward-compatible migrations for rolling updates. During a rolling update, old and new versions of the application run simultaneously for 10–60 seconds. A migration that removes a column breaks the old version still running. The safe pattern for removing a column is three deploys: (1) remove code references to the column, (2) add a migration that drops the column, (3) deploy. Adding a column with a DEFAULT value is safe in a single deploy — the old version ignores the new column, the new version reads it.

The connection-refused failure mode is the cleanest signal: if the migration runner exits non-zero, the process never calls app.listen(), and AliveMCP's probe reports the server as down within 60 seconds — before any tool call reaches the broken instance.

A Complete Persistence Stack

The five persistence concerns compose into a single createDeps() function that validates everything before the server accepts connections:

import Database from 'better-sqlite3';
import { PrismaClient } from '@prisma/client';
import { Redis } from 'ioredis';
import { execSync } from 'node:child_process';

export async function createDeps() {
  // 1. Run migrations before opening ORM connections
  execSync('npx prisma migrate deploy', { stdio: 'inherit' });

  // 2. SQLite for embedded storage (e.g. local cache, audit log)
  const db = new Database(process.env.SQLITE_PATH ?? 'data/local.db');
  db.pragma('journal_mode = WAL');
  db.pragma('busy_timeout = 5000');
  db.pragma('foreign_keys = ON');

  // 3. Prisma for main relational data (PostgreSQL or SQLite)
  const prisma = new PrismaClient({ datasourceUrl: process.env.DATABASE_URL });
  await prisma.$queryRaw`SELECT 1`;

  // 4. Redis for caching and coordination
  const redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379', {
    maxRetriesPerRequest: 3,
    enableOfflineQueue: false,
    keyPrefix: 'mcp:',
  });
  await redis.ping();

  return { db, prisma, redis };
}

The function throws on any failure. The caller wraps it in a top-level try/catch that calls process.exit(1). The process manager restarts — migrations are idempotent, so the next attempt picks up from a clean state.

Which layers you include depends on your deployment context:

Deployment context Recommended stack
Single VPS, hobby project SQLite + Drizzle, no Redis (in-memory rate limiting)
Single VPS, small team SQLite + Prisma + Redis
Containerised, single instance PostgreSQL + Prisma + Redis
Multi-instance (Fly.io, Kubernetes) PostgreSQL + Prisma + Redis, advisory lock migration, Fly release_command
Cloudflare Workers / edge D1 + Drizzle, no Redis (use D1 for coordination)

Graceful Shutdown Ordering

The shutdown sequence must close persistence resources in the correct order. Closing too early drops in-flight tool calls; closing too late causes zombie connections to pile up during rolling updates.

The correct shutdown sequence when all five layers are present:

  1. Mark the health endpoint as shutting down — return HTTP 503 from /health. The load balancer removes the instance from rotation. New sessions stop arriving.
  2. Stop the HTTP listenerhttpServer.close(). Existing sessions continue on their open connections.
  3. Wait for active sessions to drain — poll the activeSessions map. Honour a DRAIN_TIMEOUT_MS cap; force-close remaining sessions after the timeout.
  4. Close Redisawait redis.quit(). Any in-flight Lua scripts and setex calls complete first.
  5. Disconnect Prismaawait prisma.$disconnect(). Any in-flight Prisma queries complete first.
  6. Close SQLitedb.close(). All prepared statements complete. SQLite checkpoints the WAL file on close.
  7. Exitprocess.exit(0).

Steps 4–6 must happen after step 3 completes, not concurrently with the drain. A tool handler that is mid-query when the connection closes throws a runtime error — not isError: true in the tool response, but an unhandled exception that may corrupt the session state or leave the response channel in an indeterminate state.

The timeout at step 3 must be smaller than the process manager's stop timeout. For systemd, set TimeoutStopSec to DRAIN_TIMEOUT_MS / 1000 + 10 — the additional ten seconds covers the time to close Redis, Prisma, and SQLite after the drain completes. For PM2, set kill_timeout to DRAIN_TIMEOUT_MS + 10000 in milliseconds.

What AliveMCP Sees That Internal Monitoring Cannot

Database clients, ORMs, and Redis all have internal health-check mechanisms. They share a common limitation: they verify that the connection is open and the server is reachable. They do not verify that the application layer — the tool handlers using those connections — is correctly responding to MCP protocol requests.

Consider these failure modes invisible to internal persistence monitoring:

AliveMCP probes from outside — it connects via the full MCP protocol, completes an initialize handshake, and measures whether the server correctly responds at each stage. A server with a full connection pool that is accepting connections but timing out on every tool call appears as a probe latency alert. A server crashing on startup appears as a connection-refused failure. A server with a corrupted schema appears as an initialize failure within 60 seconds of the first startup attempt.

See also: MCP server health check patterns, MCP server connection pooling, and the observability stack guide for the internal instrumentation that complements external probing.

Related Guides