Data persistence guide · 2026-06-05 · Production MCP servers
MCP Server Data Persistence Guide: SQLite, Prisma, Redis, Database Migrations, and Drizzle ORM
Adding a database to an MCP server looks like adding a database to any other Node.js server — until you hit a 503 during a routine deploy because your migration ran while old tool handlers were still writing, or until your database connection pool exhausts itself after 20 concurrent sessions because you opened a connection inside each tool handler. MCP servers have two properties that change the persistence model compared to conventional REST APIs: sessions are long-lived SSE connections that may stay open for minutes or hours, and the process lifecycle has a precise ordering requirement — schema migrations and connection pool validation must complete before the server signals readiness to the process manager. Get either property wrong and you get failures that are difficult to trace: tool calls that intermittently fail when the pool is exhausted, deploys that corrupt rows because old and new schema code ran simultaneously, or crashes that leave a SQLite WAL file in an inconsistent state. A complete persistence stack for a production MCP server has five concerns: SQLite for embedded single-process storage, Prisma ORM for type-safe queries over SQLite or PostgreSQL, Redis for caching, rate limiting, and distributed coordination, database migrations for controlled schema evolution, and Drizzle ORM as a build-step-free Prisma alternative suited for edge deployments. This guide covers them as a system — how each concern addresses a distinct part of the persistence problem, how they compose, and what the correct startup and shutdown ordering looks like when all five are present.
TL;DR
- Never instantiate a database client inside a tool handler. Opening a PrismaClient or SQLite connection per tool call creates a new connection pool per call — the pool exhausts within a handful of concurrent sessions. Module-level singletons created once in
createDeps()share one pool across all sessions. - Run migrations before signalling readiness. Schema migrations must complete before
process.send('ready')orsd_notify READY=1. Tool handlers must never execute against a stale schema. A failed migration should callprocess.exit(1)so the process manager restarts instead of routing traffic to a broken instance. - SQLite WAL mode is required for concurrent MCP sessions. The default DELETE journal mode blocks all readers while a write is in progress. An MCP server with multiple concurrent SSE sessions calling different tools simultaneously produces lock contention on every tool call. WAL mode allows concurrent reads alongside a single writer.
- Acquire database connections per tool call, not per session. If each session holds a connection from
initializeto session close, your pool exhausts atpool_sizeconcurrent sessions — not at the natural query throughput limit. The correct pattern is acquire-use-release inside each tool handler. - Graceful shutdown ordering matters. The database connection or client
close()must happen after all active tool handler calls complete. Closing SQLite while a query is in flight producesSQLITE_INTERRUPT; calling Prisma's$disconnect()while afindUnique()is in flight throwsPrismaClientKnownRequestError; callingredis.quit()while a Lua script is executing drops the in-flight command.
Why MCP Changes the Persistence Model
A REST API and an MCP server differ in one persistence-critical way: how long a single logical unit of work takes.
A REST API processes one HTTP request at a time per connection. A database connection is acquired at the start of the handler and released at the end. The connection's entire lifetime is one request — milliseconds to low seconds. A pool of 20 connections serves hundreds of concurrent requests because each connection is free almost immediately after it is acquired.
An MCP server accumulates context over a session. An LLM agent calls initialize, then may call dozens of tools over several minutes — pausing between calls while it processes results, plans next steps, or waits for user input. The session stays open the entire time. If the server holds a database connection for the lifetime of the session, a pool of 20 connections handles exactly 20 concurrent sessions, regardless of how infrequently each session queries the database.
This drives the core rule for MCP persistence: acquire per tool call, not per session.
| Pattern | When pool exhausts | Correct for MCP? |
|---|---|---|
| Hold connection per session | At pool_size concurrent sessions |
No |
| Acquire per HTTP request (REST pattern) | At pool_size concurrent in-flight requests |
Mostly — but sessions are longer than requests |
| Acquire per tool call | At pool_size concurrent queries |
Yes — correct for MCP |
The second MCP-specific constraint is startup ordering. A process manager like PM2 or systemd routes traffic to the new process as soon as it signals ready. If the process signals ready before migrations complete, a tool call that runs between the ready signal and the migration finishing touches the old schema. The fix is simple: run await migrate() before process.send('ready').
Embedded Database: SQLite
SQLite is the right persistence choice for most indie and small-team MCP servers: zero-dependency, single-file, no separate process to manage. The critical configuration for MCP use is WAL mode.
SQLite's default journal mode (DELETE) acquires a write lock on the entire database file for the duration of every write transaction. While that lock is held, any concurrent read — from a different SSE session calling a different read-only tool — blocks until the write completes. Under normal MCP load with a handful of concurrent sessions, this produces intermittent lock timeouts that appear in logs as SQLITE_BUSY.
WAL (Write-Ahead Log) mode allows concurrent reads alongside a single writer. Readers read from the last committed snapshot; the writer appends to the WAL file. Switching is one pragma:
import Database from 'better-sqlite3';
const db = new Database(process.env.DB_PATH ?? 'data/app.db');
db.pragma('journal_mode = WAL');
db.pragma('busy_timeout = 5000');
db.pragma('foreign_keys = ON');
db.pragma('synchronous = NORMAL');
busy_timeout = 5000 makes SQLite retry for up to five seconds before throwing SQLITE_BUSY, which handles the brief moment when two writes collide. synchronous = NORMAL is safe with WAL mode — it flushes WAL to disk at each checkpoint instead of every transaction, maintaining durability without the throughput cost of FULL mode.
Prepare all statements at module load time, not inside tool handlers:
// Module level — prepare once
const insertCache = db.prepare(
'INSERT OR REPLACE INTO cache (key, value, expires_at) VALUES (?, ?, ?)'
);
const selectCache = db.prepare(
'SELECT value FROM cache WHERE key = ? AND expires_at > unixepoch()'
);
// Inside a tool handler — use prepared statements
server.tool('get_cached', schema, async ({ key }) => {
const row = selectCache.get(key);
return { content: [{ type: 'text', text: row?.value ?? '' }] };
});
Re-preparing a statement inside a handler adds 5–20µs per call. Over thousands of tool calls per session, this accumulates. Statement preparation also performs schema validation — a failed prepare at startup catches query errors before any tool call reaches production.
The graceful shutdown ordering for SQLite is: stop the HTTP listener, drain active sessions, then call db.close(). Closing while a query is in flight produces SQLITE_INTERRUPT. See the deployment guide for the full SIGTERM drain handler implementation.
ORM Layer: Prisma
Prisma is the most popular ORM for TypeScript Node.js servers. Two patterns matter for MCP.
The singleton rule. Instantiating PrismaClient inside a tool handler creates a new connection pool per call. Each pool maintains its own idle connections, runs its own health-check queries, and consumes its own file descriptors. Three tool handlers each instantiating their own clients on every call exhausts the database's max_connections within minutes under normal load. The fix is a module-level singleton:
// deps.ts
import { PrismaClient } from '@prisma/client';
export async function createDeps() {
const prisma = new PrismaClient({
datasourceUrl: process.env.DATABASE_URL,
log: [{ level: 'warn', emit: 'event' }, { level: 'error', emit: 'event' }],
});
// Validate connectivity before signalling ready
await prisma.$queryRaw`SELECT 1`;
return { prisma };
}
Migrations before the ready signal. Prisma Migrate's deploy command is idempotent — it applies any pending migrations and exits cleanly if everything is current. Running it synchronously before process.send('ready') ensures tool handlers never touch a stale schema:
import { execSync } from 'node:child_process';
// In startup sequence, before process.send('ready')
execSync('npx prisma migrate deploy', { stdio: 'inherit' });
If the migration fails, execSync throws, the process exits with a non-zero code, and the process manager restarts. Traffic never reaches a misconfigured instance.
For multi-replica deployments — two instances starting simultaneously both attempting migrations — use a PostgreSQL advisory lock to serialise the migration runner, or use Fly.io's release_command to run the migration once before any machine receives traffic.
Error handling in tool responses. Prisma throws typed errors that map cleanly to MCP's isError: true pattern:
import { PrismaClientKnownRequestError } from '@prisma/client/runtime/library';
server.tool('get_item', schema, async ({ id }, { prisma }) => {
try {
const item = await prisma.item.findUniqueOrThrow({ where: { id } });
return { content: [{ type: 'text', text: JSON.stringify(item) }] };
} catch (err) {
if (err instanceof PrismaClientKnownRequestError && err.code === 'P2025') {
return { isError: true, content: [{ type: 'text', text: `Item ${id} not found` }] };
}
throw err; // propagates as -32603, triggers global error handler
}
});
P2025 is the "record not found" code — a recoverable application error that the LLM can handle. Unknown errors rethrow and propagate as JSON-RPC -32603, which AliveMCP surfaces as a probe failure within 60 seconds.
Graceful shutdown. Call prisma.$disconnect() after all active tool handler promises resolve, not concurrently with them. $disconnect() while a findUnique() is in flight throws PrismaClientKnownRequestError.
Alternative ORM: Drizzle
Drizzle ORM offers a different set of trade-offs from Prisma that matter in specific MCP deployment contexts.
The primary difference is the absence of a build step. Prisma generates a type-safe client from the .prisma schema file — running prisma generate is required after every schema change, which adds a step to CI/CD pipelines and complicates deployments where the client is built in a separate container from the runtime. Drizzle defines the schema in TypeScript files; types are inferred directly from the schema at compile time with no separate generation step.
// schema.ts — Drizzle schema in TypeScript
import { sqliteTable, text, integer } from 'drizzle-orm/sqlite-core';
export const cache = sqliteTable('cache', {
key: text('key').primaryKey(),
value: text('value').notNull(),
expiresAt: integer('expires_at').notNull(),
});
// Infer types directly — no build step
export type Cache = typeof cache.$inferSelect;
export type NewCache = typeof cache.$inferInsert;
The query builder uses SQL-like syntax that maps closely to the SQL it generates:
import { drizzle } from 'drizzle-orm/better-sqlite3';
import { gt, eq } from 'drizzle-orm';
import * as schema from './schema.js';
const db = drizzle(sqliteConnection, { schema });
// In a tool handler
const rows = await db
.select()
.from(schema.cache)
.where(gt(schema.cache.expiresAt, Math.floor(Date.now() / 1000)));
Drizzle also supports edge runtimes where Prisma has limited compatibility. Drizzle with the Neon serverless driver or Cloudflare D1 driver works in Cloudflare Workers; Prisma's edge runtime support is partial and requires separate configuration. For MCP servers deployed to Cloudflare Workers or similar edge platforms, Drizzle is the correct choice.
The migration workflow differs as well. Drizzle Kit generates migration SQL by diffing the TypeScript schema against the current database state (drizzle-kit generate), then applies pending migrations (drizzle-kit migrate). The same rule applies as with Prisma: run drizzle-kit migrate before signalling readiness.
Prisma vs. Drizzle: Choosing the Right ORM
| Concern | Prisma | Drizzle |
|---|---|---|
| Schema definition | .prisma DSL file |
TypeScript sqliteTable() / pgTable() |
| Type generation | prisma generate required after schema change |
Types inferred at compile time — no generate step |
| Query syntax | Object-based (prisma.item.findMany({ where: {} })) |
SQL-like (select().from().where()) |
| Migrations | prisma migrate deploy |
drizzle-kit generate + drizzle-kit migrate |
| Edge runtime (Workers, D1, Neon) | Partial — separate edge client required | Full — native HTTP drivers for D1, Neon, Turso |
| Ecosystem maturity | Larger — more examples, more tooling | Growing — less tooling, more flexibility |
| Best for | VPS / container deployments, PostgreSQL focus | Edge deployments, CI/CD simplicity, SQLite-first stacks |
For most MCP servers running on a VPS or in a container with PostgreSQL, Prisma is the pragmatic choice — larger community, more examples in the MCP ecosystem, and a battle-tested migration workflow. For MCP servers targeting edge runtimes or where eliminating the prisma generate CI step matters, Drizzle is the better fit.
Cache and Coordination: Redis
Redis serves three distinct roles in an MCP server: tool result caching, per-session rate limiting, and distributed coordination for multi-instance deployments.
Tool result caching. The cache-aside pattern wraps tool handlers that query external APIs or perform expensive computations:
async function withCache<T>(
redis: Redis,
key: string,
ttlSeconds: number,
fn: () => Promise<T>
): Promise<T> {
try {
const cached = await redis.get(key);
if (cached) return JSON.parse(cached) as T;
} catch {
// Redis unavailable — fall through to real data source
}
const value = await fn();
try {
await redis.setex(key, ttlSeconds, JSON.stringify(value));
} catch {
// Cache write failure is non-fatal
}
return value;
}
The try/catch around every Redis operation is intentional. Caching is a performance concern, not a correctness concern — a Redis failure must never propagate as a tool error. The LLM should receive the correct data more slowly, not an error.
Per-session rate limiting. A Lua script implements a sliding-window rate limiter atomically:
const RATE_LIMIT_SCRIPT = `
local key = KEYS[1]
local now = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
redis.call('ZREMRANGEBYSCORE', key, '-inf', now - window)
local count = redis.call('ZCARD', key)
if count >= limit then return 0 end
redis.call('ZADD', key, now, now .. math.random())
redis.call('EXPIRE', key, window)
return 1
`;
The Lua script executes atomically — the remove, count, and add happen in a single Redis roundtrip with no race condition between concurrent tool calls from the same session.
Distributed locks for singleton operations. Tool calls that must not execute concurrently — sending an email, initiating a payment — use Redis SET NX PX as a distributed lock:
const acquired = await redis.set(lockKey, lockId, 'NX', 'PX', lockTtlMs);
if (!acquired) {
return { isError: true, content: [{ type: 'text', text: 'Operation already in progress' }] };
}
try {
// perform singleton operation
} finally {
// Lua ownership-check: only release if we still own the lock
await redis.eval(RELEASE_SCRIPT, 1, lockKey, lockId);
}
The ownership check in the release script prevents a slow operation from releasing a lock acquired by a different caller after the original TTL expired.
Singleton and shutdown. Use one module-level ioredis client shared across all sessions. ioredis's built-in reconnect logic with exponential backoff handles transient Redis restarts without application code. On shutdown, call redis.quit() — it sends the QUIT command and waits for acknowledgement, ensuring in-flight commands complete before the connection closes. redis.disconnect() closes the socket immediately and drops any commands in flight.
Schema Evolution: Database Migrations
Database migrations require precise timing relative to the MCP server's startup sequence because the server signals readiness before accepting traffic, and the process manager interprets that signal as permission to route requests.
The correct sequence is:
- Start the process
- Run schema migrations (synchronously — block until complete)
- Open database connections and validate connectivity
- Open Redis connections and validate connectivity
- Register tools and start the HTTP listener
- Signal ready (
process.send('ready')for PM2,sd_notify READY=1for systemd)
Any failure at steps 2–5 should call process.exit(1). The process manager's restart policy then re-attempts startup after a delay — safer than routing traffic to a partially initialised server.
Multi-replica migration races. When two replicas start simultaneously, both attempt step 2. For SQLite, this is not a problem — only one process writes the database file at a time, and the migration runner uses a write transaction. For PostgreSQL, two replicas can run migrations concurrently, potentially attempting the same ALTER TABLE from different processes. The standard fixes:
- Fly.io
release_command— runs the migration command once before any machine receives traffic. All machines start after migrations complete. - Kubernetes init container — a separate container runs migrations before the main container starts.
- PostgreSQL advisory lock —
SELECT pg_advisory_lock(hash)at the start of the migration runner serialises concurrent runners; the second runner waits until the first completes and then finds no pending migrations.
Backward-compatible migrations for rolling updates. During a rolling update, old and new versions of the application run simultaneously for 10–60 seconds. A migration that removes a column breaks the old version still running. The safe pattern for removing a column is three deploys: (1) remove code references to the column, (2) add a migration that drops the column, (3) deploy. Adding a column with a DEFAULT value is safe in a single deploy — the old version ignores the new column, the new version reads it.
The connection-refused failure mode is the cleanest signal: if the migration runner exits non-zero, the process never calls app.listen(), and AliveMCP's probe reports the server as down within 60 seconds — before any tool call reaches the broken instance.
A Complete Persistence Stack
The five persistence concerns compose into a single createDeps() function that validates everything before the server accepts connections:
import Database from 'better-sqlite3';
import { PrismaClient } from '@prisma/client';
import { Redis } from 'ioredis';
import { execSync } from 'node:child_process';
export async function createDeps() {
// 1. Run migrations before opening ORM connections
execSync('npx prisma migrate deploy', { stdio: 'inherit' });
// 2. SQLite for embedded storage (e.g. local cache, audit log)
const db = new Database(process.env.SQLITE_PATH ?? 'data/local.db');
db.pragma('journal_mode = WAL');
db.pragma('busy_timeout = 5000');
db.pragma('foreign_keys = ON');
// 3. Prisma for main relational data (PostgreSQL or SQLite)
const prisma = new PrismaClient({ datasourceUrl: process.env.DATABASE_URL });
await prisma.$queryRaw`SELECT 1`;
// 4. Redis for caching and coordination
const redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379', {
maxRetriesPerRequest: 3,
enableOfflineQueue: false,
keyPrefix: 'mcp:',
});
await redis.ping();
return { db, prisma, redis };
}
The function throws on any failure. The caller wraps it in a top-level try/catch that calls process.exit(1). The process manager restarts — migrations are idempotent, so the next attempt picks up from a clean state.
Which layers you include depends on your deployment context:
| Deployment context | Recommended stack |
|---|---|
| Single VPS, hobby project | SQLite + Drizzle, no Redis (in-memory rate limiting) |
| Single VPS, small team | SQLite + Prisma + Redis |
| Containerised, single instance | PostgreSQL + Prisma + Redis |
| Multi-instance (Fly.io, Kubernetes) | PostgreSQL + Prisma + Redis, advisory lock migration, Fly release_command |
| Cloudflare Workers / edge | D1 + Drizzle, no Redis (use D1 for coordination) |
Graceful Shutdown Ordering
The shutdown sequence must close persistence resources in the correct order. Closing too early drops in-flight tool calls; closing too late causes zombie connections to pile up during rolling updates.
The correct shutdown sequence when all five layers are present:
- Mark the health endpoint as shutting down — return HTTP 503 from
/health. The load balancer removes the instance from rotation. New sessions stop arriving. - Stop the HTTP listener —
httpServer.close(). Existing sessions continue on their open connections. - Wait for active sessions to drain — poll the
activeSessionsmap. Honour aDRAIN_TIMEOUT_MScap; force-close remaining sessions after the timeout. - Close Redis —
await redis.quit(). Any in-flight Lua scripts andsetexcalls complete first. - Disconnect Prisma —
await prisma.$disconnect(). Any in-flight Prisma queries complete first. - Close SQLite —
db.close(). All prepared statements complete. SQLite checkpoints the WAL file on close. - Exit —
process.exit(0).
Steps 4–6 must happen after step 3 completes, not concurrently with the drain. A tool handler that is mid-query when the connection closes throws a runtime error — not isError: true in the tool response, but an unhandled exception that may corrupt the session state or leave the response channel in an indeterminate state.
The timeout at step 3 must be smaller than the process manager's stop timeout. For systemd, set TimeoutStopSec to DRAIN_TIMEOUT_MS / 1000 + 10 — the additional ten seconds covers the time to close Redis, Prisma, and SQLite after the drain completes. For PM2, set kill_timeout to DRAIN_TIMEOUT_MS + 10000 in milliseconds.
What AliveMCP Sees That Internal Monitoring Cannot
Database clients, ORMs, and Redis all have internal health-check mechanisms. They share a common limitation: they verify that the connection is open and the server is reachable. They do not verify that the application layer — the tool handlers using those connections — is correctly responding to MCP protocol requests.
Consider these failure modes invisible to internal persistence monitoring:
- Migration ran, but wrong version. A Fly.io
release_commandran the migration against the staging database instead of production — a mis-setDATABASE_URLenvironment variable. The production MCP server starts, migrations report success against staging, the Prisma client connects to production with the old schema, and every tool call that touches the migrated table returnsP2025or a column-not-found error. The process is running;prisma.$queryRaw\`SELECT 1\`returns successfully. The failure is invisible until tool calls fail. - Connection pool exhausted, no error logged. The pool was set to 5 connections for a server handling 20 concurrent sessions. Every tool call waits for a free connection. The MCP server is technically healthy — no errors — but every tool call takes 8–15 seconds instead of 200ms. AliveMCP's probe measures the latency spike at the
initializelayer; internal pool metrics show high pending-acquire counts, but no alert fired because the error rate was zero. - Redis unavailable, rate limiter falling open. The Redis connection dropped silently. The rate limiter's
try/catchfalls open — all tool calls proceed regardless of rate. A runaway LLM agent hammers the database with thousands of queries per minute. The Prisma client reports healthy; Redis reports disconnected; the tool error rate is zero. The failure is in the interaction between components. - WAL file corrupted after OOM kill. The process was killed by the OOM killer mid-write. The WAL file was left in a partially written state. SQLite replays committed frames on next open — correct — but an application bug left a write transaction open in WAL that the recovery logic cannot resolve. The next startup crashes on
db.pragma('journal_mode = WAL'). AliveMCP reports the server as down within 60 seconds; the process manager has been looping on restarts for five minutes with no alert.
AliveMCP probes from outside — it connects via the full MCP protocol, completes an initialize handshake, and measures whether the server correctly responds at each stage. A server with a full connection pool that is accepting connections but timing out on every tool call appears as a probe latency alert. A server crashing on startup appears as a connection-refused failure. A server with a corrupted schema appears as an initialize failure within 60 seconds of the first startup attempt.
See also: MCP server health check patterns, MCP server connection pooling, and the observability stack guide for the internal instrumentation that complements external probing.
Related Guides
- MCP server SQLite — WAL mode, better-sqlite3, prepared statements, graceful shutdown ordering
- MCP server Prisma ORM — PrismaClient singleton, migrate deploy on startup,
$disconnectafter session drain - MCP server Redis — cache-aside pattern, sliding-window rate limiter, distributed lock with Lua ownership check
- MCP server database migrations — before the ready signal, multi-replica advisory lock, backward-compatible rolling update patterns
- MCP server Drizzle ORM — TypeScript schema inference, SQL-like query builder, D1/Neon edge runtime support
- MCP server connection pooling — acquire per tool call vs. per session, pool size formula, exhaustion handling
- MCP server graceful shutdown — full SIGTERM drain handler, session tracking, health check transition
- MCP Server Deployment Guide — PM2, systemd, nginx, Fly.io, zero-downtime deployment
- MCP Server Observability Stack Guide — OpenTelemetry, Prometheus metrics, structured logging
- MCP Server Infrastructure Operations Guide — dependency injection, load balancing, async work
- AliveMCP — external uptime monitoring for MCP servers