MCP server Redis
Redis adds value to an MCP server in three distinct ways. Tool response caching: LLMs frequently call the same tool with the same arguments across sessions — "get_user(id=123)" called by session A and session B five seconds apart should not make two external API requests. A Redis cache with a short TTL collapses repeated calls. Per-session rate limiting: an LLM in a reasoning loop can call a tool dozens of times in a minute; a sliding-window counter in Redis enforces a per-session cap that protects downstream APIs without blocking unrelated sessions. Distributed locks: when multiple connection-pooled workers handle tool calls concurrently, a Redis lock prevents duplicate singleton operations (email send, payment charge, file rename) from racing. This guide covers ioredis vs. node-redis, the cache-aside pattern, rate limiting, distributed locks, and graceful shutdown ordering.
TL;DR
Use ioredis for its automatic reconnect and cluster support. Implement cache-aside in a withCache() wrapper to keep tool handlers free of caching logic. Rate limit with an atomic Lua script that trims, counts, and appends to a per-session sorted set in one call. On SIGTERM, call redis.quit() after sessions drain; redis.disconnect() drops the connection immediately and may leave in-flight commands unacknowledged.
Client choice: ioredis vs. node-redis
| Feature | ioredis | node-redis (v4+) |
|---|---|---|
| Automatic reconnect | Built-in; default delay min(times × 50, 2000) ms, tunable via retryStrategy | Built-in; tunable via socket.reconnectStrategy |
| Cluster support | First-class (new Redis.Cluster()) | Supported but less ergonomic |
| Promises API | All commands return promises | All commands return promises |
| Lua scripting | redis.defineCommand() | defineScript() passed in the scripts client option |
| Streams | Full XREAD/XADD/consumer group | Full support |
| Bundle size | Larger | Smaller (modular) |
For most MCP servers on a single Redis instance, both work equally well. This guide uses ioredis: its default retry strategy reconnects after transient Redis restarts (patching, failover) without application code changes, and its cluster client is first-class. Whichever client you choose, confirm that reconnect is enabled. Without it, every command issued after a Redis restart rejects, and any tool handler that does not catch the error fails instead of falling through to the underlying data source.
npm install ioredis
npm install --save-dev @types/ioredis # if using older ioredis v4; v5+ ships its own types
Redis client singleton
// src/redis.ts
import Redis from 'ioredis';

const redis = new Redis({
  host: process.env.REDIS_HOST ?? 'localhost',
  port: Number(process.env.REDIS_PORT ?? 6379),
  password: process.env.REDIS_PASSWORD,
  // Reconnect delay grows linearly (50 ms per attempt) and is capped at 2000 ms.
  // The function always returns a number, so ioredis keeps retrying indefinitely;
  // return a non-number such as null to stop reconnecting.
  retryStrategy: (times) => Math.min(times * 50, 2000),
  // A pending command is rejected after 3 failed reconnect attempts
  maxRetriesPerRequest: 3,
  // lazyConnect: false opens the connection as soon as the client is constructed;
  // set it to true to defer connecting until connect() or the first command
  lazyConnect: false,
  // Key prefix: isolates this MCP server's keys from others sharing the Redis instance
  keyPrefix: process.env.REDIS_KEY_PREFIX ?? 'mcp:',
  enableOfflineQueue: true,
});

redis.on('error', (err) => {
  // Log but do not crash: the MCP server continues without cache on Redis failure
  console.error('Redis error:', err.message);
});

redis.on('reconnecting', () => {
  console.log('Redis reconnecting...');
});

export default redis;
The enableOfflineQueue: true setting buffers commands issued while Redis is reconnecting and replays them after reconnection. For cache operations, this is acceptable. For rate-limit counters, it means commands queued during a Redis outage are processed in a burst after reconnection — decide whether to disable the queue for rate-limit commands specifically (use a separate Redis client instance with enableOfflineQueue: false).
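One way to arrange the two-client split described above is to derive both option sets from a single helper. The sketch below is illustrative only: `optionsFor`, `ClientRole`, `RoleOptions`, and the specific retry values are assumptions made for this example, while the option names themselves (`enableOfflineQueue`, `maxRetriesPerRequest`, `keyPrefix`) are the ioredis options already used in the singleton.

```typescript
// Hypothetical helper: builds ioredis-style options for each client role.
type ClientRole = 'cache' | 'rate-limit';

interface RoleOptions {
  host: string;
  port: number;
  keyPrefix: string;
  enableOfflineQueue: boolean;
  maxRetriesPerRequest: number;
}

function optionsFor(
  role: ClientRole,
  env: Record<string, string | undefined> = {},
): RoleOptions {
  return {
    host: env.REDIS_HOST ?? 'localhost',
    port: Number(env.REDIS_PORT ?? 6379),
    keyPrefix: env.REDIS_KEY_PREFIX ?? 'mcp:',
    // Cache commands may be queued and replayed after a reconnect.
    // Rate-limit commands should fail fast so an outage does not end
    // in a burst of stale counter updates.
    enableOfflineQueue: role === 'cache',
    // Assumed values: fewer retries for the fail-fast client.
    maxRetriesPerRequest: role === 'cache' ? 3 : 1,
  };
}
```

With ioredis installed, the two clients could then be created as `new Redis(optionsFor('cache', process.env))` and `new Redis(optionsFor('rate-limit', process.env))`.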
Tool response cache — cache-aside pattern
// src/cache.ts
import redis from './redis.js';

/**
 * Generic cache-aside wrapper for MCP tool handlers.
 * On cache hit: returns cached value without calling fn.
 * On cache miss: calls fn, stores the result, returns it.
 * On Redis error: calls fn directly (cache is best-effort).
 * The result must be JSON-serialisable.
 */
export async function withCache<T>(
  key: string,
  ttlSeconds: number,
  fn: () => Promise<T>
): Promise<T> {
  try {
    const cached = await redis.get(key);
    if (cached !== null) {
      return JSON.parse(cached) as T;
    }
  } catch {
    // Redis unavailable or cached value unparsable: fall through to the real data source
  }
  const result = await fn();
  try {
    // JSON.stringify(undefined) is not a string, so skip caching undefined results
    if (result !== undefined) {
      await redis.setex(key, ttlSeconds, JSON.stringify(result));
    }
  } catch {
    // Cache write failure is non-fatal
  }
  return result;
}

// Usage in a tool handler:
// const user = await withCache(
//   `user:${userId}`,
//   300, // 5-minute TTL
//   () => prisma.user.findUniqueOrThrow({ where: { id: userId } })
// );
The wrapper never throws from Redis failures — caching is a performance optimisation, not a correctness requirement. An MCP server that degrades to uncached operation during a Redis outage is far better than one that returns isError: true for every tool call because the cache is unavailable.
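Because withCache() takes a caller-built key, two calls only share a cache entry if they produce the same key for the same arguments. A minimal sketch of one way to derive such a key follows; `toolCacheKey`, `stableStringify`, and the `tool:` prefix are hypothetical names chosen for this example, not part of any library.

```typescript
// Hypothetical helper: derives a deterministic cache key from a tool call.
type Json = null | boolean | number | string | Json[] | { [k: string]: Json };

// Serialise with object keys sorted, so {a: 1, b: 2} and {b: 2, a: 1}
// produce the same string.
function stableStringify(value: Json): string {
  if (value === null || typeof value !== 'object') {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return `[${value.map((item) => stableStringify(item)).join(',')}]`;
  }
  const entries = Object.keys(value)
    .sort()
    .map((k) => `${JSON.stringify(k)}:${stableStringify(value[k])}`);
  return `{${entries.join(',')}}`;
}

function toolCacheKey(toolName: string, args: { [k: string]: Json }): string {
  return `tool:${toolName}:${stableStringify(args)}`;
}
```

For tools with large argument objects, hashing the serialised arguments would keep keys short; that step is left out here to keep the sketch dependency-free.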
Per-session rate limiting
LLMs in autonomous reasoning loops can call the same tool dozens of times per minute. A sliding-window rate limiter in Redis enforces a per-session cap. The Lua script below is atomic — it reads and writes the counter in a single Redis roundtrip, preventing race conditions between concurrent tool calls within the same session.
// src/rate-limit.ts
import { randomUUID } from 'node:crypto';
import redis from './redis.js';

// Lua script: atomic sliding window using a sorted set
// KEYS[1]: the rate limit key (e.g., "ratelimit:session_abc:fetch_user")
// ARGV[1]: current timestamp in milliseconds
// ARGV[2]: window size in milliseconds
// ARGV[3]: max requests per window
// ARGV[4]: TTL for the key in seconds
// ARGV[5]: unique member for this request, generated by the caller
// Returns: 1 if allowed, 0 if rate limited
const RATE_LIMIT_SCRIPT = `
local key = KEYS[1]
local now = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
local ttl = tonumber(ARGV[4])
local window_start = now - window
-- Remove entries outside the current window
redis.call('ZREMRANGEBYSCORE', key, '-inf', window_start)
-- Count entries within the window
local count = redis.call('ZCARD', key)
if count >= limit then
  return 0
end
-- Add current request with timestamp as score. The member comes from the
-- caller because math.random can be seeded identically on every script run
-- (depending on Redis version), which would let same-millisecond members collide.
redis.call('ZADD', key, now, ARGV[5])
redis.call('EXPIRE', key, ttl)
return 1
`;

export async function checkRateLimit(
  sessionId: string,
  toolName: string,
  maxPerMinute = 30
): Promise<boolean> {
  const key = `ratelimit:${sessionId}:${toolName}`;
  const now = Date.now();
  const windowMs = 60 * 1000;
  const ttlSeconds = 120;
  const member = `${now}:${randomUUID()}`;
  const result = await redis.eval(
    RATE_LIMIT_SCRIPT, 1, key,
    now, windowMs, maxPerMinute, ttlSeconds, member
  ) as number;
  return result === 1;
}

// In a tool handler:
// const allowed = await checkRateLimit(session.id, 'send_email', 5);
// if (!allowed) return { isError: true, content: [{ type: 'text', text: 'Rate limit exceeded: max 5 send_email calls per minute' }] };
Distributed locks for idempotent operations
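A distributed lock prevents two concurrent tool calls from performing the same irreversible operation. The code below uses the single-instance SET NX PX pattern, which is the building block of Redlock (Redlock itself acquires the lock across several independent Redis nodes). The caller acquires an exclusive key in Redis with a TTL; if the lock is already held, the second caller either waits or returns a "try again" error to the LLM. Choose a TTL longer than the slowest expected run of the operation, because the lock expires on its own even if the holder is still working.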
// src/lock.ts
import { randomBytes } from 'node:crypto';
import redis from './redis.js';

// Lua: delete the key only if it still holds our token, so a caller whose
// lock already expired cannot release a lock now owned by someone else
const RELEASE_SCRIPT = `
if redis.call('GET', KEYS[1]) == ARGV[1] then
  return redis.call('DEL', KEYS[1])
else
  return 0
end
`;

export async function withLock<T>(
  resource: string,
  ttlMs: number,
  fn: () => Promise<T>
): Promise<T | null> {
  const lockKey = `lock:${resource}`;
  const lockValue = randomBytes(16).toString('hex');
  // SET key value PX ttl NX: atomic acquire; resolves to 'OK' or null
  const acquired = await redis.set(lockKey, lockValue, 'PX', ttlMs, 'NX');
  if (acquired !== 'OK') return null; // Lock held by another caller
  try {
    return await fn();
  } finally {
    try {
      await redis.eval(RELEASE_SCRIPT, 1, lockKey, lockValue);
    } catch {
      // Release failed; the lock expires on its own after ttlMs.
      // Swallowing the error keeps it from masking fn's result.
    }
  }
}

// Usage:
// const result = await withLock('send-email:user-123', 30_000, () => sendEmail(userId));
// if (result === null) return { isError: true, content: [{ type: 'text', text: 'Operation in progress, try again in a moment' }] };
Graceful shutdown — quit vs. disconnect
// Assumes serverState, httpServer, activeSessions and DRAIN_TIMEOUT_MS
// are defined by the surrounding server module
process.on('SIGTERM', async () => {
  serverState = 'draining';
  httpServer.close();
  // Wait for active sessions to finish
  const drainStart = Date.now();
  while (activeSessions.size > 0 && Date.now() - drainStart < DRAIN_TIMEOUT_MS) {
    await new Promise(resolve => setTimeout(resolve, 100));
  }
  // redis.quit() sends QUIT and waits for the reply, so commands already
  // sent complete before the connection closes.
  // redis.disconnect() closes the socket immediately; in-flight commands are lost.
  try {
    await redis.quit();
  } catch {
    redis.disconnect(); // QUIT failed (connection already down); close the socket
  }
  process.exit(0);
});
AliveMCP and Redis health
A Redis failure that causes cache misses does not affect MCP protocol correctness if the cache-aside wrapper falls through correctly. A dead connection can still hurt latency: with enableOfflineQueue enabled, each awaited Redis command stays pending until the client reconnects or maxRetriesPerRequest is exhausted, and the tool call waits with it. The event loop is not blocked, but every affected tool call is slower, which AliveMCP detects as elevated probe response times before they turn into timeouts. The MCP server health check endpoint should include a Redis PING check alongside the database check, because AliveMCP probes /health to distinguish Redis degradation from full server failure.
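A PING check for the /health endpoint might look like the sketch below. It is written against a minimal `Pingable` interface rather than a concrete client so it can be exercised with a fake; `checkRedisHealth`, `RedisHealth`, and the 500 ms default are assumptions for this example. The ioredis client does expose a promise-returning `ping()` that resolves to 'PONG'.

```typescript
// Minimal shape of the client method used here.
interface Pingable {
  ping(): Promise<string>;
}

interface RedisHealth {
  status: 'ok' | 'degraded';
  latencyMs: number;
  detail?: string;
}

// Hypothetical helper: PING with an upper bound on how long to wait,
// so a dead connection cannot stall the health endpoint itself.
async function checkRedisHealth(
  client: Pingable,
  timeoutMs = 500,
): Promise<RedisHealth> {
  const started = Date.now();
  try {
    const reply = await new Promise<string>((resolve, reject) => {
      const timer = setTimeout(
        () => reject(new Error(`PING timed out after ${timeoutMs} ms`)),
        timeoutMs,
      );
      client.ping().then(
        (value) => { clearTimeout(timer); resolve(value); },
        (err) => { clearTimeout(timer); reject(err); },
      );
    });
    const latencyMs = Date.now() - started;
    return reply === 'PONG'
      ? { status: 'ok', latencyMs }
      : { status: 'degraded', latencyMs, detail: `unexpected reply: ${reply}` };
  } catch (err) {
    return {
      status: 'degraded',
      latencyMs: Date.now() - started,
      detail: err instanceof Error ? err.message : String(err),
    };
  }
}
```

Reporting Redis as "degraded" rather than failing the whole health check keeps the distinction between a server running without its cache and a server that is down.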
Related questions
Should I use Redis or an in-memory Map for tool response caching?
An in-memory Map works for a single-process MCP server and has zero network overhead. Redis is necessary when you have multiple workers (Node.js cluster mode, multiple Fly.io machines, Kubernetes pods) — an in-memory cache is not shared across processes, so cache hit rate drops to near zero across a fleet. Redis is also necessary for rate limiting across workers: a per-process rate limiter allows each worker to accept the full rate limit quota independently. For a single-process server, use an in-memory TTL map (e.g. lru-cache from npm) and only add Redis when you scale to multiple workers.
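For the single-process case, the idea of an in-memory TTL map can be sketched in a few lines. `TtlCache` is a hypothetical class written for this example; unlike lru-cache it has no size bound, so it only suits small, bounded key spaces.

```typescript
// Hypothetical single-process cache: a Map with per-entry expiry.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly now: () => number;

  // `now` is injectable so tests can control the clock.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // lazily evict expired entries on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V, ttlSeconds: number): void {
    this.entries.set(key, {
      value,
      expiresAt: this.now() + ttlSeconds * 1000,
    });
  }
}
```

Because get() and set() mirror the shape of the Redis calls in the cache-aside wrapper, swapping this for Redis later changes the storage layer without touching tool handlers.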
How do I handle Redis being unavailable at startup?
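Do not block startup on Redis availability. The pattern in the singleton above (redis.on('error', log) + enableOfflineQueue: true) means the application starts, attempts to connect, and queues commands while it reconnects; with maxRetriesPerRequest: 3, a queued command rejects after three failed reconnect attempts rather than waiting forever. For rate limiting, fail open (allow the request) when Redis is unavailable rather than blocking all tool calls, which means catching the rejection from the rate-limit check. For distributed locks, proceeding without the lock is reasonable when the lock is a safety net over an operation that is already idempotent; for operations that are not (a payment charge), returning a retryable error is the safer choice.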
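Failing open can be kept out of the tool handlers with a small wrapper around the rate-limit check. The sketch below is one possible shape, not an established API: `failOpen` and `RateLimitCheck` are hypothetical names, and the wrapped function is assumed to have the same signature as the checkRateLimit() function shown earlier in this guide.

```typescript
// Signature assumed to match the guide's checkRateLimit().
type RateLimitCheck = (
  sessionId: string,
  toolName: string,
  maxPerMinute: number,
) => Promise<boolean>;

// Hypothetical wrapper: if the check itself fails (Redis down, script
// error), report the failure and allow the request.
function failOpen(
  check: RateLimitCheck,
  onError: (err: unknown) => void = () => {},
): RateLimitCheck {
  return async (sessionId, toolName, maxPerMinute) => {
    try {
      return await check(sessionId, toolName, maxPerMinute);
    } catch (err) {
      onError(err); // surface the outage without rejecting the tool call
      return true;
    }
  };
}
```

A handler would then call something like `failOpen(checkRateLimit, (e) => console.error('rate limit unavailable', e))` once at startup and use the returned function in place of the raw check. A genuine "limit exceeded" answer (false) still passes through unchanged.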
Can I use Redis as the primary session store instead of an in-process Map?
Yes, but with caveats. Storing MCP session state (tool call history, context, and a reference to the worker that holds the active SSE connection) in Redis enables session resumption after process restart and load balancing across workers. The SSE connection itself still lives in a single process; the session state in Redis supplements it. Libraries like connect-redis handle the session storage pattern for Express sessions. The trade-off: every tool call now makes one Redis read to hydrate session state, adding roughly 1 ms of network latency on a nearby Redis instance versus an in-process Map lookup.
Further reading
- MCP server SQLite — embedded persistence layer that Redis caches in front of
- MCP server Prisma — ORM whose query results Redis caches with TTL
- MCP server Drizzle ORM — alternative ORM with the same Redis caching pattern
- MCP server connection pooling — Redis client pool sizing alongside DB pool
- MCP server graceful shutdown — quit ordering after session drain
- MCP server rate limiting — full rate limiting strategy beyond Redis counters
- MCP server health check — including Redis PING in the /health endpoint
- MCP server Fly.io — attaching a Redis instance on Fly for single-region deployments
- AliveMCP — external monitoring that detects Redis-induced latency spikes before users notice