
MCP server caching

Tool result caching reduces external API calls, cuts upstream rate limit consumption, and lowers tool call latency for repeated queries. In an MCP server, caching sits inside tool handlers — a cache hit returns a stored result before the external call is made. The key design decisions are: what to use as the cache key (a deterministic serialization of the tool arguments), what TTL to set (determined by how stale the data can be before it causes user-visible errors), and what not to cache (tools with side effects or user-specific results). These decisions are different for each tool — there is no one-size-fits-all TTL.

TL;DR

Use the lru-cache package for in-process caching with TTLs. Build cache keys from a deterministic, key-sorted serialization of the tool arguments. JSON.stringify(args, Object.keys(args).sort()) works for flat arguments only: the array replacer is an allowlist applied at every nesting level, so nested arguments need a recursive key sort. Set TTLs based on how much staleness is acceptable, not on how fast the external API is: if users would notice a 5-minute-old result, use a 60-second TTL. Log cache hits and misses to measure hit rate. Never cache tools that write data or trigger notifications, and never cache results that vary by caller identity unless the identity is part of the cache key.

In-process LRU cache

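import { LRUCache } from 'lru-cache';
import { z } from 'zod';

// Assumed to exist elsewhere: `server` (the MCP server instance), `logger`,
// `searchDocsApi`, and `hashKey` (a short non-cryptographic hash for logs).

// One cache per tool, or a shared cache keyed by tool name + args
const searchCache = new LRUCache<string, string>({
  max: 500,          // max 500 cached entries
  ttl: 5 * 60 * 1000, // 5 minutes in milliseconds
  updateAgeOnGet: false, // TTL is absolute from insertion, not from last access
});

function cacheKey(args: Record<string, unknown>): string {
  // Sort keys so argument order does not affect the key.
  // The array replacer is an allowlist applied at every nesting level, so this
  // form is only safe for flat arguments like the ones this tool takes.
  return JSON.stringify(args, Object.keys(args).sort());
}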

server.tool(
  'search_docs',
  'Search the documentation for a query',
  {
    query: z.string().min(1),
    limit: z.number().int().min(1).max(20).default(5),
  },
  async (args) => {
    const key = cacheKey(args);
    const cached = searchCache.get(key);

    if (cached !== undefined) {
      logger.info({ event: 'cache_hit', tool: 'search_docs', key_hash: hashKey(key) });
      return { content: [{ type: 'text', text: cached }] };
    }

    const results = await searchDocsApi(args.query, args.limit);
    const serialized = JSON.stringify(results);

    searchCache.set(key, serialized);
    logger.info({ event: 'cache_miss', tool: 'search_docs', key_hash: hashKey(key) });

    return { content: [{ type: 'text', text: serialized }] };
  }
);

The lru-cache package (npm install lru-cache) provides a well-tested in-process cache with LRU eviction and TTLs, set cache-wide or per entry. Keep updateAgeOnGet: false (the default) so entries expire on an absolute schedule: data cached at 12:00 with a 5-minute TTL expires at 12:05 whether or not it was read in between. With updateAgeOnGet: true, every read resets the entry's age, so a frequently accessed entry never expires and stale data stays cached for as long as any client keeps requesting it.
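The difference between absolute and sliding expiry can be sketched with a tiny TTL map. This is only an illustration of the semantics, not how lru-cache is implemented; the TtlMap class, its slidingExpiry flag, and the injected clock are hypothetical names introduced for the example.

```typescript
// Hypothetical minimal TTL map that illustrates absolute vs sliding expiry.
// It is not lru-cache's implementation; the clock is injected so the
// behaviour can be checked without waiting.
type Entry<V> = { value: V; storedAt: number };

class TtlMap<V> {
  private entries = new Map<string, Entry<V>>();
  private ttlMs: number;
  private slidingExpiry: boolean; // plays the role of updateAgeOnGet
  private now: () => number;

  constructor(ttlMs: number, slidingExpiry: boolean, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.slidingExpiry = slidingExpiry;
    this.now = now;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, storedAt: this.now() });
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() - entry.storedAt >= this.ttlMs) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    if (this.slidingExpiry) entry.storedAt = this.now(); // a read resets the age
    return entry.value;
  }
}

// Simulate one read every 4 minutes against a 5-minute TTL and report
// whether the entry is still served after three reads.
function survivesRepeatedReads(slidingExpiry: boolean): boolean {
  let clock = 0;
  const cache = new TtlMap<string>(5 * 60_000, slidingExpiry, () => clock);
  cache.set('k', 'v');
  for (let i = 0; i < 3; i++) {
    clock += 4 * 60_000;
    if (cache.get('k') === undefined) return false;
  }
  return true;
}
```

With absolute expiry the entry is gone by the second read (8 minutes after insertion); with sliding expiry the same access pattern keeps it alive indefinitely, which is the stale-data risk described above.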

Log the key hash (not the full key) for cache hit/miss events — the full key may contain user query terms. Use a fast non-cryptographic hash like xxhash or a simple FNV hash for the log correlation ID. This gives you enough information to investigate cache behavior without logging user input.
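The handler above calls a hashKey helper that this guide does not define. A minimal sketch, assuming 32-bit FNV-1a over the UTF-8 bytes of the key is sufficient for log correlation. It is not collision-resistant and is meant only for log fields, not as the cache key itself.

```typescript
// Hypothetical hashKey helper: 32-bit FNV-1a over the UTF-8 bytes of the key,
// returned as 8 lowercase hex characters.
function hashKey(key: string): string {
  const bytes = new TextEncoder().encode(key);
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  bytes.forEach((byte) => {
    hash ^= byte;
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, wrapping at 32 bits
  });
  // >>> 0 reinterprets the signed 32-bit result as unsigned before formatting
  return (hash >>> 0).toString(16).padStart(8, '0');
}
```

Two log lines with the same key_hash almost certainly refer to the same cache key, which is enough to trace a hit back to the miss that populated it.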

Cache key design

The cache key must be deterministic: the same logical query with its arguments in a different order must produce the same key, otherwise the cache reports misses for results it already holds:

// WRONG: plain JSON.stringify preserves insertion order, so these produce different keys
JSON.stringify({ limit: 5, query: 'mcp monitoring' }); // '{"limit":5,"query":"mcp monitoring"}'
JSON.stringify({ query: 'mcp monitoring', limit: 5 }); // '{"query":"mcp monitoring","limit":5}'

// CORRECT for flat arguments: sort keys first
function cacheKey(args: Record<string, unknown>): string {
  return JSON.stringify(args, Object.keys(args).sort());
}
// Both orderings produce: '{"limit":5,"query":"mcp monitoring"}'

// The array replacer is an allowlist applied at every nesting level, so nested
// keys missing from the top-level list are silently dropped. For nested objects,
// deep-sort recursively instead:
function deepSortKeys(obj: unknown): unknown {
  if (obj === null || typeof obj !== 'object') return obj;
  if (Array.isArray(obj)) return obj.map(deepSortKeys);
  const sorted: Record<string, unknown> = {};
  for (const key of Object.keys(obj as object).sort()) {
    sorted[key] = deepSortKeys((obj as Record<string, unknown>)[key]);
  }
  return sorted;
}

function cacheKey(args: Record<string, unknown>): string {
  return JSON.stringify(deepSortKeys(args));
}
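One caution about the array-replacer form: JSON.stringify treats an array replacer as an allowlist of property names at every nesting level, so nested keys that do not also appear at the top level are dropped from the output. A short demonstration, using a hypothetical nested filter argument, of how that makes two different queries share one cache key:

```typescript
// Flat-only key builder: the array replacer filters nested objects too.
function flatKey(args: Record<string, unknown>): string {
  return JSON.stringify(args, Object.keys(args).sort());
}

// Recursive key sort that keeps every nested property.
function deepSortKeys(value: unknown): unknown {
  if (value === null || typeof value !== 'object') return value;
  if (Array.isArray(value)) return value.map(deepSortKeys);
  const source = value as Record<string, unknown>;
  const sorted: Record<string, unknown> = {};
  for (const key of Object.keys(source).sort()) {
    sorted[key] = deepSortKeys(source[key]);
  }
  return sorted;
}

function deepKey(args: Record<string, unknown>): string {
  return JSON.stringify(deepSortKeys(args));
}

// `filter` is a hypothetical nested argument used only for this example.
const queryA = { query: 'mcp', filter: { tag: 'monitoring' } };
const queryB = { query: 'mcp', filter: { tag: 'billing' } };

console.log(flatKey(queryA)); // {"filter":{},"query":"mcp"}  (tag was dropped)
console.log(flatKey(queryA) === flatKey(queryB)); // true: different queries share one key
console.log(deepKey(queryA) === deepKey(queryB)); // false: nested values are preserved
```

A collision like this is worse than a false miss: the second caller receives the first caller's cached result.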

For tools where the cache should be per-caller (user-specific results), include the caller identity in the key: `${userId}:${JSON.stringify(deepSortKeys(args))}`. This trades cache efficiency (less sharing between callers) for correctness (different callers get their own results). Never use a shared cache for tools that return user-specific data without including the user ID in the key — this is a data privacy violation, not just a bug.
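A per-caller key builder might look like the sketch below. The perCallerKey name is hypothetical, and how the server obtains userId (session, auth token) is outside the example; the only behaviour it commits to is refusing to build a key when no identity is available.

```typescript
// Recursive key sort, as in the cache key examples.
function deepSortKeys(value: unknown): unknown {
  if (value === null || typeof value !== 'object') return value;
  if (Array.isArray(value)) return value.map(deepSortKeys);
  const source = value as Record<string, unknown>;
  const sorted: Record<string, unknown> = {};
  for (const key of Object.keys(source).sort()) {
    sorted[key] = deepSortKeys(source[key]);
  }
  return sorted;
}

// Hypothetical helper: prefixes the sorted-args key with the caller's identity.
function perCallerKey(userId: string, args: Record<string, unknown>): string {
  if (userId.length === 0) {
    // No reliable identity: refuse to build a key rather than share entries.
    throw new Error('perCallerKey requires a non-empty userId');
  }
  return `${userId}:${JSON.stringify(deepSortKeys(args))}`;
}
```

A handler can catch the error and bypass the cache for that call, so an unidentified caller costs an upstream request instead of exposing another user's data.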

Redis cache for distributed deployments

import { createClient } from 'redis';
import { createHash } from 'node:crypto';
// deepSortKeys is the recursive key sorter from the cache key section

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

async function cachedToolCall<T>(
  toolName: string,
  args: Record<string, unknown>,
  ttlSeconds: number,
  fn: () => Promise<T>
): Promise<T> {
  // Hash the raw key to get a compact, fixed-length Redis key
  const rawKey = `${toolName}:${JSON.stringify(deepSortKeys(args))}`;
  const key = `mcpcache:${createHash('sha256').update(rawKey).digest('hex').slice(0, 16)}`;

  const cached = await redis.get(key);
  // get() resolves to null on a miss; compare explicitly rather than by truthiness
  if (cached !== null) {
    return JSON.parse(cached) as T;
  }

  const result = await fn();
  await redis.setEx(key, ttlSeconds, JSON.stringify(result));
  return result;
}

// Usage in tool handler:
server.tool('get_metrics', 'Get system metrics for a service', {
  service: z.string(),
  window: z.enum(['1m', '5m', '1h']),
}, async (args) => {
  const metrics = await cachedToolCall('get_metrics', args, 60, () => fetchMetrics(args));
  return { content: [{ type: 'text', text: JSON.stringify(metrics) }] };
});

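Use Redis caching when your MCP server runs multiple instances: in-process caches are per-instance and do not share state. The SHA-256 hash truncated to 16 hex characters (8 bytes, a 2^64 key space) gives a compact key. By the birthday bound, collisions only become likely at around 2^32 (about 4 billion) live keys, far beyond a typical cache; keep more of the digest if a collision, where one call receives another call's cached result, would be unacceptable. Use setEx (SET with an expiry) so every entry has a TTL; keys written without one are never expired and accumulate until Redis reaches its memory limit. The mcpcache: key prefix makes cache keys easy to identify and delete in bulk (SCAN with MATCH mcpcache:*, then DEL). FLUSHDB also clears them, but it deletes every key in the selected database, so use it only when that database holds nothing but this cache.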
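The key layout and the collision estimate can be sketched as follows. This is a variant of the key builder above, not a replacement for it: redisCacheKey and collisionProbability are hypothetical names, and keeping the tool name readable in the prefix is an assumption made here so that one tool's entries can be matched separately.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical variant of the key builder: the tool name stays readable in the
// prefix and only the arguments are hashed. `serializedArgs` is assumed to be
// the deterministic, key-sorted JSON of the tool arguments.
function redisCacheKey(toolName: string, serializedArgs: string, hexChars = 16): string {
  const digest = createHash('sha256').update(serializedArgs).digest('hex');
  return `mcpcache:${toolName}:${digest.slice(0, hexChars)}`;
}

// Approximate birthday-bound probability of any collision among `liveKeys`
// keys when `hexChars` hex characters (4 bits each) of the digest are kept:
// p is roughly n^2 / 2^(bits + 1). Only meaningful while the result is small.
function collisionProbability(liveKeys: number, hexChars = 16): number {
  return (liveKeys * liveKeys) / Math.pow(2, 4 * hexChars + 1);
}
```

With this layout, SCAN with MATCH mcpcache:get_metrics:* selects a single tool's entries, and collisionProbability(1_000_000) comes out at roughly 2.7e-8 for the 16-character digest.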

What not to cache

Four categories of tools should never have their results cached:

Mutation tools: any tool that writes data, sends a message, triggers an action, or has side effects. Returning a cached "success" for a mutation that was never executed is a correctness bug. Examples: create_issue, send_notification, update_config.
Time-sensitive tools: tools where a stale result is actively harmful, not just slightly wrong. Examples: get_current_price, check_availability, get_live_status. If the cache TTL must be under 5 seconds to be safe, consider whether caching adds any value at all: with so short a TTL few calls will hit the cache, so the added lookup and complexity may outweigh the savings.
User-specific tools without identity in the key: tools that return different results based on who is calling (permissions, personal data, account-specific state) must include the caller identity in the cache key. If you cannot reliably determine the caller identity, do not cache.
Non-deterministic tools: tools that call LLMs, random number generators, or any system that returns different results for the same input by design. Caching these defeats the purpose of calling them.
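One way to make these rules hard to violate by accident is an explicit per-tool policy with caching off by default. The sketch below is an assumption about how that could be organised, not part of any MCP SDK: the CachePolicy type, the policy table, the get_account tool, and the TTL values are all hypothetical and illustrative.

```typescript
// Hypothetical per-tool cache policy. Most tool names come from the examples
// in this guide; get_account is invented to show a per-caller tool.
type CachePolicy =
  | { cache: false; reason: string }
  | { cache: true; ttlSeconds: number; perCaller: boolean };

const policies = new Map<string, CachePolicy>([
  ['search_docs', { cache: true, ttlSeconds: 300, perCaller: false }],
  ['get_metrics', { cache: true, ttlSeconds: 60, perCaller: false }],
  ['get_account', { cache: true, ttlSeconds: 30, perCaller: true }],
  ['create_issue', { cache: false, reason: 'mutation' }],
  ['send_notification', { cache: false, reason: 'mutation' }],
  ['get_current_price', { cache: false, reason: 'time-sensitive' }],
]);

// Tools without an explicit policy are not cached: opting in per tool is
// safer than discovering a cached mutation later.
function policyFor(toolName: string): CachePolicy {
  return policies.get(toolName) ?? { cache: false, reason: 'no policy defined' };
}

// Returns the cache key to use, or null when the call must bypass the cache.
// `argsKey` is assumed to be the deterministic serialization of the arguments.
function keyFor(toolName: string, argsKey: string, userId?: string): string | null {
  const policy = policyFor(toolName);
  if (policy.cache === false) return null;
  if (policy.perCaller) {
    if (!userId) return null; // identity unknown: do not cache
    return `${toolName}:${userId}:${argsKey}`;
  }
  return `${toolName}:${argsKey}`;
}
```

A handler that receives null from keyFor calls the upstream API directly, so a newly added tool is uncached until someone decides on its TTL.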

Cache warming and cold start

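An in-process cache is cold after every deployment or restart: the first call to each tool hits the upstream API. (A shared Redis cache survives deploys unless it is flushed.) If your server handles high-traffic tools that are expensive to call cold, warm the cache at startup:

// Warm the cache with the most common queries at startup.
// Write to the same cache, with the same key function, that the handler reads.
async function warmCache() {
  const commonQueries = ['mcp monitoring', 'uptime check', 'health check'];
  const outcomes = await Promise.allSettled(
    commonQueries.map(async (query) => {
      const results = await searchDocsApi(query, 5);
      // limit: 5 matches the handler's default, so default calls hit these entries
      searchCache.set(cacheKey({ query, limit: 5 }), JSON.stringify(results));
    })
  );
  // allSettled: one failed upstream call should not block startup
  const warmed = outcomes.filter((o) => o.status === 'fulfilled').length;
  logger.info({ event: 'cache_warmed', warmed, attempted: commonQueries.length });
}

// Call during startup, before accepting traffic
await warmCache();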

AliveMCP's probe calls initialize and tools/list, not tools/call, so it does not warm your tool cache and does not appear in your cache hit/miss metrics. Without warming, the first real user sessions after a deployment hit the cold cache, which shows up as a latency spike in response time metrics immediately after deploy. Use that spike as a signal: if warming works, it should be absent or much smaller for the warmed queries; a spike that fades after the first few calls means the cache is filling from live traffic instead; latency that stays elevated suggests calls are not hitting the cache at all.
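Hit rate is a more direct check than latency alone, since it separates "the cache is filling" from "the cache is never hit". A minimal sketch of a per-tool counter that the cache_hit and cache_miss branches of a handler could feed; the HitRateTracker class is hypothetical and keeps its counts in memory only.

```typescript
// Hypothetical in-memory counter for per-tool cache hit rate.
class HitRateTracker {
  private counts = new Map<string, { hits: number; misses: number }>();

  // Record one cache lookup for a tool.
  record(tool: string, hit: boolean): void {
    const entry = this.counts.get(tool) ?? { hits: 0, misses: 0 };
    if (hit) entry.hits += 1;
    else entry.misses += 1;
    this.counts.set(tool, entry);
  }

  // Returns hits / (hits + misses), or null before any lookup is recorded.
  hitRate(tool: string): number | null {
    const entry = this.counts.get(tool);
    if (entry === undefined) return null;
    return entry.hits / (entry.hits + entry.misses);
  }

  // Clear all counts, for example at deploy time to measure a fresh window.
  reset(): void {
    this.counts.clear();
  }
}
```

Resetting the tracker at startup and reading hitRate('search_docs') a few minutes later shows whether the warmed entries are actually being served.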