MCP server graceful degradation

Graceful degradation means returning something useful when a dependency fails rather than hard-failing the entire tool call. When your MCP server's database is slow, serve a stale cached result. When your search index is down, return an empty result with a degraded: true flag rather than a 500 error. When a third-party API is unavailable, skip the enrichment step and return the base data. The agent gets a response it can reason about and continue with, rather than an error that terminates the task. Graceful degradation is what separates a resilient production server from one that fails completely whenever a non-critical dependency has a bad minute.

TL;DR

Define degradation tiers for each tool before writing fallback code. Tier 1: full response from live data. Tier 2: cached response with age metadata. Tier 3: partial response (some enrichments skipped). Tier 4: minimal response (IDs and essential fields only). Tier 5: informative error (dependency down, with guidance on when to retry). Implement each tier as an explicit fallback in the tool handler, ordered from best to worst. Return a degraded flag in the response so agents know to treat the result accordingly.

Graceful degradation vs graceful shutdown

These are often confused but address different failure modes:

Graceful shutdown — the MCP server process itself is stopping. It drains in-flight requests, closes connections cleanly, and then exits. The server is intentionally going away. (covered separately)
Graceful degradation — the MCP server process is running and healthy, but one or more of its dependencies (database, external API, cache, search index) are experiencing failures or elevated latency. The server continues to operate but returns reduced-quality responses.

Graceful shutdown is about the server's own lifecycle. Graceful degradation is about its dependency health. Both are necessary for a production-grade MCP server.

Degradation tier model

Before writing any fallback code, define the degradation tiers for each tool. What is the minimum acceptable response when each dependency fails?

Tier	State	Response quality	When to use
1	Fully operational	Full live data	All dependencies healthy
2	Database slow/unavailable	Stale cached data with `cachedAt` timestamp	DB read times out and Redis holds a cached copy
3	Enrichment service down	Base data without enrichment, omissions listed in `_meta.skipped`	Optional third-party API unavailable
4	Read replica down, primary overloaded	IDs and essential fields only	DB returning data but extremely slow
5	Primary data source unavailable	Informative error with retry guidance	Nothing can be served safely

Tier 5 is still better than an unhandled exception: it tells the agent how long to wait before retrying, which the agent can use to schedule a delayed retry rather than hammering the server.
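The tier ordering can be expressed as a small fallback chain. The following is a minimal sketch, not a prescribed implementation: `runTiers`, `Tier`, and `TieredResult` are hypothetical names, and the 60-second retry default is purely illustrative.

```typescript
// Hypothetical helper: tries tiers from best to worst and returns the first
// one that succeeds, tagged with the tier level that produced it.
type Tier<T> = {
  level: number;             // 1 = full live data, larger = more degraded
  name: string;              // e.g. 'live', 'stale_cache', 'minimal'
  run: () => Promise<T>;
};

type TieredResult<T> =
  | { ok: true; data: T; tier: number; degraded: boolean; failed: string[] }
  | { ok: false; error: string; retryAfterSeconds: number; failed: string[] };

async function runTiers<T>(
  tiers: Tier<T>[],
  retryAfterSeconds = 60     // illustrative default, tune per dependency
): Promise<TieredResult<T>> {
  const failed: string[] = [];
  // Sort a copy so callers may list tiers in any order
  const ordered = [...tiers].sort((a, b) => a.level - b.level);

  for (const tier of ordered) {
    try {
      const data = await tier.run();
      return { ok: true, data, tier: tier.level, degraded: tier.level > 1, failed };
    } catch {
      // Remember which tier failed, then fall through to the next one
      failed.push(tier.name);
    }
  }

  // Every tier failed: return an informative error with retry guidance
  return { ok: false, error: 'all_tiers_unavailable', retryAfterSeconds, failed };
}
```

A tool handler could pass its live fetch, cache lookup, and minimal query as tiers 1, 2, and 4, then map an `ok: false` result to the Tier 5 error response.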

Stale cache fallback

The most common degradation pattern is serving a cached result when the authoritative data source is slow. Use Redis with a short TTL for the "fresh" cache and a longer TTL for the "stale" fallback:

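import { createClient } from 'redis';
import { z } from 'zod';

// Assumed to be defined elsewhere: `server` (an McpServer instance from
// @modelcontextprotocol/sdk) and `db` (your data-access layer).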

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

async function withStaleCache<T>(
  key: string,
  fetchFresh: () => Promise<T>,
  options: {
    freshTtlSeconds: number;   // e.g. 60 — serve from cache for 1 min before re-fetching
    staleTtlSeconds: number;   // e.g. 3600 — keep stale copy for 1 hour as fallback
    timeoutMs: number;         // e.g. 2000 — how long to wait for fresh data
  }
): Promise<{ data: T; stale: boolean; cachedAt: string | null }> {
  const freshKey = `fresh:${key}`;
  const staleKey = `stale:${key}`;
  const metaKey = `meta:${key}`;

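  // Try fresh cache first. Cache reads are best-effort: if Redis itself is
  // unreachable, treat it as a miss and go to the live source.
  const cached = await redis.get(freshKey).catch(() => null);
  if (cached) {
    const meta = await redis.get(metaKey).catch(() => null);
    return { data: JSON.parse(cached) as T, stale: false, cachedAt: meta };
  }

  // Try to fetch live data, giving up after timeoutMs
  let timer: ReturnType<typeof setTimeout> | undefined;
  try {
    const result = await Promise.race([
      fetchFresh(),
      new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new Error('fetch_timeout')), options.timeoutMs);
      }),
    ]);

    const cachedAt = new Date().toISOString();
    // Update both fresh and stale caches. A failed cache write is swallowed so
    // it cannot turn a successful live fetch into a stale response or an error.
    await Promise.all([
      redis.set(freshKey, JSON.stringify(result), { EX: options.freshTtlSeconds }),
      redis.set(staleKey, JSON.stringify(result), { EX: options.staleTtlSeconds }),
      redis.set(metaKey, cachedAt, { EX: options.staleTtlSeconds }),
    ]).catch(() => undefined);
    return { data: result, stale: false, cachedAt };
  } catch (err) {
    // Live fetch failed or timed out: try the stale fallback
    const [stale, meta] = await Promise.all([
      redis.get(staleKey).catch(() => null),
      redis.get(metaKey).catch(() => null),
    ]);
    if (stale) {
      return { data: JSON.parse(stale) as T, stale: true, cachedAt: meta };
    }
    // No cache at all: re-throw the original error
    throw err;
  } finally {
    // Clear the timeout so a fast fetch does not leave a pending timer behind
    if (timer) clearTimeout(timer);
  }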
}

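// Tool using stale-cache fallback
server.tool(
  'get_account',
  'Get account details by ID',
  { accountId: z.string() },
  async ({ accountId }) => {
    try {
      const { data, stale, cachedAt } = await withStaleCache(
        `account:${accountId}`,
        () => db.accounts.findById(accountId),
        { freshTtlSeconds: 30, staleTtlSeconds: 3600, timeoutMs: 2000 }
      );

      return {
        content: [{
          type: 'text',
          // degraded mirrors stale so agents can check a single flag
          text: JSON.stringify({ ...data, _meta: { degraded: stale, stale, cachedAt } }),
        }],
      };
    } catch {
      // Tier 5: no live data and no cached copy. Return an informative error
      // with retry guidance. The 60-second value is illustrative only.
      return {
        isError: true,
        content: [{
          type: 'text',
          text: JSON.stringify({
            error: 'account_data_unavailable',
            _meta: { degraded: true, degradationReason: 'database_unavailable', retryAfterSeconds: 60 },
          }),
        }],
      };
    }
  }
);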

Partial response pattern

When a non-critical enrichment service is unavailable, return the base data with the enrichment skipped rather than failing the entire call:

server.tool(
  'get_company',
  'Get company details with optional LinkedIn enrichment and funding data',
  { companyId: z.string() },
  async ({ companyId }) => {
    // Core data — required; failure here is a real error
    const company = await db.companies.findById(companyId);
    if (!company) throw new Error(`Company ${companyId} not found`);

    const enrichments: Record<string, unknown> = {};
    const skipped: string[] = [];

    // LinkedIn and funding enrichments are optional and independent, so run
    // them in parallel and record whichever ones fail instead of throwing.
    const [linkedin, funding] = await Promise.allSettled([
      linkedinApi.getCompanyProfile(company.domain),
      crunchbaseApi.getFunding(company.domain),
    ]);

    if (linkedin.status === 'fulfilled') enrichments.linkedin = linkedin.value;
    else skipped.push('linkedin_profile');

    if (funding.status === 'fulfilled') enrichments.funding = funding.value;
    else skipped.push('funding_data');

    return {
      content: [{
        type: 'text',
        text: JSON.stringify({
          ...company,
          ...enrichments,
          _meta: {
            degraded: skipped.length > 0,
            skipped,
            note: skipped.length > 0
              ? `${skipped.join(', ')} unavailable — base data returned`
              : undefined,
          },
        }),
      }],
    };
  }
);

The agent receives the base company data and can proceed with its task. The _meta.skipped field tells the agent exactly which enrichments were omitted, so it can factor that into its reasoning.
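The handler above skips an enrichment when its call fails, but an enrichment that hangs would still delay the whole response. One way to bound that is a per-enrichment time budget. This is a sketch under assumptions: `withTimeout` and `collectEnrichments` are hypothetical helpers, not part of any SDK, and the right timeout depends on your latency targets.

```typescript
// Hypothetical helpers: run optional enrichments in parallel, each with its
// own time budget, and report which ones were skipped.
type EnrichmentTasks = Record<string, () => Promise<unknown>>;

function withTimeout<T>(task: () => Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error('enrichment_timeout')), ms);
    // Promise.resolve().then(task) also converts a synchronous throw into a rejection
    Promise.resolve().then(task).then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); }
    );
  });
}

async function collectEnrichments(tasks: EnrichmentTasks, timeoutMs: number) {
  const results = await Promise.all(
    Object.entries(tasks).map(async ([name, task]) => {
      try {
        return { name, ok: true as const, value: await withTimeout(task, timeoutMs) };
      } catch {
        // Failure or timeout: mark this enrichment as skipped
        return { name, ok: false as const };
      }
    })
  );

  const enrichments: Record<string, unknown> = {};
  const skipped: string[] = [];
  for (const r of results) {
    if (r.ok) enrichments[r.name] = r.value;
    else skipped.push(r.name);
  }
  return { enrichments, skipped, degraded: skipped.length > 0 };
}
```

The returned `skipped` and `degraded` values map directly onto the `_meta` fields used in the handler, so the response shape stays the same.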

Signaling degraded state to agents

Agents make better decisions when they know a response is degraded. Establish a consistent _meta convention across all your tools:

interface ResponseMeta {
  degraded?: boolean;          // true if any fallback was used
  degradationReason?: string;  // short reason code, e.g. 'database_slow', 'enrichment_unavailable'
  cachedAt?: string;           // ISO 8601 — when the cached data was fetched
  stale?: boolean;             // true if served from stale cache
  skipped?: string[];          // list of skipped enrichments/operations
  retryAfterSeconds?: number;  // if degraded: how long before trying again
}

An agent can detect degraded: true and decide whether to: accept the partial result and continue the task, note the limitation in its output to the user, or schedule a retry for operations where fresh data is required.
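On the consuming side, those three options can be sketched as a small decision function. Everything here is hypothetical client-side policy: `decideOnMeta`, `AgentDecision`, and the 30-second default are assumptions for illustration, and a real agent may make this choice in its reasoning rather than in code.

```typescript
// Subset of the _meta convention that this sketch reads
type DegradationMeta = {
  degraded?: boolean;
  stale?: boolean;
  skipped?: string[];
  retryAfterSeconds?: number;
};

type AgentDecision =
  | { action: 'accept' }
  | { action: 'accept_with_caveat'; caveat: string }
  | { action: 'retry'; afterSeconds: number };

// Hypothetical policy; the 30-second default is illustrative only
function decideOnMeta(
  meta: DegradationMeta | undefined,
  needsFreshData: boolean,
  defaultRetrySeconds = 30
): AgentDecision {
  const degraded = Boolean(meta?.degraded || meta?.stale);
  if (!degraded) return { action: 'accept' };

  // Fresh data is required: schedule a retry instead of using the fallback
  if (needsFreshData) {
    return { action: 'retry', afterSeconds: meta?.retryAfterSeconds ?? defaultRetrySeconds };
  }

  // Otherwise continue, but carry the limitation into the output
  const skipped = meta?.skipped ?? [];
  const caveat = skipped.length > 0
    ? `Result is missing: ${skipped.join(', ')}`
    : 'Result may be out of date';
  return { action: 'accept_with_caveat', caveat };
}
```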

Circuit breaker integration

Graceful degradation works best when combined with a circuit breaker. When a dependency is consistently failing, the circuit opens and subsequent calls fail fast — returning the stale cache or partial response without waiting for the full timeout on every request:

// Sketch combining a circuit breaker with graceful degradation.
// CircuitBreaker is an assumed minimal interface; adapt it to the breaker
// library you use.
interface CircuitBreaker {
  execute<T>(fn: () => Promise<T>): Promise<T>;
}

async function callWithFallback<T>(
  circuitBreaker: CircuitBreaker,
  fetchFresh: () => Promise<T>,
  fetchFallback: () => Promise<{ data: T; degraded: true }>
): Promise<{ data: T; degraded: boolean }> {
  try {
    const data = await circuitBreaker.execute(fetchFresh);
    return { data, degraded: false };
  } catch {
    // Circuit is open or the live fetch failed: use the fallback. If the
    // fallback also rejects, the error propagates to the caller.
    return fetchFallback();
  }
}

The circuit breaker eliminates the timeout wait — once the circuit opens, the fallback is returned immediately rather than after a 2-second timeout on every call. This keeps response time consistent even during extended dependency outages.
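For readers without a breaker library at hand, the fail-fast behavior can be illustrated with a deliberately small implementation. This is a simplified sketch, not a production breaker: `SimpleCircuitBreaker` is a hypothetical class exposing the same `execute` method that `callWithFallback` relies on, it counts consecutive failures only, and it does not limit concurrent trial calls once the reset window has elapsed.

```typescript
// Hypothetical minimal breaker: opens after N consecutive failures and
// rejects immediately until resetAfterMs has elapsed.
class SimpleCircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly failureThreshold: number;
  private readonly resetAfterMs: number;
  private readonly now: () => number;

  constructor(
    failureThreshold: number,              // consecutive failures before opening
    resetAfterMs: number,                  // how long to stay open before a trial call
    now: () => number = () => Date.now()   // injectable clock, useful in tests
  ) {
    this.failureThreshold = failureThreshold;
    this.resetAfterMs = resetAfterMs;
    this.now = now;
  }

  async execute<T>(fn: () => Promise<T>): Promise<T> {
    if (this.openedAt !== null && this.now() - this.openedAt < this.resetAfterMs) {
      // Open: fail fast without calling the dependency at all
      throw new Error('circuit_open');
    }
    // Closed, or the reset window has elapsed and a trial call is allowed
    try {
      const result = await fn();
      this.failures = 0;
      this.openedAt = null;
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.failureThreshold) this.openedAt = this.now();
      throw err;
    }
  }
}
```

While the circuit is open, `execute` rejects without invoking the dependency, so the caller's `catch` branch runs at once and the fallback is served with no timeout wait.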

Health check integration

Expose degradation state in your health check endpoint so external monitors can distinguish "fully operational" from "degraded but serving":

app.get('/health', (req, res) => {
  const status = {
    status: 'ok',            // ok | degraded | down
    version: SERVER_VERSION,
    dependencies: {
      database: db.isHealthy() ? 'ok' : 'degraded',
      redis: redis.isReady ? 'ok' : 'degraded',
      searchIndex: searchIndex.isHealthy() ? 'ok' : 'degraded',
    },
    degradedFeatures: [] as string[],
  };

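  if (status.dependencies.database !== 'ok') {
    status.status = 'degraded';
    status.degradedFeatures.push('live_data_reads');
  }
  if (status.dependencies.redis !== 'ok') {
    status.status = 'degraded';
    status.degradedFeatures.push('cache_fallback');
  }
  if (status.dependencies.searchIndex !== 'ok') {
    status.status = 'degraded';
    status.degradedFeatures.push('full_text_search');
  }
  // Assumed policy: with both the database and the cache unavailable, nothing
  // can be served (Tier 5), so report the server as down.
  if (status.dependencies.database !== 'ok' && status.dependencies.redis !== 'ok') {
    status.status = 'down';
  }

  const httpStatus = status.status === 'down' ? 503 : 200;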
  res.status(httpStatus).json(status);
});

AliveMCP probes this endpoint on every check cycle. A degraded server returns HTTP 200, so it is not flagged as "down", but the response body can be shown in the status dashboard, letting you monitor degradation events without triggering false-positive downtime alerts.
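To make the distinction concrete, here is a sketch of how a monitor could interpret the health response. It is a hypothetical illustration and does not describe AliveMCP's actual logic: `classifyProbe`, `HealthBody`, and `ProbeVerdict` are assumed names, and only the `status` and `degradedFeatures` fields from the endpoint above are read.

```typescript
// Hypothetical monitor-side classification of a /health probe result
type HealthBody = { status?: string; degradedFeatures?: string[] };

type ProbeVerdict = {
  state: 'up' | 'degraded' | 'down';
  alert: boolean;                // whether to raise a downtime alert
  degradedFeatures: string[];    // shown on the dashboard when degraded
};

function classifyProbe(httpStatus: number, body: HealthBody | null): ProbeVerdict {
  // Non-200, an unparseable body, or an explicit 'down' all count as down
  if (httpStatus !== 200 || body === null || body.status === 'down') {
    return { state: 'down', alert: true, degradedFeatures: [] };
  }
  // Degraded but serving: record it without raising a downtime alert
  if (body.status === 'degraded') {
    return { state: 'degraded', alert: false, degradedFeatures: body.degradedFeatures ?? [] };
  }
  return { state: 'up', alert: false, degradedFeatures: [] };
}
```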