Guide · Database & Event Architecture

Read Replica Routing for MCP Servers — write splitting, lag detection, and health checks

When an AI agent runs a research workflow — listing documents, searching records, fetching details — it generates many read queries per write. Routing those reads to a PostgreSQL read replica offloads the primary, reduces query latency under parallel agent fan-out, and leaves the primary's connection pool headroom for writes. The risk is replication lag: async streaming replication means the replica trails the primary by zero to several seconds. An agent that writes a record and immediately reads it back may see stale data if the read goes to the replica before the write propagates. This guide covers how to classify tools as read or write, how to implement two-pool routing, how to detect replica lag and fall back gracefully, and how to build health checks that validate both database paths simultaneously.

TL;DR

Create two pg.Pool instances: one pointing at the primary, one at the replica. Classify each tool as read-only or write. Route by tool name. Check replica lag on every /ready probe: if lag_seconds > threshold, stop routing reads to the replica and fall back to primary for all queries. Register both pools in your AliveMCP health URL so you alert on primary failure or replica failure independently. A canary tool call that writes to primary and reads from replica validates replication is actually working end-to-end.

Why read replicas help MCP servers specifically

MCP servers serving AI agents experience a read-heavy traffic pattern. A planning agent typically reads 10–50 records per write: it searches for context, retrieves details, checks constraints, and only then calls a mutating tool. Multi-agent fan-out amplifies this: five agents running simultaneously each making 20 read calls generates 100 concurrent read queries.

Without read replicas, all 100 queries compete for connections on the primary. This exhausts the primary's connection pool, adds lock contention pressure during write operations, and causes query latency to spike across all agents. Adding a read replica with a separate connection pool gives read-heavy tool calls a dedicated path that does not compete with write traffic.

The tradeoff is complexity: you must classify tools, manage two pools, detect replication lag, and handle the partial-failure case where the replica is down but the primary is healthy (or vice versa).

Classifying tools as read or write

The classification is explicit, not inferred. Never attempt to parse SQL to determine read vs write — generate tool lists or annotations are more reliable and safer.

// Explicit tool classification — define once, reuse everywhere
const WRITE_TOOLS = new Set([
  'create_item',
  'update_item',
  'delete_item',
  'create_user',
  'update_user',
  'soft_delete_user',
  'create_ticket',
  'close_ticket',
  'append_comment',
]);

// Everything not in WRITE_TOOLS is a read — safe default
function isWriteTool(toolName: string): boolean {
  return WRITE_TOOLS.has(toolName);
}

// Pool selector
function selectPool(toolName: string): Pool {
  return isWriteTool(toolName) ? primaryPool : replicaPool;
}

Annotation-based classification

For larger codebases, embed classification in the tool definition using MCP's annotations field:

// Using MCP tool annotations to declare read/write intent
server.tool(
  'search_items',
  { query: z.string(), limit: z.number().default(20) },
  { readOnlyHint: true, idempotentHint: true },  // MCP tool annotations
  async (args) => {
    const pool = replicaPool;
    // ...
  }
);

server.tool(
  'create_item',
  { name: z.string(), type: z.string() },
  { readOnlyHint: false, destructiveHint: false },
  async (args) => {
    const pool = primaryPool;
    // ...
  }
);

Two-pool implementation

import { Pool } from 'pg';

// Primary — for all writes and reads that cannot tolerate lag
const primaryPool = new Pool({
  host: process.env.PGHOST_PRIMARY,
  port: parseInt(process.env.PGPORT ?? '5432'),
  database: process.env.PGDATABASE,
  user: process.env.PGUSER,
  password: process.env.PGPASSWORD,
  max: 20,
  idleTimeoutMillis: 30_000,
  connectionTimeoutMillis: 3_000,
});

// Replica — for read-only tools
const replicaPool = new Pool({
  host: process.env.PGHOST_REPLICA,
  port: parseInt(process.env.PGPORT ?? '5432'),
  database: process.env.PGDATABASE,
  user: process.env.PGUSER,
  password: process.env.PGPASSWORD,
  max: 30,             // Replicas can support more parallel read connections
  idleTimeoutMillis: 30_000,
  connectionTimeoutMillis: 3_000,
});

// Central routing function used by all tool handlers
async function withPool(
  toolName: string,
  fn: (client: PoolClient) => Promise
): Promise {
  const pool = isWriteTool(toolName) ? primaryPool : getActiveReadPool();
  const client = await pool.connect();
  try {
    return await fn(client);
  } finally {
    client.release();
  }
}

// Usage in tool handler
server.tool('search_items', SearchSchema.shape, async (args, context) => {
  const result = await withPool('search_items', async (client) => {
    return client.query('SELECT * FROM items WHERE name ILIKE $1 LIMIT $2',
      [`%${args.query}%`, args.limit]);
  });
  return { content: [{ type: 'text', text: JSON.stringify(result.rows) }] };
});

Replication lag detection and fallback

PostgreSQL streaming replication is asynchronous by default: the primary commits a transaction and only then sends the WAL to the replica. If the replica is under load, has a slow network connection to the primary, or is catching up from a restart, the lag can grow from milliseconds to minutes.

Measuring lag from the replica

// Query replica lag — call from health check and routing logic
async function getReplicaLagSeconds(): Promise {
  const client = await replicaPool.connect();
  try {
    const result = await client.query<{ lag: string }>(`
      SELECT extract(epoch FROM (now() - pg_last_xact_replay_timestamp()))
      AS lag
    `);
    const lag = parseFloat(result.rows[0]?.lag);
    return isNaN(lag) ? null : lag;
  } catch {
    return null;  // null = cannot measure = treat as lagging
  } finally {
    client.release();
  }
}

Lag-aware routing

// Maintain routing state in module scope
let replicaHealthy = true;
let replicaLagSeconds: number | null = null;
const LAG_THRESHOLD_SECONDS = 30;

// Background lag check — every 10 seconds
setInterval(async () => {
  const lag = await getReplicaLagSeconds();
  replicaLagSeconds = lag;
  replicaHealthy = lag !== null && lag < LAG_THRESHOLD_SECONDS;

  if (!replicaHealthy) {
    console.log(JSON.stringify({
      level: 'warn',
      event: 'replica_lag_exceeded',
      lag_seconds: lag,
      routing: 'falling_back_to_primary',
    }));
  }
}, 10_000);

// Routing considers replica health
function getActiveReadPool(): Pool {
  return replicaHealthy ? replicaPool : primaryPool;
}

The fallback to primary on high lag is the correct behavior: it avoids serving stale data at the cost of temporarily increased primary load. Alert on the fallback event — it means either the replica is under stress or the replication link is degraded. Investigate before primary load becomes a problem too.

The read-your-own-writes problem

An agent that writes a record and immediately reads it back must both operations hit the same server (the primary). Identify sequences where a write tool call is immediately followed by a read tool call in the same agent turn and route both to the primary:

// Write operation: always primary
const writeResult = await withPool('create_item', async (client) =>
  client.query('INSERT INTO items (...) VALUES (...) RETURNING id', [...])
);
const newId = writeResult.rows[0].id;

// Immediate read-after-write: also primary (replica may not have it yet)
// Pass explicit pool override for reads that follow writes
const readResult = await withPrimaryPool(async (client) =>
  client.query('SELECT * FROM items WHERE id = $1', [newId])
);

Health checks for both database paths

A health endpoint for a two-pool MCP server must check both paths separately. The primary being down is a full outage (all writes fail, all reads fall back to primary and may overload it). The replica being down is a partial outage (reads fall back to primary, increased primary load, but writes still work).

// /ready endpoint — checks both database paths
app.get('/ready', async (req, res) => {
  const checks: Record = {};
  let status = 'ok';
  let httpStatus = 200;

  // Primary check — must pass for server to serve any traffic
  try {
    const client = await primaryPool.connect();
    await client.query('SELECT 1');
    client.release();
    checks.primary = 'ok';
  } catch (err) {
    checks.primary = 'unreachable';
    status = 'unhealthy';
    httpStatus = 503;
  }

  // Replica check — degraded if down (not full outage)
  try {
    const client = await replicaPool.connect();
    await client.query('SELECT 1');
    client.release();
    checks.replica = 'ok';
    checks.replica_lag_seconds = replicaLagSeconds;
    checks.replica_routing = replicaHealthy ? 'active' : 'fallen_back_to_primary';
  } catch (err) {
    checks.replica = 'unreachable';
    if (status === 'ok') {
      status = 'degraded';
      httpStatus = 503;
    }
  }

  res.status(httpStatus).json({ status, checks });
});

In Kubernetes, use a readiness probe at /ready so that MCP servers with a downed primary are removed from load balancer rotation. The liveness probe at /live should remain separate and lightweight — it should not check the database, as a database outage does not mean the process should restart.

Canary validation with AliveMCP

AliveMCP's protocol probe validates that the MCP server is accepting connections and returning tool definitions. To validate that replication is actually working, add a write-then-read canary tool:

// Canary tool: write to primary, read from replica, verify propagation
server.tool('health_check_replication', {}, async () => {
  const canaryId = `canary-${Date.now()}`;

  // Write to primary
  const writeClient = await primaryPool.connect();
  try {
    await writeClient.query(
      'INSERT INTO health_canaries (id, created_at) VALUES ($1, now()) ON CONFLICT (id) DO UPDATE SET created_at = now()',
      [canaryId]
    );
  } finally {
    writeClient.release();
  }

  // Wait up to 5 seconds for replica to catch up
  const deadline = Date.now() + 5_000;
  while (Date.now() < deadline) {
    await new Promise(r => setTimeout(r, 500));
    const readClient = await replicaPool.connect();
    try {
      const result = await readClient.query(
        'SELECT id FROM health_canaries WHERE id = $1',
        [canaryId]
      );
      if (result.rows.length > 0) {
        return {
          content: [{
            type: 'text',
            text: JSON.stringify({ ok: true, replication_lag_seconds: replicaLagSeconds }),
          }],
        };
      }
    } finally {
      readClient.release();
    }
  }

  throw new Error(`replication_canary_timeout: row not visible on replica within 5s`);
});

Configure AliveMCP to call health_check_replication on each probe cycle. If replication breaks — replica disconnected from primary, WAL receiver stalled, disk full on replica — this canary fails and AliveMCP alerts within 60 seconds. Without this, the only way to detect a replication break is to notice stale data in production tool responses.

Frequently asked questions

What is the maximum safe replication lag for MCP read tools?

It depends on the freshness requirements of the data the tool returns. For inventory levels or deployment status that agents act on, even 5–10 seconds of lag can lead to incorrect decisions. For historical analytics or document search where records change infrequently, 30–60 seconds of lag is acceptable. Set the lag threshold at the point where stale data would cause an agent to take an incorrect action — then add a safety margin. When in doubt, start at 10 seconds and adjust based on operational experience.

Should the replica pool size be larger than the primary pool size?

Yes, typically. Read replicas are designed to serve high-concurrency read traffic. You can safely set the replica pool max higher than the primary pool because: (1) reads don't hold row locks, reducing contention; (2) replicas can be scaled vertically or horizontally without affecting write operations; (3) the replica's max_connections can be configured independently of the primary. A ratio of 1.5–2× replica pool size vs primary pool size is common for read-heavy MCP workloads.

What happens during a primary failover (replica promoted to primary)?

During a PostgreSQL failover (e.g., via Patroni, pg_auto_failover, or RDS Multi-AZ), there is a brief period where the old primary is unreachable and the new primary (promoted replica) is not yet accepting connections. Both pools will fail connection attempts during this window — typically 10–30 seconds. Your health endpoint will return 503 for both database checks. The MCP server should return isError: true for all tool calls during the failover window rather than queuing them indefinitely. Update the pool connection strings to point to the new primary after failover if they're address-based rather than DNS-resolved.

Do Drizzle ORM or Prisma support read replica routing?

Both support it. Drizzle: create two separate drizzle(primaryPool) and drizzle(replicaPool) instances and select based on tool name. Prisma: use the @prisma/read-replicas extension, which adds a $replica() client modifier that routes the query to a randomly-selected replica. The replica extension handles the routing at the client level without you managing two pool objects. Both approaches produce identical results; the ORM approach is more ergonomic but requires that your ORM supports the query type you need on the replica (Prisma's replica extension may not support all raw query patterns).

Can I use AWS RDS Proxy or similar managed proxies instead of managing two pools myself?

Yes. AWS RDS Proxy, Google Cloud SQL Auth Proxy, and Azure Database Flexible Server all support read/write endpoint splitting at the infrastructure level. You configure a read endpoint URL and a write endpoint URL; the proxy handles connection pooling, failover, and routing. This offloads the pool management to managed infrastructure. The tradeoff: you lose visibility into pool metrics (waiting count, utilization) that are available from pg.Pool. If you rely on those metrics for health checks and monitoring, instrument via the proxy's metrics API or add a secondary pool metrics endpoint.