Guide · Multi-Tenant SaaS

Multi-Tenant Database Patterns for MCP Servers — RLS, schema-per-tenant, and database-per-tenant

The first architectural decision in a multi-tenant MCP server is how to isolate tenant data at the database layer. The choice locks in operational complexity, migration strategy, connection pool architecture, and the blast radius of any data access bug for the life of the product. Three patterns dominate: shared tables with PostgreSQL row-level security (RLS), schema-per-tenant (one PostgreSQL schema per tenant in a shared database), and database-per-tenant (each tenant gets their own PostgreSQL database or cluster). Each offers different isolation guarantees, operational overhead, and scalability ceiling. This guide gives an honest comparison so you can choose the right pattern for your stage — and migrate to the next pattern as scale demands change.

TL;DR

Start with shared tables + RLS if you expect >500 tenants or want minimal operational overhead. Use schema-per-tenant if you need stronger migration isolation (per-tenant schema migrations run independently) and can manage pool complexity. Use database-per-tenant only for enterprise/compliance segments where data residency, independent backups, and physical isolation are contractual requirements. Hybrid is common: RLS for operational metadata, schema-per-tenant for tenant business data, database-per-tenant for enterprise tiers.

Pattern comparison

Dimension	Shared tables + RLS	Schema-per-tenant	Database-per-tenant
Data isolation	Database-enforced (RLS policies)	PostgreSQL namespace isolation	Full process isolation
Tenant limit (practical)	Unlimited (10,000+)	~500 with lazy pools	~50 per host (cost-limited)
Migration strategy	Single migration for all tenants	Per-schema migration (serial or parallel)	Per-database migration (fully parallel)
Cross-tenant queries	Easy (admin role, no RLS)	Requires cross-schema joins or aggregation	Requires federated query or ETL
Connection pool overhead	One shared pool	One pool per active tenant (LRU-evicted)	One pool per database instance
Noisy neighbor risk	High (shared autovacuum, WAL, IOPS)	Medium (shared vacuum but namespace isolated)	None (dedicated instance)
Backup granularity	Full database (all tenants)	Full database (pg_dump --schema for per-tenant)	Per-tenant database backup
Compliance (data residency)	Difficult (all tenants same region)	Difficult (same host)	Easy (per-tenant host placement)
Provisioning speed	Milliseconds (INSERT only)	Seconds (CREATE SCHEMA + migrations)	Minutes (new database instance)
MCP server pool design	Single pool, context via session var	Per-tenant pool with search_path	Per-tenant pool with separate connection string

Pattern 1: Shared tables with row-level security

All tenants share the same tables. A tenant_id column on every table, combined with PostgreSQL RLS policies, ensures each query only sees its tenant's rows. The MCP server sets a session variable before each query to identify the current tenant.

Best for: B2C SaaS with many small tenants, products with homogeneous data models across tenants, startups that want simple operations.

The connection architecture is the simplest:

// Single pool — all tenants use the same connections
const sharedPool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 20,
  // Connect as non-superuser app_user that has RLS policies applied
});

// Inject tenant context before every query
async function queryForTenant<T>(tenantId: string, sql: string, params: unknown[]): Promise<T[]> {
  const client = await sharedPool.connect();
  try {
    await client.query('BEGIN');
    await client.query(`SELECT set_config('app.current_tenant_id', $1, true)`, [tenantId]);
    const result = await client.query<T>(sql, params);
    await client.query('COMMIT');
    return result.rows;
  } finally {
    client.release();
  }
}

RLS policy enforcement gap: The most common mistake is running some queries as the database owner (superuser), which bypasses RLS. Audit all connection strings in your codebase — any that connect as postgres or the admin user bypass isolation. Admin queries are fine (billing, platform analytics), but tool handler queries must always use the app_user role.

Performance: With CREATE INDEX ON tool_calls (tenant_id), PostgreSQL treats the RLS policy as an additional WHERE clause and uses the index. Queries for one tenant in a 100-million-row table are as fast as if they had their own 100,000-row table — the planner sees an index scan over a narrow range.

Pattern 2: Schema-per-tenant

Each tenant gets their own PostgreSQL schema. Tables are identical across schemas; tool handlers connect using a pool whose search_path is set to the tenant's schema. No RLS needed — the schema boundary provides isolation.

Best for: B2B SaaS with medium tenant counts (10–500), products where tenants require different schema versions (progressive rollouts of migrations), or where per-tenant backup/restore is operationally important.

// Schema-per-tenant: pool registry with LRU eviction
import LRU from 'lru-cache';
import { Pool } from 'pg';

const poolCache = new LRU<string, Pool>({
  max: 200,     // max 200 active schemas × 5 connections = 1000 PostgreSQL connections
  dispose: async (pool) => { await pool.end(); },
  ttl: 30 * 60 * 1000,   // evict idle schemas after 30 minutes
});

async function getTenantPool(tenantId: string): Promise<Pool> {
  if (poolCache.has(tenantId)) return poolCache.get(tenantId)!;

  const schema = `tenant_${tenantId.replace(/-/g, '_')}`;
  const pool = new Pool({ connectionString: process.env.DATABASE_URL, max: 5 });
  pool.on('connect', client => client.query(`SET search_path TO ${schema}, public`));

  // Warm one connection to verify schema exists
  const client = await pool.connect();
  await client.query('SELECT 1');
  client.release();

  poolCache.set(tenantId, pool);
  return pool;
}

// Tool handler: just get the tenant's pool and query normally
server.tool('search_inventory', schema, async (args) => {
  const tenant = getTenantFromSession();
  const pool = await getTenantPool(tenant.id);
  const rows = await pool.query('SELECT * FROM inventory WHERE name ILIKE $1', [`%${args.q}%`]);
  return { content: [{ type: 'text', text: JSON.stringify(rows.rows) }] };
});

Migration strategy for schema-per-tenant: Run migrations serially across schemas with a migration job that iterates through all tenant schemas. Use a migration tracking table inside each schema (_migrations) so the runner can skip already-applied migrations. For large tenant counts, run migrations in parallel batches with a concurrency limit:

// migrations/run-all.ts
const tenants = await db.query('SELECT id FROM tenants WHERE status = $1', ['active']);
const CONCURRENCY = 10;

for (let i = 0; i < tenants.rows.length; i += CONCURRENCY) {
  const batch = tenants.rows.slice(i, i + CONCURRENCY);
  await Promise.all(batch.map(t => runMigrationsForTenant(t.id)));
}

Pattern 3: Database-per-tenant

Each tenant has their own PostgreSQL database (or cluster on a dedicated host). The MCP server maintains a connection string per tenant, creating a pool for each database.

Best for: Enterprise SaaS where data residency (GDPR, HIPAA, SOC 2) requires physical isolation, tenants with large datasets that would benefit from independent autovacuum tuning, or tenants that require independent point-in-time recovery.

// Database-per-tenant: connection string lookup
const tenantConnectionStrings = new Map<string, string>();
const tenantPools = new Map<string, Pool>();

async function getTenantPool(tenantId: string): Promise<Pool> {
  if (tenantPools.has(tenantId)) return tenantPools.get(tenantId)!;

  // Fetch connection string from secrets manager (per-tenant)
  let connStr = tenantConnectionStrings.get(tenantId);
  if (!connStr) {
    connStr = await secretsManager.getSecretString(`mcp-tenant-db/${tenantId}`);
    tenantConnectionStrings.set(tenantId, connStr);
  }

  const pool = new Pool({ connectionString: connStr, max: 5 });
  tenantPools.set(tenantId, pool);
  return pool;
}

// No search_path or RLS needed — the database itself is isolated
server.tool('search_inventory', schema, async (args) => {
  const tenant = getTenantFromSession();
  const pool = await getTenantPool(tenant.id);
  const rows = await pool.query('SELECT * FROM inventory WHERE name ILIKE $1', [`%${args.q}%`]);
  return { content: [{ type: 'text', text: JSON.stringify(rows.rows) }] };
});

Provisioning complexity: Creating a new database-per-tenant requires provisioning the database instance (RDS CreateDBInstance, Cloud SQL instance, etc.), which takes 3–10 minutes. New tenant onboarding must be fully asynchronous. Store provisioning status in a control plane database and poll until the instance is ready before running migrations and marking the tenant active.

Hybrid patterns

Most production multi-tenant MCP systems use a hybrid rather than a pure pattern. Common combinations:

Tier	Pattern	Rationale
Free / Starter tenants	Shared tables + RLS	Thousands of tenants, low per-tenant cost
Pro / Business tenants	Schema-per-tenant	Better isolation, per-tenant migration control
Enterprise tenants	Database-per-tenant	Compliance, data residency, dedicated SLA

The MCP server handles all three tiers by dispatching to the appropriate pool getter based on the tenant's tier — a strategy field in the tenants table tells the dispatcher which pattern to use for that tenant:

type IsolationStrategy = 'rls' | 'schema' | 'database';

async function queryTenant<T>(tenantId: string, sql: string, params: unknown[]): Promise<T[]> {
  const strategy = await getTenantStrategy(tenantId);  // cached

  if (strategy === 'rls') {
    return queryWithRls(tenantId, sql, params);
  } else if (strategy === 'schema') {
    const pool = await getSchemaPool(tenantId);
    return (await pool.query(sql, params)).rows;
  } else {
    const pool = await getDatabasePool(tenantId);
    return (await pool.query(sql, params)).rows;
  }
}

Health monitoring across isolation patterns

Each pattern introduces different failure modes. Your /health endpoint should check the infrastructure for the pattern you're using:

// /health: pattern-aware health checks
async function getDatabaseHealth() {
  const checks: Record<string, string> = {};

  // RLS check: verify context injection works with canary tenant
  if (USE_RLS) {
    const rlsOk = await checkRlsCanary();
    checks.rls_context = rlsOk ? 'ok' : 'broken';
  }

  // Schema-per-tenant check: verify active pool count and pool error rates
  if (USE_SCHEMA_PER_TENANT) {
    const poolCount = poolCache.size;
    const erroneousPools = await countPoolsWithErrors();
    checks.tenant_pools = erroneousPools > 0 ? 'degraded' : 'ok';
    checks.active_pool_count = String(poolCount);
  }

  // Database-per-tenant check: verify control plane connectivity
  if (USE_DB_PER_TENANT) {
    const controlPlaneOk = await pingControlPlane();
    checks.control_plane = controlPlaneOk ? 'ok' : 'down';
  }

  return checks;
}

AliveMCP polling your /health endpoint catches RLS context failures (silent data leaks), schema pool exhaustion (tenants blocked), and control plane disconnection (new tenants cannot be provisioned) — all before they surface as tool call errors in production agent sessions.

Frequently asked questions

When should I migrate from RLS to schema-per-tenant?

Migrate when: (1) you need to run schema migrations at different paces for different tenants (feature flags, progressive rollouts, beta groups), (2) tenants have meaningfully different data volumes that would benefit from per-schema autovacuum tuning, or (3) a single large tenant's query load (autovacuum, WAL generation, lock contention) is affecting other tenants. Migration path: provision schema-per-tenant infrastructure in parallel, add a strategy field to the tenants table, migrate tenants batch-by-batch using pg_dump --data-only --table per tenant into their new schema, and switch the dispatcher one tenant at a time. This can be done with zero downtime with a read-during-migration pattern.

Can I use Prisma or Drizzle with schema-per-tenant?

Yes, but with caveats. Prisma's client is generated against a fixed schema; to use schema-per-tenant you need to either use Prisma's datasources override per tenant (creating a new PrismaClient per tenant, which is expensive) or drop down to raw SQL for the search_path override and use Prisma for migrations only. Drizzle is more flexible: it works against a pg Pool and you can configure the pool's search_path on connect. For new multi-tenant MCP servers, Drizzle or raw pg with explicit search_path management is operationally cleaner than Prisma's schema-per-datasource overhead.

How do I run analytics queries across all tenants with RLS?

Analytics queries that aggregate across tenants (daily active tenants, cross-tenant usage reports, platform-level metrics) must bypass RLS. Use a dedicated analytics database role with no RLS policies applied (or a superuser connection from a secure backend service). Never expose a cross-tenant analytics endpoint via the MCP server's tool interface — tools run in the tenant's context and should only return that tenant's data. Keep cross-tenant analytics in a separate admin service with its own authentication layer, not co-mingled with MCP tool handlers that use the RLS-enforced application role.

What is the connection pool math for schema-per-tenant?

PostgreSQL's default max_connections is 100 (often set to 200 for production). With 5 connections per tenant pool and 20 reserved for admin/migrations: (200 - 20) / 5 = 36 simultaneously active tenant pools. With LRU eviction at 30 minutes, 36 active pools supports 36 tenants active in the last 30 minutes — typically fine for SaaS where tenant activity is bursty. If your tenant activation rate exceeds 36 simultaneously active, either increase PostgreSQL max_connections (each connection consumes ~10MB RAM), add PgBouncer as a connection multiplexer, or reduce per-tenant pool size to 2–3 connections (fine if tool calls are short-lived).

How do I do point-in-time recovery for a single tenant with RLS (shared tables)?

You cannot restore a single tenant from a PostgreSQL base backup with shared tables — PITR restores the entire cluster. For per-tenant PITR with RLS, use logical replication or change data capture (CDC) to maintain a per-tenant event log, and replay from that log to reconstruct tenant state at a point in time. This is complex and is one of the strongest arguments for migrating enterprise tenants (who often contractually require per-tenant PITR) to schema-per-tenant or database-per-tenant isolation, where pg_dump --schema or a dedicated backup can restore exactly one tenant's data.