Multi-Tenant SaaS guide · 2026-06-20 · MCP Server Multi-Tenant SaaS

Building a Multi-Tenant MCP Server: Data Isolation, Usage Metering, and Billing Integration

An MCP server that serves a single user is an agent tool. An MCP server that serves a hundred paying customers is a SaaS product — and the difference between those two things is three operational layers that every single-tenant MCP guide skips entirely: data isolation (each tenant's tool calls must touch only that tenant's data, enforced at the database layer, not by application logic), usage metering (quota enforcement that rejects over-limit tool calls in the hot path without adding a database round-trip), and billing integration (translating tool calls into Stripe usage records before the billing period closes). Miss any one layer and you get: cross-tenant data leaks that end your company, free tool calls that bankrupt your infrastructure, or revenue gaps that make your business unprofitable. This guide synthesizes all three into a cohesive architecture — with the automated onboarding pipeline that wires them together when a new customer signs up.

The three layers and why each one fails silently

Each layer has a characteristic silent failure mode — the kind that keeps the MCP server returning HTTP 200 and JSON-RPC responses while something fundamentally wrong is happening underneath:

Layer	What it provides	Silent failure mode	What catches it
Data isolation	Each tenant sees only their own data	RLS context variable unset — all queries return rows across all tenants, or return zero rows (fail-closed); superuser role bypass makes every query cross-tenant	RLS canary: count own tenant's rows with app_user role — zero rows when context injection is broken
Usage metering	Quota enforcement in the hot path	Redis failure: a metering client that fails open returns allowed=true for every call, so affected tenants get unlimited free tool calls until Redis is restored	Redis connectivity check in `/health`; alert on Redis down rather than waiting for the billing gap to appear in Stripe
Billing integration	Usage events reported to Stripe	Background reporter stalled — usage events accumulate in `usage_events` table but are never reported; tenant gets the service without being billed	Billing health in `/health`: track time since last successful Stripe report; alert when unreported events older than 10 minutes

All three layers can fail while the MCP protocol layer stays green. initialize succeeds. tools/list returns the correct manifest. Every tools/call responds with valid JSON. None of that tells you whether tenant A can read tenant B's data, whether a tenant on the free plan is calling tools with no limit, or whether last month's usage was reported to Stripe. The three monitoring checks above run in your /health endpoint — and AliveMCP polls that endpoint every 60 seconds so you know within one check cycle when any layer breaks.

Layer 1: Database isolation — choosing and implementing the right pattern

The first decision in any multi-tenant MCP server is where to draw the tenant isolation boundary at the database layer. Three patterns dominate, and the choice locks in migration strategy, connection pool architecture, and operational overhead for the life of the product:

Pattern	Tenant limit	Migration strategy	Pool architecture	Best for
Shared tables + RLS	10,000+	One migration applies to all tenants	Single pool, `app_user` role, session variable per query	Free/starter tiers; >500 tenants
Schema-per-tenant	~500	Per-tenant migration, independent scheduling	LRU pool cache, `search_path` set on connect	Pro tier; independent migration schedules
Database-per-tenant	~100	Per-tenant migration, full independence	Per-tenant connection string from Secrets Manager	Enterprise tier; GDPR/HIPAA/SOC 2 data residency

A practical hybrid routes tenants by plan tier: free and starter use shared tables with RLS (thousands of tenants, minimal infrastructure cost), pro uses schema-per-tenant (independent migrations, stronger isolation, ~36 simultaneous active pools with max: 200 LRU cache and 30-minute TTL), and enterprise uses database-per-tenant (full process isolation, per-tenant backup granularity, compliance residency requirements).
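
The tier routing described above can be sketched as a small lookup. This is an illustrative sketch rather than a prescribed implementation: the `IsolationStrategy` type and the `isolationFor` function are hypothetical names, and only the plan names come from the tiers listed in this guide.

```typescript
// Hypothetical sketch: map a plan tier to the isolation pattern it uses.
type Plan = "free" | "starter" | "pro" | "enterprise";
type IsolationStrategy = "shared_rls" | "schema_per_tenant" | "database_per_tenant";

const ISOLATION_BY_PLAN: Record<Plan, IsolationStrategy> = {
  free: "shared_rls",
  starter: "shared_rls",
  pro: "schema_per_tenant",
  enterprise: "database_per_tenant",
};

function isolationFor(plan: string): IsolationStrategy {
  // Unknown plans are rejected rather than defaulted, so a typo cannot
  // silently place a tenant in the wrong isolation tier.
  if (!Object.prototype.hasOwnProperty.call(ISOLATION_BY_PLAN, plan)) {
    throw new Error(`Unknown plan: ${plan}`);
  }
  return ISOLATION_BY_PLAN[plan as Plan];
}
```

Rejecting unknown plans instead of falling back to the shared tier is a deliberate choice in this sketch: a fallback would hide a misconfigured plan name until the tenant noticed weaker isolation than they paid for.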

Implementing PostgreSQL RLS for the shared-table tier

Row-level security enforces isolation at the database engine — not in application code — which means a missed WHERE tenant_id = ? in any query doesn't create a cross-tenant leak. The setup has three parts: table design, policy creation, and session variable injection.

First, enable RLS on every table that holds tenant data and create the application role:

-- Enable RLS and create non-superuser application role
ALTER TABLE tool_calls ENABLE ROW LEVEL SECURITY;
ALTER TABLE tool_events ENABLE ROW LEVEL SECURITY;
ALTER TABLE tenant_config ENABLE ROW LEVEL SECURITY;

-- Index on tenant_id — critical for performance; RLS filter runs on every query
CREATE INDEX CONCURRENTLY idx_tool_calls_tenant ON tool_calls (tenant_id);
CREATE INDEX CONCURRENTLY idx_tool_events_tenant ON tool_events (tenant_id);

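-- Non-superuser role for application queries. Superusers bypass RLS entirely, and so do
-- table owners unless FORCE ROW LEVEL SECURITY is set, so app_user must not own these tables.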
CREATE ROLE app_user NOINHERIT;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_user;
GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA public TO app_user;

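Next, create policies for all four DML operations. The example covers tool_calls; tool_events and tenant_config need the same four policies, because a table with RLS enabled and no policies rejects every row for non-owner roles: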

-- RLS policies — all four DML operations
CREATE POLICY tenant_isolation_select ON tool_calls
  FOR SELECT USING (
    tenant_id = current_setting('app.current_tenant_id', true)::UUID
  );

CREATE POLICY tenant_isolation_insert ON tool_calls
  FOR INSERT WITH CHECK (
    tenant_id = current_setting('app.current_tenant_id', true)::UUID
  );

CREATE POLICY tenant_isolation_update ON tool_calls
  FOR UPDATE USING (
    tenant_id = current_setting('app.current_tenant_id', true)::UUID
  ) WITH CHECK (
    tenant_id = current_setting('app.current_tenant_id', true)::UUID
  );

CREATE POLICY tenant_isolation_delete ON tool_calls
  FOR DELETE USING (
    tenant_id = current_setting('app.current_tenant_id', true)::UUID
  );

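The true second argument to current_setting is critical: it returns NULL instead of throwing an error when the variable is unset. Comparing tenant_id to NULL yields NULL, not TRUE, and RLS admits a row only when the policy expression is TRUE, so an unset session variable is fail-closed: zero rows returned, no cross-tenant data exposed. One caveat: on a pooled connection where the variable was set in an earlier transaction, PostgreSQL may report it as an empty string rather than NULL, and the ::UUID cast then raises an error. That is still fail-closed, but it surfaces as a query error rather than an empty result.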

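Finally, inject the session variable before every query. The tenant ID comes from the request's tenant context (carried by AsyncLocalStorage) and is passed to a helper that wraps each unit of work in a transaction: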

// Every query runs inside a transaction that sets the tenant context
async function queryWithTenantContext<T>(
  pool: pg.Pool,
  tenantId: string,
  fn: (client: pg.PoolClient) => Promise<T>
): Promise<T> {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    // transaction-local set_config: cleared automatically when transaction ends
    await client.query(
      "SELECT set_config('app.current_tenant_id', $1, true)",
      [tenantId]
    );
    const result = await fn(client);
    await client.query('COMMIT');
    return result;
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}

The true flag in set_config makes the variable transaction-local — it clears automatically when the transaction commits or rolls back. This is the correct pattern with PgBouncer in transaction-pooling mode: session-local variables would leak to the next transaction on the same physical connection.
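
The helper above takes tenantId as an argument; carrying that value through a request with AsyncLocalStorage might look like the sketch below. `getTenantFromSession` is the name used by the metering wrapper later in this guide, while `TenantContext`, `tenantStorage`, and `runWithTenant` are hypothetical names introduced for illustration.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical shape of the per-request tenant context.
interface TenantContext {
  id: string;
  plan: string;
}

const tenantStorage = new AsyncLocalStorage<TenantContext>();

// Run `fn` with the tenant bound for every async operation it starts.
function runWithTenant<T>(tenant: TenantContext, fn: () => T): T {
  return tenantStorage.run(tenant, fn);
}

// Returns undefined outside of runWithTenant, so callers can fail closed.
function getTenantFromSession(): TenantContext | undefined {
  return tenantStorage.getStore();
}
```

In an HTTP server, the authentication middleware would typically call runWithTenant around the rest of the request, and data-access code would pass getTenantFromSession()?.id to queryWithTenantContext, refusing to query when it is undefined.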

Test the policies before deploying to production:

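-- Run inside a transaction: SET LOCAL has no effect outside one.
-- Replace both placeholder UUIDs with real tenant IDs from your database.
-- This query should return zero rows; if it returns any rows, the policy is broken.
BEGIN;
SET LOCAL ROLE app_user;
SET LOCAL "app.current_tenant_id" = '00000000-0000-0000-0000-00000000000a';  -- the "other" tenant
SELECT * FROM tool_calls WHERE tenant_id = '00000000-0000-0000-0000-00000000000b';  -- the target tenant
ROLLBACK;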

Layer 2: Usage metering — quota enforcement in the hot path

Usage metering must happen in the hot path — every tool call — without adding a synchronous database round-trip. The implementation has two parts: a Redis sliding-window counter for real-time quota enforcement, and an asynchronous billing event queue for the Stripe reporting pipeline.

Intercepting at tool registration, not inside handlers

The metering layer wraps tool handlers at registration time, not inside each handler. This keeps the metering code in one place and prevents a missed import from silently bypassing quota checks for specific tools:

// withMetering — wraps any tool handler at registration time
export function withMetering(
  toolName: string,
  handler: ToolHandler,
): ToolHandler {
  return async (args) => {
    const tenant = getTenantFromSession();  // reads tenantId from AsyncLocalStorage
    if (!tenant) {
      return { isError: true, content: [{ type: 'text', text: 'Unauthenticated' }] };
    }

    const metering = getMeteringClient();
    const allowed = await metering.checkAndIncrement(tenant.id, toolName);

    if (!allowed) {
      return {
        isError: true,
        content: [{ type: 'text', text: `quota_exceeded: ${toolName} limit reached for plan ${tenant.plan}` }],
      };
    }

    // Billing event: enqueue async — never await in the hot path
    metering.enqueueUsageEvent({ tenantId: tenant.id, tool: toolName, timestamp: Date.now() });

    return handler(args);
  };
}

// Registration: wrap at the point of registration
server.tool('search_products', searchProductsSchema, withMetering('search_products', searchProductsHandler));
server.tool('generate_report', generateReportSchema, withMetering('generate_report', generateReportHandler));

Redis sliding-window counter

The quota check uses a Redis Lua script for atomic check-and-increment. It uses a sliding window rather than a fixed-window counter because a fixed window resets all at once: an aggressive agent can spend its full quota in the last seconds of one window and again in the first seconds of the next, getting up to twice the limit in a short burst:

-- Redis Lua script: atomic sliding-window check-and-increment
local key = KEYS[1]        -- e.g. "quota:tenant_abc:search_products:1h"
local now = tonumber(ARGV[1])
local window = tonumber(ARGV[2])  -- 3600000ms = 1 hour
local limit = tonumber(ARGV[3])
local event_id = ARGV[4]

-- Remove events outside the sliding window
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)

-- Count remaining events in window
local count = redis.call('ZCARD', key)

if count >= limit then
  return 0  -- quota exceeded
end

-- Add this event and set TTL
redis.call('ZADD', key, now, event_id)
redis.call('PEXPIRE', key, window)
return 1  -- allowed

Per-tool quota weights handle expensive vs cheap tools: generate_report costs 20 units while search_products costs 1 unit. The script as written counts one unit per call, because ZCARD counts members and ZADD adds a single member. To apply a weight, pass the tool's cost as an extra argument, reject when count + cost exceeds the limit, and add cost members (for example event_id:1 through event_id:cost) instead of one. Limits tier by plan: free=100/hr, starter=1000/hr, pro=10000/hr, enterprise=no limit (skip Redis entirely for enterprise tenants to avoid the round-trip).
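
To make the weighting concrete, the sketch below models a cost-aware sliding window in memory so the logic can be read and tested without Redis. It is not the Lua script above and is not atomic across processes; the class, method, and constant names are hypothetical, while the costs and limits are the ones quoted in this guide.

```typescript
// Costs and hourly limits quoted in this guide; null means "no limit".
const TOOL_COSTS: Record<string, number> = { search_products: 1, generate_report: 20 };
const PLAN_LIMITS_PER_HOUR: Record<string, number | null> = {
  free: 100,
  starter: 1000,
  pro: 10000,
  enterprise: null,
};

interface WindowEvent {
  at: number;   // event time in ms
  cost: number; // quota units consumed
}

// In-memory model of a weighted sliding-window quota (illustrative only).
class WeightedSlidingWindow {
  private events: WindowEvent[] = [];
  private readonly windowMs: number;
  private readonly limit: number;

  constructor(windowMs: number, limit: number) {
    this.windowMs = windowMs;
    this.limit = limit;
  }

  // Returns true and records the event if `cost` more units fit in the window.
  checkAndIncrement(now: number, cost: number): boolean {
    // Drop events at or before the window boundary, mirroring
    // ZREMRANGEBYSCORE key 0 (now - window).
    this.events = this.events.filter((e) => e.at > now - this.windowMs);
    const used = this.events.reduce((sum, e) => sum + e.cost, 0);
    if (used + cost > this.limit) return false;
    this.events.push({ at: now, cost });
    return true;
  }
}
```

With the free-tier limit of 100, five generate_report calls at 20 units each use the whole hour's quota, and a further call of any cost is rejected until the earliest events age out of the window.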

Fail-open vs fail-closed on Redis outage

When Redis is unavailable, the metering client must choose a policy. The correct choice depends on the tenant's plan:

async checkAndIncrement(tenantId: string, tool: string): Promise<boolean> {
  try {
    return await this.runLuaScript(tenantId, tool);
  } catch (err) {
    // Redis is down: log and apply policy by plan.
    // getTenantPlan must not depend on Redis here (read from an in-process cache or the database).
    const tenant = await this.getTenantPlan(tenantId);

    if (tenant.plan === 'free') {
      // Fail-closed: free tenants don't pay, so block them when metering is broken
      return false;
    }
    // Fail-open for paid plans: paid tenants have skin in the game
    // Log the event so it can be reconciled when Redis comes back
    this.logMeteringFailure({ tenantId, tool, reason: 'redis_unavailable' });
    return true;
  }
}

The /health endpoint must check Redis connectivity explicitly. A Redis outage blocks every free-tier tenant (fail-closed) and leaves paid tenants unmetered (fail-open), so it is both an availability failure and a billing correctness failure. AliveMCP polling your /health endpoint catches the Redis outage within 60 seconds, before the billing gap accumulates.

Layer 3: Billing integration — Stripe metered usage

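Stripe Billing models per-tool-call revenue as a subscription to a metered price. The price defines the unit cost; your background reporter calls createUsageRecord with the aggregated count; Stripe invoices the customer at period end. (createUsageRecord belongs to Stripe's usage-records API, which newer Stripe API versions replace with billing meters and meter events; confirm which one your pinned API version supports.) The key constraint: Stripe's API is too slow for the tool-call hot path. The architecture separates the billing pipeline into three stages.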

Stage 1: Stripe product and price setup

// run once to create your Stripe pricing structure
const product = await stripe.products.create({
  name: 'MCP Pro',
  description: 'Managed MCP server with per-tool metering',
});

// Flat monthly base price + metered overage
const basePrice = await stripe.prices.create({
  product: product.id,
  currency: 'usd',
  unit_amount: 4900,  // $49/month base
  recurring: { interval: 'month' },
});

const meteredPrice = await stripe.prices.create({
  product: product.id,
  currency: 'usd',
  billing_scheme: 'per_unit',
  unit_amount_decimal: '0.1',  // $0.001 per tool call (1/10 cent)
  recurring: {
    interval: 'month',
    usage_type: 'metered',
    aggregate_usage: 'sum',
  },
});

Stage 2: Background usage reporter

The background reporter runs every 5 minutes. It reads unprocessed events from the usage_events table, aggregates by tenant and subscription item, calls Stripe's createUsageRecord, and marks the events as reported:

// Background reporter: runs every 5 minutes
async function reportUsageToStripe(): Promise<void> {
  // Fix the cutoff once so the SELECT and the UPDATE cover exactly the same events.
  // Without it, events inserted between the two statements would be marked
  // reported without ever being counted.
  const cutoff = new Date(Date.now() - 60_000);

  const unreported = await db.query(`
    SELECT tenant_id, subscription_item_id, COUNT(*) as tool_calls
    FROM usage_events
    WHERE reported_at IS NULL
      AND created_at < $1
    GROUP BY tenant_id, subscription_item_id
  `, [cutoff]);

  for (const row of unreported.rows) {
    await stripe.subscriptionItems.createUsageRecord(
      row.subscription_item_id,
      {
        quantity: Number(row.tool_calls),
        timestamp: Math.floor(Date.now() / 1000),
        action: 'increment',
      }
    );

    // Mark only the events that were counted above
    await db.query(`
      UPDATE usage_events
      SET reported_at = NOW()
      WHERE tenant_id = $1
        AND subscription_item_id = $2
        AND reported_at IS NULL
        AND created_at < $3
    `, [row.tenant_id, row.subscription_item_id, cutoff]);
  }
}

Stage 3: Webhook handlers for subscription lifecycle

Subscription changes from Stripe must sync to your database immediately — otherwise, a customer who upgrades their plan still gets old quotas enforced until the next polling cycle:

// Stripe webhook handler
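app.post('/webhooks/stripe', express.raw({ type: 'application/json' }), async (req, res) => {
  const sig = req.headers['stripe-signature'];
  let event: Stripe.Event;
  try {
    // constructEvent throws when the signature is missing or does not match the raw body
    event = stripe.webhooks.constructEvent(req.body, sig as string, process.env.STRIPE_WEBHOOK_SECRET!);
  } catch {
    res.status(400).send('Invalid Stripe signature');
    return;
  }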

  switch (event.type) {
    case 'customer.subscription.updated': {
      const sub = event.data.object as Stripe.Subscription;
      const tenantId = sub.metadata.tenant_id;
      const newPlan = sub.metadata.plan;

      await db.query(
        'UPDATE tenants SET plan = $1, subscription_status = $2 WHERE id = $3',
        [newPlan, sub.status, tenantId]
      );

      // Clear Redis plan cache so new quotas take effect immediately
      await redis.del(`plan:${tenantId}`);
      break;
    }

    case 'invoice.payment_failed': {
      const invoice = event.data.object as Stripe.Invoice;
      const tenantId = (invoice.subscription_details?.metadata as any)?.tenant_id;
      await db.query(
        "UPDATE tenants SET subscription_status = 'past_due' WHERE id = $1",
        [tenantId]
      );
      break;
    }
  }

  res.json({ received: true });
});

The billing /health check should expose three signals: time since the last webhook was received (a long gap may mean Stripe can't reach your endpoint), count of unreported events older than 10 minutes (reporter stalled), and time since the last successful Stripe API call (API credential rotated or expired). The /health endpoint shown later in this guide implements the second signal. AliveMCP catches a broken billing pipeline the same way it catches a broken metering layer: within 60 seconds of /health returning degraded.
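
A minimal sketch of combining those three signals into one billing status follows. The function, interface, and both gap thresholds are assumptions made for illustration; only the 10-minute staleness rule comes from this guide, and it is assumed to be applied already when counting stale events.

```typescript
interface BillingSignals {
  lastWebhookAt: number | null;       // ms epoch of the last webhook received
  staleUnreportedEvents: number;      // unreported events older than 10 minutes
  lastStripeSuccessAt: number | null; // ms epoch of the last successful Stripe API call
}

interface BillingHealth {
  status: "ok" | "degraded";
  reasons: string[];
}

// Assumed thresholds; tune them to your own webhook and reporter cadence.
const MAX_WEBHOOK_GAP_MS = 24 * 60 * 60 * 1000;
const MAX_STRIPE_GAP_MS = 15 * 60 * 1000;

function evaluateBillingHealth(s: BillingSignals, now: number): BillingHealth {
  const reasons: string[] = [];
  if (s.staleUnreportedEvents > 0) {
    reasons.push(`reporter_stalled:${s.staleUnreportedEvents}`);
  }
  if (s.lastWebhookAt === null || now - s.lastWebhookAt > MAX_WEBHOOK_GAP_MS) {
    reasons.push("webhook_gap");
  }
  if (s.lastStripeSuccessAt === null || now - s.lastStripeSuccessAt > MAX_STRIPE_GAP_MS) {
    reasons.push("stripe_api_gap");
  }
  return { status: reasons.length === 0 ? "ok" : "degraded", reasons };
}
```

Returning the list of reasons, not just a boolean, keeps the cause visible in the /health response body so the alert can say which signal tripped.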

The glue: automated tenant onboarding

The three layers are only useful when each new tenant gets them correctly provisioned. Tenant onboarding automation runs a five-step idempotent pipeline triggered by a Stripe checkout.session.completed webhook:

Step	What happens	Idempotency mechanism
1. Tenant row	`INSERT INTO tenants ... ON CONFLICT DO UPDATE`	Upsert on tenant UUID
2. Schema/table setup	`CREATE SCHEMA IF NOT EXISTS` + run migrations	`IF NOT EXISTS` on all DDL
3. Connection pool	Initialize LRU-cached pool for this tenant	Cache check before pool creation
4. Default config	`INSERT ... ON CONFLICT DO NOTHING` for defaults	Upsert semantics
5. Canary tool call	Full MCP SDK client call to verify data path	Status only set to `active` on canary success

Full idempotency is required because Stripe retries webhooks on delivery failures. A checkout.session.completed event can arrive twice — the second delivery must produce the same state as the first without creating duplicate schemas, duplicate pools, or duplicate default configuration rows.
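
A common complement to idempotent steps is to remember which webhook event IDs have already been handled and skip redeliveries early. The sketch below keeps that record in memory purely for illustration; `WebhookDeduper`, `claim`, and `release` are hypothetical names, and a real deployment would likely persist event IDs (for example in a table with a unique constraint) so the history survives restarts and is shared across instances.

```typescript
// Illustrative in-memory dedupe of webhook deliveries by event ID.
class WebhookDeduper {
  private seen = new Set<string>();

  // Claims the event ID. Returns true the first time, false for a redelivery.
  claim(eventId: string): boolean {
    if (this.seen.has(eventId)) return false;
    this.seen.add(eventId);
    return true;
  }

  // Releases a claim after a failed handler so that a later retry is processed.
  release(eventId: string): void {
    this.seen.delete(eventId);
  }
}
```

In a webhook handler, the pattern would be to call claim(event.id) before provisioning, return early when it yields false, and call release(event.id) if provisioning throws. Because this does not replace idempotent steps (two instances can still race), it is an optimization on top of them rather than a substitute.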

// provisionTenant — fully idempotent, safe to call multiple times
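export async function provisionTenant(tenantId: string, plan: string): Promise<void> {
  // tenantId is interpolated into a schema name below, so reject anything that is not UUID-shaped
  if (!/^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$/i.test(tenantId)) {
    throw new Error('provisionTenant: tenantId must be a UUID');
  }

  // Step 1: Create or update tenant record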
  await db.query(`
    INSERT INTO tenants (id, plan, status, created_at)
    VALUES ($1, $2, 'provisioning', NOW())
    ON CONFLICT (id) DO UPDATE SET plan = EXCLUDED.plan
  `, [tenantId, plan]);

  // Step 2: Schema provisioning (schema-per-tenant tier)
  if (plan === 'pro' || plan === 'enterprise') {
    await db.query(`CREATE SCHEMA IF NOT EXISTS "tenant_${tenantId}"`);
    await runMigrations(tenantId);
  }

  // Step 3: Initialize connection pool (cached — safe to call multiple times)
  await getOrCreateTenantPool(tenantId);

  // Step 4: Seed default configuration
  await db.query(`
    INSERT INTO tenant_config (tenant_id, key, value)
    VALUES ($1, 'alert_threshold', '95')
    ON CONFLICT (tenant_id, key) DO NOTHING
  `, [tenantId]);

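  // Step 5: Canary tool call to verify the data path end-to-end.
  // The MCP SDK Client takes client info in its constructor and the transport in connect();
  // the name and version below are placeholders.
  const client = new Client({ name: 'provisioning-canary', version: '1.0.0' });
  const transport = new StreamableHTTPClientTransport(new URL(tenantHealthUrl(tenantId)));
  await client.connect(transport);
  let result;
  try {
    result = await client.callTool({ name: 'health_check', arguments: {} });
  } finally {
    await client.close();  // close the session even when the canary call throws
  }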

  if (result.isError) {
    throw new Error(`Canary failed for tenant ${tenantId}: ${JSON.stringify(result.content)}`);
  }

  // Only mark active after the canary succeeds
  await db.query(
    "UPDATE tenants SET status = 'active' WHERE id = $1",
    [tenantId]
  );

  // Register AliveMCP monitor for this tenant
  await registerAliveMCPMonitor(tenantId);
}

The last step — registering an AliveMCP monitor per tenant — is what gives operators per-tenant visibility. AliveMCP's monitoring is granular enough to probe /health?tenant=tenant_abc independently of every other tenant. When tenant ABC's RLS context injection breaks, or their connection pool exhausts, or their billing reporter stalls, you see it within 60 seconds — before any of ABC's tool calls silently return wrong data or before an agent gets free quota on a broken metering layer.

The complete /health endpoint

All three layers converge in a single /health endpoint that AliveMCP polls every 60 seconds:

app.get('/health', async (req, res) => {
  const tenantId = req.query.tenant as string | undefined;
  const checks: Record<string, string> = {};
  let degraded = false;

  // 1. RLS canary — verify tenant context injection is working
  if (tenantId) {
    try {
      const rows = await queryWithTenantContext(pool, tenantId, async (client) => {
        const r = await client.query(
          "SELECT COUNT(*) FROM tenant_config WHERE tenant_id = $1",
          [tenantId]
        );
        return Number(r.rows[0].count);
      });
      checks.rls_canary = rows > 0 ? 'ok' : 'broken_context';
      if (rows === 0) degraded = true;
    } catch {
      checks.rls_canary = 'error';
      degraded = true;
    }
  }

  // 2. Redis connectivity — metering layer
  try {
    await redis.ping();
    checks.redis = 'ok';
  } catch {
    checks.redis = 'down';
    degraded = true;
  }

  // 3. Billing pipeline: unreported events
  try {
    const stale = await db.query(`
      SELECT COUNT(*) FROM usage_events
      WHERE reported_at IS NULL AND created_at < NOW() - INTERVAL '10 minutes'
    `);
    const unreportedCount = Number(stale.rows[0].count);
    checks.billing_reporter = unreportedCount === 0 ? 'ok' : `stalled:${unreportedCount}`;
    if (unreportedCount > 0) degraded = true;
  } catch {
    // A database failure must produce a 503, not an unhandled rejection that hangs the probe
    checks.billing_reporter = 'error';
    degraded = true;
  }

  // 4. Database pool
  checks.pool = pool.waitingCount > 0
    ? `saturated:${pool.waitingCount}`
    : `ok:${pool.idleCount}/${pool.totalCount}`;
  if (pool.waitingCount > 0) degraded = true;

  const status = degraded ? 503 : 200;
  res.status(status).json({
    status: degraded ? 'degraded' : 'ok',
    checks,
    ts: new Date().toISOString(),
  });
});

Each check targets a distinct failure class. When AliveMCP receives a 503 with checks.redis: "down", the alert routes to the oncall channel with enough information to skip the triage step — the failure reason is in the webhook payload, not buried in a log aggregator.
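
On the receiving side, an alert consumer can pull the failing checks straight out of the response body. The helper below is a hypothetical sketch that assumes the `checks` format produced by the endpoint above, where a healthy value is either `ok` or starts with `ok:`.

```typescript
// Hypothetical helper: list the checks that are not healthy, as "name=value" strings.
function failingChecks(checks: Record<string, string>): string[] {
  return Object.entries(checks)
    .filter(([, value]) => value !== "ok" && !value.startsWith("ok:"))
    .map(([name, value]) => `${name}=${value}`);
}
```

Given a payload whose checks include a Redis outage and a stalled reporter, the result names both failures, which is enough to route the alert without opening the logs.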

Deprovisioning and tenant lifecycle

Multi-tenant systems accumulate dead tenants faster than they're cleaned up. A complete lifecycle must include deprovisioning:

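export async function deprovisionTenant(tenantId: string): Promise<void> {
  // tenantId is interpolated into DROP SCHEMA below, so reject anything that is not UUID-shaped
  if (!/^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$/i.test(tenantId)) {
    throw new Error('deprovisionTenant: tenantId must be a UUID');
  }

  // 1. Drain active connection pool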
  const pool = poolCache.get(tenantId);
  if (pool) {
    await pool.end();
    poolCache.delete(tenantId);
  }

  // 2. Archive usage data before schema drop (compliance requirement)
  await archiveUsageData(tenantId);

  // 3. Drop schema (schema-per-tenant) — CASCADE removes all tables
  await db.query(`DROP SCHEMA IF EXISTS "tenant_${tenantId}" CASCADE`);

  // 4. Remove tenant record (or soft-delete with deleted_at timestamp)
  await db.query(
    "UPDATE tenants SET status = 'deprovisioned', deleted_at = NOW() WHERE id = $1",
    [tenantId]
  );

  // 5. Cancel AliveMCP monitor for this tenant
  await cancelAliveMCPMonitor(tenantId);
}

The tenant UUID must be retained even after deprovisioning: it appears in historical billing records, audit logs, and Stripe's customer metadata. A soft-delete with a deleted_at timestamp is safer than a hard DELETE because it preserves the foreign key chain without exposing the tenant's operational data.

What this architecture covers — and what it doesn't

The three-layer architecture above handles the commercial operation of MCP-as-a-service:

Tenant data stays isolated even when application code forgets a WHERE tenant_id = ? clause.
Quota enforcement runs in under 5ms per tool call with no database round-trip in the hot path.
Usage is recorded before the billing period closes, with a health check that alerts when the reporter stalls.
Each new tenant gets all three layers wired automatically when their Stripe checkout completes.

What the architecture doesn't cover: protocol-level availability. A server with perfect RLS, real-time metering, and complete billing integration can still silently disappear from the network because of a TLS certificate expiry, a crashed Node process, or an OOM-killed container. None of the three /health checks detect that scenario — they run inside the process. External monitoring from AliveMCP runs outside: it sends the full initialize handshake plus a tools/list verification to the deployed endpoint, from the same network path your tenants' agents use. When the server goes dark, AliveMCP fires the alert within 60 seconds — before any tenant's agent session fails mid-run.

The monitoring stack for a multi-tenant MCP server has four layers: AliveMCP protocol probe (is the server reachable and speaking MCP?) + /health with RLS canary (is the data isolation layer working?) + /health with Redis check (is quota enforcement working?) + /health with billing reporter check (is revenue being captured?). Each layer catches a distinct failure class that the others miss. Operating without any one of them means a class of failure goes undetected until a tenant complains, an invoice is missed, or a data breach surfaces.

Monitor every tenant's MCP endpoint

AliveMCP probes your /health?tenant=:id endpoint every 60 seconds for every active tenant. When RLS context injection breaks for one tenant, or a billing reporter stalls, or Redis goes down — you know within one probe cycle, not when the first customer files a support ticket.
