MCP server quota management
Rate limits and quotas are complementary controls that operate at different time scales. A rate limit says "no more than 10 calls per second." A quota says "no more than 1,000 calls per day." Both are necessary: the rate limit prevents burst abuse, and the quota prevents sustained overuse across a billing period. For MCP servers that offer tiered plans or host multiple tenants, quotas are the mechanism that ties usage to pricing.
TL;DR
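Store a quota_usage table in SQLite (or your existing database) with columns (user_id, tool_name, period_start, call_count, cost_units). Before each tool call, check the counter against the user's plan limit; after a successful call, increment it with an upsert (INSERT ... ON CONFLICT DO UPDATE). A plain INSERT OR REPLACE would overwrite the row instead of adding to it. Return isError: true with quota_exhausted when the limit is hit. Because period_start is part of the primary key, each new period starts from a fresh row, so counters roll over without a reset job; a scheduled job is only needed if you want to prune old rows.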
Rate limit vs quota: what's the difference?
| Property | Rate limit | Quota |
|---|---|---|
| Time window | Short: seconds or minutes | Long: hours, days, months |
| Purpose | Burst control, server protection | Cost control, plan enforcement, fairness |
| Reset behavior | Continuous (sliding window or token refill) | Hard reset at period boundary (midnight, billing date) |
| State storage | In-memory (fast, ephemeral) | Persistent database (survives restarts) |
| Granularity | Per session, per tool | Per user, per plan tier, per tool category |
| Typical limits | 10/sec, 60/min | 1,000/day, 10,000/month |
| Error to return | rate_limited with retry hint | quota_exhausted with reset timestamp |
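The two error shapes in the last table row can be kept distinct with a pair of small builders. This is a minimal sketch: the field names `retry_after_ms` and `resets_at` follow the handler example later in this guide, while the `ToolErrorResult` type and the builder function names are hypothetical.

```typescript
// Hypothetical helper types; only the payload field names come from this guide.
interface ToolErrorResult {
  content: Array<{ type: 'text'; text: string }>;
  isError: true;
}

function toErrorResult(payload: Record<string, unknown>): ToolErrorResult {
  return { content: [{ type: 'text', text: JSON.stringify(payload) }], isError: true };
}

// Short window: the caller can retry after a brief pause.
function rateLimitedError(retryAfterMs: number): ToolErrorResult {
  return toErrorResult({
    error: 'rate_limited',
    retryable: true,
    retry_after_ms: retryAfterMs,
  });
}

// Long window: retrying is pointless until the period resets.
function quotaExhaustedError(resetsAt: Date): ToolErrorResult {
  return toErrorResult({
    error: 'quota_exhausted',
    retryable: false,
    resets_at: resetsAt.toISOString(),
  });
}
```

Keeping `retryable` explicit in both payloads lets an agent decide between backing off briefly and giving up until the reset without parsing the message text.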
SQLite-backed quota tracking
For MCP servers using the factory's SQLite setup, quota tracking adds one table. The counter persists across server restarts and survives connection drops — unlike an in-memory rate limit counter.
-- migrations/004_quota_tracking.sql
CREATE TABLE IF NOT EXISTS quota_usage (
  user_id      TEXT    NOT NULL,
  tool_name    TEXT    NOT NULL DEFAULT '*',  -- '*' = aggregate across all tools
  period_start TEXT    NOT NULL,              -- ISO date: '2026-06-27'
  call_count   INTEGER NOT NULL DEFAULT 0,
  cost_units   REAL    NOT NULL DEFAULT 0,    -- for cost-weighted quotas
  updated_at   TEXT    NOT NULL DEFAULT (datetime('now')),
  PRIMARY KEY (user_id, tool_name, period_start)
);

CREATE INDEX IF NOT EXISTS quota_usage_lookup
  ON quota_usage(user_id, period_start);
// src/quota/quota-manager.ts
import Database from 'better-sqlite3';

interface QuotaConfig {
  dailyCallLimit: number;
  monthlyCostLimit?: number;
  costPerTool?: Record<string, number>; // tool cost in cost_units
}

const PLAN_QUOTAS: Record<string, QuotaConfig> = {
  free: { dailyCallLimit: 100 },
  author: { dailyCallLimit: 2_000 },
  team: { dailyCallLimit: 10_000 },
  enterprise: { dailyCallLimit: Infinity },
};

export class QuotaManager {
  private readonly db: Database.Database;

  constructor(db: Database.Database) {
    this.db = db;
  }

  // Reports whether the call is within quota, how many calls remain, and when
  // the counter resets. toolName is accepted for future per-tool limits; this
  // version checks only the aggregate '*' row.
  check(userId: string, toolName: string, userPlan: string): {
    allowed: boolean;
    remaining: number;
    resetsAt: string;
  } {
    const now = new Date();
    const today = now.toISOString().slice(0, 10); // 'YYYY-MM-DD' (UTC)
    // Daily quotas reset at the next UTC midnight
    const resetsAt = new Date(
      Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() + 1)
    ).toISOString();
    // Unknown plans fall back to the free-tier limit
    const limit = PLAN_QUOTAS[userPlan]?.dailyCallLimit ?? 100;

    // Get current usage (0 if no row yet)
    const row = this.db.prepare(
      `SELECT call_count FROM quota_usage
       WHERE user_id = ? AND tool_name = '*' AND period_start = ?`
    ).get(userId, today) as { call_count: number } | undefined;
    const current = row?.call_count ?? 0;

    if (current >= limit) {
      return { allowed: false, remaining: 0, resetsAt };
    }
    return { allowed: true, remaining: limit - current, resetsAt };
  }

  // Atomically increment usage. Call AFTER the tool completes successfully.
  // Usage is recorded on the aggregate '*' row; toolName is not stored yet.
  increment(userId: string, toolName: string, costUnits = 1.0): void {
    const today = new Date().toISOString().slice(0, 10);
    this.db.prepare(`
      INSERT INTO quota_usage (user_id, tool_name, period_start, call_count, cost_units, updated_at)
      VALUES (?, ?, ?, 1, ?, datetime('now'))
      ON CONFLICT(user_id, tool_name, period_start) DO UPDATE SET
        call_count = call_count + 1,
        cost_units = cost_units + excluded.cost_units,
        updated_at = datetime('now')
    `).run(userId, '*', today, costUnits);
  }

  // Return per-day usage stats for a user over the last `days` days
  getStats(userId: string, days = 30): Array<{ date: string; calls: number; cost: number }> {
    const since = new Date();
    since.setUTCDate(since.getUTCDate() - days);
    const sinceStr = since.toISOString().slice(0, 10);
    return this.db.prepare(`
      SELECT period_start as date, call_count as calls, cost_units as cost
      FROM quota_usage
      WHERE user_id = ? AND tool_name = '*' AND period_start >= ?
      ORDER BY period_start ASC
    `).all(userId, sinceStr) as Array<{ date: string; calls: number; cost: number }>;
  }
}
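Because period_start is part of the primary key, the first call of a new UTC day creates a new row, which is what makes the daily counter start again from zero. The sketch below isolates that date logic so it can be checked with fixed timestamps. The helper names `dayKey` and `nextUtcMidnight` are hypothetical and are not part of the QuotaManager class above.

```typescript
// Hypothetical helpers; the 'YYYY-MM-DD' key format matches period_start above.

// UTC calendar day used as the daily period key.
function dayKey(now: Date): string {
  return now.toISOString().slice(0, 10);
}

// First instant of the next UTC day, i.e. when the daily quota rolls over.
// Date.UTC normalizes day overflow, so month and year ends are handled.
function nextUtcMidnight(now: Date): Date {
  return new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() + 1));
}

const lateEvening = new Date('2026-06-27T23:59:59.000Z');
const justAfter = new Date('2026-06-28T00:00:01.000Z');

console.log(dayKey(lateEvening), dayKey(justAfter)); // 2026-06-27 2026-06-28
console.log(nextUtcMidnight(lateEvening).toISOString()); // 2026-06-28T00:00:00.000Z
```

Two calls made two seconds apart across midnight land on different keys, so the second one is counted against a fresh row. Note that everything here is in UTC; users in other time zones will see the reset at a local time other than midnight.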
Wiring quota into the MCP handler
Check the quota before executing the tool and increment after a successful call. Only charge quota for calls that actually ran — failed validation or rate-limited calls don't count against the quota.
// src/server.ts (quota middleware)
import { QuotaManager } from './quota/quota-manager.js';

const quotaManager = new QuotaManager(db);

server.setRequestHandler(CallToolRequestSchema, async (request, extra) => {
  const userId = (extra as any)?._meta?.userId ?? 'anonymous';
  const userPlan = getUserPlan(userId); // look up from your users table
  const toolName = request.params.name;

  // 1. Rate limit check (fast, in-memory)
  if (!rateLimiter.allow(toolName)) {
    return {
      content: [{
        type: 'text',
        text: JSON.stringify({ error: 'rate_limited', retryable: true, retry_after_ms: 1000 }),
      }],
      isError: true,
    };
  }

  // 2. Quota check (slightly slower, hits SQLite)
  const quota = quotaManager.check(userId, toolName, userPlan);
  if (!quota.allowed) {
    return {
      content: [{
        type: 'text',
        text: JSON.stringify({
          error: 'quota_exhausted',
          message: `Daily call quota exhausted. Resets at ${quota.resetsAt}.`,
          resets_at: quota.resetsAt,
          retryable: false, // can't retry until reset; suggest upgrade or wait
          upgrade_url: 'https://alivemcp.com/#pricing',
        }),
      }],
      isError: true,
    };
  }

  // 3. Execute tool
  let result: MCPToolResult;
  try {
    result = await dispatchTool(toolName, request.params.arguments);
  } catch (err) {
    // Don't charge quota for server errors
    return { content: [{ type: 'text', text: String(err) }], isError: true };
  }

  // 4. Charge quota only on success: results flagged isError are not charged
  if (!result.isError) {
    const costUnits = TOOL_COSTS[toolName] ?? 1.0;
    quotaManager.increment(userId, toolName, costUnits);
  }
  return result;
});
Cost-weighted quotas
Not all tool calls cost the same. A simple database lookup costs 1 unit; an LLM-calling tool that spends $0.01 in inference per call should cost more quota units to reflect the real cost. Cost-weighted quotas let you set a single monthly limit in "cost units" rather than a call count, and assign different weights to expensive tools.
// Cost weights by tool; tune to reflect real infrastructure cost
const TOOL_COSTS: Record<string, number> = {
  search_documents: 1,   // cheap: SQLite FTS query
  read_file: 1,          // cheap: filesystem read
  run_sql_query: 2,      // moderate: database round-trip
  call_external_api: 5,  // higher: upstream API cost
  generate_with_llm: 20, // expensive: inference cost
  analyze_image: 10,     // expensive: vision model
};

// Monthly cost-based quota per plan (in cost units)
const PLAN_COST_LIMITS: Record<string, number> = {
  free: 500,
  author: 5_000,
  team: 25_000,
  enterprise: Infinity,
};

// Method on QuotaManager: check the cost-based monthly quota.
// Returns true if spending costUnits more stays within the plan's limit.
checkMonthlyCostQuota(userId: string, plan: string, costUnits: number): boolean {
  const monthStart = new Date().toISOString().slice(0, 7) + '-01'; // 'YYYY-MM-01'
  const limit = PLAN_COST_LIMITS[plan] ?? 500;
  // Sum only the aggregate '*' rows so that per-tool rows, if they are ever
  // written, are not counted twice.
  const row = this.db.prepare(
    `SELECT SUM(cost_units) as total FROM quota_usage
     WHERE user_id = ? AND tool_name = '*' AND period_start >= ?`
  ).get(userId, monthStart) as { total: number | null };
  const used = row?.total ?? 0;
  return (used + costUnits) <= limit;
}
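To make the weighting concrete, here is a small worked example of how a mix of calls adds up against the free plan's 500-unit budget. It is only an illustration: the two weights are copied from TOOL_COSTS above, and the `totalCostUnits` helper and the call counts are hypothetical.

```typescript
// Weights copied from the TOOL_COSTS table above; the helper name is hypothetical.
const EXAMPLE_TOOL_COSTS: Record<string, number> = {
  search_documents: 1,
  generate_with_llm: 20,
};

// Total cost units for a mix of calls. Unknown tools default to 1 unit,
// mirroring the `TOOL_COSTS[toolName] ?? 1.0` fallback in the handler.
function totalCostUnits(calls: Record<string, number>): number {
  let total = 0;
  for (const [tool, count] of Object.entries(calls)) {
    total += (EXAMPLE_TOOL_COSTS[tool] ?? 1) * count;
  }
  return total;
}

// 100 searches (100 x 1) plus 15 LLM generations (15 x 20)
const used = totalCostUnits({ search_documents: 100, generate_with_llm: 15 });
const freeLimit = 500; // free plan, from PLAN_COST_LIMITS

console.log(used, freeLimit - used); // 400 100
```

With 100 units left, this user could still run 100 more searches but only 5 more LLM generations, which is the asymmetry a plain call count cannot express.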
Exposing quota status to callers
LLM agents benefit from knowing their current quota status before they run out. Add a get_quota_status tool that callers can invoke to check their remaining budget without consuming it:
// A lightweight meta-tool that doesn't consume quota
{
  name: 'get_quota_status',
  description: 'Returns your current API quota usage and remaining budget for today. Does not consume quota.',
  inputSchema: { type: 'object', properties: {}, required: [] },
}

// In the handler (note: no quota increment for this tool).
// Place this branch before the quota check so that it still answers
// once the quota is exhausted.
if (request.params.name === 'get_quota_status') {
  const quota = quotaManager.check(userId, '*', userPlan);
  const stats = quotaManager.getStats(userId, 7); // last 7 days
  return {
    content: [{
      type: 'text',
      text: JSON.stringify({
        remaining_today: quota.remaining,
        resets_at: quota.resetsAt,
        plan: userPlan,
        last_7_days: stats,
      }),
    }],
  };
}
Related questions
Should I charge quota before or after the tool runs?
Charge quota after a successful tool execution. Charging before execution means a server error, which is not the user's fault, counts against their quota; that creates a bad user experience and a support burden. The trade-off of charging afterwards is a time-of-check to time-of-use (TOCTOU) race: several simultaneous calls can each pass the quota check before any of them increments the counter, so a user can overshoot the limit by roughly the number of calls in flight. For most SQLite-backed servers with modest concurrency, the optimistic "check, then increment after" approach is fine. For high-concurrency servers, make the check and the increment atomic, either inside one database transaction or as a single conditional statement, and refund the charge if the tool then fails.
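One way to do the atomic check-and-increment mentioned above is a conditional upsert that reserves the call before the tool runs, paired with a refund when the tool fails. The following is a sketch under stated assumptions, not a drop-in replacement for QuotaManager: `tryReserve`, `refund`, and the `SqliteLike` interface are hypothetical names, and it assumes the quota_usage schema from this guide and a driver whose `run()` reports a `changes` count, as better-sqlite3 does.

```typescript
// Minimal structural type so the sketch does not depend on a specific driver.
// A better-sqlite3 Database is assumed to satisfy it.
interface SqliteLike {
  prepare(sql: string): { run(...params: unknown[]): { changes: number } };
}

// Hypothetical helper: reserve one call before the tool runs.
// Returns false when the user is already at the limit.
function tryReserve(
  db: SqliteLike,
  userId: string,
  period: string,
  limit: number,
  costUnits = 1,
): boolean {
  // Unlimited plans: substitute a large finite cap rather than binding Infinity.
  const cap = Number.isFinite(limit) ? limit : Number.MAX_SAFE_INTEGER;
  // A fresh row is inserted unconditionally, so reject zero limits up front.
  if (cap <= 0) return false;

  // The WHERE clause applies to the existing row: at the cap, the update
  // becomes a no-op and `changes` is 0.
  const result = db.prepare(`
    INSERT INTO quota_usage (user_id, tool_name, period_start, call_count, cost_units)
    VALUES (?, '*', ?, 1, ?)
    ON CONFLICT(user_id, tool_name, period_start) DO UPDATE SET
      call_count = call_count + 1,
      cost_units = cost_units + excluded.cost_units,
      updated_at = datetime('now')
    WHERE call_count < ?
  `).run(userId, period, costUnits, cap);
  return result.changes === 1;
}

// Hypothetical helper: give the reservation back if the tool fails.
function refund(db: SqliteLike, userId: string, period: string, costUnits = 1): void {
  db.prepare(`
    UPDATE quota_usage
    SET call_count = call_count - 1, cost_units = cost_units - ?
    WHERE user_id = ? AND tool_name = '*' AND period_start = ? AND call_count > 0
  `).run(costUnits, userId, period);
}
```

In a handler this would replace the separate check and increment steps: call `tryReserve` first, run the tool, and call `refund` in the error path. The cost of this design is that a process crash between the reservation and the refund leaves the user charged for a call that never completed.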
How do I handle monthly vs daily quotas together?
Run both checks sequentially: first check the daily limit (which gives fine-grained control within a day), then check the monthly limit (which caps total spend for a billing period). Return distinct error payloads for each, "quota_exhausted_daily" vs "quota_exhausted_monthly", so the caller knows whether to wait until midnight or until the next billing cycle. The monthly limit's reset date should align with the user's billing cycle start date, not a fixed calendar month; the checkMonthlyCostQuota example above uses the calendar month only for simplicity.
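Aligning the monthly window with the billing cycle mostly comes down to computing where the current period started. The sketch below assumes each user has a renewal day of the month (1 to 31) stored somewhere, and that periods are evaluated in UTC like the daily keys; `billingPeriodStart` and `anchorDay` are hypothetical names.

```typescript
// Hypothetical helper: start of the billing period containing `now`, for a
// subscription that renews on `anchorDay` (1-31) of each month, in UTC.
// Returns a 'YYYY-MM-DD' string comparable with period_start.
function billingPeriodStart(now: Date, anchorDay: number): string {
  const year = now.getUTCFullYear();
  const month = now.getUTCMonth();

  // Clamp the anchor to the month length, so an anchor of 31 renews on the
  // last day of shorter months. Date.UTC normalizes month -1 to December of
  // the previous year.
  const startIn = (y: number, m: number): Date => {
    const lastDay = new Date(Date.UTC(y, m + 1, 0)).getUTCDate();
    return new Date(Date.UTC(y, m, Math.min(anchorDay, lastDay)));
  };

  const thisMonth = startIn(year, month);
  // Before this month's renewal day, the period began in the previous month.
  const start = now.getTime() >= thisMonth.getTime() ? thisMonth : startIn(year, month - 1);
  return start.toISOString().slice(0, 10);
}

console.log(billingPeriodStart(new Date('2026-06-27T10:00:00Z'), 15)); // 2026-06-15
console.log(billingPeriodStart(new Date('2026-06-10T10:00:00Z'), 15)); // 2026-05-15
```

The returned string can stand in for the calendar-month `monthStart` value in the monthly query, since daily rows compare correctly against any 'YYYY-MM-DD' lower bound.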
What happens when a user upgrades their plan mid-period?
On plan upgrade, update the user's plan in your database immediately. The quota check looks up userPlan on each call, so the next tool call after the upgrade uses the higher limit. You don't need to adjust or reset usage counters: the new limit simply applies to the rest of the current period, and calls already made still count toward it. Keep the plan lookup fast (a cached query or an in-memory map refreshed periodically) because it runs on every tool call. If you cache it, invalidate the user's entry when the plan changes so that a stale value does not delay the new limit.
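A cached plan lookup with explicit invalidation might look like the sketch below. It is an illustration only: `PlanCache` is a hypothetical class, `loadPlan` stands in for the real users-table query, and the 60-second default TTL is an arbitrary assumption.

```typescript
// Hypothetical cache; `loadPlan` stands in for the real users-table lookup.
class PlanCache {
  private readonly entries = new Map<string, { plan: string; expiresAt: number }>();
  private readonly loadPlan: (userId: string) => string;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(
    loadPlan: (userId: string) => string,
    ttlMs = 60_000, // assumed default; tune to how stale a plan may be
    now: () => number = Date.now, // injectable clock for testing
  ) {
    this.loadPlan = loadPlan;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached plan while it is fresh, otherwise reloads it.
  get(userId: string): string {
    const hit = this.entries.get(userId);
    if (hit && hit.expiresAt > this.now()) return hit.plan;
    const plan = this.loadPlan(userId);
    this.entries.set(userId, { plan, expiresAt: this.now() + this.ttlMs });
    return plan;
  }

  // Call from the plan-change path so the new limit applies on the next tool call.
  invalidate(userId: string): void {
    this.entries.delete(userId);
  }
}
```

Without the `invalidate` call, an upgraded user would keep the old limit until the entry expires, so the TTL bounds the worst-case delay if the plan-change path ever forgets to invalidate.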
How should I communicate quota in tool descriptions?
Mention it for expensive tools only, not for every tool. A search tool that costs 1 quota unit out of a 1,000/day budget doesn't need a quota disclaimer in its description — the remaining 999 calls are unlikely to matter to the caller. An LLM-inference tool that costs 20 units out of 500/month should say "note: this tool consumes 20 quota units per call. Use sparingly." This guides the model to prefer cheaper tools when both options are available.
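If tool definitions are generated from a cost table, the note can be appended automatically for expensive tools only. This is a minimal sketch: `describeWithCost` is a hypothetical helper, and the threshold of 10 units is an assumption chosen so that the cheap tools in this guide stay unannotated.

```typescript
// Hypothetical helper: append a cost note only for tools at or above a threshold.
function describeWithCost(description: string, costUnits: number, noteThreshold = 10): string {
  if (costUnits < noteThreshold) return description;
  return `${description} Note: this tool consumes ${costUnits} quota units per call. Use sparingly.`;
}

console.log(describeWithCost('Searches indexed documents.', 1));
console.log(describeWithCost('Generates text with an LLM.', 20));
```

Deriving the note from the same weights the handler charges keeps the description and the actual cost from drifting apart when the weights are retuned.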
Further reading
- MCP server rate limiting — per-second and per-minute token buckets
- MCP server per-tool rate limiting — different limits per tool
- MCP server multi-tenancy — isolating tenants in shared infrastructure
- MCP server billing integration — Stripe metered billing for tool calls
- MCP server Redis — shared state for distributed quota counters