MCP server feature flags
Feature flags for MCP servers solve a different problem than feature flags for web applications. A web page renders once per request: you can gate a UI feature behind a flag, and only users whose requests evaluate the flag as on see the feature. An MCP server exposes a tool surface that clients cache and depend on for the lifetime of a session. Changing which tools are registered, or changing a tool's schema, mid-session requires care. Flags that control tool registration belong at session initialisation time. Flags that control tool behaviour can be evaluated per call. Understanding which is which prevents the most disruptive category of MCP flag-related bugs: a client that cached one tool list and is suddenly calling tools that no longer exist.
TL;DR
Evaluate tool-registration flags at initialize time, per session, so each session gets a consistent tool surface for its lifetime. Evaluate per-call behaviour flags inside the tool handler on each invocation. For simple deployments, use a comma-separated environment variable (ENABLED_FEATURES=pdf_export,v2_search) parsed at startup. For runtime flag changes without restart, use a Redis-backed flag store with a pub/sub invalidation channel. Per-tenant flags, where enterprise tenants have access to more tools, belong in the session context map, evaluated from the tenant's database row at initialize time. AliveMCP probes can detect when a flag change silently breaks tool registration: if the server starts but tools/list returns an unexpected tool count, the probe result changes and can trigger an alert.
Three categories of flags in MCP servers
The distinction that matters most for MCP servers is when the flag is evaluated:
| Flag category | When evaluated | What it controls | Example |
|---|---|---|---|
| Tool-registration flags | At initialize (once per session) | Which tools the session can call | pdf_export, v2_search |
| Behaviour flags | Per tool call | How a registered tool operates | use_semantic_search, verbose_output |
| Infrastructure flags | At process start | Which adapters and connections to open | redis_cache, enable_queue |
Infrastructure flags must be evaluated at startup because they determine what connections createDeps() opens. Changing them requires a restart. Tool-registration flags should be evaluated at initialize time — not at startup — so that different sessions (or different tenants sharing the same server) can have different tool surfaces without a restart. Behaviour flags can be evaluated on every call because they do not affect the schema that clients cache.
Simple environment-variable flags for single-tenant deployments
For servers where all sessions share the same flag state, parse flags from an environment variable at startup and evaluate them at initialize time:
// flags.ts: parse once at startup, evaluate at session time
const ENABLED_FEATURES = new Set(
  (process.env.ENABLED_FEATURES ?? '')
    .split(',')
    .map(s => s.trim())
    .filter(Boolean)
);

export function isEnabled(flag: string): boolean {
  return ENABLED_FEATURES.has(flag);
}

// In createDeps(): infrastructure flags evaluated here
export async function createDeps(): Promise<Deps> {
  const config = parseConfig();
  const useCache = isEnabled('redis_cache'); // infrastructure flag
  const cache = useCache ? new Redis(config.REDIS_URL!) : null;
  // ...
}
// registerTools.ts: tool-registration flags evaluated per session
// deps is passed in explicitly; it is created once at startup by createDeps()
export function registerToolsForSession(server: McpServer, deps: Deps, flags: Set<string>) {
  // Base tools registered for all sessions
  registerSearchTools(server, deps);
  registerReadTools(server, deps);
  // Flagged tools: only registered when the flag is on
  if (flags.has('v2_search')) {
    registerSearchV2Tools(server, deps);
  }
  if (flags.has('pdf_export')) {
    registerPdfExportTools(server, deps);
  }
}

// At initialize time, resolve the session's flag set and register tools.
// `session` and `deps` are assumed to be in scope in the transport layer.
session.on('initialize', async () => {
  const sessionFlags = new Set(ENABLED_FEATURES); // copy; could add per-session overrides here
  const sessionServer = new McpServer({ name: 'myserver', version: '1.0.0' });
  registerToolsForSession(sessionServer, deps, sessionFlags);
});
Runtime flag changes without restart: Redis-backed flags
Environment-variable flags require a restart to change. For flags that need to flip in production without a deployment, store them in Redis and subscribe to a pub/sub channel for invalidation:
// flag-store.ts: Redis-backed flags with in-memory cache and pub/sub invalidation
import Redis from 'ioredis';

const FLAG_KEY = 'feature-flags'; // Redis hash key
const FLAG_CHANNEL = 'flag-updates';
let cachedFlags: Record<string, boolean> = {};

export async function initFlagStore(redis: Redis): Promise<void> {
  // Load current flags from the Redis hash
  const raw = await redis.hgetall(FLAG_KEY);
  cachedFlags = Object.fromEntries(
    Object.entries(raw).map(([k, v]) => [k, v === 'true'])
  );
  // Subscribe on a dedicated connection: a connection in subscriber mode
  // cannot issue regular commands
  const subscriber = redis.duplicate();
  // ioredis delivers messages through the 'message' event, not a subscribe() callback
  subscriber.on('message', (channel, message) => {
    if (channel !== FLAG_CHANNEL) return;
    try {
      const patch = JSON.parse(message) as Record<string, boolean>;
      cachedFlags = { ...cachedFlags, ...patch };
      console.info({ event: 'flags_updated', patch });
    } catch {
      console.error({ event: 'flag_update_parse_error', message });
    }
  });
  await subscriber.subscribe(FLAG_CHANNEL);
}

export function isFlagEnabled(flag: string, defaultValue = false): boolean {
  return cachedFlags[flag] ?? defaultValue;
}

// Flip a flag from any admin tool or CLI:
// redis-cli hset feature-flags pdf_export true
// redis-cli publish flag-updates '{"pdf_export": true}'
The pub/sub channel propagates the change to all connected server instances, typically within milliseconds, and the in-memory cache avoids a Redis round-trip on every tool call. Redis pub/sub is fire-and-forget, so an instance that is disconnected when a message is published misses that update until it reloads the hash. The important constraint: changing a tool-registration flag via this mechanism does not change the tool surface of sessions that have already initialised. Existing sessions keep their original tool list for their lifetime, and only new sessions pick up the new flag state. This is correct behaviour: it is safer than ejecting active sessions.
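The difference between a live read and a per-session snapshot can be sketched in a few lines. This is a minimal in-memory illustration, not the Redis-backed store itself: `FlagStore`, `applyPatch`, and `snapshot` are hypothetical names, and `applyPatch` stands in for whatever the invalidation handler does when an update arrives.

```typescript
// Hypothetical in-memory stand-in for the flag cache; all names are illustrative.
class FlagStore {
  private flags: Record<string, boolean> = {};

  // Called by the invalidation handler when a patch arrives.
  applyPatch(patch: Record<string, boolean>): void {
    this.flags = { ...this.flags, ...patch };
  }

  // Behaviour flags: read the live value on every tool call.
  isEnabled(flag: string, defaultValue = false): boolean {
    return this.flags[flag] ?? defaultValue;
  }

  // Tool-registration flags: take an independent copy at initialize time.
  snapshot(): ReadonlySet<string> {
    const enabled = Object.entries(this.flags)
      .filter(([, on]) => on)
      .map(([name]) => name);
    return new Set(enabled);
  }
}

const store = new FlagStore();
store.applyPatch({ v2_search: true });

// Session A initialises before the flag flip and keeps its snapshot.
const sessionA = store.snapshot();

store.applyPatch({ pdf_export: true });

// Session B initialises after the flip and sees the new registration flag.
const sessionB = store.snapshot();
```

Here `sessionA` still lacks `pdf_export` after the patch, while `sessionB` and any live `isEnabled` read see it.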
Per-tenant feature flags
Enterprise tiers often include tools that basic tiers do not. Per-tenant flags live in the tenant's database row and are resolved at initialize time into a Set<string> that drives registerToolsForSession:
// Per-tenant flag resolution at initialize
session.on('initialize', async (params) => {
  const tenantId = extractTenantId(params); // from JWT, API key, or metadata
  // Load the tenant's enabled features from the DB: one query per session, not per call
  const { rows } = await deps.db.query<{ feature: string }>(
    'SELECT feature FROM tenant_features WHERE tenant_id = $1 AND enabled = true',
    [tenantId]
  );
  const tenantFlags = new Set(rows.map(r => r.feature));
  // Merge global flags with tenant flags
  const effectiveFlags = new Set([...globalFlags(), ...tenantFlags]);
  // Register tools for this session only
  const sessionServer = buildSessionServer(effectiveFlags);
  tenantContexts.set(session.id, { tenantId, flags: effectiveFlags, server: sessionServer });
});

session.on('close', () => {
  tenantContexts.delete(session.id);
});
Loading flags from the database on every initialize adds one query per new session. For servers with high session churn (many short sessions), cache the tenant flag set in Redis with a TTL of a few minutes — the slight staleness is acceptable because session tool-registration flags are intentionally per-session, and a tenant who just had a plan upgrade will get the new tools on their next session.
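The caching idea above might look roughly like the following cache-aside sketch. It uses an in-process `Map` as a stand-in for Redis so the example stays self-contained; `TenantFlagCache`, `resolveTenantFlags`, and the `loadFromDb` callback are hypothetical names, and the TTL is whatever staleness you are willing to accept.

```typescript
// Minimal TTL cache sketch. An in-process Map stands in for Redis here;
// all names are hypothetical.
type CacheEntry = { flags: Set<string>; expiresAt: number };

class TenantFlagCache {
  private entries = new Map<string, CacheEntry>();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Returns a copy of the cached flag set, or undefined on a miss or expiry.
  get(tenantId: string, now: number = Date.now()): Set<string> | undefined {
    const entry = this.entries.get(tenantId);
    if (!entry) return undefined;
    if (now >= entry.expiresAt) {
      this.entries.delete(tenantId);
      return undefined;
    }
    return new Set(entry.flags);
  }

  set(tenantId: string, flags: Iterable<string>, now: number = Date.now()): void {
    this.entries.set(tenantId, { flags: new Set(flags), expiresAt: now + this.ttlMs });
  }
}

// Cache-aside lookup: try the cache, fall back to the loader (for example the
// tenant_features query), then populate the cache for later sessions.
async function resolveTenantFlags(
  cache: TenantFlagCache,
  tenantId: string,
  loadFromDb: (tenantId: string) => Promise<string[]>,
): Promise<Set<string>> {
  const cached = cache.get(tenantId);
  if (cached) return cached;
  const loaded = await loadFromDb(tenantId);
  cache.set(tenantId, loaded);
  return new Set(loaded);
}
```

Returning copies from `get` keeps one session from mutating the flag set another session will read.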
Gradual rollout with percentage-based flags
A percentage-based rollout enables a new tool for a fixed fraction of sessions or tenants. Use a stable hash of the session ID or tenant ID so the same entity consistently sees the same flag state:
// Stable percentage rollout: the same entity always gets the same bucket
import { createHash } from 'node:crypto';

function inRollout(entityId: string, flagName: string, percent: number): boolean {
  const hash = createHash('sha256')
    .update(`${flagName}:${entityId}`)
    .digest('hex');
  // Convert the first 4 hex chars to a number in [0, 65535], then to a bucket in [0, 99]
  const bucket = parseInt(hash.slice(0, 4), 16) % 100;
  return bucket < percent;
}

// Usage: gate the v2_search tool for 10% of tenants
if (inRollout(tenantId, 'v2_search', 10)) {
  registerSearchV2Tools(sessionServer, deps);
}
The hash approach gives stable bucketing: increasing the percentage from 10% to 20% adds roughly another 10% of entities to the enabled bucket without flipping any of the original 10%, because each entity's bucket number never changes and only the threshold moves. Entities that had the flag enabled continue to have it enabled. This matters for MCP because clients build context around the current tool surface: a tenant should not lose a tool between one session and the next because a random bucket assignment flipped.
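The monotonic property is easy to check empirically. The sketch below restates the same bucketing scheme under different function names so it runs on its own, then compares a 10% and a 20% rollout over a set of made-up tenant IDs (`tenant-0` to `tenant-999` are hypothetical).

```typescript
import { createHash } from "node:crypto";

// Same bucketing scheme as above, restated so this snippet runs on its own.
function rolloutBucket(entityId: string, flagName: string): number {
  const hash = createHash("sha256").update(`${flagName}:${entityId}`).digest("hex");
  return parseInt(hash.slice(0, 4), 16) % 100; // bucket in [0, 99]
}

function inRolloutAt(entityId: string, flagName: string, percent: number): boolean {
  return rolloutBucket(entityId, flagName) < percent;
}

// Hypothetical tenant IDs, for illustration only.
const tenantIds = Array.from({ length: 1000 }, (_, i) => `tenant-${i}`);

const at10 = tenantIds.filter((id) => inRolloutAt(id, "v2_search", 10));
const at20 = tenantIds.filter((id) => inRolloutAt(id, "v2_search", 20));

// Every tenant enabled at 10% must still be enabled at 20%.
const at20Set = new Set(at20);
const lostTool = at10.filter((id) => !at20Set.has(id));

console.log({ at10: at10.length, at20: at20.length, lost: lostTool.length });
```

The exact counts depend on the hash values, but `lost` is always 0 because a bucket below 10 is also below 20.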
Monitoring tool-surface changes with AliveMCP
AliveMCP's probe calls tools/list after initialize. If you change a tool-registration flag globally (e.g., enabling pdf_export for all sessions), the next probe will return a different tool count. Configure an AliveMCP alert on tools/list tool count changes to detect unintended tool-surface changes — a flag deployment that accidentally disabled a tool will fire the alert within one probe cycle. This is complementary to a schema snapshot CI gate: the CI gate catches changes before deployment, and AliveMCP catches unexpected changes in production.
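The comparison behind both the CI gate and a production check can be sketched as a diff between an expected tool-name snapshot and the names a live tools/list call returns. `diffToolSurface` and the tool names below are hypothetical; this is not an AliveMCP API, just an illustration of the check.

```typescript
// Hypothetical helper: compares an expected tool-name snapshot with the names
// returned by a live tools/list call.
interface ToolSurfaceDiff {
  missing: string[]; // expected but not returned
  unexpected: string[]; // returned but not in the snapshot
}

function diffToolSurface(expected: string[], actual: string[]): ToolSurfaceDiff {
  const expectedSet = new Set(expected);
  const actualSet = new Set(actual);
  return {
    missing: expected.filter((name) => !actualSet.has(name)).sort(),
    unexpected: actual.filter((name) => !expectedSet.has(name)).sort(),
  };
}

const snapshot = ["search", "read", "pdf_export"];
const live = ["search", "read", "search_v2"];
const diff = diffToolSurface(snapshot, live);
// diff.missing → ["pdf_export"], diff.unexpected → ["search_v2"]
```

In this example the tool count is 3 on both sides, so a name-level diff catches a swap that a count-only check would miss.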
Related questions
Can I change which tools are available mid-session?
The MCP spec includes a notifications/tools/list_changed notification that signals clients to re-fetch the tool list. However, many clients cache the tool list for the session lifetime and ignore the notification. Changing tool registration mid-session is unreliable in practice. The safe pattern is: new sessions get new flags; existing sessions keep their original tool surface. If you need to immediately revoke a tool from active sessions (e.g., a security response), a rolling restart is more reliable than the notification path.
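For reference, the notification itself is a plain JSON-RPC message with no id and no parameters. The sketch below only builds the wire message; `toolListChanged` is a hypothetical helper, and how you send it depends on your SDK and transport.

```typescript
// JSON-RPC notification a server sends when its tool list changes.
// Notifications carry no id and expect no response.
// A server that sends this should also declare the tools.listChanged
// capability in its initialize result.
interface ToolListChangedNotification {
  jsonrpc: "2.0";
  method: "notifications/tools/list_changed";
}

function toolListChanged(): ToolListChangedNotification {
  return { jsonrpc: "2.0", method: "notifications/tools/list_changed" };
}

const wireMessage = JSON.stringify(toolListChanged());
// wireMessage → {"jsonrpc":"2.0","method":"notifications/tools/list_changed"}
```

Whether a client re-fetches tools/list after receiving this is up to the client, which is why the session-scoped pattern remains the safer default.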
How do I test flag-gated tools?
Use createTestDeps() with a flag set that explicitly enables the tool under test. Never rely on environment variables in tests for flag state — tests should be deterministic regardless of the environment. Write one test suite with the flag enabled and a separate assertion that the tool is absent when the flag is disabled. This confirms both code paths work correctly.
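A minimal sketch of that two-path test follows. It assumes registration is driven purely by an explicit flag set; `FakeToolRegistry`, `registerTools`, `toolsFor`, and the tool names are hypothetical stand-ins for your real server and registration function.

```typescript
import { strict as assert } from "node:assert";

// Hypothetical minimal registry standing in for an MCP server in tests.
class FakeToolRegistry {
  readonly names: string[] = [];

  register(name: string): void {
    this.names.push(name);
  }
}

// Registration driven purely by an explicit flag set: no process.env lookups.
function registerTools(registry: FakeToolRegistry, flags: ReadonlySet<string>): void {
  registry.register("search");
  if (flags.has("pdf_export")) registry.register("export_pdf");
}

function toolsFor(flags: string[]): string[] {
  const registry = new FakeToolRegistry();
  registerTools(registry, new Set(flags));
  return registry.names;
}

// Flag on: the gated tool is registered.
assert.ok(toolsFor(["pdf_export"]).includes("export_pdf"));
// Flag off: the gated tool is absent and the base tools remain.
assert.deepEqual(toolsFor([]), ["search"]);
```

Because the flag set is passed in, the result is the same on a laptop, in CI, and in any environment where ENABLED_FEATURES happens to be set.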
Should feature flags and authorization be the same mechanism?
No. Authorization (can this tenant call this tool?) and feature flags (is this tool rolled out to this tenant?) are different concerns that often get conflated. Authorization is a security boundary — failing it should return a clear error. Feature flags control rollout and experimentation — the tool should simply not appear in tools/list for sessions that do not have it. Conflating them means your rollout mechanism becomes a security primitive, which is fragile. Keep them separate: flags control registration, authorization controls access to registered tools.
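One way to keep the two concerns apart is to give each its own function and its own input. The sketch below is an assumption about how that could be structured; `SessionContext`, `ToolDef`, `visibleTools`, and `authorizeCall` are hypothetical names.

```typescript
// Hypothetical types for illustration.
interface SessionContext {
  flags: ReadonlySet<string>; // rollout: which gated tools are registered
  permissions: ReadonlySet<string>; // authorization: which tools may be called
}

interface ToolDef {
  name: string;
  flag?: string; // undefined means the tool is always registered
}

// Rollout: tools whose flag is off never appear in tools/list.
function visibleTools(tools: ToolDef[], ctx: SessionContext): string[] {
  return tools
    .filter((t) => t.flag === undefined || ctx.flags.has(t.flag))
    .map((t) => t.name);
}

// Authorization: a registered tool can still be refused, with an explicit error.
function authorizeCall(toolName: string, ctx: SessionContext): void {
  if (!ctx.permissions.has(toolName)) {
    throw new Error(`Not authorized to call tool: ${toolName}`);
  }
}

const allTools: ToolDef[] = [{ name: "search" }, { name: "export_pdf", flag: "pdf_export" }];
const ctx: SessionContext = {
  flags: new Set(["pdf_export"]),
  permissions: new Set(["search"]),
};

const listed = visibleTools(allTools, ctx);
// listed → ["search", "export_pdf"]
```

With this context, `export_pdf` is rolled out to the session and shows up in the list, but `authorizeCall("export_pdf", ctx)` still throws because the permission is missing.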
Further reading
- MCP server configuration management — validating all config at startup
- MCP server multi-tenant — session context maps and per-tenant tool surfaces
- MCP server plugins — per-tenant plugin activation as tool-surface authorization
- MCP server integration testing — schema snapshot CI gate for tool changes
- MCP server versioning — v1/v2 tool coexistence during migrations
- AliveMCP — probe-based tool-surface change detection in production