Infrastructure guide · 2026-06-02 · Production operations
MCP Server Infrastructure Operations Guide: Dependency Injection, Testing, Load Balancing, Async Work, and Scheduled Automation
There is a gap between an MCP server that works in development and one that handles real production traffic reliably. The gap spans five infrastructure concerns that most tutorials never reach: how to structure shared resources so the code is testable and horizontally scalable (dependency injection), how to verify MCP protocol semantics without mocking the transport layer (integration testing), how to route traffic across replicas without breaking session state (load balancing), how to handle work that outlasts the tool-call timeout (message queues), and how to trigger periodic automation reliably in a multi-replica deployment (scheduled tasks). These five concerns are not independent topics — they are a coherent system built on a single architectural decision made at startup. This guide covers them as a system.
TL;DR
- The Deps object, created once at startup, is the backbone: every resource (database pool, cache, queue, logger, config) flows into tool handlers as a typed parameter, never through module scope.
- createTestDeps() enables real protocol testing: InMemoryTransport.createLinkedPair() runs the full MCP handshake in-process, with no port binding and no mocks.
- Session stickiness vs. statelessness is the load-balancing trade-off: sticky routing (header-based consistent hashing) preserves per-session state; stateless mode (no session IDs, JSON responses instead of SSE streams) allows round-robin but removes server push.
- Queue and Worker are created once, in createDeps(), and reach tools through Deps. Creating them per tool call is the most common message-queue mistake; under load it exhausts Redis connections and ephemeral ports.
- Leader election (Redis SET NX EX) is required for cron in replicated deployments; without it, every replica fires the same task every interval.
- External uptime monitoring covers the transport layer; the health_check tool covers the application layer. Together they close the full failure surface.
The Deps Pattern — The Architectural Backbone
All five infrastructure concerns share a root cause when they go wrong: module-scope infrastructure. A database pool opened at module load, a Redis client assigned to a module-level constant, a BullMQ queue created at the top of a tool file — these produce three failure modes that interact badly in production.
First, module-scope resources cannot be replaced in tests without reaching into module internals. Second, their lifecycle is tied to the module, not to the application — they cannot be created after startup validation or shut down in a controlled order. Third, when the same module is loaded by multiple logical contexts (multi-tenant, multiple workers, a test that imports a production module), they share the same connection pool or the same queue instance, with no way to provide test-specific or tenant-specific overrides.
Dependency injection for MCP servers solves all three by creating all infrastructure once, explicitly, at startup:
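import { Pool } from 'pg';
import Redis from 'ioredis';
import { Queue, Worker } from 'bullmq';
import type { Logger } from 'pino'; // assumed: any logger with pino's (obj, msg) signature
interface AppConfig { databaseUrl: string; redisUrl: string; apiKey: string }
interface Deps {
  db: Pool;
  cache: Redis;
  logger: Logger;
  config: AppConfig;
  queue?: Queue;   // absent for servers without async work
  worker?: Worker; // created alongside the queue; closed during shutdown
}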
async function createDeps(): Promise<Deps> {
  const config = loadConfig(); // app-specific: reads and validates configuration
  const db = new Pool({
    connectionString: config.databaseUrl,
    max: 10,
    connectionTimeoutMillis: 5_000, // a hung connect rejects instead of waiting forever
  });
  const cache = new Redis(config.redisUrl);
  // Fail fast: validate connectivity before app.listen
  await db.query('SELECT 1');
  await cache.ping();
  return {
    db,
    cache,
    logger: createStructuredLogger(), // app-specific logger factory
    config,
  };
}
The await db.query('SELECT 1') before app.listen is not optional. If the database is unreachable, the check rejects (or, without a connection timeout on the pool, hangs) and the process never binds its port: it is running, but it accepts no connections. An external uptime monitor probing /healthz therefore reports the outage immediately. Without fail-fast validation, the process would bind, /healthz would answer, and the broken database would stay invisible until the first real tool call failed.
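A check that never settles defeats fail-fast validation, because the process then waits before app.listen indefinitely. One way to bound each startup check is a small timeout wrapper. This is a sketch in plain TypeScript with no library assumptions; withTimeout is a hypothetical helper name, not part of pg or ioredis.

```typescript
// Hypothetical helper: rejects if `promise` does not settle within `ms`.
// Intended for startup checks such as db.query('SELECT 1') and cache.ping().
async function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  try {
    // Whichever settles first wins; a hung check becomes a rejection.
    return await Promise.race([promise, timeout]);
  } finally {
    clearTimeout(timer); // do not keep the event loop alive after the check settles
  }
}

// Usage inside createDeps():
//   await withTimeout(db.query('SELECT 1'), 5_000, 'postgres');
//   await withTimeout(cache.ping(), 5_000, 'redis');
```

The rejection propagates out of createDeps() to main().catch, so the process exits with a clear message instead of hanging silently.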
Tool registration functions receive Deps as a parameter:
function registerSearchTools(server: McpServer, deps: Deps): void {
  // SearchInputSchema: a Zod raw shape, e.g. { query: z.string() }
  server.tool('search', SearchInputSchema, async (input) => {
    const rows = await deps.db.query(
      'SELECT * FROM items WHERE content ILIKE $1 LIMIT 20',
      [`%${input.query}%`]
    );
    return { content: [{ type: 'text', text: JSON.stringify(rows.rows) }] };
  });
}
Tool files never import { db } from './db'. Each deps.db.query() call checks a client out of the pool and returns it when the query finishes, so the acquire/release window is one query inside one tool call, not one session. That is the correct size for the window: holding a client for a whole session would cap the server at max concurrent sessions (10 here) rather than at max concurrent queries.
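One detail the search handler leaves open is that input.query is wrapped in % wildcards without escaping, so a user-supplied % or _ acts as a pattern character rather than literal text. A minimal sketch of escaping it, assuming Postgres's default escape character for LIKE/ILIKE (the backslash); escapeLikePattern and containsPattern are hypothetical helper names.

```typescript
// Hypothetical helper: escape LIKE/ILIKE metacharacters so user input matches literally.
// Assumes Postgres's default LIKE escape character, the backslash.
function escapeLikePattern(raw: string): string {
  // One pass: prefix every backslash, % and _ with a backslash.
  return raw.replace(/[\\%_]/g, (ch) => `\\${ch}`);
}

// Builds the bound parameter for `content ILIKE $1` (a "contains" search).
function containsPattern(raw: string): string {
  return `%${escapeLikePattern(raw)}%`;
}

// In the handler:
//   deps.db.query('SELECT * FROM items WHERE content ILIKE $1 LIMIT 20',
//                 [containsPattern(input.query)]);
```

The value is still passed as a bound parameter, so this only changes pattern semantics; SQL injection protection comes from the $1 placeholder either way.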
Integration Testing Against the Real Protocol
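With the Deps interface in place, testing becomes a matter of providing a createTestDeps() implementation. The key is that the test factory returns the same interface with test-appropriate implementations: a real in-process database (not a mock with stubbed query methods), an in-memory Redis stand-in, a no-op logger, and a stub config with safe test values. Because Deps.db is a pg Pool and the tools use Postgres SQL such as ILIKE, the factory below uses pg-mem, an in-memory Postgres emulator that exposes a pg-compatible Pool; an SQLite database would need a different driver and would reject ILIKE.
import fs from 'node:fs';
import { newDb } from 'pg-mem';
async function createTestDeps(): Promise<Deps> {
  // Real SQL execution in-process (pg-mem): not a mock with stubbed query methods
  const mem = newDb();
  mem.public.none(fs.readFileSync('schema.sql', 'utf8'));
  const { Pool: MemPool } = mem.adapters.createPg();
  const db = new MemPool();
  return {
    db,
    cache: new FakeRedis(), // in-memory Redis drop-in (e.g. ioredis-mock)
    logger: createNoOpLogger(),
    config: { databaseUrl: 'pg-mem', redisUrl: '', apiKey: 'test-key' },
  };
}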
MCP server integration testing adds one more ingredient: InMemoryTransport.createLinkedPair(). This creates two linked transports that pass JSON-RPC messages in-process. The full MCP handshake — initialize, initialized notification, client ready — completes in milliseconds without binding to a TCP port. The test fixture wires everything together:
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { InMemoryTransport } from '@modelcontextprotocol/sdk/inMemory.js';
async function createTestServer() {
  const [clientTransport, serverTransport] = InMemoryTransport.createLinkedPair();
  const deps = await createTestDeps();
  const server = new McpServer({ name: 'test-server', version: '0.1.0' });
  registerAllTools(server, deps);
  await server.connect(serverTransport);
  const client = new Client({ name: 'test-client', version: '0.1.0' }, {});
  await client.connect(clientTransport);
  // initialize handshake is complete; client is ready
  return { client, server, deps };
}
test('search tool returns results', async () => {
  const { client, deps } = await createTestServer();
  await deps.db.query("INSERT INTO items (content) VALUES ('hello world')");
  const result = await client.callTool({ name: 'search', arguments: { query: 'hello' } });
  const [first] = result.content as Array<{ type: 'text'; text: string }>;
  const rows = JSON.parse(first.text);
  expect(rows).toHaveLength(1);
  await client.close(); // release the in-memory transport
});
This tests the full MCP protocol path: the tool call is serialized as JSON-RPC, deserialized by the server, and executed against a real in-process database, and the result is serialized as JSON-RPC and deserialized by the client. If the tool's Zod schema rejects valid input, the test catches it. If the handler throws, the test catches it. The transport is not mocked, the database is not mocked, and the protocol is not bypassed.
The schema snapshot CI gate prevents silent regressions:
import crypto from 'node:crypto';
import fs from 'node:fs';
const toolList = await client.listTools();
// Copy before sorting so the hash does not depend on registration order
const sorted = [...toolList.tools].sort((a, b) => a.name.localeCompare(b.name));
const hash = crypto.createHash('sha256')
  .update(JSON.stringify(sorted))
  .digest('hex');
const baseline = fs.readFileSync('test/schema-baseline.sha256', 'utf8').trim();
if (hash !== baseline) {
  throw new Error(`Tool schema changed. Run: echo '${hash}' > test/schema-baseline.sha256`);
}
Any tool rename, addition, removal, or description change fails CI until the baseline is explicitly updated. The gate is cheap to maintain and catches the class of breakage that HTTP monitoring and unit tests both miss — an initialize succeeds and the server returns 200, but the tool surface that agents depend on has changed shape.
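JSON.stringify preserves property insertion order, so the hash above can change if an SDK upgrade emits an identical schema with keys in a different order. A sketch of a key-order-independent hash using only node:crypto; canonicalJson and schemaHash are hypothetical names, and treating key order as insignificant is an assumption about what should count as a schema change.

```typescript
import { createHash } from 'node:crypto';

// Serialize with object keys sorted at every depth; array order is preserved.
function canonicalJson(value: unknown): string {
  if (value === undefined) return 'null'; // matches JSON.stringify inside arrays
  if (Array.isArray(value)) {
    return `[${value.map(canonicalJson).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const entries = Object.entries(value as Record<string, unknown>)
      .filter(([, v]) => v !== undefined) // JSON.stringify drops undefined properties
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
    return `{${entries.map(([k, v]) => `${JSON.stringify(k)}:${canonicalJson(v)}`).join(',')}}`;
  }
  return JSON.stringify(value);
}

// Hash of the tool list, independent of registration order and key order.
function schemaHash(tools: ReadonlyArray<{ name: string }>): string {
  const sorted = [...tools].sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
  return createHash('sha256').update(canonicalJson(sorted)).digest('hex');
}
```

Plain code-point comparison is used instead of localeCompare so the ordering, and therefore the baseline, does not depend on the CI machine's locale.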
Load Balancing — Sticky vs. Stateless
MCP server load balancing presents a choice that the Deps pattern directly influences. The fundamental constraint: StreamableHTTP uses a session ID (mcp-session-id header) to correlate initialize, tool-call, and SSE-stream requests. If session state lives in process memory, all three must reach the same backend.
Sticky routing (session-aware horizontal scaling): Caddy's header-based consistent hash is the simplest correct implementation.
# Caddyfile (mcp.example.com is a placeholder site address)
mcp.example.com {
  reverse_proxy localhost:3001 localhost:3002 localhost:3003 {
    lb_policy header mcp-session-id
    flush_interval -1  # disable response buffering for SSE
  }
}
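lb_policy header mcp-session-id hashes the session ID so that all requests carrying the same ID reach the same backend. The initialize POST has no session ID yet, so Caddy falls back to its default for requests missing the header (a randomly selected upstream), and the backend that answers it issues the session ID. flush_interval -1 disables response buffering so each SSE frame is written to the client as soon as it arrives. Caddy already flushes responses it recognizes as streaming, such as text/event-stream, immediately, so the directive is a safeguard for upstreams that do not set that content type; without immediate flushing, clients see long silences followed by bursts.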
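Caddy implements the header policy with highest-random-weight (rendezvous) hashing. The properties that matter for sessions: the same session ID always maps to the same backend, and removing a backend moves only the sessions that were on it. A sketch of rendezvous hashing in TypeScript, using SHA-256 from node:crypto as a stand-in for whatever hash function Caddy uses internally; score and pickUpstream are hypothetical names.

```typescript
import { createHash } from 'node:crypto';

// Score each (key, upstream) pair; the highest score wins.
function score(key: string, upstream: string): bigint {
  const digest = createHash('sha256').update(`${upstream}|${key}`).digest();
  return digest.readBigUInt64BE(0); // first 8 bytes as an unsigned 64-bit integer
}

// Rendezvous (HRW) hashing: deterministic per key, minimal movement when upstreams change.
function pickUpstream(sessionId: string, upstreams: readonly string[]): string {
  if (upstreams.length === 0) throw new Error('no upstreams');
  let best = upstreams[0];
  let bestScore = score(sessionId, best);
  for (const u of upstreams.slice(1)) {
    const s = score(sessionId, u);
    if (s > bestScore) {
      best = u;
      bestScore = s;
    }
  }
  return best;
}
```

Because each backend is scored independently, taking a backend out of rotation cannot change the winner for sessions whose winner is still present; only the removed backend's sessions are redistributed.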
Stateless mode (true round-robin): in the TypeScript SDK, constructing the StreamableHTTP server transport with sessionIdGenerator: undefined disables session IDs, and enableJsonResponse: true returns plain JSON responses instead of opening SSE streams. Each POST is then independent and the server cannot push notifications to the client, so round-robin load balancing works without sticky routing. The trade-off is real: no notifications/tools/list_changed and no streaming output mid-tool-call. For read-only tool servers and servers whose tools complete synchronously, stateless mode is the simpler path.
With Deps in place, moving from sticky to stateless is easier, provided tool handlers keep per-session data in the shared database or cache rather than in process memory. The database pool already handles concurrent access from every replica, so the session ID then matters only for routing SSE streams, not for correctness. The architecture is already stateless from the database's perspective; stateless transport mode formalises it.
Health endpoints matter at the load balancer boundary. /healthz should return 503 until the server has completed createDeps() (including connectivity validation) and is ready to accept connections, and should return 503 again once httpServer.close() has been called during shutdown. This keeps the load balancer from routing new sessions to backends that are starting up or draining.
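The /healthz behavior just described is a three-phase lifecycle: starting, ready, draining. A minimal sketch using node:http types, assuming main() calls markReady() after createDeps() succeeds and the SIGTERM handler calls markDraining() before closing the app; Readiness and healthzHandler are hypothetical names, and in a Fastify app the same logic would live in a route handler.

```typescript
import type { IncomingMessage, ServerResponse } from 'node:http';

type Phase = 'starting' | 'ready' | 'draining';

class Readiness {
  private phase: Phase = 'starting';
  markReady(): void {
    if (this.phase === 'starting') this.phase = 'ready';
  }
  markDraining(): void {
    this.phase = 'draining'; // one-way: a draining process never becomes ready again
  }
  statusCode(): number {
    return this.phase === 'ready' ? 200 : 503;
  }
  get current(): Phase {
    return this.phase;
  }
}

// GET /healthz: 200 only between successful startup and the start of shutdown.
function healthzHandler(readiness: Readiness) {
  return (_req: IncomingMessage, res: ServerResponse): void => {
    res.statusCode = readiness.statusCode();
    res.setHeader('content-type', 'application/json');
    res.end(JSON.stringify({ status: readiness.current }));
  };
}
```

Making draining one-way matters: a late markReady() call racing with shutdown cannot put a closing backend back into the load balancer's rotation.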
Long-Running Work — Message Queue Integration
Tool calls have an implicit timeout determined by the client's patience and any proxy-layer timeouts in the path. For work that outlasts that window, MCP server message queue integration provides a fire-and-return pattern:
- Under 30 seconds: block with an AbortSignal deadline and return the result directly.
- 30 seconds to a few minutes: long-poll or stream the response (if the transport supports it).
- Minutes or longer: enqueue, return a job_id, and let the agent poll with a get_status tool.
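For the first tier, the handler can give its work a deadline and pass an AbortSignal down so the underlying operation actually stops instead of running on after the caller has given up. A sketch assuming the work function honors the signal (fetch does, for example); runWithDeadline and sleep are hypothetical names, with sleep standing in for real signal-aware I/O.

```typescript
// Hypothetical helper: run `work` with a deadline, signalling it to abort when time is up.
async function runWithDeadline<T>(
  work: (signal: AbortSignal) => Promise<T>,
  ms: number
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(new Error(`deadline of ${ms} ms exceeded`)), ms);
  try {
    return await work(controller.signal);
  } finally {
    clearTimeout(timer);
  }
}

// Example work that honors the signal (stands in for fetch(url, { signal }) and similar).
function sleep(ms: number, signal: AbortSignal): Promise<void> {
  return new Promise((resolve, reject) => {
    if (signal.aborted) return reject(signal.reason);
    const t = setTimeout(resolve, ms);
    signal.addEventListener('abort', () => {
      clearTimeout(t);
      reject(signal.reason);
    }, { once: true });
  });
}
```

Unlike a bare Promise.race, this propagates cancellation; work that ignores the signal is not interrupted, which is why the tier boundary should be set with some margin below the client's timeout.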
The Deps pattern makes the queue integration clean. The Queue and Worker live in createDeps():
import { Queue, Worker } from 'bullmq';
import Redis from 'ioredis';
async function createDeps(): Promise<Deps> {
  // ... db, cache, logger, config ...
  // BullMQ requires maxRetriesPerRequest: null on connections used by a Worker
  const queue = new Queue('exports', {
    connection: new Redis(config.redisUrl, { maxRetriesPerRequest: null }),
  });
  const worker = new Worker('exports', processExport, {
    connection: new Redis(config.redisUrl, { maxRetriesPerRequest: null }),
  });
  // deps does not exist yet inside createDeps, so use the local logger
  worker.on('failed', (job, err) => {
    logger.error({ jobId: job?.id, err }, 'export job failed');
  });
  return { db, cache, logger, config, queue, worker };
}
The anti-pattern is new Queue(...) inside a tool handler. Each call opens a new Redis connection, and unless the handler also closes it, the connection leaks. Ten concurrent tool calls mean ten new connections at once; sustained traffic means a steady stream of connect/disconnect cycles whose sockets linger in TIME_WAIT, and under heavier load that ends in ephemeral port exhaustion (or in hitting Redis's maxclients limit). BullMQ is built around long-lived connections; creating them per call bypasses everything it does to manage reconnection and command queuing.
The tool surface:
import { Job } from 'bullmq';
// Register these tools only when the server has a queue (queue is optional in Deps)
const queue = deps.queue;
if (!queue) throw new Error('export tools require deps.queue');
server.tool('start_export', StartExportSchema, async (input) => {
  const job = await queue.add('export', input, {
    attempts: 3,
    backoff: { type: 'exponential', delay: 5000 },
  });
  return { content: [{ type: 'text', text: JSON.stringify({ job_id: job.id }) }] };
});
server.tool('get_export_status', GetStatusSchema, async (input) => {
  const job = await Job.fromId(queue, input.job_id);
  if (!job) {
    return { isError: true, content: [{ type: 'text', text: 'job not found' }] };
  }
  // e.g. 'waiting' | 'active' | 'delayed' (includes retry backoff) | 'completed' | 'failed'
  const state = await job.getState();
  return {
    content: [{
      type: 'text',
      text: JSON.stringify({
        state,
        result: state === 'completed' ? job.returnvalue : undefined, // a property, not a promise
        error: state === 'failed' ? job.failedReason : undefined,
      }),
    }],
  };
});
LLM clients poll naturally. An LLM agent that calls start_export and receives a job_id will call get_export_status in its reasoning loop until the state is completed or failed. The agent does not need a push notification; it polls. The queue pattern is well matched to how LLMs actually drive tool use.
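The same loop an agent performs can be written for a non-LLM client or an integration test. A sketch with capped exponential backoff; getStatus stands in for a call to get_export_status, the terminal states follow BullMQ's getState() values, and pollUntilDone with its options is hypothetical.

```typescript
interface StatusResult {
  state: string; // e.g. 'waiting' | 'active' | 'delayed' | 'completed' | 'failed'
  result?: unknown;
  error?: string;
}

// Poll until the job reaches a terminal state or the attempt budget runs out.
async function pollUntilDone(
  getStatus: () => Promise<StatusResult>,
  { initialDelayMs = 1_000, maxDelayMs = 30_000, maxAttempts = 20 } = {}
): Promise<StatusResult> {
  let delay = initialDelayMs;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await getStatus();
    if (status.state === 'completed' || status.state === 'failed') return status;
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, delay));
      delay = Math.min(delay * 2, maxDelayMs); // back off so long jobs are not polled every second
    }
  }
  throw new Error(`job not finished after ${maxAttempts} polls`);
}
```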
For simpler deployments without Redis, a SQLite-backed queue using better-sqlite3 and setInterval polling handles hundreds of jobs per second with no external infrastructure dependency. The tools do not change as long as deps.queue is typed as a small queue interface that both backends implement, rather than as BullMQ's Queue class; only the implementation behind deps.queue changes.
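Swapping backends without touching tool code requires a narrow contract between the tools and the queue. A sketch of such an interface with an in-memory implementation suitable for tests; JobQueue, InMemoryJobQueue, and their method names are hypothetical, and a BullMQ-backed adapter would wrap Queue.add and Job.fromId plus getState behind the same two methods.

```typescript
type State = 'waiting' | 'active' | 'completed' | 'failed';

// Hypothetical narrow contract the export tools would depend on instead of BullMQ's Queue class.
interface JobQueue {
  add(name: string, data: unknown): Promise<{ id: string }>;
  getState(id: string): Promise<State | 'not_found'>;
}

type Processor = (name: string, data: unknown) => Promise<unknown>;

// In-memory implementation for tests: jobs run one at a time, in insertion order, when drained.
class InMemoryJobQueue implements JobQueue {
  private nextId = 1;
  private readonly states = new Map<string, State>();
  private readonly pending: Array<{ id: string; name: string; data: unknown }> = [];

  constructor(private readonly processor: Processor) {}

  async add(name: string, data: unknown): Promise<{ id: string }> {
    const id = String(this.nextId++);
    this.states.set(id, 'waiting');
    this.pending.push({ id, name, data });
    return { id };
  }

  async getState(id: string): Promise<State | 'not_found'> {
    return this.states.get(id) ?? 'not_found';
  }

  // Process every waiting job; a production backend would do this in a worker loop.
  async drain(): Promise<void> {
    for (let job = this.pending.shift(); job; job = this.pending.shift()) {
      this.states.set(job.id, 'active');
      try {
        await this.processor(job.name, job.data);
        this.states.set(job.id, 'completed');
      } catch {
        this.states.set(job.id, 'failed');
      }
    }
  }
}
```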
Scheduled Tasks and Leader Election
MCP server scheduled tasks require one decision that catches many developers by surprise the first time they deploy more than one replica: cron fires on every process that has the cron installed. With three replicas and a cron that refreshes a cache every five minutes, the cache refresh runs three times simultaneously every five minutes. If the refresh involves a database write, three concurrent writes compete. If it involves an external API call, you hit the rate limit three times as fast.
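The startScheduler(deps) function receives the same Deps object created at startup:
import * as cron from 'node-cron';
import * as os from 'node:os';
interface TaskRecord {
  lastRunAt: Date;
  lastRunStatus: 'running' | 'ok' | 'failed';
  lastRunError?: string;
  intervalMs: number; // lets health_check judge staleness per task
}
// Cron interval per task, used for the lock TTL and for staleness checks
const TASK_INTERVAL_SECONDS: Record<string, number> = {
  'cache-refresh': 300,     // */5 * * * *
  'nightly-report': 86_400, // 0 2 * * *
};
const taskRegistry = new Map<string, TaskRecord>();
function startScheduler(deps: Deps): void {
  cron.schedule('*/5 * * * *', async () => {
    await runWithLeaderElection('cache-refresh', deps, async () => {
      await refreshCache(deps);
    });
  });
}
async function runWithLeaderElection(
  taskName: string,
  deps: Deps,
  fn: () => Promise<void>
): Promise<void> {
  const instanceId = process.env.INSTANCE_ID ?? os.hostname();
  const lockKey = `scheduler:leader:${taskName}`;
  const intervalSeconds = TASK_INTERVAL_SECONDS[taskName] ?? 300;
  const bufferSeconds = 10;
  // SET key value EX ttl NX: set only if absent, with expiry; resolves 'OK' or null
  const acquired = await deps.cache.set(
    lockKey, instanceId, 'EX', intervalSeconds - bufferSeconds, 'NX'
  );
  if (!acquired) return; // another replica holds the lock for this window
  const record: TaskRecord = {
    lastRunAt: new Date(),
    lastRunStatus: 'running',
    intervalMs: intervalSeconds * 1000,
  };
  taskRegistry.set(taskName, record);
  try {
    await fn();
    record.lastRunStatus = 'ok';
  } catch (err) {
    record.lastRunStatus = 'failed';
    record.lastRunError = String(err);
    deps.logger.error({ taskName, err }, 'scheduled task failed');
  }
}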
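The lock TTL is interval minus buffer: long enough that no second leader can emerge mid-window, and short enough to expire before the next cron fire even if the leader crashes. The lock is deliberately not deleted when the task finishes; letting it expire keeps a replica whose clock runs a few seconds late from firing again in the same window. Because NX folds the existence check and the write into one atomic command, there is no race between checking and setting. This is the single-instance lock pattern from the Redis distributed-locking documentation (the building block that Redlock extends to multiple Redis nodes), applied to cron scheduling.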
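The TTL arithmetic can be checked without Redis. A sketch that emulates SET key value EX ttl NX against an in-memory map with an explicit clock, then fires three replicas at each cron tick; FakeLockStore and simulate are hypothetical and model only the set-if-absent and expiry semantics, not Redis itself.

```typescript
// Emulates SET key value EX ttl NX with an injectable clock (milliseconds).
class FakeLockStore {
  private entries = new Map<string, { value: string; expiresAt: number }>();

  setNxEx(key: string, value: string, ttlSeconds: number, nowMs: number): 'OK' | null {
    const existing = this.entries.get(key);
    if (existing && existing.expiresAt > nowMs) return null; // still held
    this.entries.set(key, { value, expiresAt: nowMs + ttlSeconds * 1000 });
    return 'OK';
  }
}

// Fire `fires` cron ticks across all replicas; return the single winner of each tick.
function simulate(replicas: string[], intervalSeconds: number, bufferSeconds: number, fires: number): string[] {
  const store = new FakeLockStore();
  const winners: string[] = [];
  for (let tick = 0; tick < fires; tick++) {
    const now = tick * intervalSeconds * 1000;
    const acquired = replicas.filter((r) =>
      store.setNxEx('scheduler:leader:cache-refresh', r, intervalSeconds - bufferSeconds, now) === 'OK'
    );
    if (acquired.length !== 1) throw new Error(`tick ${tick}: ${acquired.length} leaders`);
    winners.push(acquired[0]);
  }
  return winners;
}
```

With a 300 s interval and a 10 s buffer, exactly one replica runs per tick. Flip the buffer negative (a TTL longer than the interval) and the second tick finds the lock still held, so no replica runs: the stale-task failure mode that health_check is meant to catch.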
Cron + queue composition: for tasks that need both reliable scheduling and reliable execution, the cron leader enqueues a BullMQ job rather than running the work inline. The cron provides the trigger; BullMQ provides retries, backoff, deduplication by job ID, and a retained set of failed jobs that can be inspected and retried:
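// Daily at 02:00 server time
cron.schedule('0 2 * * *', async () => {
  await runWithLeaderElection('nightly-report', deps, async () => {
    const queue = deps.queue;
    if (!queue) throw new Error('nightly-report requires deps.queue');
    const runDate = new Date().toISOString().slice(0, 10); // UTC date, yyyy-MM-dd
    await queue.add('nightly-report', { runDate }, {
      jobId: `nightly-report-${runDate}`, // deduplication key; a hyphen avoids ':', BullMQ's key separator
      attempts: 3,
      backoff: { type: 'exponential', delay: 30000 },
    });
  });
});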
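The jobId is the deduplication key: if the cron fires twice for the same date (a restart during the fire window, clock drift between replicas, etc.), BullMQ ignores the second add as long as a job with that ID still exists in Redis; removeOnComplete or removeOnFail settings that delete it reopen the window. The report is therefore enqueued once per date, and BullMQ's retries give at-least-once execution, so the job itself should be idempotent.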
Exposing tasks as MCP tools is useful for testing and agent-driven workflows. trigger_task with a task_name enum parameter calls the same function as the cron — no duplicate implementation. An LLM agent can trigger the nightly report on demand without waiting for 2 AM.
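A sketch of the single-implementation idea: one table maps task names to functions, cron entries and the trigger_task handler both call through it, and the allowed task_name values are derived from the table's keys. The task bodies are placeholders, and taskTable, TASK_NAMES, and triggerTask are hypothetical names; in the real server, TASK_NAMES would feed the tool's Zod enum.

```typescript
// One table of task implementations; cron and trigger_task both call through it.
const taskTable = {
  'cache-refresh': async (): Promise<string> => 'cache refreshed',  // placeholder body
  'nightly-report': async (): Promise<string> => 'report enqueued', // placeholder body
};

type TaskName = keyof typeof taskTable;

// Names for the tool's input enum, e.g. z.enum(TASK_NAMES) in the real tool schema.
const TASK_NAMES = Object.keys(taskTable) as [TaskName, ...TaskName[]];

function isTaskName(value: string): value is TaskName {
  return Object.prototype.hasOwnProperty.call(taskTable, value);
}

// Handler body for trigger_task: validate, then run the same function cron runs.
async function triggerTask(taskName: string): Promise<{ isError: boolean; text: string }> {
  if (!isTaskName(taskName)) {
    return { isError: true, text: `unknown task: ${taskName}` };
  }
  const output = await taskTable[taskName]();
  return { isError: false, text: output };
}
```

Whether a manual trigger should also take the leader lock is a design choice: wrapping the call in runWithLeaderElection prevents an on-demand run from overlapping a scheduled one, at the cost of occasionally refusing the request.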
The Health Check Tool — Closing the Monitoring Gap
External uptime monitoring — including AliveMCP's protocol-aware probes — confirms that the HTTP transport is up, that initialize succeeds, and that tools/list returns correctly. This is the transport layer. It cannot confirm what happens inside the application layer: whether the database pool has available connections, whether Redis is reachable from within the process, whether the BullMQ queue is draining or backing up, whether the last scheduled task ran successfully, or whether it ran at all.
The health_check tool bridges this gap:
server.tool('health_check', {}, async () => {
  const [dbResult, cacheResult, queueResult] = await Promise.allSettled([
    deps.db.query('SELECT 1').then(() => ({ ok: true })),
    deps.cache.ping().then(() => ({ ok: true })),
    // Resolves to undefined when the server has no queue
    deps.queue?.getWaitingCount().then(count => ({ ok: true, queueDepth: count })),
  ]);
  const now = Date.now();
  const taskHealth = Array.from(taskRegistry.entries()).map(([name, record]) => {
    const staleness = now - record.lastRunAt.getTime();
    // Per-task interval, so a daily task is not flagged stale after 10 minutes
    const expectedInterval = record.intervalMs ?? 300_000;
    return {
      name,
      lastRunAt: record.lastRunAt.toISOString(),
      lastRunStatus: record.lastRunStatus,
      stale: staleness > expectedInterval * 2,
    };
  });
  const anyFailed =
    dbResult.status === 'rejected' ||
    cacheResult.status === 'rejected' ||
    queueResult.status === 'rejected' ||
    taskHealth.some(t => t.lastRunStatus === 'failed' || t.stale);
  return {
    isError: anyFailed,
    content: [{
      type: 'text',
      text: JSON.stringify({
        db: dbResult.status === 'fulfilled' ? 'ok' : 'failed',
        // pg Pool counters: waiting > 0 means callers are queued for a connection
        pool: {
          total: deps.db.totalCount,
          idle: deps.db.idleCount,
          waiting: deps.db.waitingCount,
        },
        cache: cacheResult.status === 'fulfilled' ? 'ok' : 'failed',
        queue: queueResult.status === 'fulfilled' ? queueResult.value : 'failed',
        tasks: taskHealth,
      }),
    }],
  };
});
A synthetic monitor — a script or a second AliveMCP endpoint that calls health_check on a schedule — surfaces application-layer failures that transport-layer probes cannot reach. The two monitoring layers are complementary, not redundant: external probe guarantees the transport; synthetic health check guarantees the application internals. Together they cover the gap.
One subtle point about isError: true on the health check tool: a failing health check is an application-level error, not a protocol-level error. Returning isError: true with structured JSON in the content is the correct signal — the MCP session itself is alive and functioning (the protocol layer is healthy), but the application reports a problem. A monitoring client that inspects isError on the health_check response can page on application-layer failures independently of transport-layer failures.
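On the monitoring side, the decision to page can be made from the tool result alone. A sketch of that classification, assuming the result shape the health_check handler returns (isError plus one text item containing JSON with db, cache, queue, and tasks fields); classifyHealth and the three verdict levels are hypothetical, and a transport failure (no result at all) is assumed to be handled by the caller before this point.

```typescript
interface ToolResult {
  isError?: boolean;
  content: Array<{ type: string; text?: string }>;
}

type HealthVerdict =
  | { level: 'ok' }
  | { level: 'degraded'; reasons: string[] }  // application-layer problem: page the owners
  | { level: 'unparseable'; detail: string }; // the tool answered, but not in the expected shape

function classifyHealth(result: ToolResult): HealthVerdict {
  const text = result.content.find((c) => c.type === 'text')?.text;
  if (text === undefined) return { level: 'unparseable', detail: 'no text content' };
  let body: Record<string, unknown>;
  try {
    body = JSON.parse(text);
  } catch {
    return { level: 'unparseable', detail: 'content is not JSON' };
  }
  if (!result.isError) return { level: 'ok' };
  // Collect the failing components so the alert says what broke.
  const reasons: string[] = [];
  for (const key of ['db', 'cache', 'queue']) {
    if (body[key] === 'failed') reasons.push(key);
  }
  const tasks = Array.isArray(body.tasks) ? body.tasks : [];
  for (const t of tasks as Array<{ name: string; lastRunStatus: string; stale: boolean }>) {
    if (t.lastRunStatus === 'failed' || t.stale) reasons.push(`task:${t.name}`);
  }
  return { level: 'degraded', reasons };
}
```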
The Startup and Shutdown Sequence
All five concerns share a lifecycle. The startup sequence:
import type { FastifyInstance } from 'fastify';
import * as cron from 'node-cron';
const PORT = Number(process.env.PORT ?? 3000);
const SHUTDOWN_TIMEOUT_MS = 30_000; // calibrate from P99 tool-call duration (see below)
async function main(): Promise<void> {
  // 1. Create all infrastructure: validates connectivity
  const deps = await createDeps();
  // 2. Start the scheduler: leader election is active from here
  startScheduler(deps);
  // 3. Create and register the MCP server
  const server = new McpServer({ name: 'my-mcp-server', version: '1.0.0' });
  registerAllTools(server, deps);
  // 4. Create the HTTP app (a FastifyInstance) with middleware
  const app = createApp(server, deps);
  // 5. Register SIGTERM here, where deps and app are in scope
  process.once('SIGTERM', () => {
    void shutdown(deps, app, [...cron.getTasks().values()]);
  });
  // 6. Begin accepting connections (Fastify v4+ takes an options object)
  await app.listen({ port: PORT, host: '0.0.0.0' });
  deps.logger.info({ port: PORT }, 'MCP server ready');
}
main().catch(err => {
  console.error(err);
  process.exit(1);
});
The shutdown sequence runs in roughly the reverse of startup order, with one hard constraint: no new work may start once SIGTERM has fired. A forced-exit timer bounds the whole drain:
async function shutdown(
  deps: Deps,
  app: FastifyInstance,
  cronTasks: cron.ScheduledTask[]
): Promise<void> {
  // Hard deadline: if draining exceeds the budget, exit anyway
  setTimeout(() => process.exit(1), SHUTDOWN_TIMEOUT_MS).unref();
  // 1. Stop all cron tasks: no new scheduled fires
  cronTasks.forEach(t => t.stop());
  // 2. Stop accepting new HTTP connections; in-flight requests drain
  await app.close();
  // 3. Close the BullMQ worker: waits for the current job, then stops
  await deps.worker?.close();
  // 4. Close the queue: no new jobs can be added
  await deps.queue?.close();
  // 5. Close the cache and database: all in-flight queries are done by now
  await deps.cache.quit();
  await deps.db.end();
  deps.logger.info('shutdown complete');
  process.exit(0);
}
The shared Deps object means shutdown is one function that knows about every resource. There are no scattered module-level db.end() calls in different files, no leaked connections because a cleanup handler was never registered. Every resource that was created in createDeps() is closed in shutdown().
The P99 tool-call duration determines the shutdown timeout. If most tool calls complete in under five seconds but the 99th percentile is 25 seconds (a database query under write pressure), the SIGTERM handler should wait up to 30 seconds before forcing exit. This timeout is the sum of: the time for app.close() to stop accepting connections, the time for in-flight tool calls to complete, and the time for queue jobs to finish or checkpoint. The production checklist — see MCP server production checklist — covers calibrating this timeout from your own P99 data.
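Calibrating that timeout is a percentile calculation plus headroom. A sketch using the nearest-rank percentile over recorded tool-call durations; the 5-second default headroom (matching the 25 s to 30 s example above) and the function names are assumptions, not values from any library.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samplesMs: readonly number[], p: number): number {
  if (samplesMs.length === 0) throw new Error('no samples');
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Shutdown budget: P99 tool-call duration plus headroom for app.close() and job checkpointing.
function shutdownTimeoutMs(samplesMs: readonly number[], headroomMs = 5_000): number {
  return percentile(samplesMs, 99) + headroomMs;
}
```

For 100 samples where two calls took 25 s and the rest about 1 s, the P99 is 25 s and the budget comes out at 30 s, the figure used in the paragraph above.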
How the Five Concerns Interact
The reason to understand all five concerns as a system rather than as independent topics is that the decisions in each constrain the others:
- DI enables testing. Without the Deps interface, createTestDeps() cannot exist. Without createTestDeps(), integration testing requires real infrastructure (a real Postgres instance, a real Redis, ports, network), which slows tests and makes them fragile. The testability of an MCP server is a direct function of how cleanly its infrastructure is injected.
- DI enables stateless load balancing. If session context lives in module scope, every replica holds different state for the same session. With Deps, handlers reach the shared database and cache through the same injected clients, so session context can live there, visible to all replicas. Stateless mode becomes viable because there is no replica-local session state to preserve.
- DI enables a clean queue lifecycle. A Queue and Worker in Deps are created once, shut down in order, and visible to the shutdown sequence. Per-call queue creation bypasses all of this. The lifecycle discipline of the queue depends on the lifecycle discipline of createDeps().
- DI gives the scheduler access to all infrastructure. startScheduler(deps) can reach the database, the cache for leader election, the queue for job enqueueing, and the logger for structured task output. Without Deps, the scheduler would need its own module-scope infrastructure, separate from the rest of the application.
- Load balancing and scheduled tasks share a leader-election constraint. In a three-replica deployment, cron fires three times simultaneously without leader election. Sticky routing pins each session's requests to one replica, but cron fires on all three regardless of session routing. Leader election is a replica-count concern, not a routing concern.
- Queues and scheduled tasks compose naturally. The cron leader enqueues a BullMQ job; BullMQ provides retries, backoff, and deduplication. The schedule triggers the work; the queue provides the execution guarantee. Neither alone is as reliable as both together for long-running periodic work.
The common thread: module-scope infrastructure is the source of friction in all five concerns. The Deps pattern is the single architectural decision that makes all five easier to implement correctly.
The Monitoring Layer Across All Five
Each of the five concerns produces state that external monitoring cannot see and that the health_check tool should surface:
- Database pool (from DI): when all max connections are in use, pool acquisitions queue, and a tool call that should take 50 ms stalls waiting for a connection. External monitoring sees only a slow response; a health_check that reports the pool's waitingCount surfaces the saturation before the timeout fires.
- Schema drift (from testing): the schema snapshot gate catches tool renames in CI. Post-deploy, the probe script re-runs the snapshot check. AliveMCP's tools/list probe confirms the endpoint returns a list; the snapshot check confirms the list has the expected shape.
- Backend health (from load balancing): AliveMCP at the load balancer level confirms the stack as a whole. Per-backend monitors confirm that no individual backend is silently unhealthy while the load balancer still considers it healthy.
- Queue depth (from message queues): a queue depth that grows without bound signals that workers cannot keep up with producers. getWaitingCount() in health_check surfaces this, so a monitor can alert on a depth threshold before users notice that export jobs are taking hours instead of minutes.
- Task staleness (from scheduled tasks): a cron that stopped firing (a process restart that did not restart the scheduler, or a leader-election lock that blocks every replica for longer than intended) surfaces as a stale lastRunAt. health_check returns isError: true when staleness exceeds twice the task's interval.
The full picture: AliveMCP probes the transport layer on a configured cadence from external locations; the health_check tool exposes the application layer on demand. A synthetic monitor that calls health_check on the same cadence as AliveMCP's external probe gives continuous, complete visibility into both layers. For teams that already have AliveMCP watching their MCP server's transport, adding a second monitor that calls health_check closes the application-layer gap at zero additional infrastructure cost.
Putting It Together
The five infrastructure concerns are steps in a progression, not independent topics:
- Define the Deps interface and createDeps() factory with fail-fast startup validation. This is the foundation.
- Write createTestDeps() and the InMemoryTransport test fixture before writing the first tool. Tests that verify protocol semantics are cheap to write now and expensive to retrofit later.
- Add health endpoints (/healthz) and configure the load balancer before adding the second replica. Sticky routing is one Caddyfile directive; forgetting it is a session-corruption bug that is hard to reproduce in staging.
- Add the queue and worker to Deps when the first tool call exceeds 30 seconds in production, not before. Over-engineering the queue path early adds complexity; deferring it adds it exactly when the cost of adding it is lowest (you already have Deps).
- Add the scheduler when the first periodic task needs to run, with leader election from the start even if you only have one replica. It costs one Redis SET per cron fire and prevents the bug that only appears when you add the second replica at 11 PM before a holiday.
At each step, the shared Deps object absorbs the new resource and makes it available to the rest of the application with no architectural refactoring. The health_check tool grows to cover each new concern as it is added. The shutdown sequence gains one more close call in the correct order.
For deeper coverage of each concern: dependency injection patterns, integration testing with InMemoryTransport, load balancing and session affinity, message queue integration with BullMQ, and scheduled tasks and leader election each cover their concern in full. The MCP server production checklist covers the full hardening sequence from development to production. AliveMCP's protocol-aware probes watch the transport layer so you can focus on the application layer — see pricing for monitor plans.