MCP server scheduled tasks
Most non-trivial MCP servers need background work that runs on a schedule: syncing tool data from upstream registries, warming caches before peak traffic, purging stale session records, generating periodic health snapshots. The question is where that scheduled work lives. Running cron jobs separately (a dedicated container, a Kubernetes CronJob, a cloud scheduler) is operationally cleaner but adds deployment complexity. Running cron alongside the MCP server in the same process is simpler — one deployment artifact — but requires care to avoid interfering with the responsiveness of the MCP request handler.
TL;DR
Use node-cron to schedule tasks in the same process as your MCP server. Start the scheduler after createDeps() completes and before app.listen(). For CPU-intensive tasks, offload to a worker thread or a message queue. In a load-balanced cluster, use a Redis-based leader lock so only one replica executes each scheduled task. Make tasks idempotent (safe to run twice), because the leader lock reduces duplicate runs but cannot rule them out. Expose tasks as MCP tools with a trigger_task pattern so they can be run on demand. Add a health_check tool that reports the last run time and status of each task, and configure AliveMCP or a synthetic monitor to call it.
In-process scheduling with node-cron
node-cron runs cron expressions in the Node.js event loop. Install it and start tasks during server initialization, after createDeps() has completed:
// scheduler.ts
import cron from 'node-cron';
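import type { Deps } from './deps.js';
// Task implementations; the module path is assumed from the import in tools/admin.ts
import { syncRegistry, warmCache, cleanupSessions, snapshotHealth } from './tasks/index.js';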
interface TaskRecord {
name: string;
lastRunAt: Date | null;
lastRunStatus: 'ok' | 'error' | null;
lastRunError: string | null;
}
const taskRecords = new Map<string, TaskRecord>();
function registerTask(name: string, schedule: string, fn: (deps: Deps) => Promise<void>, deps: Deps) {
taskRecords.set(name, { name, lastRunAt: null, lastRunStatus: null, lastRunError: null });
cron.schedule(schedule, async () => {
const start = Date.now();
try {
await fn(deps);
const rec = taskRecords.get(name)!;
rec.lastRunAt = new Date();
rec.lastRunStatus = 'ok';
rec.lastRunError = null;
deps.logger.info('scheduled_task_ok', { task: name, duration_ms: Date.now() - start });
} catch (err: any) {
const rec = taskRecords.get(name)!;
rec.lastRunAt = new Date();
rec.lastRunStatus = 'error';
rec.lastRunError = err.message;
deps.logger.error('scheduled_task_error', { task: name, error: err.message, duration_ms: Date.now() - start });
}
});
}
export function startScheduler(deps: Deps): Map<string, TaskRecord> {
registerTask('registry_sync', '*/5 * * * *', syncRegistry, deps); // every 5 min
registerTask('cache_warm', '0 * * * *', warmCache, deps); // every hour
registerTask('session_cleanup', '0 2 * * *', cleanupSessions, deps); // daily at 2am
registerTask('health_snapshot', '*/15 * * * *', snapshotHealth, deps); // every 15 min
return taskRecords;
}
export { taskRecords };
Each call to registerTask also writes to taskRecords, which tracks the last run timestamp and status. The server entry point calls startScheduler(deps) after createDeps() returns, before app.listen():
// server.ts
async function main() {
const deps = await createDeps();
startScheduler(deps); // start tasks before accepting traffic
const app = createApp(deps);
app.listen(3000);
}
Tasks run concurrently with tool calls via the event loop. I/O-bound tasks (database queries, HTTP calls) do not block the event loop and are safe here. CPU-intensive tasks (compression, hashing large datasets) should be offloaded to a worker thread using worker_threads or queued via BullMQ so they don't delay MCP request processing.
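A minimal sketch of the worker-thread option, assuming a hypothetical CPU-bound job (repeated SHA-256 hashing) stands in for the real work. The names hashInWorker and WORKER_SOURCE are illustrative and not part of the guide's codebase. The worker body is inlined with eval: true only to keep the example self-contained; a real project would more likely point Worker at a compiled file.

```typescript
import { Worker } from 'node:worker_threads';
import { createHash } from 'node:crypto';

// Worker body as plain JavaScript source. It runs on its own thread, so the
// hashing loop does not block the event loop that serves MCP requests.
const WORKER_SOURCE = `
  const { parentPort, workerData } = require('node:worker_threads');
  const { createHash } = require('node:crypto');
  let digest = workerData.input;
  for (let i = 0; i < workerData.rounds; i++) {
    digest = createHash('sha256').update(digest).digest('hex');
  }
  parentPort.postMessage(digest);
`;

// Hypothetical helper: run the hashing job in a worker and resolve with its result.
function hashInWorker(input: string, rounds: number): Promise<string> {
  return new Promise((resolve, reject) => {
    const worker = new Worker(WORKER_SOURCE, {
      eval: true, // treat the first argument as source code, not a file path
      workerData: { input, rounds },
    });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}
```

A scheduled task would then await hashInWorker(...) like any other I/O-bound call, so the task wrapper and its error handling stay unchanged.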
Leader election across replicas
In a load-balanced cluster with multiple MCP server replicas, every replica runs the same node-cron schedule. Without coordination, every replica runs registry_sync every five minutes: with three replicas, the upstream API is hit three times at the same moment on every tick. For some tasks this is harmless (idempotent reads); for others (write operations, expensive API calls with rate limits) it causes duplicate work or quota exhaustion.
Redis-based leader election solves this: the first replica to acquire a lock at schedule time executes the task, the others skip it:
// leader-lock.ts
import { Redis } from 'ioredis';
import { hostname } from 'node:os';
const INSTANCE_ID = `${hostname()}-${process.pid}`;
export async function withLeaderLock(
redis: Redis,
lockKey: string,
ttlSeconds: number,
fn: () => Promise<void>
): Promise<boolean> {
// SET lock NX EX ttl — atomic, only one replica succeeds
const acquired = await redis.set(
`leader-lock:${lockKey}`,
INSTANCE_ID,
'EX', ttlSeconds,
'NX'
);
if (!acquired) return false; // another replica holds the lock
try {
await fn();
return true;
} finally {
// Only release if we still hold the lock (guard against TTL expiry mid-task)
const script = `
if redis.call("get", KEYS[1]) == ARGV[1] then
return redis.call("del", KEYS[1])
else
return 0
end
`;
await redis.eval(script, 1, `leader-lock:${lockKey}`, INSTANCE_ID);
}
}
// Usage in scheduler
registerTask('registry_sync', '*/5 * * * *', async (deps) => {
const executed = await withLeaderLock(deps.cache, 'registry_sync', 240, async () => {
await syncRegistry(deps);
});
if (!executed) deps.logger.info('scheduled_task_skipped', { task: 'registry_sync', reason: 'not_leader' });
}, deps);
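The TTL is set to 240 seconds (4 minutes), slightly shorter than the 5-minute cron interval, so the lock expires before the next scheduled run even if the process crashes and the finally block never executes. The Lua script on release covers the case where the task outlives the TTL and another replica has since acquired the lock: the compare-and-delete only removes the key if it still holds this instance's ID, so a slow instance cannot delete a lock it no longer owns. Because the lock is released as soon as the task finishes, a replica whose timer fires slightly later than the leader's can still acquire the lock and run a short task a second time within the same interval.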
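The acquire-and-release behaviour can be exercised without Redis. The sketch below is an illustration under stated assumptions: LockStore, InMemoryLockStore, runAsLeader, and simulateReplicas are hypothetical names, and the in-memory store only imitates the two Redis operations used above (a set-if-absent with a TTL and a compare-and-delete).

```typescript
// Hypothetical store interface mirroring the two Redis operations used above.
interface LockStore {
  setIfAbsent(key: string, value: string, ttlMs: number): Promise<boolean>; // like SET ... NX with a TTL
  deleteIfEquals(key: string, value: string): Promise<boolean>;             // like the Lua release script
}

class InMemoryLockStore implements LockStore {
  private entries = new Map<string, { value: string; expiresAt: number }>();

  async setIfAbsent(key: string, value: string, ttlMs: number): Promise<boolean> {
    const existing = this.entries.get(key);
    if (existing && existing.expiresAt > Date.now()) return false; // still held
    this.entries.set(key, { value, expiresAt: Date.now() + ttlMs });
    return true;
  }

  async deleteIfEquals(key: string, value: string): Promise<boolean> {
    const existing = this.entries.get(key);
    // Refuse if the key is missing, expired, or owned by another instance.
    if (!existing || existing.expiresAt <= Date.now() || existing.value !== value) return false;
    this.entries.delete(key);
    return true;
  }
}

// Same control flow as withLeaderLock, written against the interface.
async function runAsLeader(
  store: LockStore,
  instanceId: string,
  key: string,
  ttlMs: number,
  fn: () => Promise<void>
): Promise<boolean> {
  if (!(await store.setIfAbsent(key, instanceId, ttlMs))) return false;
  try {
    await fn();
    return true;
  } finally {
    await store.deleteIfEquals(key, instanceId);
  }
}

// Three simulated replicas fire at the same instant; only one executes the task.
async function simulateReplicas(): Promise<{ executed: number; skipped: number }> {
  const store = new InMemoryLockStore();
  let executed = 0;
  const work = async () => {
    executed++;
    await new Promise((r) => setTimeout(r, 20)); // pretend the task takes 20 ms
  };
  const results = await Promise.all(
    ['replica-a', 'replica-b', 'replica-c'].map((id) =>
      runAsLeader(store, id, 'registry_sync', 240_000, work)
    )
  );
  return { executed, skipped: results.filter((ran) => !ran).length };
}
```

Writing the lock helper against a small interface like this is one way to unit-test the scheduling logic; whether that indirection is worth it in a real codebase is a judgment call.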
Task idempotency requirements
Leader election reduces duplicate runs; it does not eliminate them. Distributed locking has edge cases: clock skew, network partitions, and Redis replication lag can all cause two replicas to believe they hold the lock simultaneously. Every scheduled task must be idempotent — safe to run twice with the same outcome:
- Registry sync: upsert rows by primary key, not insert. A duplicate sync updates the same rows to the same values, with no side effects.
- Cache warm: overwriting an existing cache key is idempotent. Use SET key value EX ttl, not SET key value NX (which fails on duplicate, silently leaving a stale value).
- Session cleanup: DELETE FROM sessions WHERE expires_at < NOW() is idempotent; running it twice deletes the same rows (or finds nothing on the second run).
- External API writes: include an idempotency key derived from the task type and scheduled time, for example POST /reports with Idempotency-Key: registry_sync_2026-06-02T14:00. The upstream deduplicates and returns the same response on retry.
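One way to derive such a key is to floor the fire time to the start of its schedule slot, so that every replica and every retry within the same slot produces the same string. The helper below is a hypothetical sketch: idempotencyKey is not an existing function, and it assumes slots are aligned to the Unix epoch in UTC, which matches schedules such as */5 or hourly.

```typescript
// Hypothetical helper: build an idempotency key from the task name and the
// schedule slot a run belongs to.
function idempotencyKey(task: string, firedAt: Date, intervalMinutes: number): string {
  const intervalMs = intervalMinutes * 60_000;
  // Floor to the start of the slot (slots aligned to the Unix epoch, UTC).
  const slotStart = new Date(Math.floor(firedAt.getTime() / intervalMs) * intervalMs);
  // Keep "YYYY-MM-DDTHH:MM"; drop seconds, milliseconds, and the trailing "Z".
  return `${task}_${slotStart.toISOString().slice(0, 16)}`;
}

// Two replicas firing a few seconds or minutes apart within the same 5-minute slot:
const keyFromFirstReplica = idempotencyKey('registry_sync', new Date('2026-06-02T14:00:01Z'), 5);
const keyFromLateReplica = idempotencyKey('registry_sync', new Date('2026-06-02T14:03:27Z'), 5);
// both are "registry_sync_2026-06-02T14:00"
```

For a schedule that is not epoch-aligned (for example daily at 2am local time), the flooring would need to follow the schedule's own slot boundaries instead.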
Exposing tasks as triggerable tools
Scheduled tasks that are difficult to test by waiting for the schedule can be exposed as MCP tools that trigger them on-demand. This also lets agent workflows invoke tasks manually:
// tools/admin.ts — trigger tasks manually via MCP
import { z } from 'zod';
import type { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import type { Deps } from '../deps.js';
import { syncRegistry, warmCache, cleanupSessions } from '../tasks/index.js';
export function registerAdminTools(server: McpServer, deps: Deps) {
server.tool(
'trigger_task',
{
task_name: z.enum(['registry_sync', 'cache_warm', 'session_cleanup']),
reason: z.string().optional(),
},
async ({ task_name, reason }) => {
const start = Date.now();
try {
const task = { registry_sync: syncRegistry, cache_warm: warmCache, session_cleanup: cleanupSessions }[task_name];
await task(deps);
deps.logger.info('manual_task_trigger', { task: task_name, reason, duration_ms: Date.now() - start });
return { content: [{ type: 'text', text: JSON.stringify({ ok: true, task: task_name, duration_ms: Date.now() - start }) }] };
} catch (err: any) {
return { content: [{ type: 'text', text: JSON.stringify({ ok: false, task: task_name, error: err.message }) }], isError: true };
}
}
);
}
The task implementation (syncRegistry, etc.) is the same function called by both the cron schedule and the tool. No duplication — the scheduling and the triggering are separate concerns layered on top of the same task function.
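To keep the schedule and the tool from drifting apart, the task functions could live in a single registry that both layers read. This is a sketch under assumptions: TASKS, TASK_NAMES, isTaskName, and triggerTask are hypothetical names, and the Deps type and task bodies here are stand-ins so the example runs on its own.

```typescript
// Stand-ins for the guide's Deps type and task functions (assumptions for illustration).
interface Deps { log: string[] }
type Task = (deps: Deps) => Promise<void>;

const syncRegistry: Task = async (deps) => { deps.log.push('registry_sync ran'); };
const warmCache: Task = async (deps) => { deps.log.push('cache_warm ran'); };
const cleanupSessions: Task = async (deps) => { deps.log.push('session_cleanup ran'); };

// Single source of truth: task name -> cron schedule and implementation.
const TASKS = {
  registry_sync: { schedule: '*/5 * * * *', run: syncRegistry },
  cache_warm: { schedule: '0 * * * *', run: warmCache },
  session_cleanup: { schedule: '0 2 * * *', run: cleanupSessions },
};
type TaskName = keyof typeof TASKS;
const TASK_NAMES = Object.keys(TASKS) as TaskName[];

// Type guard so arbitrary input can be narrowed to a known task name.
function isTaskName(name: string): name is TaskName {
  return Object.prototype.hasOwnProperty.call(TASKS, name);
}

// What a trigger_task handler could delegate to. A scheduler would iterate
// TASKS and pass each entry's schedule and run function to registerTask().
async function triggerTask(name: string, deps: Deps): Promise<{ ok: boolean; error?: string }> {
  if (!isTaskName(name)) return { ok: false, error: `unknown task: ${name}` };
  try {
    await TASKS[name].run(deps);
    return { ok: true };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```

With this layout, adding a task means adding one registry entry; the list of names accepted by the tool could be derived from TASK_NAMES rather than maintained by hand.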
Monitoring scheduled task health
AliveMCP probes the MCP protocol (initialize + tools/list) to confirm your server is up. It cannot see whether scheduled tasks are running correctly — a task that fails silently leaves taskRecords updated with lastRunStatus: 'error' while the MCP server continues to respond normally to the protocol probe. Add a health_check tool that surfaces task health:
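// Staleness limit per task: 2× its cron interval from startScheduler()
const MAX_STALENESS_MS: Record<string, number> = {
  registry_sync: 2 * 5 * 60_000,           // every 5 min
  cache_warm: 2 * 60 * 60_000,             // every hour
  session_cleanup: 2 * 24 * 60 * 60_000,   // daily
  health_snapshot: 2 * 15 * 60_000,        // every 15 min
};
server.tool('health_check', {}, async () => {
  const taskHealth = Array.from(taskRecords.values()).map(rec => {
    const limit = MAX_STALENESS_MS[rec.name] ?? 600_000; // fall back to 10 min
    // A task that has not run yet is measured from process start, so a fresh
    // deploy is not reported as unhealthy before the first scheduled fire
    const staleness = rec.lastRunAt
      ? Date.now() - rec.lastRunAt.getTime()
      : process.uptime() * 1000;
    // Flag as unhealthy if last run was an error or if stale beyond 2× interval
    const ok = rec.lastRunStatus !== 'error' && staleness < limit;
    return {
      name: rec.name,
      ok,
      last_run_at: rec.lastRunAt?.toISOString() ?? null,
      last_run_status: rec.lastRunStatus,
      last_run_error: rec.lastRunError,
      staleness_ms: rec.lastRunAt ? staleness : null,
    };
  });
  const healthy = taskHealth.every(t => t.ok);
  return {
    content: [{ type: 'text', text: JSON.stringify({ healthy, tasks: taskHealth }) }],
    isError: !healthy,
  };
});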
Configure a synthetic monitor or AliveMCP's custom probe to call health_check every 5–15 minutes. A task that fails consistently — upstream API down, database constraint violation — shows up as isError: true in the health check tool response, which the monitoring tool surfaces as an alert. This is the same pattern as the message queue health_check tool for consumer monitoring.
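On the monitoring side, the tool response has to be turned into something alertable. The sketch below shows one possible shape for that step; alertsFromHealthCheck is a hypothetical helper, and the ToolResult and TaskHealth types only describe the fields that the health_check handler in this guide returns.

```typescript
// Shape of the health_check tool result as returned by the handler in this guide.
interface ToolResult {
  content: { type: string; text: string }[];
  isError?: boolean;
}
interface TaskHealth {
  name: string;
  ok: boolean;
  last_run_status: 'ok' | 'error' | null;
  last_run_error: string | null;
}

// Hypothetical monitor-side helper: one alert message per unhealthy task.
function alertsFromHealthCheck(result: ToolResult): string[] {
  const first = result.content[0];
  if (!first || first.type !== 'text') return ['health_check returned no text content'];
  let payload: { healthy: boolean; tasks: TaskHealth[] };
  try {
    payload = JSON.parse(first.text);
  } catch {
    return ['health_check returned invalid JSON'];
  }
  return payload.tasks
    .filter((t) => !t.ok)
    .map((t) =>
      t.last_run_status === 'error'
        ? `${t.name}: last run failed (${t.last_run_error ?? 'no error message'})`
        : `${t.name}: no recent successful run`
    );
}
```

An empty array means every task reported ok; anything else can be forwarded to whatever alerting channel the monitor uses.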
Graceful shutdown and in-flight tasks
When SIGTERM arrives, stop the cron scheduler before draining MCP sessions. A task that runs after the database connection pool is closed will crash noisily:
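// server.ts: graceful shutdown with scheduler
import http from 'node:http';
import cron from 'node-cron';
import type { Deps } from './deps.js';

// Assumed value; size it to the longest expected task plus a buffer
const DRAIN_TIMEOUT_MS = 30_000;
let isShuttingDown = false;

async function shutdown(deps: Deps, httpServer: http.Server) {
  if (isShuttingDown) return; // ignore repeated signals
  isShuttingDown = true;
  // 1. Stop new cron fires immediately
  cron.getTasks().forEach(task => task.stop());
  // 2. Stop accepting new HTTP connections
  httpServer.close();
  // 3. Wait for in-flight MCP sessions and any running tasks to complete
  await new Promise(r => setTimeout(r, DRAIN_TIMEOUT_MS));
  // 4. Close infrastructure
  await deps.db.end();
  await deps.cache.quit();
  process.exit(0);
}
// deps and httpServer come from main(): createDeps() and the server returned by app.listen()
process.on('SIGTERM', () => shutdown(deps, httpServer));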
The cron.getTasks().forEach(task => task.stop()) call prevents new task fires during the drain window. Tasks already in progress at SIGTERM time will continue until completion (or until the drain timeout elapses). The drain window should be sized to the maximum expected task duration plus a buffer — the same sizing rule as for graceful shutdown of MCP sessions.
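A fixed sleep always waits for the full drain window, even when nothing is running. One alternative, sketched here with a hypothetical InFlightTracker class, is to record each task run and let shutdown wait only until the runs settle or the timeout elapses, whichever comes first.

```typescript
// Hypothetical tracker: the scheduler wraps each run in track(), and the
// shutdown handler awaits drain() instead of sleeping for a fixed time.
class InFlightTracker {
  private running = new Set<Promise<unknown>>();

  track<T>(run: Promise<T>): Promise<T> {
    this.running.add(run);
    const forget = () => { this.running.delete(run); };
    run.then(forget, forget); // remove on success or failure
    return run;
  }

  get size(): number {
    return this.running.size;
  }

  // Resolves true once every tracked run has settled, or false if the
  // timeout elapsed first.
  drain(timeoutMs: number): Promise<boolean> {
    // Swallow rejections here; the task wrapper is responsible for logging them.
    const pending = Array.from(this.running, (p) => p.catch(() => undefined));
    return new Promise<boolean>((resolve) => {
      const timer = setTimeout(() => resolve(false), timeoutMs);
      Promise.all(pending).then(() => {
        clearTimeout(timer);
        resolve(true);
      });
    });
  }
}
```

In the cron callback this would look like await tracker.track(fn(deps)), and step 3 of the shutdown sequence would await tracker.drain(DRAIN_TIMEOUT_MS). Draining in-flight MCP sessions still needs its own mechanism; this sketch only covers scheduled tasks.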
Related questions
Should scheduled tasks run in the same process as the MCP server or in a separate worker?
Same process for I/O-bound tasks with short (<30s) run times. Separate process or container for CPU-intensive tasks, long-running tasks, or tasks that need to scale independently of the MCP server. The cleanest architecture for complex scheduled work is to have the cron trigger enqueue a job to BullMQ and have a separate worker process handle it — see the message queue guide. This decouples task execution from the MCP server's event loop and lets you scale workers and servers independently.
How does Kubernetes CronJob compare to in-process scheduling?
Kubernetes CronJob creates a pod on schedule and deletes it when the task completes. This is the cleanest separation of concerns: task code is isolated, failures don't affect the MCP server, and the pod's logs are clearly separate. The tradeoff is that each CronJob pod starts cold (pulls the image if it is not cached, initializes deps), which adds latency and overhead for frequent tasks. In-process scheduling has no cold start and can reuse existing connections from the Deps object. For tasks running every few minutes, in-process is usually simpler; for tasks running hourly or less often, Kubernetes CronJob is worth the isolation.
Can I use Bun's built-in cron instead of node-cron?
Possibly, if your MCP server runs on Bun. Check the Bun documentation for the cron API available in your version: the signature and execution model of Bun.cron may differ from cron.schedule(schedule, fn), so do not assume it is a drop-in replacement. The patterns in this guide (leader lock, task records, health_check tool) apply regardless of the cron library. Bun also has built-in SQLite, which simplifies the single-process queue pattern from the message queue guide if you prefer SQLite over Redis.
Does AliveMCP alert me when a scheduled task fails?
Only indirectly. AliveMCP's standard probe confirms the MCP server is responding to initialize + tools/list — a failing cron task does not affect that probe. To get alerts on task failures, expose a health_check tool that returns isError: true when a task has failed, and configure AliveMCP's custom probe or an external synthetic monitor to call it periodically. This closes the monitoring gap between "MCP server is up" and "background tasks are healthy."
Further reading
- MCP server message queue — using BullMQ to offload scheduled task execution to background workers
- MCP server load balancing — leader election context and Redis-based distributed coordination
- MCP server graceful shutdown — drain window sizing for in-flight tasks and MCP sessions
- MCP server observability — instrumenting task duration and failure rates alongside MCP metrics
- MCP server dependency injection — sharing Deps (DB pool, Redis, logger) between MCP handlers and scheduled tasks
- AliveMCP — protocol-level uptime monitoring complementing your scheduled task health_check tool