
MCP server scheduled tasks

Most non-trivial MCP servers need background work that runs on a schedule: syncing tool data from upstream registries, warming caches before peak traffic, purging stale session records, generating periodic health snapshots. The question is where that scheduled work lives. Running cron jobs separately (a dedicated container, a Kubernetes CronJob, a cloud scheduler) is operationally cleaner but adds deployment complexity. Running cron alongside the MCP server in the same process is simpler — one deployment artifact — but requires care to avoid interfering with the responsiveness of the MCP request handler.

TL;DR

Use node-cron to schedule tasks in the same process as your MCP server. Start the scheduler after createDeps() completes and before app.listen(). Offload CPU-intensive tasks to a worker thread or a message queue. In a load-balanced cluster, use a Redis-based leader lock so that only one replica executes each scheduled task. Make tasks idempotent, because the lock reduces duplicate runs but cannot rule them out. Expose tasks through a trigger_task MCP tool so they can be run on demand. Add a health_check tool that reports the last run time and status of each task, and configure AliveMCP or a synthetic monitor to call it.

In-process scheduling with node-cron

node-cron runs cron expressions in the Node.js event loop. Install it and start tasks during server initialization, after createDeps() has completed:

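// scheduler.ts
import cron from 'node-cron';
import type { Deps } from './deps.js';
// Task implementations. The path mirrors the import used in tools/admin.ts;
// snapshotHealth is assumed to be exported alongside the other tasks.
import { syncRegistry, warmCache, cleanupSessions, snapshotHealth } from './tasks/index.js';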

interface TaskRecord {
  name: string;
  lastRunAt: Date | null;
  lastRunStatus: 'ok' | 'error' | null;
  lastRunError: string | null;
}

const taskRecords = new Map<string, TaskRecord>();

function registerTask(name: string, schedule: string, fn: (deps: Deps) => Promise<void>, deps: Deps) {
  taskRecords.set(name, { name, lastRunAt: null, lastRunStatus: null, lastRunError: null });

  cron.schedule(schedule, async () => {
    const start = Date.now();
    try {
      await fn(deps);
      const rec = taskRecords.get(name)!;
      rec.lastRunAt = new Date();
      rec.lastRunStatus = 'ok';
      rec.lastRunError = null;
      deps.logger.info('scheduled_task_ok', { task: name, duration_ms: Date.now() - start });
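    } catch (err: unknown) {
      // Non-Error values can be thrown too, so normalize before reading .message
      const message = err instanceof Error ? err.message : String(err);
      const rec = taskRecords.get(name)!;
      rec.lastRunAt = new Date();
      rec.lastRunStatus = 'error';
      rec.lastRunError = message;
      deps.logger.error('scheduled_task_error', { task: name, error: message, duration_ms: Date.now() - start });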
    }
  });
}

export function startScheduler(deps: Deps): Map<string, TaskRecord> {
  registerTask('registry_sync',      '*/5 * * * *',   syncRegistry,    deps); // every 5 min
  registerTask('cache_warm',         '0 * * * *',     warmCache,       deps); // every hour
  registerTask('session_cleanup',    '0 2 * * *',     cleanupSessions, deps); // daily at 2am
  registerTask('health_snapshot',    '*/15 * * * *',  snapshotHealth,  deps); // every 15 min
  return taskRecords;
}

export { taskRecords };

Each call to registerTask also writes to taskRecords, which tracks the last run timestamp and status. The server entry point calls startScheduler(deps) after createDeps() returns, before app.listen():

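// server.ts
import { createDeps } from './deps.js';        // assumed location of createDeps
import { createApp } from './app.js';          // assumed location of createApp
import { startScheduler } from './scheduler.js';

async function main() {
  const deps = await createDeps();
  startScheduler(deps);                // start tasks before accepting traffic

  const app = createApp(deps);
  app.listen(3000);
}

// Fail fast if dependency setup or startup throws
main().catch((err) => {
  console.error('startup_failed', err);
  process.exit(1);
});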

Tasks run concurrently with tool calls via the event loop. I/O-bound tasks (database queries, HTTP calls) do not block the event loop and are safe here. CPU-intensive tasks (compression, hashing large datasets) should be offloaded to a worker thread using worker_threads or queued via BullMQ so they don't delay MCP request processing.
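As a minimal sketch of the worker_threads option, the helper below computes a SHA-256 digest off the main thread. The name hashInWorker and the inline worker source are illustrative assumptions, not part of this guide's codebase; a real project would more likely load the worker from its own file.

```typescript
import { Worker } from 'node:worker_threads';

// Worker body as CommonJS source, run with `eval: true` so the sketch fits in one file.
const WORKER_SOURCE = `
  const { parentPort, workerData } = require('node:worker_threads');
  const { createHash } = require('node:crypto');
  parentPort.postMessage(createHash('sha256').update(workerData).digest('hex'));
`;

// Hypothetical helper: hashes `input` on a worker thread and resolves with the hex digest.
function hashInWorker(input: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const worker = new Worker(WORKER_SOURCE, { eval: true, workerData: input });
    worker.once('message', (digest: string) => resolve(digest));
    worker.once('error', reject);
    worker.once('exit', (code) => {
      // After a successful message this reject is a no-op, since the promise is already settled.
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}
```

A scheduled task can then await hashInWorker(payload) like any other I/O call, and the event loop stays free to serve MCP requests while the worker runs.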

Leader election across replicas

In a load-balanced cluster with multiple MCP server replicas, every replica runs the same node-cron schedule. Without coordination, every replica runs registry_sync every five minutes: with three replicas, the upstream API receives three near-simultaneous requests on every tick. For some tasks this is harmless (idempotent reads); for others (write operations, expensive API calls with rate limits) it causes duplicate work or quota exhaustion.

A Redis-based leader lock addresses this. The first replica to acquire the lock at schedule time executes the task, and the others skip it:

// leader-lock.ts
import { Redis } from 'ioredis';
import { hostname } from 'node:os';

const INSTANCE_ID = `${hostname()}-${process.pid}`;

export async function withLeaderLock(
  redis: Redis,
  lockKey: string,
  ttlSeconds: number,
  fn: () => Promise<void>
): Promise<boolean> {
  // SET lock NX EX ttl — atomic, only one replica succeeds
  const acquired = await redis.set(
    `leader-lock:${lockKey}`,
    INSTANCE_ID,
    'EX', ttlSeconds,
    'NX'
  );

  if (!acquired) return false; // another replica holds the lock

  try {
    await fn();
    return true;
  } finally {
    // Only release if we still hold the lock (guard against TTL expiry mid-task)
    const script = `
      if redis.call("get", KEYS[1]) == ARGV[1] then
        return redis.call("del", KEYS[1])
      else
        return 0
      end
    `;
    await redis.eval(script, 1, `leader-lock:${lockKey}`, INSTANCE_ID);
  }
}

// Usage in scheduler
registerTask('registry_sync', '*/5 * * * *', async (deps) => {
  const executed = await withLeaderLock(deps.cache, 'registry_sync', 240, async () => {
    await syncRegistry(deps);
  });
  if (!executed) deps.logger.info('scheduled_task_skipped', { task: 'registry_sync', reason: 'not_leader' });
}, deps);

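The TTL is set to 240 seconds (4 minutes), slightly shorter than the 5-minute cron interval, so the lock expires before the next scheduled run even if the process dies before the finally block can release it. The TTL should also exceed the task's worst-case duration; otherwise the lock can expire mid-task and another replica can start a concurrent run. The Lua script on release covers that case: if the TTL expired and another replica acquired the lock before this instance finished, the script sees a different owner and leaves the lock alone, where an unconditional DEL would remove the other replica's lock. Note that because the lock is released as soon as the task finishes, a replica whose schedule fires slightly later can still acquire it and run the task again.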
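To make the release race concrete, here is a small in-memory stand-in for the two Redis operations used above: SET with NX and EX, and the compare-and-delete script. FakeLockStore and its manually advanced clock are hypothetical teaching aids, not an ioredis API.

```typescript
// In-memory model of Redis keys with expiry, driven by a manual clock (in seconds).
class FakeLockStore {
  private entries = new Map<string, { owner: string; expiresAt: number }>();
  now = 0;

  // Models SET key owner NX EX ttl: succeeds only if the key is absent or expired.
  acquire(key: string, owner: string, ttlSeconds: number): boolean {
    const current = this.entries.get(key);
    if (current && current.expiresAt > this.now) return false;
    this.entries.set(key, { owner, expiresAt: this.now + ttlSeconds });
    return true;
  }

  // Models the Lua script: delete only if the stored value still matches the caller.
  release(key: string, owner: string): boolean {
    const current = this.entries.get(key);
    if (!current || current.expiresAt <= this.now || current.owner !== owner) return false;
    this.entries.delete(key);
    return true;
  }

  // Returns the current owner, or null if the key is absent or expired.
  holder(key: string): string | null {
    const current = this.entries.get(key);
    return current && current.expiresAt > this.now ? current.owner : null;
  }
}

const store = new FakeLockStore();
store.acquire('registry_sync', 'replica-a', 240); // replica A wins at t=0
store.now = 300;                                   // A's task overruns; its lock expired at t=240
store.acquire('registry_sync', 'replica-b', 240); // replica B wins the next fire
const releasedByA = store.release('registry_sync', 'replica-a'); // A finishes late
// releasedByA is false, and replica-b still holds the lock
```

With an unconditional delete in place of the owner check, the last step would have removed replica B's lock and opened the door to a third concurrent run.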

Task idempotency requirements

Leader election reduces duplicate runs; it does not eliminate them. Distributed locking has edge cases: clock skew, network partitions, and Redis replication lag can each lead two replicas to believe they hold the lock at the same time. Every scheduled task must therefore be idempotent, meaning a second run leaves the system in the same state as the first.
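One common way to get idempotency is to write by key instead of appending. The sketch below uses a hypothetical ToolRecord shape and an in-memory Map standing in for a database table; with a real database, the equivalent is usually an upsert on a unique key.

```typescript
interface ToolRecord {
  id: string;
  version: string;
}

// Idempotent: keyed writes converge to the same state however often they run.
function syncByKey(table: Map<string, ToolRecord>, upstream: ToolRecord[]): void {
  for (const rec of upstream) table.set(rec.id, rec);
}

// Not idempotent: every run appends another copy of each record.
function syncByAppend(log: ToolRecord[], upstream: ToolRecord[]): void {
  log.push(...upstream);
}

const upstream: ToolRecord[] = [
  { id: 'search', version: '1.2.0' },
  { id: 'fetch', version: '0.9.1' },
];

const table = new Map<string, ToolRecord>();
syncByKey(table, upstream);
syncByKey(table, upstream); // duplicate run, e.g. two replicas both believed they held the lock

const log: ToolRecord[] = [];
syncByAppend(log, upstream);
syncByAppend(log, upstream);
// table.size is 2, log.length is 4
```

The same test applies to any scheduled task: run it twice in a row against the same input and check that the second run changes nothing.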

Exposing tasks as triggerable tools

Scheduled tasks are hard to test if the only way to run them is to wait for the schedule. Exposing them as MCP tools lets you trigger them on demand, and lets agent workflows invoke them when needed:

// tools/admin.ts — trigger tasks manually via MCP
import { z } from 'zod';
import type { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import type { Deps } from '../deps.js';
import { syncRegistry, warmCache, cleanupSessions } from '../tasks/index.js';

export function registerAdminTools(server: McpServer, deps: Deps) {
  server.tool(
    'trigger_task',
    {
      task_name: z.enum(['registry_sync', 'cache_warm', 'session_cleanup']),
      reason: z.string().optional(),
    },
    async ({ task_name, reason }) => {
      const start = Date.now();
      try {
        const task = { registry_sync: syncRegistry, cache_warm: warmCache, session_cleanup: cleanupSessions }[task_name];
        await task(deps);
        deps.logger.info('manual_task_trigger', { task: task_name, reason, duration_ms: Date.now() - start });
        return { content: [{ type: 'text', text: JSON.stringify({ ok: true, task: task_name, duration_ms: Date.now() - start }) }] };
      } catch (err: unknown) {
        const message = err instanceof Error ? err.message : String(err);
        return { content: [{ type: 'text', text: JSON.stringify({ ok: false, task: task_name, error: message }) }], isError: true };
      }
    }
  );
}

The task implementation (syncRegistry, etc.) is the same function called by both the cron schedule and the tool. No duplication — the scheduling and the triggering are separate concerns layered on top of the same task function.
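One way to keep the two entry points in step is a single task table that both the scheduler and the tool read from. The names below (TASKS, isTaskName, runTask) and the stub task bodies are assumptions for illustration; the guide's real tasks take the Deps object.

```typescript
const calls: string[] = []; // records invocations so the sketch is observable

// Single source of truth: name, schedule, and implementation live together.
const TASKS = {
  registry_sync:   { schedule: '*/5 * * * *', run: async () => { calls.push('registry_sync'); } },
  cache_warm:      { schedule: '0 * * * *',   run: async () => { calls.push('cache_warm'); } },
  session_cleanup: { schedule: '0 2 * * *',   run: async () => { calls.push('session_cleanup'); } },
};

type TaskName = keyof typeof TASKS;

// Own-property check, so inherited names such as 'toString' are rejected.
function isTaskName(name: string): name is TaskName {
  return Object.prototype.hasOwnProperty.call(TASKS, name);
}

// Entry point for an on-demand trigger; a scheduler would instead iterate
// over TASKS and hand each `run` to its cron library with the matching schedule.
function runTask(name: string): Promise<void> {
  if (!isTaskName(name)) return Promise.reject(new Error(`unknown task: ${name}`));
  return TASKS[name].run();
}
```

With this layout, adding a task means adding one table entry, and the list of triggerable names can be derived from Object.keys(TASKS) so it does not have to be maintained by hand.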

Monitoring scheduled task health

AliveMCP probes the MCP protocol (initialize + tools/list) to confirm your server is up. It cannot see whether scheduled tasks are running correctly — a task that fails silently leaves taskRecords updated with lastRunStatus: 'error' while the MCP server continues to respond normally to the protocol probe. Add a health_check tool that surfaces task health:

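// Maximum tolerated age of the last run: roughly 2× each task's cron interval
const MAX_STALENESS_MS: Record<string, number> = {
  registry_sync:   10 * 60_000,            // runs every 5 min
  cache_warm:      2 * 60 * 60_000,        // runs hourly
  session_cleanup: 2 * 24 * 60 * 60_000,   // runs daily
  health_snapshot: 30 * 60_000,            // runs every 15 min
};

server.tool('health_check', {}, async () => {
  const taskHealth = Array.from(taskRecords.values()).map(rec => {
    const staleness = rec.lastRunAt
      ? Date.now() - rec.lastRunAt.getTime()
      : Infinity;

    // Unhealthy if the last run failed, the task has never run, or the last run
    // is older than 2× the task's interval (10 min fallback for unlisted tasks)
    const maxStaleness = MAX_STALENESS_MS[rec.name] ?? 600_000;
    const ok = rec.lastRunStatus === 'ok' && staleness < maxStaleness;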

    return {
      name: rec.name,
      ok,
      last_run_at: rec.lastRunAt?.toISOString() ?? null,
      last_run_status: rec.lastRunStatus,
      last_run_error: rec.lastRunError,
      staleness_ms: Number.isFinite(staleness) ? staleness : null,
    };
  });

  const healthy = taskHealth.every(t => t.ok);

  return {
    content: [{ type: 'text', text: JSON.stringify({ healthy, tasks: taskHealth }) }],
    isError: !healthy,
  };
});

Configure a synthetic monitor or AliveMCP's custom probe to call health_check every 5–15 minutes. A task that fails consistently — upstream API down, database constraint violation — shows up as isError: true in the health check tool response, which the monitoring tool surfaces as an alert. This is the same pattern as the message queue health_check tool for consumer monitoring.
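On the monitoring side, the payload is plain JSON inside the tool result's text content. The following is a minimal sketch of how a synthetic monitor might turn that result into alert lines; the ToolResult and TaskHealth types mirror the shape returned by the health_check tool above, and failingTasks is a hypothetical helper name.

```typescript
interface ToolResult {
  content: { type: string; text: string }[];
  isError?: boolean;
}

interface TaskHealth {
  name: string;
  ok: boolean;
  last_run_status: 'ok' | 'error' | null;
  last_run_error: string | null;
}

// Returns one human-readable line per unhealthy task; an empty array means all clear.
function failingTasks(result: ToolResult): string[] {
  const first = result.content[0];
  if (!first || first.type !== 'text') return ['health_check returned no text content'];
  const payload = JSON.parse(first.text) as { healthy: boolean; tasks: TaskHealth[] };
  return payload.tasks
    .filter((t) => !t.ok)
    .map((t) => {
      // Prefer the recorded error; otherwise explain why an error-free task is unhealthy.
      const reason = t.last_run_error
        ?? (t.last_run_status === null ? 'never ran' : t.last_run_status === 'ok' ? 'stale' : 'failed');
      return `${t.name}: ${reason}`;
    });
}

const sample: ToolResult = {
  isError: true,
  content: [{ type: 'text', text: JSON.stringify({
    healthy: false,
    tasks: [
      { name: 'registry_sync', ok: false, last_run_status: 'error', last_run_error: 'upstream 503' },
      { name: 'cache_warm', ok: true, last_run_status: 'ok', last_run_error: null },
    ],
  }) }],
};

const alerts = failingTasks(sample);
// alerts holds a single entry: 'registry_sync: upstream 503'
```

Including the task name and reason in the alert saves a round trip to the logs when the monitor pages someone.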

Graceful shutdown and in-flight tasks

When SIGTERM arrives, stop the cron scheduler before draining MCP sessions. A task that runs after the database connection pool is closed will crash noisily:

// server.ts — graceful shutdown with scheduler
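import type { Server } from 'node:http';
import cron from 'node-cron';
import type { Deps } from './deps.js';

// Example value; size it to the longest expected task run plus a buffer
const DRAIN_TIMEOUT_MS = 30_000;
// Module-level flag; other code can read it to refuse new work during shutdown
let isShuttingDown = false;

async function shutdown(deps: Deps, httpServer: Server) {
  if (isShuttingDown) return; // ignore repeated signals
  isShuttingDown = true;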

  // 1. Stop accepting new cron fires immediately
  cron.getTasks().forEach(task => task.stop());

  // 2. Stop accepting new HTTP connections
  httpServer.close();

  // 3. Wait for in-flight MCP sessions and any running tasks to complete
  await new Promise(r => setTimeout(r, DRAIN_TIMEOUT_MS));

  // 4. Close infrastructure
  await deps.db.end();
  await deps.cache.quit();

  process.exit(0);
}

process.on('SIGTERM', () => shutdown(deps, httpServer));

The cron.getTasks().forEach(task => task.stop()) call prevents new task fires during the drain window. Tasks already in progress at SIGTERM time will continue until completion (or until the drain timeout elapses). The drain window should be sized to the maximum expected task duration plus a buffer — the same sizing rule as for graceful shutdown of MCP sessions.
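A fixed sleep always waits the full drain window, even when nothing is running. One alternative is to track in-flight task promises and wait for them with the timeout as a cap. This is a sketch of that variation, not the shutdown code shown above; InFlightTracker and its methods are hypothetical names.

```typescript
class InFlightTracker {
  private running = new Set<Promise<unknown>>();

  // Wrap a task invocation so its promise is tracked until it settles.
  track<T>(work: Promise<T>): Promise<T> {
    this.running.add(work);
    const forget = () => { this.running.delete(work); };
    work.then(forget, forget);
    return work;
  }

  get count(): number {
    return this.running.size;
  }

  // Resolves true once all tracked work has settled, or false if timeoutMs elapses first.
  async drain(timeoutMs: number): Promise<boolean> {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<boolean>((resolve) => {
      timer = setTimeout(() => resolve(false), timeoutMs);
    });
    // Swallow rejections here; a failed task still counts as finished for draining purposes.
    const settled = Promise.all(
      Array.from(this.running).map((p) => p.then(() => undefined, () => undefined)),
    ).then(() => true);
    try {
      return await Promise.race([settled, timeout]);
    } finally {
      clearTimeout(timer); // avoid keeping the process alive after an early finish
    }
  }
}
```

Under this approach the scheduler would wrap each task invocation in tracker.track(...), and the shutdown sequence would await tracker.drain(DRAIN_TIMEOUT_MS) in place of the fixed timer. It only covers scheduled tasks; in-flight MCP sessions still need their own drain logic.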

Related questions

Should scheduled tasks run in the same process as the MCP server or in a separate worker?

Same process for I/O-bound tasks with short (<30s) run times. Separate process or container for CPU-intensive tasks, long-running tasks, or tasks that need to scale independently of the MCP server. The cleanest architecture for complex scheduled work is to have the cron trigger enqueue a job to BullMQ and have a separate worker process handle it — see the message queue guide. This decouples task execution from the MCP server's event loop and lets you scale workers and servers independently.

How does Kubernetes CronJob compare to in-process scheduling?

A Kubernetes CronJob creates a Job, and with it a pod, on each scheduled run. The pod exits when the task completes, and finished Jobs are cleaned up according to the CronJob's history limits. This is the cleanest separation of concerns: task code is isolated, failures don't affect the MCP server, and the pod's logs are clearly separate. The tradeoff is that each CronJob pod starts cold (pulling the image if it is not already cached on the node, then initializing deps), which adds overhead for frequent tasks. In-process scheduling has no cold start and can reuse existing connections from the Deps object. For tasks running every few minutes, in-process is usually simpler; for tasks running hourly or less often, a Kubernetes CronJob is worth the isolation.

Can I use Bun's built-in cron instead of node-cron?

Possibly, if your MCP server runs on Bun. Bun.cron(schedule, fn) is meant to fill the same role as cron.schedule(schedule, fn), but check the Bun documentation for your version to confirm that the API is available and what its semantics are before relying on it. The patterns in this guide (leader lock, task records, health_check tool) apply regardless of the cron library. Bun also has built-in SQLite, which simplifies the single-process queue pattern from the message queue guide if you prefer SQLite over Redis.

Does AliveMCP alert me when a scheduled task fails?

Only indirectly. AliveMCP's standard probe confirms the MCP server is responding to initialize + tools/list — a failing cron task does not affect that probe. To get alerts on task failures, expose a health_check tool that returns isError: true when a task has failed, and configure AliveMCP's custom probe or an external synthetic monitor to call it periodically. This closes the monitoring gap between "MCP server is up" and "background tasks are healthy."

Further reading