MCP server scheduled tasks

Most non-trivial MCP servers need background work that runs on a schedule: syncing tool data from upstream registries, warming caches before peak traffic, purging stale session records, generating periodic health snapshots. The question is where that scheduled work lives. Running cron jobs separately (a dedicated container, a Kubernetes CronJob, a cloud scheduler) is operationally cleaner but adds deployment complexity. Running cron alongside the MCP server in the same process is simpler — one deployment artifact — but requires care to avoid interfering with the responsiveness of the MCP request handler.

TL;DR

Use node-cron to schedule tasks in the same process as your MCP server. Start the scheduler after createDeps() completes and before app.listen(). For CPU-intensive tasks, offload to a worker thread or a message queue. In a load-balanced cluster, use a Redis-based leader lock so only one replica executes each scheduled task. Make tasks idempotent, so that a duplicate run is harmless when the lock fails to prevent one. Expose tasks as MCP tools with a trigger_task pattern so they can be run on demand. Add a health_check tool that reports the last run time and status for each task, and configure AliveMCP or a synthetic monitor to call it.

In-process scheduling with node-cron

node-cron runs cron expressions in the Node.js event loop. Install it and start tasks during server initialization, after createDeps() has completed:

// scheduler.ts
import cron from 'node-cron';
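import type { Deps } from './deps.js';
// Task implementations; the path mirrors the '../tasks/index.js' import used in tools/admin.ts
import { syncRegistry, warmCache, cleanupSessions, snapshotHealth } from './tasks/index.js';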

interface TaskRecord {
  name: string;
  lastRunAt: Date | null;
  lastRunStatus: 'ok' | 'error' | null;
  lastRunError: string | null;
}

const taskRecords = new Map<string, TaskRecord>();

function registerTask(name: string, schedule: string, fn: (deps: Deps) => Promise<void>, deps: Deps) {
  taskRecords.set(name, { name, lastRunAt: null, lastRunStatus: null, lastRunError: null });

  cron.schedule(schedule, async () => {
    const start = Date.now();
    try {
      await fn(deps);
      const rec = taskRecords.get(name)!;
      rec.lastRunAt = new Date();
      rec.lastRunStatus = 'ok';
      rec.lastRunError = null;
      deps.logger.info('scheduled_task_ok', { task: name, duration_ms: Date.now() - start });
    } catch (err: any) {
      const rec = taskRecords.get(name)!;
      rec.lastRunAt = new Date();
      rec.lastRunStatus = 'error';
      rec.lastRunError = err.message;
      deps.logger.error('scheduled_task_error', { task: name, error: err.message, duration_ms: Date.now() - start });
    }
  });
}

export function startScheduler(deps: Deps): Map<string, TaskRecord> {
  registerTask('registry_sync',      '*/5 * * * *',   syncRegistry,    deps); // every 5 min
  registerTask('cache_warm',         '0 * * * *',     warmCache,       deps); // every hour
  registerTask('session_cleanup',    '0 2 * * *',     cleanupSessions, deps); // daily at 2am
  registerTask('health_snapshot',    '*/15 * * * *',  snapshotHealth,  deps); // every 15 min
  return taskRecords;
}

export { taskRecords };

Each call to registerTask also writes to taskRecords, which tracks the last run timestamp and status. The server entry point calls startScheduler(deps) after createDeps() returns, before app.listen():

// server.ts
async function main() {
  const deps = await createDeps();
  startScheduler(deps);                // start tasks before accepting traffic

  const app = createApp(deps);
  app.listen(3000);
}

Tasks run concurrently with tool calls via the event loop. I/O-bound tasks (database queries, HTTP calls) do not block the event loop and are safe here. CPU-intensive tasks (compression, hashing large datasets) should be offloaded to a worker thread using worker_threads or queued via BullMQ so they don't delay MCP request processing.
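As a rough illustration of the worker-thread option, the sketch below hashes a payload off the main thread. The helper name hashInWorker and the inline worker source are assumptions made for this example and are not part of the server shown above; a real task would more likely load a separate worker file.

```typescript
import { Worker } from 'node:worker_threads';

// Worker body as inline CommonJS source so the sketch fits in one file (illustrative only).
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  const { createHash } = require('node:crypto');
  // The CPU-heavy hashing happens here, off the main event loop.
  parentPort.postMessage(createHash('sha256').update(workerData).digest('hex'));
`;

// Hypothetical helper: resolves with the hex SHA-256 of `payload`, computed in a worker thread.
export function hashInWorker(payload: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: payload });
    worker.once('message', (digest: string) => resolve(digest));
    worker.once('error', reject);
    worker.once('exit', (code) => {
      // A non-zero exit before any message means the worker failed.
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}
```

A scheduled task would await hashInWorker(...) like any other I/O call, so the event loop stays free to serve MCP requests while the worker computes.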

Leader election across replicas

In a load-balanced cluster with multiple MCP server replicas, every replica runs the same node-cron schedule. Without coordination, every replica runs registry_sync every five minutes: with three replicas, the upstream API receives three identical sync requests at the same moment. For some tasks this is harmless (idempotent reads); for others (write operations, expensive API calls with rate limits) it causes duplicate work or quota exhaustion.

Redis-based leader election solves this: the first replica to acquire a lock at schedule time executes the task, the others skip it:

// leader-lock.ts
import { Redis } from 'ioredis';
import { hostname } from 'node:os';

const INSTANCE_ID = `${hostname()}-${process.pid}`;

export async function withLeaderLock(
  redis: Redis,
  lockKey: string,
  ttlSeconds: number,
  fn: () => Promise<void>
): Promise<boolean> {
  // SET lock NX EX ttl — atomic, only one replica succeeds
  const acquired = await redis.set(
    `leader-lock:${lockKey}`,
    INSTANCE_ID,
    'EX', ttlSeconds,
    'NX'
  );

  if (!acquired) return false; // another replica holds the lock

  try {
    await fn();
    return true;
  } finally {
    // Only release if we still hold the lock (guard against TTL expiry mid-task)
    const script = `
      if redis.call("get", KEYS[1]) == ARGV[1] then
        return redis.call("del", KEYS[1])
      else
        return 0
      end
    `;
    await redis.eval(script, 1, `leader-lock:${lockKey}`, INSTANCE_ID);
  }
}

// Usage in scheduler
registerTask('registry_sync', '*/5 * * * *', async (deps) => {
  const executed = await withLeaderLock(deps.cache, 'registry_sync', 240, async () => {
    await syncRegistry(deps);
  });
  if (!executed) deps.logger.info('scheduled_task_skipped', { task: 'registry_sync', reason: 'not_leader' });
}, deps);

The TTL is 240 seconds (4 minutes), slightly shorter than the 5-minute cron interval, so the lock has always expired by the next scheduled run, even if the process is killed mid-task and the finally block never executes. The Lua script makes the release a compare-and-delete: if the task outlives the TTL and another replica has since acquired the lock, this instance's late release leaves the other replica's lock in place instead of deleting it.
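To make those two properties concrete, here is a small in-memory model of the lock's semantics. It is a sketch for reasoning and unit tests only: FakeLockStore and its method names are invented for this example and do not talk to Redis, and the clock is injected so expiry can be simulated without sleeping.

```typescript
// In-memory stand-in for the SET NX EX + compare-and-delete pattern (hypothetical, not a Redis client).
export class FakeLockStore {
  private locks = new Map<string, { owner: string; expiresAt: number }>();
  private readonly now: () => number;

  // `now` is injected so a test can advance time by hand.
  constructor(now: () => number) {
    this.now = now;
  }

  // Mirrors SET key owner EX ttl NX: succeeds only if no unexpired lock exists.
  acquire(key: string, owner: string, ttlSeconds: number): boolean {
    const current = this.locks.get(key);
    if (current && current.expiresAt > this.now()) return false;
    this.locks.set(key, { owner, expiresAt: this.now() + ttlSeconds * 1000 });
    return true;
  }

  // Mirrors the Lua script: delete only if the stored owner matches the caller.
  release(key: string, owner: string): boolean {
    const current = this.locks.get(key);
    if (!current || current.expiresAt <= this.now() || current.owner !== owner) return false;
    this.locks.delete(key);
    return true;
  }

  // Current unexpired owner, or null.
  holder(key: string): string | null {
    const current = this.locks.get(key);
    return current && current.expiresAt > this.now() ? current.owner : null;
  }
}

// Scenario: replica A overruns the 240 s TTL, replica B takes over at the next tick.
let clock = 0;
const store = new FakeLockStore(() => clock);

export const aAcquired = store.acquire('registry_sync', 'replica-a', 240); // true
export const bBlocked = store.acquire('registry_sync', 'replica-b', 240);  // false: A holds the lock
clock = 300_000;                                                           // 5 minutes later
export const bAcquired = store.acquire('registry_sync', 'replica-b', 240); // true: A's lock expired
export const aLateRelease = store.release('registry_sync', 'replica-a');   // false: A no longer owns it
export const holderAfter = store.holder('registry_sync');                  // 'replica-b'
```

Without the owner check in release, replica A's late cleanup would delete replica B's lock and open the door to a third concurrent run.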

Task idempotency requirements

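Leader election reduces duplicate runs; it does not eliminate them. Distributed locking has edge cases: clock skew, network partitions, and Redis replication lag can all cause two replicas to believe they hold the lock simultaneously. Because the lock is released as soon as the task finishes, a replica whose timer fires slightly after a fast run has completed can also acquire the lock and run the task again. Every scheduled task must therefore be idempotent, meaning safe to run twice with the same outcome: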

Registry sync: upsert rows by primary key, not insert. A duplicate sync updates the same rows to the same values — no side effects.
Cache warm: overwriting an existing cache key is idempotent. Use SET key value EX ttl, not SET key value NX (which fails on duplicate, silently leaving a stale value).
Session cleanup: DELETE FROM sessions WHERE expires_at < NOW() is idempotent — running it twice deletes the same rows (or finds nothing on the second run).
External API writes: include an idempotency key derived from the task type and scheduled time. POST /reports with Idempotency-Key: registry_sync_2026-06-02T14:00 — the upstream deduplicates and returns the same response on retry.
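
One way to derive such a key is to floor the fire time to the task's interval, so replicas whose timers fire a few seconds apart still produce the same string. The helper below is an illustrative sketch: idempotencyKey and its parameters are names chosen for this example, and the key format simply copies the one shown above.

```typescript
// Hypothetical helper: builds a key like "registry_sync_2026-06-02T14:00" (UTC).
export function idempotencyKey(task: string, firedAt: Date, intervalMinutes: number): string {
  const intervalMs = intervalMinutes * 60_000;
  // Floor to the start of the current interval so small timer drift does not change the key.
  const slotStart = new Date(Math.floor(firedAt.getTime() / intervalMs) * intervalMs);
  // toISOString() is always UTC; keep "YYYY-MM-DDTHH:MM" and drop seconds and milliseconds.
  return `${task}_${slotStart.toISOString().slice(0, 16)}`;
}

// Two replicas firing about 3 seconds apart within the same 5-minute slot get the same key.
export const keyA = idempotencyKey('registry_sync', new Date('2026-06-02T14:00:00.050Z'), 5);
export const keyB = idempotencyKey('registry_sync', new Date('2026-06-02T14:00:03.120Z'), 5);
```

The slots are aligned to UTC epoch boundaries, which suits minute and hour intervals; a daily task that fires at 2am would need a different flooring rule. Whether the upstream honors the header at all depends on its API.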

Exposing tasks as triggerable tools

Scheduled tasks that are difficult to test by waiting for the schedule can be exposed as MCP tools that trigger them on-demand. This also lets agent workflows invoke tasks manually:

// tools/admin.ts — trigger tasks manually via MCP
import { z } from 'zod';
import type { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import type { Deps } from '../deps.js';
import { syncRegistry, warmCache, cleanupSessions } from '../tasks/index.js';

export function registerAdminTools(server: McpServer, deps: Deps) {
  server.tool(
    'trigger_task',
    {
      task_name: z.enum(['registry_sync', 'cache_warm', 'session_cleanup']),
      reason: z.string().optional(),
    },
    async ({ task_name, reason }) => {
      const start = Date.now();
      try {
        const task = { registry_sync: syncRegistry, cache_warm: warmCache, session_cleanup: cleanupSessions }[task_name];
        await task(deps);
        deps.logger.info('manual_task_trigger', { task: task_name, reason, duration_ms: Date.now() - start });
        return { content: [{ type: 'text', text: JSON.stringify({ ok: true, task: task_name, duration_ms: Date.now() - start }) }] };
      } catch (err: any) {
        return { content: [{ type: 'text', text: JSON.stringify({ ok: false, task: task_name, error: err.message }) }], isError: true };
      }
    }
  );
}

The task implementation (syncRegistry, etc.) is the same function called by both the cron schedule and the tool. No duplication — the scheduling and the triggering are separate concerns layered on top of the same task function.
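
One way to keep the schedule and the tool from drifting apart is a single task table that both read from. The sketch below is an assumption about how that could look, not the layout used above: TASKS, TaskName and runTask are invented names, Deps is reduced to a placeholder type, and the task bodies are stubs.

```typescript
// Placeholder for the real Deps type from deps.ts (assumption for this sketch).
type Deps = { log: string[] };

export type TaskName = 'registry_sync' | 'cache_warm' | 'session_cleanup';

interface TaskDef {
  schedule: string;                     // cron expression used by the scheduler
  run: (deps: Deps) => Promise<void>;   // the one implementation shared by cron and tool
}

// Single source of truth: name -> schedule + implementation (stub bodies for illustration).
export const TASKS: Record<TaskName, TaskDef> = {
  registry_sync:   { schedule: '*/5 * * * *', run: async (d) => { d.log.push('registry_sync'); } },
  cache_warm:      { schedule: '0 * * * *',   run: async (d) => { d.log.push('cache_warm'); } },
  session_cleanup: { schedule: '0 2 * * *',   run: async (d) => { d.log.push('session_cleanup'); } },
};

// The tool's enum and the scheduler's registration loop can both be built from this list.
export const taskNames = Object.keys(TASKS) as TaskName[];

// Both the cron callback and the trigger_task handler would call this.
export async function runTask(name: TaskName, deps: Deps): Promise<void> {
  await TASKS[name].run(deps);
}
```

With a table like this, adding a task in one place makes it both scheduled and triggerable, and a name that exists in the schedule but not in the tool's enum becomes a type error rather than a runtime surprise.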

Monitoring scheduled task health

AliveMCP probes the MCP protocol (initialize + tools/list) to confirm your server is up. It cannot see whether scheduled tasks are running correctly — a task that fails silently leaves taskRecords updated with lastRunStatus: 'error' while the MCP server continues to respond normally to the protocol probe. Add a health_check tool that surfaces task health:

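// Maximum tolerated staleness per task: 2× its cron interval
const MAX_STALENESS_MS: Record<string, number> = {
  registry_sync:   2 * 5 * 60_000,        // runs every 5 min
  cache_warm:      2 * 60 * 60_000,       // runs every hour
  session_cleanup: 2 * 24 * 60 * 60_000,  // runs daily
  health_snapshot: 2 * 15 * 60_000,       // runs every 15 min
};

server.tool('health_check', {}, async () => {
  const taskHealth = Array.from(taskRecords.values()).map(rec => {
    const staleness = rec.lastRunAt
      ? Date.now() - rec.lastRunAt.getTime()
      : Infinity;

    // Unhealthy if the last run errored, the task has not run yet, or it is stale beyond 2× its interval
    const ok = rec.lastRunStatus === 'ok' && staleness < (MAX_STALENESS_MS[rec.name] ?? 600_000);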

    return {
      name: rec.name,
      ok,
      last_run_at: rec.lastRunAt?.toISOString() ?? null,
      last_run_status: rec.lastRunStatus,
      last_run_error: rec.lastRunError,
      staleness_ms: Number.isFinite(staleness) ? staleness : null,
    };
  });

  const healthy = taskHealth.every(t => t.ok);

  return {
    content: [{ type: 'text', text: JSON.stringify({ healthy, tasks: taskHealth }) }],
    isError: !healthy,
  };
});

Configure a synthetic monitor or AliveMCP's custom probe to call health_check every 5–15 minutes. A task that fails consistently (upstream API down, database constraint violation) shows up as isError: true in the health_check response, which the monitoring tool surfaces as an alert. Because taskRecords lives in memory, every task reports ok: false after a restart until its first run completes, so allow for that when setting alert thresholds. This is the same pattern as the message queue health_check tool for consumer monitoring.
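
On the monitoring side, the check can be as simple as parsing the JSON text that health_check returns and listing the tasks that are not ok. The sketch below assumes the response shape produced by the tool above; failingTasks is a hypothetical helper name, and how the monitor actually calls the tool is left out.

```typescript
// Subset of the fields emitted by the health_check tool above.
interface TaskHealth {
  name: string;
  ok: boolean;
  last_run_status: 'ok' | 'error' | null;
  last_run_error: string | null;
}

interface HealthPayload {
  healthy: boolean;
  tasks: TaskHealth[];
}

// Returns one human-readable line per unhealthy task; an empty array means all tasks are ok.
export function failingTasks(resultText: string): string[] {
  const payload = JSON.parse(resultText) as HealthPayload;
  return payload.tasks
    .filter(t => !t.ok)
    .map(t => {
      if (t.last_run_status === 'error') return `${t.name}: ${t.last_run_error ?? 'error'}`;
      if (t.last_run_status === null) return `${t.name}: has not run yet`;
      return `${t.name}: last run is stale`; // status was ok, so staleness tripped the check
    });
}
```

Distinguishing "errored", "never ran" and "stale" in the alert text saves a round trip to the logs when the alert fires.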

Graceful shutdown and in-flight tasks

When SIGTERM arrives, stop the cron scheduler before draining MCP sessions. A task that runs after the database connection pool is closed will crash noisily:

// server.ts — graceful shutdown with scheduler
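import type * as http from 'node:http';
import cron from 'node-cron';
import type { Deps } from './deps.js';

// isShuttingDown and DRAIN_TIMEOUT_MS are assumed to be declared elsewhere in server.ts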

async function shutdown(deps: Deps, httpServer: http.Server) {
  isShuttingDown = true;

  // 1. Stop accepting new cron fires immediately
  cron.getTasks().forEach(task => task.stop());

  // 2. Stop accepting new HTTP connections
  httpServer.close();

  // 3. Give in-flight MCP sessions and running tasks DRAIN_TIMEOUT_MS to finish (a fixed wait, not completion tracking)
  await new Promise(r => setTimeout(r, DRAIN_TIMEOUT_MS));

  // 4. Close infrastructure
  await deps.db.end();
  await deps.cache.quit();

  process.exit(0);
}

process.on('SIGTERM', () => shutdown(deps, httpServer));

The cron.getTasks().forEach(task => task.stop()) call prevents new task fires during the drain window. Tasks already in progress at SIGTERM time will continue until completion (or until the drain timeout elapses). The drain window should be sized to the maximum expected task duration plus a buffer — the same sizing rule as for graceful shutdown of MCP sessions.
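
A fixed sleep always waits the full timeout, even when nothing is running. An alternative is to track running tasks and stop waiting as soon as they settle, with the timeout as an upper bound. This is a sketch under assumptions: track and drain are hypothetical helpers, and the cron callback in registerTask would need to wrap each run in track() for it to have any effect.

```typescript
// Promises for task runs that have started but not yet settled.
const inFlight = new Set<Promise<unknown>>();

// Wrap a task run so shutdown can see it while it is pending.
export function track<T>(run: Promise<T>): Promise<T> {
  inFlight.add(run);
  const forget = () => { inFlight.delete(run); };
  run.then(forget, forget); // remove on success or failure
  return run;
}

// Resolves true if every tracked run settled in time, false if the timeout won.
export async function drain(timeoutMs: number): Promise<boolean> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<boolean>(resolve => {
    timer = setTimeout(() => resolve(false), timeoutMs);
  });
  // Snapshot the set; the scheduler is already stopped, so no new runs should appear.
  const settled = Promise.allSettled([...inFlight]).then(() => true);
  const result = await Promise.race([settled, timeout]);
  if (timer !== undefined) clearTimeout(timer); // do not keep the process alive for the timer
  return result;
}
```

In shutdown(), step 3 could then become await drain(DRAIN_TIMEOUT_MS), with a log line when it returns false so that truncated runs are visible.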