
MCP server message queue

Tool calls in an MCP server are synchronous from the client's perspective — the client sends a request and waits for a response. For fast tools (database queries, API calls, calculations) this is fine; the response arrives in milliseconds. For slow tools (video transcoding, large-batch exports, long-running AI tasks that can take minutes) blocking the tool call until completion makes the client wait, ties up the MCP session, and risks hitting transport timeouts. Message queues decouple the trigger (the tool call) from the execution (the background worker), letting the tool return a job ID immediately while the work continues asynchronously.

TL;DR

For long-running tasks, the trigger tool enqueues the job and returns a job_id immediately instead of waiting. A separate status tool (get_export_status in the examples below) lets the client poll for the result. Use BullMQ for a battle-tested TypeScript queue over Redis, or SQLite for simpler single-process deployments. Create one queue connection and one worker at module scope, not per tool call. Add a health_check tool that pings the queue and returns consumer stats, and configure AliveMCP (or a synthetic monitor) to call it: AliveMCP's standard protocol probe only confirms that the MCP server is up and cannot see whether queue consumers are processing jobs.

When to use a message queue

Not every slow operation needs a queue. The decision tree:

Fire-and-return pattern with BullMQ

BullMQ is a Redis-backed job queue with TypeScript types and built-in retries; dashboards are available through separate tools such as Bull Board. Install bullmq and ioredis. Create the queue and worker at module scope, not inside tool handlers:

// queue.ts — module-scope queue and worker
import { Queue, Worker, Job } from 'bullmq';
import { Redis } from 'ioredis';
// ExportJobData, ExportJobResult, and runExport are application-defined; import them from your own export module.

const connection = new Redis(process.env.REDIS_URL!, { maxRetriesPerRequest: null });

export const exportQueue = new Queue('exports', { connection });

export const exportWorker = new Worker<ExportJobData, ExportJobResult>(
  'exports',
  async (job: Job<ExportJobData>) => {
    // The actual long-running work
    const result = await runExport(job.data);
    return result;
  },
  {
    connection,
    concurrency: 3,        // process up to 3 jobs at a time
    removeOnComplete: { count: 1000 },
    removeOnFail: { count: 200 },
  }
);

exportWorker.on('failed', (job, err) => {
  console.error(JSON.stringify({ event: 'job_failed', jobId: job?.id, error: err.message }));
});

The connection is created once at module scope and shared between the Queue (for enqueuing) and the Worker (for processing). BullMQ requires maxRetriesPerRequest: null on a worker's Redis connection because workers rely on blocking Redis commands, which must not be cut short by ioredis's per-request retry limit. Creating a new connection inside each tool call would open a Redis connection on every request and quickly exhaust file descriptors under load.

The tool handler enqueues and returns the job ID:

// tools/export.ts
import { z } from 'zod';
import type { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import type { Deps } from '../deps.js';
import { exportQueue } from '../queue.js';

export function registerExportTools(server: McpServer, deps: Deps) {
  server.tool(
    'start_export',
    { format: z.enum(['csv', 'json', 'xlsx']), filters: z.record(z.string()).optional() },
    async ({ format, filters }) => {
      const job = await exportQueue.add('export', { format, filters, requestedAt: new Date().toISOString() }, {
        attempts: 3,
        backoff: { type: 'exponential', delay: 5000 },
      });

      return {
        content: [{
          type: 'text',
          text: JSON.stringify({ job_id: job.id, status: 'queued', message: 'Export started. Use get_export_status to check progress.' }),
        }],
      };
    }
  );

  server.tool(
    'get_export_status',
    { job_id: z.string() },
    async ({ job_id }) => {
      const job = await exportQueue.getJob(job_id);
      if (!job) {
        return { content: [{ type: 'text', text: JSON.stringify({ error: 'Job not found' }) }], isError: true };
      }

      const state = await job.getState();  // e.g. 'waiting' | 'delayed' | 'active' | 'completed' | 'failed'
      const result = state === 'completed' ? job.returnvalue : null;  // returnvalue is a plain property, no await needed
      const failReason = state === 'failed' ? job.failedReason : null;

      return {
        content: [{
          type: 'text',
          text: JSON.stringify({ job_id, state, result, fail_reason: failReason }),
        }],
      };
    }
  );
}

The client calls start_export once and gets a job_id. It then calls get_export_status at intervals until the state is completed or failed. No long-running blocking tool call, no transport timeout.
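A client-side polling loop for this pattern might look like the sketch below. The callTool parameter is a hypothetical stand-in for however your MCP client invokes a tool and parses the JSON text it returns, and the default interval and deadline are illustrative assumptions rather than recommended values.

```typescript
// Assumed shape of the parsed JSON returned by get_export_status.
interface ExportStatus {
  job_id: string;
  state: string;
  result: unknown;
  fail_reason: string | null;
}

// Stand-in for your MCP client's tool-call helper (an assumption, not an SDK API).
type CallTool = (name: string, args: Record<string, unknown>) => Promise<ExportStatus>;

const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

// Polls get_export_status until the job reaches a terminal state or the deadline passes.
async function waitForExport(
  callTool: CallTool,
  jobId: string,
  { intervalMs = 2000, timeoutMs = 10 * 60 * 1000 } = {},
): Promise<ExportStatus> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const status = await callTool('get_export_status', { job_id: jobId });
    if (status.state === 'completed' || status.state === 'failed') return status;
    // Give up rather than sleeping past the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Export ${jobId} still ${status.state} after ${timeoutMs} ms`);
    }
    await sleep(intervalMs);
  }
}
```

Any state other than completed or failed (waiting, delayed, active) is treated as "keep polling", so the loop does not need to know every intermediate state the queue can report.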

SQLite-backed queue for simpler deployments

If Redis is not in your stack, a SQLite-backed queue works well for single-process or low-throughput deployments:

// simple-queue.ts — SQLite-backed job queue using better-sqlite3
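import Database from 'better-sqlite3';
import * as crypto from 'node:crypto';  // explicit import so crypto.randomUUID() does not depend on a global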

export class SimpleQueue {
  private db: Database.Database;
  private poll: NodeJS.Timeout | null = null;

  constructor(private handler: (jobId: string, data: unknown) => Promise<unknown>) {
    this.db = new Database('./data.db');
    this.db.exec(`
      CREATE TABLE IF NOT EXISTS jobs (
        id TEXT PRIMARY KEY,
        queue TEXT NOT NULL,
        data TEXT NOT NULL,
        status TEXT NOT NULL DEFAULT 'pending',
        result TEXT,
        error TEXT,
        attempts INTEGER NOT NULL DEFAULT 0,
        created_at INTEGER NOT NULL DEFAULT (unixepoch()),
        updated_at INTEGER NOT NULL DEFAULT (unixepoch())
      )
    `);
  }

  enqueue(queue: string, data: unknown): string {
    const id = crypto.randomUUID();
    this.db.prepare(
      'INSERT INTO jobs (id, queue, data) VALUES (?, ?, ?)'
    ).run(id, queue, JSON.stringify(data));
    return id;
  }

  getStatus(id: string): { status: string; result?: unknown; error?: string } | null {
    const row = this.db.prepare('SELECT status, result, error FROM jobs WHERE id = ?').get(id) as any;
    if (!row) return null;
    return { status: row.status, result: row.result ? JSON.parse(row.result) : undefined, error: row.error };
  }

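  start(queue: string, intervalMs = 1000, maxAttempts = 3): void {
    this.poll = setInterval(async () => {
      // Oldest pending job first. better-sqlite3 is synchronous, so the claim below cannot race with another tick.
      const job = this.db.prepare(
        'SELECT id, data, attempts FROM jobs WHERE queue = ? AND status = ? AND attempts < ? ORDER BY created_at, rowid LIMIT 1'
      ).get(queue, 'pending', maxAttempts) as any;

      if (!job) return;

      this.db.prepare('UPDATE jobs SET status = ?, attempts = attempts + 1, updated_at = unixepoch() WHERE id = ?')
        .run('active', job.id);

      try {
        const result = await this.handler(job.id, JSON.parse(job.data));
        // Store null when the handler returns undefined, since undefined cannot be bound as a parameter.
        this.db.prepare('UPDATE jobs SET status = ?, result = ?, updated_at = unixepoch() WHERE id = ?')
          .run('completed', JSON.stringify(result ?? null), job.id);
      } catch (err: any) {
        // Requeue until the attempt limit is reached; only then mark the job failed.
        const nextStatus = job.attempts + 1 < maxAttempts ? 'pending' : 'failed';
        this.db.prepare('UPDATE jobs SET status = ?, error = ?, updated_at = unixepoch() WHERE id = ?')
          .run(nextStatus, err.message, job.id);
      }
    }, intervalMs);
  }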

  stop(): void {
    if (this.poll) clearInterval(this.poll);
  }
}

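SQLite with better-sqlite3 can sustain hundreds of job writes per second with no external infrastructure, although the loop above claims only one job per poll tick, so lower intervalMs if you need more throughput. The tradeoffs versus BullMQ: no multi-process consumer fan-out (the worker runs in the same Node.js process as the MCP server), no dashboard, manual retry logic, and no automatic recovery for jobs left in the active state by a crash. For hobby MCP servers and small teams, this is often the right choice.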

Dead-letter queues and error handling

Jobs that fail all retry attempts need a destination: either delete them (losing the data) or move them to a dead-letter queue (DLQ) for inspection. BullMQ moves exhausted jobs to a failed state automatically, preserving the failure reason. Add a monitor:

// Monitor failed jobs and alert
exportWorker.on('failed', (job, err) => {
  if (job && job.attemptsMade >= (job.opts.attempts ?? 1)) {
    // Job has exhausted all attempts — alert
    deps.logger.error('job_dead_letter', {
      jobId: job.id,
      queue: 'exports',
      failedReason: err.message,
      data: job.data,
    });
    // Optionally: send to a webhook, PagerDuty, Slack, etc.
  }
});

Tools should surface DLQ status to clients. A get_export_status call for a failed job returns state: 'failed' with fail_reason — the client can decide whether to retry by calling start_export again or surface the error to the user.
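To estimate how long a job can spend retrying before it lands in the failed state, the sketch below computes the delay schedule for the attempts: 3 and exponential delay: 5000 options used by start_export. It assumes BullMQ's documented built-in exponential strategy, where the wait before a retry is 2^(attemptsMade - 1) × delay; retryDelays and isExhausted are hypothetical helper names, and the numbers ignore processing time and any jitter.

```typescript
// Delay before each retry under exponential backoff: 2^(attemptsMade - 1) * baseDelayMs.
// A job with `attempts` total tries is retried at most `attempts - 1` times.
function retryDelays(attempts: number, baseDelayMs: number): number[] {
  const delays: number[] = [];
  for (let attemptsMade = 1; attemptsMade < attempts; attemptsMade++) {
    delays.push(2 ** (attemptsMade - 1) * baseDelayMs);
  }
  return delays;
}

// Mirrors the terminal-failure check used in the 'failed' listener.
function isExhausted(attemptsMade: number, attempts = 1): boolean {
  return attemptsMade >= attempts;
}

const delays = retryDelays(3, 5000);                    // [5000, 10000]
const totalWaitMs = delays.reduce((a, b) => a + b, 0);  // 15000
```

Under these assumptions a job that fails every attempt waits about 15 seconds in total between tries, which is a useful lower bound when deciding how long a client should keep polling before treating a job as stuck.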

Monitoring queue health alongside MCP uptime

AliveMCP's standard probe confirms the MCP server is up and responding to initialize + tools/list. It cannot see whether queue consumers are processing jobs, whether the Redis connection is healthy, or whether the DLQ is filling up. Add a health_check tool that surfaces this:

server.tool('health_check', {}, async () => {
  const checks = await Promise.allSettled([
    // MCP server is up (trivially true — if this runs, the server is up)

    // Queue: can we reach Redis? (queue.client is a Promise that resolves to the ioredis client)
    exportQueue.client.then(client => client.ping()).then(() => ({ name: 'queue_redis', ok: true })),

    // Worker: is the worker connected?
    Promise.resolve({ name: 'worker_running', ok: !exportWorker.closing }),

    // DLQ depth: how many failed jobs?
    exportQueue.getFailedCount().then(count => ({
      name: 'dlq_depth', ok: count < 50, count
    })),

    // Active jobs: are consumers keeping up?
    exportQueue.getActiveCount().then(active =>
      exportQueue.getWaitingCount().then(waiting => ({
        name: 'queue_depth', ok: waiting < 1000, active, waiting
      }))
    ),
  ]);

  const results = checks.map((c, i) =>
    c.status === 'fulfilled' ? c.value : { name: `check_${i}`, ok: false, error: (c.reason as Error).message }
  );
  const allOk = results.every(r => r.ok);

  return {
    content: [{ type: 'text', text: JSON.stringify({ healthy: allOk, checks: results }) }],
    isError: !allOk,
  };
});

Configure a synthetic monitor (or AliveMCP's custom probe feature) to call health_check every few minutes. This gives you observability over the queue layer that the standard protocol probe can't provide.
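One limitation of the handler above is that a rejected check is reported under a positional name such as check_0, which makes the health report harder to read. A possible refinement, sketched below, keys each check by name so a failure keeps its label. The runChecks helper and its types are hypothetical and not part of BullMQ or the MCP SDK.

```typescript
interface CheckOutcome {
  ok: boolean;
  [detail: string]: unknown;
}

interface CheckResult extends CheckOutcome {
  name: string;
}

// Runs all named checks concurrently. A check that throws or rejects is
// reported as ok: false under its own name instead of failing the whole report.
async function runChecks(
  checks: Record<string, () => Promise<CheckOutcome>>,
): Promise<{ healthy: boolean; checks: CheckResult[] }> {
  const results = await Promise.all(
    Object.entries(checks).map(async ([name, check]): Promise<CheckResult> => {
      try {
        return { ...(await check()), name };
      } catch (err) {
        return { name, ok: false, error: err instanceof Error ? err.message : String(err) };
      }
    }),
  );
  return { healthy: results.every(r => r.ok), checks: results };
}
```

Inside health_check, each existing probe would become one named entry (queue_redis, worker_running, dlq_depth, queue_depth), and the returned object can be serialized into the tool response as before.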

Related questions

Should the BullMQ worker run in the same process as the MCP server?

For low-to-medium volume: yes, same process is fine. The worker runs concurrently via the event loop alongside the MCP request handler. The benefit is simpler deployment — one process, one container. The risk is that a CPU-intensive job blocks the event loop and delays MCP tool call responses. If your jobs are I/O-bound (fetching data, calling APIs), same-process is safe. If your jobs are CPU-bound (image processing, compression, ML inference), run the worker in a separate process or use Node.js worker threads.

Can clients receive job completion notifications via SSE instead of polling?

Yes. When a job completes, the worker can publish a notification event that the MCP server relays to the client over the SSE stream. This requires the worker and MCP server to share a pub/sub channel (Redis pub/sub, or an EventEmitter in same-process mode). The MCP server listens for completion events and sends a notification to push the result to the client; with the high-level McpServer class used in this guide, the notification() method lives on the underlying Server instance, exposed as server.server. This is more complex than polling but avoids repeated get_export_status calls.
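For same-process mode, the relay could be as small as the sketch below. All names here (jobEvents, JobDoneEvent, publishJobDone, relayCompletions) are hypothetical, and the notify callback stands in for whatever call your MCP server uses to push a notification to the client.

```typescript
import { EventEmitter } from 'node:events';

interface JobDoneEvent {
  jobId: string;
  state: 'completed' | 'failed';
  result?: unknown;
  failReason?: string;
}

// Shared in-process channel. With separate worker processes, Redis pub/sub would take its place.
const jobEvents = new EventEmitter();

// Worker side: call this from the worker's 'completed' and 'failed' listeners.
function publishJobDone(event: JobDoneEvent): void {
  jobEvents.emit('job_done', event);
}

// Server side: forward each event to the client. Returns an unsubscribe function
// to call when the client session ends.
function relayCompletions(notify: (event: JobDoneEvent) => void): () => void {
  const listener = (event: JobDoneEvent) => notify(event);
  jobEvents.on('job_done', listener);
  return () => { jobEvents.off('job_done', listener); };
}
```

A client that disconnects can miss an event, so keeping get_export_status available as a fallback is likely still worthwhile.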

How do I handle queue jobs across a load-balanced MCP cluster?

If you run multiple MCP server instances, the queue is shared (all instances enqueue to the same Redis queue), and the workers compete to process jobs (each job is picked up by exactly one worker). get_export_status reads from the shared queue, so any MCP instance can answer a status query regardless of which instance enqueued the job. The queue is the shared state that makes horizontal scaling work without sticky sessions for job status queries.

What's the difference between BullMQ and Temporal for MCP background jobs?

BullMQ is a Redis-backed job queue — simple, fast, no external infrastructure beyond Redis. Temporal is a workflow orchestration platform — durable execution across process restarts, multi-step workflows, activity retries with state preservation. For MCP servers, BullMQ covers 90% of use cases. Temporal is worth the operational complexity when jobs span multiple steps that each need individual retry semantics, or when jobs must survive process crashes mid-execution with state preserved.

Further reading