
MCP server message queue

Tool calls in an MCP server are synchronous from the client's perspective — the client sends a request and waits for a response. For fast tools (database queries, API calls, calculations) this is fine; the response arrives in milliseconds. For slow tools (video transcoding, large-batch exports, long-running AI tasks that can take minutes) blocking the tool call until completion makes the client wait, ties up the MCP session, and risks hitting transport timeouts. Message queues decouple the trigger (the tool call) from the execution (the background worker), letting the tool return a job ID immediately while the work continues asynchronously.

TL;DR

For long-running tasks, the trigger tool enqueues the job and returns a job_id immediately instead of waiting. A separate status tool lets the client poll for the result. Use BullMQ for a battle-tested TypeScript queue over Redis, or SQLite for simpler single-process deployments. Create one queue connection and one worker at module scope, not per tool call. Add a health_check tool that pings the queue and returns consumer stats, then configure AliveMCP (or a synthetic monitor) to call it: AliveMCP's standard protocol probe only confirms that the MCP server is up and cannot see whether queue consumers are processing jobs.

When to use a message queue

Not every slow operation needs a queue. The decision tree:

Under 30 seconds, deterministic latency: block and await. Return the result directly. Add a timeout with AbortSignal and return isError: true if it exceeds the budget.
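30 seconds to a few minutes, known completion time: consider keeping the tool call open and returning the result when it is ready. Over an HTTP transport this holds the connection open for the whole run (the MCP transport supports this), so check that every timeout on the path (client, proxy, load balancer) is longer than the expected run time. This suits AI inference, where the model streams its response.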
Minutes to hours, or bursty arrival rate: use a queue. The tool enqueues the job and returns immediately. The worker processes at its own pace. This handles back-pressure naturally — excess jobs queue up rather than overwhelming the worker pool.
Fan-out to multiple consumers: use a queue with multiple worker instances. With competing consumers, each job is processed by exactly one worker; this is how BullMQ workers behave. Delivering every message to every consumer is pub/sub, which depends on the queue system supporting those semantics.
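The first branch of the decision tree (block, but within a time budget) can be sketched as follows. This is a minimal illustration, not part of any SDK: runWithBudget and ToolResult are hypothetical names, and the work function is assumed to honor the AbortSignal it receives.

```typescript
// Hypothetical helper: races a tool's work against a time budget.
type ToolResult = {
  content: { type: 'text'; text: string }[];
  isError?: boolean;
};

async function runWithBudget(
  work: (signal: AbortSignal) => Promise<string>,
  budgetMs: number,
): Promise<ToolResult> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;

  // Rejects once the budget elapses and aborts the signal so `work` can stop early.
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort();
      reject(new Error(`Timed out after ${budgetMs} ms`));
    }, budgetMs);
  });

  try {
    const text = await Promise.race([work(controller.signal), timeout]);
    return { content: [{ type: 'text', text }] };
  } catch (err) {
    // Timeouts and ordinary failures both surface as a tool-level error.
    const message = err instanceof Error ? err.message : String(err);
    return { content: [{ type: 'text', text: message }], isError: true };
  } finally {
    clearTimeout(timer);
  }
}
```

A tool handler would pass its slow call as work and return the result unchanged. If timeouts become routine rather than exceptional, that is a sign the operation belongs in a queue.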

Fire-and-return pattern with BullMQ

BullMQ is a Redis-backed job queue with TypeScript types and built-in retries; dashboards are available as separate tools (Bull Board, for example). Install bullmq and ioredis. Create the queue and worker at module scope, not inside tool handlers:

// queue.ts — module-scope queue and worker
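import { Queue, Worker, Job } from 'bullmq';
import { Redis } from 'ioredis';
// Hypothetical module: supplies the job types and the function that does the real work.
import { runExport, type ExportJobData, type ExportJobResult } from './run-export.js';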

const connection = new Redis(process.env.REDIS_URL!, { maxRetriesPerRequest: null });

export const exportQueue = new Queue('exports', { connection });

export const exportWorker = new Worker<ExportJobData, ExportJobResult>(
  'exports',
  async (job: Job<ExportJobData>) => {
    // The actual long-running work
    const result = await runExport(job.data);
    return result;
  },
  {
    connection,
    concurrency: 3,        // process up to 3 jobs at a time
    removeOnComplete: { count: 1000 },
    removeOnFail: { count: 200 },
  }
);

exportWorker.on('failed', (job, err) => {
  console.error(JSON.stringify({ event: 'job_failed', jobId: job?.id, error: err.message }));
});

The connection is created once at module scope and shared between the Queue (for enqueuing) and the Worker (for processing). BullMQ requires maxRetriesPerRequest: null on a connection used by a Worker: the worker waits for jobs with blocking Redis commands, and ioredis would otherwise fail pending commands after a fixed number of retries (20 by default) whenever Redis is unreachable. Creating a new connection inside each tool call would open a Redis connection on every request and exhaust file descriptors quickly under load.

The tool handler enqueues and returns the job ID:

// tools/export.ts
import { z } from 'zod';
import type { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import type { Deps } from '../deps.js';
import { exportQueue } from '../queue.js';

export function registerExportTools(server: McpServer, deps: Deps) {
  server.tool(
    'start_export',
    { format: z.enum(['csv', 'json', 'xlsx']), filters: z.record(z.string()).optional() },
    async ({ format, filters }) => {
      const job = await exportQueue.add('export', { format, filters, requestedAt: new Date().toISOString() }, {
        attempts: 3,
        backoff: { type: 'exponential', delay: 5000 },
      });

      return {
        content: [{
          type: 'text',
          text: JSON.stringify({ job_id: job.id, status: 'queued', message: 'Export started. Use get_export_status to check progress.' }),
        }],
      };
    }
  );

  server.tool(
    'get_export_status',
    { job_id: z.string() },
    async ({ job_id }) => {
      const job = await exportQueue.getJob(job_id);
      if (!job) {
        return { content: [{ type: 'text', text: JSON.stringify({ error: 'Job not found' }) }], isError: true };
      }

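      // States include 'waiting', 'active', 'delayed' (a retry waiting out its backoff), 'completed', and 'failed'.
      const state = await job.getState();
      const result = state === 'completed' ? job.returnvalue : null;  // returnvalue is a plain property, not a promise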
      const failReason = state === 'failed' ? job.failedReason : null;

      return {
        content: [{
          type: 'text',
          text: JSON.stringify({ job_id, state, result, fail_reason: failReason }),
        }],
      };
    }
  );
}

The client calls start_export once and gets a job_id. It then calls get_export_status at intervals until the state is completed or failed. No long-running blocking tool call, no transport timeout.
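The polling side might look like the sketch below. It assumes the JSON shape returned by get_export_status above; waitForExport, fetchStatus, and the default interval and attempt cap are hypothetical choices, with fetchStatus standing in for however the client actually invokes the tool.

```typescript
// Shape of the JSON text returned by get_export_status.
interface ExportStatus {
  job_id: string;
  state: string;
  result: unknown;
  fail_reason: string | null;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// `fetchStatus` is a stand-in for the client's call to get_export_status.
async function waitForExport(
  fetchStatus: (jobId: string) => Promise<ExportStatus>,
  jobId: string,
  { intervalMs = 2000, maxAttempts = 150 } = {},
): Promise<ExportStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await fetchStatus(jobId);
    // 'completed' and 'failed' are terminal; any other state means keep waiting.
    if (status.state === 'completed' || status.state === 'failed') return status;
    await sleep(intervalMs);
  }
  throw new Error(`Job ${jobId} did not finish after ${maxAttempts} polls`);
}
```

Capping the number of polls keeps a stuck job from turning into an endless loop on the client. Because start_export sets attempts: 3, a job that fails once shows up as delayed rather than failed until its retries are used up.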

SQLite-backed queue for simpler deployments

If Redis is not in your stack, a SQLite-backed queue works well for single-process or low-throughput deployments:

// simple-queue.ts — SQLite-backed job queue using better-sqlite3
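import * as crypto from 'node:crypto';  // explicit import so randomUUID() does not depend on the global crypto object
import Database from 'better-sqlite3';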

export class SimpleQueue {
  private db: Database.Database;
  private poll: NodeJS.Timeout | null = null;

  constructor(private handler: (jobId: string, data: unknown) => Promise<unknown>) {
    this.db = new Database('./data.db');
    this.db.exec(`
      CREATE TABLE IF NOT EXISTS jobs (
        id TEXT PRIMARY KEY,
        queue TEXT NOT NULL,
        data TEXT NOT NULL,
        status TEXT NOT NULL DEFAULT 'pending',
        result TEXT,
        error TEXT,
        attempts INTEGER NOT NULL DEFAULT 0,
        created_at INTEGER NOT NULL DEFAULT (unixepoch()),
        updated_at INTEGER NOT NULL DEFAULT (unixepoch())
      )
    `);
  }

  enqueue(queue: string, data: unknown): string {
    const id = crypto.randomUUID();
    this.db.prepare(
      'INSERT INTO jobs (id, queue, data) VALUES (?, ?, ?)'
    ).run(id, queue, JSON.stringify(data));
    return id;
  }

  getStatus(id: string): { status: string; result?: unknown; error?: string } | null {
    const row = this.db.prepare('SELECT status, result, error FROM jobs WHERE id = ?').get(id) as any;
    if (!row) return null;
    return { status: row.status, result: row.result ? JSON.parse(row.result) : undefined, error: row.error };
  }

  start(queue: string, intervalMs = 1000): void {
    this.poll = setInterval(async () => {
      // Claim the oldest pending job. The SELECT and UPDATE run synchronously, so they cannot interleave in one process.
      const job = this.db.prepare(
        'SELECT id, data, attempts FROM jobs WHERE queue = ? AND status = ? AND attempts < 3 ORDER BY created_at, rowid LIMIT 1'
      ).get(queue, 'pending') as any;

      if (!job) return;

      this.db.prepare('UPDATE jobs SET status = ?, attempts = attempts + 1, updated_at = unixepoch() WHERE id = ?')
        .run('active', job.id);

      try {
        const result = await this.handler(job.id, JSON.parse(job.data));
        // `?? null` keeps a handler that returns undefined from producing an unbindable value.
        this.db.prepare('UPDATE jobs SET status = ?, result = ?, updated_at = unixepoch() WHERE id = ?')
          .run('completed', JSON.stringify(result ?? null), job.id);
      } catch (err: any) {
        // Requeue until the third attempt fails, then mark the job failed for good.
        const status = job.attempts + 1 < 3 ? 'pending' : 'failed';
        this.db.prepare('UPDATE jobs SET status = ?, error = ?, updated_at = unixepoch() WHERE id = ?')
          .run(status, err.message, job.id);
      }
    }, intervalMs);
  }

  stop(): void {
    if (this.poll) clearInterval(this.poll);
  }
}

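SQLite with better-sqlite3 needs no external infrastructure and can absorb far more writes than this queue will issue. Throughput here is bounded by the poll loop, which claims at most one job per interval (one per second by default); lower intervalMs if you need more. The tradeoffs vs. BullMQ: no multi-process consumer fan-out (the worker runs in the same Node.js process as the MCP server), no dashboard, and hand-written retry logic. For hobby MCP servers and small teams, this is often the right choice.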

Dead-letter queues and error handling

Jobs that fail all retry attempts need a destination: either delete them (losing the data) or move them to a dead-letter queue (DLQ) for inspection. BullMQ moves exhausted jobs to the failed state automatically and preserves the failure reason, so the failed set serves as the DLQ. Note that the worker above keeps only the 200 most recent failed jobs (removeOnFail). Add a monitor:

// Monitor failed jobs and alert. Assumes `deps` (with a structured logger) is in scope where this listener is registered.
exportWorker.on('failed', (job, err) => {
  if (job && job.attemptsMade >= (job.opts.attempts ?? 1)) {
    // Job has exhausted all attempts — alert
    deps.logger.error('job_dead_letter', {
      jobId: job.id,
      queue: 'exports',
      failedReason: err.message,
      data: job.data,
    });
    // Optionally: send to a webhook, PagerDuty, Slack, etc.
  }
});

Tools should surface DLQ status to clients. A get_export_status call for a failed job returns state: 'failed' with fail_reason — the client can decide whether to retry by calling start_export again or surface the error to the user.
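How long a job spends retrying before it reaches the failed state follows from the attempts and backoff options passed in start_export. The sketch below assumes BullMQ's documented exponential backoff formula without jitter, delay * 2^(attemptsMade - 1); retryDelaysMs is a hypothetical helper for reasoning about the schedule, not a BullMQ API.

```typescript
// Assumes exponential backoff without jitter: delay * 2^(attemptsMade - 1).
function retryDelaysMs(attempts: number, baseDelayMs: number): number[] {
  const delays: number[] = [];
  // A job with N attempts is retried at most N - 1 times.
  for (let attemptsMade = 1; attemptsMade < attempts; attemptsMade++) {
    delays.push(Math.round(baseDelayMs * 2 ** (attemptsMade - 1)));
  }
  return delays;
}

// Settings from start_export: attempts: 3, backoff delay: 5000.
const delays = retryDelaysMs(3, 5000);                 // [5000, 10000]
const totalWaitMs = delays.reduce((a, b) => a + b, 0); // 15000
```

With these settings a job that always fails waits 15 seconds in backoff, plus the run time of three attempts, before a client sees state: 'failed'. A client that polls less often than that will see the failure on its next poll.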

Monitoring queue health alongside MCP uptime

AliveMCP's standard probe confirms the MCP server is up and responding to initialize + tools/list. It cannot see whether queue consumers are processing jobs, whether the Redis connection is healthy, or whether the DLQ is filling up. Add a health_check tool that surfaces this:

server.tool('health_check', {}, async () => {
  const checks = await Promise.allSettled([
    // MCP server is up (trivially true — if this runs, the server is up)

    // Queue: can we reach Redis? (`client` is a promise for the underlying Redis connection)
    exportQueue.client.then(client => client.ping()).then(() => ({ name: 'queue_redis', ok: true })),

    // Worker: is the worker still running its processing loop?
    Promise.resolve({ name: 'worker_running', ok: exportWorker.isRunning() }),

    // DLQ depth: how many failed jobs?
    exportQueue.getFailedCount().then(count => ({
      name: 'dlq_depth', ok: count < 50, count
    })),

    // Active jobs: are consumers keeping up?
    exportQueue.getActiveCount().then(active =>
      exportQueue.getWaitingCount().then(waiting => ({
        name: 'queue_depth', ok: waiting < 1000, active, waiting
      }))
    ),
  ]);

  const results = checks.map((c, i) =>
    c.status === 'fulfilled' ? c.value : { name: `check_${i}`, ok: false, error: (c.reason as Error).message }
  );
  const allOk = results.every(r => r.ok);

  return {
    content: [{ type: 'text', text: JSON.stringify({ healthy: allOk, checks: results }) }],
    isError: !allOk,
  };
});

Configure a synthetic monitor (or AliveMCP's custom probe feature) to call health_check every few minutes. This gives you observability over the queue layer that the standard protocol probe can't provide.
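On the monitoring side, the check result still has to be turned into an alert decision. The sketch below assumes the JSON payload produced by the health_check tool above; failingChecks and the unparseable_response label are hypothetical, and how the monitor obtains the tool's text content is left out.

```typescript
interface HealthCheck {
  name: string;
  ok: boolean;
  [extra: string]: unknown; // counts, error messages, and similar details
}

interface HealthPayload {
  healthy: boolean;
  checks: HealthCheck[];
}

// Parses the text content returned by health_check and lists the failing checks.
function failingChecks(toolText: string): string[] {
  let payload: HealthPayload | null;
  try {
    payload = JSON.parse(toolText) as HealthPayload | null;
  } catch {
    return ['unparseable_response'];
  }
  if (!payload || !Array.isArray(payload.checks)) return ['unparseable_response'];
  return payload.checks.filter((check) => !check.ok).map((check) => check.name);
}
```

An empty array means every check passed. Alerting on the names of the failing checks, rather than only on the healthy flag, tells the on-call person whether Redis, the worker, or the backlog is the problem.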