MCP server message queue
Tool calls in an MCP server are synchronous from the client's perspective — the client sends a request and waits for a response. For fast tools (database queries, API calls, calculations) this is fine; the response arrives in milliseconds. For slow tools (video transcoding, large-batch exports, long-running AI tasks that can take minutes) blocking the tool call until completion makes the client wait, ties up the MCP session, and risks hitting transport timeouts. Message queues decouple the trigger (the tool call) from the execution (the background worker), letting the tool return a job ID immediately while the work continues asynchronously.
TL;DR
For long-running tasks, the trigger tool enqueues the job and returns a job_id immediately (no waiting). A separate job_status tool lets the client poll for the result. Use BullMQ for a battle-tested TypeScript queue over Redis, or SQLite for simpler single-process deployments. Create one queue connection and one worker at module scope, not per tool call. Add a health_check tool that pings the queue and returns consumer stats, and configure AliveMCP (or a synthetic monitor) to call it: AliveMCP's standard protocol probe only confirms that the MCP server is up and cannot see whether queue consumers are processing jobs.
When to use a message queue
Not every slow operation needs a queue. The decision tree:
- Under 30 seconds, deterministic latency: block and await. Return the result directly. Add a timeout with AbortSignal and return isError: true if it exceeds the budget.
- 30 seconds to a few minutes, known completion time: consider long-polling: return the result when ready and keep the HTTP connection open (the MCP transport supports this). Useful for AI inference where the model response is streaming.
- Minutes to hours, or bursty arrival rate: use a queue. The tool enqueues the job and returns immediately. The worker processes at its own pace. This handles back-pressure naturally — excess jobs queue up rather than overwhelming the worker pool.
- Fan-out to multiple consumers: use a queue with multiple worker instances. Each job is processed by exactly one worker (competing consumers), or delivered to all workers (pub/sub, depending on the queue semantics).
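The first branch of the decision tree (block, await, and give up after a budget) can be sketched as below. This is a minimal illustration rather than a prescribed implementation: runWithBudget and the reduced ToolResult type are hypothetical names introduced here, and the task is only cancelled if it actually honors the AbortSignal it receives.

```typescript
// Shape of an MCP tool result, reduced to the fields used in this sketch.
type ToolResult = {
  content: { type: 'text'; text: string }[];
  isError?: boolean;
};

// Hypothetical helper: run `task` within a time budget and turn a timeout
// (or any other failure) into a result flagged with isError: true.
async function runWithBudget(
  task: (signal: AbortSignal) => Promise<string>,
  budgetMs: number,
): Promise<ToolResult> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), budgetMs);
  try {
    // Race the task against the abort signal so that a task which ignores
    // the signal still cannot hold the tool call open past the budget.
    const text = await Promise.race([
      task(controller.signal),
      new Promise<never>((_, reject) => {
        controller.signal.addEventListener(
          'abort',
          () => reject(new Error(`Timed out after ${budgetMs} ms`)),
          { once: true },
        );
      }),
    ]);
    return { content: [{ type: 'text', text }] };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { content: [{ type: 'text', text: message }], isError: true };
  } finally {
    clearTimeout(timer);
  }
}
```

A tool handler could return the value of runWithBudget directly. Passing the signal on to fetch or a database driver that supports cancellation lets the underlying work stop as well, instead of only abandoning its result.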
Fire-and-return pattern with BullMQ
BullMQ is a Redis-backed job queue with TypeScript types, retries, and a dashboard. Install bullmq and ioredis. Create the queue and worker at module scope — not inside tool handlers:
// queue.ts: module-scope queue and worker
import { Queue, Worker, Job } from 'bullmq';
import { Redis } from 'ioredis';

// ExportJobData, ExportJobResult, and runExport are application-specific
// and are expected to be defined or imported elsewhere in your project.

const connection = new Redis(process.env.REDIS_URL!, { maxRetriesPerRequest: null });

export const exportQueue = new Queue('exports', { connection });

export const exportWorker = new Worker<ExportJobData, ExportJobResult>(
  'exports',
  async (job: Job<ExportJobData>) => {
    // The actual long-running work
    const result = await runExport(job.data);
    return result;
  },
  {
    connection,
    concurrency: 3, // process up to 3 jobs at a time
    removeOnComplete: { count: 1000 }, // keep only the most recent 1000 completed jobs
    removeOnFail: { count: 200 }, // keep only the most recent 200 failed jobs
  }
);

exportWorker.on('failed', (job, err) => {
  console.error(JSON.stringify({ event: 'job_failed', jobId: job?.id, error: err.message }));
});
The connection is created once at module scope and shared between the Queue (for enqueuing) and the Worker (for processing). BullMQ requires maxRetriesPerRequest: null on a Worker's Redis connection because the worker relies on blocking Redis commands. Creating a new connection inside each tool call would open a Redis connection on every request, which exhausts file descriptors quickly under load.
The tool handler enqueues and returns the job ID:
// tools/export.ts
import { z } from 'zod';
import type { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import type { Deps } from '../deps.js';
import { exportQueue } from '../queue.js';

export function registerExportTools(server: McpServer, deps: Deps) {
  server.tool(
    'start_export',
    { format: z.enum(['csv', 'json', 'xlsx']), filters: z.record(z.string()).optional() },
    async ({ format, filters }) => {
      const job = await exportQueue.add('export', { format, filters, requestedAt: new Date().toISOString() }, {
        attempts: 3,
        backoff: { type: 'exponential', delay: 5000 },
      });
      return {
        content: [{
          type: 'text',
          text: JSON.stringify({ job_id: job.id, status: 'queued', message: 'Export started. Use get_export_status to check progress.' }),
        }],
      };
    }
  );

  server.tool(
    'get_export_status',
    { job_id: z.string() },
    async ({ job_id }) => {
      const job = await exportQueue.getJob(job_id);
      if (!job) {
        return { content: [{ type: 'text', text: JSON.stringify({ error: 'Job not found' }) }], isError: true };
      }
      // Typically 'waiting' | 'active' | 'delayed' (between retries) | 'completed' | 'failed'
      const state = await job.getState();
      // returnvalue is a plain property on the loaded job, so no await is needed
      const result = state === 'completed' ? job.returnvalue : null;
      const failReason = state === 'failed' ? job.failedReason : null;
      return {
        content: [{
          type: 'text',
          text: JSON.stringify({ job_id, state, result, fail_reason: failReason }),
        }],
      };
    }
  );
}
The client calls start_export once and gets a job_id. It then calls get_export_status at intervals until the state is completed or failed. No long-running blocking tool call, no transport timeout.
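On the client side, the polling loop described above might look like the following sketch. The callTool parameter is a stand-in for however your MCP client invokes a tool and returns its text payload; it is an assumption made for illustration, not an SDK function, and the interval and attempt limits are arbitrary defaults.

```typescript
// Payload shape produced by get_export_status in the handler above.
type JobStatus = { job_id: string; state: string; result: unknown; fail_reason: string | null };

// Hypothetical client hook: invoke a tool by name and resolve with its text content.
type CallTool = (name: string, args: Record<string, unknown>) => Promise<string>;

async function waitForExport(
  callTool: CallTool,
  jobId: string,
  { intervalMs = 2000, maxAttempts = 150 } = {},
): Promise<JobStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = JSON.parse(await callTool('get_export_status', { job_id: jobId })) as JobStatus;
    // A payload without a state (for example "Job not found") is not worth polling again.
    if (!status.state) throw new Error(`Unexpected status payload for job ${jobId}`);
    // Only terminal states end the loop; waiting, active, and delayed keep polling.
    if (status.state === 'completed' || status.state === 'failed') return status;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job ${jobId} did not finish after ${maxAttempts} polls`);
}
```

Capping the number of polls keeps a stuck job from turning into an endless loop on the client, which matters when the caller is an agent that would otherwise keep spending tool calls.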
SQLite-backed queue for simpler deployments
If Redis is not in your stack, a SQLite-backed queue works well for single-process or low-throughput deployments:
// simple-queue.ts: SQLite-backed job queue using better-sqlite3
import { randomUUID } from 'node:crypto';
import Database from 'better-sqlite3';

export class SimpleQueue {
  private db: Database.Database;
  private poll: NodeJS.Timeout | null = null;

  constructor(private handler: (jobId: string, data: unknown) => Promise<unknown>) {
    this.db = new Database('./data.db');
    this.db.exec(`
      CREATE TABLE IF NOT EXISTS jobs (
        id TEXT PRIMARY KEY,
        queue TEXT NOT NULL,
        data TEXT NOT NULL,
        status TEXT NOT NULL DEFAULT 'pending',
        result TEXT,
        error TEXT,
        attempts INTEGER NOT NULL DEFAULT 0,
        created_at INTEGER NOT NULL DEFAULT (unixepoch()),
        updated_at INTEGER NOT NULL DEFAULT (unixepoch())
      )
    `);
  }

  enqueue(queue: string, data: unknown): string {
    const id = randomUUID();
    this.db.prepare(
      'INSERT INTO jobs (id, queue, data) VALUES (?, ?, ?)'
    ).run(id, queue, JSON.stringify(data));
    return id;
  }

  getStatus(id: string): { status: string; result?: unknown; error?: string } | null {
    const row = this.db.prepare('SELECT status, result, error FROM jobs WHERE id = ?').get(id) as any;
    if (!row) return null;
    return { status: row.status, result: row.result ? JSON.parse(row.result) : undefined, error: row.error };
  }

  start(queue: string, intervalMs = 1000): void {
    this.poll = setInterval(async () => {
      // Take the oldest pending job that still has attempts left
      const job = this.db.prepare(
        'SELECT id, data, attempts FROM jobs WHERE queue = ? AND status = ? AND attempts < 3 ORDER BY created_at LIMIT 1'
      ).get(queue, 'pending') as any;
      if (!job) return;
      // Claim the job synchronously so the next tick cannot pick it up again
      this.db.prepare('UPDATE jobs SET status = ?, attempts = attempts + 1, updated_at = unixepoch() WHERE id = ?')
        .run('active', job.id);
      try {
        const result = await this.handler(job.id, JSON.parse(job.data));
        // Store null when the handler returns nothing, since undefined cannot be bound
        this.db.prepare('UPDATE jobs SET status = ?, result = ?, updated_at = unixepoch() WHERE id = ?')
          .run('completed', JSON.stringify(result ?? null), job.id);
      } catch (err: any) {
        // Put the job back in 'pending' until its third attempt has failed
        const nextStatus = job.attempts + 1 < 3 ? 'pending' : 'failed';
        this.db.prepare('UPDATE jobs SET status = ?, error = ?, updated_at = unixepoch() WHERE id = ?')
          .run(nextStatus, err.message, job.id);
      }
    }, intervalMs);
  }

  stop(): void {
    if (this.poll) clearInterval(this.poll);
  }
}
SQLite with better-sqlite3 handles hundreds of jobs per second with no external infrastructure. The tradeoff vs. BullMQ: no multi-process consumer fan-out (the worker runs in the same Node.js process as the MCP server), no dashboard, and manual retry logic. For hobby MCP servers and small teams, this is often the right choice.
Dead-letter queues and error handling
Jobs that fail all retry attempts need a destination: either delete them (losing the data) or move them to a dead-letter queue (DLQ) for inspection. BullMQ moves exhausted jobs to a failed state automatically, preserving the failure reason. Add a monitor:
// Monitor failed jobs and alert
exportWorker.on('failed', (job, err) => {
  if (job && job.attemptsMade >= (job.opts.attempts ?? 1)) {
    // Job has exhausted all attempts: alert
    deps.logger.error('job_dead_letter', {
      jobId: job.id,
      queue: 'exports',
      failedReason: err.message,
      data: job.data,
    });
    // Optionally: send to a webhook, PagerDuty, Slack, etc.
  }
});
Tools should surface DLQ status to clients. A get_export_status call for a failed job returns state: 'failed' with fail_reason — the client can decide whether to retry by calling start_export again or surface the error to the user.
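It helps to know how long a job can stay in flight before it lands in the failed state. The start_export handler enqueues with attempts: 3 and backoff { type: 'exponential', delay: 5000 }, and the sketch below works out the resulting retry schedule. It assumes BullMQ's built-in exponential strategy waits delay × 2^(n − 1) after the n-th failed attempt; verify that against the BullMQ version you run. retryDelays is a hypothetical helper, not a BullMQ function.

```typescript
// Waits (in ms) between attempts, assuming delay * 2^(n - 1) after the n-th failed attempt.
function retryDelays(attempts: number, baseDelayMs: number): number[] {
  const delays: number[] = [];
  // Nothing follows the final attempt, so there are attempts - 1 waits.
  for (let failed = 1; failed < attempts; failed++) {
    delays.push(baseDelayMs * 2 ** (failed - 1));
  }
  return delays;
}

const delays = retryDelays(3, 5000); // [5000, 10000]
const totalWaitMs = delays.reduce((sum, d) => sum + d, 0); // 15000
```

Under these settings a job waits 15 seconds in total between attempts, on top of the time each attempt itself takes, before the dead-letter alert fires. A client polling get_export_status should therefore expect to see the delayed state between attempts rather than an immediate failure.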
Monitoring queue health alongside MCP uptime
AliveMCP's standard probe confirms the MCP server is up and responding to initialize + tools/list. It cannot see whether queue consumers are processing jobs, whether the Redis connection is healthy, or whether the DLQ is filling up. Add a health_check tool that surfaces this:
server.tool('health_check', {}, async () => {
  const checks = await Promise.allSettled([
    // MCP server is up (trivially true: if this runs, the server is up)
    // Queue: can we reach Redis? (queue.client resolves to the underlying Redis client)
    exportQueue.client.then((client) => client.ping()).then(() => ({ name: 'queue_redis', ok: true })),
    // Worker: is the processing loop running?
    Promise.resolve({ name: 'worker_running', ok: exportWorker.isRunning() }),
    // DLQ depth: how many failed jobs?
    exportQueue.getFailedCount().then(count => ({
      name: 'dlq_depth', ok: count < 50, count
    })),
    // Active jobs: are consumers keeping up?
    exportQueue.getActiveCount().then(active =>
      exportQueue.getWaitingCount().then(waiting => ({
        name: 'queue_depth', ok: waiting < 1000, active, waiting
      }))
    ),
  ]);
  // A rejected check (for example Redis unreachable) is reported as a failed check
  const results = checks.map((c, i) =>
    c.status === 'fulfilled' ? c.value : { name: `check_${i}`, ok: false, error: (c.reason as Error).message }
  );
  const allOk = results.every(r => r.ok);
  return {
    content: [{ type: 'text', text: JSON.stringify({ healthy: allOk, checks: results }) }],
    isError: !allOk,
  };
});
Configure a synthetic monitor (or AliveMCP's custom probe feature) to call health_check every few minutes. This gives you observability over the queue layer that the standard protocol probe can't provide.
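If you write the synthetic monitor yourself, the part that interprets the health_check output can stay small. The sketch below turns the JSON text returned by the tool into a list of alert messages; alertsFromHealthCheck is a hypothetical helper, and how the monitor calls the tool and delivers alerts is left out.

```typescript
// Shape of the payload produced by the health_check tool above.
type Check = { name: string; ok: boolean; [extra: string]: unknown };
type HealthReport = { healthy: boolean; checks: Check[] };

// Turn the JSON text returned by health_check into a list of alert messages.
// An empty list means every check passed.
function alertsFromHealthCheck(text: string): string[] {
  let report: HealthReport;
  try {
    report = JSON.parse(text) as HealthReport;
  } catch {
    return ['health_check returned unparseable output'];
  }
  if (!Array.isArray(report.checks)) return ['health_check returned no checks'];
  return report.checks
    .filter((check) => !check.ok)
    .map((check) => `${check.name} failing: ${JSON.stringify(check)}`);
}
```

Reporting each failing check by name, with its counters attached, lets an on-call engineer tell a growing DLQ apart from an unreachable Redis without opening the server logs.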
Related questions
Should the BullMQ worker run in the same process as the MCP server?
For low-to-medium volume: yes, same process is fine. The worker runs concurrently via the event loop alongside the MCP request handler. The benefit is simpler deployment — one process, one container. The risk is that a CPU-intensive job blocks the event loop and delays MCP tool call responses. If your jobs are I/O-bound (fetching data, calling APIs), same-process is safe. If your jobs are CPU-bound (image processing, compression, ML inference), run the worker in a separate process or use Node.js worker threads.
Can clients receive job completion notifications via SSE instead of polling?
Yes. When a job completes, the worker can publish a notification event that the MCP server relays to the client via the SSE stream. This requires the worker and MCP server to share a pub/sub channel (Redis pub/sub or an EventEmitter in same-process mode). The MCP server listens for completion events and calls server.notification() to push the result to the client. This is more complex than polling but avoids repeated get_export_status calls.
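For same-process mode, the relay can be sketched with a plain EventEmitter standing in for Redis pub/sub. Everything named here is illustrative: publishCompletion and relayCompletions are hypothetical helpers, the notifications/export_completed method name is made up for the example, and notify is a placeholder for however your server pushes a notification to the connected client.

```typescript
import { EventEmitter } from 'node:events';

type JobCompleted = { jobId: string; result: unknown };

// In same-process mode a plain EventEmitter can stand in for Redis pub/sub.
const jobEvents = new EventEmitter();

// Worker side: call this when a job finishes, for example from the
// worker's 'completed' listener.
function publishCompletion(event: JobCompleted): void {
  jobEvents.emit('job_completed', event);
}

// Server side: forward each completion to the client through `notify`,
// a placeholder whose signature is an assumption of this sketch.
function relayCompletions(
  notify: (method: string, params: JobCompleted) => void,
): () => void {
  const listener = (event: JobCompleted) => notify('notifications/export_completed', event);
  jobEvents.on('job_completed', listener);
  // Return an unsubscribe function so the listener can be removed when the session closes.
  return () => jobEvents.off('job_completed', listener);
}
```

Returning an unsubscribe function matters in practice: without it, every closed session leaves a listener behind, and the emitter keeps trying to notify clients that are gone. Polling remains a useful fallback for clients that reconnect and miss an event.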
How do I handle queue jobs across a load-balanced MCP cluster?
If you run multiple MCP server instances, the queue is shared (all instances enqueue to the same Redis queue), and the workers compete to process jobs (each job is picked up by exactly one worker). get_export_status reads from the shared queue, so any MCP instance can answer a status query regardless of which instance enqueued the job. The queue is the shared state that makes horizontal scaling work without sticky sessions for job status queries.
What's the difference between BullMQ and Temporal for MCP background jobs?
BullMQ is a Redis-backed job queue: simple, fast, and with no external infrastructure beyond Redis. Temporal is a workflow orchestration platform that offers durable execution across process restarts, multi-step workflows, and activity retries with state preservation. For MCP servers, BullMQ covers most use cases. Temporal is worth the operational complexity when jobs span multiple steps that each need individual retry semantics, or when jobs must survive process crashes mid-execution with state preserved.
Further reading
- MCP server dependency injection — injecting queue connections as shared deps rather than module scope
- MCP server scheduled tasks — cron jobs that enqueue work rather than executing directly
- MCP server error handling — isError patterns for queued job failures surfaced to clients
- MCP server load balancing — shared queue state that enables horizontal MCP server scaling
- MCP server observability — instrumenting queue depth and consumer lag alongside MCP metrics
- AliveMCP — uptime monitoring that confirms your MCP server is protocol-healthy, complementing internal queue health checks