MCP server multi-agent topologies

The MCP ecosystem is increasingly used for multi-agent orchestration — an orchestrator agent dispatching sub-agents that each call the same MCP server in parallel. Designing the MCP server to handle this gracefully requires understanding session isolation, connection pooling, and contention-free state management.

TL;DR

Multi-agent MCP topologies come in two flavors: orchestrator-dispatcher (one controller, N sub-agents) and peer swarms (agents coordinate via shared state). Either way, your MCP server must be stateless per-call — no in-memory state shared across sessions. Use connection pooling sized to your expected agent fan-out, keep tool handlers idempotent, and emit per-session metrics so AliveMCP can surface concurrent call spikes before they become p99 latency regressions. Fan-out via Promise.all on the orchestrator side; fan-in via reduce with explicit conflict resolution. See also: shared state patterns and rate limiting for per-session throttling.

Orchestrator-dispatcher vs. swarm topologies

The two dominant multi-agent patterns in MCP deployments look similar from the MCP server's perspective but have very different coordination semantics.

Orchestrator-dispatcher topology has a single controller agent that receives the top-level task, decomposes it into subtasks, dispatches each subtask to a sub-agent, waits for results, and synthesizes the final output. The orchestrator itself may or may not call MCP tools directly — often it only issues instructions and receives summaries. Sub-agents are the heavy callers. This topology is predictable: the orchestrator controls parallelism, sets deadlines, and decides when to retry a failed sub-agent. It is the right choice when tasks have a clear dependency graph and the orchestrator can express that graph.

Swarm topology (sometimes called a peer or mesh topology) has no single coordinator. Multiple agents observe a shared task queue, each picks up work items, calls MCP tools, writes results to a shared store, and publishes completion events for others to consume. Swarms are self-organizing and resilient to individual agent failure — if one agent dies, another picks up its pending work item. They are the right choice for embarrassingly parallel workloads (bulk document processing, web crawling, data transformation pipelines) where no single result depends on a specific agent's identity.

Property	Orchestrator-dispatcher	Swarm
Coordination model	Hierarchical (top-down)	Peer-to-peer (emergent)
Task dependency support	First-class (orchestrator enforces order)	Limited (requires external DAG scheduler)
Fault tolerance	Orchestrator is a single point of failure	Any agent can take over a failed peer's work
MCP server load pattern	Burst when orchestrator dispatches, then parallel sustained load	Sustained parallel load, fewer spikes
Best for	Complex reasoning tasks, conditional sub-tasks	Bulk processing, parallel data pipelines

In practice, most production systems start with the orchestrator-dispatcher pattern because it is easier to debug — every tool call can be traced to a specific sub-agent assignment from the orchestrator's decision log.
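The swarm's fault-tolerance claim (another agent picks up a dead peer's work item) usually rests on some form of lease. The sketch below is one way to model that, assuming an in-memory queue purely for illustration; `LeaseQueue`, `claim`, and `complete` are hypothetical names, and a real swarm would keep this state in a shared store rather than in one process.

```typescript
// Hypothetical in-memory work queue used only to illustrate lease-based claiming.
interface WorkItem {
  id: string;
  claimedBy: string | null;
  leaseExpiresAt: number; // epoch ms; 0 while unclaimed
  done: boolean;
}

class LeaseQueue {
  private readonly items: WorkItem[];
  private readonly leaseMs: number;

  constructor(ids: string[], leaseMs: number) {
    this.leaseMs = leaseMs;
    this.items = ids.map(id => ({ id, claimedBy: null, leaseExpiresAt: 0, done: false }));
  }

  // Claim the first unfinished item that is unclaimed or whose lease has expired.
  claim(agentId: string, now: number): string | null {
    const item = this.items.find(
      i => !i.done && (i.claimedBy === null || i.leaseExpiresAt <= now)
    );
    if (!item) return null;
    item.claimedBy = agentId;
    item.leaseExpiresAt = now + this.leaseMs;
    return item.id;
  }

  // Only the current, unexpired lease holder may mark an item done.
  complete(agentId: string, itemId: string, now: number): boolean {
    const item = this.items.find(i => i.id === itemId);
    if (!item || item.done) return false;
    if (item.claimedBy !== agentId || item.leaseExpiresAt <= now) return false;
    item.done = true;
    return true;
  }
}
```

If an agent dies mid-task, its lease simply runs out and the next `claim` call from any peer hands the item over. The late `complete` from the original holder is refused, which is why tool handlers called during the work should be idempotent.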

Session isolation in multi-agent calls

Every MCP client-server connection is a separate session, and over the HTTP transports each one is typically identified by its own session ID. When an orchestrator spawns N sub-agents that each connect to the same MCP server, the server receives N distinct sessions, not N messages on a shared session. This is a critical design property: the protocol itself gives you isolation for free, as long as your server does not break it by sharing state across sessions at the application layer.

Each session gets its own:

Connection — its own HTTP connection and SSE stream for the HTTP-based transports (or its own socket if you run a custom transport such as WebSocket)
Request ID namespace — request IDs are unique within a session; two sessions can have requests with the same numeric ID without collision
Initialization state — the initialize handshake is per-session; capability negotiation results are not shared

Session isolation means sub-agents cannot accidentally read each other's in-progress tool call state through the MCP protocol itself. The danger zone is your application code: a tool handler that reads from or writes to a module-level variable, a singleton cache, or an in-process Map is sharing state across all sessions handled by that process. Under single-agent load this is invisible. Under multi-agent parallel load it manifests as race conditions.

Track active session count as a metric. A spike in concurrent sessions is often the first signal that an orchestrator has ramped up its fan-out — you want to know about this before it saturates your connection pool.
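A minimal sketch of such a gauge follows. `SessionGauge` is a hypothetical helper, not part of the MCP SDK, and it assumes you can call `open` and `close` from wherever your transport layer creates and tears down sessions. The gauge is deliberately process-local: it is observability state, not tool state, so it does not conflict with the rule against sharing application state across sessions.

```typescript
// Hypothetical per-process gauge for active MCP sessions.
class SessionGauge {
  private readonly active = new Set<string>();
  private peak = 0;

  // Call when a session finishes its initialize handshake.
  open(sessionId: string): void {
    this.active.add(sessionId);
    this.peak = Math.max(this.peak, this.active.size);
  }

  // Call when the session's connection closes, for any reason.
  close(sessionId: string): void {
    this.active.delete(sessionId);
  }

  // Values to export to your metrics backend.
  snapshot(): { active: number; peak: number } {
    return { active: this.active.size, peak: this.peak };
  }
}
```

Keying by session ID makes `open` and `close` idempotent, so a duplicate close event from the transport cannot drive the gauge negative.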

Shared MCP server design for parallel agents

An MCP server designed for single-agent use often has implicit assumptions about being the only caller. When you lift it into a multi-agent deployment, those assumptions break. The three most common failure modes:

In-memory state mutation. A tool handler that modifies a module-level object (for example, a cached schema or an in-process rate limit counter) will produce lost updates and stale reads when two sessions execute the handler concurrently. Node.js runs handlers on a single thread, but every await is a point where another session's handler can run and change the object. The fix: make every tool handler a pure function of its inputs plus external storage. Move shared state to Redis, PostgreSQL, or SQLite (with WAL mode). See the shared state guide for patterns.

Non-atomic read-modify-write. A handler that reads a record, modifies a field, and writes it back as separate operations will lose updates when two agents run the same sequence concurrently. The fix: use database transactions or atomic operations (Redis INCR or a Lua script, SQL UPDATE ... WHERE version = $expected_version).

Unbounded connection acquisition. If each of N sub-agents concurrently calls a tool that opens a database connection, and your pool has fewer than N connections, some agents will queue waiting for a connection. If the queue has no ceiling, memory grows without bound and p99 latency climbs linearly with fan-out. The fix: size the pool for expected max fan-out and reject rather than queue when the pool is exhausted.

import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { z } from 'zod';
import { Pool } from 'pg';

// Pool sized for expected concurrent sub-agent count
// If you fan out to 20 sub-agents, each calling 2 concurrent tools,
// you need a pool of at least 40 connections
const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 40,          // matches peak fan-out
  idleTimeoutMillis: 10_000,
  connectionTimeoutMillis: 3_000,  // fail fast rather than queue indefinitely
});

const server = new McpServer({ name: 'shared-store', version: '1.0.0' });

// Stateless handler: all state in the database, none in process memory
server.tool(
  'get_task_result',
  'Read the result of a completed sub-task by ID',
  { taskId: z.string().uuid() },
  async ({ taskId }) => {
    const client = await pool.connect();
    try {
      const { rows } = await client.query(
        'SELECT result, status, completed_at FROM tasks WHERE id = $1',
        [taskId]
      );
      if (rows.length === 0) {
        return { content: [{ type: 'text', text: JSON.stringify({ found: false }) }] };
      }
      return { content: [{ type: 'text', text: JSON.stringify(rows[0]) }] };
    } finally {
      client.release();
    }
  }
);

The key discipline: pool.connect() is called inside the handler, not at module initialization into a shared variable. Each invocation acquires and releases its own connection. The pool manages concurrency.
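The version-checked update recommended for the read-modify-write failure mode can be sketched without a database. `VersionedStore` below is a hypothetical in-memory stand-in for a table with a `version` column; `updateIfVersion` plays the role of `UPDATE ... WHERE version = $expected_version` and reports whether a row matched. The retry helper is an assumption about how a handler might react to a stale version, not a prescribed policy.

```typescript
interface VersionedRecord<T> {
  value: T;
  version: number;
}

// Hypothetical stand-in for a table with an integer version column.
class VersionedStore<T> {
  private readonly rows = new Map<string, VersionedRecord<T>>();

  insert(id: string, value: T): void {
    this.rows.set(id, { value, version: 1 });
  }

  read(id: string): VersionedRecord<T> | undefined {
    const row = this.rows.get(id);
    return row ? { ...row } : undefined;
  }

  // Succeeds only if nobody else has written since `expectedVersion` was read.
  updateIfVersion(id: string, expectedVersion: number, value: T): boolean {
    const row = this.rows.get(id);
    if (!row || row.version !== expectedVersion) return false;
    this.rows.set(id, { value, version: expectedVersion + 1 });
    return true;
  }
}

// Re-read and re-apply the change when the version check fails.
function updateWithRetry<T>(
  store: VersionedStore<T>,
  id: string,
  modify: (current: T) => T,
  maxAttempts = 3
): boolean {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const current = store.read(id);
    if (!current) return false;
    if (store.updateIfVersion(id, current.version, modify(current.value))) return true;
  }
  return false;
}
```

The important property is that a writer holding a stale version gets an explicit failure instead of silently overwriting the other agent's update.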

Fan-out tool dispatch patterns

On the orchestrator side, dispatching sub-agents in parallel rather than sequentially is the primary performance lever. The wall-clock time of an N-way parallel fan-out is roughly the time of the slowest sub-agent, versus the sum of all sub-agent times for sequential dispatch. For I/O-bound workloads with similar task durations, the speedup approaches N as long as the MCP server itself is not the bottleneck.
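The arithmetic is easy to check with a few lines. The helper below is an illustrative sketch (`fanOutTiming` is a made-up name) that ignores connection setup and server-side queuing, so treat its output as an upper bound on the benefit.

```typescript
// Compare sequential and parallel wall-clock time for a set of sub-agent durations.
function fanOutTiming(durationsMs: number[]): {
  sequentialMs: number;
  parallelMs: number;
  speedup: number;
} {
  if (durationsMs.length === 0) {
    return { sequentialMs: 0, parallelMs: 0, speedup: 1 };
  }
  const sequentialMs = durationsMs.reduce((sum, d) => sum + d, 0); // one after another
  const parallelMs = Math.max(...durationsMs); // bounded by the slowest sub-agent
  return { sequentialMs, parallelMs, speedup: sequentialMs / parallelMs };
}
```

Four sub-agents that each take 300 ms give 1200 ms sequential versus 300 ms parallel, a 4× speedup. Durations of 200, 250, 300, and 1200 ms give 1950 ms versus 1200 ms, only about 1.6×, because one straggler dominates the parallel run.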

In a TypeScript orchestrator using the MCP SDK, fan out with Promise.all across multiple client connections:

import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { SSEClientTransport } from '@modelcontextprotocol/sdk/client/sse.js';

interface SubTaskResult {
  agentIndex: number;
  toolResult: unknown;
  durationMs: number;
}

async function fanOut(
  serverUrl: string,
  tasks: Array<{ tool: string; args: Record<string, unknown> }>
): Promise<SubTaskResult[]> {
  // Create one MCP client per sub-task (each gets its own session).
  // Build the clients before connecting so that a failed connect cannot
  // leak the sessions that did connect: the finally block closes them all.
  const clients = tasks.map(
    (_task, i) =>
      new Client(
        { name: `orchestrator-sub-agent-${i}`, version: '1.0.0' },
        { capabilities: {} }
      )
  );

  try {
    await Promise.all(
      clients.map(client =>
        client.connect(new SSEClientTransport(new URL(serverUrl)))
      )
    );

    // Fan out: all tool calls fire simultaneously
    const results = await Promise.all(
      tasks.map(async (task, i) => {
        const start = Date.now();
        const result = await clients[i].callTool({
          name: task.tool,
          arguments: task.args,
        });
        return {
          agentIndex: i,
          toolResult: result,
          durationMs: Date.now() - start,
        } satisfies SubTaskResult;
      })
    );
    return results;
  } finally {
    // Always close sub-agent sessions — don't leak connections
    await Promise.allSettled(clients.map(c => c.close()));
  }
}

// Usage: orchestrator dispatches 10 parallel search tasks.
// `queries` is assumed to be a string[] built earlier by the orchestrator.
const searchTasks = queries.map(q => ({
  tool: 'search_records',
  args: { query: q, limit: 20 },
}));

const results = await fanOut('https://mcp.example.com/sse', searchTasks);

Two important details: First, create one Client per task, not one shared client. A single session can carry concurrent tool calls safely, because the client gives each request a unique ID, but all of those calls then share one session's fate: per-session metrics and rate limits cannot tell the sub-tasks apart, and one dropped connection fails every in-flight call. Second, always close connections in a finally block, because leaked MCP sessions hold server-side resources (sockets, SSE response streams) until the server's idle timeout fires.

For large fan-out (>50 sub-agents), add a concurrency limit on the orchestrator side to avoid overwhelming the MCP server. Use p-limit to cap the number of simultaneously active sessions:

import pLimit from 'p-limit';

const limit = pLimit(20); // max 20 concurrent sub-agent sessions

const results = await Promise.all(
  tasks.map((task, i) =>
    limit(async () => {
      const transport = new SSEClientTransport(new URL(serverUrl));
      const client = new Client(
        { name: `sub-agent-${i}`, version: '1.0.0' },
        { capabilities: {} }
      );
      await client.connect(transport);
      try {
        const start = Date.now();
        const result = await client.callTool({ name: task.tool, arguments: task.args });
        return { agentIndex: i, toolResult: result, durationMs: Date.now() - start };
      } finally {
        await client.close();
      }
    })
  )
);
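Promise.all rejects as soon as any one sub-agent call rejects, so a single failure means the caller never sees the results of the calls that succeeded. Where partial results are acceptable, a variant built on Promise.allSettled keeps both sides. The sketch below is transport-agnostic: `fanOutSettled` and the `Settled` shape are illustrative names, and each job is assumed to wrap one sub-agent's connect, call, and close sequence.

```typescript
interface Settled<T> {
  succeeded: Array<{ index: number; value: T }>;
  failed: Array<{ index: number; reason: unknown }>;
}

// Run every job, then partition outcomes instead of failing on the first rejection.
async function fanOutSettled<T>(jobs: Array<() => Promise<T>>): Promise<Settled<T>> {
  // The async wrapper turns a synchronous throw inside a job into a rejection.
  const outcomes = await Promise.allSettled(jobs.map(async job => job()));
  const result: Settled<T> = { succeeded: [], failed: [] };
  outcomes.forEach((outcome, index) => {
    if (outcome.status === 'fulfilled') {
      result.succeeded.push({ index, value: outcome.value });
    } else {
      result.failed.push({ index, reason: outcome.reason });
    }
  });
  return result;
}
```

The `index` on each entry maps back to the original task, so the orchestrator can retry only the failed sub-tasks rather than repeating the whole fan-out.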

Fan-in result aggregation

Fan-in is the mirror of fan-out: collecting results from N parallel sub-agents and reducing them to a single output for the orchestrator's context. The reduce step is where subtle bugs live — particularly when two sub-agents produce conflicting results for the same resource.

For commutative aggregations (sums, unions, max/min), fan-in is straightforward:

interface SearchResult {
  id: string;
  score: number;
  content: string;
}

interface FanOutResult {
  toolResult: { content: Array<{ type: string; text: string }> };
}

function aggregateSearchResults(fanOutResults: FanOutResult[]): SearchResult[] {
  // Collect all results from all sub-agents
  const allResults: SearchResult[] = fanOutResults.flatMap(r => {
    const text = r.toolResult.content
      .filter(c => c.type === 'text')
      .map(c => c.text)
      .join('');
    return JSON.parse(text) as SearchResult[];
  });

  // Deduplicate by ID, keeping the copy with the highest score
  const byId = new Map<string, SearchResult>();
  for (const result of allResults) {
    const existing = byId.get(result.id);
    if (!existing || result.score > existing.score) {
      byId.set(result.id, result);
    }
  }

  // Return sorted by score, top 50
  return [...byId.values()]
    .sort((a, b) => b.score - a.score)
    .slice(0, 50);
}

For write operations where two sub-agents may have mutated the same record, conflict resolution requires a strategy decision made at design time. Common options:

Last-write-wins — the sub-agent with the highest timestamp (or highest version number) wins. Simple but can discard valid work.
Merge functions — define an associative, commutative, and idempotent merge function per field (e.g., take max of numeric fields, union of set fields). Requires domain knowledge but produces the same result regardless of the order in which sub-agent results arrive, as with CRDT-like data.
Conflict detection + human escalation — detect when two agents wrote different values to the same field and route the conflict to a review queue rather than resolving automatically. Correct for high-stakes data; adds latency.
Pessimistic locking before write — have sub-agents acquire a distributed lock before reading the record they intend to mutate. Eliminates conflicts but serializes writes, reducing parallelism.

The right strategy depends on the cost of a wrong answer versus the cost of reduced throughput. Read-only fan-out (gathering information, summarizing) has no write conflict problem at all — it is the safest workload to parallelize.
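Of the options above, the merge-function strategy is the easiest to show in code. The sketch below assumes a hypothetical `TaskRecord` with one numeric field and one set-like field; the field names are invented for illustration. Because the merge is associative, commutative, and idempotent, the fan-in can be a plain `reduce` over results in any arrival order.

```typescript
// Hypothetical record shape; only the merge rules matter here.
interface TaskRecord {
  id: string;
  retryCount: number; // numeric field: keep the maximum
  tags: string[];     // set-like field: keep the union
}

function mergeRecords(a: TaskRecord, b: TaskRecord): TaskRecord {
  if (a.id !== b.id) {
    throw new Error(`cannot merge different records: ${a.id} vs ${b.id}`);
  }
  return {
    id: a.id,
    retryCount: Math.max(a.retryCount, b.retryCount),
    // Sorting makes the output independent of argument order.
    tags: Array.from(new Set([...a.tags, ...b.tags])).sort(),
  };
}

// Fan-in: fold every sub-agent's copy of the record into one.
function mergeAll(copies: TaskRecord[]): TaskRecord {
  if (copies.length === 0) throw new Error('nothing to merge');
  return copies.reduce(mergeRecords);
}
```

Fields that have no sensible merge rule (free text, for example) are the ones that need last-write-wins or escalation instead.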

Connection pooling for orchestrators

A multi-agent orchestrator that fans out to N sub-agents generates N simultaneous MCP sessions. Each session requires a server-side connection (TCP/SSE or WebSocket), and if your tool handlers query a database, each concurrent call competes for a connection pool slot. Size your infrastructure for the peak fan-out you intend to run.

Key sizing rules:

Database pool: pool size ≥ (max_concurrent_sessions × avg_concurrent_queries_per_session). For a fan-out of 20 sub-agents each making one DB query per tool call, that is 20 connections; adding 20% headroom for health checks and background jobs gives 24. Keep the total across all MCP server instances below the database's own connection limit (max_connections in PostgreSQL).
MCP server process count: if you run multiple MCP server instances behind a load balancer, distribute the sub-agent sessions evenly. An orchestrator that always connects to the same instance benefits from warm connections but creates a hot-spot. Use load balancing with least-connections routing.
SSE connection limits: Node.js can handle thousands of concurrent SSE connections, but each open SSE response holds a file descriptor. Check ulimit -n on your server — default is 1024 on many Linux distributions. Set it to at least 10× your expected peak concurrent sessions.

# /etc/systemd/system/mcp-server.service
[Service]
# Allow up to 65535 open file descriptors
# Each SSE session uses one; add headroom for database connections, logs, etc.
LimitNOFILE=65535

# Match Node.js memory to your expected peak load
Environment="NODE_OPTIONS=--max-old-space-size=2048"

# Run multiple instances if fan-out exceeds single-process capacity
# Pair with a load balancer (nginx, Caddy) for session distribution
ExecStart=/usr/bin/node dist/server.js
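The sizing rules above reduce to two small calculations. The helper names below are illustrative, and the 20% headroom and 10× descriptor multiplier are simply the figures from this guide, not universal constants.

```typescript
// Pool size = sessions × queries per session, plus headroom, rounded up.
// Integer percent arithmetic avoids floating-point surprises in Math.ceil.
function requiredPoolSize(
  maxConcurrentSessions: number,
  queriesPerSession: number,
  headroomPercent = 20
): number {
  const base = maxConcurrentSessions * queriesPerSession;
  return Math.ceil((base * (100 + headroomPercent)) / 100);
}

// File descriptor limit suggested by the 10× rule of thumb.
function minFileDescriptorLimit(peakConcurrentSessions: number): number {
  return peakConcurrentSessions * 10;
}
```

For 20 sub-agents with one query each, `requiredPoolSize(20, 1)` is 24; with two concurrent queries each it is 48. A peak of 200 concurrent sessions suggests a descriptor limit of at least 2000, which is already above the common 1024 default.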

If sub-agent tool calls start failing or timing out under fan-out, first compare the database pool's connectionTimeoutMillis with your MCP request timeout. When the pool timeout is the shorter of the two, an exhausted pool makes the tool handler throw, the MCP server returns that as an error, and the sub-agent receives it, but the orchestrator's fan-out logic may not distinguish a pool timeout from a genuine tool error. Log pool timeout errors explicitly with a distinct error code so you can alert on them separately from business logic errors.
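One way to get that distinct error code is to tag failures at the acquire step, before any tool logic runs. The sketch below is independent of any particular pool library: `PoolAcquireError`, `withConnection`, and the code strings are hypothetical names, and `acquire` stands in for a call such as `pool.connect()`.

```typescript
// Hypothetical error type that marks "could not get a connection" failures.
class PoolAcquireError extends Error {
  readonly code = 'POOL_ACQUIRE_FAILED';
  readonly original: unknown;

  constructor(original: unknown) {
    super('could not acquire a pooled connection');
    this.name = 'PoolAcquireError';
    this.original = original;
  }
}

interface Releasable {
  release(): void;
}

// Acquire, run the work, and always release. Acquire failures are tagged;
// errors thrown by the work itself pass through untouched.
async function withConnection<C extends Releasable, T>(
  acquire: () => Promise<C>,
  work: (conn: C) => Promise<T>
): Promise<T> {
  let conn: C;
  try {
    conn = await acquire();
  } catch (err) {
    throw new PoolAcquireError(err);
  }
  try {
    return await work(conn);
  } finally {
    conn.release();
  }
}

// Map any error to a code suitable for logs and alert rules.
function errorCode(err: unknown): string {
  return err instanceof PoolAcquireError ? err.code : 'TOOL_ERROR';
}
```

Tagging by stage rather than by matching the pool library's error message keeps the classification stable across library upgrades.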

See the connection pooling guide for detailed pool sizing formulas and health check patterns.

Monitoring multi-agent workloads with AliveMCP

Multi-agent deployments exhibit load patterns that look very different from single-agent traffic. An orchestrator that fans out to 20 sub-agents generates a burst of 20 simultaneous connections within milliseconds, followed by sustained parallel load for the duration of the task, followed by an abrupt drop to zero when all sub-agents disconnect. Standard uptime monitors that probe every minute will miss the connection burst entirely and may not catch a p99 regression that only appears under parallel load.

Instrument your MCP server to emit these metrics for multi-agent workloads:

Concurrent session count — gauge: how many MCP sessions are active right now. Alert if this exceeds your expected max fan-out (indicates a runaway orchestrator or a session leak).
Concurrent tool calls — gauge: how many tool handlers are executing simultaneously. The ceiling is whatever concurrency limit you enforce (a server-side semaphore, or the orchestrator's p-limit cap); alert when the gauge is consistently near that ceiling.
p99 tool call latency under parallel load — histogram: latency at the 99th percentile degrades first under parallel load, before p50 is noticeably affected. An SLO of p99 < 2s under 20 concurrent sessions is a reasonable starting point.
Session duration distribution — histogram: orchestrator sub-agent sessions should be short-lived (seconds to minutes). Sessions that stay open for hours indicate a client-side bug or a hung tool call.
Fan-in conflict rate — counter: if your aggregation layer detects write conflicts, emit a counter per conflict type. A rising conflict rate under high fan-out indicates the sub-agent tasks are not properly partitioned.
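The concurrent tool call gauge and its ceiling can come from the same small component. The sketch below is a hypothetical server-side gate (`ToolGate` is not an MCP SDK class) that counts in-flight handlers and rejects new calls once a limit is reached, rather than queuing them without bound.

```typescript
// Hypothetical gate wrapped around tool handlers.
class ToolGate {
  private inFlight = 0;
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Current value for the "concurrent tool calls" gauge.
  get concurrent(): number {
    return this.inFlight;
  }

  // Run the handler if there is capacity; otherwise fail fast.
  async run<T>(handler: () => Promise<T>): Promise<T> {
    if (this.inFlight >= this.limit) {
      throw new Error('SERVER_BUSY: concurrent tool call limit reached');
    }
    this.inFlight++;
    try {
      return await handler();
    } finally {
      this.inFlight--;
    }
  }
}
```

A handler body would be wrapped as `gate.run(() => doWork(args))`, with `gate.concurrent` exported on each metrics scrape. Failing fast gives the orchestrator a clear signal to back off instead of a slowly growing latency tail.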

AliveMCP probes your MCP server from outside your network on a regular schedule, catching availability regressions that internal metrics miss — for example, a process that is alive but has exhausted its connection pool and is returning 503 for all new sessions. Pair external probing with the internal session count and p99 latency metrics above for full observability of your multi-agent deployment.

Configure alerts to fire when:

External probe returns non-200 for two consecutive checks (server-level failure)
External probe latency exceeds 3× the baseline (overload indicator)
Internal concurrent session count exceeds expected max fan-out + 20%
Internal p99 latency exceeds SLO threshold during a fan-out window
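The four conditions above can be expressed as one pure function, which makes the thresholds easy to unit test before wiring them into an alerting system. This is a sketch only: the input shape and alert names are invented for illustration and are not AliveMCP's configuration format or API.

```typescript
interface ProbeSample {
  status: number;
  latencyMs: number;
}

// Hypothetical snapshot of external probe results plus internal metrics.
interface AlertInput {
  recentProbes: ProbeSample[]; // oldest first
  baselineProbeLatencyMs: number;
  concurrentSessions: number;
  expectedMaxFanOut: number;
  p99LatencyMs: number;
  p99SloMs: number;
}

function firingAlerts(input: AlertInput): string[] {
  const alerts: string[] = [];

  // Two consecutive non-200 probes: server-level failure.
  const lastTwo = input.recentProbes.slice(-2);
  if (lastTwo.length === 2 && lastTwo.every(p => p.status !== 200)) {
    alerts.push('probe_down');
  }

  // Latest probe slower than 3x baseline: overload indicator.
  const latest = input.recentProbes[input.recentProbes.length - 1];
  if (latest && latest.latencyMs > 3 * input.baselineProbeLatencyMs) {
    alerts.push('probe_slow');
  }

  // Sessions above expected max fan-out + 20% (integer math avoids rounding issues).
  if (input.concurrentSessions * 100 > input.expectedMaxFanOut * 120) {
    alerts.push('session_overflow');
  }

  // p99 latency above the SLO threshold.
  if (input.p99LatencyMs > input.p99SloMs) {
    alerts.push('p99_slo_breach');
  }

  return alerts;
}
```

With an expected max fan-out of 20, the session rule fires at 25 concurrent sessions but not at 24, which matches the "+ 20%" threshold exactly.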

See also: MCP server observability, MCP server metrics, and structured logging for building a complete monitoring stack for multi-agent deployments.