MCP server agent handoff
In multi-step workflows, an agent often transitions from one MCP server to another — a routing server hands off to a specialized processor, or a long-running task is checkpointed and resumed by a different server instance. Designing clean handoffs between MCP servers prevents context loss, avoids duplicate work, and keeps the orchestrating agent's context window manageable.
TL;DR
Agent handoffs between MCP servers are reliable when you serialize context into a structured handoff envelope (session ID, continuation token, accumulated context, next-tool hint) and checkpoint it to a durable store — SQLite or Redis — before the originating server returns. The receiving server looks up the checkpoint by session ID and resumes from the last known state. Pair handoffs with idempotency tokens so retried handoffs do not double-execute: the receiving server deduplicates on the token before processing. Instrument both the sending and receiving server with structured observability, expose a health_check tool on each, and configure AliveMCP to probe them independently — a handoff that lands on a server whose probe is failing is a context-loss event waiting to happen. For multi-agent coordination beyond pairwise handoffs, see MCP server multi-agent orchestration; for shared mutable state across servers, see MCP server shared state.
When agent handoffs happen
A handoff is any transition where agent execution shifts from one MCP server to a different one, with context that must survive the boundary. The most common triggers:
- Routing and specialization. A front-end routing server receives a broad user intent and dispatches to a specialized server — a code-execution server, a document-retrieval server, or a database-query server. The routing server holds context (user preferences, prior search terms, constraint flags) that the specialist needs to produce a relevant result.
- Load distribution. When a primary server is saturated, excess sessions are handed off to an overflow instance. The overflow server must resume from the same conversation state; starting fresh would undo completed work and confuse the agent.
- Failover. A server crashes mid-task. The orchestrating agent's retry logic routes to a standby. Without a checkpoint, the retry repeats work; with a checkpoint it resumes from the last durable state.
- Geographic handoff. A session that starts on a US-West server is transferred to EU-West to reduce latency for a user who has moved network segments, or to comply with data-residency requirements.
- Staged pipelines. A long document-processing pipeline moves context from an extraction server to a summarization server to a storage server. Each stage hands off the accumulated results of the previous stage.
What all of these have in common: the receiving server cannot reconstruct the sending server's state from scratch. It depends on context that was accumulated across previous tool calls — context that is too large or too sensitive to fit in the agent's prompt, or that the orchestrating agent does not have direct access to.
Context serialization format
The handoff envelope is the contract between the sending server and the receiving server. Define it once as a Zod schema, derive the TypeScript type from it with z.infer, and validate it on both sides. A minimal but complete envelope:
// handoff.ts — shared between sending and receiving server (publish to an internal npm package)
import { z } from 'zod';
export const HandoffEnvelopeSchema = z.object({
// Correlation
session_id: z.string().uuid(), // stable across the entire multi-server conversation
handoff_id: z.string().uuid(), // unique per handoff event; used for idempotency
idempotency_token: z.string().min(1), // opaque token; receiving server deduplicates on this
// Routing
source_server: z.string(), // "routing-server-v2" — for tracing
target_server: z.string(), // "code-exec-server-v1" — for validation
next_tool_hint: z.string().optional(), // tool the receiving server should call first
// Context payload
accumulated_context: z.record(z.unknown()), // arbitrary key/value context blob
continuation_token: z.string().optional(), // opaque cursor into a previous result set
// Integrity
created_at: z.string().datetime(),
ttl_seconds: z.number().int().positive().default(300), // receiving server refuses stale envelopes
});
export type HandoffEnvelope = z.infer<typeof HandoffEnvelopeSchema>;
Keep accumulated_context bounded. It is not a transcript — it is a distilled summary of what matters for the next stage: resolved entity IDs, extracted parameters, constraint flags, and decisions made. Raw message history belongs in the agent prompt or in a separate conversation-history store, not in the handoff envelope. A good rule of thumb: if the envelope exceeds 64 KB uncompressed, the context payload is too large and should be stored externally with a reference key in the envelope.
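The 64 KB rule can be enforced mechanically before the envelope leaves the sending server. The sketch below assumes a generic key/value store (a Map here; Redis or a database in practice) and a hypothetical context_ref key that the receiving server resolves back to the full payload. Neither name comes from the schema above, so treat both as placeholders.

```typescript
// Size budget for a serialized handoff envelope (the 64 KB rule of thumb).
const MAX_ENVELOPE_BYTES = 64 * 1024;

// Only the fields this helper touches; everything else passes through untouched.
interface EnvelopeLike {
  handoff_id: string;
  accumulated_context: Record<string, unknown>;
  [key: string]: unknown;
}

// Minimal store interface; Map<string, string> satisfies it.
interface ContextStore {
  set(key: string, value: string): unknown;
}

// UTF-8 byte length of the serialized envelope (not JavaScript string length).
function envelopeBytes(envelope: EnvelopeLike): number {
  return Buffer.byteLength(JSON.stringify(envelope), 'utf8');
}

// Returns the envelope unchanged if it fits; otherwise stores the context
// externally and replaces it with a hypothetical "context_ref" key.
function externalizeIfTooLarge(
  envelope: EnvelopeLike,
  store: ContextStore,
  maxBytes: number = MAX_ENVELOPE_BYTES,
): EnvelopeLike {
  if (envelopeBytes(envelope) <= maxBytes) return envelope;
  const key = `handoff_context:${envelope.handoff_id}`;
  store.set(key, JSON.stringify(envelope.accumulated_context));
  return { ...envelope, accumulated_context: { context_ref: key } };
}
```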
The continuation_token is an opaque cursor that lets the receiving server resume pagination or streaming from a previous result set without repeating the initial query. The next_tool_hint tells the receiving server which tool to call first, saving a round-trip where the agent would otherwise have to discover the right tool via tools/list.
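Because the continuation_token is opaque to the agent, the sending server is free to choose its encoding. One minimal sketch, assuming a hypothetical cursor made of a stored result-set ID plus an offset, serializes it as base64url JSON. Opaque is not tamper-proof: if the agent must not be able to forge cursors, sign the token as well (for example with an HMAC).

```typescript
// Hypothetical cursor shape: which stored result set, and where to resume.
interface Cursor {
  result_set_id: string;
  offset: number;
}

// base64url keeps the token opaque to the agent and safe inside JSON and URLs.
function encodeContinuationToken(cursor: Cursor): string {
  return Buffer.from(JSON.stringify(cursor), 'utf8').toString('base64url');
}

// Returns null for anything that does not decode to a well-formed cursor.
function decodeContinuationToken(token: string): Cursor | null {
  try {
    const parsed = JSON.parse(Buffer.from(token, 'base64url').toString('utf8'));
    if (
      typeof parsed?.result_set_id === 'string' &&
      Number.isInteger(parsed?.offset) &&
      parsed.offset >= 0
    ) {
      return { result_set_id: parsed.result_set_id, offset: parsed.offset };
    }
  } catch {
    // Malformed base64 or JSON falls through to null.
  }
  return null;
}
```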
Checkpoint-and-resume pattern
The sending server checkpoints its state to a durable store before returning the handoff to the orchestrating agent. The receiving server reads the checkpoint immediately on receiving the handoff call — before doing any work — so that a crash between checkpoint write and actual handoff processing is recoverable.
// checkpoint.ts: durable checkpoint store backed by SQLite (or swap for Redis)
import Database from 'better-sqlite3';
import { HandoffEnvelope } from './handoff.js';

export type CheckpointStatus = 'pending' | 'received' | 'completed' | 'failed';

// Exported so other modules (for example the health_check tool) can query the same database.
export const db = new Database(process.env.CHECKPOINT_DB_PATH ?? './checkpoints.db');
db.exec(`
  CREATE TABLE IF NOT EXISTS handoff_checkpoints (
    session_id        TEXT NOT NULL,
    handoff_id        TEXT PRIMARY KEY,
    idempotency_token TEXT NOT NULL UNIQUE,
    envelope_json     TEXT NOT NULL,
    status            TEXT NOT NULL DEFAULT 'pending', -- pending | received | completed | failed
    created_at        TEXT NOT NULL,
    received_at       TEXT,
    completed_at      TEXT
  );
  CREATE INDEX IF NOT EXISTS idx_session ON handoff_checkpoints(session_id);
`);

// Prepare statements once at module scope and reuse them.
const insertCheckpoint = db.prepare(`
  INSERT INTO handoff_checkpoints (session_id, handoff_id, idempotency_token, envelope_json, created_at)
  VALUES (@session_id, @handoff_id, @idempotency_token, @envelope_json, @created_at)
`);
const selectEnvelope = db.prepare(
  'SELECT envelope_json FROM handoff_checkpoints WHERE handoff_id = ?'
);
const selectStatus = db.prepare(
  'SELECT status FROM handoff_checkpoints WHERE handoff_id = ?'
);
const markReceived = db.prepare(`
  UPDATE handoff_checkpoints SET status = 'received', received_at = @received_at
  WHERE handoff_id = @handoff_id AND status = 'pending'
`);
const markCompleted = db.prepare(`
  UPDATE handoff_checkpoints SET status = 'completed', completed_at = @completed_at
  WHERE handoff_id = @handoff_id
`);

export function writeCheckpoint(envelope: HandoffEnvelope): void {
  insertCheckpoint.run({
    session_id: envelope.session_id,
    handoff_id: envelope.handoff_id,
    idempotency_token: envelope.idempotency_token,
    envelope_json: JSON.stringify(envelope),
    created_at: envelope.created_at,
  });
}

export function getCheckpoint(handoffId: string): HandoffEnvelope | null {
  const row = selectEnvelope.get(handoffId) as { envelope_json: string } | undefined;
  if (!row) return null;
  return JSON.parse(row.envelope_json) as HandoffEnvelope;
}

// Current status of a checkpoint, or null if the handoff_id is unknown.
export function getCheckpointStatus(handoffId: string): CheckpointStatus | null {
  const row = selectStatus.get(handoffId) as { status: CheckpointStatus } | undefined;
  return row?.status ?? null;
}

// Moves a checkpoint from pending to received. Returns true only if this call made
// the transition, so a caller can tell that a concurrent retry acknowledged it first.
export function acknowledgeCheckpoint(handoffId: string): boolean {
  const info = markReceived.run({ handoff_id: handoffId, received_at: new Date().toISOString() });
  return info.changes === 1;
}

export function completeCheckpoint(handoffId: string): void {
  markCompleted.run({ handoff_id: handoffId, completed_at: new Date().toISOString() });
}
On the sending server, write the checkpoint synchronously before returning the handoff envelope to the orchestrating agent. If the checkpoint write fails, do not issue the handoff — the receiving server would arrive at a state it cannot reconstruct:
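// On the sending server: checkpoint before handing off.
// Assumes `server` is the McpServer instance and loadSessionContext is your own
// (hypothetical) lookup for state accumulated earlier in this conversation.
import { randomUUID } from 'node:crypto';
import { z } from 'zod';
import { HandoffEnvelope, HandoffEnvelopeSchema } from './handoff.js';
import { writeCheckpoint } from './checkpoint.js';

server.tool(
  'handoff_to_code_exec',
  'Hand off the current session to the code execution server',
  {
    task_description: z.string(),
    code_snippet: z.string().optional(),
  },
  async (args, extra) => {
    // extra.sessionId is the MCP transport session ID (undefined on stdio); the
    // conversation-stable UUID comes from the app-level session context.
    const ctx = loadSessionContext(extra.sessionId);
    // Validate on the sending side too, so a malformed envelope is never checkpointed.
    const envelope: HandoffEnvelope = HandoffEnvelopeSchema.parse({
      session_id: ctx.sessionId,
      handoff_id: randomUUID(),
      idempotency_token: randomUUID(),
      source_server: 'routing-server-v2',
      target_server: 'code-exec-server-v1',
      next_tool_hint: 'execute_code',
      accumulated_context: {
        task_description: args.task_description,
        code_snippet: args.code_snippet,
        user_preferences: ctx.userPreferences,
        prior_results: ctx.priorResults,
      },
      continuation_token: ctx.continuationToken,
      created_at: new Date().toISOString(),
      ttl_seconds: 300,
    });
    // Write checkpoint BEFORE returning to the orchestrating agent.
    // writeCheckpoint is synchronous (better-sqlite3); if it throws, no handoff is issued.
    writeCheckpoint(envelope);
    return {
      content: [{
        type: 'text',
        text: JSON.stringify({
          status: 'handoff_ready',
          handoff_id: envelope.handoff_id,
          target_server: envelope.target_server,
          envelope,
        }),
      }],
    };
  }
);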
On the receiving server, the first thing the resume_from_handoff tool does is acknowledge the checkpoint. This updates the checkpoint status from pending to received, giving the sending server (and your monitoring) visibility into whether the handoff landed. After the task completes, mark it completed. Checkpoints that remain pending for more than a few minutes indicate a handoff that never arrived — an alert worth firing.
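The status lifecycle and the staleness rule can be written down as a small state table plus a filter over checkpoint rows. This is a sketch, not part of the checkpoint module above: the transitions are inferred from the pending | received | completed | failed column, and treating failed as terminal and using a 5-minute threshold are assumptions to adjust for your own retry policy.

```typescript
type CheckpointStatus = 'pending' | 'received' | 'completed' | 'failed';

// Allowed transitions. Whether a failed handoff may be retried is a policy
// choice; this sketch treats failed as terminal.
const TRANSITIONS: Record<CheckpointStatus, CheckpointStatus[]> = {
  pending: ['received'],
  received: ['completed', 'failed'],
  completed: [],
  failed: [],
};

function canTransition(from: CheckpointStatus, to: CheckpointStatus): boolean {
  return TRANSITIONS[from].includes(to);
}

interface CheckpointRow {
  handoff_id: string;
  status: CheckpointStatus;
  created_at: string; // ISO-8601, as written by the sending server
}

// A pending checkpoint older than the threshold is a handoff that never arrived.
function findStalePending(rows: CheckpointRow[], now: Date, thresholdMs = 5 * 60 * 1000): string[] {
  return rows
    .filter((r) => r.status === 'pending' && now.getTime() - Date.parse(r.created_at) > thresholdMs)
    .map((r) => r.handoff_id);
}
```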
Cross-server tool calls
Some handoffs are not full context transfers — they are mid-execution calls where one MCP server needs to invoke a tool on a second MCP server as a dependency, then incorporate the result into its own response. This is a cross-server tool call rather than a session handoff.
The calling server instantiates an MCP client pointing at the dependency server. Connection reuse matters here: creating a new MCP client per tool call is expensive, because each one opens a fresh connection and repeats the initialize handshake. Use a module-scope client instance and reconnect on failure:
// cross-server-client.ts: reusable MCP client for downstream server calls
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp.js';

// Cache the connection promise rather than the client, so concurrent first calls
// share one connect() and a failed connect is never cached as a usable client.
let clientPromise: Promise<Client> | null = null;

async function connectDownstream(): Promise<Client> {
  const transport = new StreamableHTTPClientTransport(
    new URL(process.env.CODE_EXEC_SERVER_URL!)
  );
  const client = new Client({ name: 'routing-server', version: '1.0.0' });
  // Reset on disconnect so the next call reconnects
  client.onclose = () => { clientPromise = null; };
  await client.connect(transport);
  return client;
}

function getDownstreamClient(): Promise<Client> {
  if (!clientPromise) {
    clientPromise = connectDownstream().catch((err) => {
      clientPromise = null; // let the next call retry the connection
      throw err;
    });
  }
  return clientPromise;
}

export async function callDownstreamTool(
  toolName: string,
  args: Record<string, unknown>
): Promise<unknown> {
  const downstream = await getDownstreamClient();
  const result = await downstream.callTool({ name: toolName, arguments: args });
  // Propagate isError from the downstream server
  if (result.isError) {
    throw Object.assign(
      new Error(`Downstream tool ${toolName} returned isError`),
      { downstreamResult: result }
    );
  }
  return result.content;
}
Error propagation from the downstream server must be explicit. When the downstream tool returns isError: true, treat it as an exception in the calling server's tool handler — either by throwing (which the calling server can catch and wrap) or by surfacing the downstream error content directly to the orchestrating agent. Do not silently swallow isError: true from the downstream result; the agent needs to know the call failed so it can retry or escalate.
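A sketch of the second option, assuming the calling tool handler catches the error thrown by callDownstreamTool (an Error carrying the downstream result as downstreamResult) and converts it into its own isError: true result. The helper name toolErrorFromDownstream and the downstream_tool_failed error code are hypothetical; the point is that the downstream server's own error content is forwarded instead of being replaced by a generic message.

```typescript
// Reduced MCP tool-result shape: only the fields this helper reads or writes.
interface ToolResult {
  isError?: boolean;
  content: Array<{ type: string; [key: string]: unknown }>;
}

// The error callDownstreamTool throws: an Error with the downstream result attached.
type DownstreamError = Error & { downstreamResult?: ToolResult };

// Converts a caught downstream failure into the calling tool's own isError result,
// forwarding the downstream server's error content so the agent sees the root cause.
function toolErrorFromDownstream(toolName: string, err: unknown): ToolResult {
  const downstream = err instanceof Error ? (err as DownstreamError).downstreamResult : undefined;
  const message = err instanceof Error ? err.message : String(err);
  return {
    isError: true,
    content: [
      { type: 'text', text: JSON.stringify({ error: 'downstream_tool_failed', tool: toolName, message }) },
      ...(downstream?.content ?? []),
    ],
  };
}
```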
Add a circuit breaker around callDownstreamTool. If the downstream server is having an outage, you do not want every tool call on the calling server to hang for the full timeout. The circuit breaker opens after enough consecutive failures and returns a fast error until the downstream recovers. Pair this with retry logic with exponential backoff for transient failures.
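A minimal in-process circuit breaker and full-jitter backoff, written from scratch rather than taken from a library. The defaults (5 failures, 30 s cooldown, 100 ms base delay, 5 s cap) are placeholders, and the clock and random source are injectable so the behavior can be tested deterministically. After the cooldown, calls are let through as trials; a failed trial reopens the circuit immediately.

```typescript
type BreakerState = 'closed' | 'open' | 'half_open';

// Opens after `failureThreshold` consecutive failures, fails fast while open,
// and lets trial calls through once `cooldownMs` has elapsed.
class CircuitBreaker {
  private state: BreakerState = 'closed';
  private consecutiveFailures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(failureThreshold = 5, cooldownMs = 30_000, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  getState(): BreakerState {
    return this.state;
  }

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      if (this.now() - this.openedAt < this.cooldownMs) {
        throw new Error('circuit_open'); // fast error instead of waiting on a timeout
      }
      this.state = 'half_open';
    }
    try {
      const result = await fn();
      this.state = 'closed';
      this.consecutiveFailures = 0;
      return result;
    } catch (err) {
      this.consecutiveFailures += 1;
      // A failed trial, or too many consecutive failures, (re)opens the circuit.
      if (this.state === 'half_open' || this.consecutiveFailures >= this.failureThreshold) {
        this.state = 'open';
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}

// Full jitter: a random delay in [0, min(maxMs, baseMs * 2^attempt)).
function backoffDelayMs(attempt: number, baseMs = 100, maxMs = 5_000, random: () => number = Math.random): number {
  return Math.floor(random() * Math.min(maxMs, baseMs * 2 ** attempt));
}

// Retries transient failures up to maxAttempts, sleeping backoffDelayMs between attempts.
async function retryWithBackoff<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt + 1 >= maxAttempts) throw err;
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```

In the calling server, one breaker per downstream server would wrap the call, for example `retryWithBackoff(() => breaker.call(() => callDownstreamTool('execute_code', args)))`. Retrying while the circuit is open only burns attempts, so you may want the retry loop to stop early on circuit_open errors.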
Failover handoffs
A failover handoff occurs when a primary server crashes or becomes unhealthy and a standby server must take over mid-session. Unlike a planned handoff (where the sending server initiates the envelope), a failover handoff is triggered by the infrastructure layer — a load balancer, a health check failure, or the orchestrating agent detecting that its MCP connection has dropped.
The key requirement for failover handoffs is that checkpoints are written to a store that is accessible to the standby server. A local SQLite file on the primary server fails this requirement; a network-accessible store (Redis, a shared database, or object storage) is required:
// redis-checkpoint.ts: checkpoint store backed by Redis for cross-instance access
import { createClient } from 'redis';
import { HandoffEnvelope } from './handoff.js';

// ReturnType<typeof createClient> avoids the generic mismatch a bare RedisClientType
// annotation can cause with the modules bundled in the `redis` package.
let redis: ReturnType<typeof createClient>;

export async function initRedisCheckpoints(): Promise<void> {
  redis = createClient({ url: process.env.REDIS_URL });
  redis.on('error', (err) => console.error({ event: 'redis_error', err }));
  await redis.connect();
}

export async function writeCheckpointRedis(envelope: HandoffEnvelope): Promise<void> {
  const key = `handoff:${envelope.session_id}:${envelope.handoff_id}`;
  // Latest checkpoint pointer for this session (for failover lookup by session_id)
  const latestKey = `session_latest_handoff:${envelope.session_id}`;
  // MULTI/EXEC applies both writes together, so a crash cannot leave a new envelope
  // stored while the session pointer still names an older handoff.
  await redis
    .multi()
    .set(key, JSON.stringify(envelope), { EX: envelope.ttl_seconds })
    .set(latestKey, envelope.handoff_id, { EX: envelope.ttl_seconds })
    .exec();
}

export async function getLatestCheckpointForSession(
  sessionId: string
): Promise<HandoffEnvelope | null> {
  const latestKey = `session_latest_handoff:${sessionId}`;
  const handoffId = await redis.get(latestKey);
  if (!handoffId) return null;
  const key = `handoff:${sessionId}:${handoffId}`;
  const raw = await redis.get(key);
  if (!raw) return null;
  return JSON.parse(raw) as HandoffEnvelope;
}
The standby server's resume_session tool accepts a session_id, looks up the latest checkpoint from Redis, and resumes. The orchestrating agent retries with the same session_id after detecting the primary server failure — the standby transparently continues from where the primary left off.
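The standby's lookup-and-resume step can be sketched independently of the MCP wiring. In this sketch, lookupLatest stands in for getLatestCheckpointForSession, the envelope type is a trimmed-down subset of HandoffEnvelope, and the three outcome names are hypothetical. The TTL re-check mirrors the one in resume_from_handoff; it matters for stores without native expiry, since Redis already hides keys whose EX has passed.

```typescript
// Subset of HandoffEnvelope that the standby needs.
interface CheckpointEnvelope {
  session_id: string;
  handoff_id: string;
  created_at: string;
  ttl_seconds: number;
  next_tool_hint?: string;
  accumulated_context: Record<string, unknown>;
}

type ResumeOutcome =
  | { status: 'resumed'; handoff_id: string; next_tool_hint?: string; context: Record<string, unknown> }
  | { status: 'no_checkpoint' }
  | { status: 'expired'; handoff_id: string };

// Looks up the session's latest checkpoint and decides whether the standby can resume.
async function resumeSession(
  sessionId: string,
  lookupLatest: (sessionId: string) => Promise<CheckpointEnvelope | null>,
  now: number = Date.now(),
): Promise<ResumeOutcome> {
  const envelope = await lookupLatest(sessionId);
  if (!envelope) return { status: 'no_checkpoint' }; // agent must restart or escalate
  if (now - Date.parse(envelope.created_at) > envelope.ttl_seconds * 1000) {
    return { status: 'expired', handoff_id: envelope.handoff_id };
  }
  return {
    status: 'resumed',
    handoff_id: envelope.handoff_id,
    next_tool_hint: envelope.next_tool_hint,
    context: envelope.accumulated_context,
  };
}
```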
Monitor the Redis checkpoint store as part of your overall observability stack. A checkpoint store that becomes unavailable silently degrades failover reliability — sessions appear healthy until a failover is needed and no checkpoint is found. Add a checkpoint-store health check to each server's health_check tool, and configure AliveMCP to probe it.
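A sketch of the checkpoint-store part of that health check, assuming anything with a ping() method (node-redis v4's client.ping() resolves to "PONG", so `{ ping: () => redis.ping() }` works as an adapter). The timeout matters: a store that hangs instead of refusing connections should report unhealthy within the probe window rather than stall the whole health_check call. The name checkStoreHealth and the 1-second default are assumptions.

```typescript
interface Pingable {
  ping(): Promise<string>;
}

interface StoreHealth {
  ok: boolean;
  latency_ms: number;
  error?: string;
}

// Races the store's ping against a timeout and always clears the timer.
async function checkStoreHealth(store: Pingable, timeoutMs = 1000): Promise<StoreHealth> {
  const started = Date.now();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`ping timed out after ${timeoutMs}ms`)), timeoutMs);
  });
  try {
    await Promise.race([store.ping(), timeout]);
    return { ok: true, latency_ms: Date.now() - started };
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    return { ok: false, latency_ms: Date.now() - started, error };
  } finally {
    clearTimeout(timer);
  }
}
```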
Preventing duplicate execution across handoffs
Network failures between the sending server and the receiving server can cause the orchestrating agent to retry a handoff that was already received and processed. Without deduplication, the receiving server executes the same work twice: duplicate database writes, duplicate API calls, duplicate charges.
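The idempotency token in the handoff envelope is the key. A retried handoff carries the same envelope, so its handoff_id and idempotency_token are unchanged; the receiving server looks up the checkpoint before processing (the handler below keys that lookup on handoff_id). If the status is completed, return an already_completed response instead of redoing the work (store the result alongside the checkpoint if the agent needs it replayed). If the status is received (in flight), return a processing response and let the agent poll. Only a pending checkpoint, one the sending server wrote but no receiver has acknowledged yet, should be processed: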
// On the receiving server: resume tool with idempotency check.
// processHandoff is this server's own task logic (not shown).
import { HandoffEnvelopeSchema } from './handoff.js';
import {
  getCheckpoint,
  getCheckpointStatus,
  acknowledgeCheckpoint,
  completeCheckpoint,
} from './checkpoint.js';

server.tool(
  'resume_from_handoff',
  'Resume a session from a handoff envelope issued by another MCP server',
  {
    envelope: HandoffEnvelopeSchema,
  },
  async (args) => {
    const { envelope } = args;
    // TTL check: refuse stale envelopes
    const age = Date.now() - new Date(envelope.created_at).getTime();
    if (age > envelope.ttl_seconds * 1000) {
      return {
        isError: true,
        content: [{ type: 'text', text: JSON.stringify({
          error: 'envelope_expired',
          message: `Handoff envelope expired after ${envelope.ttl_seconds}s`,
        }) }],
      };
    }
    // Idempotency check: a retried handoff carries the same handoff_id
    const existing = getCheckpoint(envelope.handoff_id);
    if (existing) {
      const status = getCheckpointStatus(envelope.handoff_id);
      if (status === 'completed') {
        return {
          content: [{ type: 'text', text: JSON.stringify({
            status: 'already_completed',
            handoff_id: envelope.handoff_id,
            message: 'This handoff was already processed. Result is idempotent.',
          }) }],
        };
      }
      if (status === 'received') {
        return {
          content: [{ type: 'text', text: JSON.stringify({
            status: 'processing',
            handoff_id: envelope.handoff_id,
            message: 'Handoff is currently being processed. Poll again shortly.',
          }) }],
        };
      }
    }
    // First time we see this handoff (checkpoint still pending): acknowledge and process.
    // Note that this check-then-acknowledge sequence is not atomic.
    acknowledgeCheckpoint(envelope.handoff_id);
    try {
      const result = await processHandoff(envelope);
      completeCheckpoint(envelope.handoff_id);
      return {
        content: [{ type: 'text', text: JSON.stringify({ status: 'completed', result }) }],
      };
    } catch (err) {
      // The checkpoint stays 'received' here, so retries get 'processing' until it is
      // reset or marked failed; choose the policy that fits your workload.
      return {
        isError: true,
        content: [{ type: 'text', text: JSON.stringify({
          error: 'handoff_failed',
          message: err instanceof Error ? err.message : String(err),
          handoff_id: envelope.handoff_id,
        }) }],
      };
    }
  }
);
For multi-step handoff pipelines, carry the idempotency token through the chain. Each hop generates a new handoff_id (unique per hop) but should derive its idempotency_token deterministically from the original token plus the hop index — for example, HMAC(original_token, hop_index). This ensures that replaying the entire pipeline from the start produces idempotent behavior at every hop, not just the first.
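The HMAC derivation can be sketched with Node's built-in crypto module. HMAC-SHA256 and the decimal encoding of the hop index are choices made for this example; any keyed, deterministic derivation works as long as every hop in the pipeline uses the same scheme.

```typescript
import { createHmac } from 'node:crypto';

// Per-hop idempotency token: HMAC-SHA256 keyed by the original token over the hop index.
// Deterministic, so replaying hop N of the same pipeline always yields the same token.
function deriveHopToken(originalToken: string, hopIndex: number): string {
  return createHmac('sha256', originalToken).update(String(hopIndex)).digest('hex');
}
```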
Related: see MCP server message queue for idempotency patterns when handoffs trigger background jobs that must not run twice.
Monitoring handoff reliability with AliveMCP
Standard uptime monitoring — confirming that a server accepts connections and responds to initialize — does not tell you whether handoffs are working. A server can be perfectly healthy in isolation but fail to receive handoffs because its checkpoint store is down, its idempotency table is corrupt, or the receiving server's resume_from_handoff tool is throwing unhandled errors on all inputs.
The monitoring approach for handoff reliability has two parts:
- Probe both servers independently. Configure an AliveMCP monitor for each MCP server that participates in the handoff chain. If the receiving server fails its probe, alerts fire before the next real handoff lands on a broken instance.
- Expose handoff health in the health_check tool. The health_check tool on each server should report: checkpoint store connectivity, count of pending checkpoints older than 5 minutes (a staleness signal), and the last successful handoff timestamp.
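// health_check tool on the receiving server.
// Assumes checkpoint.ts exports its better-sqlite3 handle as `db`.
import { db } from './checkpoint.js';

// Connectivity probe: a trivial query throws if the database is unusable.
function pingCheckpointStore(): boolean {
  try {
    db.prepare('SELECT 1').get();
    return true;
  } catch {
    return false;
  }
}

server.tool(
  'health_check',
  'Report server health including handoff checkpoint store status',
  {},
  async () => {
    const checkpointStoreOk = pingCheckpointStore();
    let stalePending = 0;
    let lastCompletedAt: string | null = null;
    if (checkpointStoreOk) {
      // created_at holds ISO-8601 text ("2024-01-01T12:00:00.000Z"). Compare via julianday():
      // a raw text comparison against datetime('now', ...) output ("2024-01-01 11:55:00")
      // never matches rows created on the same UTC day, because 'T' sorts after ' '.
      const stale = db.prepare(`
        SELECT COUNT(*) AS count FROM handoff_checkpoints
        WHERE status = 'pending'
          AND julianday(created_at) < julianday('now', '-5 minutes')
      `).get() as { count: number };
      stalePending = stale.count;
      const last = db.prepare(`
        SELECT completed_at FROM handoff_checkpoints
        WHERE status = 'completed'
        ORDER BY completed_at DESC LIMIT 1
      `).get() as { completed_at: string } | undefined;
      lastCompletedAt = last?.completed_at ?? null;
    }
    const degraded = !checkpointStoreOk || stalePending > 0;
    return {
      isError: degraded,
      content: [{
        type: 'text',
        text: JSON.stringify({
          status: degraded ? 'degraded' : 'healthy',
          checkpoint_store: checkpointStoreOk ? 'ok' : 'unreachable',
          stale_pending_checkpoints: stalePending,
          last_completed_handoff: lastCompletedAt,
          timestamp: new Date().toISOString(),
        }, null, 2),
      }],
    };
  }
);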
Configure AliveMCP to call health_check on both the sending and receiving server every minute. Set alert thresholds so that stale pending checkpoints (handoffs that were written but never acknowledged) trigger a page. They indicate a lost handoff: the sending server believes it handed off, but the receiving server never processed it. Catching this within minutes rather than hours is the difference between a recoverable incident and a permanent context loss.
For latency monitoring, add a handoff_latency_ms metric to your metrics pipeline: record the time from created_at (in the envelope) to received_at (in the checkpoint). Spikes in handoff latency often precede handoff failures — the receiving server is overloaded or the network between the servers is degraded.
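A sketch of computing the metric from the two timestamps the checkpoint table already stores, plus a nearest-rank percentile for spotting spikes over a window of recent handoffs. The function names are hypothetical. Note that created_at is written by the sending server and received_at by the receiving server, so clock skew between the two hosts appears directly in the value and can even make it negative.

```typescript
// handoff_latency_ms: envelope created_at (sending server's clock) to checkpoint
// received_at (receiving server's clock), both ISO-8601 strings.
function handoffLatencyMs(createdAt: string, receivedAt: string): number {
  const created = Date.parse(createdAt);
  const received = Date.parse(receivedAt);
  if (Number.isNaN(created) || Number.isNaN(received)) {
    throw new Error(`invalid timestamp: ${createdAt} / ${receivedAt}`);
  }
  // Deliberately not clamped, so cross-host clock skew stays visible.
  return received - created;
}

// Nearest-rank percentile (p in 0..100) over a window of recent latencies.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)] ?? NaN;
}
```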
See also: MCP server structured logging for logging handoff events as structured JSON so they can be searched and correlated across both servers in a centralized log aggregation system.
Related questions
Should the handoff envelope travel through the agent's context or through a side channel?
Through the agent's context when the envelope is small (under a few KB). The agent receives the envelope from the sending server's tool call result and passes it to the receiving server as a tool argument. This is the simplest architecture: no side-channel infrastructure required, the agent mediates the transfer. Use a side channel (a shared store keyed by handoff_id) when the context payload is large, sensitive (credentials the agent should not see), or when you want the orchestrating agent's context window to remain clean. In the side-channel pattern, only the handoff_id travels through the agent; the receiving server fetches the full envelope from the store directly.
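A minimal sketch of the side-channel pattern, with a Map standing in for the shared store and hypothetical function names. Only `{ handoff_id }` is returned to the agent; the receiving server fetches the full envelope by that ID. With a real shared store, the key would also carry an expiry matching ttl_seconds.

```typescript
// Minimal shared-store interface; Map<string, string> satisfies it.
interface SideChannelStore {
  get(key: string): string | undefined;
  set(key: string, value: string): unknown;
}

// Sending server: park the full envelope in the store, return only a reference.
function publishViaSideChannel(store: SideChannelStore, envelope: { handoff_id: string }): { handoff_id: string } {
  store.set(`handoff:${envelope.handoff_id}`, JSON.stringify(envelope));
  return { handoff_id: envelope.handoff_id }; // this is all the agent sees
}

// Receiving server: resolve the reference back to the full envelope.
function fetchFromSideChannel<T>(store: SideChannelStore, handoffId: string): T | null {
  const raw = store.get(`handoff:${handoffId}`);
  return raw === undefined ? null : (JSON.parse(raw) as T);
}
```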
What happens if the receiving server is down when the handoff arrives?
The orchestrating agent receives a connection error or an isError: true response from the receiving server. If the checkpoint has already been written to the durable store, the handoff can be retried — the idempotency token ensures the work is not duplicated when the receiving server comes back up. If the checkpoint was not written before the attempt (because the sending server itself failed), the session state is lost. This is why writing the checkpoint before returning the handoff envelope is a hard requirement, not an optimization.
How do I test handoff correctness in CI?
Spin up both the sending and receiving server as separate processes in your integration test environment. Issue a handoff via the sending server's tool, verify the checkpoint was written to the shared store, then call the receiving server's resume tool with the returned envelope. Assert that the accumulated context matches what was written, that a second call with the same idempotency token returns already_completed, and that a call with an expired TTL returns envelope_expired. See MCP server integration testing for test harness patterns that make this straightforward.
Further reading
- MCP server multi-agent orchestration — coordinating multiple agents across servers
- MCP server shared state — mutable state that multiple server instances can read and write
- MCP server circuit breaker — fast-failing cross-server calls when a dependency is down
- MCP server retry logic — backoff and jitter for transient handoff failures
- MCP server message queue — queuing handoff work for async processing
- MCP server structured logging — correlating handoff events across server boundaries
- MCP server observability — tracing requests across MCP server hops
- AliveMCP — monitor every server in your handoff chain independently