MCP server streaming
MCP tool calls are synchronous by design: the client sends a tools/call request and waits for the response. For tool calls that take more than a few seconds, this creates a dead wait. The client has no signal that the server is working, and long-lived HTTP connections through load balancers may time out before the result arrives. The MCP protocol addresses this with progress notifications: server-to-client messages sent during a long tool call to report intermediate state, which the client opts into by attaching a progress token to its request. The StreamableHTTP transport delivers these notifications over Server-Sent Events (SSE), which keeps data flowing on the connection and gives clients real-time progress updates without changing the tool call's synchronous contract.
TL;DR
Send notifications/progress messages from the tool handler during long tool calls. The client receives these as SSE events before the final tool result arrives. A client that wants progress updates includes a progressToken in the _meta field of the tool call request; the server echoes that token in every notification so the client can match notifications to the call. For tools that return large outputs, split the result across multiple content array items. AliveMCP's probe sends an initialize request and then tools/list; neither uses streaming, so streaming failures are invisible to probe monitoring. Monitor streaming tool calls with structured logs (duration_ms, progress_notifications_sent).
MCP progress notifications
The MCP protocol defines a notifications/progress message type. A tool handler sends progress notifications by calling the server's notification method on the active request context. The client receives these notifications out-of-band over the SSE stream while waiting for the tool result:
// server.ts: progress notifications in a long-running tool handler
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { z } from 'zod';

const server = new McpServer({ name: 'my-server', version: '1.0.0' });

// fetchRepoMetadata, cloneAndListFiles, analyzeFiles, and generateReport are
// application-specific helpers that are not shown here.
server.tool(
  'analyze_repository',
  'Analyze a GitHub repository and return a summary report',
  {
    repo_url: z.string().url(),
    // progressToken is not a tool argument. The client sends it in the
    // request's _meta field, so there is no need to declare it here.
  },
  async (args, extra) => {
    // Set only if the client requested progress notifications
    const token = extra._meta?.progressToken;

    async function progress(message: string, percent: number) {
      if (token !== undefined) {
        // extra.sendNotification sends the notification in the context of
        // this request, so it travels on the same stream as the result
        await extra.sendNotification({
          method: 'notifications/progress',
          params: {
            progressToken: token,
            progress: percent,
            total: 100,
            message,
          },
        });
      }
    }

    await progress('Fetching repository metadata...', 5);
    const metadata = await fetchRepoMetadata(args.repo_url);

    await progress('Cloning repository...', 20);
    const files = await cloneAndListFiles(args.repo_url);

    await progress(`Analyzing ${files.length} files...`, 40);
    const analysis = await analyzeFiles(files, async (i) => {
      await progress(`Analyzing file ${i + 1} of ${files.length}`, 40 + (i / files.length) * 50);
    });

    await progress('Generating report...', 95);
    const report = await generateReport(metadata, analysis);

    return {
      content: [{ type: 'text', text: report }],
    };
  }
);
The progressToken is an opaque identifier the client sends in the _meta field of the tool call request. If the client does not send a token, skip sending notifications, because the client has no way to associate them with a call. In recent versions of the TypeScript SDK, the token is available on the second argument to the tool handler (extra._meta?.progressToken), and extra.sendNotification sends the notification in the context of the current request. The progress value must increase with every notification for a given token.
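To make the token handshake concrete, here is a minimal sketch of the client side, written without the SDK. The message shapes follow the MCP JSON-RPC format; buildToolCall and ProgressRouter are hypothetical names introduced only for illustration, and a real client would normally let its SDK do this work.

```typescript
// Illustrative only: hand-built MCP messages, no SDK involved.
type ProgressToken = string | number;

interface ProgressParams {
  progressToken: ProgressToken;
  progress: number;
  total?: number;
  message?: string;
}

interface ToolCallRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: {
    name: string;
    arguments: Record<string, unknown>;
    _meta?: { progressToken: ProgressToken };
  };
}

interface ProgressNotification {
  jsonrpc: '2.0';
  method: 'notifications/progress';
  params: ProgressParams;
}

// Build a tools/call request; the token is attached only when the caller opts in.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
  progressToken?: ProgressToken,
): ToolCallRequest {
  const request: ToolCallRequest = {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name, arguments: args },
  };
  if (progressToken !== undefined) {
    request.params._meta = { progressToken };
  }
  return request;
}

// Route each incoming notification to the callback registered for its token.
class ProgressRouter {
  private readonly handlers = new Map<ProgressToken, (p: ProgressParams) => void>();

  register(token: ProgressToken, onProgress: (p: ProgressParams) => void): void {
    this.handlers.set(token, onProgress);
  }

  unregister(token: ProgressToken): void {
    this.handlers.delete(token);
  }

  // Returns false when no call is waiting on this token.
  dispatch(notification: ProgressNotification): boolean {
    const handler = this.handlers.get(notification.params.progressToken);
    if (handler === undefined) return false;
    handler(notification.params);
    return true;
  }
}
```

A client would register a callback before sending the request and unregister it once the tool result arrives, so that late notifications for a finished call are dropped rather than misrouted.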
StreamableHTTP transport and SSE configuration
Progress notifications travel over an SSE stream provided by the StreamableHTTP transport. The server can answer the tools/call POST with an SSE response, send notifications on it while the tool runs, and finish with the result, so that HTTP response stays open for the whole tool call. Make sure no layer of your infrastructure has an aggressive HTTP timeout that kills the SSE connection before long tool calls complete:
| Layer | Setting to check | Recommended value |
|---|---|---|
| Express / Node.js HTTP server | server.timeout | 0 (disable) or match your max tool-call duration |
| Caddy reverse proxy | flush_interval | -1 (immediate flush for SSE) |
| nginx reverse proxy | proxy_read_timeout | Longer than max tool-call duration (e.g., 120s) |
| Kubernetes Ingress (nginx) | nginx.ingress.kubernetes.io/proxy-read-timeout | "120" annotation on the Ingress resource |
| Fly.io | No extra config needed | Fly's proxy handles SSE correctly by default |
| Cloudflare (proxied) | Response streaming timeout | 100 seconds max on free/pro; use Workers for longer streams |
# Caddy snippet for SSE-compatible MCP reverse proxy
reverse_proxy localhost:3001 {
    flush_interval -1 # Send each SSE event immediately, no buffering
    header_up Host {upstream_hostport}
}
The most common streaming bug in production: progress notifications fire correctly in development (no reverse proxy) but stop at the load balancer in production because the proxy buffers the SSE stream until the connection closes. If progress notifications never reach the client, the first thing to check is proxy response buffering.
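When the buffering proxy is nginx, one application-side mitigation is to set the X-Accel-Buffering: no response header, which nginx honors by turning off proxy buffering for that response. The sketch below is an assumption-laden illustration: disableProxyBuffering is a hypothetical helper, the /mcp route in the usage note is a placeholder, and whether the header survives depends on how your transport writes its own headers, so verify it on the wire.

```typescript
// Minimal structural type so the sketch works with Node's http.ServerResponse
// or an Express Response without importing either.
interface HeaderSink {
  setHeader(name: string, value: string): unknown;
}

// Ask nginx not to buffer this response. Other proxies ignore the header,
// so proxy-level settings such as flush_interval are still required there.
function disableProxyBuffering(res: HeaderSink): void {
  res.setHeader('X-Accel-Buffering', 'no');
}
```

In an Express app this could run as middleware ahead of the MCP route, for example app.use('/mcp', (req, res, next) => { disableProxyBuffering(res); next(); }). It complements the proxy settings in the table above rather than replacing them.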
Chunked content for large tool results
For tools that return large amounts of data (search results, log dumps, generated documents), the content array in the tool result can contain multiple items. Clients receive the entire result as a single response — this is not streaming delivery, but it allows structured segmentation of large outputs:
server.tool(
  'search_logs',
  'Search application logs for a query',
  { query: z.string(), limit: z.number().int().min(1).max(1000).default(100) },
  async (args) => {
    const rows = await db('logs')
      .where('message', 'ilike', `%${args.query}%`)
      .orderBy('timestamp', 'desc')
      .limit(args.limit);

    if (rows.length === 0) {
      return { content: [{ type: 'text', text: 'No results found.' }] };
    }

    // Return metadata and results as separate content items
    // so the client can parse them independently
    return {
      content: [
        {
          type: 'text',
          text: `Found ${rows.length} results for "${args.query}"`,
        },
        {
          type: 'text',
          text: rows.map(r => `[${r.timestamp}] ${r.level}: ${r.message}`).join('\n'),
        },
      ],
    };
  }
);
If your tool result exceeds the client's context window (common for log dumps), paginate rather than sending one giant result. Add cursor and limit parameters to the tool schema and return a next_cursor in the result so the client can fetch subsequent pages.
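The pagination idea can be sketched as a small helper. This is a minimal illustration, not a prescribed format: paginate, encodeCursor, and decodeCursor are hypothetical names, and the cursor here is simply a stringified offset that the client treats as opaque. A production tool would usually page in the database query itself rather than slicing an in-memory array.

```typescript
interface Page<T> {
  items: T[];
  next_cursor?: string; // absent on the last page
}

// Hypothetical cursor encoding: the offset of the next row as a decimal string.
function encodeCursor(offset: number): string {
  return String(offset);
}

function decodeCursor(cursor: string | undefined): number {
  if (cursor === undefined) return 0; // first page
  if (!/^\d+$/.test(cursor)) throw new Error(`Invalid cursor: ${cursor}`);
  return Number(cursor);
}

// Return one page of rows plus a cursor for the next page, if any.
function paginate<T>(rows: readonly T[], cursor: string | undefined, limit: number): Page<T> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer');
  }
  const start = decodeCursor(cursor);
  const end = start + limit;
  const page: Page<T> = { items: rows.slice(start, end) };
  if (end < rows.length) {
    page.next_cursor = encodeCursor(end);
  }
  return page;
}
```

The tool handler would accept cursor and limit as arguments, call paginate, and include next_cursor in its text result so the client knows whether to request another page.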
Streaming from upstream LLM APIs
A common pattern is an MCP tool that calls a language model API and streams the response back to the MCP client. The challenge: the upstream LLM stream arrives incrementally, but the MCP tool result is a single response. The options:
- Buffer + return: accumulate the full LLM response, then return it as a tool result. Simple, but the client waits for the entire response. Best for short outputs.
- Progress notifications with full result: send progress notifications with partial LLM output, then return the complete accumulated text as the final tool result. The client gets streaming feedback and a complete final result. Best for longer outputs (paragraphs, summaries).
- Paginated follow-up tools: return a cursor pointing to a stored partial result and expose a continue_generation tool the client can call to fetch the next chunk. Complex but avoids per-session state on long generations.
// Progress-notification pattern for LLM streaming
// llmClient is a placeholder for your LLM SDK's streaming client
server.tool(
  'generate_text',
  'Generate text using an LLM, streaming progress to the client',
  { prompt: z.string().min(1) },
  async (args, extra) => {
    const token = extra._meta?.progressToken;
    let accumulated = '';
    let chunkCount = 0;

    // Stream from LLM API
    const stream = await llmClient.stream({ prompt: args.prompt });
    for await (const chunk of stream) {
      accumulated += chunk.text;
      chunkCount++;

      // Send a progress notification every 5 chunks to avoid flooding.
      // progress must increase with every notification, so report the chunk
      // count; total is omitted because the final length is unknown.
      if (token !== undefined && chunkCount % 5 === 0) {
        await extra.sendNotification({
          method: 'notifications/progress',
          params: { progressToken: token, progress: chunkCount, message: accumulated },
        });
      }
    }

    return { content: [{ type: 'text', text: accumulated }] };
  }
);
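The third option in the list above, paginated follow-up tools, needs somewhere to park partial output between calls. The following is a rough in-memory sketch under stated assumptions: GenerationStore, its methods, and the "<id>:<index>" cursor format are all hypothetical, and a real deployment would need expiry and probably shared storage.

```typescript
interface GenerationPage {
  text: string;         // chunks produced since the cursor position
  done: boolean;        // true once the generation has finished
  next_cursor?: string; // present while more output may follow
}

// Holds chunks for in-flight generations, keyed by a generation id.
class GenerationStore {
  private readonly chunks = new Map<string, string[]>();
  private readonly finished = new Set<string>();

  // Register a generation and return the cursor for its first read.
  start(id: string): string {
    this.chunks.set(id, []);
    return `${id}:0`;
  }

  append(id: string, chunk: string): void {
    const list = this.chunks.get(id);
    if (list === undefined) throw new Error(`Unknown generation: ${id}`);
    list.push(chunk);
  }

  finish(id: string): void {
    this.finished.add(id);
  }

  // What a continue_generation tool would call with the client's cursor.
  read(cursor: string): GenerationPage {
    const sep = cursor.lastIndexOf(':');
    const id = sep < 0 ? '' : cursor.slice(0, sep);
    const indexText = sep < 0 ? '' : cursor.slice(sep + 1);
    const list = this.chunks.get(id);
    if (list === undefined || !/^\d+$/.test(indexText)) {
      throw new Error(`Unknown cursor: ${cursor}`);
    }
    const text = list.slice(Number(indexText)).join('');
    if (this.finished.has(id)) {
      return { text, done: true };
    }
    return { text, done: false, next_cursor: `${id}:${list.length}` };
  }
}
```

The initial tool call would start the generation in the background and return the first cursor; the client then calls continue_generation with each next_cursor until done is true.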
Monitoring streaming tool calls
AliveMCP's probe sends initialize + tools/list — neither uses streaming. Streaming failures are invisible to uptime probes. Monitor streaming tool calls with structured logs:
// Wrap tool handlers with streaming metrics
function withStreamingMetrics<T>(toolName: string, handler: () => Promise<T>): Promise<T> {
  const start = Date.now();
  return handler().then(
    result => {
      console.info({ event: 'tool_call_complete', tool: toolName, duration_ms: Date.now() - start });
      return result;
    },
    err => {
      console.error({ event: 'tool_call_error', tool: toolName, duration_ms: Date.now() - start, error: err.message });
      throw err;
    }
  );
}
Alert on: (1) P95 tool-call duration > your target SLO; (2) streaming connections that stay open longer than max_tool_duration × 1.5 (indicates a hung stream); (3) progress notification count per tool call (a sudden drop indicates the SSE stream is being buffered by a proxy).
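Alerting on the notification count requires logging it. One possible way to do that is to hand the tool handler a counting wrapper around its notify function, as sketched below. withProgressMetrics, the Notify type, and the injectable log parameter are assumptions made for this example; only the duration_ms and progress_notifications_sent field names come from this guide.

```typescript
type Notify = (message: string) => Promise<void>;

interface ToolCallLog {
  event: 'tool_call_complete' | 'tool_call_error';
  tool: string;
  duration_ms: number;
  progress_notifications_sent: number;
  error?: string;
}

// Run a tool handler, counting the progress notifications it sends and
// logging one structured entry when it completes or fails.
async function withProgressMetrics<T>(
  toolName: string,
  notify: Notify,
  handler: (countedNotify: Notify) => Promise<T>,
  log: (entry: ToolCallLog) => void = (entry) => console.info(JSON.stringify(entry)),
): Promise<T> {
  const start = Date.now();
  let sent = 0;
  const countedNotify: Notify = async (message) => {
    await notify(message);
    sent++; // count only notifications that were actually sent
  };
  try {
    const result = await handler(countedNotify);
    log({
      event: 'tool_call_complete',
      tool: toolName,
      duration_ms: Date.now() - start,
      progress_notifications_sent: sent,
    });
    return result;
  } catch (err) {
    log({
      event: 'tool_call_error',
      tool: toolName,
      duration_ms: Date.now() - start,
      progress_notifications_sent: sent,
      error: err instanceof Error ? err.message : String(err),
    });
    throw err;
  }
}
```

Note that this counts notifications the server sent, not notifications the client received. A proxy that buffers the stream will not reduce the count, so pair it with client-side or end-to-end checks when diagnosing buffering.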
Related questions
Do all MCP clients support progress notifications?
Not all clients consume notifications/progress. Clients that do not support it simply ignore the notifications — they do not cause errors. However, clients that do not support progress notifications also typically do not send a progressToken in tool call requests. Check for the presence of progressToken before sending notifications, which is what the pattern above does. Claude Desktop and most SDK-based clients support progress notifications. Custom clients may not implement the notification handler.
Does streaming affect AliveMCP monitoring?
AliveMCP probes measure the time from sending the initialize request to receiving the complete initialize response, and separately measure tools/list response time. Neither involves streaming tool calls. If your streaming tools affect server performance (high CPU during generation, connection limit pressure), the effect will show up as increased initialize latency in probe metrics, which is an indirect signal that the server is under load.
How do I set a timeout on a streaming tool call?
Wrap the stream consumption in a race against a timeout signal: AbortSignal.timeout(MAX_DURATION_MS). Pass the signal to the upstream API call so the upstream connection is also aborted. When the timeout fires, return isError: true with a message like "Generation timed out after N seconds — please retry with a shorter prompt." Never let a streaming tool call hang indefinitely — it holds a session open and may exhaust server resources.
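The race described above might look like the following sketch. It assumes a runtime that provides AbortSignal.timeout (recent Node.js versions and modern browsers); generateWithTimeout and openStream are hypothetical names, with openStream standing in for whatever upstream streaming call your tool makes and expected to pass the signal along so the upstream connection is aborted too.

```typescript
interface ToolResult {
  content: { type: 'text'; text: string }[];
  isError?: boolean;
}

// Consume a text stream, giving up after maxDurationMs.
async function generateWithTimeout(
  openStream: (signal: AbortSignal) => AsyncIterable<string>,
  maxDurationMs: number,
): Promise<ToolResult> {
  const signal = AbortSignal.timeout(maxDurationMs);

  // Rejects when the timeout fires, even if the upstream ignores the signal.
  const timedOut = new Promise<never>((_, reject) => {
    signal.addEventListener('abort', () => reject(signal.reason), { once: true });
  });

  const consume = (async () => {
    let text = '';
    for await (const chunk of openStream(signal)) {
      text += chunk;
    }
    return text;
  })();

  try {
    const text = await Promise.race([consume, timedOut]);
    return { content: [{ type: 'text', text }] };
  } catch (err) {
    if (signal.aborted) {
      const seconds = maxDurationMs / 1000;
      return {
        isError: true,
        content: [{
          type: 'text',
          text: `Generation timed out after ${seconds} seconds. Please retry with a shorter prompt.`,
        }],
      };
    }
    throw err; // a genuine upstream failure, not a timeout
  }
}
```

Racing against the signal matters because a for await loop only stops early if the upstream iterator itself reacts to the abort. The race guarantees the tool call returns on time either way.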
Further reading
- MCP server SDK — StreamableHTTPServerTransport and session lifecycle
- MCP server error handling — isError for streaming tool timeouts
- MCP server webhook — async result delivery as an alternative to streaming for very long jobs
- MCP server logging — structured metrics for streaming tool-call durations
- MCP server load testing — measuring session ceiling with concurrent streaming tools
- MCP server observability — tracing streaming tool calls end-to-end
- AliveMCP — uptime monitoring for MCP servers with streaming tool support