
MCP server streaming

An MCP tool call is a single request/response exchange: the client sends a tools/call request and waits for the response. For tool calls that take more than a few seconds, this creates a dead wait: the client has no signal that the server is working, and long-lived HTTP connections through load balancers may time out before the result arrives. The MCP protocol addresses this with progress notifications, server-to-client messages sent during a long tool call to report intermediate state. The client opts in by attaching a progress token to the request. The StreamableHTTP transport delivers these notifications over Server-Sent Events (SSE), which keeps the connection active and gives clients real-time progress updates without changing the tool call's request/response contract.

TL;DR

Use extra.sendNotification() inside the tool handler to send progress updates during long tool calls. The client receives these as SSE events before the final tool result arrives. Clients that want progress updates include a progressToken in the tool call's params._meta; the server echoes that token in each notification so the client can match it to the call. For tools that return large outputs, split the result across multiple content array items. AliveMCP's probe sends an initialize request and then tools/list. Neither uses streaming, so streaming failures are invisible to probe monitoring. Monitor streaming tool calls with structured logs (duration_ms, progress_notifications_sent).

MCP progress notifications

The MCP protocol defines a notifications/progress message type. A tool handler sends progress notifications through the request context that the SDK passes as the handler's second argument. The client receives these notifications out-of-band over the SSE stream while waiting for the tool result:

// server.ts: progress notification in a long-running tool handler
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { z } from 'zod';

// fetchRepoMetadata, cloneAndListFiles, analyzeFiles, and generateReport are
// application-specific helpers that are not shown here.

const server = new McpServer({ name: 'my-server', version: '1.0.0' });

server.tool(
  'analyze_repository',
  'Analyze a GitHub repository and return a summary report',
  {
    repo_url: z.string().url(),
    // progressToken is a standard MCP field carried in the request's _meta.
    // No need to declare it in the tool schema.
  },
  async (args, extra) => {
    // In current versions of the TypeScript SDK, the token is exposed as
    // extra._meta?.progressToken. It is set only if the client requested progress.
    const token = extra._meta?.progressToken;

    async function progress(message: string, percent: number) {
      if (token !== undefined) {
        // extra.sendNotification ties the notification to this request, so the
        // transport can deliver it on the SSE stream that answers the tool call.
        await extra.sendNotification({
          method: 'notifications/progress',
          params: {
            progressToken: token,
            progress: percent,
            total: 100,
            message,
          },
        });
      }
    }

    await progress('Fetching repository metadata...', 5);
    const metadata = await fetchRepoMetadata(args.repo_url);

    await progress('Cloning repository...', 20);
    const files = await cloneAndListFiles(args.repo_url);

    await progress(`Analyzing ${files.length} files...`, 40);
    const analysis = await analyzeFiles(files, async (i: number) => {
      // Use (i + 1) so the reported value rises above 40 on the first file;
      // progress values must increase with every notification.
      await progress(`Analyzing file ${i + 1} of ${files.length}`, 40 + ((i + 1) / files.length) * 50);
    });

    await progress('Generating report...', 95);
    const report = await generateReport(metadata, analysis);

    return {
      content: [{ type: 'text', text: report }],
    };
  }
);

The progressToken is an opaque identifier the client sends in the tool call request, under params._meta. If the client does not send a token, skip sending notifications: there is no way to associate them with the correct call. The SDK exposes the token on the second argument to the tool handler (extra._meta?.progressToken), and extra.sendNotification() links each notification to the in-flight request.
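To make the token handshake concrete, here is a minimal sketch of the two JSON-RPC messages involved, written as plain TypeScript types. The field names follow the MCP specification; the request id, the token value, the example URL, and the buildProgress helper are illustrative assumptions and are not part of any SDK.

```typescript
// Illustrative JSON-RPC message shapes for a tool call with progress tracking.
// The token value, request id, and tool arguments are made-up examples.
type ProgressToken = string | number;

interface ToolCallRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: {
    name: string;
    arguments: Record<string, unknown>;
    _meta?: { progressToken?: ProgressToken };
  };
}

interface ProgressNotification {
  jsonrpc: '2.0';
  method: 'notifications/progress';
  params: { progressToken: ProgressToken; progress: number; total?: number; message?: string };
}

// What the client sends: the token rides along in params._meta.
const exampleRequest: ToolCallRequest = {
  jsonrpc: '2.0',
  id: 7,
  method: 'tools/call',
  params: {
    name: 'analyze_repository',
    arguments: { repo_url: 'https://github.com/example/repo' },
    _meta: { progressToken: 'analysis-1' },
  },
};

// Hypothetical helper: builds the matching notification, or returns undefined
// when the client did not ask for progress updates.
function buildProgress(
  req: ToolCallRequest,
  progress: number,
  total?: number,
  message?: string,
): ProgressNotification | undefined {
  const token = req.params._meta?.progressToken;
  if (token === undefined) return undefined;
  return {
    jsonrpc: '2.0',
    method: 'notifications/progress',
    params: { progressToken: token, progress, total, message },
  };
}
```

Note that the notification carries no id field: it is a one-way message, and the progressToken is the only thing linking it back to the original tools/call request.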

StreamableHTTP transport and SSE configuration

Progress notifications travel over an SSE stream provided by the StreamableHTTP transport. When a tool call emits notifications, the server answers the tools/call POST with an SSE stream that stays open until the final result is sent. Make sure no layer of your infrastructure applies an aggressive HTTP timeout that kills the SSE connection before long tool calls complete:

Layer	Setting to check	Recommended value
Express / Node.js HTTP server	`server.timeout`	`0` (disable) or match your max tool-call duration
Caddy reverse proxy	`flush_interval`	`-1` (immediate flush for SSE)
nginx reverse proxy	`proxy_read_timeout`	Longer than max tool-call duration (e.g., `120s`)
Kubernetes Ingress (nginx)	`nginx.ingress.kubernetes.io/proxy-read-timeout`	`"120"` annotation on the Ingress resource
Fly.io	No extra config needed	Fly's proxy handles SSE correctly by default
Cloudflare (proxied)	Response streaming timeout	100 seconds max on free/pro; use Workers for longer streams

# Caddyfile snippet for an SSE-compatible MCP reverse proxy
reverse_proxy localhost:3001 {
    flush_interval -1   # Send each SSE event immediately, no buffering
    header_up Host {upstream_hostport}
}

The most common streaming bug in production: progress notifications fire correctly in development (no reverse proxy) but stop at the load balancer in production because the proxy buffers the SSE stream until the connection closes. If progress notifications never reach the client, the first thing to check is proxy response buffering.
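One way to confirm this symptom from the client side is to compare when progress events arrive with when the final result arrives. The sketch below is a hypothetical heuristic, not part of MCP or any SDK: the CallTiming shape, the looksBuffered name, and the 250 ms window are all assumptions chosen for illustration. It flags a call as probably buffered when every progress event landed in a burst just before the result.

```typescript
// Hypothetical client-side heuristic for spotting proxy buffering.
// All names and thresholds here are illustrative assumptions.
interface CallTiming {
  startedAt: number;          // ms timestamp when tools/call was sent
  progressArrivals: number[]; // ms timestamps when each progress event arrived
  resultAt: number;           // ms timestamp when the final result arrived
}

function looksBuffered(t: CallTiming, windowMs: number = 250): boolean {
  // With no progress events there is nothing to judge from.
  if (t.progressArrivals.length === 0) return false;

  // Very short calls legitimately deliver everything at once.
  const duration = t.resultAt - t.startedAt;
  if (duration <= windowMs * 4) return false;

  // A buffering proxy releases every event together when the response ends,
  // so all arrivals cluster inside a small window before the result.
  return t.progressArrivals.every((at) => t.resultAt - at <= windowMs);
}
```

Recording these timestamps in a staging environment that includes the production proxy chain is usually enough to tell a buffering proxy apart from a handler that simply emits its updates late.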

Chunked content for large tool results

For tools that return large amounts of data (search results, log dumps, generated documents), the content array in the tool result can contain multiple items. Clients receive the entire result as a single response — this is not streaming delivery, but it allows structured segmentation of large outputs:

server.tool(
  'search_logs',
  'Search application logs for a query',
  { query: z.string(), limit: z.number().int().min(1).max(1000).default(100) },
  async (args) => {
    const rows = await db('logs')
      .where('message', 'ilike', `%${args.query}%`)
      .orderBy('timestamp', 'desc')
      .limit(args.limit);

    if (rows.length === 0) {
      return { content: [{ type: 'text', text: 'No results found.' }] };
    }

    // Return metadata and results as separate content items
    // so the client can parse them independently
    return {
      content: [
        {
          type: 'text',
          text: `Found ${rows.length} results for "${args.query}"`,
        },
        {
          type: 'text',
          text: rows.map(r => `[${r.timestamp}] ${r.level}: ${r.message}`).join('\n'),
        },
      ],
    };
  }
);

If your tool result exceeds the client's context window (common for log dumps), paginate rather than sending one giant result. Add cursor and limit parameters to the tool schema and return a next_cursor in the result so the client can fetch subsequent pages.
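The cursor approach can be sketched as a small pure function. This is a minimal illustration under stated assumptions: it pages over an in-memory array, encodes the cursor as a stringified offset, and uses the hypothetical names Page and paginate. A real tool would more likely translate the cursor into a database offset or keyset condition.

```typescript
// Minimal cursor pagination sketch. The cursor is an opaque string to the
// client; here it happens to encode a numeric offset (an assumption made
// for illustration only).
interface Page<T> {
  items: T[];
  next_cursor?: string; // absent on the last page
}

function paginate<T>(rows: readonly T[], cursor: string | undefined, limit: number): Page<T> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new Error(`Invalid limit: ${limit}`);
  }
  // Reject anything that is not a plain non-negative integer string.
  if (cursor !== undefined && !/^\d+$/.test(cursor)) {
    throw new Error(`Invalid cursor: ${cursor}`);
  }
  const offset = cursor === undefined ? 0 : Number.parseInt(cursor, 10);
  if (offset > rows.length) {
    throw new Error(`Cursor out of range: ${cursor}`);
  }

  const items = rows.slice(offset, offset + limit);
  const end = offset + items.length;
  // Only hand back a cursor when more rows remain.
  return end < rows.length ? { items, next_cursor: String(end) } : { items };
}
```

The client calls the tool without a cursor first, then repeats the call with each returned next_cursor until the field is absent.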

Streaming from upstream LLM APIs

A common pattern is an MCP tool that calls a language model API and streams the response back to the MCP client. The challenge: the upstream LLM stream arrives incrementally, but the MCP tool result is a single response. The options:

Buffer + return — accumulate the full LLM response, then return it as a tool result. Simple, but the client waits for the entire response. Best for short outputs.
Progress notifications with full result — send progress notifications with partial LLM output, then return the complete accumulated text as the final tool result. The client gets streaming feedback and a complete final result. Best for longer outputs (paragraphs, summaries).
Paginated follow-up tools — return a cursor pointing to a stored partial result and expose a continue_generation tool the client can call to fetch the next chunk. Complex but avoids per-session state on long generations.

// Progress-notification pattern for LLM streaming
server.tool(
  'generate_text',
  'Generate text using an LLM, streaming progress to the client',
  { prompt: z.string().min(1) },
  async (args, extra) => {
    // The client-supplied token arrives in the request's _meta field
    const token = extra._meta?.progressToken;
    let accumulated = '';
    let chunkCount = 0;

    // Stream from LLM API (llmClient stands in for your LLM SDK's streaming client)
    const stream = await llmClient.stream({ prompt: args.prompt });
    for await (const chunk of stream) {
      accumulated += chunk.text;
      chunkCount++;
      // Send a progress notification every 5 chunks to avoid flooding
      if (token !== undefined && chunkCount % 5 === 0) {
        await extra.sendNotification({
          method: 'notifications/progress',
          // The total is unknown, so omit it. Progress must still increase
          // with every notification, so report the chunk count.
          params: { progressToken: token, progress: chunkCount, message: accumulated },
        });
      }
    }

    return { content: [{ type: 'text', text: accumulated }] };
  }
);

Monitoring streaming tool calls

AliveMCP's probe sends initialize + tools/list — neither uses streaming. Streaming failures are invisible to uptime probes. Monitor streaming tool calls with structured logs:

// Wrap tool handlers with streaming metrics
function withStreamingMetrics<T>(toolName: string, handler: () => Promise<T>): Promise<T> {
  const start = Date.now();
  return handler().then(
    result => {
      console.info({ event: 'tool_call_complete', tool: toolName, duration_ms: Date.now() - start });
      return result;
    },
    err => {
      console.error({ event: 'tool_call_error', tool: toolName, duration_ms: Date.now() - start, error: err.message });
      throw err;
    }
  );
}

Alert on: (1) P95 tool-call duration above your target SLO; (2) streaming connections that stay open longer than max_tool_duration × 1.5 (indicates a hung stream); (3) a sudden drop in the progress notification count per tool call (can indicate that a proxy is buffering or cutting off the SSE stream).
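The withStreamingMetrics wrapper above records duration only. To also capture progress_notifications_sent, the handler's progress function can be wrapped with a counter. The following is a minimal sketch under stated assumptions: ToolCallMetrics, ProgressSender, and withProgressMetrics are hypothetical names, and the log callback stands in for whatever structured logger you use.

```typescript
// Hypothetical metrics wrapper that counts progress notifications per call.
interface ToolCallMetrics {
  tool: string;
  duration_ms: number;
  progress_notifications_sent: number;
  ok: boolean;
}

type ProgressSender = (message: string, progress: number) => Promise<void>;

async function withProgressMetrics<T>(
  tool: string,
  send: ProgressSender,                              // the real sender
  handler: (progress: ProgressSender) => Promise<T>, // tool logic, given a counted sender
  log: (metrics: ToolCallMetrics) => void,
): Promise<T> {
  const start = Date.now();
  let sent = 0;
  let ok = false;

  // Count a notification only after the underlying send has resolved.
  const counted: ProgressSender = async (message, progress) => {
    await send(message, progress);
    sent++;
  };

  try {
    const result = await handler(counted);
    ok = true;
    return result;
  } finally {
    // Runs on success and on failure, so every call produces one log record.
    log({ tool, duration_ms: Date.now() - start, progress_notifications_sent: sent, ok });
  }
}
```

The counter reflects what the server handed to the transport, not what the client received, so it is most useful when compared against a baseline for the same tool.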