MCP server streaming

MCP tool calls are synchronous by design: the client sends a tools/call request and waits for the response. For tool calls that take more than a few seconds, this creates a dead wait. The client has no signal that the server is working, and long-lived HTTP connections through load balancers may time out before the result arrives. The MCP protocol addresses this with progress notifications, server-to-client messages sent during a long tool call to report intermediate state. The client opts in by attaching a progressToken to the request. The StreamableHTTP transport delivers these notifications over Server-Sent Events (SSE), which keeps the connection active and gives clients real-time progress updates without changing the tool call's synchronous contract.

TL;DR

Send notifications/progress messages from the tool handler (extra.sendNotification() in the TypeScript SDK) during long tool calls. The client receives these as SSE events before the final tool result arrives. The client opts in by setting progressToken in the request's params._meta; echo that token in every notification so the client can match it to the call. For tools that return large outputs, split the result across multiple content array items. AliveMCP's probe sends an initialize request and then tools/list; neither uses streaming, so streaming failures are invisible to probe monitoring. Monitor streaming tool calls with structured logs (duration_ms, progress_notifications_sent).

MCP progress notifications

The MCP protocol defines a notifications/progress message type. A tool handler sends progress notifications through the request context the SDK passes as the handler's second argument (extra.sendNotification), which ties each notification to the request being handled. The client receives these notifications over the SSE stream while waiting for the tool result:

// server.ts — progress notification in a long-running tool handler
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { z } from 'zod';

const server = new McpServer({ name: 'my-server', version: '1.0.0' });

server.tool(
  'analyze_repository',
  'Analyze a GitHub repository and return a summary report',
  {
    repo_url: z.string().url(),
    // progressToken is not a tool argument. The client sends it in the
    // request's _meta, and the SDK exposes it on the handler's second argument.
  },
  async (args, extra) => {
    // Set only if the client requested progress notifications
    const token = extra._meta?.progressToken;

    async function progress(message: string, percent: number) {
      if (token !== undefined) {
        // extra.sendNotification ties the notification to this request
        await extra.sendNotification({
          method: 'notifications/progress',
          params: {
            progressToken: token,
            progress: percent,
            total: 100,
            message,
          },
        });
      }
    }

    await progress('Fetching repository metadata...', 5);
    const metadata = await fetchRepoMetadata(args.repo_url);

    await progress('Cloning repository...', 20);
    const files = await cloneAndListFiles(args.repo_url);

    await progress(`Analyzing ${files.length} files...`, 40);
    const analysis = await analyzeFiles(files, async (i) => {
      await progress(`Analyzing file ${i + 1} of ${files.length}`, 40 + (i / files.length) * 50);
    });

    await progress('Generating report...', 95);
    const report = await generateReport(metadata, analysis);

    return {
      content: [{ type: 'text', text: report }],
    };
  }
);

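The progressToken is an opaque identifier the client sends in the _meta field of the tool call request. If the client does not send a token, skip sending notifications: the client has not asked for them and has nothing to match them against. The TypeScript SDK exposes the token on the second argument to the tool handler (extra._meta?.progressToken). Per the MCP specification, the progress value must increase with each notification; total and message are optional.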
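To make the token handshake concrete, here is a minimal sketch of the two JSON-RPC messages involved: a tools/call request that opts in to progress, and a notifications/progress message that echoes the same token. The helper names (buildToolCall, buildProgress), the token value, and the example repository URL are illustrative assumptions, not SDK APIs; SDK-based clients and servers normally construct these messages for you.

```typescript
// Wire-format sketch. Helper names are hypothetical, not part of any SDK.
type ProgressToken = string | number;

interface ToolCallRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: {
    name: string;
    arguments: Record<string, unknown>;
    _meta?: { progressToken: ProgressToken };
  };
}

interface ProgressNotification {
  jsonrpc: '2.0';
  method: 'notifications/progress';
  params: {
    progressToken: ProgressToken;
    progress: number;
    total?: number;
    message?: string;
  };
}

// Client side: attach the token under params._meta only when progress is wanted.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
  progressToken?: ProgressToken,
): ToolCallRequest {
  const params: ToolCallRequest['params'] = { name, arguments: args };
  if (progressToken !== undefined) params._meta = { progressToken };
  return { jsonrpc: '2.0', id, method: 'tools/call', params };
}

// Server side: echo the token. Notifications carry no id because they
// expect no response.
function buildProgress(
  progressToken: ProgressToken,
  progress: number,
  total?: number,
  message?: string,
): ProgressNotification {
  const params: ProgressNotification['params'] = { progressToken, progress };
  if (total !== undefined) params.total = total;
  if (message !== undefined) params.message = message;
  return { jsonrpc: '2.0', method: 'notifications/progress', params };
}

const request = buildToolCall(
  1,
  'analyze_repository',
  { repo_url: 'https://github.com/example/repo' },
  'analyze-1',
);
const update = buildProgress('analyze-1', 20, 100, 'Cloning repository...');
```

The client matches update.params.progressToken against the token it placed in request.params._meta, which is why a server that never received a token has nothing useful to send.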

StreamableHTTP transport and SSE configuration

Progress notifications travel over an SSE stream provided by the StreamableHTTP transport. The server can answer the tools/call POST with an SSE stream, send notifications on it while the handler runs, and finish with the tool result. That HTTP response stays open for the whole call, so make sure no layer of your infrastructure has an HTTP timeout shorter than your longest tool call:

| Layer | Setting to check | Recommended value |
| --- | --- | --- |
| Express / Node.js HTTP server | server.timeout | 0 (disable) or match your max tool-call duration |
| Caddy reverse proxy | flush_interval | -1 (immediate flush for SSE) |
| nginx reverse proxy | proxy_read_timeout | Longer than max tool-call duration (e.g., 120s) |
| Kubernetes Ingress (nginx) | nginx.ingress.kubernetes.io/proxy-read-timeout | "120" annotation on the Ingress resource |
| Fly.io | No extra config needed | Fly's proxy handles SSE correctly by default |
| Cloudflare (proxied) | Response streaming timeout | 100 seconds max on free/pro; use Workers for longer streams |

// Caddy snippet for SSE-compatible MCP reverse proxy
reverse_proxy localhost:3001 {
    flush_interval -1   # Send each SSE event immediately, no buffering
    header_up Host {upstream_hostport}
}

The most common streaming bug in production: progress notifications fire correctly in development (no reverse proxy) but stop at the load balancer in production because the proxy buffers the SSE stream until the connection closes. If progress notifications never reach the client, the first thing to check is proxy response buffering.
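One way to check for buffering without involving MCP at all is a small diagnostic SSE endpoint. The sketch below is an assumption-laden illustration, not part of the MCP SDK: streamTicks, SseResponse, and the tick payload are hypothetical names, and the X-Accel-Buffering header is a hint that nginx honors but other proxies may ignore. The response type is structural so the block needs no imports; Node's http.ServerResponse satisfies it.

```typescript
// Minimal structural view of an HTTP response (hypothetical name).
interface SseResponse {
  writeHead(status: number, headers: Record<string, string>): unknown;
  write(chunk: string): unknown;
  end(): unknown;
}

const SSE_HEADERS: Record<string, string> = {
  'Content-Type': 'text/event-stream',
  'Cache-Control': 'no-cache, no-transform',
  'X-Accel-Buffering': 'no', // asks nginx not to buffer this response
};

// One SSE event: a "data:" line followed by a blank line.
function formatSseEvent(data: unknown): string {
  return `data: ${JSON.stringify(data)}\n\n`;
}

// Emit `count` events, one every `intervalMs`, then close the stream.
function streamTicks(res: SseResponse, count: number, intervalMs: number): void {
  res.writeHead(200, SSE_HEADERS);
  let sent = 0;
  const timer = setInterval(() => {
    sent++;
    res.write(formatSseEvent({ tick: sent, at: Date.now() }));
    if (sent >= count) {
      clearInterval(timer);
      res.end();
    }
  }, intervalMs);
}
```

Mounted on a throwaway route (for example with http.createServer((req, res) => streamTicks(res, 10, 1000))) and requested with curl -N through the production proxy, events should arrive roughly one per second. If they all arrive together when the connection closes, the proxy is buffering the stream.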

Chunked content for large tool results

For tools that return large amounts of data (search results, log dumps, generated documents), the content array in the tool result can contain multiple items. Clients receive the entire result as a single response — this is not streaming delivery, but it allows structured segmentation of large outputs:

server.tool(
  'search_logs',
  'Search application logs for a query',
  { query: z.string(), limit: z.number().int().min(1).max(1000).default(100) },
  async (args) => {
    const rows = await db('logs')
      .where('message', 'ilike', `%${args.query}%`)
      .orderBy('timestamp', 'desc')
      .limit(args.limit);

    if (rows.length === 0) {
      return { content: [{ type: 'text', text: 'No results found.' }] };
    }

    // Return metadata and results as separate content items
    // so the client can parse them independently
    return {
      content: [
        {
          type: 'text',
          text: `Found ${rows.length} results for "${args.query}"`,
        },
        {
          type: 'text',
          text: rows.map(r => `[${r.timestamp}] ${r.level}: ${r.message}`).join('\n'),
        },
      ],
    };
  }
);

If your tool result exceeds the client's context window (common for log dumps), paginate rather than sending one giant result. Add cursor and limit parameters to the tool schema and return a next_cursor in the result so the client can fetch subsequent pages.
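A minimal sketch of the cursor pattern described above, assuming an in-memory array and an offset-based cursor. The names paginate, encodeCursor, and decodeCursor are hypothetical, and the base64 encoding is only there to keep the cursor opaque to clients; a database-backed tool would more likely derive the cursor from a sort key such as the timestamp.

```typescript
interface Page<T> {
  items: T[];
  next_cursor: string | null; // null when there are no more pages
}

// Hypothetical cursor encoding: base64 of the next row offset.
function encodeCursor(offset: number): string {
  return btoa(String(offset));
}

function decodeCursor(cursor: string): number {
  const offset = Number(atob(cursor));
  if (!Number.isInteger(offset) || offset < 0) {
    throw new Error('Invalid cursor');
  }
  return offset;
}

// Return one page of rows plus the cursor for the following page.
function paginate<T>(rows: T[], limit: number, cursor?: string): Page<T> {
  const start = cursor === undefined ? 0 : decodeCursor(cursor);
  const end = start + limit;
  return {
    items: rows.slice(start, end),
    next_cursor: end < rows.length ? encodeCursor(end) : null,
  };
}

// Walking all pages: keep calling until next_cursor is null.
const allRows = ['a', 'b', 'c', 'd', 'e'];
const first = paginate(allRows, 2);
const second = paginate(allRows, 2, first.next_cursor ?? undefined);
```

In a tool handler, the page would go into the content array and next_cursor into the result text or structured output, so the client can pass it back as the cursor argument on the next call.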

Streaming from upstream LLM APIs

A common pattern is an MCP tool that calls a language model API and streams the response back to the MCP client. The challenge: the upstream LLM stream arrives incrementally, but the MCP tool result is a single response. One approach is to relay the partial output through progress notifications and return the full text as the final result:

// Progress-notification pattern for LLM streaming
server.tool(
  'generate_text',
  'Generate text using an LLM, streaming progress to the client',
  { prompt: z.string().min(1) },
  async (args, extra) => {
    const token = extra._meta?.progressToken;
    let accumulated = '';
    let chunkCount = 0;

    // Stream from LLM API
    const stream = await llmClient.stream({ prompt: args.prompt });
    for await (const chunk of stream) {
      accumulated += chunk.text;
      chunkCount++;
      // Send a progress notification every 5 chunks to avoid flooding.
      // progress must increase with each notification, so report the chunk
      // count and omit total, since the final length is unknown.
      if (token !== undefined && chunkCount % 5 === 0) {
        await extra.sendNotification({
          method: 'notifications/progress',
          params: { progressToken: token, progress: chunkCount, message: accumulated },
        });
      }
    }

    return { content: [{ type: 'text', text: accumulated }] };
  }
);

Monitoring streaming tool calls

AliveMCP's probe sends initialize + tools/list — neither uses streaming. Streaming failures are invisible to uptime probes. Monitor streaming tool calls with structured logs:

// Wrap tool handlers with streaming metrics
function withStreamingMetrics<T>(toolName: string, handler: () => Promise<T>): Promise<T> {
  const start = Date.now();
  return handler().then(
    result => {
      console.info({ event: 'tool_call_complete', tool: toolName, duration_ms: Date.now() - start });
      return result;
    },
    err => {
      console.error({ event: 'tool_call_error', tool: toolName, duration_ms: Date.now() - start, error: err.message });
      throw err;
    }
  );
}

Alert on: (1) P95 tool-call duration > your target SLO; (2) streaming connections that stay open longer than max_tool_duration × 1.5 (indicates a hung stream); (3) progress notification count per tool call, compared between server logs and what the client receives (a drop on the client side while the server-side count holds steady indicates the SSE stream is being buffered by a proxy).
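The two server-side numbers these alerts need can be sketched as follows. This is an illustration under stated assumptions: p95 uses the nearest-rank method over an in-memory sample (a metrics backend would normally compute percentiles for you), and countNotifications is a hypothetical wrapper around whatever send function the handler uses. It counts notifications handed to the transport, which is not proof that the client received them.

```typescript
// Nearest-rank 95th percentile of a non-empty list of durations.
function p95(durationsMs: number[]): number {
  if (durationsMs.length === 0) throw new Error('No samples');
  const sorted = [...durationsMs].sort((a, b) => a - b);
  const rank = Math.ceil(0.95 * sorted.length); // 1-based rank
  return sorted[rank - 1];
}

type Send<N> = (notification: N) => Promise<void>;

// Wrap a send function and count the notifications it sends successfully.
function countNotifications<N>(send: Send<N>) {
  let sent = 0;
  return {
    send: async (notification: N): Promise<void> => {
      await send(notification);
      sent++; // incremented only after the send resolves
    },
    get sent(): number {
      return sent;
    },
  };
}
```

A handler could wrap its send function once at the start of the call, use the wrapped send for every progress update, and include the final count as progress_notifications_sent in the tool_call_complete log line.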

Related questions

Do all MCP clients support progress notifications?

Not all clients consume notifications/progress. Clients that do not support it simply ignore the notifications — they do not cause errors. However, clients that do not support progress notifications also typically do not send a progressToken in tool call requests. Check for the presence of progressToken before sending notifications, which is what the pattern above does. Claude Desktop and most SDK-based clients support progress notifications. Custom clients may not implement the notification handler.

Does streaming affect AliveMCP monitoring?

AliveMCP probes measure the time from sending the initialize request to receiving the complete initialize response, and separately measure tools/list response time. Neither involves streaming tool calls. If your streaming tools affect server performance (high CPU during generation, connection limit pressure), the effect will show up as increased initialize latency in probe metrics, an indirect signal that the server is under load.

How do I set a timeout on a streaming tool call?

Wrap the stream consumption in a race against a timeout signal: AbortSignal.timeout(MAX_DURATION_MS). Pass the signal to the upstream API call so the upstream connection is also aborted. When the timeout fires, return isError: true with a message like "Generation timed out after N seconds — please retry with a shorter prompt." Never let a streaming tool call hang indefinitely — it holds a session open and may exhaust server resources.
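The race described above might look like the following sketch. It assumes a runtime that provides AbortSignal.timeout (recent Node.js versions and modern browsers do). StreamFn stands in for any upstream call that accepts an AbortSignal and yields text chunks; that shape, and the names generateWithTimeout and ToolResult, are assumptions for illustration rather than a specific LLM client's API.

```typescript
type ToolResult = {
  content: { type: 'text'; text: string }[];
  isError?: boolean;
};

// Hypothetical upstream: accepts an AbortSignal and yields text chunks.
type StreamFn = (signal: AbortSignal) => AsyncIterable<string>;

async function generateWithTimeout(
  startStream: StreamFn,
  maxDurationMs: number,
): Promise<ToolResult> {
  const signal = AbortSignal.timeout(maxDurationMs);

  // Rejects as soon as the timeout signal aborts.
  const timedOut = new Promise<never>((_, reject) => {
    if (signal.aborted) reject(signal.reason);
    signal.addEventListener('abort', () => reject(signal.reason), { once: true });
  });

  // Consume the upstream stream, passing the signal so the upstream
  // connection can be aborted as well.
  const consume = async (): Promise<string> => {
    let text = '';
    for await (const chunk of startStream(signal)) text += chunk;
    return text;
  };

  try {
    const text = await Promise.race([consume(), timedOut]);
    return { content: [{ type: 'text', text }] };
  } catch (err) {
    if (!signal.aborted) throw err; // a real upstream failure, not a timeout
    const seconds = maxDurationMs / 1000;
    return {
      isError: true,
      content: [{
        type: 'text',
        text: `Generation timed out after ${seconds} seconds. Please retry with a shorter prompt.`,
      }],
    };
  }
}
```

The race matters because an upstream client that ignores the signal would otherwise keep the handler waiting; with the race, the handler returns the error result at the deadline even if the upstream stream never ends.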
