MCP server compression

HTTP compression for MCP servers is not a blanket setting. It is a per-route decision with one hard constraint: SSE streams must not pass through a buffering compressor. An MCP server using HTTP+SSE runs two distinct transport paths: the HTTP POST path (tool call requests and initialize) and the SSE GET path (the event stream back to the client). A buffering gzip compressor on the SSE path holds events until its internal buffer fills or the stream closes, which delays every server-to-client notification. The pattern is simple: compress HTTP POST responses and static assets; exempt the SSE GET endpoint from the compression middleware entirely.

TL;DR

Add the Express compression middleware to your HTTP POST endpoint and static asset handler. Exempt the SSE GET path with a filter function that returns false for text/event-stream responses. Set a threshold of 1 KB so that small JSON responses, where compression overhead exceeds the savings, are sent uncompressed. Use Brotli (Node.js built-in zlib.createBrotliCompress) for static assets pre-compressed at build time; use gzip (compression middleware) for dynamic JSON tool responses at runtime. AliveMCP probes are not affected by compression: the probe sends a standard initialize request and the server's response is decompressed transparently by the HTTP client.

Where compression adds value in an MCP server

Compression provides the most benefit on responses that are large and repetitive, the two properties that make a payload highly compressible. Tool responses that return document content, search results, or structured data are good candidates. Short JSON responses (a boolean status, a single number) gain almost nothing, because gzip's fixed framing of about 18 bytes of header and trailer offsets most of the savings:

| Response type | Typical size | Gzip ratio | Worth compressing? |
|---|---|---|---|
| Tool response: structured JSON (search results, document list) | 5–100 KB | 60–80% reduction | Yes, high value |
| Tool response: prose text (document content, summaries) | 1–50 KB | 50–70% reduction | Yes, high value |
| Tool response: short status or scalar value | < 200 bytes | ~10% reduction | No, overhead exceeds savings |
| SSE stream (event-by-event) | Per event: 50–200 bytes | N/A (stream) | No, incompatible with a buffering compressor |
| Static assets (JS, CSS) | 10–500 KB | 60–80% reduction | Yes, pre-compress at build time |
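The size thresholds above can be checked directly with Node's built-in zlib. The following is a minimal sketch, not a benchmark: the `status` and `searchResults` payloads are made up for illustration, and real tool responses will compress differently depending on how repetitive they are.

```typescript
import { gzipSync } from 'node:zlib';

// Compare raw and gzipped byte counts for a JSON-serializable value.
function gzipSizes(value: unknown, level = 6): { raw: number; gzipped: number } {
  const raw = Buffer.from(JSON.stringify(value), 'utf8');
  return { raw: raw.length, gzipped: gzipSync(raw, { level }).length };
}

// Hypothetical payloads: a tiny status reply and a repetitive result list.
const status = { ok: true };
const searchResults = Array.from({ length: 500 }, (_, i) => ({
  id: i,
  title: `Document ${i}`,
  snippet: 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.',
}));

const small = gzipSizes(status);
const large = gzipSizes(searchResults);

console.log(`status:  ${small.raw} B raw, ${small.gzipped} B gzipped`);
console.log(`results: ${large.raw} B raw, ${large.gzipped} B gzipped`);
```

Running this against a sample of your own tool responses is a reasonable way to choose the threshold value instead of relying on the 1 KB default suggested here.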

Express compression middleware with SSE exemption

The compression package wraps Node.js's built-in zlib and negotiates gzip or deflate based on the client's Accept-Encoding header. The critical configuration is the filter function, which prevents the middleware from applying to SSE responses:

import express from 'express';
import compression from 'compression';
import type { Request, Response } from 'express';

const app = express();

// Compression middleware — exempt SSE responses
app.use(compression({
  // Minimum response size to compress (bytes) — skip tiny JSON responses
  threshold: 1024, // 1 KB

  // Level 6 is the default; 1 = fastest/least compression, 9 = slowest/most compression
  // For dynamic JSON responses, level 6 is the right tradeoff
  level: 6,

  // Filter: do NOT compress SSE streams
  filter: (req: Request, res: Response) => {
    // If the response content-type is text/event-stream, skip compression
    const contentType = res.getHeader('Content-Type') as string | undefined;
    if (contentType?.includes('text/event-stream')) {
      return false;
    }
    // Otherwise, use the default compression filter
    return compression.filter(req, res);
  },
}));

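// MCP transport handles both POST (tool calls) and GET (SSE stream).
// Assumes mcpTransport is a StreamableHTTPServerTransport (MCP TypeScript SDK) created elsewhere.
// The filter above ensures the GET/SSE path is never compressed.
app.all('/mcp', (req, res) => mcpTransport.handleRequest(req, res, req.body));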

The compression middleware sets Content-Encoding: gzip and wraps the response write stream. For SSE responses, the transport sets Content-Type: text/event-stream and writes events incrementally. If the compressor buffers these writes, the client does not receive events until the buffer flushes (either when it fills or when the stream closes). Returning false from the filter for text/event-stream responses leaves them uncompressed and prevents this.

Streaming compression for large tool responses

For tool handlers that stream large results (e.g., returning a paginated document corpus), Node.js's built-in zlib.createGzip() can compress a pipeline incrementally rather than buffering the entire response:

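import { createGzip } from 'node:zlib';
import { Readable } from 'node:stream';
import { pipeline } from 'node:stream/promises';

// gzip consumes strings or Buffers, so serialize each row as one line of JSON (NDJSON)
async function* toNdjson(rows: AsyncIterable<unknown>) {
  for await (const row of rows) yield JSON.stringify(row) + '\n';
}

// Example: streaming a large file or database cursor as a compressed response
app.get('/export/documents', async (req, res) => {
  const acceptsGzip = req.headers['accept-encoding']?.includes('gzip') ?? false;

  res.setHeader('Content-Type', 'application/x-ndjson');
  res.setHeader('Vary', 'Accept-Encoding'); // the body differs per Accept-Encoding
  if (acceptsGzip) {
    res.setHeader('Content-Encoding', 'gzip');
  }

  // deps.db is the application's own database handle; cursor(100) is assumed to
  // return an async iterable of row objects fetched 100 at a time
  const dbCursor = deps.db.query('SELECT id, title, content FROM documents').cursor(100);
  const body = Readable.from(toNdjson(dbCursor));

  if (acceptsGzip) {
    // Rows are compressed as they arrive; the full result is never held in memory
    await pipeline(body, createGzip({ level: 6 }), res);
  } else {
    await pipeline(body, res);
  }
});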

// For MCP tool responses (not streaming HTTP), the compression middleware handles
// the entire response body after the tool handler returns — no manual streaming needed

In most MCP servers, tool responses are returned as complete JSON objects from the handler — the compression middleware intercepts and compresses the response body automatically. Manual streaming compression is only needed for large file export endpoints that live outside the MCP tool layer.

Brotli for static assets

Brotli provides better compression ratios than gzip (typically 15–25% smaller) and is supported by all modern browsers and many HTTP client libraries. The standard approach for static assets is to pre-compress at build time and serve the pre-compressed file when the client's Accept-Encoding header includes br:

// Build step: pre-compress static assets with Brotli
import { createBrotliCompress, constants } from 'node:zlib';
import { createReadStream, createWriteStream } from 'node:fs';
import { pipeline } from 'node:stream/promises';

async function compressAsset(inputPath: string) {
  const brotli = createBrotliCompress({
    params: {
      [constants.BROTLI_PARAM_QUALITY]: 11, // max quality for static pre-compression
      [constants.BROTLI_PARAM_MODE]: constants.BROTLI_MODE_TEXT,
    },
  });
  await pipeline(
    createReadStream(inputPath),
    brotli,
    createWriteStream(`${inputPath}.br`)
  );
}

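// Serve pre-compressed Brotli files with Express static middleware.
// setHeaders runs only for the file actually served, so these headers apply when the
// request resolves to a .br file (requested directly, or via a rewrite that checks Accept-Encoding).
app.use('/assets', express.static('public/assets', {
  setHeaders: (res, filePath) => {
    if (filePath.endsWith('.br')) {
      res.setHeader('Content-Encoding', 'br');
      res.setHeader('Vary', 'Accept-Encoding');
      // Set the Content-Type of the original file, not of the .br wrapper
      if (filePath.endsWith('.js.br')) res.setHeader('Content-Type', 'text/javascript; charset=utf-8');
      if (filePath.endsWith('.css.br')) res.setHeader('Content-Type', 'text/css; charset=utf-8');
    }
  },
}));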
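Serving the pre-compressed file "when the client supports Brotli" implies a negotiation step in front of the static handler. The following is a minimal sketch of that decision as a pure function. The name `resolveAssetPath`, the `precompressed` set, and the file names are hypothetical, and the parser is deliberately simplified: it ignores q-values, so a header such as `br;q=0` would still be treated as accepting Brotli.

```typescript
// Decide which file to serve for a request path, given the client's
// Accept-Encoding header. `fileExists` is injected so the logic can be
// exercised without touching the disk.
function resolveAssetPath(
  urlPath: string,
  acceptEncoding: string | undefined,
  fileExists: (candidate: string) => boolean,
): { path: string; encoding: 'br' | null } {
  // Split "gzip, deflate, br" into lowercase tokens, dropping any ";q=..." suffix.
  const encodings = (acceptEncoding ?? '')
    .split(',')
    .map((part) => part.trim().split(';')[0].toLowerCase());

  const candidate = `${urlPath}.br`;
  if (encodings.includes('br') && fileExists(candidate)) {
    return { path: candidate, encoding: 'br' };
  }
  // Fall back to the uncompressed original.
  return { path: urlPath, encoding: null };
}

// Hypothetical list of files produced by the build step.
const precompressed = new Set(['/app.js.br', '/styles.css.br']);

const pick = resolveAssetPath('/app.js', 'gzip, deflate, br', (p) => precompressed.has(p));
console.log(pick);
```

In an Express app, a small middleware mounted before the static handler could call this with `req.path` and `req.headers['accept-encoding']` and rewrite `req.url` to the returned path. Building the set of pre-compressed files once at startup avoids a filesystem check on every request.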

For dynamic MCP tool responses, prefer gzip via the compression middleware. Brotli at quality 11 is too slow for request-time use (roughly 50–200 ms of CPU time per large response, depending on payload size and hardware). Use Brotli only for pre-compressed static assets, where the compression is done once at build time and the compressed file is served repeatedly.

Stateless MCP mode and compression

If your MCP server uses stateless JSON mode (sessionIdGenerator: undefined and enableJsonResponse: true on the transport), all requests are HTTP POSTs and all responses are standard JSON. In this mode, there is no SSE stream to worry about, so you can apply the compression middleware to the MCP endpoint without any filter exemptions. Stateless mode is compatible with round-robin load balancing and simplifies both the compression and the reverse proxy configuration.

import { StreamableHTTPServerTransport } from '@modelcontextprotocol/sdk/server/streamableHttp.js';

// Stateless MCP server: no SSE, apply compression globally
const transport = new StreamableHTTPServerTransport({
  sessionIdGenerator: undefined, // stateless: no session IDs
  enableJsonResponse: true,      // plain JSON responses instead of an SSE stream
});

// No SSE exemption needed in stateless mode
app.post('/mcp', compression({ threshold: 1024 }), express.json(), (req, res) =>
  transport.handleRequest(req, res, req.body));

Compression at the reverse proxy layer

If your MCP server runs behind Caddy, nginx, or a cloud load balancer, consider offloading compression to the reverse proxy rather than the application. This reduces CPU load on the application process and lets the proxy handle compression for multiple upstream services consistently:

# Caddy: compress everything except SSE at the proxy layer
route /mcp* {
  # SSE stream requests: GET with an Accept header that includes text/event-stream.
  # POST requests may also list text/event-stream in Accept, so match on the method too.
  @sse {
    method GET
    header Accept *text/event-stream*
  }
  # Reverse proxy SSE without encode (no compression for SSE)
  handle @sse {
    reverse_proxy localhost:3000 {
      flush_interval -1  # flush each write to the client immediately
    }
  }
  # Compress all other /mcp requests at the proxy
  encode gzip
  reverse_proxy localhost:3000
}

Proxy-layer compression means the Node.js process does not need the compression middleware at all — the proxy handles it. If you use Caddy or nginx, check whether the default config already compresses all upstream responses, and add an explicit SSE exemption if so. A buffering proxy compressor on the SSE path is the same problem as a buffering application-layer compressor — the exemption is required at whichever layer does the compressing.

Related questions

Does AliveMCP's probe support compressed responses?

Yes. AliveMCP uses a standard HTTP client that automatically decompresses gzip, deflate, and Brotli responses based on the Content-Encoding header. The probe sends a standard initialize request with an appropriate Accept-Encoding header, and the response is decompressed before the JSON is parsed. Compression does not affect the probe's ability to measure latency or parse the MCP response.

How do I measure the impact of compression on my server's CPU?

Add a response-time logging middleware before the compression middleware and after it. The difference is the compression overhead per request. Alternatively, check the Node.js process CPU usage via process.cpuUsage() in a health_check tool before and after enabling compression under load. For most MCP servers with small-to-medium tool responses, gzip at level 6 adds under 1ms of CPU per response — well within acceptable bounds. If CPU is a concern, reduce the level to 3 (faster, slightly less compression) or increase the threshold to skip compression for responses under 10 KB.
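The process.cpuUsage() approach can also be tried offline before touching the server. The following is a rough sketch, assuming a synthetic payload shaped like a tools/list response: the `cpuMsPerCall` helper and the payload contents are illustrative only, and the numbers it prints depend heavily on hardware and payload shape.

```typescript
import { brotliCompressSync, constants, gzipSync } from 'node:zlib';

// Average CPU time (user + system) per call, in milliseconds.
function cpuMsPerCall(fn: () => void, iterations = 50): number {
  const start = process.cpuUsage();
  for (let i = 0; i < iterations; i++) fn();
  const delta = process.cpuUsage(start); // microseconds elapsed since `start`
  return (delta.user + delta.system) / 1000 / iterations;
}

// Hypothetical payload: a few hundred tool definitions, tens of KB of JSON.
const payload = Buffer.from(
  JSON.stringify(
    Array.from({ length: 400 }, (_, i) => ({
      name: `tool_${i}`,
      description: 'Searches the document index and returns matching entries.',
      inputSchema: { type: 'object', properties: { query: { type: 'string' } } },
    })),
  ),
);

const results = {
  gzip3: cpuMsPerCall(() => gzipSync(payload, { level: 3 })),
  gzip6: cpuMsPerCall(() => gzipSync(payload, { level: 6 })),
  // Brotli quality 11 is much slower, so use fewer iterations.
  brotli11: cpuMsPerCall(
    () => brotliCompressSync(payload, { params: { [constants.BROTLI_PARAM_QUALITY]: 11 } }),
    5,
  ),
};

console.log(`payload: ${payload.length} bytes`, results);
```

Substituting a captured response from your own server for `payload` gives a more representative figure than the synthetic data used here.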

Should I compress the MCP initialize and tools/list responses?

Yes, these are regular HTTP POST + JSON responses and benefit from compression. The tools/list response can be large (100+ KB for servers with many tools and rich schema descriptions). With gzip at level 6, a 50 KB tools/list response typically compresses to under 15 KB — meaningful bandwidth savings for clients on slower connections or mobile networks. The initialize response is small (under 1 KB), so the 1 KB threshold means it will usually not be compressed, which is correct.
