MCP server Koa — async middleware MCP HTTP transport with Koa.js

Koa.js's async-first middleware model and minimal footprint make it a clean host for MCP servers, but there's one critical gotcha: Koa automatically finalizes the HTTP response after every middleware chain completes, which breaks the long-lived SSE connections that MCP's streaming transport depends on. Setting ctx.respond = false disables Koa's response handling and lets the MCP SDK write directly to the Node.js socket — and AliveMCP helps you catch the moments when this breaks in production.

TL;DR

Install @modelcontextprotocol/sdk, koa, koa-router, koa-body, and @koa/cors. On every /mcp route handler, set ctx.respond = false before calling transport.handleRequest(ctx.req, ctx.res, body); without this, Koa closes the response before SSE events are written. Register AliveMCP to probe your /health endpoint and detect these silent streaming failures immediately.

Project setup and the ctx.respond = false pattern

Koa's response lifecycle works differently from Express and Fastify. After your middleware chain completes (all await next() calls resolve), Koa finalizes the response by calling res.end() with whatever is in ctx.body. For JSON responses this is ideal: set ctx.body = { status: 'ok' } and Koa handles serialization and closing. But for SSE streams, the MCP SDK needs the raw socket to stay open after the Koa middleware exits. Setting ctx.respond = false tells Koa to skip its response finalization entirely. Koa's own documentation describes this bypass as unsupported, so treat it as an escape hatch reserved for the MCP routes rather than a general pattern.
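
To make the lifecycle concrete, here is a simplified model of the two checks Koa runs before it finalizes a response. It is a sketch for illustration, not Koa's source: the names respondGate, RespondGateInput, and GateDecision are hypothetical, and the real finalization step goes on to handle HEAD requests, empty status codes, and the different body types.

```typescript
// Simplified model of the gate Koa applies before finalizing a response.
// Hypothetical names; only the two checks themselves mirror Koa's behavior.
interface RespondGateInput {
  respond?: boolean; // ctx.respond
  writable: boolean; // ctx.writable: false once the raw response has ended
}

type GateDecision = 'bypassed' | 'already-ended' | 'koa-finalizes';

function respondGate(ctx: RespondGateInput): GateDecision {
  if (ctx.respond === false) return 'bypassed'; // Koa leaves the response alone
  if (!ctx.writable) return 'already-ended';    // someone already called res.end()
  return 'koa-finalizes';                       // Koa writes ctx.body and ends the response
}

// An SSE route that forgot ctx.respond = false: the stream is still open, so Koa ends it.
console.log(respondGate({ writable: true }));                 // koa-finalizes
// The same route with ctx.respond = false: the SDK keeps control of the socket.
console.log(respondGate({ respond: false, writable: true })); // bypassed
// A short response the SDK already ended: nothing is left for Koa to do.
console.log(respondGate({ writable: false }));                // already-ended
```

The third case is why a missing ctx.respond = false can go unnoticed on routes whose responses are fully written before the handler returns.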

npm install @modelcontextprotocol/sdk zod koa koa-router koa-body @koa/cors koa-ratelimit ioredis uuid
npm install -D typescript @types/koa @types/koa-router @types/koa__cors tsx

// src/index.ts
import Koa from 'koa';
import Router from 'koa-router';
import { koaBody } from 'koa-body'; // named export in koa-body v6+; older versions use a default import
import cors from '@koa/cors';
import { v4 as uuidv4 } from 'uuid';
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StreamableHTTPServerTransport } from '@modelcontextprotocol/sdk/server/streamableHttp.js';
import { isInitializeRequest } from '@modelcontextprotocol/sdk/types.js';

const app = new Koa();
const router = new Router();
const sessions = new Map<string, StreamableHTTPServerTransport>();

The session map and transport lifecycle are identical to the Express and Fastify patterns — each MCP session gets its own transport instance, keyed by the Mcp-Session-Id header. The difference is entirely in how Koa exposes the underlying Node.js request and response objects and how you prevent Koa from closing the connection prematurely.

Forgetting ctx.respond = false is the most common bug when integrating MCP with Koa. The POST route often appears to work (short JSON-RPC responses are fully written and ended before Koa's finalization step runs), but the GET /mcp SSE route silently breaks: Koa ends the response as soon as the handler returns, the SSE client reconnects, and you see a flood of reconnection attempts in your logs with no obvious error. AliveMCP detects this failure pattern, where the SSE connection closes immediately after opening, and alerts you before clients start dropping tool call notifications.

Router setup with MCP routes and body parsing

Koa's middleware is applied in a stack; register koa-body before the router so that ctx.request.body is populated when route handlers execute. Use @koa/cors for CORS headers and configure it to expose the Mcp-Session-Id header to browser clients.

app.use(cors({
  origin: (ctx) => {
    // If ALLOWED_ORIGINS is unset, every origin is reflected back; set it explicitly in production.
    const allowed = (process.env.ALLOWED_ORIGINS ?? '').split(',').filter(Boolean);
    const origin = ctx.request.headers.origin ?? '';
    return allowed.length === 0 || allowed.includes(origin) ? origin : '';
  },
  allowHeaders: ['Content-Type', 'Mcp-Session-Id', 'Authorization'],
  exposeHeaders: ['Mcp-Session-Id'],
  credentials: true,
}));

app.use(koaBody({ json: true }));

// Health endpoint — monitored by AliveMCP
router.get('/health', (ctx) => {
  ctx.body = { status: 'ok', sessions: sessions.size, ts: Date.now() };
});

router.post('/mcp', async (ctx) => {
  const sessionId = ctx.headers['mcp-session-id'] as string | undefined;
  const body = ctx.request.body;

  let transport = sessionId ? sessions.get(sessionId) : undefined;

  if (!transport) {
    if (!isInitializeRequest(body)) {
      ctx.status = 400;
      ctx.body = { error: 'Expected initialize request for new session' };
      return;
    }

    const newId = uuidv4();
    transport = new StreamableHTTPServerTransport({
      sessionIdGenerator: () => newId,
      onsessioninitialized: (id) => { sessions.set(id, transport!); },
    });
    transport.onclose = () => { sessions.delete(newId); };

    const server = createMcpServer();
    await server.connect(transport);
  }

  // CRITICAL: disable Koa's response finalization
  ctx.respond = false;
  await transport.handleRequest(ctx.req, ctx.res, body);
});

router.get('/mcp', async (ctx) => {
  const sessionId = ctx.headers['mcp-session-id'] as string | undefined;
  const transport = sessionId ? sessions.get(sessionId) : undefined;

  if (!transport) {
    ctx.status = 404;
    ctx.body = { error: 'Unknown session' };
    return;
  }

  // CRITICAL: SSE requires Koa to not close the response
  ctx.respond = false;
  await transport.handleRequest(ctx.req, ctx.res);
});

router.delete('/mcp', async (ctx) => {
  const sessionId = ctx.headers['mcp-session-id'] as string | undefined;
  const transport = sessionId ? sessions.get(sessionId) : undefined;

  if (transport) {
    await transport.close();
    if (sessionId) sessions.delete(sessionId);
  }

  ctx.status = 204;
  ctx.respond = false; // the raw response is ended manually on the next line, so Koa must not respond as well
  ctx.res.end();
});

app.use(router.routes());
app.use(router.allowedMethods());

Notice that ctx.respond = false is set on all three MCP routes, not just the SSE route. On POST and GET, StreamableHTTPServerTransport.handleRequest() takes ownership of the ServerResponse and manages its lifecycle directly, so letting Koa touch the response afterwards risks writing to or ending a stream the SDK still controls. The DELETE handler ends the raw response itself after closing the transport, so Koa has nothing left to send there either.

Tool registration and McpServer factory

As with Express and Fastify, each MCP session gets its own McpServer instance. The factory function pattern keeps tool registration centralized and reusable in tests.

// McpServer is already imported at the top of src/index.ts; repeat the import only if this factory lives in its own file
import { z } from 'zod';

function createMcpServer(): McpServer {
  const server = new McpServer({
    name: 'my-koa-mcp',
    version: '1.0.0',
  });

  server.tool(
    'echo',
    'Echoes the provided message',
    { message: z.string() },
    async ({ message }) => ({
      content: [{ type: 'text', text: message }],
    })
  );

  server.tool(
    'list_sessions',
    'Returns the count of active MCP sessions (admin tool)',
    {},
    async () => ({
      content: [{ type: 'text', text: `Active sessions: ${sessions.size}` }],
    })
  );

  server.tool(
    'calculate',
    'Evaluates a simple arithmetic expression',
    {
      expression: z.string().regex(/^[\d\s+\-*/().]+$/, 'Only arithmetic operators allowed'),
    },
    async ({ expression }) => {
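      // The zod regex limits input to digits, whitespace, and arithmetic characters, so no identifiers reach Function().
      // Malformed input such as "1 +" still throws a SyntaxError.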
      const result = Function('"use strict"; return (' + expression + ')')();
      return { content: [{ type: 'text', text: String(result) }] };
    }
  );

  return server;
}
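
If you would rather not call Function() at all, the calculate tool can use a small recursive-descent evaluator instead. The following is a minimal sketch of that alternative, not something the MCP SDK provides: evaluateArithmetic is a hypothetical helper, and it supports only numbers, parentheses, unary signs, and the four basic operators.

```typescript
// Hypothetical helper. Grammar:
//   expr   := term (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := number | '(' expr ')' | ('+' | '-') factor
function evaluateArithmetic(input: string): number {
  let pos = 0;

  // Skip whitespace and return the next character ('' at end of input).
  function peek(): string {
    while (/\s/.test(input.charAt(pos))) pos++;
    return input.charAt(pos);
  }

  function parseExpr(): number {
    let value = parseTerm();
    for (let op = peek(); op === '+' || op === '-'; op = peek()) {
      pos++;
      const rhs = parseTerm();
      value = op === '+' ? value + rhs : value - rhs;
    }
    return value;
  }

  function parseTerm(): number {
    let value = parseFactor();
    for (let op = peek(); op === '*' || op === '/'; op = peek()) {
      pos++;
      const rhs = parseFactor();
      value = op === '*' ? value * rhs : value / rhs;
    }
    return value;
  }

  function parseFactor(): number {
    const ch = peek();
    if (ch === '+' || ch === '-') {
      pos++;
      const operand = parseFactor();
      return ch === '-' ? -operand : operand;
    }
    if (ch === '(') {
      pos++;
      const inner = parseExpr();
      if (peek() !== ')') throw new SyntaxError(`Expected ')' at position ${pos}`);
      pos++;
      return inner;
    }
    const match = /^(?:\d+(?:\.\d*)?|\.\d+)/.exec(input.slice(pos));
    if (!match) throw new SyntaxError(`Expected a number at position ${pos}`);
    pos += match[0].length;
    return Number(match[0]);
  }

  const result = parseExpr();
  if (peek() !== '') throw new SyntaxError(`Unexpected '${peek()}' at position ${pos}`);
  return result;
}

console.log(evaluateArithmetic('2 * (3 + 4) - 5')); // 9
```

Inside the tool handler you would return String(evaluateArithmetic(expression)). Because the parser never executes code, the zod regex becomes a convenience for early validation rather than the only safeguard.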

Koa's async middleware model means a rejected promise from handleRequest propagates up the middleware chain to Koa's error handler. Add error handling middleware early in the stack to catch these and format them as JSON-RPC error responses rather than Koa's default plain-text error responses.

Error middleware and MCP error formatting

Koa's error handling middleware must be the first middleware registered on the app so it wraps the entire downstream chain. For MCP routes, transport errors should be formatted as JSON-RPC error objects so that clients can parse them correctly.

// Error handling middleware — register FIRST
app.use(async (ctx, next) => {
  try {
    await next();
  } catch (err: unknown) {
    const error = err as Error & { status?: number; code?: number };
    const isMcpRoute = ctx.path.startsWith('/mcp');

    if (isMcpRoute) {
      // Format as JSON-RPC error
      ctx.status = error.status ?? 500;
      ctx.type = 'application/json';
      ctx.body = JSON.stringify({
        jsonrpc: '2.0',
        error: {
          code: error.code ?? -32603,
          message: error.message ?? 'Internal server error',
        },
        id: null,
      });
    } else {
      ctx.status = error.status ?? 500;
      ctx.body = { error: error.message };
    }

    // Emit so Koa's built-in error logging fires
    ctx.app.emit('error', err, ctx);
  }
});

// Koa app-level error event for centralized logging
app.on('error', (err, ctx) => {
  console.error({
    msg: 'Koa error',
    path: ctx?.path,
    sessionId: ctx?.headers['mcp-session-id'],
    error: err?.message,
  });
});

This error middleware also catches cases where ctx.respond = false has already been set but the transport's handleRequest throws — in that scenario, the catch block can't write a JSON-RPC error to the response because the raw socket may have already been written to or closed. Log the error and emit a metric so you can investigate these transport failures. AliveMCP will independently detect that the MCP session is broken and alert your on-call rotation.
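
One way to handle that scenario is to inspect the raw response before deciding what to do. The sketch below is an assumption-laden illustration rather than part of Koa or the MCP SDK: failRawResponse, RawResponse, and ErrorOutcome are hypothetical names, and the interface lists only the members of Node's http.ServerResponse (ctx.res in Koa) that the helper uses.

```typescript
// Minimal view of the raw response; Node's http.ServerResponse has all of these members.
interface RawResponse {
  headersSent: boolean;
  writableEnded: boolean;
  writeHead(status: number, headers: Record<string, string>): unknown;
  end(body?: string): unknown;
  destroy(): unknown;
}

type ErrorOutcome = 'json-rpc-error-sent' | 'connection-destroyed' | 'already-ended';

// Hypothetical helper: report a transport failure on a response Koa no longer manages.
function failRawResponse(res: RawResponse, message: string, code = -32603): ErrorOutcome {
  if (res.writableEnded) {
    // The response is finished; the only thing left to do is log.
    return 'already-ended';
  }
  if (res.headersSent) {
    // Mid-stream failure: a new status line can't be sent, so drop the connection
    // and let the client reconnect.
    res.destroy();
    return 'connection-destroyed';
  }
  // Nothing has been written yet, so a normal JSON-RPC error is still possible.
  res.writeHead(500, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify({ jsonrpc: '2.0', error: { code, message }, id: null }));
  return 'json-rpc-error-sent';
}
```

In the catch block of the error middleware, you could call failRawResponse(ctx.res, error.message) when ctx.respond === false and include the returned outcome in the log entry, so the logs show whether the client received an error or was simply disconnected.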

Rate limiting and graceful shutdown

koa-ratelimit provides middleware-based rate limiting for Koa routes. It ships an in-memory driver and a Redis driver; the Redis driver keeps counters in a shared store, which is what you want when you run multiple Koa processes behind a load balancer.

import ratelimit from 'koa-ratelimit';
import Redis from 'ioredis';

const db = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379');

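// Apply rate limit only to MCP routes.
// Register this before the /mcp route handlers: koa-router runs matching layers in registration order.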
router.use('/mcp', ratelimit({
  driver: 'redis',
  db,
  duration: 60_000,     // 1 minute window
  max: 60,              // 60 requests per session per minute
  id: (ctx) =>
    (ctx.headers['mcp-session-id'] as string) ?? ctx.ip,
  errorMessage: JSON.stringify({
    jsonrpc: '2.0',
    error: { code: -32029, message: 'Rate limit exceeded' },
    id: null,
  }),
  headers: {
    remaining: 'X-RateLimit-Remaining',
    reset: 'X-RateLimit-Reset',
    total: 'X-RateLimit-Limit',
  },
}));

// Graceful shutdown
const PORT = Number(process.env.PORT ?? 3000);
const server = app.listen(PORT, () => {
  console.log(`Koa MCP server listening on port ${PORT}`);
});

async function shutdown(signal: string) {
  console.log(`${signal} received — shutting down`);
  server.close();

  const closes = Array.from(sessions.values()).map((t) => t.close());
  await Promise.allSettled(closes);
  sessions.clear();

  await db.quit();
  console.log('Shutdown complete');
  process.exit(0);
}

process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT',  () => shutdown('SIGINT'));

During the shutdown window, AliveMCP will detect the /health endpoint going offline and fire an alert. This is intentional: it gives you a confirmation that the process is shutting down. If your deployment process brings up a new instance first and then shuts down the old one, you'll see the health check briefly fail and then recover — a clean rolling deploy. If it fails to recover, AliveMCP escalates the alert, giving you an early warning of a failed deployment before users start reporting errors.

For backward compatibility with older MCP clients that use the SSE transport, see the SSE transport guide. For authentication patterns, including the Koa middleware approach, see the authentication guide. Container deployment is covered in the Docker guide.

Frequently asked questions

Why do SSE clients disconnect immediately when connecting to my Koa MCP server?

The most likely cause is that you forgot to set ctx.respond = false in your GET /mcp route handler. Koa automatically calls res.end() after all middleware resolves, which closes the SSE connection before any events are sent. Set ctx.respond = false before calling transport.handleRequest(ctx.req, ctx.res), and verify there's no upstream middleware (like koa-compress) that is also trying to finalize the response. The second common cause is that a reverse proxy (nginx, Caddy) is buffering the response and not flushing SSE events to the client — disable proxy buffering on the GET /mcp location block.

How does koa-body interact with the MCP SDK's body parsing?

koa-body parses the request body into ctx.request.body as a JavaScript object. The MCP SDK's handleRequest(req, res, body) accepts this pre-parsed body as the third argument, so the SDK does not need to parse it again. The important thing is to always pass ctx.request.body as that third argument on POST requests. The raw Node.js stream (ctx.req) has already been consumed by koa-body, so if you omit the parsed body the SDK has nothing left to read and the request fails instead of being handled.

Can I run multiple Koa MCP server instances behind a load balancer?

Yes, but you need sticky sessions (also called session affinity) configured on the load balancer so that requests with the same Mcp-Session-Id header always reach the same Koa process. Without sticky sessions, the session Map on one process won't know about sessions created on another process. Alternatively, externalize session state to Redis (storing serializable session metadata) and use a stateless transport approach where each request can be handled by any instance. The Redis-backed rate limiting described above follows the same principle: state that every instance needs lives outside any single process.

How do I add request tracing to a Koa MCP server?

Add a tracing middleware that generates a trace ID and attaches it to ctx.state before any other middleware runs. Use AsyncLocalStorage from Node.js's async_hooks module to propagate the trace ID into async tool handlers without passing it explicitly through every function call. This gives you correlation between the HTTP request, the MCP session, and any downstream service calls made inside tool handlers. Libraries like dd-trace (Datadog) or the OpenTelemetry Node.js SDK can auto-instrument Koa and provide distributed tracing with minimal configuration.
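
A minimal sketch of that approach using only Node.js built-ins is shown below. The names tracing, currentTraceId, TraceStore, and TraceableContext are hypothetical, and the x-trace-id header is an assumption; substitute whatever header your infrastructure already propagates. The interface lists just the Koa context members the middleware touches.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';
import { randomUUID } from 'node:crypto';

interface TraceStore {
  traceId: string;
}

const traceStorage = new AsyncLocalStorage<TraceStore>();

// Returns the trace ID of the request currently being handled, if any.
function currentTraceId(): string | undefined {
  return traceStorage.getStore()?.traceId;
}

// The parts of Koa's ctx this middleware uses.
interface TraceableContext {
  get(header: string): string; // Koa returns '' when the request header is absent
  set(header: string, value: string): void;
  state: Record<string, unknown>;
}

// Koa-style middleware; register it before every other middleware.
async function tracing(ctx: TraceableContext, next: () => Promise<unknown>): Promise<void> {
  // Reuse an incoming ID if present (header name is an assumption), else create one.
  const traceId = ctx.get('x-trace-id') || randomUUID();
  ctx.state.traceId = traceId;
  ctx.set('x-trace-id', traceId);
  // Everything awaited inside next() sees this store, including async tool handlers.
  await traceStorage.run({ traceId }, next);
}
```

A tool handler can then call currentTraceId() when it logs or makes a downstream request, without the ID being threaded through its arguments.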

What should my Koa MCP server's /health endpoint return for AliveMCP?

AliveMCP accepts any HTTP 2xx response from your health endpoint. The minimum viable response is { "status": "ok" } with HTTP 200. For richer monitoring, include sessions (active session count), uptime (process uptime in seconds), and ts (Unix timestamp in milliseconds). These fields appear in AliveMCP's dashboard and help you correlate uptime gaps with session spikes. If you use a Redis-backed session store or rate limiter, add a redis: "ok" | "error" field so AliveMCP can distinguish between a Koa crash and a Redis connectivity failure. For content-based health checks, configure AliveMCP at alivemcp.com to alert only when the health endpoint returns a non-2xx status or when the response body contains "status":"error".
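
As an illustration, the richer payload could be assembled by a small helper like the one below. This is a sketch under assumptions: buildHealthPayload and its types are hypothetical names, and reporting status "error" whenever Redis is down is one possible policy (chosen here so a content-based check would fire), not something AliveMCP requires.

```typescript
type DependencyStatus = 'ok' | 'error';

interface HealthPayload {
  status: 'ok' | 'error';
  sessions: number;          // active MCP sessions
  uptime: number;            // process uptime in seconds
  ts: number;                // Unix timestamp in milliseconds
  redis?: DependencyStatus;  // present only when a Redis dependency is monitored
}

// Hypothetical helper: builds the body for GET /health.
function buildHealthPayload(sessionCount: number, redis?: DependencyStatus): HealthPayload {
  const payload: HealthPayload = {
    // Assumption: a Redis outage degrades overall status.
    status: redis === 'error' ? 'error' : 'ok',
    sessions: sessionCount,
    uptime: Math.round(process.uptime()),
    ts: Date.now(),
  };
  if (redis !== undefined) payload.redis = redis;
  return payload;
}
```

In the /health route this would look like ctx.body = buildHealthPayload(sessions.size, redisStatus), where redisStatus comes from however you check Redis connectivity. With the ioredis client from the rate limiting section, one option is to compare the client's status property to 'ready'.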