MCP server on Vercel

Vercel's serverless and Edge Runtime functions can run MCP server logic, but the platform's stateless, short-lived execution model creates real constraints for MCP. The initialize/tools-list/tool-call sequence requires a session to persist across multiple HTTP requests — on Vercel, there's no in-process session state between function invocations. This guide covers exactly what works (stateless tool handlers, Edge SSE for simple cases), what requires workarounds (session state externalized to Redis or KV), and when to use a persistent-process platform instead.

TL;DR

Stateless MCP tool handlers (no session state, each call is independent) work well on Vercel. Use Next.js App Router API routes or Vercel Edge Functions. SSE streaming works on Edge Runtime. Session-stateful MCP servers require external storage (Vercel KV / Upstash Redis) to reconstruct state on each invocation — workable but adds latency. Function timeout (10s on Hobby, 60s on Pro) limits long tool executions. For MCP servers with complex session state or long-running tools, Railway, Render, or Fly.io are better fits. Use AliveMCP to monitor the public endpoint regardless of platform.

The stateless invocation problem

The MCP protocol is designed around a session: the client sends initialize, gets back the server's capabilities, then calls tools in sequence. In a persistent-process server (Node.js on Railway, for example), all of this happens in one process that lives as long as the session.

Vercel functions are invoked per request and then frozen or killed. Between the initialize call and the first tools/call, the function instance that handled initialize may be gone. In-memory state — the transport object, any session-scoped caches, accumulated context — is lost.

The practical consequence: you must externalize all session state. Every MCP request handler must load state from an external store (Redis, Vercel KV, Postgres) at the start of the request and write it back at the end. For truly stateless tool handlers (no session-scoped state needed), this overhead is zero.
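
The load-then-save cycle described above can be sketched as a small wrapper. This is a minimal illustration, not an SDK API: `SessionStore`, `MemoryStore`, and `withSession` are hypothetical names, and the in-memory store is a stand-in for Redis, Vercel KV, or Postgres that is only useful for local experiments.

```typescript
// Minimal store contract (hypothetical); a Redis, Vercel KV, or Postgres
// client would sit behind this interface in a real deployment.
interface SessionStore<S> {
  get(id: string): Promise<S | null>;
  set(id: string, state: S): Promise<void>;
}

// In-memory stand-in for local experiments only. On Vercel this map would not
// survive between invocations, which is exactly the problem described above.
class MemoryStore<S> implements SessionStore<S> {
  private data = new Map<string, string>();

  async get(id: string): Promise<S | null> {
    const raw = this.data.get(id);
    return raw === undefined ? null : (JSON.parse(raw) as S);
  }

  async set(id: string, state: S): Promise<void> {
    // Serialize on write, as a remote store would.
    this.data.set(id, JSON.stringify(state));
  }
}

// Load state at the start of the request, run the handler, then persist
// whatever state the handler returns.
async function withSession<S, R>(
  store: SessionStore<S>,
  sessionId: string,
  initial: () => S,
  handler: (state: S) => Promise<{ state: S; result: R }>,
): Promise<R> {
  const loaded = (await store.get(sessionId)) ?? initial();
  const { state, result } = await handler(loaded);
  await store.set(sessionId, state);
  return result;
}
```

Having the handler return the new state, rather than mutate shared objects, keeps the write-back explicit and makes it harder to forget the save step.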

Next.js App Router route handler

For a stateless MCP server (tool handlers that don't need session context beyond what the client sends with each request), a Next.js App Router route handler is a clean fit:

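// app/api/mcp/route.ts
import { NextRequest } from 'next/server';
import { z } from 'zod';
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StreamableHTTPServerTransport } from '@modelcontextprotocol/sdk/server/streamableHttp.js';
// Hypothetical helper: your own function that looks up weather for a location.
import { fetchWeather } from '@/lib/weather';

export const runtime = 'nodejs'; // this SDK transport is built on Node.js HTTP types

export async function POST(req: NextRequest) {
  // Build a fresh server on every request; nothing is shared between invocations.
  const server = new McpServer({
    name: 'my-mcp-server',
    version: '1.0.0',
  });

  // Register stateless tools
  server.tool('get-weather', { location: z.string() }, async ({ location }) => {
    const data = await fetchWeather(location); // pure function, no session state
    return { content: [{ type: 'text', text: JSON.stringify(data) }] };
  });

  // sessionIdGenerator: undefined puts the transport in stateless mode.
  const transport = new StreamableHTTPServerTransport({ sessionIdGenerator: undefined });
  await server.connect(transport);

  // Caveat: the SDK types handleRequest for Node's IncomingMessage and
  // ServerResponse and it resolves to void, so a Web Request/Response adapter
  // is needed between this route handler and the transport. The call below
  // shows the intended flow, not a drop-in implementation.
  const response = await transport.handleRequest(req, new Response());
  return response;
}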

The key: sessionIdGenerator: undefined tells the transport not to manage session state. Each POST is stateless — the server is created fresh, tools are registered, the request is handled, and the function returns. This works for tool handlers that are pure functions of their inputs.
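
From the client's side, a stateless call is a single self-contained JSON-RPC 2.0 POST. The sketch below builds such a request; `buildToolCall` and `buildFetchInit` are hypothetical helper names, and it assumes the server accepts a `tools/call` without a prior `initialize` on the same connection, which is the behaviour the stateless setup above aims for.

```typescript
// Shape of a JSON-RPC 2.0 request as used by MCP.
interface JsonRpcRequest {
  jsonrpc: '2.0';
  id: number | string;
  method: string;
  params?: Record<string, unknown>;
}

// Build a tools/call message: the tool name plus its arguments object.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): JsonRpcRequest {
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name, arguments: args },
  };
}

// Build the options object for fetch(). No session header is attached,
// so every request stands on its own.
function buildFetchInit(message: JsonRpcRequest): {
  method: string;
  headers: Record<string, string>;
  body: string;
} {
  return {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      // Streamable HTTP clients advertise that they accept both formats.
      Accept: 'application/json, text/event-stream',
    },
    body: JSON.stringify(message),
  };
}
```

A call would then look like `fetch(url, buildFetchInit(buildToolCall(1, 'get-weather', { location: 'Oslo' })))`, where `url` is your own deployment's `/api/mcp` endpoint.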

SSE streaming on Edge Runtime

Vercel's Edge Runtime supports streaming responses, which is required for SSE. The Streamable HTTP transport (the successor to the SSE transport in newer MCP SDK versions) works with Vercel's streaming infrastructure. The handler below writes SSE frames by hand to show the underlying mechanism:

// app/api/mcp/route.ts (streaming version)
import { NextRequest } from 'next/server';

export const runtime = 'edge';
export const maxDuration = 60; // Pro plan: 60s max

export async function GET(req: NextRequest) {
  const { readable, writable } = new TransformStream();
  const writer = writable.getWriter();
  const encoder = new TextEncoder();

  // Write one SSE frame; awaiting the write respects stream backpressure.
  const sendEvent = async (data: unknown) => {
    await writer.write(encoder.encode(`data: ${JSON.stringify(data)}\n\n`));
  };

  // Run the work in the background so the Response can be returned immediately (Edge-compatible)
  (async () => {
    try {
      await sendEvent({ type: 'connected' });
      // ... tool execution with progress events
      await writer.close();
    } catch (err) {
      // Signal the failure to the reader; ignore errors from an already-closed stream.
      await writer.abort(err).catch(() => {});
    }
  })();

  return new Response(readable, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive',
    },
  });
}

Edge Runtime has a hard limit on execution time (Vercel Hobby: 30s, Pro: 60s). Long-running tool executions that exceed this limit will be cut off mid-stream. For tools that routinely take more than 30 seconds, use an async pattern: queue the work, return a job ID immediately, and have the client poll for the result.
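
The async pattern described above (queue the work, return a job ID, let the client poll) can be sketched as follows. `JobStore` is a hypothetical name, and the in-memory map is an assumption made to keep the example self-contained: on Vercel, both the job records and the work itself would have to live in a durable queue or external store, because a function instance can be frozen once its response has been sent.

```typescript
// A job is pending until its work settles, then done or failed.
type Job<T> =
  | { status: 'pending' }
  | { status: 'done'; result: T }
  | { status: 'failed'; error: string };

class JobStore<T> {
  // In-memory for illustration only; use an external store in production.
  private jobs = new Map<string, Job<T>>();
  private nextId = 0;

  // Start the work without awaiting it and return a job ID immediately.
  submit(work: () => Promise<T>): string {
    const id = `job-${++this.nextId}`;
    this.jobs.set(id, { status: 'pending' });
    work().then(
      (result) => this.jobs.set(id, { status: 'done', result }),
      (err) => this.jobs.set(id, { status: 'failed', error: String(err) }),
    );
    return id;
  }

  // What a polling endpoint would return; undefined means unknown job ID.
  status(id: string): Job<T> | undefined {
    return this.jobs.get(id);
  }
}
```

The tool handler would return the ID from `submit` as its result, and a separate status route would serialize whatever `status` returns.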

Session state with Vercel KV

For MCP servers that need session state (accumulated context, tool call history, per-user data), externalize state to Vercel KV (Upstash Redis under the hood):

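import { NextRequest } from 'next/server';
import { kv } from '@vercel/kv';
// Hypothetical helper: your own dispatcher that handles one JSON-RPC message
// and updates `state` in place.
import { handleMcpRequest } from '@/lib/mcp';

interface McpSessionState {
  userId: string;
  toolCallHistory: Array<{ tool: string; result: unknown; ts: number }>;
  contextSummary: string;
}

export async function POST(req: NextRequest) {
  // Reuse the client's session ID if it sent one; otherwise start a new session.
  const sessionId = req.headers.get('mcp-session-id') ?? crypto.randomUUID();
  const body = await req.json();

  // Load session state from KV, falling back to a fresh state object
  const state: McpSessionState = (await kv.get<McpSessionState>(`session:${sessionId}`)) ?? {
    userId: body.params?.clientInfo?.name ?? 'anonymous',
    toolCallHistory: [],
    contextSummary: ''
  };

  // Handle the MCP request using loaded state
  const result = await handleMcpRequest(body, state);

  // Save updated state (30-minute TTL)
  await kv.set(`session:${sessionId}`, state, { ex: 1800 });

  // Echo the session ID so the client can send it on the next request
  return Response.json(result, {
    headers: { 'mcp-session-id': sessionId }
  });
}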

Each invocation adds one KV read + one KV write. At Vercel KV pricing (~$0.0002/request), this is negligible for low-to-moderate traffic. At high request rates, the KV round-trip (5–15ms) becomes a noticeable addition to tool call latency.

Function timeout limits

Vercel function timeouts by plan:

Hobby: 10 seconds (Serverless), 30 seconds (Edge)
Pro: 60 seconds (Serverless), 60 seconds (Edge)
Enterprise: up to 900 seconds (15 minutes)

MCP tool executions that involve LLM calls, web scraping, or file processing can easily take 10–60 seconds. Design each tool handler around one of three strategies:

Complete within the timeout — appropriate for fast, deterministic operations
Use the async job pattern — return a job ID immediately, let the client poll a separate /api/mcp/status/[jobId] endpoint
Stream partial results — use SSE to send incremental results before the timeout

Set maxDuration in your route configuration to the maximum you're willing to pay for, not the platform limit:

export const maxDuration = 30; // Don't let a runaway tool eat 60s
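
maxDuration caps the whole invocation. To bound an individual tool call inside the handler and return a clean error before the platform kills the function, one option is a timeout race. This is a sketch under assumptions: `withTimeout` and `ToolTimeoutError` are hypothetical names, and the race only stops waiting for the work; it does not cancel it.

```typescript
// Error type used to tell a timeout apart from a tool failure.
class ToolTimeoutError extends Error {
  constructor(ms: number) {
    super(`tool call exceeded ${ms} ms`);
    this.name = 'ToolTimeoutError';
  }
}

// Resolve with the work's value, or reject once `ms` milliseconds have passed.
// The underlying work keeps running; pass an AbortSignal to it separately if
// it supports cancellation.
async function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new ToolTimeoutError(ms)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    // Always clear the timer so it cannot fire after the race has settled.
    clearTimeout(timer);
  }
}
```

Inside a tool handler this might wrap the slow step, for example `await withTimeout(fetchWeather(location), 8000)`, with the budget set comfortably below maxDuration so there is time left to send an error response.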

When to choose Vercel vs a persistent server

Use Vercel when:

Your tool handlers are stateless or nearly stateless
You already have a Next.js app and want to colocate the MCP server
Traffic is bursty and you benefit from Vercel's automatic scaling to zero
Tool executions are fast (under 10 seconds on Hobby, 60 seconds on Pro)

Use Railway, Render, or Fly.io instead when:

Your MCP server maintains per-session in-memory state (conversation context, tool call chains)
You have long-running tool executions (LLM inference, file processing, async workflows)
You need WebSocket transport or persistent SSE connections
You're running multiple MCP servers that communicate over a private network

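The cost crossover: Vercel Pro is $20/month and includes 1M function invocations. Railway Starter is $5/month for a continuously running instance. A server handling ~5,000 sessions per month with 10+ tool calls per session generates 50,000+ invocations, which is still well inside the included 1M, so on these figures the comparison is between two flat fees. Per-invocation pricing may exceed fixed-instance pricing once usage (invocations or execution time) outgrows the plan's included quota.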

External monitoring for Vercel MCP servers

Vercel provides function-level metrics (invocation count, error rate, p99 duration) in the dashboard. These show whether functions are being invoked and whether they're erroring — but they don't show whether the MCP protocol layer is functioning correctly from a client's perspective.

An MCP client that receives a 500 Internal Server Error on the initialize call sees a broken server; Vercel's metrics show one error event. An MCP client that receives a malformed JSON response sees a protocol failure; Vercel's metrics show a successful invocation. Add your Vercel deployment URL to AliveMCP to catch protocol failures that function metrics don't surface. See AliveMCP for setup.
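
The difference between a function-level check and a protocol-level check can be sketched as follows. `checkInitializeResponse` is a hypothetical helper, not how any particular monitoring service works; it assumes the server answers `initialize` with a plain JSON body rather than an SSE stream.

```typescript
type ProbeResult = { ok: true } | { ok: false; reason: string };

// Validate the response to an MCP initialize request at the protocol level.
function checkInitializeResponse(httpStatus: number, bodyText: string): ProbeResult {
  // Function-level failure: this is the case platform metrics already show.
  if (httpStatus < 200 || httpStatus >= 300) {
    return { ok: false, reason: `HTTP ${httpStatus}` };
  }

  // Protocol-level failures: the invocation "succeeded" but the client cannot use it.
  let parsed: unknown;
  try {
    parsed = JSON.parse(bodyText);
  } catch {
    return { ok: false, reason: 'body is not valid JSON' };
  }
  if (typeof parsed !== 'object' || parsed === null) {
    return { ok: false, reason: 'body is not a JSON object' };
  }
  const msg = parsed as Record<string, unknown>;
  if (msg.jsonrpc !== '2.0') {
    return { ok: false, reason: 'missing jsonrpc "2.0" marker' };
  }
  if ('error' in msg) {
    return { ok: false, reason: 'JSON-RPC error response' };
  }
  const result = msg.result;
  if (typeof result !== 'object' || result === null) {
    return { ok: false, reason: 'missing result object' };
  }
  // An initialize result carries the negotiated protocol version.
  if (typeof (result as Record<string, unknown>).protocolVersion !== 'string') {
    return { ok: false, reason: 'result lacks protocolVersion' };
  }
  return { ok: true };
}
```

A 200 response with an HTML error page or a truncated body passes the first check and fails the second, which is the gap described above.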