MCP server on AWS Lambda

AWS Lambda is one of the most widely used serverless platforms in production, but it was designed for short-lived request-response workloads, not for the long-lived SSE connections that MCP's HTTP transport can establish. Deploying an MCP server on Lambda requires the Lambda Web Adapter to bridge the gap between Lambda's invocation model and the streaming HTTP responses that MCP clients expect. This guide covers the complete deployment path: Lambda Web Adapter configuration, the Function URL vs API Gateway decision, connection object lifecycle across cold and warm starts, cold-start mitigation with provisioned concurrency, SAM and CDK deployment patterns, and the monitoring blind spots that make external protocol probing essential for Lambda-hosted MCP servers.

TL;DR

Use the AWS Lambda Web Adapter extension layer: it runs your existing HTTP server (Express, Fastify, Hono) inside Lambda without modification and handles SSE streaming when its invoke mode is set to response_stream. Use a Function URL with InvokeMode: RESPONSE_STREAM for direct SSE support; API Gateway v2 (HTTP API) adds roughly 1–3ms of latency, caps each request at about 30 seconds, and buffers responses, so it suits non-streaming JSON responses only. Initialize shared objects (AWS SDK clients, database connections) in the init scope outside the handler so they survive across warm invocations. Monitor with AliveMCP to catch Lambda execution failures that appear as silent timeouts to MCP clients; CloudWatch shows invocation metrics, not protocol-level failure modes.

Why Lambda needs the Web Adapter

Lambda's native invocation model sends a JSON event payload to your handler and expects a JSON response object. Standard HTTP streaming (chunked transfer encoding, SSE) does not fit this model directly. The native alternative is Lambda response streaming: wrapping the handler in awslambda.streamifyResponse bypasses the buffered JSON response and lets the function write raw bytes to a stream. This approach works but requires restructuring your server code specifically for Lambda.

The AWS Lambda Web Adapter is a Lambda extension that solves the impedance mismatch cleanly: it starts your HTTP server as a subprocess, forwards Lambda invocations as HTTP requests to localhost:8080, and streams the HTTP response back to the caller. Your MCP server code stays identical to what you'd deploy on Railway or a VPS — the adapter handles the translation.
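
To make the translation concrete, the sketch below maps a Function URL event (payload format 2.0) onto the plain HTTP request a local server would receive. It is a conceptual illustration only: the real adapter is a separate compiled extension, and `toLocalRequest` and `LocalRequest` are hypothetical names invented for this example.

```typescript
// Conceptual sketch only: this is NOT the adapter's source code.
// Field names follow the Function URL payload format 2.0.
interface FunctionUrlEvent {
  rawPath: string;
  rawQueryString: string;
  headers: Record<string, string>;
  body?: string;
  isBase64Encoded: boolean;
  requestContext: { http: { method: string } };
}

// Hypothetical description of the request forwarded to the local server.
interface LocalRequest {
  method: string;
  url: string;
  headers: Record<string, string>;
  body: string;
  bodyIsBase64: boolean;
}

// Rebuild the HTTP request that would be sent to the server on localhost.
function toLocalRequest(event: FunctionUrlEvent, port = 8080): LocalRequest {
  const query = event.rawQueryString ? `?${event.rawQueryString}` : "";
  return {
    method: event.requestContext.http.method,
    url: `http://127.0.0.1:${port}${event.rawPath}${query}`,
    headers: event.headers,
    body: event.body ?? "",
    bodyIsBase64: event.isBase64Encoded,
  };
}
```

The point of the design is that everything Lambda-specific lives on the event side of this mapping, which is why the server itself needs no changes.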

Approach	Code changes required	SSE streaming	Cold start overhead
Lambda native (JSON event)	Major rewrite — no HTTP framework	No	Lowest
`awslambda.streamifyResponse`	Moderate — wrap handler	Yes (raw bytes)	Low
Lambda Web Adapter	None — run existing HTTP server	Yes (full SSE)	~5ms extra (adapter boot)

For MCP servers where the codebase also runs on other platforms, the Lambda Web Adapter is the right choice: zero code changes, full SSE support, and the adapter cost (~5ms on cold start) is negligible compared to Lambda's baseline cold start time (roughly 200ms–2s for Node.js, depending on bundle size and memory allocation).

Adding Lambda Web Adapter to your deployment

The adapter ships as a Lambda layer. Add it to your function and set the required environment variable that tells it where your server listens:

# SAM template.yaml — Lambda Web Adapter deployment
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Globals:
  Function:
    Runtime: nodejs22.x
    Architectures: [arm64]   # Graviton2 — 20% cheaper, faster cold starts
    MemorySize: 512
    Timeout: 30              # Seconds; raise (max 900) for long-lived SSE streams on a Function URL

Resources:
  McpServerFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: dist/
      Handler: run.sh          # Shell script that starts your HTTP server
      Layers:
        # Lambda Web Adapter layer ARN (us-east-1, arm64); pin a version and check the adapter's releases for the current one
        - arn:aws:lambda:us-east-1:753240598075:layer:LambdaAdapterLayerArm64:24
      Environment:
        Variables:
          AWS_LAMBDA_EXEC_WRAPPER: /opt/bootstrap   # Tells Lambda to use the adapter
          AWS_LWA_INVOKE_MODE: response_stream      # Adapter defaults to buffered; needed for SSE
          PORT: "8080"                               # Adapter expects your server here
          MCP_SERVER_VERSION: "1.0.0"
      FunctionUrlConfig:
        AuthType: NONE         # Or AWS_IAM for authenticated access
        InvokeMode: RESPONSE_STREAM  # Required for SSE
        Cors:
          AllowOrigins: ["https://yourdomain.com"]
          AllowMethods: [GET, POST, DELETE, OPTIONS]
          AllowHeaders: [Content-Type, Mcp-Session-Id, Authorization]
          ExposeHeaders: [Mcp-Session-Id]

The run.sh handler script starts your Node.js server as a foreground process — Lambda Web Adapter keeps it running and proxies requests to it:

#!/bin/bash
# run.sh — handler script for Lambda Web Adapter
exec node server.js

Your server.js is unchanged from any other deployment target — it just binds to process.env.PORT (or 8080 if unset) and handles MCP requests via StreamableHTTPServerTransport.

Function URL vs API Gateway: which to use

MCP servers deployed on Lambda have two front-door options. The right choice depends on your SSE requirements, authentication needs, and traffic pattern:

Feature	Function URL	API Gateway v2 (HTTP API)	API Gateway v1 (REST API)
SSE streaming	Yes, with `RESPONSE_STREAM`	No: HTTP APIs buffer the Lambda response	Only with REST API response streaming enabled; otherwise buffered (Lambda's 6MB buffered-response limit applies)
Added latency	~0ms (direct)	~1–3ms	~5–10ms
Cost	Free (Lambda invocation only)	$1/million requests	$3.50/million requests
Auth options	None or AWS_IAM	JWT authorizer, Lambda authorizer, IAM	Cognito, Lambda authorizer, API keys, IAM
Custom domain	Not directly (front the URL with CloudFront)	Yes: Route 53 + ACM	Yes: Route 53 + ACM
WAF integration	Via CloudFront only	Via CloudFront only (no native AWS WAF association)	Yes: native AWS WAF association
Timeout max	15 minutes	30 seconds	29 seconds by default (can be raised for Regional and private APIs)

For most MCP servers: use a Function URL for simplicity and zero added cost during development, then put API Gateway or CloudFront in front when you need a custom domain, WAF rules, or multi-tier auth. If you use API Gateway, plan around its integration timeout (about 30 seconds by default): every request, including an initialize that lands on a cold start and each individual tool call, must complete within this window, or clients will see a 504 Gateway Timeout that looks like a network failure.
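
A quick way to reason about that window is to budget the slowest request you expect. The helper below is a hypothetical sketch, not part of any SDK; the durations are estimates you supply, and the default limit and safety margin are assumptions to adjust for your gateway.

```typescript
// Hypothetical budget check. All values are in milliseconds and are estimates
// supplied by the caller, not measurements taken by this code.
function fitsGatewayWindow(
  coldStartMs: number,        // worst-case cold start before the request is served
  requestMs: number,          // worst-case handling time for the request itself
  gatewayTimeoutMs = 29_000,  // assumed gateway limit
  safetyMarginMs = 2_000,     // assumed headroom for network and proxy overhead
): boolean {
  return coldStartMs + requestMs + safetyMarginMs <= gatewayTimeoutMs;
}

// A 2s cold start plus a 10s tool call fits; a 28s tool call does not.
const slowToolFits = fitsGatewayWindow(2_000, 10_000);
const verySlowToolFits = fitsGatewayWindow(2_000, 28_000);
```

If the check fails for a realistic tool call, that is a signal to keep the tool behind a Function URL or to split the work into shorter requests.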

Handler scope vs init scope: where to put connections

Lambda reuses warm execution environments across invocations — the code outside your handler function runs once on cold start and is cached. Code inside the handler runs on every invocation. Placing expensive objects in the wrong scope causes either redundant work (re-creating SDK clients on every request) or cold-start penalty amplification (creating slow connections inside the handler):

import { randomUUID } from 'node:crypto';
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StreamableHTTPServerTransport } from '@modelcontextprotocol/sdk/server/streamableHttp.js';
import { DynamoDBClient, GetItemCommand } from '@aws-sdk/client-dynamodb';
import { S3Client } from '@aws-sdk/client-s3';
import express from 'express';
import { z } from 'zod';

// ── INIT SCOPE: runs once on cold start, reused on warm invocations ──
const dynamoClient = new DynamoDBClient({ region: process.env.AWS_REGION });
const s3Client = new S3Client({ region: process.env.AWS_REGION });

// Session registry survives across warm invocations (intentional)
const sessions = new Map();

const app = express();
app.use(express.json());

// Registers tools on a per-session server; the tools close over the shared clients
function registerTools(server, dynamo, s3) {
  server.tool('query_table', 'Query DynamoDB table', {
    tableName: z.string(),
    key: z.record(z.string()),
  }, async ({ tableName, key }) => {
    // GetItem expects AttributeValue maps; this assumes string-typed key attributes
    const Key = Object.fromEntries(Object.entries(key).map(([name, value]) => [name, { S: value }]));
    const item = await dynamo.send(new GetItemCommand({ TableName: tableName, Key }));
    return { content: [{ type: 'text', text: JSON.stringify(item.Item ?? null) }] };
  });
  // Register S3-backed tools here using the shared `s3` client
}

// ── HANDLER SCOPE: runs on every invocation ──
app.post('/mcp', async (req, res) => {
  const sessionId = req.headers['mcp-session-id'];
  let transport = sessionId ? sessions.get(sessionId) : undefined;

  if (!transport) {
    const newTransport = new StreamableHTTPServerTransport({
      sessionIdGenerator: () => randomUUID(),
      // The ID is only assigned while handling `initialize`, so store it from this callback
      onsessioninitialized: (id) => sessions.set(id, newTransport),
    });
    const sessionServer = new McpServer({ name: 'my-mcp-server', version: '1.0.0' });
    // Tool registration references shared clients, so it stays lightweight
    registerTools(sessionServer, dynamoClient, s3Client);
    await sessionServer.connect(newTransport);
    transport = newTransport;
  }

  await transport.handleRequest(req, res, req.body);
});

app.listen(process.env.PORT ?? 8080);

The critical rule: AWS SDK clients, database connection pools, and the Express app go in init scope. Per-session MCP transport objects go in the handler scope (or in a Map that lives in init scope, if you want to reuse sessions across warm invocations). An MCP server that creates a new DynamoDBClient on every request will appear slow on warm invocations — the SDK client initialization is ~50ms, which doubles the latency of fast tool calls.
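
The difference between the two scopes can be demonstrated without AWS at all. In this minimal sketch, `ExpensiveClient` is a hypothetical stand-in for an SDK client or connection pool, and the static counter only exists to make the construction cost visible.

```typescript
// Hypothetical stand-in for an expensive object such as an SDK client.
class ExpensiveClient {
  static constructed = 0; // counts constructions so the cost is observable
  constructor() {
    ExpensiveClient.constructed += 1;
  }
  query(key: string): string {
    return `value-for-${key}`;
  }
}

// INIT SCOPE: created once per execution environment.
const sharedClient = new ExpensiveClient();

// Recommended: every request reuses the init-scope client.
function handleWithSharedClient(key: string): string {
  return sharedClient.query(key);
}

// Anti-pattern: a new client is constructed on every request.
function handleWithPerRequestClient(key: string): string {
  return new ExpensiveClient().query(key);
}
```

Both handlers return the same result; only the number of constructions differs, which is exactly the overhead that shows up as extra latency on warm invocations.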

Cold start management

Lambda cold starts for Node.js range from 200ms on small functions with minimal dependencies to 2s+ on functions with large SDK bundles. For MCP servers, cold starts cause two problems: the initialize handshake takes longer than the client's timeout, and clients that retry on timeout may exhaust their retry budget before the Lambda warms up.

Three techniques reduce cold start impact for MCP servers:

1. Tree-shake your bundle. Only import what you use from the AWS SDK — each service client package is ~200KB. Use esbuild or rollup with tree-shaking:

# package.json build script — bundle and tree-shake
{
  "scripts": {
    "build": "esbuild src/server.ts --bundle --platform=node --target=node22 --outfile=dist/server.js --external:@aws-sdk/* --minify"
  }
}

# Note: @aws-sdk/* is external because Lambda includes the v3 SDK in the runtime
# Only bundle packages NOT included in the Lambda runtime

2. Allocate more memory. Lambda CPU is proportional to memory. A 512MB function has twice the CPU of a 256MB function — cold starts are faster even if you never actually use the extra memory:

# CloudFormation / SAM
MemorySize: 1024  # Sweet spot for most MCP servers; 4x the CPU of 256MB

3. Provisioned concurrency for zero cold starts. For MCP servers that need guaranteed low latency, provisioned concurrency pre-warms N Lambda instances so they're always ready:

McpServerFunction:
  Type: AWS::Serverless::Function
  Properties:
    # Add these two properties to the function defined in the template above
    AutoPublishAlias: production   # Publishes a version and points the alias at it on each deploy
    ProvisionedConcurrencyConfig:
      ProvisionedConcurrentExecutions: 2  # Keep 2 warm instances; extra traffic falls back to on-demand

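Provisioned concurrency costs approximately $0.015/hour per GB of memory per provisioned instance (x86 pricing; arm64 is cheaper). For a 512MB function with 2 provisioned instances: 2 × 0.5GB × $0.015 = ~$0.015/hour, or about $11/month. That is less than a small VPS, with none of the instance management, though normal request and duration charges still apply on top.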
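
The same arithmetic as a small helper, so other memory sizes and instance counts can be compared. The default rate is the approximate figure quoted above and the 730-hour month is an assumption; check current regional pricing before relying on the result.

```typescript
// Average hours in a month (assumption: 8760 hours per year / 12).
const HOURS_PER_MONTH = 730;

// Monthly cost of keeping instances provisioned, excluding request and duration charges.
function provisionedMonthlyCostUsd(
  memoryMb: number,
  instances: number,
  usdPerGbHour = 0.015, // approximate rate quoted in the text
): number {
  const provisionedGb = (memoryMb / 1024) * instances;
  return provisionedGb * usdPerGbHour * HOURS_PER_MONTH;
}

// 512MB x 2 instances = 1GB provisioned, about $10.95 per month.
const exampleMonthlyCost = provisionedMonthlyCostUsd(512, 2);
```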

Lambda@Edge limitations for MCP

Lambda@Edge runs at CloudFront edge locations — closer to users, sub-millisecond routing. But it has hard limits that make it incompatible with MCP servers:

Constraint	Lambda@Edge value	MCP server requirement
Max execution time (viewer request/response)	5 seconds	SSE sessions last minutes
Max execution time (origin response)	30 seconds	Tool calls can take 10s+
Response body streaming	Not supported	SSE requires streaming
VPC access	Not supported	RDS, ElastiCache need VPC
Environment variables	Not supported	Secrets and config are usually injected via environment variables
Deployment region	us-east-1 only	Deploy from any region

Use CloudFront Functions for lightweight edge routing (authentication token validation, request rewriting) in front of a regional Lambda, not Lambda@Edge itself. The MCP server runs in a regional Lambda; CloudFront sits in front for CDN caching of static responses and TLS termination at edge locations.

CDK deployment pattern

For teams using AWS CDK, the Lambda Web Adapter simplifies deployment to a stack that's nearly identical to any other Lambda HTTP service:

import * as cdk from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import { Construct } from 'constructs';

export class McpServerStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // The adapter executes the handler as a startup script, so package the
    // prebuilt bundle (dist/server.js) together with an executable run.sh.
    const mcpFunction = new lambda.Function(this, 'McpServer', {
      code: lambda.Code.fromAsset('dist'),
      handler: 'run.sh',
      runtime: lambda.Runtime.NODEJS_22_X,
      architecture: lambda.Architecture.ARM_64,
      memorySize: 1024,
      timeout: cdk.Duration.seconds(30),  // Raise (max 15 minutes) for long-lived SSE streams
      environment: {
        MCP_SERVER_VERSION: '1.0.0',
        NODE_OPTIONS: '--enable-source-maps',
      },
    });

    // Lambda Web Adapter layer (arm64, resolved for the stack's region)
    const adapterLayer = lambda.LayerVersion.fromLayerVersionArn(
      this,
      'LambdaAdapterLayer',
      `arn:aws:lambda:${this.region}:753240598075:layer:LambdaAdapterLayerArm64:24`
    );
    mcpFunction.addLayers(adapterLayer);
    mcpFunction.addEnvironment('AWS_LAMBDA_EXEC_WRAPPER', '/opt/bootstrap');
    mcpFunction.addEnvironment('AWS_LWA_INVOKE_MODE', 'response_stream');  // Adapter defaults to buffered
    mcpFunction.addEnvironment('PORT', '8080');

    // Function URL with streaming for SSE support
    const functionUrl = mcpFunction.addFunctionUrl({
      authType: lambda.FunctionUrlAuthType.NONE,
      invokeMode: lambda.InvokeMode.RESPONSE_STREAM,
      cors: {
        allowedOrigins: ['https://yourdomain.com'],
        allowedMethods: [lambda.HttpMethod.GET, lambda.HttpMethod.POST, lambda.HttpMethod.DELETE],
        allowedHeaders: ['Content-Type', 'Mcp-Session-Id', 'Authorization'],
        exposedHeaders: ['Mcp-Session-Id'],
      },
    });

    new cdk.CfnOutput(this, 'McpServerUrl', {
      value: functionUrl.url,
      description: 'MCP server endpoint — add to AliveMCP for monitoring',
    });
  }
}

Monitoring Lambda-hosted MCP servers

CloudWatch provides Lambda invocation metrics (duration, error count, throttle count), but these are insufficient for MCP server monitoring. The failure modes below are either invisible to CloudWatch or show up only as a metric that does not explain what the client experienced:

Failure mode	CloudWatch metric	What MCP client sees
Lambda Web Adapter not installed	Duration normal, no errors	200 with non-MCP response body
MCP SDK version mismatch	No error — handler runs	`initialize` returns wrong `protocolVersion`
Tool registration fails silently	No error — handler runs	`tools/list` returns empty array
IAM role missing DynamoDB access	Error rate spike after deploy	Specific tool calls fail, others succeed
Cold start exceeds client timeout	Duration spikes shown	Client sees connection timeout, not the Lambda metric
Function URL streaming disabled	No error	SSE response body is empty

AliveMCP runs the full MCP protocol probe — initialize, tools/list, a sentinel tool call — every 60 seconds from outside AWS. This catches protocol-layer failures that look like HTTP 200s to CloudWatch. For Lambda functions with provisioned concurrency, AliveMCP also detects the brief window (~30 seconds) after a new Lambda version is deployed before provisioned concurrency warms up on the new version.
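
For teams that want to run similar checks in their own pipeline, the two core assertions (a valid initialize result and a non-empty tools/list) can be sketched as pure functions over the response body. This is an illustrative sketch, not AliveMCP's implementation; `parseRpcBody`, `checkInitialize`, and `checkToolsList` are hypothetical names, and the parser assumes the body is either plain JSON or a single SSE frame.

```typescript
interface ProbeResult {
  ok: boolean;
  reason?: string;
}

// Accepts either a plain JSON body or an SSE frame containing a "data: {...}" line.
function parseRpcBody(body: string): any {
  const dataLine = body.split(/\r?\n/).find((line) => line.startsWith("data:"));
  return JSON.parse(dataLine ? dataLine.slice("data:".length).trim() : body);
}

// Passes only when the initialize response carries a protocolVersion string.
function checkInitialize(body: string): ProbeResult {
  let msg: any;
  try {
    msg = parseRpcBody(body);
  } catch {
    return { ok: false, reason: "response is not a JSON-RPC message" };
  }
  if (msg?.error) return { ok: false, reason: `server error: ${msg.error.message}` };
  if (typeof msg?.result?.protocolVersion !== "string") {
    return { ok: false, reason: "missing result.protocolVersion" };
  }
  return { ok: true };
}

// Passes only when tools/list returns at least one tool.
function checkToolsList(body: string): ProbeResult {
  let msg: any;
  try {
    msg = parseRpcBody(body);
  } catch {
    return { ok: false, reason: "response is not a JSON-RPC message" };
  }
  const tools = msg?.result?.tools;
  if (!Array.isArray(tools) || tools.length === 0) {
    return { ok: false, reason: "tools/list returned no tools" };
  }
  return { ok: true };
}
```

Keeping the checks separate from the HTTP call makes them easy to unit test and to reuse against both JSON and SSE responses.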

# Verify your Lambda MCP server is protocol-compliant after every deploy
# Add this to your CDK/SAM deployment pipeline:

FUNCTION_URL=$(aws lambda get-function-url-config \
  --function-name McpServer \
  --query 'FunctionUrl' \
  --output text)

# Function URLs end with a slash, so strip it before appending the path
curl -s -X POST "${FUNCTION_URL%/}/mcp" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"deploy-check","version":"1.0"}}}' \
  | node -e "
    const raw = require('fs').readFileSync(0, 'utf8');
    // The server may reply with plain JSON or with an SSE frame ('data: {...}')
    const frame = raw.split(/\r?\n/).find((line) => line.startsWith('data:'));
    const d = JSON.parse(frame ? frame.slice(5) : raw);
    if (!d.result?.protocolVersion) { console.error('MCP init failed:', JSON.stringify(d)); process.exit(1); }
    console.log('MCP protocol OK, version:', d.result.protocolVersion);
  "

Session management across warm invocations

Lambda warm invocations reuse the same execution environment — the sessions Map initialized in init scope persists. This means MCP sessions from a previous invocation are available on the next warm invocation. This is mostly desirable (clients can reconnect to an existing session on a warm Lambda) but creates one edge case: a session Map that grows unboundedly if sessions are not explicitly cleaned up.

Implement TTL-based eviction in the session registry to prevent memory growth on long-lived warm Lambdas:

const SESSION_TTL_MS = 30 * 60 * 1000; // 30 minutes

class SessionRegistry {
  private sessions = new Map<string, { transport: StreamableHTTPServerTransport; lastUsed: number }>();

  get(id: string): StreamableHTTPServerTransport | undefined {
    const entry = this.sessions.get(id);
    if (!entry) return undefined;
    entry.lastUsed = Date.now();
    return entry.transport;
  }

  set(id: string, transport: StreamableHTTPServerTransport): void {
    this.sessions.set(id, { transport, lastUsed: Date.now() });
    this.evictStale();
  }

  private evictStale(): void {
    const cutoff = Date.now() - SESSION_TTL_MS;
    for (const [id, entry] of this.sessions) {
      if (entry.lastUsed < cutoff) {
        entry.transport.close?.();
        this.sessions.delete(id);
      }
    }
  }
}

const sessionRegistry = new SessionRegistry(); // In init scope

Frequently asked questions

Can I use SSE transport without Lambda Web Adapter?

Yes, using awslambda.streamifyResponse — the native Lambda response streaming API. You wrap your handler in awslambda.streamifyResponse(async (event, responseStream, context) => { ... }) and write to responseStream directly. This approach requires restructuring your server to use the raw Lambda event format instead of Express/Fastify. The Lambda Web Adapter is simpler if your codebase also runs on other platforms, but streamifyResponse has lower cold-start overhead (~5ms less) because it skips the adapter subprocess.

How does session persistence work across Lambda invocations?

Within a single warm Lambda instance, a global sessions Map persists between invocations. Lambda scales horizontally — multiple concurrent instances each have their own Map. A client that reconnects may land on a different instance (especially after auto-scaling) and find its session missing. Two solutions: (1) use stateless per-request sessions with sessionIdGenerator: undefined, relying on the client to pass all context in each request; or (2) store session state in ElastiCache (Redis) or DynamoDB so any Lambda instance can reconstruct the session from the ID. For most MCP tool use cases, stateless sessions are sufficient.
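
Option (2) can be sketched as a small storage contract with a per-instance cache in front of it. Everything here is hypothetical: the `SessionState` shape, the `SessionStore` interface, and the in-memory implementation, which stands in for DynamoDB or Redis so the example runs without AWS access.

```typescript
// Hypothetical minimal state needed to rebuild a session on any instance.
interface SessionState {
  sessionId: string;
  protocolVersion: string;
  lastSeenMs: number;
}

// Storage contract; a real implementation would call DynamoDB or Redis.
interface SessionStore {
  load(sessionId: string): Promise<SessionState | undefined>;
  save(state: SessionState): Promise<void>;
}

// In-memory stand-in for the shared store.
class InMemorySessionStore implements SessionStore {
  private items = new Map<string, SessionState>();

  async load(sessionId: string): Promise<SessionState | undefined> {
    return this.items.get(sessionId);
  }

  async save(state: SessionState): Promise<void> {
    this.items.set(state.sessionId, { ...state });
  }
}

// Check this instance's cache first, then fall back to the shared store.
async function resolveSession(
  sessionId: string,
  localCache: Map<string, SessionState>,
  store: SessionStore,
): Promise<SessionState | undefined> {
  const cached = localCache.get(sessionId);
  if (cached) return cached;
  const stored = await store.load(sessionId);
  if (stored) localCache.set(sessionId, stored); // warm the cache for later requests
  return stored;
}
```

A request that lands on a fresh instance misses the local cache, loads the state from the shared store, and continues; an unknown ID resolves to undefined, which the server can treat as "session expired, re-initialize".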

What's the right Lambda memory size for an MCP server?

Start with 1024MB. CPU allocation scales linearly with memory, so 1024MB gives 4x the CPU of 256MB. For a Node.js MCP server with typical dependencies, this brings cold start from ~1.5s to ~400ms. The memory cost difference is negligible: at $0.0000166667/GB-second, the difference between 256MB and 1024MB on a 100ms request is $0.0000013 — less than a rounding error. Only reduce memory if profiling shows your function consistently uses <300MB; do not optimize memory downward before measuring actual usage.
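
The cost comparison works out as follows. The rate is the per-GB-second figure quoted above; treat it as an assumption and substitute the current price for your region and architecture.

```typescript
// Per-GB-second rate quoted in the text (assumption: verify against current pricing).
const USD_PER_GB_SECOND = 0.0000166667;

// Duration charge for a single invocation, excluding the per-request fee.
function invocationCostUsd(memoryMb: number, durationMs: number): number {
  return (memoryMb / 1024) * (durationMs / 1000) * USD_PER_GB_SECOND;
}

// Extra cost of running a 100ms request at 1024MB instead of 256MB: about $0.00000125.
const extraCostPerRequest = invocationCostUsd(1024, 100) - invocationCostUsd(256, 100);

// The same difference over one million requests: about $1.25.
const extraCostPerMillion = extraCostPerRequest * 1_000_000;
```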

How do I handle VPC connectivity for MCP tools that need RDS or ElastiCache?

Configure the Lambda function with VPC settings pointing to the same VPC as your database. Since AWS moved Lambda to shared Hyperplane ENIs in 2019, network interfaces are created when the function is configured rather than on each cold start, so VPC attachment no longer adds a large per-cold-start penalty; measure your own cold starts rather than assuming one. Use RDS Proxy between Lambda and RDS to avoid exhausting database connections: Lambda can scale to thousands of concurrent instances, each trying to open a database connection, which can exceed RDS connection limits. ElastiCache tolerates far more connections, so connecting directly from Lambda without a proxy is usually fine.
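
A rough way to decide whether a proxy is needed is to compare worst-case connection demand with the database limit. The helper names and the 80% headroom default below are assumptions for illustration; the real limit depends on your instance class and engine settings.

```typescript
// Worst case: every concurrent Lambda instance opens its full connection pool.
function peakDbConnections(concurrentInstances: number, poolSizePerInstance: number): number {
  return concurrentInstances * poolSizePerInstance;
}

// Hypothetical rule of thumb: use a proxy once peak demand exceeds a share of the limit.
function needsConnectionProxy(
  concurrentInstances: number,
  poolSizePerInstance: number,
  dbMaxConnections: number,
  headroom = 0.8, // assumed fraction of the limit you are willing to consume
): boolean {
  return peakDbConnections(concurrentInstances, poolSizePerInstance) > dbMaxConnections * headroom;
}

// 1000 concurrent instances with one connection each overwhelm a 200-connection database.
const burstNeedsProxy = needsConnectionProxy(1000, 1, 200);
// 10 instances with two connections each stay well under the same limit.
const smallLoadNeedsProxy = needsConnectionProxy(10, 2, 200);
```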

What does AliveMCP detect that CloudWatch doesn't?

CloudWatch measures Lambda execution from inside AWS: invocation count, duration, error count (unhandled exceptions), throttle count. AliveMCP measures from outside: does the endpoint respond to the MCP initialize JSON-RPC call with a valid protocol response? The gap catches: Lambda Web Adapter misconfiguration (Lambda runs but adapter doesn't proxy correctly), MCP protocol version mismatches (Lambda returns 200 with wrong JSON structure), tool registration failures (tools/list returns empty), and Function URL streaming disabled (SSE events never arrive at the client). CloudWatch sees a successful Lambda invocation in all these cases; AliveMCP sees the protocol failure.