MCP server on AWS Lambda
AWS Lambda is the most popular serverless platform in production environments, but it was designed for short-lived request-response workloads — not for the long-lived SSE connections that MCP's HTTP transport establishes. Deploying an MCP server on Lambda requires the Lambda Web Adapter to bridge the gap between Lambda's invocation model and the streaming HTTP responses that MCP clients expect. This guide covers the complete deployment path: Lambda Web Adapter configuration, Function URL vs API Gateway decision, connection object lifecycle across cold and warm starts, cold-start mitigation with provisioned concurrency, SAM and CDK deployment patterns, and the monitoring blind spots that make external protocol probing essential for Lambda-hosted MCP servers.
TL;DR
Use the AWS Lambda Web Adapter extension layer: it runs your existing HTTP server (Express, Fastify, Hono) inside Lambda without modification and correctly handles SSE streaming. Use a Function URL with InvokeMode: RESPONSE_STREAM for direct SSE support; API Gateway v2 (HTTP API) works but adds ~1–3ms latency and requires careful timeout configuration. Initialize shared objects (AWS SDK clients, database connections) in the init scope outside the handler so they survive across warm invocations. Monitor with AliveMCP to catch Lambda execution failures that appear as silent timeouts to MCP clients; CloudWatch only shows invocation metrics, not protocol-level failure modes.
Why Lambda needs the Web Adapter
Lambda's native invocation model sends a JSON event payload to your handler and expects a JSON response object. Standard HTTP streaming (chunked transfer encoding, SSE) does not fit this model directly. Before the Lambda Web Adapter, developers worked around this with Lambda's response streaming mode: a custom awslambda.streamifyResponse wrapper that bypasses the JSON response model and streams raw bytes. This approach works but requires restructuring your server code specifically for Lambda.
The AWS Lambda Web Adapter is a Lambda extension that solves the impedance mismatch cleanly: it starts your HTTP server as a subprocess, forwards Lambda invocations as HTTP requests to localhost:8080, and streams the HTTP response back to the caller. Your MCP server code stays identical to what you'd deploy on Railway or a VPS — the adapter handles the translation.
| Approach | Code changes required | SSE streaming | Cold start overhead |
|---|---|---|---|
| Lambda native (JSON event) | Major rewrite — no HTTP framework | No | Lowest |
| awslambda.streamifyResponse | Moderate — wrap handler | Yes (raw bytes) | Low |
| Lambda Web Adapter | None — run existing HTTP server | Yes (full SSE) | ~5ms extra (adapter boot) |
For MCP servers where the codebase also runs on other platforms, the Lambda Web Adapter is the right choice: zero code changes, full SSE support, and an adapter cost (~5ms on cold start) that is negligible compared to Lambda's baseline cold start time (roughly 200ms–2s for Node.js, depending on bundle size and memory allocation).
Adding Lambda Web Adapter to your deployment
The adapter ships as a Lambda layer. Add it to your function and set the environment variables it needs: AWS_LAMBDA_EXEC_WRAPPER hooks the adapter into startup, PORT tells it where your server listens, and AWS_LWA_INVOKE_MODE switches it from buffered responses to streaming:
# SAM template.yaml: Lambda Web Adapter deployment
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Globals:
  Function:
    Runtime: nodejs22.x
    Architectures: [arm64]  # Graviton2: 20% cheaper, faster cold starts
    MemorySize: 512
    Timeout: 30  # Seconds. Lambda allows up to 900 (15 min); API Gateway caps integrations at about 29s

Resources:
  McpServerFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: dist/
      Handler: run.sh  # Shell script that starts your HTTP server
      Layers:
        # Lambda Web Adapter layer ARN: us-east-1, arm64, latest version
        - arn:aws:lambda:us-east-1:753240598075:layer:LambdaAdapterLayerArm64:24
      Environment:
        Variables:
          AWS_LAMBDA_EXEC_WRAPPER: /opt/bootstrap  # Tells Lambda to use the adapter
          AWS_LWA_INVOKE_MODE: response_stream  # Adapter buffers by default; must match the Function URL InvokeMode
          PORT: "8080"  # Adapter expects your server here
          MCP_SERVER_VERSION: "1.0.0"
      FunctionUrlConfig:
        AuthType: NONE  # Or AWS_IAM for authenticated access
        InvokeMode: RESPONSE_STREAM  # Required for SSE
        Cors:
          AllowOrigins: ["https://yourdomain.com"]
          AllowMethods: [GET, POST, DELETE, OPTIONS]
          AllowHeaders: [Content-Type, Mcp-Session-Id, Authorization]
          ExposeHeaders: [Mcp-Session-Id]
The run.sh handler script starts your Node.js server as a foreground process — Lambda Web Adapter keeps it running and proxies requests to it:
#!/bin/bash
# run.sh — handler script for Lambda Web Adapter
exec node server.js
Your server.js is unchanged from any other deployment target — it just binds to process.env.PORT (or 8080 if unset) and handles MCP requests via StreamableHTTPServerTransport.
Function URL vs API Gateway: which to use
MCP servers deployed on Lambda have two front-door options. The right choice depends on your SSE requirements, authentication needs, and traffic pattern:
| Feature | Function URL | API Gateway v2 (HTTP API) | API Gateway v1 (REST API) |
|---|---|---|---|
| SSE streaming | Yes — with RESPONSE_STREAM | Yes — with response streaming enabled | No — 6MB response limit, no chunked encoding |
| Added latency | ~0ms (direct) | ~1–3ms | ~5–10ms |
| Cost | Free (Lambda invocation only) | $1/million requests | $3.50/million requests |
| Auth options | None or AWS_IAM | JWT authorizer, Lambda authorizer, IAM | Cognito, Lambda authorizer, API keys, IAM |
| Custom domain | Not supported (URL is CloudFront-routable) | Yes — Route 53 + ACM | Yes — Route 53 + ACM |
| WAF integration | Via CloudFront only | Yes — native WAF v2 association | Yes — native WAF v1 association |
| Timeout max | 15 minutes | 29 seconds | 29 seconds |
For most MCP servers: use a Function URL for simplicity and zero added cost during development, then add an API Gateway HTTP API in front when you need a custom domain, WAF rules, or multi-tier auth. If you use API Gateway, configure a 29-second timeout (the maximum) — MCP initialize plus the first tool call must complete within this window, or clients will see a 504 Gateway Timeout that looks like a network failure.
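The recommendation above can be sketched as a small decision helper. This is only an illustration of the guide's own comparison table: `chooseFrontDoor`, its field names, and the warning text are hypothetical, and the 29-second limit is the value quoted above rather than something the code looks up.

```typescript
// Hypothetical helper that encodes the comparison table above as a decision function.
type FrontDoor = "function-url" | "api-gateway-http-api";

interface FrontDoorNeeds {
  customDomain: boolean;
  nativeWaf: boolean;
  jwtOrLambdaAuthorizer: boolean;
  longestRequestSeconds: number; // slowest initialize + tool call you expect
}

interface FrontDoorChoice {
  choice: FrontDoor;
  warnings: string[];
}

// Timeout value taken from the table above.
const API_GATEWAY_TIMEOUT_SECONDS = 29;

function chooseFrontDoor(needs: FrontDoorNeeds): FrontDoorChoice {
  // Any feature that a Function URL lacks pushes the choice to API Gateway.
  const needsGateway =
    needs.customDomain || needs.nativeWaf || needs.jwtOrLambdaAuthorizer;
  const warnings: string[] = [];

  if (needsGateway && needs.longestRequestSeconds > API_GATEWAY_TIMEOUT_SECONDS) {
    warnings.push(
      `Requests of up to ${needs.longestRequestSeconds}s exceed the ` +
        `${API_GATEWAY_TIMEOUT_SECONDS}s gateway timeout; clients would see 504s.`
    );
  }

  return {
    choice: needsGateway ? "api-gateway-http-api" : "function-url",
    warnings,
  };
}
```

A development setup with no custom domain or WAF requirement resolves to a Function URL, while a production setup that needs a custom domain and has 45-second tool calls resolves to API Gateway with a timeout warning attached.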
Handler scope vs init scope: where to put connections
Lambda reuses warm execution environments across invocations — the code outside your handler function runs once on cold start and is cached. Code inside the handler runs on every invocation. Placing expensive objects in the wrong scope causes either redundant work (re-creating SDK clients on every request) or cold-start penalty amplification (creating slow connections inside the handler):
import { randomUUID } from 'node:crypto';
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StreamableHTTPServerTransport } from '@modelcontextprotocol/sdk/server/streamableHttp.js';
import { DynamoDBClient, GetItemCommand } from '@aws-sdk/client-dynamodb';
import { marshall, unmarshall } from '@aws-sdk/util-dynamodb';
import { S3Client } from '@aws-sdk/client-s3';
import express from 'express';
import { z } from 'zod';

// ── INIT SCOPE: runs once on cold start, reused on warm invocations ──
const dynamoClient = new DynamoDBClient({ region: process.env.AWS_REGION });
const s3Client = new S3Client({ region: process.env.AWS_REGION });

// Session registry survives across warm invocations (intentional)
const sessions = new Map();

const app = express();
app.use(express.json());

// Registers tools on a per-session server; the tools close over the shared clients
function registerTools(server, dynamo, s3) {
  server.tool('query_table', 'Query DynamoDB table', {
    tableName: z.string(),
    key: z.record(z.string()),
  }, async ({ tableName, key }) => {
    // marshall() converts the plain key object into DynamoDB's AttributeValue format
    const result = await dynamo.send(new GetItemCommand({ TableName: tableName, Key: marshall(key) }));
    const item = result.Item ? unmarshall(result.Item) : null;
    return { content: [{ type: 'text', text: JSON.stringify(item) }] };
  });
  // Tools backed by S3 would be registered here using the shared s3 client
}

// ── HANDLER SCOPE: runs on every invocation ──
app.post('/mcp', async (req, res) => {
  const sessionId = req.headers['mcp-session-id'];
  let transport = sessionId ? sessions.get(sessionId) : undefined;
  if (!transport) {
    transport = new StreamableHTTPServerTransport({
      sessionIdGenerator: () => randomUUID(),
      // The session ID is only assigned while handling initialize, so store it from the callback
      onsessioninitialized: (id) => sessions.set(id, transport),
    });
    const sessionServer = new McpServer({ name: 'my-mcp-server', version: '1.0.0' });
    // Tool registration references shared clients, so it is lightweight
    registerTools(sessionServer, dynamoClient, s3Client);
    await sessionServer.connect(transport);
  }
  await transport.handleRequest(req, res, req.body);
});

app.listen(process.env.PORT ?? 8080);
The critical rule: AWS SDK clients, database connection pools, and the Express app go in init scope. Per-session MCP transport objects go in the handler scope (or in a Map that lives in init scope, if you want to reuse sessions across warm invocations). An MCP server that creates a new DynamoDBClient on every request will appear slow on warm invocations — the SDK client initialization is ~50ms, which doubles the latency of fast tool calls.
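The scoping rule can be made visible with a counter. The following is a minimal sketch, not AWS SDK code: `ExpensiveClient` is a hypothetical stand-in for an SDK client or connection pool, and the two handler functions are illustrative names.

```typescript
// Counts constructor calls so the cost of each scoping choice is observable.
let constructions = 0;

// Hypothetical stand-in for an SDK client or database connection pool.
class ExpensiveClient {
  constructor() {
    constructions += 1;
  }

  query(key: string): string {
    return `value-for-${key}`;
  }
}

// INIT SCOPE: module-level state is built once per execution environment.
const sharedClient = new ExpensiveClient();

// Recommended: every invocation reuses the init-scope client.
function handleWithSharedClient(key: string): string {
  return sharedClient.query(key);
}

// Anti-pattern: every invocation pays the construction cost again.
function handleWithPerRequestClient(key: string): string {
  return new ExpensiveClient().query(key);
}
```

Calling `handleWithSharedClient` any number of times leaves `constructions` at 1, while each call to `handleWithPerRequestClient` increments it. That increment is the per-request overhead the paragraph above describes.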
Cold start management
Lambda cold starts for Node.js range from 200ms on small functions with minimal dependencies to 2s+ on functions with large SDK bundles. For MCP servers, cold starts cause two problems: the initialize handshake takes longer than the client's timeout, and clients that retry on timeout may exhaust their retry budget before the Lambda warms up.
Three techniques reduce cold start impact for MCP servers:
1. Tree-shake your bundle. Only import what you use from the AWS SDK — each service client package is ~200KB. Use esbuild or rollup with tree-shaking:
# package.json build script: bundle and tree-shake
{
  "scripts": {
    "build": "esbuild src/server.ts --bundle --platform=node --target=node22 --outfile=dist/server.js --external:@aws-sdk/* --minify"
  }
}
# Note: @aws-sdk/* is external because Lambda includes the v3 SDK in the runtime
# Only bundle packages NOT included in the Lambda runtime
2. Allocate more memory. Lambda CPU is proportional to memory. A 512MB function has twice the CPU of a 256MB function — cold starts are faster even if you never actually use the extra memory:
# CloudFormation / SAM
MemorySize: 1024 # Sweet spot for most MCP servers: 4x the CPU of a 256MB function
3. Provisioned concurrency for zero cold starts. For MCP servers that need guaranteed low latency, provisioned concurrency pre-warms N Lambda instances so they're always ready:
McpServerFunction:
  Type: AWS::Serverless::Function
  Properties:
    # ...same properties as in the template above...
    AutoPublishAlias: production  # SAM publishes a version on each deploy and points this alias at it
    ProvisionedConcurrencyConfig:
      ProvisionedConcurrentExecutions: 2  # Keep 2 warm instances; traffic beyond this falls back to on-demand instances
Provisioned concurrency costs approximately $0.015/hour per GB of memory per provisioned instance. For a 512MB function with 2 provisioned instances: ~$0.015/hour = $11/month — less than a small VPS, with none of the instance management.
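The arithmetic behind that estimate can be written out as a short function. This is a sketch under stated assumptions: the $0.015 per GB-hour rate is the approximate figure quoted above, 730 is an assumed average number of hours per month, and the function name is illustrative.

```typescript
// Assumption: average month length in hours (365 * 24 / 12).
const HOURS_PER_MONTH = 730;

// Estimates the monthly charge for keeping instances provisioned.
// usdPerGbHour defaults to the approximate rate quoted in the text above.
function provisionedConcurrencyMonthlyUsd(
  memoryMb: number,
  instances: number,
  usdPerGbHour: number = 0.015
): number {
  const provisionedGb = (memoryMb / 1024) * instances;
  return provisionedGb * usdPerGbHour * HOURS_PER_MONTH;
}

// 512MB x 2 instances = 1 GB provisioned, about $10.95 per month.
const smallSetup = provisionedConcurrencyMonthlyUsd(512, 2);

// 1024MB x 2 instances = 2 GB provisioned, about $21.90 per month.
const largerSetup = provisionedConcurrencyMonthlyUsd(1024, 2);
```

This covers only the provisioned concurrency charge; invocation and duration charges are billed on top of it.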
Lambda@Edge limitations for MCP
Lambda@Edge runs at CloudFront edge locations — closer to users, sub-millisecond routing. But it has hard limits that make it incompatible with MCP servers:
| Constraint | Lambda@Edge value | MCP server requirement |
|---|---|---|
| Max execution time (viewer response) | 5 seconds | SSE sessions last minutes |
| Max execution time (origin response) | 30 seconds | Tool calls can take 10s+ |
| Response body streaming | Not supported | SSE requires streaming |
| VPC access | Not supported | RDS, ElastiCache need VPC |
| Environment variables | Not supported | Secrets and config are usually passed as environment variables |
| Deployment region | us-east-1 only | Deploy from any region |
Use CloudFront Functions for lightweight edge routing (authentication token validation, request rewriting) in front of a regional Lambda, not Lambda@Edge itself. The MCP server runs in a regional Lambda; CloudFront sits in front for CDN caching of static responses and TLS termination at edge locations.
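As a rough illustration of the edge-routing role described above, here is a sketch of a viewer-request handler in the shape CloudFront Functions expects. The event and response interfaces are trimmed-down assumptions for readability, the check only looks at the shape of the Authorization header (it does not verify a token), and the trailing-slash rewrite is a hypothetical example. CloudFront Functions runs plain JavaScript, so the type annotations would need to be stripped before deployment, and the sketch assumes the JavaScript runtime 2.0.

```typescript
// Simplified shapes of the CloudFront Functions event objects (assumed subset).
interface CfHeaderValue {
  value: string;
}

interface CfRequest {
  method: string;
  uri: string;
  headers: Record<string, CfHeaderValue | undefined>;
}

interface CfResponse {
  statusCode: number;
  statusDescription: string;
  headers: Record<string, CfHeaderValue>;
}

interface CfViewerRequestEvent {
  request: CfRequest;
}

// Returning a response short-circuits the request; returning the request forwards it to the origin.
function handler(event: CfViewerRequestEvent): CfRequest | CfResponse {
  const request = event.request;
  const auth = request.headers["authorization"];

  // Shape check only: real token verification is out of scope for this sketch.
  if (!auth || !auth.value.startsWith("Bearer ")) {
    return {
      statusCode: 401,
      statusDescription: "Unauthorized",
      headers: { "www-authenticate": { value: "Bearer" } },
    };
  }

  // Example rewrite: strip a trailing slash so /mcp/ reaches the same route as /mcp.
  if (request.uri.length > 1 && request.uri.endsWith("/")) {
    request.uri = request.uri.slice(0, -1);
  }
  return request;
}
```

Requests without a bearer token never reach the regional Lambda, which keeps unauthenticated traffic from triggering cold starts.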
CDK deployment pattern
For teams using AWS CDK, the Lambda Web Adapter simplifies deployment to a stack that's nearly identical to any other Lambda HTTP service:
import * as cdk from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as lambdaNodejs from 'aws-cdk-lib/aws-lambda-nodejs';
import { Construct } from 'constructs';

export class McpServerStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const mcpFunction = new lambdaNodejs.NodejsFunction(this, 'McpServer', {
      entry: 'src/server.ts',
      // A handler name containing a dot is passed through as-is, so the adapter executes run.sh
      handler: 'run.sh',
      runtime: lambda.Runtime.NODEJS_22_X,
      architecture: lambda.Architecture.ARM_64,
      memorySize: 1024,
      timeout: cdk.Duration.seconds(30),
      bundling: {
        minify: true,
        externalModules: ['@aws-sdk/*'], // Use Lambda's bundled SDK
        commandHooks: {
          beforeBundling: () => [],
          beforeInstall: () => [],
          // Assumption: run.sh lives in the project root and starts the bundled index.js
          afterBundling: (inputDir: string, outputDir: string) => [`cp ${inputDir}/run.sh ${outputDir}/`],
        },
      },
      environment: {
        MCP_SERVER_VERSION: '1.0.0',
        NODE_OPTIONS: '--enable-source-maps',
      },
    });

    // Lambda Web Adapter layer (arm64, same region as the stack)
    const adapterLayer = lambda.LayerVersion.fromLayerVersionArn(
      this,
      'LambdaAdapterLayer',
      `arn:aws:lambda:${this.region}:753240598075:layer:LambdaAdapterLayerArm64:24`
    );
    mcpFunction.addLayers(adapterLayer);
    mcpFunction.addEnvironment('AWS_LAMBDA_EXEC_WRAPPER', '/opt/bootstrap');
    mcpFunction.addEnvironment('AWS_LWA_INVOKE_MODE', 'response_stream'); // Adapter buffers by default
    mcpFunction.addEnvironment('PORT', '8080');

    // Function URL with streaming for SSE support
    const functionUrl = mcpFunction.addFunctionUrl({
      authType: lambda.FunctionUrlAuthType.NONE,
      invokeMode: lambda.InvokeMode.RESPONSE_STREAM,
      cors: {
        allowedOrigins: ['https://yourdomain.com'],
        allowedMethods: [lambda.HttpMethod.GET, lambda.HttpMethod.POST, lambda.HttpMethod.DELETE],
        allowedHeaders: ['Content-Type', 'Mcp-Session-Id', 'Authorization'],
        exposedHeaders: ['Mcp-Session-Id'],
      },
    });

    new cdk.CfnOutput(this, 'McpServerUrl', {
      value: functionUrl.url,
      description: 'MCP server endpoint; add to AliveMCP for monitoring',
    });
  }
}
Monitoring Lambda-hosted MCP servers
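CloudWatch provides Lambda invocation metrics (duration, error count, throttle count), but these are insufficient for MCP server monitoring. Of the six failure modes below, four produce no CloudWatch signal at all, and the other two produce a signal that does not show what the MCP client actually experiences: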
| Failure mode | CloudWatch metric | What MCP client sees |
|---|---|---|
| Lambda Web Adapter not installed | Duration normal, no errors | 200 with non-MCP response body |
| MCP SDK version mismatch | No error — handler runs | initialize returns wrong protocolVersion |
| Tool registration fails silently | No error — handler runs | tools/list returns empty array |
| IAM role missing DynamoDB access | Error rate spike after deploy | Specific tool calls fail, others succeed |
| Cold start exceeds client timeout | Duration spikes shown | Client sees connection timeout, not the Lambda metric |
| Function URL streaming disabled | No error | SSE response body is empty |
AliveMCP runs the full MCP protocol probe — initialize, tools/list, a sentinel tool call — every 60 seconds from outside AWS. This catches protocol-layer failures that look like HTTP 200s to CloudWatch. For Lambda functions with provisioned concurrency, AliveMCP also detects the brief window (~30 seconds) after a new Lambda version is deployed before provisioned concurrency warms up on the new version.
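To show what a protocol-level check involves, here is a minimal sketch of the validation side of such a probe. It is not AliveMCP's implementation: the function names are hypothetical, the checks cover only the initialize and tools/list steps, and the sentinel tool name is whatever your server registers. The parser accepts either a plain JSON body or a single SSE frame, since a streamable HTTP endpoint may answer with either.

```typescript
type JsonObject = Record<string, unknown>;

function isObject(value: unknown): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Accepts a plain JSON body or one SSE frame of the form "data: {...}".
function parseMcpBody(raw: string): JsonObject {
  const dataLine = raw.split("\n").find((line) => line.startsWith("data:"));
  return JSON.parse(dataLine ? dataLine.slice("data:".length) : raw) as JsonObject;
}

// Each check returns a list of problems; an empty list means the step passed.
function checkInitialize(body: JsonObject): string[] {
  const result = body["result"];
  if (!isObject(result)) return ["initialize: no result object"];
  const problems: string[] = [];
  if (typeof result["protocolVersion"] !== "string") {
    problems.push("initialize: missing protocolVersion");
  }
  if (!isObject(result["serverInfo"])) {
    problems.push("initialize: missing serverInfo");
  }
  return problems;
}

function checkToolsList(body: JsonObject, sentinelTool: string): string[] {
  const result = body["result"];
  const tools = isObject(result) ? result["tools"] : undefined;
  if (!Array.isArray(tools)) return ["tools/list: no tools array"];
  if (tools.length === 0) {
    return ["tools/list: empty array (tool registration may have failed)"];
  }
  const names = tools.map((tool) => (isObject(tool) ? tool["name"] : undefined));
  return names.includes(sentinelTool)
    ? []
    : [`tools/list: sentinel tool "${sentinelTool}" is not registered`];
}
```

An HTTP 200 with an empty tools array passes any status-code check but fails `checkToolsList`, which is the gap between invocation metrics and protocol probing that the table above describes.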
# Verify your Lambda MCP server is protocol-compliant after every deploy
# Add this to your CDK/SAM deployment pipeline:
FUNCTION_URL=$(aws lambda get-function-url-config \
  --function-name McpServer \
  --query 'FunctionUrl' \
  --output text)

# Function URLs end with a trailing slash; strip it before appending the path
curl -s -X POST "${FUNCTION_URL%/}/mcp" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"deploy-check","version":"1.0"}}}' \
  | node -e "
    const raw = require('fs').readFileSync(0, 'utf8');
    // The transport may answer with plain JSON or with an SSE frame ('data: {...}')
    const dataLine = raw.split('\n').find((l) => l.startsWith('data:'));
    const d = JSON.parse(dataLine ? dataLine.slice(5) : raw);
    if (!d.result?.protocolVersion) { console.error('MCP init failed:', JSON.stringify(d)); process.exit(1); }
    console.log('MCP protocol OK, version:', d.result.protocolVersion);
  "
Session management across warm invocations
Lambda warm invocations reuse the same execution environment — the sessions Map initialized in init scope persists. This means MCP sessions from a previous invocation are available on the next warm invocation. This is mostly desirable (clients can reconnect to an existing session on a warm Lambda) but creates one edge case: a session Map that grows unboundedly if sessions are not explicitly cleaned up.
Implement TTL-based eviction in the session registry to prevent memory growth on long-lived warm Lambdas:
const SESSION_TTL_MS = 30 * 60 * 1000; // 30 minutes

class SessionRegistry {
  private sessions = new Map<string, { transport: StreamableHTTPServerTransport; lastUsed: number }>();

  get(id: string): StreamableHTTPServerTransport | undefined {
    const entry = this.sessions.get(id);
    if (!entry) return undefined;
    entry.lastUsed = Date.now();
    return entry.transport;
  }

  set(id: string, transport: StreamableHTTPServerTransport): void {
    this.sessions.set(id, { transport, lastUsed: Date.now() });
    this.evictStale();
  }

  private evictStale(): void {
    const cutoff = Date.now() - SESSION_TTL_MS;
    for (const [id, entry] of this.sessions) {
      if (entry.lastUsed < cutoff) {
        entry.transport.close?.();
        this.sessions.delete(id);
      }
    }
  }
}

const sessionRegistry = new SessionRegistry(); // In init scope
Frequently asked questions
Can I use SSE transport without Lambda Web Adapter?
Yes, using awslambda.streamifyResponse — the native Lambda response streaming API. You wrap your handler in awslambda.streamifyResponse(async (event, responseStream, context) => { ... }) and write to responseStream directly. This approach requires restructuring your server to use the raw Lambda event format instead of Express/Fastify. The Lambda Web Adapter is simpler if your codebase also runs on other platforms, but streamifyResponse has lower cold-start overhead (~5ms less) because it skips the adapter subprocess.
How does session persistence work across Lambda invocations?
Within a single warm Lambda instance, a global sessions Map persists between invocations. Lambda scales horizontally — multiple concurrent instances each have their own Map. A client that reconnects may land on a different instance (especially after auto-scaling) and find its session missing. Two solutions: (1) use stateless per-request sessions with sessionIdGenerator: undefined, relying on the client to pass all context in each request; or (2) store session state in ElastiCache (Redis) or DynamoDB so any Lambda instance can reconstruct the session from the ID. For most MCP tool use cases, stateless sessions are sufficient.
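Option (2) can be sketched as a local cache in front of a shared store. Everything here is illustrative: `SessionStore`, `InMemorySessionStore`, and `InstanceSessions` are hypothetical names, and the in-memory store only stands in for Redis or DynamoDB so the example runs without AWS access.

```typescript
interface SessionState {
  createdAt: number;
  data: Record<string, string>;
}

// Hypothetical storage contract; a real version would wrap a Redis or DynamoDB client.
interface SessionStore {
  load(sessionId: string): Promise<SessionState | undefined>;
  save(sessionId: string, state: SessionState): Promise<void>;
}

// In-memory stand-in for the shared store.
class InMemorySessionStore implements SessionStore {
  private items = new Map<string, string>();

  async load(sessionId: string): Promise<SessionState | undefined> {
    const raw = this.items.get(sessionId);
    return raw === undefined ? undefined : (JSON.parse(raw) as SessionState);
  }

  async save(sessionId: string, state: SessionState): Promise<void> {
    // Serialize on write, as a network-backed store would.
    this.items.set(sessionId, JSON.stringify(state));
  }
}

// Models one Lambda execution environment: a local Map backed by the shared store.
class InstanceSessions {
  private local = new Map<string, SessionState>();
  private store: SessionStore;

  constructor(store: SessionStore) {
    this.store = store;
  }

  async create(sessionId: string, data: Record<string, string>): Promise<SessionState> {
    const state: SessionState = { createdAt: Date.now(), data };
    this.local.set(sessionId, state);
    await this.store.save(sessionId, state);
    return state;
  }

  // Falls back to the shared store when this instance has never seen the session.
  async resolve(sessionId: string): Promise<SessionState | undefined> {
    const cached = this.local.get(sessionId);
    if (cached) return cached;
    const restored = await this.store.load(sessionId);
    if (restored) this.local.set(sessionId, restored);
    return restored;
  }
}
```

Two `InstanceSessions` objects sharing one store behave like two Lambda instances: a session created on the first can be resolved on the second, at the cost of one store round trip on the first lookup.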
What's the right Lambda memory size for an MCP server?
Start with 1024MB. CPU allocation scales linearly with memory, so 1024MB gives 4x the CPU of 256MB. For a Node.js MCP server with typical dependencies, this brings cold start from ~1.5s to ~400ms. The memory cost difference is negligible: at $0.0000166667/GB-second, the difference between 256MB and 1024MB on a 100ms request is $0.0000013 — less than a rounding error. Only reduce memory if profiling shows your function consistently uses <300MB; do not optimize memory downward before measuring actual usage.
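The per-request cost difference quoted above can be checked with a few lines. This is a sketch using only the figures in the answer: the per GB-second rate and the 100ms duration come from the text, and the function name is illustrative.

```typescript
// Rate quoted in the answer above, in USD per GB-second.
const USD_PER_GB_SECOND = 0.0000166667;

// Compute charge for one invocation at a given memory size and duration.
function invocationComputeUsd(
  memoryMb: number,
  durationMs: number,
  usdPerGbSecond: number = USD_PER_GB_SECOND
): number {
  return (memoryMb / 1024) * (durationMs / 1000) * usdPerGbSecond;
}

// Extra cost of running a 100ms request at 1024MB instead of 256MB.
// (1.0 GB - 0.25 GB) * 0.1 s * rate, which is roughly $0.00000125.
const extraCostPerRequest =
  invocationComputeUsd(1024, 100) - invocationComputeUsd(256, 100);
```

The calculation holds duration constant. If the extra CPU also shortens the request, the real difference is smaller still.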
How do I handle VPC connectivity for MCP tools that need RDS or ElastiCache?
Configure the Lambda function with VPC settings pointing to the same VPC as your database. Be aware that Lambda VPC cold starts add 200–500ms (from ENI attachment) unless you have warm instances via provisioned concurrency. Use RDS Proxy between Lambda and RDS to avoid exhausting database connections — Lambda can scale to thousands of concurrent instances, each trying to open a database connection, which exceeds RDS connection limits. ElastiCache is safe to connect directly from Lambda without a proxy.
What does AliveMCP detect that CloudWatch doesn't?
CloudWatch measures Lambda execution from inside AWS: invocation count, duration, error count (unhandled exceptions), throttle count. AliveMCP measures from outside: does the endpoint respond to the MCP initialize JSON-RPC call with a valid protocol response? The gap catches: Lambda Web Adapter misconfiguration (Lambda runs but adapter doesn't proxy correctly), MCP protocol version mismatches (Lambda returns 200 with wrong JSON structure), tool registration failures (tools/list returns empty), and Function URL streaming disabled (SSE events never arrive at the client). CloudWatch sees a successful Lambda invocation in all these cases; AliveMCP sees the protocol failure.