MCP server on AWS
AWS offers three practical options for deploying MCP servers: ECS Fargate (persistent containers behind an Application Load Balancer), App Runner (managed container hosting with less configuration), and Lambda (serverless, but with fundamental limitations for MCP sessions). ECS Fargate is the right choice for production MCP servers that need session affinity, IAM-based credential access, and predictable performance. App Runner works well for simpler cases. Lambda works only for fully stateless, short-lived tool handlers.
TL;DR
Use ECS Fargate for production: define a task with your MCP server container, put an Application Load Balancer in front with target group stickiness enabled, use an IAM task role instead of hardcoded credentials, and store secrets in AWS Secrets Manager (not environment variables in the task definition). For simpler cases without custom networking requirements, App Runner auto-scales and manages the load balancer for you. Lambda is not suitable for MCP servers that maintain session state — use it only for stateless tool handlers with the same caveats as Vercel. Monitor the public ALB endpoint with AliveMCP for external protocol verification.
ECS Fargate task definition
A minimal ECS task definition for an MCP server:
{
  "family": "mcp-server",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::ACCOUNT:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::ACCOUNT:role/mcp-server-task-role",
  "containerDefinitions": [
    {
      "name": "mcp-server",
      "image": "ACCOUNT.dkr.ecr.REGION.amazonaws.com/mcp-server:latest",
      "portMappings": [
        { "containerPort": 3000, "protocol": "tcp" }
      ],
      "environment": [
        { "name": "NODE_ENV", "value": "production" },
        { "name": "PORT", "value": "3000" }
      ],
      "secrets": [
        {
          "name": "DATABASE_URL",
          "valueFrom": "arn:aws:secretsmanager:REGION:ACCOUNT:secret:mcp-server/database-url"
        },
        {
          "name": "REDIS_URL",
          "valueFrom": "arn:aws:secretsmanager:REGION:ACCOUNT:secret:mcp-server/redis-url"
        }
      ],
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -sf http://localhost:3000/healthz || exit 1"],
        "interval": 15,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 20
      },
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/mcp-server",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "stopTimeout": 60
    }
  ]
}
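Key points: taskRoleArn grants permissions to your running container (e.g., S3 read, DynamoDB write) without hardcoded credentials; the container gets short-lived credentials from the ECS container credentials endpoint (on Fargate this is not the EC2 instance metadata service). executionRoleArn grants permissions to ECS itself to pull the image, write logs, and read the Secrets Manager values referenced in secrets, so that role needs secretsmanager:GetSecretValue on those secrets. secrets injects Secrets Manager values as environment variables at container start; the task definition stores only the secret ARNs, so the values never appear in CloudTrail logs or the ECS console. stopTimeout: 60 gives the container 60 seconds to drain sessions after SIGTERM before ECS sends SIGKILL.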
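The drain window that stopTimeout provides is only useful if the server tracks its open sessions. The following is a minimal sketch of that bookkeeping, assuming a hypothetical SessionDrainer helper (the class and method names are illustrative, not part of any MCP SDK) and a 55-second deadline chosen to finish before the 60-second stopTimeout.

```typescript
// Hypothetical helper: tracks open MCP sessions so shutdown can wait for them.
class SessionDrainer {
  private active = new Set<string>();
  private draining = false;
  private idleWaiters: Array<() => void> = [];

  /** Returns false once draining has begun, so the caller can reject the session. */
  open(sessionId: string): boolean {
    if (this.draining) return false;
    this.active.add(sessionId);
    return true;
  }

  /** Marks a session finished and wakes any pending drain when none remain. */
  close(sessionId: string): void {
    this.active.delete(sessionId);
    if (this.active.size === 0) {
      this.idleWaiters.splice(0).forEach((wake) => wake());
    }
  }

  get activeCount(): number {
    return this.active.size;
  }

  /** Stops accepting sessions; resolves when all close or the deadline passes. */
  drain(deadlineMs: number): Promise<"idle" | "timeout"> {
    this.draining = true;
    if (this.active.size === 0) {
      return Promise.resolve<"idle" | "timeout">("idle");
    }
    return new Promise<"idle" | "timeout">((resolve) => {
      const timer = setTimeout(() => resolve("timeout"), deadlineMs);
      this.idleWaiters.push(() => {
        clearTimeout(timer);
        resolve("idle");
      });
    });
  }
}

// Wiring sketch, left as a comment so the block has no side effects:
// const drainer = new SessionDrainer();
// process.on("SIGTERM", async () => {
//   await drainer.drain(55_000); // finish before stopTimeout (60 s) expires
//   process.exit(0);
// });
```

Calling open() when a session initializes and close() when it ends is enough for drain() to know when the task can exit early instead of waiting out the full deadline.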
IAM task role — no hardcoded credentials
The taskRoleArn grants AWS permissions to your running MCP server without hardcoded access keys. Create a task role with least-privilege permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::my-mcp-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue"
      ],
      "Resource": "arn:aws:secretsmanager:*:*:secret:mcp-server/*"
    }
  ]
}
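In your Node.js code, the AWS SDK automatically picks up task role credentials from the ECS container credentials endpoint (not the EC2 instance metadata service), so no credential configuration is needed:
import { S3Client, GetObjectCommand } from '@aws-sdk/client-s3';
import { z } from 'zod';

// No credentials needed: the SDK's default provider chain fetches
// task role credentials from the ECS container credentials endpoint.
const s3 = new S3Client({ region: 'us-east-1' });

// `server` is assumed to be your existing MCP server instance.
server.tool('get-document', { key: z.string() }, async ({ key }) => {
  const cmd = new GetObjectCommand({ Bucket: 'my-mcp-bucket', Key: key });
  const response = await s3.send(cmd);
  // Body is undefined only when S3 returns no payload
  const body = await response.Body?.transformToString();
  return { content: [{ type: 'text', text: body ?? '' }] };
});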
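To make the credential lookup concrete: ECS sets AWS_CONTAINER_CREDENTIALS_RELATIVE_URI inside the task, and the SDK's default provider chain resolves it against the link-local address 169.254.170.2. Static keys in AWS_ACCESS_KEY_ID take precedence over that lookup, so a leftover key silently overrides the task role. The sketch below is a startup sanity check under those assumptions; the function names are hypothetical and it does not replace the SDK's own resolution.

```typescript
// Link-local host of the ECS container credentials endpoint.
const ECS_CREDENTIALS_HOST = "http://169.254.170.2";

type Env = Record<string, string | undefined>;

/** URL the SDK would query for task role credentials, or null outside ECS. */
function containerCredentialsUrl(env: Env): string | null {
  const relativeUri = env["AWS_CONTAINER_CREDENTIALS_RELATIVE_URI"];
  return relativeUri ? `${ECS_CREDENTIALS_HOST}${relativeUri}` : null;
}

/** Describes which credential source would win under the default chain. */
function describeCredentialSource(env: Env): string {
  if (env["AWS_ACCESS_KEY_ID"]) {
    // Environment keys are checked before the container endpoint.
    return "static-env-keys";
  }
  return containerCredentialsUrl(env) ? "ecs-task-role" : "unknown";
}
```

Logging describeCredentialSource(process.env) once at startup makes it obvious when a task is not actually running under its task role.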
Application Load Balancer with session stickiness
For MCP servers that maintain per-session in-memory state, configure ALB target group stickiness so requests from the same session always route to the same ECS task:
# AWS CLI: enable stickiness on the target group
aws elbv2 modify-target-group-attributes \
  --target-group-arn arn:aws:elasticloadbalancing:... \
  --attributes \
    Key=stickiness.enabled,Value=true \
    Key=stickiness.type,Value=lb_cookie \
    Key=stickiness.lb_cookie.duration_seconds,Value=3600
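The ALB sets a cookie (AWSALB) on the first response. Subsequent requests from the same client include this cookie, and the ALB routes them to the same target. This only works if the MCP client stores and returns cookies, which non-browser HTTP clients often do not do by default; such clients need a cookie jar or must forward the cookie themselves. Set the stickiness duration to cover the longest expected MCP session; 3600 seconds (1 hour) is a reasonable default.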
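For a client without a cookie jar, forwarding the stickiness cookie by hand is a small amount of code. The sketch below assumes the client can read the raw Set-Cookie header values from the first response; the function name is hypothetical. It keeps AWSALB and its cross-origin twin AWSALBCORS and drops the cookie attributes.

```typescript
/**
 * Builds a Cookie request header from the ALB stickiness cookies found in
 * a response's Set-Cookie values. Returns "" when none are present.
 */
function extractStickinessCookies(setCookieHeaders: string[]): string {
  return setCookieHeaders
    // Keep only the leading name=value pair; drop Expires, Path, and so on.
    .map((header) => (header.split(";")[0] ?? "").trim())
    .filter((pair) => pair.startsWith("AWSALB=") || pair.startsWith("AWSALBCORS="))
    .join("; ");
}
```

The returned string goes into the Cookie header of every later request in the same MCP session. In recent Node.js versions the raw values are available from response.headers.getSetCookie() on a fetch response.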
Configure the ALB health check on the target group to hit /healthz:
aws elbv2 modify-target-group \
  --target-group-arn arn:aws:elasticloadbalancing:... \
  --health-check-path /healthz \
  --health-check-interval-seconds 15 \
  --healthy-threshold-count 2 \
  --unhealthy-threshold-count 3
The ALB only sends traffic to healthy targets. If an ECS task fails its health check, the ALB stops routing to it — but existing sticky sessions will receive connection errors on the next request. Design your MCP clients to handle reconnection gracefully.
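One way a client might handle that failure is to retry with capped exponential backoff and start a fresh session each time, since the in-memory state on the failed task is gone. This is a sketch under those assumptions; connectAndRun is a hypothetical callback that performs initialize and the tool calls, and the delay values are illustrative.

```typescript
/** Delay before retry number `attempt` (0-based): 250, 500, 1000 ms, capped. */
function backoffDelayMs(attempt: number, baseMs = 250, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

/**
 * Runs `connectAndRun` until it succeeds or `maxAttempts` is reached.
 * Each attempt is expected to open a NEW session (re-run initialize and
 * discard any stale stickiness cookie).
 */
async function withReconnect<T>(
  connectAndRun: () => Promise<T>,
  maxAttempts = 5,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await connectAndRun();
    } catch (err) {
      lastError = err;
      // No point sleeping after the final failed attempt.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt));
      }
    }
  }
  throw lastError;
}
```

Injecting sleep keeps the helper easy to test and lets callers add jitter if many clients are likely to reconnect at the same moment.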
AWS App Runner — simpler alternative
AWS App Runner manages the container orchestration, load balancer, auto-scaling, and TLS certificate for you. It's less configurable than ECS Fargate but requires far less setup. Suitable for MCP servers that don't need custom VPC networking or fine-grained ALB configuration:
# CloudFormation resource (YAML); the same properties apply when defined through AWS CDK
Type: AWS::AppRunner::Service
Properties:
  ServiceName: mcp-server
  SourceConfiguration:
    # A private ECR image also needs AuthenticationConfiguration.AccessRoleArn here
    ImageRepository:
      ImageIdentifier: ACCOUNT.dkr.ecr.REGION.amazonaws.com/mcp-server:latest
      ImageRepositoryType: ECR
      ImageConfiguration:
        Port: "3000"
        # CloudFormation expects a list of Name/Value pairs, not a map
        RuntimeEnvironmentVariables:
          - Name: NODE_ENV
            Value: production
          - Name: PORT
            Value: "3000"
  InstanceConfiguration:
    Cpu: 0.5 vCPU
    Memory: 1 GB
  # Health checks are configured at the service level
  HealthCheckConfiguration:
    Protocol: HTTP
    Path: /healthz
    HealthyThreshold: 2
    UnhealthyThreshold: 3
    Interval: 10
App Runner limitations relevant to MCP servers: no session stickiness (any request can go to any healthy instance, so the server must be stateless or keep session state in an external store), no access to VPC resources by default (private Redis or RDS requires an App Runner VPC Connector), and no persistent volumes (use S3 or an external database for durable storage).
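Externalizing session state usually means reading and writing it by session ID on every request. The sketch below assumes a hypothetical KeyValueStore interface small enough that a Redis client's get and set-with-expiry could sit behind it; MapStore is an in-memory stand-in for local runs, and the key prefix and state shape are illustrative.

```typescript
interface SessionState {
  sessionId: string;
  lastSeenMs: number;
  context: Record<string, unknown>;
}

// Minimal storage surface; a Redis-backed implementation is assumed, not shown.
interface KeyValueStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

const sessionKey = (sessionId: string): string => `mcp:session:${sessionId}`;

async function saveSession(
  store: KeyValueStore,
  state: SessionState,
  ttlSeconds: number,
): Promise<void> {
  await store.set(sessionKey(state.sessionId), JSON.stringify(state), ttlSeconds);
}

async function loadSession(
  store: KeyValueStore,
  sessionId: string,
): Promise<SessionState | null> {
  const raw = await store.get(sessionKey(sessionId));
  return raw === null ? null : (JSON.parse(raw) as SessionState);
}

// In-memory stand-in for local development; it ignores the TTL.
class MapStore implements KeyValueStore {
  private data = new Map<string, string>();

  async get(key: string): Promise<string | null> {
    return this.data.get(key) ?? null;
  }

  async set(key: string, value: string, _ttlSeconds: number): Promise<void> {
    this.data.set(key, value);
  }
}
```

With state loaded at the start of each request and saved at the end, any instance can serve any request, which is what the lack of stickiness requires.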
Why Lambda doesn't work for most MCP servers
Lambda functions are invoked per-request and frozen between requests. The MCP session model — initialize → tools/list → tool calls → session close — requires state to persist across multiple HTTP requests. Lambda has three problems for this:
- No persistent SSE connections: Lambda can return streaming responses (via Function URLs with RESPONSE_STREAM mode), but SSE-over-Lambda has a hard 15-minute function timeout and each invocation is independent.
- Cold starts add latency to initialize: A Lambda cold start for a Node.js MCP server is 100–800ms. This adds to the initialize handshake of any session that lands on a cold instance.
- In-memory session state dies with the function: Between two tool calls in the same session, Lambda may invoke a different instance. Any in-memory state from the first call is gone.
Lambda works if your MCP server is genuinely stateless: each tool call is a pure function of its inputs, with no session context or accumulated history needed. See MCP server on Vercel for the same tradeoffs in a more developer-friendly serverless platform.
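As an illustration of what "genuinely stateless" looks like, the sketch below answers a single tools/call request as a pure function of the request body. The word-count tool is hypothetical, and the handler assumes a Lambda Function URL event whose body field holds the raw JSON-RPC request as plain (not base64-encoded) text.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number | string;
  method: string;
  params?: { name?: string; arguments?: Record<string, unknown> };
}

interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number | string;
  result?: { content: Array<{ type: "text"; text: string }> };
  error?: { code: number; message: string };
}

/** Pure function of the request: no session lookup, no shared state. */
function handleToolCall(req: JsonRpcRequest): JsonRpcResponse {
  if (req.method !== "tools/call") {
    return { jsonrpc: "2.0", id: req.id, error: { code: -32601, message: "Method not found" } };
  }
  if (req.params?.name !== "word-count") {
    return { jsonrpc: "2.0", id: req.id, error: { code: -32602, message: "Unknown tool" } };
  }
  const args = req.params?.arguments ?? {};
  const text = String(args["text"] ?? "").trim();
  const count = text === "" ? 0 : text.split(/\s+/).length;
  return {
    jsonrpc: "2.0",
    id: req.id,
    result: { content: [{ type: "text", text: String(count) }] },
  };
}

// Lambda entry point (sketch): one invocation handles one JSON-RPC request.
export const handler = async (event: { body?: string }) => {
  const request = JSON.parse(event.body ?? "{}") as JsonRpcRequest;
  return {
    statusCode: 200,
    headers: { "content-type": "application/json" },
    body: JSON.stringify(handleToolCall(request)),
  };
};
```

Because the result depends only on the request, it does not matter which Lambda instance serves the call or whether that instance was cold.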
CloudWatch logging and metrics
The awslogs log driver in the task definition sends all container stdout/stderr to CloudWatch Logs. Structure your logs as JSON for easy filtering:
// Structured logging: CloudWatch can filter on these fields
console.log(JSON.stringify({
  level: 'info',
  event: 'tool_call',
  tool: toolName,
  sessionId,
  durationMs: Date.now() - start,
  success: true
}));
Create CloudWatch metric filters on the log group to extract metrics like tool call duration and error rate. Use these to set CloudWatch Alarms that notify your team when error rates spike.
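Metric filters are only reliable if every tool call emits the same fields on both success and failure. The sketch below wraps a tool call in a hypothetical timedToolCall helper that logs one JSON line either way; the field names follow the logging example above, and the filter pattern in the comment is an illustration of CloudWatch's JSON filter syntax rather than a tested configuration.

```typescript
interface ToolCallLog {
  level: "info" | "error";
  event: "tool_call";
  tool: string;
  sessionId: string;
  durationMs: number;
  success: boolean;
  error?: string;
}

/** Formats one log line; passing `error` marks the call as failed. */
function formatToolCallLog(
  tool: string,
  sessionId: string,
  startMs: number,
  nowMs: number,
  error?: unknown,
): string {
  const failed = error !== undefined;
  const entry: ToolCallLog = {
    level: failed ? "error" : "info",
    event: "tool_call",
    tool,
    sessionId,
    durationMs: nowMs - startMs,
    success: !failed,
  };
  if (failed) {
    entry.error = error instanceof Error ? error.message : String(error);
  }
  return JSON.stringify(entry);
}

/** Runs `fn`, logs duration and outcome, and rethrows any failure. */
async function timedToolCall<T>(
  tool: string,
  sessionId: string,
  fn: () => Promise<T>,
): Promise<T> {
  const start = Date.now();
  try {
    const result = await fn();
    console.log(formatToolCallLog(tool, sessionId, start, Date.now()));
    return result;
  } catch (err) {
    console.log(formatToolCallLog(tool, sessionId, start, Date.now(), err));
    throw err;
  }
}

// Example filter pattern for an error-count metric (illustrative):
//   { $.event = "tool_call" && $.success IS FALSE }
```

A second metric filter can use $.durationMs as its metric value to chart tool call latency from the same log lines.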
ECS also publishes service-level CPU and memory utilization metrics to CloudWatch automatically (per-task and per-container metrics require Container Insights). Set up an alarm on memory utilization above 80%: MCP servers that accumulate session context can exhaust memory if session cleanup isn't working correctly.
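Session cleanup for in-memory state can be as simple as a periodic sweep that evicts sessions idle longer than a threshold. This is a minimal sketch assuming sessions live in a Map keyed by session ID with a lastSeenMs timestamp; the names and the idle threshold are hypothetical.

```typescript
interface TrackedSession {
  lastSeenMs: number;
  context: unknown[];
}

/**
 * Removes sessions idle for longer than `maxIdleMs` and returns their IDs.
 * Deleting entries while iterating a Map with forEach is safe.
 */
function sweepIdleSessions(
  sessions: Map<string, TrackedSession>,
  nowMs: number,
  maxIdleMs: number,
): string[] {
  const evicted: string[] = [];
  sessions.forEach((session, sessionId) => {
    if (nowMs - session.lastSeenMs > maxIdleMs) {
      sessions.delete(sessionId);
      evicted.push(sessionId);
    }
  });
  return evicted;
}

// Wiring sketch: sweep once a minute, evicting sessions idle over an hour.
// setInterval(() => sweepIdleSessions(sessions, Date.now(), 3_600_000), 60_000);
```

Logging the number of evicted sessions per sweep gives a second signal, alongside the memory alarm, that cleanup is actually running.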
External monitoring beyond CloudWatch
CloudWatch metrics and alarms tell you whether your ECS tasks are running, consuming CPU/memory, and logging errors. They don't tell you whether the MCP protocol is functioning correctly from outside AWS. A misconfigured ALB listener rule, an expired ACM certificate, or a DNS propagation issue causes all external MCP clients to fail while CloudWatch shows healthy tasks.
Add your ALB domain (https://mcp.yourdomain.com) or App Runner URL to AliveMCP. AliveMCP probes from outside AWS, running the full initialize → tools/list sequence over HTTPS, and alerts when the protocol layer fails — including infrastructure failures that CloudWatch can't see. See MCP server observability for combining CloudWatch, distributed tracing (X-Ray), and external monitoring into a complete picture.
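To show what an initialize → tools/list check involves at the protocol level, here is a rough sketch of such a probe over MCP's Streamable HTTP transport. It is not AliveMCP's implementation: the /mcp path in the usage note, the protocol version string, and the client name are assumptions, and the probe only checks HTTP status codes rather than validating response bodies.

```typescript
interface JsonRpcMessage {
  jsonrpc: "2.0";
  id?: number;
  method: string;
  params?: Record<string, unknown>;
}

/** The three messages of a minimal session: initialize, initialized, tools/list. */
function buildProbeMessages(protocolVersion = "2025-03-26"): JsonRpcMessage[] {
  return [
    {
      jsonrpc: "2.0",
      id: 1,
      method: "initialize",
      params: {
        protocolVersion,
        capabilities: {},
        clientInfo: { name: "example-probe", version: "0.1.0" },
      },
    },
    { jsonrpc: "2.0", method: "notifications/initialized" },
    { jsonrpc: "2.0", id: 2, method: "tools/list" },
  ];
}

// Minimal fetch-shaped dependency so the sketch needs no global types.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; headers: { get(name: string): string | null } }>;

/** Returns true when every step of the sequence gets a 2xx response. */
async function probe(url: string, fetchFn: FetchLike): Promise<boolean> {
  let sessionId: string | null = null;
  for (const message of buildProbeMessages()) {
    const headers: Record<string, string> = {
      "content-type": "application/json",
      accept: "application/json, text/event-stream",
    };
    // Echo the session ID the server assigned during initialize, if any.
    if (sessionId) headers["mcp-session-id"] = sessionId;
    const res = await fetchFn(url, {
      method: "POST",
      headers,
      body: JSON.stringify(message),
    });
    if (!res.ok) return false;
    sessionId = res.headers.get("mcp-session-id") ?? sessionId;
  }
  return true;
}
```

In Node.js 18 or later this can be run as probe("https://mcp.yourdomain.com/mcp", fetch) from a machine outside the VPC, which exercises DNS, TLS, the ALB listener, and the server in one pass.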
Related questions
How do I set up auto-scaling for an ECS MCP server?
Use ECS Service Auto Scaling with a custom CloudWatch metric for active session count, or use ALB RequestCountPerTarget as a proxy metric. Set the target value to leave headroom before the connection pool is exhausted. Add a scale-out cooldown of 60s (fast scale-out) and a scale-in cooldown of 300s (slow scale-in) to avoid oscillation. During scale-in, ECS sends SIGTERM to the container; your server's drain logic should stop accepting new sessions and wait for active ones to finish before exiting.
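The headroom and cooldown choices above map onto a target tracking configuration for Application Auto Scaling. The sketch below builds that configuration object under stated assumptions: the capacity figure and headroom percentage are inputs you would measure yourself, the function name is hypothetical, and the resource label is a placeholder in the app/.../targetgroup/... form that ALBRequestCountPerTarget expects.

```typescript
interface TargetTrackingConfig {
  TargetValue: number;
  PredefinedMetricSpecification: {
    PredefinedMetricType: "ALBRequestCountPerTarget";
    ResourceLabel: string;
  };
  ScaleOutCooldown: number; // seconds
  ScaleInCooldown: number; // seconds
}

/**
 * Derives the per-task request target from measured capacity minus headroom.
 * Integer percent arithmetic avoids floating-point surprises.
 */
function buildScalingConfig(
  maxRequestsPerTaskPerMinute: number,
  headroomPercent: number,
  resourceLabel: string,
): TargetTrackingConfig {
  return {
    TargetValue: Math.floor((maxRequestsPerTaskPerMinute * (100 - headroomPercent)) / 100),
    PredefinedMetricSpecification: {
      PredefinedMetricType: "ALBRequestCountPerTarget",
      ResourceLabel: resourceLabel,
    },
    ScaleOutCooldown: 60, // scale out quickly
    ScaleInCooldown: 300, // scale in slowly to avoid oscillation
  };
}
```

Serialized with JSON.stringify, the result has the shape accepted by aws application-autoscaling put-scaling-policy as its target tracking configuration.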
Should I use ECS on EC2 or Fargate?
Use Fargate for MCP servers unless you have a specific reason to manage EC2 instances (GPU workloads, specialized instance types). Fargate eliminates EC2 instance management: no patching, no capacity planning, no AMI updates. Fargate costs about 20–30% more per vCPU/memory than equivalent EC2, but the operational savings outweigh the cost difference for most teams. Committing to a Compute Savings Plan, which covers Fargate usage, reduces the cost premium to ~10% over EC2 On-Demand.
How do I share a Redis instance between multiple ECS services?
Provision an ElastiCache Redis cluster in the same VPC as your ECS tasks. Configure ECS tasks to run in private subnets with a security group that allows outbound port 6379 to the ElastiCache security group. Update the ElastiCache security group to allow inbound 6379 from the ECS task security group. Use the ElastiCache cluster's primary endpoint as REDIS_URL in Secrets Manager. All ECS tasks in the same VPC can reach the same Redis cluster — useful for shared session state across multiple MCP server tasks.
Further reading
- MCP server deployment — transport selection and rolling-restart safety
- MCP server Docker — Dockerfile and signal handling
- MCP server health checks — the full initialize probe sequence
- MCP server on Kubernetes — readiness probes, PDBs, and session affinity
- MCP server on Vercel — serverless limitations and what actually works
- MCP server multi-region deployment
- MCP server observability — metrics, tracing, and external probing
- AliveMCP — external monitoring for your AWS-hosted MCP server