
MCP server multi-cloud deployment

Remote MCP servers (those using the Streamable HTTP transport) are ordinary HTTP services, which means they can run on any cloud platform that runs containers or serverless functions. AWS Lambda, GCP Cloud Run, Azure Container Apps, Fly.io, Railway, and Render all work. The trick is building your server so the same code deploys to any of them, needing at most a thin adapter for Lambda. This guide covers the vendor-neutral deployment pattern, cold start behavior on each platform, secret management across providers, and how to use AliveMCP to monitor your server regardless of where it runs.

TL;DR

Build your MCP server as a standard HTTP server that reads its port from the PORT environment variable and its secrets from environment variables, not from platform-specific APIs. Package it as a Docker container. This artifact deploys unchanged to GCP Cloud Run, Azure Container Apps, AWS App Runner, Fly.io, and Railway. For AWS Lambda, add a thin adapter that converts API Gateway events into HTTP requests. Use AliveMCP for cloud-agnostic protocol monitoring: it probes the MCP endpoint from outside any cloud, catching deployment failures that internal cloud metrics miss.

The vendor-neutral MCP server pattern

A vendor-neutral MCP server has three properties: it reads configuration from environment variables, it listens on a configurable port, and it ships as a Docker container. This single artifact runs unchanged on every container platform in this guide:

// server.ts: vendor-neutral MCP HTTP server
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import http from "http";
import { z } from "zod";
import { createDb } from "./db.js";  // hypothetical helper that returns your database client

// All config from environment; no platform-specific SDK calls
const PORT = parseInt(process.env.PORT ?? "3000", 10);
const DB_URL = process.env.DATABASE_URL ?? (() => { throw new Error("DATABASE_URL required"); })();
const API_KEY = process.env.API_KEY ?? (() => { throw new Error("API_KEY required"); })();  // for tools that call an upstream API

const db = createDb(DB_URL);

// Stateless mode: build a fresh McpServer per request so that
// concurrent requests never share a connected transport
function buildServer(): McpServer {
  const server = new McpServer({ name: "my-mcp-server", version: "1.0.0" });
  server.tool("lookup", "Look up a record by ID", { id: z.string() }, async ({ id }) => {
    const record = await db.findById(id);
    return { content: [{ type: "text", text: JSON.stringify(record) }] };
  });
  return server;
}

// Standard Node.js HTTP server; works on any platform
const httpServer = http.createServer(async (req, res) => {
  const server = buildServer();
  const transport = new StreamableHTTPServerTransport({ sessionIdGenerator: undefined });
  res.on("close", () => { void transport.close(); void server.close(); });
  try {
    await server.connect(transport);
    await transport.handleRequest(req, res);  // the transport reads and parses the JSON body itself
  } catch (err) {
    console.error(err);
    if (!res.headersSent) res.writeHead(500).end();
  }
});

httpServer.listen(PORT, () => {
  console.log(`MCP server listening on port ${PORT}`);
});

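# Dockerfile: same image deploys to GCP, Azure, AWS App Runner, Fly.io
FROM node:22-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Full install: the TypeScript build needs devDependencies
RUN npm ci
COPY . .
# Compile, then drop devDependencies before node_modules is copied
RUN npm run build && npm prune --omit=dev

FROM node:22-alpine
WORKDIR /app
ENV NODE_ENV=production
# package.json is needed at runtime if the project uses "type": "module"
COPY --from=builder /app/package.json ./
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
EXPOSE 3000
# No ENV PORT: platforms inject it at runtime (the server falls back to 3000)
CMD ["node", "dist/server.js"]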
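The server above stops at the first missing variable. A loader that validates everything up front makes a misconfigured deploy fail with one clear message on any platform. This is a minimal sketch: `loadConfig` and `ServerConfig` are hypothetical names, and the 3000 fallback mirrors the server example.

```typescript
// config.ts: sketch of a fail-fast config loader (hypothetical helper, not part of the MCP SDK)
export interface ServerConfig {
  port: number;
  databaseUrl: string;
  apiKey: string;
}

// Takes an env-like object so it can be tested without touching process.env
export function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const required = ["DATABASE_URL", "API_KEY"] as const;
  // Collect every missing variable so one failed deploy reports all of them
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }

  // PORT is injected by the platform (Cloud Run uses 8080); fall back to 3000 locally
  const rawPort = env.PORT ?? "3000";
  const port = Number(rawPort);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be an integer between 1 and 65535, got "${rawPort}"`);
  }

  return { port, databaseUrl: env.DATABASE_URL!, apiKey: env.API_KEY! };
}
```

In `server.ts` this would be called once at startup as `const config = loadConfig(process.env);`.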

GCP Cloud Run

Cloud Run is the easiest deployment for MCP servers: deploy a container, get an HTTPS URL, done. It auto-scales to zero (saving cost on low traffic) and handles HTTPS termination automatically:

# Deploy to Cloud Run
gcloud auth login
gcloud config set project YOUR_PROJECT_ID

# Build and push the container
gcloud builds submit --tag gcr.io/YOUR_PROJECT_ID/mcp-server:latest

# Deploy (1 to 10 instances). --min-instances 1 keeps one instance warm to avoid cold starts.
# PORT is reserved on Cloud Run and injected automatically (8080 by default), so it is
# not passed via --set-env-vars. The service account needs the Secret Manager Secret
# Accessor role for --set-secrets to work.
gcloud run deploy mcp-server \
  --image gcr.io/YOUR_PROJECT_ID/mcp-server:latest \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated \
  --set-secrets "DATABASE_URL=mcp-db-url:latest,API_KEY=mcp-api-key:latest" \
  --min-instances 1 \
  --max-instances 10

# Output: Service URL: https://mcp-server-xxxxx-uc.a.run.app

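Set --min-instances 1 to keep one instance warm. Cold starts on Cloud Run are 200–800ms for Node.js, and worse for Python with large imports. For MCP servers with low but steady traffic, one warm instance costs roughly $5/month. Idle instances are billed at a reduced rate, and the exact figure depends on your CPU and memory settings. It also eliminates cold-start latency that can confuse AliveMCP protocol probes. Servers with sustained traffic stay warm anyway, so the setting matters less there. For rarely used servers where an occasional cold start is acceptable, let the service scale to zero and pay only for requests.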

Azure Container Apps

Azure Container Apps is Azure's equivalent of Cloud Run — managed containers with auto-scaling, built-in HTTPS, and no Kubernetes cluster to manage:

# Deploy to Azure Container Apps
az login
az group create --name mcp-rg --location eastus
az acr create --resource-group mcp-rg --name mcpregistry --sku Basic
az acr build --registry mcpregistry --image mcp-server:latest .

# Create the Container Apps environment (one-time)
az containerapp env create \
  --name mcp-env \
  --resource-group mcp-rg \
  --location eastus

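# Deploy the MCP server. --secrets defines named secrets (values read from local shell
# variables here); --env-vars maps each one to an environment variable with
# secretref:<secret-name>. The system identity lets the app pull from ACR.
az containerapp create \
  --name mcp-server \
  --resource-group mcp-rg \
  --environment mcp-env \
  --image mcpregistry.azurecr.io/mcp-server:latest \
  --registry-server mcpregistry.azurecr.io \
  --registry-identity system \
  --target-port 3000 \
  --ingress external \
  --min-replicas 1 \
  --max-replicas 10 \
  --secrets "db-url=$DATABASE_URL" "api-key=$API_KEY" \
  --env-vars "PORT=3000" "DATABASE_URL=secretref:db-url" "API_KEY=secretref:api-key"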

# Retrieve the FQDN
az containerapp show --name mcp-server --resource-group mcp-rg \
  --query "properties.configuration.ingress.fqdn" -o tsv

Azure Container Apps secrets are stored as container app secrets or referenced from Azure Key Vault. Define them with --secrets, either as a literal value or as a keyvaultref: reference to a Key Vault secret. Then expose each one as an environment variable with secretref:<secret-name> in --env-vars, and the value is injected at runtime. For MCP servers on Azure, use Managed Identity to authenticate to Key Vault instead of storing credential strings.

AWS Lambda + API Gateway

Lambda is a popular serverless option for MCP servers, but it uses an event/response model rather than a raw HTTP server. Add a thin adapter that converts API Gateway (or Function URL) events into Node.js HTTP requests and turns the response back into a Lambda result:

// lambda.ts: thin adapter reusing the same tool registrations as server.ts
// Assumes the serverless-http package (npm i serverless-http), which turns
// API Gateway / Function URL events into Node req/res objects and back.
import serverless from "serverless-http";
import type { IncomingMessage, ServerResponse } from "http";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import { registerTools } from "./tools.js";  // same tools as server.ts, factored into tools.ts

// Same request handling as the container build: stateless, one server + transport per request
async function mcpListener(req: IncomingMessage, res: ServerResponse): Promise<void> {
  const server = new McpServer({ name: "my-mcp-server", version: "1.0.0" });
  registerTools(server);
  const transport = new StreamableHTTPServerTransport({
    sessionIdGenerator: undefined,
    // A buffered Lambda response can't stream SSE, so reply with plain JSON
    enableJsonResponse: true,
  });
  await server.connect(transport);
  await transport.handleRequest(req, res);  // the transport reads the request body itself
}

// serverless-http wraps the Node request listener into a Lambda handler
export const handler = serverless(mcpListener);
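Whichever adapter you use, it performs the same mapping at its core: method, path, headers, and a body that API Gateway may have base64-encoded. This is a dependency-free sketch of just that step. It assumes the HTTP API payload format 2.0, which Lambda Function URLs also use. `HttpEventV2`, `PlainRequest`, and `eventToRequest` are illustrative names, not the `aws-lambda` types.

```typescript
// Subset of the API Gateway HTTP API (payload v2) / Lambda Function URL event,
// declared locally so the sketch has no dependencies
interface HttpEventV2 {
  rawPath: string;
  rawQueryString: string;
  headers?: Record<string, string | undefined>;
  body?: string;
  isBase64Encoded: boolean;
  requestContext: { http: { method: string } };
}

interface PlainRequest {
  method: string;
  url: string; // path plus query string, as Node's req.url would carry it
  headers: Record<string, string>;
  body: Buffer;
}

// The core of a Lambda HTTP adapter: turn the event into method/url/headers/body
export function eventToRequest(event: HttpEventV2): PlainRequest {
  const headers: Record<string, string> = {};
  for (const [name, value] of Object.entries(event.headers ?? {})) {
    // Node exposes lowercase header names; skip undefined entries
    if (value !== undefined) headers[name.toLowerCase()] = value;
  }
  // API Gateway base64-encodes bodies it treats as binary
  const body =
    event.body === undefined
      ? Buffer.alloc(0)
      : Buffer.from(event.body, event.isBase64Encoded ? "base64" : "utf8");
  const url = event.rawQueryString ? `${event.rawPath}?${event.rawQueryString}` : event.rawPath;
  return { method: event.requestContext.http.method, url, headers, body };
}
```

Feeding this into the MCP transport still requires real Node `IncomingMessage`/`ServerResponse` objects. That is the part a library such as serverless-http takes care of.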

# Serverless Framework deployment (serverless.yml)
service: mcp-server
provider:
  name: aws
  runtime: nodejs22.x
  region: us-east-1
  environment:
    # ${ssm:...} is resolved at deploy time and stored as a plain env var,
    # so the function needs no runtime SSM permissions to read these
    DATABASE_URL: ${ssm:/mcp/database-url}
    API_KEY: ${ssm:/mcp/api-key}
functions:
  mcp:
    handler: dist/lambda.handler
    url: true   # Lambda Function URL (no API Gateway needed for simple cases)
    # Optional: also expose the function through an HTTP API; "*" catches every method and path
    events:
      - httpApi: "*"

Lambda cold starts for Node.js 22 are 100–400ms with a compiled TypeScript bundle. Use esbuild or Webpack to bundle everything into a single file — this reduces the Lambda package size and cuts cold start time. For Python Lambda with ML imports, cold starts can be 2–5 seconds; use Lambda SnapStart or provisioned concurrency if latency matters.

Fly.io

Fly.io deploys Docker containers globally across 35 regions with automatic HTTPS, persistent volumes, and predictable pricing (billed for machine run time, not per request). It's the simplest option for MCP servers that need global distribution without the complexity of Kubernetes:

# fly.toml — Fly.io configuration
app = "my-mcp-server"
primary_region = "iad"  # US East (IAD) as home region

[build]
  dockerfile = "Dockerfile"

[http_service]
  internal_port = 3000
  force_https = true
  auto_stop_machines = "stop"   # stop idle machines to save cost
  auto_start_machines = true    # restart on traffic
  min_machines_running = 1      # keep 1 warm in primary region

[env]
  PORT = "3000"
  # Secrets set via: fly secrets set DATABASE_URL=... API_KEY=...
  # Never put secret values in fly.toml

[[vm]]
  size = "shared-cpu-1x"   # 256MB RAM — sufficient for most MCP servers
  memory = "256mb"

# Deploy workflow
fly launch --no-deploy          # create app, don't deploy yet
fly secrets set DATABASE_URL="postgres://..." API_KEY="sk-..."
fly deploy                      # build and deploy

# Scale to 2 regions for redundancy
fly scale count 2 --region iad,fra

# Monitor deployment
fly logs --app my-mcp-server
fly status --app my-mcp-server

Fly.io bills for machine run time rather than per request, which makes it cost-effective for MCP servers with steady but moderate traffic. With auto_stop_machines = "stop" and min_machines_running = 1, one machine runs continuously in the primary region (~$2–4/month for shared-cpu-1x 256MB). Additional machines start on demand in 200–500ms when needed.

Cold start comparison across platforms

Platform	Cold start (Node.js)	Cold start (Python)	Cold start (Go)	Keep-warm strategy
AWS Lambda	100–400ms	200ms–5s (imports)	50–200ms	Provisioned concurrency; SnapStart (Java/Python)
GCP Cloud Run	200–800ms	300ms–3s	100–300ms	min-instances 1 (~$5/month)
Azure Container Apps	200–600ms	300ms–2s	100–300ms	min-replicas 1
Fly.io	200–500ms (machine start)	300ms–2s	100–300ms	min_machines_running 1
Railway	Always-on (no cold start)	Always-on	Always-on	No scale-to-zero by default

A single cold start shouldn't trigger a false alert from AliveMCP's 60-second probes, as long as the probe timeout (configured in your monitor settings) is longer than your platform's worst-case cold start. Set it to at least 5s for serverless platforms. If probe after probe is slow or fails, the server is likely stuck in a restart loop, which is a real failure AliveMCP should catch.
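The reasoning above amounts to an alert rule: tolerate an isolated slow or failed probe, and alert when failures repeat. This is a minimal sketch of such a rule. It assumes a hypothetical `ProbeResult` shape and a threshold of two consecutive failures. It models the idea and is not how AliveMCP is implemented.

```typescript
// Sketch of a consecutive-failure alert rule for an external probe.
// Hypothetical model: NOT AliveMCP's actual implementation.
interface ProbeResult {
  ok: boolean;       // HTTP 200 and a valid initialize response
  latencyMs: number; // wall-clock time of the probe
}

// A probe counts as failed if it errored or exceeded the timeout
function failed(result: ProbeResult, timeoutMs: number): boolean {
  return !result.ok || result.latencyMs > timeoutMs;
}

// Alert only when the most recent `threshold` probes all failed:
// one slow cold start is tolerated, a restart loop is not
export function shouldAlert(history: ProbeResult[], timeoutMs: number, threshold = 2): boolean {
  if (history.length < threshold) return false;
  return history.slice(-threshold).every((r) => failed(r, timeoutMs));
}
```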

Secret management across clouds

Each cloud platform has its own secret store. The vendor-neutral pattern: always inject secrets as environment variables, never use platform-specific SDK calls to read secrets at runtime. This makes your server code identical across platforms:

Platform	Secret store	Injection method	CLI command
GCP Cloud Run	Secret Manager	--set-secrets at deploy time	`gcloud secrets create`
Azure Container Apps	Key Vault or inline secrets	secretref: in deployment config	`az keyvault secret set`
AWS Lambda	Parameter Store (SSM) or Secrets Manager	${ssm:/path} in serverless.yml	`aws ssm put-parameter`
Fly.io	Fly secrets (encrypted)	fly secrets set at deploy time	`fly secrets set KEY=VALUE`

For teams managing the same MCP server across multiple clouds, Doppler or Infisical provide a single source of truth for secrets with per-provider sync. This avoids the "four copies of the same secret" problem and gives you a single audit log for secret access across all environments.

Cloud-agnostic monitoring with AliveMCP

The main benefit of external protocol monitoring is that it works identically regardless of which cloud your MCP server runs on. AliveMCP probes the HTTPS URL your clients use — it doesn't know or care whether the response comes from Cloud Run, Lambda, or Fly.io. This gives you:

Unified alert channel — one alerting rule covers your server on any cloud. Move from GCP to AWS, alerts continue without reconfiguration.
Migration validation — run AliveMCP against both the old URL and the new URL during a cloud migration. Compare response times and tools/list hashes to verify the new deployment is protocol-identical before cutting over DNS.
Cloud platform vs server failures — if AliveMCP alerts but your cloud platform's internal health check shows green, the issue is in your server code or MCP protocol layer, not in the cloud platform's infrastructure.
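For migration validation, a tools/list hash comparison only works if the hash ignores incidental differences such as tool order or JSON key order. This is a minimal sketch using Node's built-in `crypto`. `toolsListHash` and the trimmed `Tool` interface are illustrative, and this is not necessarily how AliveMCP computes its hashes.

```typescript
import { createHash } from "node:crypto";

// Minimal shape of a tools/list result entry (subset of the MCP Tool type)
interface Tool {
  name: string;
  description?: string;
  inputSchema: unknown;
}

// Serialize with object keys sorted so key order never changes the hash
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value as Record<string, unknown>)
      .filter(([, v]) => v !== undefined)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
    return `{${entries.map(([k, v]) => `${JSON.stringify(k)}:${canonicalJson(v)}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

// Hash the tool list independent of tool order, so identical deployments compare equal
export function toolsListHash(tools: Tool[]): string {
  const sorted = [...tools].sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
  return createHash("sha256").update(canonicalJson(sorted)).digest("hex");
}
```

Run it against the `result.tools` array from both the old and the new URL. Matching hashes mean the two deployments expose the same tool names, descriptions, and input schemas.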

# Verify protocol correctness on any cloud platform before going live
CLOUD_RUN_URL="https://mcp-server-xxxxx-uc.a.run.app"
FLY_URL="https://my-mcp-server.fly.dev"
LAMBDA_URL="https://xxxxx.lambda-url.us-east-1.on.aws"

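for URL in "$CLOUD_RUN_URL" "$FLY_URL" "$LAMBDA_URL"; do
  echo "Testing $URL..."
  # Streamable HTTP servers expect an Accept header listing both JSON and SSE
  STATUS=$(curl -s -w "%{http_code}" -o /tmp/mcp-resp.txt -X POST "$URL" \
    -H "Content-Type: application/json" \
    -H "Accept: application/json, text/event-stream" \
    -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"check","version":"1.0"}}}')
  if [ "$STATUS" = "200" ]; then
    echo "  ✓ HTTP 200"
    # The reply may be plain JSON or an SSE stream ("data: {...}"); extract the JSON either way
    BODY=$(sed -n 's/^data: //p' /tmp/mcp-resp.txt)
    [ -z "$BODY" ] && BODY=$(cat /tmp/mcp-resp.txt)
    echo "$BODY" | jq -r '"  protocolVersion: " + .result.protocolVersion + " serverInfo: " + .result.serverInfo.name'
  else
    echo "  ✗ HTTP $STATUS: MCP protocol check failed"
  fi
done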
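To port this check into a TypeScript monitor or test suite, you have to handle the fact that a Streamable HTTP server may answer a POST with either JSON or an SSE stream, depending on its configuration. This is a small sketch that handles both. `parseMcpResponse` is a hypothetical helper, and the caller is assumed to pass in the response's Content-Type header and body text.

```typescript
// An MCP Streamable HTTP server may answer a POST with plain JSON
// (Content-Type: application/json) or with an SSE stream (text/event-stream)
// whose "data:" lines carry the JSON-RPC messages.
export function parseMcpResponse(contentType: string, body: string): unknown[] {
  if (contentType.includes("text/event-stream")) {
    const messages: unknown[] = [];
    // SSE events are separated by blank lines; one event's data lines are joined with "\n"
    for (const event of body.split(/\r?\n\r?\n/)) {
      const data = event
        .split(/\r?\n/)
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).replace(/^ /, "")) // drop "data:" and one optional space
        .join("\n");
      if (data) messages.push(JSON.parse(data));
    }
    return messages;
  }
  // Plain JSON: a single message or a JSON-RPC batch array
  const parsed: unknown = JSON.parse(body);
  return Array.isArray(parsed) ? parsed : [parsed];
}
```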

Frequently asked questions

Should I run the same MCP server on multiple clouds simultaneously for redundancy?

For most MCP servers, no. Multi-cloud active-active adds significant operational complexity (split-brain session state, cross-cloud latency, different secret stores to sync, different deployment pipelines to maintain) for reliability that single-cloud multi-region provides more cheaply. Deploy to two regions on one cloud (e.g., Cloud Run in us-central1 + europe-west1) with a global load balancer. Reserve multi-cloud deployment for regulatory requirements (data residency in a country where your primary cloud has no region) or contractual availability guarantees that exceed what one provider can offer.

What's the cheapest cloud option for a low-traffic MCP server?

For a server with fewer than ~100,000 requests/month: AWS Lambda Function URL or GCP Cloud Run with scale-to-zero. Both have generous free tiers (Lambda: 1M requests/month free; Cloud Run: 2M requests/month free) and cost nothing for idle time. For higher traffic, or if you need always-on (no cold starts), the cheapest always-on options are Railway's Hobby plan ($5/month) or Fly.io's usage-based billing (typically $2–4/month for a 256MB shared instance).

How do I handle database connections across cloud platforms?

Use a connection pooler (PgBouncer, Supabase, Neon, PlanetScale) in front of your database rather than connecting directly from each MCP server instance. On serverless platforms (Lambda, Cloud Run, Cloud Functions), direct database connections from a pool of potentially hundreds of instances can overwhelm connection limits. A connection pooler like Supabase Pooler or AWS RDS Proxy accepts thousands of incoming connections and multiplexes them onto a small set of real database connections. The database URL remains the same across cloud deployments — you're just pointing to the pooler endpoint.
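The arithmetic behind the connection-limit problem is simple. Every instance's pool counts against the database's limit, so the per-instance pool can be at most the limit divided by the maximum instance count. This sketch assumes a hypothetical reserve of 5 connections for admin sessions. A result of 0 means even one connection per instance would not fit, which is exactly the case where a pooler is required.

```typescript
// Rule-of-thumb sizing for per-instance DB pools on auto-scaling platforms.
// Assumption: keep a few connections in reserve for migrations and admin tools.
export function perInstancePoolSize(
  dbMaxConnections: number, // e.g. Postgres max_connections, or your pooler's limit
  maxInstances: number,     // e.g. Cloud Run --max-instances, Lambda reserved concurrency
  reserved = 5,             // hypothetical headroom for admin sessions
): number {
  if (!Number.isInteger(maxInstances) || maxInstances < 1) {
    throw new Error("maxInstances must be a positive integer");
  }
  const available = dbMaxConnections - reserved;
  // Fewer connections than instances: even one per instance would exceed the limit
  if (available < maxInstances) return 0;
  return Math.floor(available / maxInstances);
}
```

For example, with `max_connections = 100` and Cloud Run `--max-instances 10`, each instance could hold up to 9 connections. At 200 instances the function returns 0, and only a pooler in front of the database keeps you under the limit.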

Can I use Terraform to deploy to multiple clouds from one config?

Yes. Terraform's multi-provider support lets you manage GCP Cloud Run and AWS Lambda deployments in the same configuration (the Lambda deployment still needs the adapter described above). Use a `variable "cloud"` input with `count` on each module to select which modules to activate, or use workspaces. In practice, the biggest challenge isn't the Terraform config; it's the different IAM models, secret stores, and networking models on each provider. Start with Terraform per-provider (one .tf file per cloud), and unify only if you need to apply the same infrastructure changes to both clouds at once.