Cross-Runtime · 2026-07-02 · Edge + WASM + Go + Multi-Cloud arc
MCP Servers Across Runtimes: Edge, WASM, Go, and Multi-Cloud — What Actually Changes
The Model Context Protocol is an HTTP-level specification. It does not care whether your server is written in TypeScript, Python, or Go. It does not care whether it runs on a long-lived Node.js process, a Cloudflare Workers V8 isolate, a WebAssembly module, or a container on GCP Cloud Run. On paper, this runtime-agnosticism is pure upside — deploy wherever your stack already lives. In practice, four specific decisions change depending on where your server runs, and a fifth changes depending on which language you use. Everything else — tool registration, input schema shape, error handling, monitoring — is identical across all of them. This guide synthesizes what actually changes so you can make those decisions once and stop re-debating them.
TL;DR
Five decisions change across runtimes; the rest is the same everywhere. (1) Transport: use StreamableHTTP with stateless mode everywhere — SSE only on long-lived processes, stdio only for local tools. (2) Session state: stateless tool handlers work on every runtime; when you need cross-request state, externalize it to KV — the pattern is identical across Cloudflare KV, Deno KV, Vercel KV, and Upstash Redis. (3) Language: TypeScript for edge and web ecosystem, Python for ML, Go for CPU/memory efficiency; all three produce protocol-identical servers. (4) Cold starts: calibrate AliveMCP's timeout threshold per runtime — edge is 50–200ms, containers 200–800ms, Lambda 500ms–3s. (5) Deployment artifact: Docker container is vendor-neutral across GCP, Azure, Fly.io, and AWS App Runner; Lambda needs a 30-line adapter. Everything else — tool handler logic, error handling with isError: true, schema definitions, monitoring — does not change based on runtime.
The universal unit: stateless tool handlers
The single most important thing to understand about MCP server portability is that stateless tool handlers are identical across every runtime. A handler that receives arguments, calls an external API or database, and returns a result has no runtime-specific code in it at all. It works the same way in a Node.js process on a VPS, in a Cloudflare Workers V8 isolate, in a Go goroutine, and inside a WebAssembly module.
Here is the same tool handler expressed in each of the three major SDKs. The logic is identical; only the wrapping syntax changes:
// TypeScript (works in Node.js, Bun, Cloudflare Workers, Deno Deploy, Lambda)
server.tool("lookup_record", "Look up a record by ID", { id: z.string() },
  async ({ id }) => {
    const record = await db.findById(id);
    if (!record) return { isError: true, content: [{ type: "text", text: "Record not found" }] };
    return { content: [{ type: "text", text: JSON.stringify(record) }] };
  }
);
# Python (works in long-lived processes, containers, AWS Lambda)
@mcp.tool()
async def lookup_record(id: str) -> str:
    """Look up a record by ID."""
    record = await db.find_by_id(id)
    if not record:
        # An exception raised inside a FastMCP tool is returned to the client as a tool error (isError: true)
        raise ValueError("Record not found")
    return json.dumps(record)
// Go (works in containers, VMs, Fly.io, AWS App Runner)
s.AddTool(mcp.NewTool("lookup_record",
    mcp.WithDescription("Look up a record by ID"),
    mcp.WithSchema(reflector.Reflect(LookupArgs{}).Definitions["LookupArgs"]),
), func(ctx context.Context, req mcp.CallToolRequest) (*mcp.CallToolResult, error) {
    var args LookupArgs
    // Reject malformed arguments instead of silently continuing with a zero-value struct
    if err := req.UnmarshalArguments(&args); err != nil {
        return mcp.NewToolResultError("Invalid arguments"), nil
    }
    record, err := db.FindByID(ctx, args.ID)
    if err != nil {
        return mcp.NewToolResultError("Record not found"), nil // tool-level error result (isError), not a Go error
    }
    return mcp.NewToolResultText(jsonMarshal(record)), nil
})
In all three cases, the handler receives validated arguments, calls a data source, and returns a result. The MCP client on the other end sees an identical protocol response — it cannot distinguish whether the handler ran on Cloudflare Workers or in a Go container on Fly.io. This is the property that makes runtime portability real rather than theoretical.
The implication: when you're choosing a runtime, you are not choosing how your tool handlers work. You are choosing what transport the server uses, how session state is stored, which language it is written in, how it copes with cold starts, and how it is packaged for deployment. Those are the five variables. The handlers themselves are portable by design.
For a detailed breakdown of all three SDKs side by side — including transport support, schema definition ergonomics, type safety guarantees, testing patterns, and deployment comparison — see the MCP SDK comparison guide.
When statelessness breaks: the three patterns that need cross-request state
Most MCP tool handlers are stateless. But three real-world patterns genuinely need state that persists across multiple tool calls in a session — and how you store that state is the first runtime-specific decision.
Pattern 1: Multi-step workflows
The LLM calls tool A, then tool B uses the result of A, then tool C uses the result of B. If your tools are truly pure functions with no side effects (A returns a value, B takes that value as an argument), no session state is needed. But many real workflows involve server-side state accumulation: start a data export job, poll its progress, download the result. The job ID is state that lives between three separate tool calls.
Pattern 2: Transactions and undo
Some tools operate in a multi-step transaction: begin, make changes, commit or rollback. The transaction handle is state that must survive between calls in the same session.
Pattern 3: Conversation context accumulation
Tools that build up context from prior results — a search tool that remembers which documents were already retrieved so it doesn't return duplicates — need per-session state. This is less common than it sounds (the LLM's context window often serves this role) but it arises in long-running agentic pipelines.
The KV solution: identical pattern, different syntax
The fix for all three is external KV storage. The insight from the edge runtime guide is that every provider offers KV with the same get/set/TTL pattern — just with different syntax:
// Cloudflare KV
const jobId = await env.SESSION_KV.get(`session:${sessionId}:job_id`);
await env.SESSION_KV.put(`session:${sessionId}:job_id`, newJobId, { expirationTtl: 3600 });
// Deno KV
const kv = await Deno.openKv();
const entry = await kv.get(["session", sessionId, "job_id"]);
await kv.set(["session", sessionId, "job_id"], newJobId, { expireIn: 3600 * 1000 });
// Vercel KV (Upstash Redis)
const jobId = await kv.get(`session:${sessionId}:job_id`);
await kv.set(`session:${sessionId}:job_id`, newJobId, { ex: 3600 });
// Any Redis (works on all platforms, including containers and Go servers)
const jobId = await redis.get(`session:${sessionId}:job_id`);
await redis.setex(`session:${sessionId}:job_id`, 3600, newJobId);
These four snippets accomplish the same thing: store a job ID keyed to a session ID, expire it after one hour. The operations are get, set, and TTL — universal across every KV provider. The only thing that changes is whether you're calling an edge-runtime binding, a cloud-specific SDK, or a Redis client.
The practical advice: write your session state logic as a thin interface with get(key), set(key, value, ttlSeconds), and delete(key). Inject different implementations based on the deployment target. Your tool handlers never reference the KV provider directly — they only reference the interface. This is what makes the session state pattern portable across edge, container, and Go deployments.
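As a rough illustration of that thin interface, here is a minimal TypeScript sketch. The names SessionStore, MemorySessionStore, and sessionKey are hypothetical and do not come from any SDK. The in-memory implementation is only meant for tests and local development; a Cloudflare KV, Deno KV, or Redis implementation would satisfy the same three methods.

```typescript
// Hypothetical interface: tool handlers depend on this, never on a KV provider directly.
interface SessionStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
  delete(key: string): Promise<void>;
}

// In-memory implementation for tests and local development.
// Expiry is checked lazily on read; the clock is injectable so tests can control time.
class MemorySessionStore implements SessionStore {
  private entries = new Map<string, { value: string; expiresAt: number }>();
  private now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  async get(key: string): Promise<string | null> {
    const entry = this.entries.get(key);
    if (!entry) return null;
    if (entry.expiresAt <= this.now()) {
      // Entry has outlived its TTL: drop it and report a miss.
      this.entries.delete(key);
      return null;
    }
    return entry.value;
  }

  async set(key: string, value: string, ttlSeconds: number): Promise<void> {
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }

  async delete(key: string): Promise<void> {
    this.entries.delete(key);
  }
}

// Builds keys in the same session:<id>:<field> shape used in the snippets above.
function sessionKey(sessionId: string, field: string): string {
  return `session:${sessionId}:${field}`;
}
```

A handler would then call something like store.set(sessionKey(sessionId, "job_id"), newJobId, 3600) and stay unaware of which provider sits behind the interface.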
Transport selection: one right answer and two special cases
The MCP protocol supports three transports. The runtime portability of your server depends almost entirely on which transport you choose:
| Transport | Works on | When to use |
|---|---|---|
| StreamableHTTP | Every runtime: Node.js, Go, Python, edge, Lambda, containers | Default for all HTTP deployments. Use stateless mode (sessionIdGenerator: undefined) for edge and Lambda; stateful mode with external KV for session-aware deployments |
| SSE | Long-lived processes only (Node.js, Go, Python on a container or VM) | Only when you're maintaining a persistent server process and need backwards compatibility with older MCP clients that predate StreamableHTTP |
| stdio | Local process only | Local developer tools, Claude Desktop integrations, CLI-invoked MCP servers. Not for any networked deployment |
The SSE transport assumes a long-lived HTTP server process that maintains an open HTTP connection per client. This assumption breaks on edge runtimes (no persistent process) and on Lambda (the execution environment is frozen once an invocation returns, so it cannot hold a connection open between requests). The TypeScript SDK's SSEServerTransport will not work on Cloudflare Workers or Deno Deploy at all: you'll get a runtime error when you try to initialize it in a V8 isolate context.
StreamableHTTPServerTransport was specifically designed to work in stateless environments. Set sessionIdGenerator: undefined for pure stateless mode (edge runtimes, Lambda with no session tracking). Set it to a UUID generator and pair it with external KV if you need cross-request session state. The protocol behavior from the client's perspective is identical in both modes — the session ID handling is purely a server-side concern.
Go's mcp-go SDK has a NewStreamableHTTPServer that accepts a WithStateless(true) option — same concept. Python's mcp package supports streamable-http transport. All three reach the same destination via equivalent APIs.
The one-sentence rule: use StreamableHTTP with stateless mode everywhere, and don't change this decision based on your runtime choice.
The cold start problem: calibrate per runtime, monitor externally
Cold start latency is the most runtime-specific operational variable. It directly affects your MCP monitoring configuration because a cold start can look like an outage to a monitor with the wrong timeout threshold.
| Runtime | Cold start (p50) | Cold start (p99) | AliveMCP threshold to set | Warm-instance cost |
|---|---|---|---|---|
| Cloudflare Workers (TS) | ~50ms | ~200ms | 500ms | $0/month (Workers standard) |
| Deno Deploy | ~80ms | ~250ms | 500ms | $0/month (Playground) |
| Vercel Edge Functions | ~60ms | ~200ms | 500ms | Included in Vercel plan |
| Node.js / Docker (Cloud Run, Azure) | 200–400ms | 800ms | 2,000ms | ~$5/month (1 min-instance) |
| Go / Docker (Cloud Run, Fly.io) | 30–100ms | 300ms | 500ms | ~$3/month (1 warm instance) |
| Python / Docker (Cloud Run, Lambda) | 500ms–1.5s | 3s | 4,000ms | ~$8/month (1 min-instance) |
| AWS Lambda (Node.js, provisioned) | ~500ms | ~1.5s | 2,500ms | ~$15/month (1 provisioned) |
| AWS Lambda (Go, provisioned) | ~100ms | ~400ms | 750ms | ~$15/month (1 provisioned) |
The pattern in the table: among process-based targets (containers and Lambda), Go has the fastest cold starts because the binary starts in 30–100ms and has no interpreter or runtime to initialize. Python has the slowest because of interpreter startup plus import time for large libraries (NumPy, pandas, or ML frameworks can add 500ms–2s to startup alone). TypeScript on edge runtimes is comparably fast because Cloudflare Workers and Deno Deploy pre-warm V8 isolates at the network edge rather than starting a process from scratch.
The monitoring implication: set AliveMCP's response timeout comfortably above the p99 cold start for your runtime. A multiplier of 2-3× is a sensible starting point; the thresholds in the table run from roughly 1.3× (Python) to 2.5× (Cloudflare Workers, Node.js on Docker). If you're on Cloud Run with Node.js and set a 500ms timeout, you will get false-positive alerts every time a scale-from-zero event happens after a traffic lull. If you're on Cloudflare Workers and set a 5,000ms timeout, you won't notice the difference between a 50ms healthy response and a 3,000ms degraded one. Neither is useful monitoring.
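The arithmetic behind a threshold can be sketched in a few lines of TypeScript. The function name probeTimeoutMs, the default multiplier of 2.5, and the rounding to 50ms steps are all assumptions made for illustration; they are not AliveMCP settings. The p99 inputs in the comments are the figures from the table above.

```typescript
// Hypothetical helper: derive a probe timeout from a measured p99 cold start.
// The 2.5 default and the 50ms rounding step are illustrative choices.
function probeTimeoutMs(p99ColdStartMs: number, multiplier: number = 2.5): number {
  if (!(p99ColdStartMs > 0)) {
    throw new RangeError("p99ColdStartMs must be a positive number");
  }
  if (!(multiplier >= 1)) {
    throw new RangeError("multiplier must be at least 1");
  }
  // Round up to the next 50ms so the configured value is easy to read in a dashboard.
  return Math.ceil((p99ColdStartMs * multiplier) / 50) * 50;
}

// Cloudflare Workers, p99 of about 200ms
const workersTimeout = probeTimeoutMs(200);
// Node.js on Docker, p99 of about 800ms
const nodeDockerTimeout = probeTimeoutMs(800);
```

With the default multiplier these two calls give 500 and 2000, which match the table's Workers and Node.js rows. For slower runtimes the table uses a tighter multiplier, so pass one explicitly if you prefer earlier alerts over fewer false positives.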
The deeper point: cold start behavior is one thing that genuinely differs across runtimes, and it has an operational consequence you must configure explicitly. It is not handled automatically by your framework or your cloud provider. You must set the right threshold once, per deployment target, in your monitoring configuration. See the edge runtime patterns guide for the full cold start analysis and the async dispatch pattern for CPU-time-limited tools.
The async dispatch pattern for CPU-time-limited runtimes
Edge runtimes impose CPU time limits (10–60 seconds, depending on provider and plan tier). If a tool handler does compute-heavy work — processing a large dataset, running inference, converting a file format — it may hit the CPU limit and be killed, returning an error to the MCP client.
The async dispatch pattern solves this: the tool handler submits a job to a queue or external worker and returns a job ID immediately, then a separate polling tool lets the LLM check job status and retrieve the result when ready:
// Tool 1: start_heavy_job returns immediately with a job ID
server.tool("start_heavy_job", "Start a compute-intensive job", { dataset_id: z.string() },
  async ({ dataset_id }) => {
    const jobId = crypto.randomUUID();
    await queue.send({ jobId, dataset_id }); // dispatches to a durable worker
    return { content: [{ type: "text", text: JSON.stringify({ jobId, status: "queued" }) }] };
  }
);
// Tool 2: get_job_result polls status and returns the result when complete
server.tool("get_job_result", "Check job status and retrieve result when ready", { job_id: z.string() },
  async ({ job_id }) => {
    const job = await jobStore.get(job_id);
    if (!job) return { isError: true, content: [{ type: "text", text: "Job not found" }] };
    return { content: [{ type: "text", text: JSON.stringify(job) }] };
  }
);
This pattern is not edge-runtime-specific — it's useful on Lambda (15-minute function timeout, but you don't want the LLM waiting that long), on containers with long-running operations, and anywhere that user-perceived latency matters. The pattern is the same across all runtimes; only the queue implementation changes (Cloudflare Queues, AWS SQS, Google Cloud Tasks, or a Redis-backed queue on any platform).
Language selection: a team decision, not a runtime decision
Once you have decided to use StreamableHTTP with stateless handlers, the language choice becomes almost entirely a function of your team's existing expertise and the libraries you need — not the MCP protocol or the deployment target. The protocol behavior at the client's interface is identical in TypeScript, Python, and Go.
| Criterion | TypeScript | Python | Go |
|---|---|---|---|
| Edge runtime support | Yes — runs on Cloudflare Workers, Deno Deploy, Vercel Edge | No — V8 isolates don't run Python | No — V8 isolates don't run Go |
| ML / data science libraries | Limited — no NumPy, limited ML | Best — NumPy, pandas, scikit-learn, HuggingFace, LangChain, etc. | Growing — Go ML ecosystem is smaller but improving |
| Concurrency model | Single-threaded event loop (Node.js) / V8 isolate (edge) | GIL-limited (I/O-bound fine; CPU-bound use multiprocessing) | Goroutines — cheap concurrent tool calls with no thread pool |
| Cold start speed | Medium (200–400ms Node.js; 50–100ms edge) | Slow (500ms–3s depending on imports) | Fast (30–100ms for static binary) |
| Docker image size | ~150–200MB (node:alpine) | ~200–300MB (python:slim + dependencies) | Under 25MB (distroless/static multi-stage) |
| Type safety | Compile-time with TypeScript; Zod validates at runtime | Static with mypy/pyright; Pydantic validates at runtime | Compile-time via Go type system; UnmarshalArguments checked at runtime |
| Ecosystem maturity for MCP | Most mature — official SDK, most examples, widest client compatibility testing | Mature — official SDK, FastMCP, strong ML community | Community SDK (mark3labs/mcp-go), official SDK in development |
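The one hard constraint in the table is edge runtime support: TypeScript (or plain JavaScript) is the only language that runs natively on Cloudflare Workers and Deno Deploy, because V8 isolates execute JavaScript and WebAssembly and nothing else. Python and Go servers need a process-based environment such as a container or VM; code in other languages reaches the edge only as a compiled WASM module called from a JavaScript host. If edge is a requirement, the language decision is already made for you.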
For Go specifically, the advantages are concrete and measurable: Go MCP servers cold-start in 30–100ms (vs 200–400ms for Node.js in a Docker container), consume 20–50MB of RAM at idle (vs 80–150MB for Node.js), and handle 1,000+ concurrent tool calls via goroutines without a thread pool or worker pool configuration. If you're running a high-concurrency MCP server or need to minimize infrastructure cost, Go is the right choice. If you need to call ML models, Python is the right choice. If your team already writes TypeScript and you don't have a specific constraint pushing you to another language, stay in TypeScript.
For the full SDK comparison including schema definition ergonomics, testing patterns, and deployment side-by-side, see the MCP SDK comparison: TypeScript vs Python vs Go guide. For Go-specific patterns including struct-based schemas, goroutine concurrency, context propagation, table-driven tests with race detection, and multi-stage Docker builds to distroless images under 25MB, see the MCP server with Go guide.
WebAssembly: when WASM makes sense as a tool execution layer
WebAssembly occupies a specific niche in the MCP deployment landscape that is worth understanding: WASM is not a deployment runtime (you don't run an MCP server in WASM), it is a tool execution layer that can be embedded inside an MCP server tool handler.
The use case: you have a high-performance library written in Rust, C, or Go — image processing, cryptography, PDF parsing, text encoding, compression, or ML inference using a compiled model — and you want to call it from a TypeScript or Python MCP server without spawning a child process or maintaining a sidecar service. You compile the library to WASM and load it as a module inside the tool handler.
import fs from "node:fs";

// WasmExports and getOutputLength are project-specific and assumed to be defined elsewhere;
// how the output length is recovered depends on the module's ABI.

// Load once at module scope: compilation is expensive (100–500ms for medium modules)
const wasmBytes = fs.readFileSync("./tools/image-processor.wasm");
const wasmModule = await WebAssembly.compile(wasmBytes); // cached, never compiled in the tool handler

// In the tool handler: create a fresh instance per call to isolate state
server.tool("resize_image", "Resize an image to target dimensions",
  { image_base64: z.string(), width: z.number(), height: z.number() },
  async ({ image_base64, width, height }) => {
    const instance = await WebAssembly.instantiate(wasmModule, {}); // fast: reuses compiled module
    const { memory, resize, alloc, dealloc } = instance.exports as WasmExports;
    // Write input into WASM memory, call the function, read output, free memory
    const inputBytes = Buffer.from(image_base64, "base64");
    const inputPtr = alloc(inputBytes.length);
    new Uint8Array(memory.buffer).set(inputBytes, inputPtr);
    const outputPtr = resize(inputPtr, inputBytes.length, width, height);
    if (outputPtr === 0) {
      dealloc(inputPtr, inputBytes.length);
      return { isError: true, content: [{ type: "text", text: "Resize failed" }] };
    }
    // Build the view only after resize(): memory.buffer is replaced whenever WASM memory grows
    const outputView = new Uint8Array(memory.buffer, outputPtr, getOutputLength(instance, outputPtr));
    const result = Buffer.from(outputView).toString("base64"); // copies out of WASM memory
    dealloc(inputPtr, inputBytes.length);
    dealloc(outputPtr, outputView.length);
    return { content: [{ type: "text", text: result }] };
  }
);
The critical pattern: compile the WASM module once at server startup, instantiate once per tool call. Never compile inside the tool handler — WebAssembly.compile() takes 100–500ms for a medium module and will dominate your tool call latency. Instantiation from a pre-compiled module is typically under 5ms. Creating a fresh instance per call eliminates shared mutable WASM memory state between concurrent tool calls, which is the most common source of data corruption bugs in WASM-backed servers.
WASM on edge runtimes (Cloudflare Workers, Deno Deploy) works natively without any additional configuration beyond the standard module import. Cloudflare Workers even allows module-level WASM imports that are compiled once and shared across all isolate instances in a deployment, making WASM particularly attractive on edge: you get near-native performance with global distribution and near-zero cold starts. For Rust/C/Go code that cannot run in a V8 isolate as native code, WASM is the path to edge deployment.
For the complete guide including Wasmtime for non-JS hosting, WASI capability grants, memory sizing tables, and monitoring WASM panics, see MCP server with WebAssembly.
Multi-cloud portability: one artifact, every provider
The multi-cloud reality for MCP servers is more straightforward than it sounds. Because an MCP server is a standard HTTP service that listens on a port and reads configuration from environment variables, the same Docker container runs unchanged on GCP Cloud Run, Azure Container Apps, AWS App Runner, Fly.io, and Railway. You are not locked in to any provider.
The vendor-neutral pattern has three properties that enable this:
- Read the port from the PORT environment variable. Every cloud platform injects the port via PORT. GCP Cloud Run defaults to 8080; Fly.io uses fly.toml's internal_port; AWS uses the task definition. If your server hardcodes port 3000, it will fail or require configuration on platforms that inject a different port.
- Read all secrets from environment variables. Do not use platform-specific secrets SDKs inside your server code. Use process.env.DATABASE_URL, not the AWS SSM SDK directly. Each platform's secret injection mechanism sets environment variables before your container starts, which means the same application code reads secrets identically on every platform.
- Package as a Docker container. Every modern cloud platform supports OCI-compliant containers. The container is the portability artifact. If you're deploying Node.js as a zip file to Lambda or a raw Python script to App Engine, you've opted out of portability.
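The first two properties can be sketched as a single configuration loader. This is a minimal illustration, not code from the guide: loadConfig and ServerConfig are hypothetical names, the fallback to 8080 is an assumption borrowed from Cloud Run's default, and DATABASE_URL stands in for whatever secrets your server actually needs.

```typescript
// Hypothetical shape of the settings a vendor-neutral server reads at startup.
interface ServerConfig {
  port: number;
  databaseUrl: string;
}

// Reads everything from an environment map so the same code runs on any platform.
// Taking the map as a parameter (instead of touching process.env here) keeps it testable.
function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  // Assumed fallback: 8080, used only when the platform does not inject PORT.
  const rawPort = env.PORT ?? "8080";
  const port = Number(rawPort);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT value: ${rawPort}`);
  }

  // Fail fast at startup if a required secret is missing, rather than on the first tool call.
  const databaseUrl = env.DATABASE_URL;
  if (!databaseUrl) {
    throw new Error("DATABASE_URL is not set");
  }

  return { port, databaseUrl };
}
```

In a Node.js entry point this would be called once as loadConfig(process.env), and the HTTP server would listen on the returned port.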
The one exception: AWS Lambda. Lambda does not run long-lived HTTP servers in the traditional sense — it invokes a function handler per HTTP event. Adding a 30-line adapter that converts API Gateway events to Node.js IncomingMessage objects bridges this gap:
// lambda.ts: adapter for AWS Lambda (about 30 lines, not in main server.ts)
import { APIGatewayProxyEventV2, APIGatewayProxyResultV2 } from "aws-lambda";
import { IncomingMessage, ServerResponse } from "http";
import { Socket } from "net";
import { handleRequest } from "./server.js"; // your standard MCP HTTP handler

export const handler = async (
  event: APIGatewayProxyEventV2
): Promise<APIGatewayProxyResultV2> => {
  // Build a Node.js request object from the API Gateway event
  const req = Object.assign(new IncomingMessage(new Socket()), {
    method: event.requestContext.http.method,
    url: event.rawPath + (event.rawQueryString ? `?${event.rawQueryString}` : ""),
    headers: event.headers as Record<string, string>,
  });
  // Capture everything the MCP handler writes so it can be returned as the Lambda response body
  const chunks: Buffer[] = [];
  const res = new ServerResponse(req);
  res.write = (chunk: Buffer) => { chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk)); return true; };
  res.end = (chunk?: Buffer) => { if (chunk) chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk)); return res; };
  // Queue the body on the readable stream: push() buffers it until the handler starts reading,
  // whereas emitting "data" before any listener is attached would drop it
  if (event.body) req.push(Buffer.from(event.body, event.isBase64Encoded ? "base64" : "utf8"));
  req.push(null); // signal end of request body
  await handleRequest(req, res);
  return {
    statusCode: res.statusCode ?? 200,
    headers: res.getHeaders() as Record<string, string>,
    body: Buffer.concat(chunks).toString("utf8"),
  };
};
This adapter is deliberately isolated from server.ts. The main server code has no Lambda-specific imports or logic. The adapter is the seam — and because it is only 30 lines, it is easy to understand, easy to test, and easy to replace if Lambda's invocation model changes.
For provider-specific deployment commands, secret management patterns (Doppler, AWS SSM, GCP Secret Manager, Azure Key Vault), and the cold start comparison table across Cloud Run, Azure Container Apps, Lambda, and Fly.io, see the MCP server multi-cloud deployment guide.
What a multi-cloud migration actually looks like
Teams sometimes reach a point where they need to move their MCP server from one cloud to another — cost optimization, regional latency requirements, organization-wide cloud mandates, or consolidating onto the same platform as the rest of their infrastructure. If you built with the vendor-neutral pattern, the migration procedure is:
1. Push the existing Docker image to the new cloud's container registry.
2. Set the same environment variables (secrets) in the new platform's secret store.
3. Deploy the container and obtain the new HTTPS URL.
4. Update your AliveMCP probe to the new URL.
5. Compare the tools/list response hash on the old and new URLs to confirm identical tool surfaces before cutting over traffic.
Step 5 is worth calling out explicitly. AliveMCP stores a hash of the tools/list response for every probed endpoint. If you compare the hash on your old deployment URL against the hash on the new one and they match, you have confirmed that both instances are running the same server code and exposing the same tool surface. If they differ, you have a configuration problem or a deployment artifact mismatch — catch it before you update DNS.
This is cloud-agnostic migration validation. It does not matter which provider you're migrating from or to. The MCP protocol gives you a built-in integrity check on the tool surface, and external monitoring tools like AliveMCP give you a timestamped record of what the hash was before the migration.
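If you want to run the same comparison yourself, one way to hash a tool surface is sketched below. This guide does not describe how AliveMCP computes its hash, so the scheme here (sort tools by name, sort object keys, SHA-256 over the JSON) is an assumption, and toolSurfaceHash, canonicalize, and ToolDescriptor are hypothetical names. The input is the tools array from a tools/list response.

```typescript
import { createHash } from "node:crypto";

// Subset of a tool entry as returned by tools/list (assumed shape for this sketch).
interface ToolDescriptor {
  name: string;
  description?: string;
  inputSchema?: unknown;
}

// Recursively sorts object keys so that key order cannot change the hash.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const sortedEntries = Object.keys(obj)
      .sort()
      .map((key): [string, unknown] => [key, canonicalize(obj[key])]);
    return Object.fromEntries(sortedEntries);
  }
  return value;
}

// Hashes the tool surface independently of tool order and key order.
function toolSurfaceHash(tools: ToolDescriptor[]): string {
  const byName = [...tools].sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
  const canonicalJson = JSON.stringify(canonicalize(byName));
  return createHash("sha256").update(canonicalJson).digest("hex");
}
```

Fetch tools/list from the old and new URLs, pass each tools array to toolSurfaceHash, and compare the two hex strings. Equal hashes mean the names, descriptions, and schemas match. A difference points to a configuration or artifact mismatch worth investigating before cutover.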
Monitoring: the one thing that is truly cloud-agnostic
Every runtime-specific choice described in this guide affects something about your MCP server: how it handles session state, how fast it starts, how it handles concurrency, how it is packaged for deployment. But one piece of operational infrastructure is completely identical regardless of runtime: external protocol-level monitoring.
An AliveMCP probe sends a real MCP initialize request to your server's public HTTPS endpoint. It measures response time, checks protocol compliance (valid JSON-RPC response, correct protocol version, non-empty tools list), hashes the tool surface, and records the result. It does not care whether the server is TypeScript or Go, Node.js or edge, Cloud Run or Lambda. The probe targets the URL — not the runtime behind it.
This matters because cloud-native monitoring is runtime-specific and deployment-specific. Cloud Run's metrics tell you about container CPU and memory. Lambda's metrics tell you about function invocations and duration. Cloudflare Workers analytics tell you about Worker requests. None of them tell you what an MCP client actually experiences when it connects and calls a tool. Cloudflare metrics can show zero errors while your MCP server is returning {"error": {"code": -32600}} on every initialize request because you broke the JSON-RPC envelope. Protocol-level monitoring closes the gap.
| What it catches | Cloud-native metrics | AliveMCP protocol probe |
|---|---|---|
| Server crashed / container restarted | Yes (restart count, error rate) | Yes (next probe returns error) |
| TLS certificate expired | No | Yes (TLS handshake fails) |
| MCP JSON-RPC envelope broken | No (HTTP 200 response, but invalid JSON-RPC) | Yes (protocol validation fails) |
| Empty tools/list (server started but tool registration failed) | No | Yes (empty tool surface check) |
| Tool surface changed unexpectedly (schema drift) | No | Yes (tools/list hash comparison) |
| Cold start causing timeout on first request | Partially (latency percentiles) | Yes (if threshold is calibrated correctly) |
| WASM panic causing HTTP 500 without protocol error | Yes (error rate) | Yes (protocol response check fails) |
The table reflects a pattern you'll encounter whenever you add a new layer to the stack: each layer's native observability is authoritative for that layer only. Your WebAssembly module's panics show up in your Node.js error rate. Your Node.js process health shows up in Cloud Run's container metrics. Your Cloud Run deployment status shows up in GCP's service health. But what your MCP client experiences is the aggregate of all these layers — and only an external probe from outside the cloud can observe that aggregate without blind spots.
This is why the monitoring recommendation in every runtime-specific guide in this series — edge runtime, WebAssembly, Go SDK, multi-cloud — is the same: set up a protocol-level probe as soon as you have a publicly reachable URL, calibrate the timeout to your runtime's cold start characteristics, and treat protocol errors (not just HTTP errors) as alerts. The advice is identical across all runtimes because the problem being solved — visibility into what clients actually experience — is the same problem on every runtime.
The five decisions, summarized
Here is the complete decision matrix for MCP server runtime choices:
| Decision | Default / recommended | When it changes |
|---|---|---|
| Transport | StreamableHTTPServerTransport, stateless mode | Never for HTTP. Use stdio only for local/desktop tools |
| Session state | No session state — pure stateless tool handlers | When you need multi-step workflows, transactions, or context accumulation. Solution: external KV (identical get/set/TTL pattern across all providers) |
| Language | Your team's existing language | Must use TypeScript for edge runtimes. Should use Python for ML. May want Go for high-concurrency or low-memory deployments |
| Cold start threshold | 2-3× the p99 cold start for your runtime | Every time you change the runtime or add/remove large import dependencies (Python especially) |
| Deployment artifact | Docker container with PORT env var and env-based secrets | Lambda requires a 30-line adapter. Edge runtimes don't use containers at all — deploy via wrangler or deno deploy |
Everything else — tool handler logic, error handling with isError: true, input schema definitions, output formatting, internal links between tools, the monitoring configuration (except threshold) — does not change based on your runtime choice. You decide these once, and they transfer.
The practical implication of this matrix: most teams spend time debating the language choice and the cloud provider when those decisions have the smallest impact on the production MCP server's behavior. The transport choice (StreamableHTTP, done) and the session state architecture (stateless handlers first, external KV only when genuinely needed) have more impact on reliability and maintainability than Go vs TypeScript vs Python.
Where to go from here
Each section of this guide has a deep-dive companion in this series:
- MCP server edge runtime patterns — the five constraints all edge runtimes impose, KV session state patterns with code, cold start optimization, and async dispatch for CPU-time-limited tools
- MCP server with WebAssembly — loading WASM modules at startup vs per-call, per-call instantiation for state isolation, Wasmtime for non-JS hosts, WASI capability grants, memory sizing, and monitoring WASM panics
- MCP server with Go — mark3labs/mcp-go setup, struct-based schemas with JSON struct tags, goroutine concurrency, context propagation, table-driven tests with -race, multi-stage Docker builds to distroless images, Fly.io deployment
- MCP SDK comparison: TypeScript vs Python vs Go — transport support table, schema definition ergonomics (Zod vs FastMCP decorator vs struct tags), type safety comparison, testing patterns, deployment side-by-side, and the 10-row decision table
- MCP server multi-cloud deployment — the vendor-neutral pattern with full code, GCP Cloud Run, Azure Container Apps, AWS Lambda adapter, Fly.io, cold start comparison table, cross-cloud secret management, and AliveMCP migration validation
For broader deployment patterns including Kubernetes, IaC, and GitOps, see the MCP server deployment guide. For TypeScript-specific patterns (Zod schema composition, middleware, session handling in Node.js), see MCP server TypeScript patterns.