MCP server with WebAssembly
WebAssembly lets you run code compiled from Rust, C, C++, and Go inside an MCP server tool handler — the same code that runs natively, but sandboxed and portable. This is useful when you have existing high-performance libraries (image processing, cryptography, compression, ML inference) that you want to expose as MCP tools without rewriting them in JavaScript. This guide covers loading WASM modules in Node.js and edge runtimes, using Wasmtime for non-JS hosting, understanding the WASI sandbox, and monitoring WASM-backed MCP servers.
TL;DR
WASM is a good fit for MCP tool handlers when you need near-native performance for compute-heavy operations (hashing, encoding, parsing, compression) and want to avoid spawning child processes. Compile WASM modules once at server startup (not per request) with WebAssembly.compile(), and create a fresh instance per call to avoid shared mutable state. For non-JS hosting, Wasmtime embeds a WASM runtime in your Python, Go, or Rust process and exposes the module's exports as functions you call from your MCP server. The WASI sandbox gives WASM no filesystem access by default; you must explicitly grant capabilities. Monitor with AliveMCP to detect WASM panics and instantiation failures that surface as HTTP 500s rather than as protocol errors.
When WASM makes sense for MCP tools
WebAssembly in MCP tool handlers is the right choice when:
- You have a high-performance library written in Rust, C, or Go that already does what you need (e.g., image resizing via image-rs, PDF parsing, regex matching at scale)
- You need deterministic performance: WASM startup cost is low after the initial compile, and execution is predictable without GC pauses
- You want sandboxed third-party code execution — WASM's capability model means untrusted WASM cannot access your filesystem, network, or process unless you explicitly grant access
- You're targeting edge runtimes (Cloudflare Workers, Deno Deploy) that support WASM natively with no additional dependencies
WASM is not the right choice when:
- Your bottleneck is I/O (database queries, HTTP calls) — WASM doesn't speed up network waits
- You need rich OS APIs — WASM/WASI access to filesystem, sockets, and environment is limited and platform-specific
- Cold start matters more than throughput — WASM module compilation adds 10–500ms on the first load (mitigated by pre-compilation)
Loading WASM in a Node.js MCP server
The key rule: compile the WASM module once at startup, create instances per tool call to isolate state. Do not compile inside the tool handler — WebAssembly.compile() can take 100–500ms for a medium-sized module:
// server.ts — WASM-backed MCP tool in Node.js
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import { readFileSync } from "fs";
import { z } from "zod";
import http from "http";
// Compile once at startup — not inside the tool handler
const wasmBuffer = readFileSync("./tools/compressor.wasm");
const wasmModule = await WebAssembly.compile(wasmBuffer);
// Optional: host functions the WASM module imports. The `log` import needs the
// calling instance's memory, which exists only after instantiation, so each tool
// call sets activeMemory before its synchronous WASM calls. JS runs those calls
// to completion, so two tool calls cannot interleave here.
let activeMemory: WebAssembly.Memory | null = null;
const importObject = {
env: {
log: (ptr: number, len: number) => {
// Host function exposed to WASM: decode a UTF-8 string from WASM memory
if (!activeMemory) return;
const bytes = new Uint8Array(activeMemory.buffer, ptr, len);
console.log(new TextDecoder().decode(bytes));
},
},
};
const server = new McpServer({ name: "wasm-mcp", version: "1.0.0" });
server.tool(
"compress_data",
"Compress a string using the WASM compressor (zstd)",
{ data: z.string().max(1_000_000), level: z.number().int().min(1).max(22).default(3) },
async ({ data, level }) => {
// Instantiate per call: ensures no shared mutable state between tool calls
const instance = await WebAssembly.instantiate(wasmModule, importObject);
const { compress, alloc, dealloc, memory } = instance.exports as any;
activeMemory = memory;
// Copy input string into WASM memory
const encoder = new TextEncoder();
const inputBytes = encoder.encode(data);
const inputPtr = alloc(inputBytes.length);
new Uint8Array(memory.buffer, inputPtr, inputBytes.length).set(inputBytes);
// Call WASM function (returns pointer to output, writes length to outputLen)
const outputLenPtr = alloc(4);
const outputPtr = compress(inputPtr, inputBytes.length, level, outputLenPtr);
const outputLen = new DataView(memory.buffer).getUint32(outputLenPtr, true);
if (outputPtr === 0) {
dealloc(inputPtr, inputBytes.length);
dealloc(outputLenPtr, 4);
return { isError: true, content: [{ type: "text", text: "Compression failed in WASM module" }] };
}
// Read output from WASM memory
const compressed = new Uint8Array(memory.buffer, outputPtr, outputLen);
const b64 = Buffer.from(compressed).toString("base64");
// Dealloc to avoid WASM heap leaks across calls
dealloc(inputPtr, inputBytes.length);
dealloc(outputPtr, outputLen);
dealloc(outputLenPtr, 4);
return {
content: [{ type: "text", text: JSON.stringify({
original_bytes: inputBytes.length,
compressed_bytes: outputLen,
ratio: (outputLen / inputBytes.length).toFixed(3),
compressed_b64: b64,
}) }],
};
}
);
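http.createServer(async (req, res) => {
// Stateless mode: a new transport per request, no session IDs.
// Under concurrent traffic, build a fresh McpServer per request as well
// (as the Workers example does) instead of reconnecting a shared one.
const transport = new StreamableHTTPServerTransport({ sessionIdGenerator: undefined });
res.on("close", () => transport.close());
await server.connect(transport);
// No pre-parsed body is passed, so the transport reads the request body itself
await transport.handleRequest(req, res);
}).listen(3000);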
The per-call instantiation pattern is safe and portable. For performance-critical paths where you call the same WASM function thousands of times per second, consider an instance pool instead — but measure first; for MCP servers with typical LLM-driven call rates (a few calls per second), per-call instantiation is fine.
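The compile-once, instantiate-per-call rule can be illustrated with a minimal, self-contained sketch. It assumes a tiny hand-assembled module that only exports one page of memory (a stand-in for a real library such as compressor.wasm), and the helper name handleCall is hypothetical. The synchronous WebAssembly.Module and WebAssembly.Instance constructors are used only to keep the sketch short; a server would normally use the async WebAssembly.compile() and WebAssembly.instantiate() shown above.

```typescript
// Hand-assembled stand-in module: exports one 64 KiB page of memory as "memory".
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x05, 0x03, 0x01, 0x00, 0x01,                   // memory section: min 1 page
  0x07, 0x0a, 0x01, 0x06, 0x6d, 0x65, 0x6d, 0x6f, // export section: "memo...
  0x72, 0x79, 0x02, 0x00,                         // ...ry", kind = memory, index 0
]);

// Compile once, at startup.
const compiledModule = new WebAssembly.Module(wasmBytes);

// Hypothetical per-call handler: a fresh instance, hence fresh memory, per call.
function handleCall(input: string): { echoed: string; memory: WebAssembly.Memory } {
  const instance = new WebAssembly.Instance(compiledModule, {});
  const memory = instance.exports.memory as WebAssembly.Memory;

  // Copy the input into this instance's linear memory at offset 0.
  const bytes = new TextEncoder().encode(input);
  new Uint8Array(memory.buffer, 0, bytes.length).set(bytes);

  // Read it back, as a real handler would read the module's output.
  const echoed = new TextDecoder().decode(new Uint8Array(memory.buffer, 0, bytes.length));
  return { echoed, memory };
}

const callA = handleCall("first call");
const callB = handleCall("second");

// Each call saw only its own data: callB's memory holds no trace of callA's input.
console.log(callA.echoed, "|", callB.echoed, "|", callA.memory !== callB.memory);
```

Because each call gets its own linear memory, bytes left behind by one call can never be observed by the next, which is the property an instance pool has to re-establish by hand.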
WASM on edge runtimes (Cloudflare Workers, Deno Deploy)
Edge runtimes support WASM natively with no additional dependencies. Cloudflare Workers caps the compressed bundle size, WASM included (3 MB on the free plan and 10 MB on paid plans at the time of writing; check the current limits), and requires WASM modules to be imported at the module level (not loaded from file at runtime):
// Cloudflare Workers: WASM imported at module level (not readFileSync)
// Wrangler bundles a .wasm import as a pre-compiled WebAssembly.Module
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import { z } from "zod";
import compressorWasm from "./tools/compressor.wasm";
export default {
async fetch(request: Request): Promise<Response> {
const server = new McpServer({ name: "edge-wasm-mcp", version: "1.0.0" });
server.tool("compress_data", "Compress text using WASM zstd", { data: z.string() },
async ({ data }) => {
// compressorWasm is a WebAssembly.Module (pre-compiled by Cloudflare)
const instance = await WebAssembly.instantiate(compressorWasm, {});
// ... same memory manipulation as Node.js example above
return { content: [{ type: "text", text: "..." }] };
}
);
const transport = new StreamableHTTPServerTransport({ sessionIdGenerator: undefined });
await server.connect(transport);
// Assumes a transport whose handleRequest accepts a fetch Request and returns a
// Response. The SDK's Node transport expects Node req/res objects, so a Workers
// deployment needs a fetch-compatible transport or an adapter here.
return transport.handleRequest(request, {});
},
};
// Deno Deploy — WASM loaded from URL or bundled as base64
// Deno supports WebAssembly.compileStreaming for URL-based loading
const wasmModule = await WebAssembly.compileStreaming(
fetch("https://cdn.example.com/compressor.wasm")
);
// Or bundle the .wasm file alongside your script and import via Uint8Array
On both platforms, WASM compilation happens at most once per isolate lifetime, and subsequent calls instantiate from the compiled module, which is cheap. The module import model on Workers prevents dynamic loading from the filesystem, but in exchange Cloudflare pre-compiles your WASM and distributes the compiled version to all edge locations, which reduces cold start significantly.
Wasmtime for non-JS MCP server hosting
If your MCP server is written in Python or Go (not JavaScript), you can still use WASM modules via Wasmtime — a standalone WASM runtime that embeds into any language. The pattern: compile your compute-heavy Rust/C library to WASM with WASI, load it into your Python or Go MCP server, call it as a function:
# Python MCP server with Wasmtime
# pip install wasmtime mcp
from wasmtime import Store, Module, Instance, Linker, WasiConfig
from mcp.server.fastmcp import FastMCP
from pathlib import Path
# One-time setup at server startup
store = Store()
wasi = WasiConfig()
# Grant only what the WASM module needs — no blanket filesystem access
wasi.inherit_stdout() # allow stdout for debugging
# wasi.preopen_dir("./data", "/data") # only if WASM needs filesystem
store.set_wasi(wasi)
linker = Linker(store.engine)
linker.define_wasi()
module = Module.from_file(store.engine, "tools/image_processor.wasm")
instance = linker.instantiate(store, module)
process_image = instance.exports(store)["process_image"]
mcp = FastMCP("wasm-image-mcp")
@mcp.tool()
def resize_image(image_b64: str, width: int, height: int) -> str:
    """Resize an image using the WASM image processor."""
    # Convert b64 to bytes, pass to WASM, return result
    import base64
    img_bytes = base64.b64decode(image_b64)
    # ... WASM memory management similar to JS example: copy img_bytes into
    # WASM memory, call process_image, then read result_bytes back at result_ptr
    result_ptr = process_image(store, ...)
    return base64.b64encode(result_bytes).decode()
mcp.run(transport="streamable-http")
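The Wasmtime sandbox denies all capability access by default: filesystem, network, environment variables. You grant them explicitly via WasiConfig. This is the key security property: third-party WASM cannot read your secrets or make outbound network calls unless you hand it that capability. The capability model does not by itself bound CPU time or memory use, so for untrusted modules also configure Wasmtime's resource controls (fuel or epoch interruption, and store limits).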
WASM memory limits and performance
WASM memory is a linear array (WebAssembly.Memory) that starts at a specified size and can grow to a max. For MCP tools that process large inputs, size these correctly or you'll get RuntimeError: memory access out of bounds:
| Metric | Typical values | MCP-specific notes |
|---|---|---|
| Initial memory | 1–16 MB (16–256 pages) | Set to 2× your largest typical input |
| Max memory | 64–256 MB | Cloudflare Workers limits each isolate to 128 MB total, WASM memory included; check the limit for other edge runtimes |
| Instantiation cost | 0.1–2ms (from compiled module) | Negligible vs tool handler I/O latency |
| Compilation cost | 10–500ms (first time) | Do at startup; never in tool handler |
| Function call overhead | <0.1ms per call | WASM→JS boundary crossing is cheap |
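Memory grows in 64 KiB pages. A 10 MiB input alone occupies 160 pages (10 × 1024 / 64), so { initial: 160 } is a floor rather than a target: the module's stack, heap, and output buffer need room too, which is why the table suggests about 2× your largest typical input. The WASM module itself may also grow memory via memory.grow(). Check whether your compiled module exports its memory (it owns allocation and growth) or imports it (the host must create a WebAssembly.Memory that is large enough).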
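The page arithmetic above can be sketched as a few helpers. The names pagesFor, utf8ByteLength, and initialPagesFor are hypothetical, and the "2× input plus 1 MiB headroom" rule is an illustrative assumption rather than a requirement of any runtime. The sketch also shows why a character limit (such as z.string().max(...)) is not a byte limit: UTF-8 can use several bytes per character.

```typescript
const WASM_PAGE_BYTES = 64 * 1024; // one WebAssembly page is 64 KiB

// Number of whole pages needed to hold `byteLength` bytes.
function pagesFor(byteLength: number): number {
  return Math.ceil(byteLength / WASM_PAGE_BYTES);
}

// Bytes a string occupies once encoded as UTF-8 for WASM memory.
function utf8ByteLength(text: string): number {
  return new TextEncoder().encode(text).length;
}

// Assumed sizing rule for illustration: room for the input, an output of
// similar size, and fixed headroom for the module's own stack and heap.
function initialPagesFor(maxInputBytes: number, headroomBytes = 1024 * 1024): number {
  return pagesFor(2 * maxInputBytes + headroomBytes);
}

const tenMiB = 10 * 1024 * 1024;
const inputPages = pagesFor(tenMiB);           // pages for the input alone
const suggestedPages = initialPagesFor(tenMiB); // pages under the assumed rule

// A host-created memory with a hard ceiling; grow() returns the previous size in pages.
const hostMemory = new WebAssembly.Memory({ initial: 1, maximum: 4 });
const previousPages = hostMemory.grow(1);

console.log({ inputPages, suggestedPages, previousPages, bytes: hostMemory.buffer.byteLength });
console.log("héllo".length, utf8ByteLength("héllo")); // character count vs UTF-8 byte count
```

Validating input size in bytes before copying into WASM memory turns an out-of-bounds trap into an ordinary, explainable tool error.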
Monitoring WASM-backed MCP servers
WASM panics are the most common failure mode in WASM-backed tool handlers. A Rust program compiled to WASM that panics (failed unwrap(), out-of-bounds index, explicit panic!()) executes an unreachable instruction, which the host sees as a thrown WebAssembly.RuntimeError: unreachable. If your handler does not catch it and convert it to an isError: true response, the exception propagates to the MCP transport and can surface as a 500 to the client.
Wrap every WASM call in a try/catch that maps traps to MCP errors:
server.tool("process", "...", { input: z.string() }, async ({ input }) => {
let instance: WebAssembly.Instance | null = null;
try {
instance = await WebAssembly.instantiate(wasmModule, importObject);
const result = callWasm(instance, input);
return { content: [{ type: "text", text: result }] };
} catch (err) {
// WebAssembly.RuntimeError: unreachable = WASM trap (panic, OOB access)
const isWasmTrap = err instanceof WebAssembly.RuntimeError;
return {
isError: true,
content: [{ type: "text", text: isWasmTrap
? `WASM execution fault: ${err.message} — input may be malformed or exceed size limits`
: `Unexpected error: ${String(err)}`
}],
};
}
});
AliveMCP monitors the MCP protocol layer: if WASM panics cause your tools/list to fail (e.g., WASM code runs during server initialization) or if tool calls return consistent errors, AliveMCP's protocol probe and error rate monitoring surfaces the problem. Pair AliveMCP with structured logging of WASM trap messages for root-cause analysis.
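Structured logging of trap messages can be sketched as a small wrapper around the WASM call. This is an illustrative sketch under stated assumptions: the module is a tiny hand-assembled stand-in whose only export, boom, executes unreachable (imitating a Rust panic); runGuarded, the ToolResult shape, and the log field names are hypothetical; and log lines are collected in an array where a real server would write to its logger.

```typescript
// Hand-assembled stand-in: exports `boom`, whose body is a single `unreachable`.
const trapBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,             // magic + version
  0x01, 0x04, 0x01, 0x60, 0x00, 0x00,                         // type: () -> ()
  0x03, 0x02, 0x01, 0x00,                                     // one function of type 0
  0x07, 0x08, 0x01, 0x04, 0x62, 0x6f, 0x6f, 0x6d, 0x00, 0x00, // export "boom"
  0x0a, 0x05, 0x01, 0x03, 0x00, 0x00, 0x0b,                   // body: unreachable, end
]);
const trapModule = new WebAssembly.Module(trapBytes);

interface ToolResult {
  isError?: boolean;
  content: { type: "text"; text: string }[];
}

// Stand-in for a real structured logger: one JSON line per failure.
const trapLog: string[] = [];

// Runs a synchronous WASM-backed call and maps any throw to an MCP error result.
function runGuarded(tool: string, call: () => string): ToolResult {
  try {
    return { content: [{ type: "text", text: call() }] };
  } catch (err) {
    const isTrap = err instanceof WebAssembly.RuntimeError;
    const message = err instanceof Error ? err.message : String(err);
    trapLog.push(JSON.stringify({ level: "error", tool, kind: isTrap ? "wasm_trap" : "host_error", message }));
    const text = isTrap ? `WASM execution fault: ${message}` : `Unexpected error: ${message}`;
    return { isError: true, content: [{ type: "text", text }] };
  }
}

const trapped = runGuarded("process", () => {
  const instance = new WebAssembly.Instance(trapModule, {});
  (instance.exports.boom as () => void)(); // traps before returning
  return "done";
});
const succeeded = runGuarded("process", () => "fine");

console.log(trapped.content[0].text);
console.log(trapLog[0]);
```

Tagging each log line with the tool name and a trap/host distinction makes it possible to tell a malformed-input panic from a bug in the surrounding JavaScript when an error-rate alert fires.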
Frequently asked questions
Can WASM tool handlers make outbound HTTP calls?
Not directly. WASM executed in Node.js, Deno, or Cloudflare Workers has no built-in networking capability: it is sandboxed and cannot open sockets or call fetch(). To give WASM access to networking, you must expose a host function (an import into the WASM module) that your JavaScript MCP server implements. For example, you can expose a make_http_request host function that calls Node.js's fetch() and writes the response into the WASM memory. This is more complex than calling fetch() directly in your tool handler, so only do it if the WASM module needs to orchestrate the network call itself (e.g., a compiled HTTP client with specific TLS behavior).
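The host-function mechanism itself can be shown with a minimal sketch. It assumes a hand-assembled stand-in module that imports env.log(ptr, len), stores the bytes "hello" at offset 0, and exports run, which calls log(0, 5); instantiateWithHost is a hypothetical helper. The import here is synchronous. A networking import is harder than this, because WASM calls imports synchronously while fetch() returns a promise, so the response usually has to be fetched by the host before the WASM call or bridged with an async-integration mechanism, neither of which is shown.

```typescript
// Hand-assembled stand-in module:
//   import env.log: (i32, i32) -> ()   export memory (1 page, "hello" at offset 0)
//   export run: () -> ()  which executes log(0, 5)
const hostImportBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,                               // magic + version
  0x01, 0x09, 0x02, 0x60, 0x02, 0x7f, 0x7f, 0x00, 0x60, 0x00, 0x00,             // types
  0x02, 0x0b, 0x01, 0x03, 0x65, 0x6e, 0x76, 0x03, 0x6c, 0x6f, 0x67, 0x00, 0x00, // import env.log
  0x03, 0x02, 0x01, 0x01,                                                       // function: run
  0x05, 0x03, 0x01, 0x00, 0x01,                                                 // memory: 1 page
  0x07, 0x10, 0x02, 0x06, 0x6d, 0x65, 0x6d, 0x6f, 0x72, 0x79, 0x02, 0x00,       // export "memory"
  0x03, 0x72, 0x75, 0x6e, 0x00, 0x01,                                           // export "run"
  0x0a, 0x0a, 0x01, 0x08, 0x00, 0x41, 0x00, 0x41, 0x05, 0x10, 0x00, 0x0b,       // run: log(0, 5)
  0x0b, 0x0b, 0x01, 0x00, 0x41, 0x00, 0x0b, 0x05, 0x68, 0x65, 0x6c, 0x6c, 0x6f, // data "hello"
]);
const hostImportModule = new WebAssembly.Module(hostImportBytes);

// Collected instead of printed so the behaviour is easy to inspect.
const logged: string[] = [];

// Hypothetical helper: builds a fresh import object per instance, so the host
// function always reads from the memory of the instance that called it.
function instantiateWithHost(module: WebAssembly.Module): WebAssembly.Instance {
  let memory: WebAssembly.Memory | undefined;
  const imports = {
    env: {
      log: (ptr: number, len: number): void => {
        if (!memory) return; // called before memory was bound; nothing to read
        logged.push(new TextDecoder().decode(new Uint8Array(memory.buffer, ptr, len)));
      },
    },
  };
  const instance = new WebAssembly.Instance(module, imports);
  memory = instance.exports.memory as WebAssembly.Memory;
  return instance;
}

const hostInstance = instantiateWithHost(hostImportModule);
(hostInstance.exports.run as () => void)();
console.log(logged);
```

The same shape applies to any capability you choose to expose: the WASM side passes a pointer and length, and the host decides what, if anything, it is willing to do with them.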
Should I share a single WASM instance across tool calls or create one per call?
Per-call instantiation is safer and simpler. WASM instances have mutable memory — if you share one instance, tool calls can clobber each other's in-progress memory operations, especially if you ever handle concurrent requests. The instantiation cost from a compiled WebAssembly.Module is 0.1–2ms — negligible for MCP tool handler latency. Only pool instances if profiling shows instantiation is a real bottleneck, and then protect the pool with proper locks or per-request borrowing.
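If profiling does justify pooling, per-request borrowing could look like the following sketch. InstancePool and withInstance are hypothetical names, the module is a hand-assembled stand-in that only exports memory, and the borrowed callback is assumed to be synchronous (an async callback would return its instance to the pool before the work finished). An instance whose call throws is discarded rather than reused, since a trap can leave its memory in an unknown state.

```typescript
// Hand-assembled stand-in module: exports one page of memory as "memory".
const poolModule = new WebAssembly.Module(new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,
  0x05, 0x03, 0x01, 0x00, 0x01,
  0x07, 0x0a, 0x01, 0x06, 0x6d, 0x65, 0x6d, 0x6f, 0x72, 0x79, 0x02, 0x00,
]));

class InstancePool {
  private readonly idle: WebAssembly.Instance[] = [];
  private readonly module: WebAssembly.Module;
  private readonly maxIdle: number;

  constructor(module: WebAssembly.Module, maxIdle: number) {
    this.module = module;
    this.maxIdle = maxIdle;
  }

  get idleCount(): number {
    return this.idle.length;
  }

  // Lends one instance exclusively to `fn`. Reused instances keep whatever the
  // previous borrower left in memory, so `fn` must not rely on zeroed state.
  withInstance<T>(fn: (instance: WebAssembly.Instance) => T): T {
    const instance = this.idle.pop() ?? new WebAssembly.Instance(this.module, {});
    const result = fn(instance); // a throw skips the release below: instance is dropped
    if (this.idle.length < this.maxIdle) this.idle.push(instance);
    return result;
  }
}

const pool = new InstancePool(poolModule, 2);
const firstBorrow = pool.withInstance((instance) => instance);
const secondBorrow = pool.withInstance((instance) => instance);

// Sequential borrows reuse the same instance; overlapping borrows never share one.
const overlappingAreDistinct = pool.withInstance((outer) =>
  pool.withInstance((inner) => outer !== inner),
);
console.log(firstBorrow === secondBorrow, overlappingAreDistinct, pool.idleCount);
```

The pool trades the isolation of per-call instantiation for lower latency, which is why the borrower is responsible for resetting or overwriting any state it depends on.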
What languages compile well to WASM for MCP tool use?
Rust has the best WASM toolchain (wasm-pack, wasm-bindgen) and produces compact, fast modules with minimal runtime overhead. C and C++ compile via Emscripten and produce working WASM but require the Emscripten runtime for anything beyond pure computation. Go compiles to WASM with GOOS=wasip1 GOARCH=wasm go build since Go 1.21 — the WASM binary includes the Go runtime (adds ~2MB). AssemblyScript (TypeScript-like syntax → WASM) is easier to learn but has fewer ecosystem libraries. For MCP tool handlers, Rust is the practical choice for new code; use C/C++ only if you're wrapping an existing library.
How do I debug a WASM module that's panicking inside a tool handler?
Compile your WASM with debug symbols (wasm-pack build --debug for Rust). Use the @bytecodealliance/preview2-shim or Wasmtime's --wasm-backtrace flag to get a stack trace from the WASM panic. In Node.js, the WebAssembly.RuntimeError message shows the trap kind (unreachable, integer divide by zero, etc.) but not the source location unless you have a WASM source map. For Cloudflare Workers, wrangler tail prints the RuntimeError message. The most efficient debug workflow: reproduce locally with Wasmtime's CLI (wasmtime run --invoke function_name module.wasm arg1 arg2), then fix and redeploy.
Further reading
- MCP server edge runtime patterns — stateless constraints, KV session state, and cold starts
- MCP server on Cloudflare Workers — WASM support, V8 isolates, and Durable Objects
- MCP server tool design — arguments, responses, and error handling
- MCP server worker threads — CPU-bound work without blocking the event loop
- MCP server health checks — protocol probes and readiness verification
- AliveMCP — continuous protocol monitoring for MCP servers