MCP server with WebAssembly

WebAssembly lets you run code compiled from Rust, C, C++, and Go inside an MCP server tool handler — the same code that runs natively, but sandboxed and portable. This is useful when you have existing high-performance libraries (image processing, cryptography, compression, ML inference) that you want to expose as MCP tools without rewriting them in JavaScript. This guide covers loading WASM modules in Node.js and edge runtimes, using Wasmtime for non-JS hosting, understanding the WASI sandbox, and monitoring WASM-backed MCP servers.

TL;DR

WASM is a good fit for MCP tool handlers when you need near-native performance for compute-heavy operations (hashing, encoding, parsing, compression) and want to avoid spawning child processes. Compile WASM modules once at server startup (not per request) with WebAssembly.compile(), and create an instance per call to avoid shared mutable state. For non-JS hosting, Wasmtime embeds in your Python, Go, or Rust process and exposes the module's exports as functions you call from your MCP server. Under WASI, a module has no filesystem access by default; you must explicitly grant capabilities. Monitor with AliveMCP to detect WASM panics and instantiation failures that surface as HTTP 500 responses rather than MCP protocol errors.

When WASM makes sense for MCP tools

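WebAssembly in MCP tool handlers is the right choice when you already have a high-performance library in Rust, C, C++, or Go (image processing, cryptography, compression, ML inference) that you do not want to rewrite in JavaScript, when a tool is compute-heavy (hashing, encoding, parsing, compression), or when you want to avoid spawning a child process per call.

WASM is not the right choice when a tool is dominated by network or filesystem I/O: the sandbox has no direct access to either, so every such operation must go through a host function or an explicitly granted WASI capability.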

Loading WASM in a Node.js MCP server

The key rule: compile the WASM module once at startup, create instances per tool call to isolate state. Do not compile inside the tool handler — WebAssembly.compile() can take 100–500ms for a medium-sized module:

// server.ts — WASM-backed MCP tool in Node.js
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import { readFileSync } from "fs";
import { z } from "zod";
import http from "http";

// Compile once at startup — not inside the tool handler
const wasmBuffer = readFileSync("./tools/compressor.wasm");
const wasmModule = await WebAssembly.compile(wasmBuffer);

// Optional: build the import object if your WASM imports host functions.
// Host functions need the memory of the instance that calls them, which only exists
// after instantiation, so they read it through a getter rather than a global.
function createImportObject(getMemory: () => WebAssembly.Memory): WebAssembly.Imports {
  return {
    env: {
      log: (ptr: number, len: number) => {
        // Host function exposed to WASM: decode a UTF-8 string from WASM memory
        const bytes = new Uint8Array(getMemory().buffer, ptr, len);
        console.log(new TextDecoder().decode(bytes));
      },
    },
  };
}

const server = new McpServer({ name: "wasm-mcp", version: "1.0.0" });

server.tool(
  "compress_data",
  "Compress a string using the WASM compressor (zstd)",
  { data: z.string().max(1_000_000), level: z.number().int().min(1).max(22).default(3) },
  async ({ data, level }) => {
    // Instantiate per call: ensures no shared mutable state between tool calls
    let instance: WebAssembly.Instance;
    const imports = createImportObject(() => instance.exports.memory as WebAssembly.Memory);
    instance = await WebAssembly.instantiate(wasmModule, imports);
    const { compress, alloc, dealloc, memory } = instance.exports as any;

    // Copy input string into WASM memory
    const encoder = new TextEncoder();
    const inputBytes = encoder.encode(data);
    const inputPtr = alloc(inputBytes.length);
    new Uint8Array(memory.buffer, inputPtr, inputBytes.length).set(inputBytes);

    // Call WASM function (returns pointer to output, writes length to outputLen)
    const outputLenPtr = alloc(4);
    const outputPtr = compress(inputPtr, inputBytes.length, level, outputLenPtr);
    const outputLen = new DataView(memory.buffer).getUint32(outputLenPtr, true);

    if (outputPtr === 0) {
      dealloc(inputPtr, inputBytes.length);
      dealloc(outputLenPtr, 4);
      return { isError: true, content: [{ type: "text", text: "Compression failed in WASM module" }] };
    }

    // Read output from WASM memory
    const compressed = new Uint8Array(memory.buffer, outputPtr, outputLen);
    const b64 = Buffer.from(compressed).toString("base64");

    // Dealloc to avoid WASM heap leaks across calls
    dealloc(inputPtr, inputBytes.length);
    dealloc(outputPtr, outputLen);
    dealloc(outputLenPtr, 4);

    return {
      content: [{ type: "text", text: JSON.stringify({
        original_bytes: inputBytes.length,
        compressed_bytes: outputLen,
        ratio: (outputLen / inputBytes.length).toFixed(3),
        compressed_b64: b64,
      }) }],
    };
  }
);

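// Minimal JSON body reader: the transport expects the already-parsed request body
async function readBody(req: http.IncomingMessage): Promise<unknown> {
  const chunks: Buffer[] = [];
  for await (const chunk of req) chunks.push(chunk as Buffer);
  const raw = Buffer.concat(chunks).toString("utf8");
  return raw.length > 0 ? JSON.parse(raw) : undefined;
}

http.createServer(async (req, res) => {
  // Stateless mode: no session IDs, one transport per request
  const transport = new StreamableHTTPServerTransport({ sessionIdGenerator: undefined });
  await server.connect(transport);
  await transport.handleRequest(req, res, await readBody(req));
}).listen(3000);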

The per-call instantiation pattern is safe and portable. For performance-critical paths where you call the same WASM function thousands of times per second, consider an instance pool instead — but measure first; for MCP servers with typical LLM-driven call rates (a few calls per second), per-call instantiation is fine.
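The isolation that per-call instantiation buys can be seen with a tiny module. The sketch below is illustrative only: it uses a hand-assembled module that exports one page of memory (standing in for a real compressor.wasm) and the synchronous WebAssembly.Module / WebAssembly.Instance constructors to keep the demo short. The names MEMORY_ONLY_WASM and simulateCall are made up for this example.

```typescript
// Hand-assembled module equivalent to: (module (memory (export "memory") 1))
const MEMORY_ONLY_WASM = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic "\0asm" + version 1
  0x05, 0x03, 0x01, 0x00, 0x01,                   // memory section: one memory, min 1 page
  0x07, 0x0a, 0x01, 0x06, 0x6d, 0x65, 0x6d, 0x6f, // export section: name "memory",
  0x72, 0x79, 0x02, 0x00,                         //   kind = memory, index 0
]);

// Compile once. The sync constructor is acceptable for a module this small;
// a real server would use WebAssembly.compile() at startup.
const compiled = new WebAssembly.Module(MEMORY_ONLY_WASM);

// Simulates one tool call: create a fresh instance, record the first byte of
// its memory, then overwrite that byte with a marker.
function simulateCall(marker: number): number {
  const instance = new WebAssembly.Instance(compiled, {});
  const memory = instance.exports.memory as WebAssembly.Memory;
  const view = new Uint8Array(memory.buffer);
  const before = view[0];
  view[0] = marker;
  return before;
}

const firstSeen = simulateCall(0xaa);  // 0: fresh, zero-initialised memory
const secondSeen = simulateCall(0xbb); // 0: the first call's write is not visible
```

Both calls see zeroed memory because each instance gets its own linear memory, even though they share one compiled module. With a single shared instance, the second call would have observed 0xaa.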

WASM on edge runtimes (Cloudflare Workers, Deno Deploy)

Edge runtimes support WASM natively with no additional dependencies. Cloudflare Workers has a Worker size limit (historically 1 MB on the free plan and 10 MB on paid plans; check the current limits, which have been raised over time) and requires WASM modules to be imported at module level rather than loaded from a file at runtime:

// Cloudflare Workers (ES module format): the .wasm file is imported at module level,
// not read with readFileSync. Wrangler bundles the import and hands the Worker a
// precompiled WebAssembly.Module. TypeScript needs an ambient declaration for "*.wasm".
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import { z } from "zod";
import compressorWasm from "./tools/compressor.wasm";

export default {
  async fetch(request: Request): Promise<Response> {
    const server = new McpServer({ name: "edge-wasm-mcp", version: "1.0.0" });

    server.tool("compress_data", "Compress text using WASM zstd", { data: z.string() },
      async ({ data }) => {
        // compressorWasm is a WebAssembly.Module (pre-compiled by Cloudflare)
        const instance = await WebAssembly.instantiate(compressorWasm, {});
        // ... same memory manipulation as Node.js example above
        return { content: [{ type: "text", text: "..." }] };
      }
    );

    // NOTE: StreamableHTTPServerTransport is written against Node's req/res objects.
    // On Workers you need a Fetch-compatible transport or adapter; the call below is
    // schematic and stands in for whichever one you use.
    const transport = new StreamableHTTPServerTransport({ sessionIdGenerator: undefined });
    await server.connect(transport);
    return transport.handleRequest(request, {});
  },
};

// Deno Deploy: WASM loaded from a URL or bundled alongside the script
// Deno supports WebAssembly.compileStreaming for URL-based loading
const wasmModule = await WebAssembly.compileStreaming(
  fetch("https://cdn.example.com/compressor.wasm")
);
// Or bundle the .wasm file alongside your script and compile it from a Uint8Array

On both platforms, WASM compilation happens at most once per isolate lifetime, and each tool call pays only for instantiation from the already-compiled module, which is cheap. The module import model on Workers prevents dynamic loading from the filesystem, but in exchange Cloudflare compiles your WASM ahead of time and distributes the compiled version to all edge locations, which reduces cold-start time.
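Where top-level await is not convenient, the "compile once per isolate" behaviour can be approximated with a memoised promise. This is a minimal sketch, not a platform API: once and getModule are hypothetical names, and the factory here returns a placeholder object where a real server would call WebAssembly.compile(bytes) or WebAssembly.compileStreaming(fetch(url)).

```typescript
// Memoise an async factory so that all callers share one in-flight promise.
function once<T>(factory: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = factory().catch((err) => {
        cached = undefined; // allow a retry after a failed compile
        throw err;
      });
    }
    return cached;
  };
}

// Hypothetical stand-in for the real compile step.
let compileCalls = 0;
const getModule = once(async () => {
  compileCalls += 1;
  return { compiledAt: Date.now() }; // placeholder for a WebAssembly.Module
});

// Two tool calls arriving back to back share the same compilation.
const p1 = getModule();
const p2 = getModule();
```

Each tool handler would then await getModule() and instantiate from the result, so only the first request in an isolate pays the compilation cost.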

Wasmtime for non-JS MCP server hosting

If your MCP server is written in Python or Go (not JavaScript), you can still use WASM modules via Wasmtime — a standalone WASM runtime that embeds into any language. The pattern: compile your compute-heavy Rust/C library to WASM with WASI, load it into your Python or Go MCP server, call it as a function:

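# Python MCP server with Wasmtime
# pip install wasmtime mcp
import base64

from wasmtime import Store, Module, Linker, WasiConfig
from mcp.server.fastmcp import FastMCP

# One-time setup at server startup
store = Store()
wasi = WasiConfig()
# Grant only what the WASM module needs: no blanket filesystem access
wasi.inherit_stdout()   # allow stdout for debugging
# wasi.preopen_dir("./data", "/data")  # only if WASM needs filesystem
store.set_wasi(wasi)

linker = Linker(store.engine)
linker.define_wasi()

module = Module.from_file(store.engine, "tools/image_processor.wasm")
# One shared instance for brevity; its memory persists across tool calls
instance = linker.instantiate(store, module)
exports = instance.exports(store)
process_image = exports["process_image"]
# Assumed ABI, mirroring the JS example: the module also exports alloc, dealloc, memory
alloc, dealloc, memory = exports["alloc"], exports["dealloc"], exports["memory"]

mcp = FastMCP("wasm-image-mcp")

@mcp.tool()
def resize_image(image_b64: str, width: int, height: int) -> str:
    """Resize an image using the WASM image processor."""
    img_bytes = base64.b64decode(image_b64)
    in_ptr = alloc(store, len(img_bytes))
    out_len_ptr = alloc(store, 4)
    try:
        memory.write(store, img_bytes, in_ptr)
        # Assumed signature: returns a pointer to the output (0 on failure) and
        # writes the output length as a little-endian u32 at out_len_ptr
        out_ptr = process_image(store, in_ptr, len(img_bytes), width, height, out_len_ptr)
        if out_ptr == 0:
            raise ValueError("process_image failed in WASM module")
        out_len = int.from_bytes(memory.read(store, out_len_ptr, out_len_ptr + 4), "little")
        result_bytes = bytes(memory.read(store, out_ptr, out_ptr + out_len))
        dealloc(store, out_ptr, out_len)
    finally:
        dealloc(store, in_ptr, len(img_bytes))
        dealloc(store, out_len_ptr, 4)
    return base64.b64encode(result_bytes).decode()

mcp.run(transport="streamable-http")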

With WASI, Wasmtime denies filesystem, network, and environment-variable access by default; you grant each capability explicitly via WasiConfig. This is the key security property: third-party WASM cannot read your secrets or make outbound network calls unless you hand it the means to. The sandbox does not cap CPU time or memory by default, so untrusted modules still need resource limits.

WASM memory limits and performance

WASM memory is a linear array (WebAssembly.Memory) that starts at a specified size and can grow to a max. For MCP tools that process large inputs, size these correctly or you'll get RuntimeError: memory access out of bounds:

Metric | Typical values | MCP-specific notes
Initial memory | 1–16 MB (16–256 pages) | Set to 2× your largest typical input
Max memory | 64–256 MB | Edge runtimes such as Cloudflare Workers cap memory at 128 MB
Instantiation cost | 0.1–2 ms (from compiled module) | Negligible vs tool handler I/O latency
Compilation cost | 10–500 ms (first time) | Do at startup; never in tool handler
Function call overhead | <0.1 ms per call | WASM→JS boundary crossing is cheap

Memory grows in 64 KiB pages. If you need to pass a 10 MB string to WASM, the input alone occupies 160 pages (10 MiB / 64 KiB), so { initial: 160 } is the floor, not the target: the module also needs room for its own heap and for the output. The WASM module itself may also control growth via memory.grow(). Check whether your compiled module imports its memory (the host sizes it) or defines and exports its own (the module's declared limits apply).
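The page arithmetic is easy to get wrong by a factor of 1024, so it can help to compute it rather than hard-code it. A small sketch follows; pagesForBytes and initialPages are hypothetical helper names, and the 2× headroom factor simply reuses the rule of thumb from the table above.

```typescript
const WASM_PAGE_BYTES = 64 * 1024; // page size fixed by the WebAssembly spec

// Smallest number of 64 KiB pages that can hold `bytes`.
function pagesForBytes(bytes: number): number {
  if (!Number.isInteger(bytes) || bytes < 0) {
    throw new RangeError(`bytes must be a non-negative integer, got ${bytes}`);
  }
  return Math.ceil(bytes / WASM_PAGE_BYTES);
}

// Hypothetical sizing policy: largest input times a headroom factor, to leave
// room for the module's own heap and for the output buffer.
function initialPages(maxInputBytes: number, headroomFactor = 2): number {
  return pagesForBytes(Math.ceil(maxInputBytes * headroomFactor));
}

const tenMiB = 10 * 1024 * 1024;
const inputOnly = pagesForBytes(tenMiB);   // 160
const withHeadroom = initialPages(tenMiB); // 320
```

These values only matter on the host side when the module imports its memory, in which case they would be passed as the initial (and a larger maximum) to new WebAssembly.Memory(). A module that defines its own memory ignores them.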

Monitoring WASM-backed MCP servers

WASM panics are the most common failure mode in WASM-backed tool handlers. A Rust program compiled to WASM that panics (an unwrap() on None, an out-of-bounds index, an explicit panic!()) traps, and the trap surfaces in JavaScript as a thrown WebAssembly.RuntimeError: unreachable. If your tool handler does not catch it and convert it to an isError: true response, the exception propagates to the MCP transport and the client receives an HTTP 500.

Wrap every WASM call in a try/catch that maps traps to MCP errors:

server.tool("process", "...", { input: z.string() }, async ({ input }) => {
  let instance: WebAssembly.Instance | null = null;
  try {
    instance = await WebAssembly.instantiate(wasmModule, importObject);
    const result = callWasm(instance, input);
    return { content: [{ type: "text", text: result }] };
  } catch (err) {
    // WebAssembly.RuntimeError: unreachable = WASM trap (panic, OOB access)
    const isWasmTrap = err instanceof WebAssembly.RuntimeError;
    return {
      isError: true,
      content: [{ type: "text", text: isWasmTrap
        ? `WASM execution fault: ${err.message} — input may be malformed or exceed size limits`
        : `Unexpected error: ${String(err)}`
      }],
    };
  }
});

AliveMCP monitors the MCP protocol layer: if WASM panics cause your tools/list to fail (e.g., WASM code runs during server initialization) or if tool calls return consistent errors, AliveMCP's protocol probe and error rate monitoring surfaces the problem. Pair AliveMCP with structured logging of WASM trap messages for root-cause analysis.
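For the structured-logging half of that pairing, one possible shape is a single JSON line per failure that records which tool failed and what kind of WebAssembly error it was. This is a sketch under assumptions: the TrapLogEntry field names and describeFailure are invented for illustration and are not part of AliveMCP or the MCP SDK.

```typescript
interface TrapLogEntry {
  event: "wasm_tool_failure";
  tool: string;
  kind: "trap" | "compile" | "link" | "other";
  message: string;
  inputBytes: number;
}

// Classify an error thrown from a WASM-backed tool handler.
function describeFailure(tool: string, err: unknown, inputBytes: number): TrapLogEntry {
  let kind: TrapLogEntry["kind"] = "other";
  if (err instanceof WebAssembly.RuntimeError) kind = "trap";        // panic, OOB access
  else if (err instanceof WebAssembly.CompileError) kind = "compile"; // invalid module bytes
  else if (err instanceof WebAssembly.LinkError) kind = "link";       // missing or mistyped import
  const message = err instanceof Error ? err.message : String(err);
  return { event: "wasm_tool_failure", tool, kind, message, inputBytes };
}

// Example: what a Rust panic inside compress_data would be logged as.
const entry = describeFailure(
  "compress_data",
  new WebAssembly.RuntimeError("unreachable"),
  2048,
);
const line = JSON.stringify(entry); // one log line, ready for console.error(line)
```

Logging the input size alongside the trap kind makes it easier to tell malformed-input panics from memory-limit problems when the error rate rises.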

Frequently asked questions

Can WASM tool handlers make outbound HTTP calls?

Not directly. WASM executed in Node.js, Deno, or Cloudflare Workers has no built-in networking capability: it is sandboxed and cannot open sockets or call fetch(). To give WASM access to networking, you must expose a host function (an import into the WASM module) that your JavaScript MCP server implements. For example, you can expose a make_http_request host function that calls Node.js's fetch() and writes the response into WASM memory. Because WASM calls its imports synchronously while fetch() returns a Promise, the module also needs a way to suspend and resume, or a callback-style protocol. This is more complex than calling fetch() directly in your tool handler, so only do it if the WASM module needs to orchestrate the network call itself (e.g., a compiled HTTP client with specific TLS behavior).

Should I share a single WASM instance across tool calls or create one per call?

Per-call instantiation is safer and simpler. WASM instances have mutable memory — if you share one instance, tool calls can clobber each other's in-progress memory operations, especially if you ever handle concurrent requests. The instantiation cost from a compiled WebAssembly.Module is 0.1–2ms — negligible for MCP tool handler latency. Only pool instances if profiling shows instantiation is a real bottleneck, and then protect the pool with proper locks or per-request borrowing.
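If profiling does justify a pool, per-request borrowing can be as simple as removing an instance from an idle list while it is in use. The class below is a minimal sketch, not a recommended library: InstancePool is a hypothetical name, and the demo uses plain objects where a real pool would create WebAssembly.Instance values from the compiled module.

```typescript
class InstancePool<T> {
  private readonly idle: T[] = [];
  private readonly create: () => T;
  private readonly maxIdle: number;

  constructor(create: () => T, maxIdle: number) {
    this.create = create;
    this.maxIdle = maxIdle;
  }

  // Borrow an item. It is removed from the idle list, so no other request
  // can obtain it until it is released.
  acquire(): T {
    return this.idle.pop() ?? this.create();
  }

  // Return an item after a successful call. Do not release an instance that
  // trapped: its memory may be in an inconsistent state, so just drop it.
  release(item: T): void {
    if (this.idle.length < this.maxIdle) this.idle.push(item);
  }
}

// Demo with placeholder objects standing in for WASM instances.
let createdCount = 0;
const pool = new InstancePool(() => ({ id: ++createdCount }), 2);

const a = pool.acquire(); // creates #1
const b = pool.acquire(); // creates #2, because #1 is still borrowed
pool.release(a);
const c = pool.acquire(); // reuses #1 instead of creating #3
```

Unlike per-call instantiation, a pooled instance keeps whatever its previous borrower left in linear memory, so the module needs its own reset routine (or must free everything it allocated) before release.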

What languages compile well to WASM for MCP tool use?

Rust has the best WASM toolchain (wasm-pack, wasm-bindgen) and produces compact, fast modules with minimal runtime overhead. C and C++ compile via Emscripten and produce working WASM but require the Emscripten runtime for anything beyond pure computation. Go compiles to WASM with GOOS=wasip1 GOARCH=wasm go build since Go 1.21 — the WASM binary includes the Go runtime (adds ~2MB). AssemblyScript (TypeScript-like syntax → WASM) is easier to learn but has fewer ecosystem libraries. For MCP tool handlers, Rust is the practical choice for new code; use C/C++ only if you're wrapping an existing library.

How do I debug a WASM module that's panicking inside a tool handler?

Compile your WASM with debug symbols (wasm-pack build --dev for Rust; --debug is the older, deprecated spelling). When you reproduce the failure under Wasmtime, a trap prints a WASM backtrace, and setting the WASMTIME_BACKTRACE_DETAILS=1 environment variable adds file and line information when the module carries debug info. In Node.js, the WebAssembly.RuntimeError message shows the trap kind (unreachable, integer divide by zero, etc.) but not the source location unless you have a WASM source map. For Cloudflare Workers, wrangler tail prints the RuntimeError message. The most efficient debug workflow: reproduce locally with Wasmtime's CLI (wasmtime run --invoke function_name module.wasm arg1 arg2), then fix and redeploy.
