Testing guide · 2026-06-05 · MCP server development
MCP Server Testing Guide: Unit Tests, Coverage, Inspector, and Production Monitoring
Testing an MCP server requires a different mental model than testing a REST API. A REST handler is a function: give it a request, check the response. An MCP tool handler runs inside a server that negotiates a protocol: the client sends an initialize request, the server responds with its capabilities, and only after that handshake can tool calls proceed. A handler registered on the server is reached only through that request lifecycle, so a test cannot simply call it as a function. Every MCP testing strategy therefore starts with a way to create a server-plus-client pair inside the test process.
The rest of the tooling layers on top: a test runner that handles ESM without configuration surgery, a mocking strategy for external dependencies, a coverage tool that surfaces untested branches, and an exploratory debugger for the cases automated tests miss. This guide covers all five layers as a system (InMemoryTransport for unit tests, Vitest as the test runner, mocking tool handler dependencies, coverage configuration, and MCP Inspector for exploratory testing) and shows where the boundary lies between the tests you write and the production monitoring that starts where tests stop.
TL;DR
- Unit tests: use InMemoryTransport. InMemoryTransport.createLinkedPair() creates a linked in-process server and client that run the full MCP protocol without a network. Tool calls complete in microseconds. This is the foundation of every MCP unit test.
- Test runner: use Vitest. The MCP SDK is published as ESM. Jest requires transformIgnorePatterns surgery and ts-jest to handle it. Vitest handles ESM and TypeScript natively via esbuild, with no transform config needed.
- Mock dependencies: dependency injection first, vi.mock() second, msw for HTTP. Pass fakes as constructor arguments rather than patching module imports. For HTTP APIs inside tool handlers, use Mock Service Worker to intercept at the network layer.
- Coverage: set coverage.include: ['src/**/*.ts']. Without this, files with zero tests are silently excluded from the report. Target 90%+ branch coverage on tool handler files, because every conditional return is user-facing behavior.
- Inspector: use it during development, not in CI. MCP Inspector connects to your server as a real MCP client, shows every tool with its full JSON Schema, and displays raw protocol traffic. It catches schema bugs that unit tests miss and cannot automate.
- Production gap: tests cannot verify a deployed server. Reachability, protocol handshake health, migration completion, and connection pool exhaustion are invisible to unit tests. AliveMCP probes them from outside, every 60 seconds.
The MCP Testing Lifecycle
Before writing any test, understand which tool covers which phase. The four tools in the MCP testing lifecycle have distinct responsibilities that do not overlap:
| Tool | When to use | What it verifies | Automation |
|---|---|---|---|
| MCP Inspector | During development, after adding a tool | Schema correctness, manual happy path, raw protocol messages | Manual — requires a human |
| Unit tests (InMemoryTransport) | During development, in CI on every push | Tool handler logic, argument validation, error cases, content shapes | Fully automated |
| Integration tests | In CI before deploy | Full stack: database, HTTP server, auth middleware, MCP protocol over real transport | Fully automated |
| AliveMCP | After deploy, continuously | Live server reachability, MCP protocol health, uptime, latency, schema drift | Automated — probes every 60s |
A common mistake is to pick only one of these and try to use it for all four concerns. Unit tests alone tell you nothing about whether the server boots correctly after a deploy. Inspector alone leaves you with no regression protection. The lifecycle above defines where each tool applies.
Layer 1 — InMemoryTransport: the Foundation of Unit Testing
The core problem with testing MCP tool handlers is the protocol layer. An MCP server does not expose tool handlers as callable functions — they run inside a Server instance that manages the request lifecycle. To call a tool handler in a test, you need a client that has completed the full MCP handshake with that server.
InMemoryTransport, exported from the MCP SDK, solves this by providing a linked pair of transports that communicate through an in-process channel. InMemoryTransport.createLinkedPair() returns a two-element array of linked transports; both ends are instances of the same class, so which one you hand to the server and which to the client does not affect behavior. Connect the server to one (serverTransport below) and a test Client to the other (clientTransport). The two sides negotiate capabilities using the same code path as production. The only difference is that messages travel through a JavaScript object rather than a network socket.
// Explicit Vitest imports; not needed if `globals: true` is set in vitest.config.ts
import { afterEach, beforeEach, expect, it } from 'vitest';
import { InMemoryTransport } from '@modelcontextprotocol/sdk/inMemory.js';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { createServer, fakeDeps } from './server.js';

let client: Client;

beforeEach(async () => {
  // Fresh linked transports, server, and client for every test
  const [serverTransport, clientTransport] = InMemoryTransport.createLinkedPair();
  const server = createServer(fakeDeps);
  await server.connect(serverTransport);
  client = new Client({ name: 'test-client', version: '1.0.0' }, { capabilities: {} });
  await client.connect(clientTransport); // performs the initialize handshake
});

afterEach(async () => {
  await client.close();
});

it('returns formatted content', async () => {
  const result = await client.callTool({ name: 'get_weather', arguments: { city: 'London' } });
  expect(result.isError).toBeFalsy();
  expect(result.content[0]).toMatchObject({ type: 'text', text: 'London: 15°C, cloudy' });
});
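To make the linked-pair idea concrete, the sketch below wires two in-process endpoints to each other so that sending on one side is a direct function call on the other. It is an illustration only, not the SDK's implementation: `LinkedEndpoint` and `createLinkedEndpoints` are hypothetical names, and the real InMemoryTransport also handles lifecycle concerns that are left out here.

```typescript
// Hypothetical illustration of a linked in-process channel; not the SDK's code.
type Message = { jsonrpc: '2.0'; id?: number; method?: string; result?: unknown };

class LinkedEndpoint {
  private peer?: LinkedEndpoint;
  // Set by the owner of this endpoint to receive messages from the other side.
  onmessage?: (message: Message) => void;

  static link(a: LinkedEndpoint, b: LinkedEndpoint): void {
    a.peer = b;
    b.peer = a;
  }

  // Sending is a plain function call on the peer: no socket, no serialization.
  send(message: Message): void {
    if (!this.peer) throw new Error('endpoint is not linked');
    this.peer.onmessage?.(message);
  }
}

function createLinkedEndpoints(): [LinkedEndpoint, LinkedEndpoint] {
  const a = new LinkedEndpoint();
  const b = new LinkedEndpoint();
  LinkedEndpoint.link(a, b);
  return [a, b];
}

// A toy "server" that answers every request, and a "client" that records replies.
const [serverSide, clientSide] = createLinkedEndpoints();
const replies: Message[] = [];
serverSide.onmessage = (msg) =>
  serverSide.send({ jsonrpc: '2.0', id: msg.id, result: { ok: true } });
clientSide.onmessage = (msg) => replies.push(msg);

clientSide.send({ jsonrpc: '2.0', id: 1, method: 'ping' });
```

After the last line runs, `replies` holds a single message whose `id` echoes the request. Because both ends are the same class, nothing in the channel itself knows which side is the server.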
Tool responses have a specific shape: { content: [{type: 'text', text: '...'}], isError?: boolean }. This is not optional — it is the MCP protocol's tool result format. Tests should assert on result.content and result.isError, not on a custom return value.
There is one important distinction about error handling: a tool handler that returns { isError: true } is telling the LLM client "the tool ran but the operation failed." A tool handler that throws produces a JSON-RPC error response — a different error shape that the LLM client typically cannot recover from. Unit tests should verify that your handlers catch upstream exceptions and return isError: true, not throw. Test both:
// Test that upstream failures produce isError: true, not a thrown exception
it('handles API timeout gracefully', async () => {
  const [serverTransport, clientTransport] = InMemoryTransport.createLinkedPair();
  const server = createServer({
    ...fakeDeps,
    fetchWeather: async () => { throw new Error('API timeout'); },
  });
  await server.connect(serverTransport);
  const c = new Client({ name: 'test', version: '1.0.0' }, { capabilities: {} });
  await c.connect(clientTransport);
  const result = await c.callTool({ name: 'get_weather', arguments: { city: 'London' } });
  expect(result.isError).toBe(true);
  await c.close();
});
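On the handler side, one way to guarantee the behavior this test checks is to wrap handler logic so that any thrown exception is converted into an isError result. The following is a minimal sketch under that assumption; `withErrorResult` and `ToolResult` are hypothetical names introduced for illustration, not SDK exports.

```typescript
// Shape of a text-only MCP tool result, as described above.
type ToolResult = {
  content: Array<{ type: 'text'; text: string }>;
  isError?: boolean;
};

// Hypothetical helper: wraps handler logic so thrown exceptions become
// isError results instead of propagating as JSON-RPC errors.
function withErrorResult<Args>(
  run: (args: Args) => Promise<string>,
): (args: Args) => Promise<ToolResult> {
  return async (args) => {
    try {
      const text = await run(args);
      return { content: [{ type: 'text', text }] };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      return { content: [{ type: 'text', text: `Tool failed: ${message}` }], isError: true };
    }
  };
}

// Example: an upstream call that always times out.
const getWeather = withErrorResult(async (args: { city: string }) => {
  throw new Error(`API timeout for ${args.city}`);
});
```

Calling `getWeather({ city: 'London' })` resolves (rather than rejects) with `isError: true` and the text `Tool failed: API timeout for London`, which is the shape an LLM client can read and react to.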
Create a fresh InMemoryTransport pair in beforeEach — not once per suite. Sharing state between tests produces timing-dependent failures. Each test gets its own server and client, its own fake dependency instances, and its own clean slate.
Layer 2 — Vitest: the Right Test Runner
The @modelcontextprotocol/sdk package is written as ESM: it uses import syntax and .js extension imports. Jest was designed for CommonJS and needs transformIgnorePatterns plus ts-jest (or babel-jest) to process ESM packages. This configuration is fragile: any SDK update that changes the exports map can break the transform, and the resulting error ("SyntaxError: Cannot use import statement outside a module") says nothing about which setting to change.
Vitest is built on Vite, natively understands TypeScript and ESM via esbuild, and resolves the MCP SDK's imports without additional transforms. A minimal vitest.config.ts for an MCP server:
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    environment: 'node',
    coverage: {
      provider: 'v8',
      reporter: ['text', 'lcov', 'html'],
      include: ['src/**/*.ts'],
      exclude: ['src/**/*.test.ts', 'src/**/*.spec.ts'],
      thresholds: { lines: 80, branches: 70, functions: 80 },
    },
    testTimeout: 10_000,
  },
});
Add test scripts to package.json:
{
  "scripts": {
    "test": "vitest run",
    "test:watch": "vitest",
    "test:coverage": "vitest run --coverage"
  }
}
vitest run exits after a single pass for CI. vitest (without run) launches in watch mode for development, re-running affected tests on file change. MCP tests with InMemoryTransport complete in microseconds, so watch mode gives instant feedback as you develop tool handlers.
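If you are migrating from Jest, the vi.* API closely mirrors jest.*: vi.fn(), vi.mock(), vi.spyOn(), and vi.clearAllMocks() take the same arguments as their Jest counterparts. vi.mock() calls are hoisted by Vitest's transform, the same way Jest hoists jest.mock(), so the mock is registered before the test file's imports are evaluated. The main behavioral difference is in the factory: a vi.mock() factory must return an object that defines every export explicitly, including a default key for a default export, whereas Jest treats the factory's return value as the default export.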
Layer 3 — Mocking Tool Handler Dependencies
There are two distinct mocking layers in an MCP server: mocking the MCP connection (handled by InMemoryTransport) and mocking the tool handler's external dependencies (databases, HTTP APIs, message queues). The second layer requires a strategy.
Dependency injection — the cleanest approach
The cleanest mocking strategy is to never import external dependencies at the module level in your server file. Instead, pass them as a typed parameter to your server factory:
// src/server.ts
import { Server } from '@modelcontextprotocol/sdk/server/index.js';

export interface ServerDeps {
  db: { getUser: (id: string) => Promise<{ name: string } | null> };
  stripe: { createCharge: (amount: number) => Promise<{ id: string }> };
}

export function createServer(deps: ServerDeps): Server {
  const server = new Server({ name: 'my-server', version: '1.0.0' }, { capabilities: { tools: {} } });
  // ... register tool handlers that call deps.db.getUser(), deps.stripe.createCharge()
  return server;
}

// In tests: pass fakes
const fakeDeps: ServerDeps = {
  db: { getUser: async (id) => id === 'u1' ? { name: 'Alice' } : null },
  stripe: { createCharge: async () => ({ id: 'ch_test' }) },
};
const server = createServer(fakeDeps);
This requires no module patching. Tests pass fakes as plain objects; production passes the real implementations. TypeScript's structural typing means the fake only needs to implement the methods that matter — you do not need a complete mock of the Stripe SDK, just the createCharge method your tool actually calls.
vi.mock() for legacy code without dependency injection
For codebases that import dependencies at the module level, vi.mock() replaces the module before any imports in the test file run:
import { expect, it, vi } from 'vitest';

// vi.mock is hoisted: the mock is registered before this file's imports are evaluated
vi.mock('./db.js', () => ({
  getUser: vi.fn().mockResolvedValue({ id: 'u1', name: 'Alice' }),
}));

// Per-test overrides (client comes from the beforeEach setup shown earlier)
it('returns isError when user not found', async () => {
  const { getUser } = await import('./db.js');
  vi.mocked(getUser).mockResolvedValueOnce(null);
  const result = await client.callTool({ name: 'get_user', arguments: { id: 'missing' } });
  expect(result.isError).toBe(true);
});
Mock Service Worker for HTTP APIs
For tool handlers that call external HTTP APIs via fetch, axios, or any other HTTP client, use Mock Service Worker (msw) rather than mocking the HTTP client directly. msw intercepts at the network layer — it works regardless of which HTTP client the handler uses and catches cases where the handler switches client libraries:
import { afterAll, afterEach, beforeAll } from 'vitest';
import { setupServer } from 'msw/node';
import { http, HttpResponse } from 'msw';

// Intercept the upstream weather API at the network layer
const mswServer = setupServer(
  http.get('https://api.weather.example/current', () =>
    HttpResponse.json({ temp: 15, condition: 'cloudy' })
  )
);

beforeAll(() => mswServer.listen({ onUnhandledRequest: 'error' }));
afterEach(() => mswServer.resetHandlers());
afterAll(() => mswServer.close());
onUnhandledRequest: 'error' fails tests on any unexpected HTTP request — it catches incomplete test isolation, where a handler makes a call you forgot to mock.
In-memory SQLite for database-backed tools
For tool handlers that call a SQLite database directly, use better-sqlite3 with the ':memory:' path instead of mocking the database object. An in-memory database gives you real SQL semantics, real constraint enforcement, and real query behavior — without file I/O and without state that leaks between tests:
import { afterEach, beforeEach } from 'vitest';
import Database from 'better-sqlite3';

let db: Database.Database;

beforeEach(() => {
  // A new in-memory database per test: fresh schema, fresh seed data
  db = new Database(':memory:');
  db.exec(`CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT NOT NULL)`);
  db.prepare('INSERT INTO users VALUES (?, ?)').run('u1', 'Alice');
});

afterEach(() => {
  db.close();
});
Pass db into createServer() as the database dependency. Each test gets a fresh schema and seed data — no test can corrupt another test's state.
Layer 4 — Coverage: Measuring What You Have Tested
Test coverage for MCP servers has one non-obvious configuration requirement: without coverage.include: ['src/**/*.ts'], files that have no tests are silently excluded from the report. A file with 0% coverage does not appear at all — it looks like it does not exist. Setting include forces all application source files into the report, revealing the true coverage baseline.
coverage: {
  provider: 'v8',
  reporter: ['text', 'lcov', 'html'],
  include: ['src/**/*.ts'], // surface files with zero tests
  exclude: ['src/**/*.test.ts', 'src/**/*.spec.ts', 'src/generated/**'],
  thresholds: {
    lines: 80,
    branches: 70,
    functions: 80,
  },
}
Branch coverage is the most valuable metric for MCP servers. A tool handler's conditional logic — argument validation, database not-found checks, error path returns — is user-facing behavior. A line can contain multiple branches: a ternary on one line is two branches; a logical OR on one line is two branches. 80% line coverage can coexist with 40% branch coverage if your tests only exercise happy paths. The targets by file type:
| File type | Recommended branch coverage | Rationale |
|---|---|---|
| Tool handlers (tools/*.ts) | 90%+ | Every conditional is user-facing behavior; each branch is a distinct LLM experience |
| Input validation | 90%+ | Validation branches are the first line of defence against bad arguments |
| Database helpers | 70–80% | Some edge cases (constraint violations) are hard to exercise in-memory |
| Server setup / wiring | 60–70% | Startup and shutdown paths are hard to unit-test fully |
| Entry point (index.ts) | 20–40% | Process lifecycle: the SIGTERM handler runs at OS level, not in unit tests |
For the SIGTERM handler and other OS-level paths that genuinely cannot be unit-tested, use the /* c8 ignore next */ annotation rather than lowering the global threshold. This makes the exclusion explicit — a future reader knows why the line is not covered, not just that it isn't.
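The gap between line coverage and branch coverage described above is easy to see in a small function. The sketch below uses a hypothetical `describeUser` helper (not part of any MCP API) with three lines of logic, each containing a two-way decision.

```typescript
// Hypothetical formatter: three lines of logic, each with a two-way decision.
function describeUser(user: { name?: string; admin: boolean } | null): string {
  if (user === null) return 'not found';         // decision 1: null vs. present
  const name = user.name || 'anonymous';          // decision 2: name vs. fallback
  return user.admin ? `${name} (admin)` : name;   // decision 3: admin vs. regular
}

// Happy path only: every line executes, but only one side of each decision.
const happy = describeUser({ name: 'Alice', admin: true });

// The remaining sides need their own inputs.
const missing = describeUser(null);
const fallback = describeUser({ admin: false });
```

A suite containing only the `happy` call would report full line coverage for this function while leaving half of its decisions unexercised. The exact branch percentage depends on how the coverage provider counts branches, so treat any specific number as provider-dependent.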
Schema snapshot testing
Branch coverage does not catch one important category of regression: unintentional schema changes. If you rename a tool, drop a required argument, or change an argument type, coverage metrics stay the same. Add a snapshot test for client.listTools():
it('tool list schema matches snapshot', async () => {
  const { tools } = await client.listTools();
  expect(tools).toMatchSnapshot();
});
When a tool's inputSchema changes, the snapshot test fails and requires a deliberate vitest run -u (short for --update) to accept the change. This catches renames and schema drift that coverage cannot detect.
Layer 5 — MCP Inspector: Exploratory Testing During Development
MCP Inspector is the official debugging UI from the MCP SDK team. It opens a browser window, connects to your server as a real MCP client, lists all tools with their full inputSchema JSON, and lets you call them interactively — showing the raw JSON-RPC request and response alongside the formatted result. It is not a substitute for automated tests, but it fills a gap that automated tests cannot: exploratory testing of schema correctness and interactive debugging of protocol-level failures.
# Stdio server (most common)
npx @modelcontextprotocol/inspector tsx src/index.ts

# With environment variables (-e KEY=VALUE, repeatable)
npx @modelcontextprotocol/inspector \
  -e DB_PATH=/tmp/dev.db \
  -e API_KEY=sk-dev \
  node dist/index.js

# HTTP/SSE server already running at port 3000
npx @modelcontextprotocol/inspector
# then enter http://localhost:3000/sse in the Inspector UI
Inspector is most valuable at two moments: right after you add a new tool (before writing tests, to verify the schema looks correct in the protocol layer) and when a tool is misbehaving in ways that unit tests pass but real clients see incorrectly. The protocol log — showing every JSON-RPC message — often reveals the root cause immediately: missing type: 'object' at the inputSchema root, required fields listing an argument that is not in properties, or a response shape that differs from the spec.
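Two of the schema mistakes named above (a missing type: 'object' at the root, and required entries that are absent from properties) are mechanical enough to check in code as well as by eye. The following is a rough sketch of such a check; `lintInputSchema` and `InputSchema` are hypothetical names, and the check covers only these two cases, not full JSON Schema validation.

```typescript
// Minimal structural view of a tool's inputSchema; real schemas carry more keywords.
type InputSchema = {
  type?: string;
  properties?: Record<string, unknown>;
  required?: string[];
};

// Hypothetical lint helper: returns readable problems, or an empty array if none.
function lintInputSchema(toolName: string, schema: InputSchema): string[] {
  const problems: string[] = [];
  if (schema.type !== 'object') {
    problems.push(`${toolName}: inputSchema root must have type "object"`);
  }
  const declared = new Set(Object.keys(schema.properties ?? {}));
  for (const field of schema.required ?? []) {
    if (!declared.has(field)) {
      problems.push(`${toolName}: required field "${field}" is not in properties`);
    }
  }
  return problems;
}

// A schema with both mistakes: no root type, and "units" required but undeclared.
const problems = lintInputSchema('get_weather', {
  properties: { city: { type: 'string' } },
  required: ['city', 'units'],
});
```

Here `problems` contains two entries, one per mistake. A check like this could run over the output of client.listTools() in a unit test, while Inspector remains the place to judge whether the schema reads sensibly to a human.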
Inspector distinguishes three failure modes that look identical to a caller but have different root causes:
| Failure mode | Inspector displays | Implication |
|---|---|---|
| Tool returns isError: true | Yellow badge with formatted error content | Protocol worked; application error. The LLM can recover with different arguments |
| JSON-RPC error response | Red error in protocol log; no result panel | Handler threw, or the request schema was invalid. The LLM receives a JSON-RPC error, not content |
| Connection failure | Inspector disconnects; no messages in protocol log | Server crashed on startup; wrong URL; missing auth header |
The third case — server crash on startup — is why the development loop is: run Inspector, see if it connects, check the terminal for startup errors, add the missing environment variable or fix the migration that failed, then run Inspector again. Unit tests cannot reproduce startup failures because they construct the server object directly, bypassing the startup sequence.
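The three failure modes can also be told apart programmatically from what comes back over the wire. The sketch below assumes a simplified view of a JSON-RPC response and uses `null` to stand for "no message ever arrived"; `classify` and its type names are hypothetical, introduced only to illustrate the distinction.

```typescript
// Simplified JSON-RPC response: either a tool result or a protocol-level error.
type JsonRpcResponse =
  | { jsonrpc: '2.0'; id: number; result: { content: unknown[]; isError?: boolean } }
  | { jsonrpc: '2.0'; id: number; error: { code: number; message: string } };

type Outcome = 'ok' | 'tool-error' | 'protocol-error' | 'connection-failure';

// Hypothetical classifier: `response` is null when the connection never produced a message.
function classify(response: JsonRpcResponse | null): Outcome {
  if (response === null) return 'connection-failure';
  if ('error' in response) return 'protocol-error';
  return response.result.isError ? 'tool-error' : 'ok';
}

const outcomes = [
  classify({ jsonrpc: '2.0', id: 1, result: { content: [], isError: true } }),
  classify({ jsonrpc: '2.0', id: 2, error: { code: -32602, message: 'Invalid params' } }),
  classify(null),
];
```

The three calls map to `tool-error`, `protocol-error`, and `connection-failure` respectively, matching the rows of the table in order.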
The Production Gap: What Tests Cannot Verify
Every layer of the testing lifecycle above — unit tests, integration tests, Inspector — runs before deployment. They verify that your code is correct. They cannot verify that your deployed server is reachable, that the MCP protocol handshake succeeds over the network, or that the deployment itself completed without introducing a new failure. Four failure categories are invisible to the testing lifecycle but visible to external probing:
| Failure | Why tests miss it | What AliveMCP sees |
|---|---|---|
| Server unreachable after deploy | Unit tests construct the server object directly; integration tests run in a local process | Connection refused within 60s of the failing deploy |
| MCP initialize handler broken by a bad change | Tests pass because they test a different code path than what was deployed | Protocol handshake failure: server returns HTTP 200 but the initialize response is malformed |
| Migration ran against wrong database | Database migrations are tested in a separate environment from production | Tool calls fail at query time: protocol-level error rate spike within 60s |
| Connection pool exhausted under load | Unit tests use in-memory fakes; integration tests rarely simulate 20+ concurrent sessions | Elevated probe latency: a pool-exhausted server accepts connections but tool calls wait for a free slot |
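The "HTTP 200 but malformed initialize response" row is worth making concrete, because a plain HTTP health check would pass it. The sketch below shows the kind of check an external probe could apply to the parsed response body. It is not AliveMCP's implementation; the function names are hypothetical, and the protocol version string is an assumption to replace with the revision your server targets.

```typescript
// Build the JSON-RPC initialize request a probe would send.
function buildInitializeRequest(id: number) {
  return {
    jsonrpc: '2.0' as const,
    id,
    method: 'initialize',
    params: {
      protocolVersion: '2025-03-26', // assumption: use the revision your server targets
      capabilities: {},
      clientInfo: { name: 'probe-sketch', version: '0.0.1' },
    },
  };
}

// Check that a parsed response body looks like a valid initialize result.
function isHealthyInitializeResponse(body: unknown, expectedId: number): boolean {
  if (typeof body !== 'object' || body === null) return false;
  const msg = body as Record<string, unknown>;
  if (msg.jsonrpc !== '2.0' || msg.id !== expectedId) return false;
  const result = msg.result;
  if (typeof result !== 'object' || result === null) return false;
  const r = result as Record<string, unknown>;
  return (
    typeof r.protocolVersion === 'string' &&
    typeof r.capabilities === 'object' && r.capabilities !== null &&
    typeof r.serverInfo === 'object' && r.serverInfo !== null
  );
}

const request = buildInitializeRequest(1);
const healthy = isHealthyInitializeResponse(
  { jsonrpc: '2.0', id: 1, result: { protocolVersion: '2025-03-26', capabilities: {}, serverInfo: { name: 'my-server', version: '1.0.0' } } },
  request.id,
);
// An empty JSON body can still arrive with HTTP 200; the check rejects it.
const malformed = isHealthyInitializeResponse({}, request.id);
```

`healthy` is true and `malformed` is false. The design point is that the probe judges the protocol payload, not the HTTP status code.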
A concrete example: a deploy that updates a tool handler's argument name from city to location. Your unit tests still pass because you updated the tests. The snapshot test catches the schema change in CI and you accept the update. But after the deploy, a cached tool list in an active LLM session still sends city as the argument name — and your handler returns isError: true because the argument is now wrong. No alarm fires internally because the tool call technically succeeded. AliveMCP's probe would catch the schema drift — it compares the tool list hash against the prior baseline and alerts on change within 60 seconds of the new deployment going live.
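Comparing a tool list hash against a baseline can be sketched in a few lines. The version below is an assumption about how such a comparison might work, not a description of AliveMCP's internals: it canonicalizes the tool list (sorted tools, sorted object keys) so that only real schema changes alter the hash. `hashToolList`, `canonicalize`, and `ToolInfo` are hypothetical names.

```typescript
import { createHash } from 'node:crypto';

type ToolInfo = { name: string; description?: string; inputSchema: unknown };

// Serialize with sorted object keys so key order does not change the output.
function canonicalize(value: unknown): string {
  if (value === undefined) return 'null';
  if (Array.isArray(value)) return `[${value.map((v) => canonicalize(v)).join(',')}]`;
  if (value !== null && typeof value === 'object') {
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
      .map(([k, v]) => `${JSON.stringify(k)}:${canonicalize(v)}`);
    return `{${entries.join(',')}}`;
  }
  return JSON.stringify(value);
}

// Hash the tool list; tools are sorted by name so registration order is irrelevant.
function hashToolList(tools: ToolInfo[]): string {
  const sorted = [...tools].sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
  return createHash('sha256').update(canonicalize(sorted)).digest('hex');
}

// Baseline vs. the deploy that renamed `city` to `location`.
const before: ToolInfo[] = [
  { name: 'get_weather', inputSchema: { type: 'object', properties: { city: { type: 'string' } }, required: ['city'] } },
];
const after: ToolInfo[] = [
  { name: 'get_weather', inputSchema: { type: 'object', properties: { location: { type: 'string' } }, required: ['location'] } },
];

const drifted = hashToolList(before) !== hashToolList(after);
```

`drifted` is true for the rename, while reordering keys inside a schema leaves the hash unchanged. Sorting before hashing is what keeps harmless serialization differences from raising false alerts.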
The four-tool testing lifecycle closes the gap: unit tests verify handler logic, integration tests verify the full stack, Inspector verifies schema and protocol correctness during development, and AliveMCP monitors the deployed server continuously by probing the real initialize handshake, measuring latency, detecting schema drift, and alerting before your users notice.
Putting It All Together: the Complete Testing Sequence
A complete testing strategy for an MCP server from development to production monitoring has eight steps:
- Structure your server with dependency injection. createServer(deps: ServerDeps) receives the database, HTTP clients, and any external dependencies. No direct imports at the module level in the server file. This is the precondition for all testability: unit tests, integration tests, and schema snapshot tests all depend on being able to construct the server with controlled dependencies.
- Install Vitest + coverage. npm install --save-dev vitest @vitest/coverage-v8. Configure vitest.config.ts with coverage.include: ['src/**/*.ts'] and thresholds. Add test, test:watch, and test:coverage scripts to package.json.
- Write unit tests using InMemoryTransport. One test file per tool group. Fresh createLinkedPair() in beforeEach. Test every branch: happy path, missing argument, not-found result, upstream failure returning isError: true. Close the client in afterEach.
- Add the schema snapshot test. client.listTools() + toMatchSnapshot(). Commit the initial snapshot. Any unintentional schema change (a renamed tool, a dropped argument, a changed type) now fails CI before it reaches production.
- Set up the CI pipeline. Run vitest run --coverage on every push. Upload coverage/lcov.info as an artifact. Fail the pipeline on coverage threshold violations. The pipeline should fail the build before a bad deploy, not discover failures after.
- Use Inspector during development, not CI. Run npx @modelcontextprotocol/inspector tsx src/index.ts whenever you add a new tool or change a schema. Verify the tool list looks right, call the tool with both valid and invalid arguments, check the protocol log for correct response shapes. This is a manual step, not an automated one; budget five minutes per tool addition.
- Write integration tests before each deploy. Integration tests start a real HTTP server, run real database migrations against a test database, and call tools via the real transport. They are slower than unit tests and typically run in a separate CI job. The goal is to verify that the startup sequence works: migrations complete, connections open, the server accepts real protocol traffic. See also the deployment guide for post-deploy smoke tests.
- Register with AliveMCP after the first public deploy. Once your server is reachable at a public URL, AliveMCP starts probing it automatically: a real initialize request every 60 seconds, latency tracking, schema drift detection, and alerts when the server goes down for more than 15 minutes. This covers the failure modes that your entire testing pipeline cannot: deployment failures, production database issues, network-level problems, and any regression that passes tests but breaks the live server.
Related Guides
- MCP server unit testing — InMemoryTransport patterns, test lifecycle, asserting on content responses
- MCP server Vitest — full vitest.config.ts, vi.mock() patterns, CI pipeline
- MCP server mocking — dependency injection, vi.mock(), msw, in-memory SQLite, ioredis-mock
- MCP server test coverage — @vitest/coverage-v8, coverage.include, branch coverage by file type
- MCP Inspector — connecting to stdio and HTTP servers, schema verification, protocol log
- MCP server integration testing — full-stack tests with real database and HTTP transport
- MCP server error handling — isError vs. thrown exceptions, why the distinction matters for LLM clients
- MCP server dependency injection — server factory pattern with typed deps interface
- MCP server graceful shutdown — shutdown lifecycle that test teardown should mirror
- MCP server SQLite — better-sqlite3 with in-memory database path for test isolation
- MCP Server Deployment Guide — post-deploy smoke tests and zero-downtime deployment
- MCP Server Data Persistence Guide — SQLite, Prisma, Redis, migrations — the persistence layer that tests need to mock
- AliveMCP — external uptime monitoring for MCP servers, covering the production gap that tests cannot reach