
MCP server integration testing

Unit tests for MCP servers call your tool handlers directly, with the SDK mocked out, and assert on what your code does with the arguments. Integration tests go further: they wire a real McpServer to a real Client, call tools over a real transport, and assert on the JSON-RPC response. The MCP SDK provides InMemoryTransport for exactly this: two linked transport objects that route messages in-process, with no network, no port binding, and no server process to start. Integration tests catch issues that unit tests can't, such as protocol negotiation bugs, tool registration errors, schema drift, and middleware behaviour.

TL;DR

Use InMemoryTransport.createLinkedPair() to connect an McpServer to a Client in-process. Call client.callTool() and assert on result.content[0].text. For application errors, assert result.isError === true; argument validation errors reject the callTool() promise instead. Add a schema snapshot test that computes a SHA-256 hash of the tools/list output and compares it to a committed baseline, so any tool added, removed, renamed, or changed fails CI until you update the baseline intentionally. After deploy, run the same initialize + tools/list probe that AliveMCP runs to confirm the production server matches the CI snapshot.

Setting up an in-process test client

The @modelcontextprotocol/sdk package ships InMemoryTransport for testing. It creates two linked transport instances — one for the server, one for the client — that pass JSON-RPC messages through an in-memory queue rather than a network socket:

// test/helpers/test-server.ts
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { InMemoryTransport } from '@modelcontextprotocol/sdk/inMemory.js';
import type { Deps } from '../../deps.js';
import { registerAllTools } from '../../tools/index.js';

export interface TestHandle {
  client: Client;
  cleanup: () => Promise<void>;
}

export async function createTestServer(deps: Deps): Promise<TestHandle> {
  const server = new McpServer({ name: 'test-server', version: '0.0.0' });
  registerAllTools(server, deps);

  const [clientTransport, serverTransport] = InMemoryTransport.createLinkedPair();

  await server.connect(serverTransport);

  const client = new Client(
    { name: 'test-client', version: '0.0.0' },
    { capabilities: {} }
  );
  await client.connect(clientTransport);

  return {
    client,
    cleanup: async () => {
      // Close both ends so no handlers outlive the test
      await client.close();
      await server.close();
    },
  };
}

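Creating the linked pair is synchronous and exchanges no messages. server.connect() starts the server listening on its end; client.connect() then performs the MCP handshake: it sends the initialize request, waits for the server's response, and sends the notifications/initialized notification. After createTestServer returns, the client has a negotiated session and is ready to call tools.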
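To see what the handshake actually establishes, here is a self-contained sketch that builds a throwaway server, connects a client over a linked pair, and reports what the client knows before and after connect(). The server name, version, and the ping tool are illustrative placeholders. It assumes the 1.x SDK's McpServer.tool(name, callback) overload and the Client's getServerVersion() and getServerCapabilities() accessors.

```typescript
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { InMemoryTransport } from '@modelcontextprotocol/sdk/inMemory.js';

// Builds a throwaway server/client pair and reports what the client learned
// from the initialize handshake. 'handshake-demo' and 'ping' are placeholders.
export async function inspectHandshake() {
  const server = new McpServer({ name: 'handshake-demo', version: '1.2.3' });
  // Registering a tool before connect() makes the server advertise the tools capability
  server.tool('ping', async () => ({ content: [{ type: 'text', text: 'pong' }] }));

  // Creating the pair exchanges no messages
  const [clientTransport, serverTransport] = InMemoryTransport.createLinkedPair();
  await server.connect(serverTransport); // server starts listening on its end

  const client = new Client({ name: 'handshake-client', version: '0.0.0' }, { capabilities: {} });
  const serverInfoBeforeConnect = client.getServerVersion(); // undefined: no handshake yet
  await client.connect(clientTransport); // initialize request/response, then initialized notification

  try {
    const { tools } = await client.listTools();
    return {
      serverInfoBeforeConnect,
      serverInfo: client.getServerVersion(),
      advertisesTools: client.getServerCapabilities()?.tools !== undefined,
      toolNames: tools.map((t) => t.name),
    };
  } finally {
    await client.close();
    await server.close();
  }
}
```

A test can make the same checks against createTestServer to catch registration mistakes early: a missing tools capability means no tool was registered before connect().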

Writing tool-call assertions

Tool calls return a CallToolResult object. The main fields are content (an array of content blocks) and isError (true when the tool returned an application error, as opposed to a protocol error). Most tools return a single text block:

// test/search.test.ts
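import { createTestServer, type TestHandle } from './helpers/test-server.js';
import { createTestDeps } from './helpers/test-deps.js';
import { seedTestData } from './helpers/seed.js'; // hypothetical helper path
// beforeEach, afterEach, test, and expect are Jest/Vitest globals

let handle: TestHandle;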

beforeEach(async () => {
  const deps = createTestDeps();
  await seedTestData(deps.db); // insert rows your tool will query
  handle = await createTestServer(deps);
});

afterEach(async () => { await handle.cleanup(); });

test('search_records returns rows matching query', async () => {
  const result = await handle.client.callTool({
    name: 'search_records',
    arguments: { query: 'typescript', limit: 5 },
  });

  expect(result.isError).toBeFalsy();
  expect(result.content).toHaveLength(1);
  expect(result.content[0].type).toBe('text');

  const rows = JSON.parse((result.content[0] as { type: 'text'; text: string }).text);
  expect(rows.length).toBeGreaterThan(0);
  expect(rows.every((r: any) => r.id && r.title)).toBe(true);
});

test('search_records returns isError when query is too short', async () => {
  const result = await handle.client.callTool({
    name: 'search_records',
    arguments: { query: 'a' }, // passes the input schema; the handler's own length check returns isError
  });

  expect(result.isError).toBe(true);
  expect((result.content[0] as any).text).toContain('at least');
});
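The cast in the first test, `(result.content[0] as { type: 'text'; text: string }).text`, gets repetitive across a suite. A small helper can centralize it and fail with a readable message when the first block is missing or not text. This is a minimal sketch; `firstText` and `ToolResultLike` are hypothetical names, and the interface models only the CallToolResult fields the helper reads.

```typescript
// Minimal structural view of CallToolResult: only the fields inspected here.
interface ToolResultLike {
  content?: unknown;
  isError?: boolean;
}

// Returns the text of the first content block. Throws a descriptive error
// (which fails the test) if there is no block or it is not a text block.
export function firstText(result: ToolResultLike): string {
  const blocks: unknown[] = Array.isArray(result.content) ? result.content : [];
  const first = blocks[0] as { type?: unknown; text?: unknown } | undefined;
  if (!first || first.type !== 'text' || typeof first.text !== 'string') {
    throw new Error(`Expected a text block first, got ${JSON.stringify(first)}`);
  }
  return first.text;
}
```

With this in place, the parsing line becomes `const rows = JSON.parse(firstText(result));`.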

Notice the distinction: argument schema validation errors (wrong type, missing required field) throw a JSON-RPC McpError and surface as a rejected promise from client.callTool(). Application errors that the tool catches and returns as { isError: true } resolve the promise normally, with result.isError === true. Test both paths, and confirm the split against the SDK version you pin, since it depends on how McpServer reports validation failures.
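Testing both paths usually means mixing `expect(...).rejects` with `result.isError` checks. One option is to collapse the two channels into a single value so each test asserts on which path a call took. This is a sketch under the assumption that a structural interface is enough; `classifyToolCall` and `ToolCaller` are hypothetical names. The SDK's Client satisfies `ToolCaller` at runtime, though a type cast may be needed depending on your SDK version's typings.

```typescript
// Minimal structural views of the SDK types involved.
interface ToolResultLike {
  content?: unknown;
  isError?: boolean;
}

interface ToolCaller {
  callTool(params: { name: string; arguments?: Record<string, unknown> }): Promise<ToolResultLike>;
}

export type ToolOutcome =
  | { kind: 'ok'; result: ToolResultLike }
  | { kind: 'tool-error'; result: ToolResultLike } // promise resolved with isError: true
  | { kind: 'protocol-error'; error: unknown }; // promise rejected, e.g. with an McpError

// Runs one tool call and reports which of the three outcomes occurred.
export async function classifyToolCall(
  caller: ToolCaller,
  params: { name: string; arguments?: Record<string, unknown> },
): Promise<ToolOutcome> {
  try {
    const result = await caller.callTool(params);
    return result.isError === true ? { kind: 'tool-error', result } : { kind: 'ok', result };
  } catch (error) {
    return { kind: 'protocol-error', error };
  }
}
```

For example, assuming `query` is declared as a string in the tool's input schema, `expect((await classifyToolCall(handle.client, { name: 'search_records', arguments: { query: 42 } })).kind).toBe('protocol-error')` covers the rejected-promise path. If an SDK upgrade starts reporting schema failures as tool errors, this is the assertion that will flag it.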

Testing tools/list and schema snapshots

The tools/list result defines your MCP server's public contract. Any unintentional change to the tool list — a renamed tool, a dropped argument, a changed description — breaks clients silently. A schema snapshot test catches these regressions at CI time:

// test/schema-snapshot.test.ts
import { createHash } from 'node:crypto';
import { existsSync, readFileSync, writeFileSync } from 'node:fs';
import { createTestServer } from './helpers/test-server.js';
import { createTestDeps } from './helpers/test-deps.js';

const BASELINE_PATH = 'test/schema-baseline.json';
// Set UPDATE_SCHEMA_BASELINE=1 to accept an intentional schema change
const UPDATE = process.env.UPDATE_SCHEMA_BASELINE === '1';

test('tool schema matches committed baseline', async () => {
  const deps = createTestDeps();
  const { client, cleanup } = await createTestServer(deps);

  try {
    const { tools } = await client.listTools();

    // Sort by name so the output doesn't depend on registration order
    const schema = JSON.stringify(
      tools.sort((a, b) => a.name.localeCompare(b.name)),
      null,
      2
    );

    const hash = createHash('sha256').update(schema).digest('hex');

    if (UPDATE || !existsSync(BASELINE_PATH)) {
      if (!UPDATE && process.env.CI) {
        // A missing baseline in CI would otherwise pass silently
        throw new Error(`${BASELINE_PATH} is missing. Generate it locally and commit it.`);
      }
      writeFileSync(BASELINE_PATH, JSON.stringify({ hash, schema }, null, 2));
      return; // baseline (re)written locally; review and commit it
    }

    const baseline: { hash: string; schema: string } =
      JSON.parse(readFileSync(BASELINE_PATH, 'utf8'));

    if (hash !== baseline.hash) {
      throw new Error(
        `Tool schema changed. Current hash: ${hash}. Expected: ${baseline.hash}.\n` +
        `If the change is intentional, re-run with UPDATE_SCHEMA_BASELINE=1 and commit ${BASELINE_PATH}.\n` +
        `Current schema:\n${schema}`
      );
    }
  } finally {
    await cleanup();
  }
});

Commit test/schema-baseline.json to version control. Every schema change, intentional or not, then requires an explicit baseline update, which creates a mandatory code-review moment for API contract changes. It serves the same purpose as schema versioning with far less machinery: there is no version negotiation, only a hash that must be updated deliberately.
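A bare hash mismatch says that something changed, not what. Because the baseline file also stores the serialized schema, the failure branch can report which tools were added, removed, or changed. This is a minimal sketch; `diffToolLists` is a hypothetical helper, and it compares definitions by their JSON serialization, so it assumes both sides come from the same serializer (as they do here).

```typescript
// Structural view of a tools/list entry: a name plus whatever else the server sends.
interface ToolDef {
  name: string;
  [key: string]: unknown;
}

export interface ToolDiff {
  added: string[];
  removed: string[];
  changed: string[]; // same name, different definition (description, inputSchema, ...)
}

// Compares two tools/list arrays by tool name. A tool counts as "changed" when
// its serialized definition differs. Like the hash, this is key-order sensitive.
export function diffToolLists(baseline: ToolDef[], current: ToolDef[]): ToolDiff {
  const before = new Map(baseline.map((t) => [t.name, JSON.stringify(t)] as const));
  const after = new Map(current.map((t) => [t.name, JSON.stringify(t)] as const));
  const names = (m: Map<string, string>) => [...m.keys()].sort();
  return {
    added: names(after).filter((n) => !before.has(n)),
    removed: names(before).filter((n) => !after.has(n)),
    changed: names(after).filter((n) => before.has(n) && before.get(n) !== after.get(n)),
  };
}
```

Inside the `hash !== baseline.hash` branch, `diffToolLists(JSON.parse(baseline.schema), tools)` gives a short summary to put at the top of the error message, ahead of the full schema.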

Testing authentication middleware

InMemoryTransport bypasses the HTTP layer, so it can't exercise Express middleware. For auth middleware, send real HTTP requests to the app with supertest, which binds it to an ephemeral port for each request so there is no port to manage:

// test/auth-middleware.test.ts
import request from 'supertest';
import { createApp } from '../server.js';
import { createTestDeps } from './helpers/test-deps.js';

test('POST /mcp without Authorization returns 401', async () => {
  const deps = createTestDeps();
  const app = await createApp(deps);

  const res = await request(app)
    .post('/mcp')
    .send({
      jsonrpc: '2.0',
      id: 1,
      method: 'initialize',
      params: { protocolVersion: '2024-11-05', capabilities: {}, clientInfo: { name: 'test', version: '0.0.0' } },
    });

  expect(res.status).toBe(401);
});

test('POST /mcp with valid token returns 200', async () => {
  const deps = createTestDeps();
  const app = await createApp(deps);
  const token = deps.config.testApiKey;

  const res = await request(app)
    .post('/mcp')
    .set('Authorization', `Bearer ${token}`)
    // The SDK's Streamable HTTP transport answers 406 unless Accept lists both types
    .set('Accept', 'application/json, text/event-stream')
    .send({
      jsonrpc: '2.0',
      id: 1,
      method: 'initialize',
      params: { protocolVersion: '2024-11-05', capabilities: {}, clientInfo: { name: 'test', version: '0.0.0' } },
    });

  expect(res.status).toBe(200);
});

The pattern requires extracting the Express app into a createApp(deps) factory that the test can call without starting the HTTP server. This is also better architecture for the production path — main() calls createDeps(), then createApp(deps), then app.listen().
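The 401 test assumes createApp installs a bearer-token check in front of /mcp. A minimal sketch of that check, assuming a single shared API key, is below; `checkBearer` is a hypothetical name. Hashing both tokens first gives `timingSafeEqual` equal-length buffers (it throws on a length mismatch), so the comparison time doesn't reveal how much of a guessed token was right.

```typescript
import { createHash, timingSafeEqual } from 'node:crypto';

// Returns true when the Authorization header carries the expected bearer token.
// The scheme name is matched case-insensitively, as HTTP auth schemes are.
export function checkBearer(authorization: string | undefined, expectedToken: string): boolean {
  const token = /^Bearer\s+(\S+)$/i.exec(authorization ?? '')?.[1];
  if (!token) return false;
  // Fixed-length digests, so timingSafeEqual never sees buffers of different sizes
  const presented = createHash('sha256').update(token).digest();
  const expected = createHash('sha256').update(expectedToken).digest();
  return timingSafeEqual(presented, expected);
}
```

Inside createApp(deps), an Express middleware such as `(req, res, next) => checkBearer(req.headers.authorization, deps.config.apiKey) ? next() : res.status(401).end()`, mounted before the /mcp route, would produce the 401 the first test expects. Here `deps.config.apiKey` is a hypothetical config field. Keeping the check a pure function also lets you unit-test it without supertest.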

Post-deploy probe as a CI gate

After deploying to production, run the same protocol-level probe that AliveMCP runs — an initialize + tools/list over real HTTPS — and verify the tool hash matches the CI baseline. This confirms the deploy succeeded at the MCP protocol level, not just at the HTTP level:

#!/usr/bin/env node
// scripts/post-deploy-probe.mjs
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp.js';
import { createHash } from 'node:crypto';
import { readFileSync } from 'node:fs';

const MCP_URL = process.env.MCP_URL ?? 'https://api.yourdomain.com/mcp';
// MCP_TOKEN is an assumed variable name; the probe must pass the same auth check as real clients
const MCP_TOKEN = process.env.MCP_TOKEN;
const BASELINE = JSON.parse(readFileSync('test/schema-baseline.json', 'utf8'));
const MAX_ATTEMPTS = 10;
const RETRY_DELAY_MS = 12_000;

async function probeOnce() {
  const transport = new StreamableHTTPClientTransport(new URL(MCP_URL), {
    requestInit: MCP_TOKEN ? { headers: { Authorization: `Bearer ${MCP_TOKEN}` } } : undefined,
  });
  const client = new Client({ name: 'deploy-probe', version: '0.0.0' }, { capabilities: {} });
  try {
    await client.connect(transport); // initialize handshake over HTTPS
    const { tools } = await client.listTools();
    // Must match the snapshot test's sort + stringify exactly
    const schema = JSON.stringify(tools.sort((a, b) => a.name.localeCompare(b.name)), null, 2);
    const hash = createHash('sha256').update(schema).digest('hex');
    if (hash !== BASELINE.hash) {
      throw new Error(`Production schema hash ${hash} does not match baseline ${BASELINE.hash}`);
    }
  } finally {
    await client.close().catch(() => {}); // best effort; don't mask the real error
  }
}

for (let attempt = 1; ; attempt++) {
  try {
    await probeOnce();
    console.log('Post-deploy probe passed. Schema matches baseline.');
    break;
  } catch (err) {
    if (attempt >= MAX_ATTEMPTS) {
      console.error(`Post-deploy probe failed after ${MAX_ATTEMPTS} attempts: ${err.message}`);
      process.exit(1); // non-zero exit fails the CI step
    }
    console.warn(`Attempt ${attempt} failed: ${err.message}. Retrying in ${RETRY_DELAY_MS / 1000}s...`);
    await new Promise((r) => setTimeout(r, RETRY_DELAY_MS));
  }
}

Run this in CI after the deploy step and before marking the deployment successful, from the repository root so the relative baseline path resolves. It retries for roughly two minutes (10 attempts, 12 s apart) while the new release comes up, then confirms the live schema matches the baseline; if every attempt fails, the script exits non-zero and the pipeline fails. After that, AliveMCP provides continuous protocol-level monitoring: the probe verifies the deploy, and AliveMCP watches ongoing runtime health.
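The probe only works if it serializes tools/list exactly as the snapshot test does; if the two copies of the sort-and-hash logic drift, every deploy fails. One way to prevent that is a single shared helper, sketched below under the assumption that both scripts can import it. `toolSchemaFingerprint` is a hypothetical name. If the probe stays a plain .mjs script, it needs either a compiled copy of this module or the same few lines inlined.

```typescript
import { createHash } from 'node:crypto';

// Canonical serialization shared by the snapshot test and the post-deploy probe.
// Sorting a copy keeps the result independent of registration order without
// mutating the caller's array.
export function toolSchemaFingerprint<T extends { name: string }>(
  tools: readonly T[],
): { schema: string; hash: string } {
  const sorted = [...tools].sort((a, b) => a.name.localeCompare(b.name));
  const schema = JSON.stringify(sorted, null, 2);
  const hash = createHash('sha256').update(schema).digest('hex');
  return { schema, hash };
}
```

Both call sites then reduce to `const { schema, hash } = toolSchemaFingerprint(tools);`, and the baseline format stays unchanged.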

Related questions

Should I mock the database in integration tests?

For unit tests that exercise argument parsing and business logic without touching the DB, yes. For integration tests that exercise the full tool call including data retrieval, no. Use a real database instead: in-memory SQLite (via better-sqlite3) if production also runs SQLite, or a test-isolated PostgreSQL schema if it runs Postgres. Mocked databases are a common source of production incidents: the mock passes while the real query fails because of a type mismatch, a missing index, or an incorrect JOIN. See the dependency injection guide for how createTestDeps() wires an in-memory DB without touching the real schema.
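For the PostgreSQL option, each test run can create its own schema, run migrations into it, and drop it afterwards. The helper below only generates the name and SQL; executing it is left to whatever client your Deps already use. It is a sketch, and `isolatedSchema` is a hypothetical name. The prefix is validated because identifiers can't be passed as bind parameters, and it is capped so the full name fits PostgreSQL's 63-byte identifier limit.

```typescript
import { randomBytes } from 'node:crypto';

export interface IsolatedSchema {
  name: string;
  setup: string[]; // run before migrations, on the connection the tests will use
  teardown: string[]; // run once after the suite
}

// Generates a unique, safe-to-interpolate schema name plus the SQL to create,
// select, and drop it.
export function isolatedSchema(prefix = 'test'): IsolatedSchema {
  // Lowercase identifier, at most 54 chars so prefix + '_' + 8 hex chars <= 63
  if (!/^[a-z_][a-z0-9_]{0,53}$/.test(prefix)) {
    throw new Error(`Unsafe schema prefix: ${prefix}`);
  }
  const name = `${prefix}_${randomBytes(4).toString('hex')}`;
  return {
    name,
    setup: [`CREATE SCHEMA ${name}`, `SET search_path TO ${name}`],
    teardown: [`DROP SCHEMA ${name} CASCADE`],
  };
}
```

Note that `SET search_path` applies per session. With a connection pool, every connection the tests use needs the same setting, or migrations and queries will silently land in the default schema.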

How do I test tools that call external APIs?

Inject the HTTP client as a dep. In tests, replace it with a stub that returns canned responses. If the tool uses the fetch global, use jest.spyOn(global, 'fetch') or switch to a client interface in your Deps type. The key is that the external API call is behind a seam — an interface the test can replace — rather than being a direct fetch(url) call inside the tool handler.
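Here is what such a seam can look like, as a sketch: an `HttpClient` interface in Deps, a fetch-backed implementation for production, and a stub for tests. Every name is hypothetical, including the weather API URL and its response shape, which stand in for whatever external service your tool calls.

```typescript
// Hypothetical seam: tool code depends on this interface, never on fetch directly.
export interface HttpClient {
  getJson(url: string): Promise<unknown>;
}

// Production implementation over the global fetch (Node 18+).
export const fetchHttpClient: HttpClient = {
  async getJson(url) {
    const res = await fetch(url);
    if (!res.ok) throw new Error(`GET ${url} failed with HTTP ${res.status}`);
    return res.json();
  },
};

// Test stub: serves canned responses by URL and records every request.
export function stubHttpClient(responses: Record<string, unknown>): HttpClient & { calls: string[] } {
  const calls: string[] = [];
  return {
    calls,
    async getJson(url) {
      calls.push(url);
      if (!(url in responses)) throw new Error(`Unexpected request: ${url}`);
      return responses[url];
    },
  };
}

// Hypothetical tool handler; URL and response shape are placeholders.
export async function getForecastHandler(deps: { http: HttpClient }, args: { city: string }) {
  const url = `https://weather.example.com/v1/forecast?city=${encodeURIComponent(args.city)}`;
  try {
    const data = (await deps.http.getJson(url)) as { summary?: string };
    return { content: [{ type: 'text' as const, text: data.summary ?? 'No forecast available' }] };
  } catch (err) {
    // Upstream failures become application errors (isError), not protocol errors
    return {
      content: [{ type: 'text' as const, text: err instanceof Error ? err.message : String(err) }],
      isError: true,
    };
  }
}
```

Because the stub throws on any URL it wasn't given, a test also catches the tool requesting the wrong endpoint, and `calls` lets you assert on exactly which requests were made.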

Do I need to test tools/list if I have a schema snapshot?

The schema snapshot tests that the tool list doesn't change unexpectedly. You still want tests that call individual tools and assert on their output — the snapshot only catches structural contract violations, not behavioral regressions. Both tests complement each other: the snapshot is a guard at the contract level, the per-tool tests guard the behavior.

Can AliveMCP replace my post-deploy probe?

Not entirely. AliveMCP checks every 60 seconds, so it will typically flag a broken deploy within about a minute, but it alerts rather than gates: it cannot stop the pipeline from marking the deploy successful. The post-deploy CI probe is that gate. Use both: the probe for deploy-time confidence, AliveMCP for runtime uptime and regression detection.

Further reading