MCP server integration testing

Unit tests for MCP servers mock the SDK's tool handler and assert on what your code does with the arguments. Integration tests go further: they wire a real McpServer to a real Client, call tools over a real transport, and assert on the JSON-RPC response. The MCP SDK provides InMemoryTransport for exactly this purpose. It is a pair of linked transport objects that route messages in-process, with no network, no port binding, and no child process to spawn. Integration tests catch issues that unit tests can't: protocol negotiation bugs, tool registration errors, and schema drift. Middleware behaviour sits in the HTTP layer and needs an HTTP-level test, covered in the authentication section of this guide.

TL;DR

Use InMemoryTransport.createLinkedPair() to connect an McpServer to a Client in-process. Call client.callTool() and assert on result.content[0].text. For error paths, assert result.isError === true. Add a schema snapshot test that computes a SHA-256 hash of tools/list output and compares it to a committed baseline — any tool added, removed, or renamed fails CI until you update the baseline intentionally. After deploy, run the same initialize + tools/list probe that AliveMCP runs to confirm the production server matches the CI snapshot.

Setting up an in-process test client

The @modelcontextprotocol/sdk package ships InMemoryTransport for testing. It creates two linked transport instances — one for the server, one for the client — that pass JSON-RPC messages through an in-memory queue rather than a network socket:

// test/helpers/test-server.ts
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { InMemoryTransport } from '@modelcontextprotocol/sdk/inMemory.js';
import type { Deps } from '../../deps.js';
import { registerAllTools } from '../../tools/index.js';

export interface TestHandle {
  client: Client;
  cleanup: () => Promise<void>;
}

export async function createTestServer(deps: Deps): Promise<TestHandle> {
  const server = new McpServer({ name: 'test-server', version: '0.0.0' });
  registerAllTools(server, deps);

  const [clientTransport, serverTransport] = InMemoryTransport.createLinkedPair();

  await server.connect(serverTransport);

  const client = new Client(
    { name: 'test-client', version: '0.0.0' },
    { capabilities: {} }
  );
  await client.connect(clientTransport);

  return {
    client,
    cleanup: async () => {
      // Close both ends so nothing outlives the test
      await client.close();
      await server.close();
    },
  };
}

The linked pair is created synchronously. Awaiting server.connect() attaches the server to its transport, and awaiting client.connect() performs the MCP handshake (the initialize request/response pair). After createTestServer returns, the client has a negotiated session and is ready to call tools.

Writing tool-call assertions

Tool calls return a CallToolResult object. The main fields are content (an array of content blocks) and isError (true when the tool returned an application error, as opposed to a protocol error). Most tools return a single text block:

// test/search.test.ts
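import { createTestServer, type TestHandle } from './helpers/test-server.js';
// seedTestData is assumed to live beside createTestDeps; adjust the path to your project
import { createTestDeps, seedTestData } from './helpers/test-deps.js';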

let handle: TestHandle;

beforeEach(async () => {
  const deps = createTestDeps();
  await seedTestData(deps.db); // insert rows your tool will query
  handle = await createTestServer(deps);
});

afterEach(async () => { await handle.cleanup(); });

test('search_records returns rows matching query', async () => {
  const result = await handle.client.callTool({
    name: 'search_records',
    arguments: { query: 'typescript', limit: 5 },
  });

  expect(result.isError).toBeFalsy();
  expect(result.content).toHaveLength(1);
  expect(result.content[0].type).toBe('text');

  const rows = JSON.parse((result.content[0] as { type: 'text'; text: string }).text);
  expect(rows.length).toBeGreaterThan(0);
  expect(rows.every((r: any) => r.id && r.title)).toBe(true);
});

test('search_records returns isError when query is too short', async () => {
  const result = await handle.client.callTool({
    name: 'search_records',
    arguments: { query: 'a' }, // too short; rejected by the handler's own check, not by the input schema
  });

  expect(result.isError).toBe(true);
  expect((result.content[0] as any).text).toContain('at least');
});

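Notice the distinction. Application errors that the tool catches and returns as { isError: true } resolve the promise normally with result.isError === true, which is what the second test relies on: the length check lives in the handler. Argument schema validation errors (wrong type, missing required field) are raised by the SDK before your handler runs. In SDK versions that report them as a JSON-RPC McpError, they surface as a rejected promise from client.callTool(); other versions may return them as an isError result instead, so check the behaviour of the version you pin. Test both paths.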
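One way to keep both paths visible in assertions is a small helper that reports which outcome occurred. The following is a minimal sketch, not part of the MCP SDK: classifyToolCall, textOf, and the ToolResultLike type are hypothetical names, and the structural types are deliberately loose so the sketch runs without the SDK installed.

```typescript
// Hypothetical helper types: only the fields this sketch reads.
interface ContentBlock { type: string; text?: string }
interface ToolResultLike { content: ContentBlock[]; isError?: boolean }

type Outcome =
  | { kind: 'ok'; text: string }
  | { kind: 'tool_error'; text: string }
  | { kind: 'protocol_error'; message: string };

// Join the text of every text block; non-text blocks are skipped.
function textOf(result: ToolResultLike): string {
  return result.content
    .filter((block) => block.type === 'text' && typeof block.text === 'string')
    .map((block) => block.text as string)
    .join('\n');
}

// Run a tool call and report which of the three outcomes occurred.
async function classifyToolCall(call: () => Promise<ToolResultLike>): Promise<Outcome> {
  try {
    const result = await call();
    return result.isError === true
      ? { kind: 'tool_error', text: textOf(result) }
      : { kind: 'ok', text: textOf(result) };
  } catch (err) {
    // A rejected promise means the request failed before a tool result existed.
    return { kind: 'protocol_error', message: err instanceof Error ? err.message : String(err) };
  }
}
```

In a test this would wrap the real call, for example classifyToolCall(() => handle.client.callTool({ ... })), possibly with a cast because the SDK's result type is broader than ToolResultLike. Asserting on outcome.kind then makes it explicit whether a failure came from the handler or from the protocol layer.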

Testing tools/list and schema snapshots

The tools/list result defines your MCP server's public contract. Any unintentional change to the tool list — a renamed tool, a dropped argument, a changed description — breaks clients silently. A schema snapshot test catches these regressions at CI time:

// test/schema-snapshot.test.ts
import { createHash } from 'node:crypto';
import { readFileSync, writeFileSync } from 'node:fs';
import { createTestServer } from './helpers/test-server.js';
import { createTestDeps } from './helpers/test-deps.js';

const BASELINE_PATH = 'test/schema-baseline.json';

test('tool schema matches committed baseline', async () => {
  const deps = createTestDeps();
  const { client, cleanup } = await createTestServer(deps);

  try {
    const { tools } = await client.listTools();

    // Sort for deterministic output
    const schema = JSON.stringify(
      tools.sort((a, b) => a.name.localeCompare(b.name)),
      null,
      2
    );

    const hash = createHash('sha256').update(schema).digest('hex');

    let baseline: { hash: string; schema: string };
    try {
      baseline = JSON.parse(readFileSync(BASELINE_PATH, 'utf8'));
    } catch {
      // No baseline yet — write it on first run
      writeFileSync(BASELINE_PATH, JSON.stringify({ hash, schema }, null, 2));
      return; // first run always passes
    }

    if (hash !== baseline.hash) {
      // Print the current schema so the change can be reviewed in the test output
      throw new Error(
        `Tool schema changed. Current hash: ${hash}. Expected: ${baseline.hash}.\n` +
        `To accept the change, delete ${BASELINE_PATH} and re-run this test to regenerate it.\n` +
        `Current schema:\n${schema}`
      );
    }
  } finally {
    await cleanup();
  }
});

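Commit test/schema-baseline.json to version control. The test writes a missing baseline and passes, so if the file is not committed, CI regenerates it on every run and the check never fails. Once it is committed, every schema change, intentional or not, requires an explicit baseline update, which creates a mandatory code-review moment for API contract changes. This gives you the main benefit of schema versioning without building a full version negotiation layer.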
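The hash in the test above depends on the key order the SDK happens to emit, and a bare hash mismatch does not say which tool changed. The following is a minimal sketch of a stricter variant, assuming nothing beyond Node's crypto module. The names canonicalize, hashTools, and diffToolNames are hypothetical, and the resulting hash is not interchangeable with one produced by the test above.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical shape: only the field this sketch relies on.
interface ToolLike { name: string; [key: string]: unknown }

// Recursively sort object keys so that key order cannot change the hash.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === 'object') {
    const record = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(record).sort()) out[key] = canonicalize(record[key]);
    return out;
  }
  return value;
}

// Hash a tool list independently of tool order and key order.
function hashTools(tools: ToolLike[]): string {
  // Plain code-unit comparison avoids locale-dependent ordering.
  const sorted = [...tools].sort((x, y) => (x.name < y.name ? -1 : x.name > y.name ? 1 : 0));
  return createHash('sha256').update(JSON.stringify(canonicalize(sorted))).digest('hex');
}

// Report which tool names appeared or disappeared relative to the baseline.
function diffToolNames(baseline: ToolLike[], current: ToolLike[]): { added: string[]; removed: string[] } {
  const before = new Set(baseline.map((tool) => tool.name));
  const after = new Set(current.map((tool) => tool.name));
  return {
    added: Array.from(after).filter((name) => !before.has(name)).sort(),
    removed: Array.from(before).filter((name) => !after.has(name)).sort(),
  };
}
```

Including the output of diffToolNames in the failure message would point a reviewer at added or removed tools directly. Changes inside a tool (description, input schema) still show up only as a hash mismatch.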

Testing authentication middleware

InMemoryTransport bypasses the HTTP layer, so it can't test Express middleware directly. For auth middleware testing, drive the Express app over HTTP with supertest, which binds the app to an ephemeral port for the duration of each request:

// test/auth-middleware.test.ts
import request from 'supertest';
import { createApp } from '../server.js'; // one level up from test/, matching the helper imports
import { createTestDeps } from './helpers/test-deps.js';

test('POST /mcp without Authorization returns 401', async () => {
  const deps = createTestDeps();
  const app = await createApp(deps);

  const res = await request(app)
    .post('/mcp')
    .send({
      jsonrpc: '2.0',
      id: 1,
      method: 'initialize',
      params: { protocolVersion: '2024-11-05', capabilities: {}, clientInfo: { name: 'test', version: '0.0.0' } },
    });

  expect(res.status).toBe(401);
});

test('POST /mcp with valid token returns 200', async () => {
  const deps = createTestDeps();
  const app = await createApp(deps);
  const token = deps.config.testApiKey;

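  const res = await request(app)
    .post('/mcp')
    .set('Authorization', `Bearer ${token}`)
    // If createApp mounts the SDK's Streamable HTTP transport, it expects both media types in Accept
    .set('Accept', 'application/json, text/event-stream')
    .send({
      jsonrpc: '2.0',
      id: 1,
      method: 'initialize',
      params: { protocolVersion: '2024-11-05', capabilities: {}, clientInfo: { name: 'test', version: '0.0.0' } },
    });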

  expect(res.status).toBe(200);
});

The pattern requires extracting the Express app into a createApp(deps) factory that the test can call without main() binding a fixed port through app.listen(). This is also better architecture for the production path: main() calls createDeps(), then createApp(deps), then app.listen().
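The factory split can be sketched as follows. To stay dependency-free, this sketch swaps Express for Node's built-in http module, so it illustrates the shape of the pattern and is not the guide's actual server. The Deps fields, the API_KEY and PORT environment variables, and the placeholder 200 response are all assumptions.

```typescript
import { createServer } from 'node:http';
import type { RequestListener } from 'node:http';

// Hypothetical dependency bag; the guide's real Deps type is not shown.
interface Deps { apiKey: string }

// Build the request handler without binding a port, so tests can drive it directly.
function createApp(deps: Deps): RequestListener {
  return (req, res) => {
    // Simplified check; a production server would likely use a constant-time comparison.
    if (req.headers.authorization !== `Bearer ${deps.apiKey}`) {
      res.statusCode = 401;
      res.end('Unauthorized');
      return;
    }
    // Placeholder: a real server would hand the request to its MCP transport here.
    res.statusCode = 200;
    res.end('ok');
  };
}

// Production entry point: build deps, build the app, then listen.
function main(): void {
  const apiKey = process.env.API_KEY;
  if (!apiKey) throw new Error('API_KEY must be set');
  createServer(createApp({ apiKey })).listen(Number(process.env.PORT ?? 3000));
}
```

Because only main() binds a port, a test can call createApp with test dependencies and never touch the production listener. With Express the shape is the same: createApp returns the app, and main() is the only caller of app.listen().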

Post-deploy probe as a CI gate

After deploying to production, run the same protocol-level probe that AliveMCP runs — an initialize + tools/list over real HTTPS — and verify the tool hash matches the CI baseline. This confirms the deploy succeeded at the MCP protocol level, not just at the HTTP level:

#!/usr/bin/env node
// scripts/post-deploy-probe.mjs
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp.js';
import { createHash } from 'node:crypto';
import { readFileSync } from 'node:fs';

const MCP_URL = process.env.MCP_URL ?? 'https://api.yourdomain.com/mcp';
const BASELINE = JSON.parse(readFileSync('test/schema-baseline.json', 'utf8'));

async function probe(attempt = 1) {
  if (attempt > 10) throw new Error('Post-deploy probe failed after 10 attempts');

  const transport = new StreamableHTTPClientTransport(new URL(MCP_URL));
  const client = new Client({ name: 'deploy-probe', version: '0.0.0' }, { capabilities: {} });

  try {
    await client.connect(transport);

    const { tools } = await client.listTools();
    const schema = JSON.stringify(tools.sort((a, b) => a.name.localeCompare(b.name)), null, 2);
    const hash = createHash('sha256').update(schema).digest('hex');

    if (hash !== BASELINE.hash) {
      throw new Error(`Production schema hash ${hash} does not match baseline ${BASELINE.hash}`);
    }

    console.log('Post-deploy probe passed. Schema matches baseline.');
    return;
  } catch (err) {
    // Connection failures and hash mismatches are both retried
    console.warn(`Attempt ${attempt} failed: ${err.message}. Retrying in 12s...`);
  } finally {
    // Release the session whether the attempt passed or failed
    await client.close().catch(() => {});
  }

  await new Promise(r => setTimeout(r, 12_000));
  return probe(attempt + 1);
}

// Exit non-zero so the CI step fails once every attempt has failed
probe().catch((err) => {
  console.error(err.message);
  process.exit(1);
});

Run this in CI after the deploy step and before marking the deployment successful. It waits up to two minutes (10 × 12s) for the production server to come up and serve the expected schema; a hash mismatch is retried in the same way as a connection failure. The probe covers deploy-time verification only. AliveMCP provides continuous protocol-level monitoring after it completes, for ongoing runtime health.
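The retry budget can also be expressed as a loop with an injectable sleep, which makes the timing testable without waiting. This is a minimal sketch under that assumption. The names retry and RetryOptions are hypothetical. Unlike the script above, it pauses only between attempts, so 10 attempts at 12s wait at most 9 × 12s = 108s.

```typescript
interface RetryOptions {
  attempts: number;                       // total tries, including the first
  delayMs: number;                        // pause between tries
  sleep?: (ms: number) => Promise<void>;  // injectable so tests need not wait
}

const realSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Run task until it resolves or the attempt budget is spent.
async function retry<T>(task: (attempt: number) => Promise<T>, options: RetryOptions): Promise<T> {
  const sleep = options.sleep ?? realSleep;
  let lastError: unknown;
  for (let attempt = 1; attempt <= options.attempts; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      // Pause only when another attempt remains.
      if (attempt < options.attempts) await sleep(options.delayMs);
    }
  }
  throw new Error(`Failed after ${options.attempts} attempts: ${String(lastError)}`);
}
```

A probe built on this would pass its connect-and-compare step as task and call retry(task, { attempts: 10, delayMs: 12_000 }), keeping the schema check free of retry logic.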