MCP server testing

MCP server testing has three layers that don't exist for REST APIs: protocol compliance (does your initialize response include the required capabilities fields?), schema stability (does your tools/list hash match what you deployed yesterday?), and session correctness (does the full initialize → tools/list → tools/call → result sequence complete without error?). Unit tests on your tool logic alone miss all three.

TL;DR

Run three types of tests: (1) a protocol compliance check that verifies the shape of your initialize response against the MCP spec; (2) a schema snapshot test that fails if tools/list changes unexpectedly; (3) a session integration test that runs the full initialize → tool call → result sequence against a locally running server. Wire all three into CI so schema drift and protocol regressions fail the build before reaching production. Use AliveMCP as the production layer that runs the same probe sequence every 60 seconds after deployment.

Layer 1: Protocol compliance testing

The MCP specification defines the required fields in each message type. A server that omits a required field may work with forgiving clients but break against strict ones. The initialize response must include protocolVersion, a capabilities object, and serverInfo with both name and version.

A compliance test in Node.js using the built-in fetch:

import { describe, it, before, after } from 'node:test';
import assert from 'node:assert';
import { startServer, stopServer } from './server.js';

describe('MCP protocol compliance', () => {
  let server;
  before(async () => { server = await startServer(3001); });
  after(async () => { await stopServer(server); });

  it('initialize returns required fields', async () => {
    const res = await fetch('http://localhost:3001/mcp', {
      method: 'POST',
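      headers: {
        'Content-Type': 'application/json',
        // Streamable HTTP clients are expected to accept both content types
        Accept: 'application/json, text/event-stream'
      },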
      body: JSON.stringify({
        jsonrpc: '2.0', id: 1, method: 'initialize',
        params: {
          protocolVersion: '2024-11-05', capabilities: {},
          clientInfo: { name: 'test', version: '1' }
        }
      })
    });
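    assert.ok(res.ok, `initialize returned HTTP ${res.status}`);
    // Assumes a plain JSON body; a server that streams the reply as SSE
    // needs the event data parsed instead of res.json()
    const { result } = await res.json();
    assert.ok(result, 'initialize returned no result');
    assert.ok(result.protocolVersion, 'protocolVersion missing');
    assert.ok(result.capabilities !== null && typeof result.capabilities === 'object',
      'capabilities must be object');
    assert.ok(result.serverInfo?.name, 'serverInfo.name missing');
    assert.ok(result.serverInfo?.version, 'serverInfo.version missing');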
  });

  it('tools/list returns array', async () => {
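    // tools/list requires an initialized session, so it cannot be a bare POST.
    // getToolsViaSession is a project helper that initializes a session and
    // returns the tools array, e.g. a thin wrapper around your SDK's client: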
    const tools = await getToolsViaSession('http://localhost:3001/mcp');
    assert.ok(Array.isArray(tools), 'tools must be an array');
    assert.ok(tools.length > 0, 'server has no tools');
  });
});
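
The test above calls getToolsViaSession without showing it. A minimal sketch of such a helper follows, assuming the server speaks Streamable HTTP at a single endpoint, answers with plain JSON bodies, and may issue an Mcp-Session-Id header. The fetchImpl parameter is an addition for testability and is not part of any SDK; a wrapper around your SDK's client is an equally valid implementation.

```javascript
// Hypothetical helper: initialize -> notifications/initialized -> tools/list
// over plain HTTP POSTs. fetchImpl is injectable so the helper can be unit-tested.
async function getToolsViaSession(serverUrl, fetchImpl = fetch) {
  const baseHeaders = {
    'Content-Type': 'application/json',
    Accept: 'application/json, text/event-stream',
  };

  const post = (body, extraHeaders = {}) =>
    fetchImpl(serverUrl, {
      method: 'POST',
      headers: { ...baseHeaders, ...extraHeaders },
      body: JSON.stringify(body),
    });

  // Step 1: initialize and capture the session id, if the server issues one.
  const initRes = await post({
    jsonrpc: '2.0', id: 1, method: 'initialize',
    params: {
      protocolVersion: '2024-11-05', capabilities: {},
      clientInfo: { name: 'test', version: '1' },
    },
  });
  if (!initRes.ok) throw new Error(`initialize failed: HTTP ${initRes.status}`);
  const sessionId = initRes.headers.get('mcp-session-id');
  const sessionHeaders = sessionId ? { 'Mcp-Session-Id': sessionId } : {};
  await initRes.json(); // Assumption: the server answers with a plain JSON body

  // Step 2: tell the server the client is ready (a notification, so no id).
  await post({ jsonrpc: '2.0', method: 'notifications/initialized' }, sessionHeaders);

  // Step 3: list tools inside the same session.
  const listRes = await post({ jsonrpc: '2.0', id: 2, method: 'tools/list' }, sessionHeaders);
  const { result, error } = await listRes.json();
  if (error) throw new Error(`tools/list failed: ${error.message}`);
  return result.tools;
}
```

Passing a fake fetchImpl lets you check the request order and session header handling without a running server.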

Run this against your server in the same CI job that runs unit tests. A protocol regression is often introduced by updating the MCP SDK without reading the changelog — a compliance test catches it before the deploy. See JSON-RPC health checks vs HTTP probes for the distinction between protocol-level and HTTP-level checking.

Layer 2: Schema snapshot testing

Schema snapshot tests lock the set of tools your server exposes. The test connects, runs tools/list, computes a deterministic hash of the sorted tool definitions, and compares it to a committed baseline. If the hash differs, the test fails.

import { it } from 'node:test';
import assert from 'node:assert';
import { createHash } from 'node:crypto';
import { readFileSync, writeFileSync } from 'node:fs';

const SNAPSHOT_PATH = './test/schema-snapshot.json';

async function getSchemaHash(serverUrl) {
  const tools = await getToolsViaSession(serverUrl);
  // Sort a copy by tool name with a locale-independent comparison, then stringify
  const byName = (a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0);
  const sorted = [...tools]
    .sort(byName)
    .map(t => ({ name: t.name, description: t.description, inputSchema: t.inputSchema }));
  return createHash('sha256').update(JSON.stringify(sorted)).digest('hex');
}

it('tools/list matches committed snapshot', async () => {
  const currentHash = await getSchemaHash('http://localhost:3001/mcp');
  let snapshot;
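  try {
    snapshot = JSON.parse(readFileSync(SNAPSHOT_PATH, 'utf8'));
  } catch {
    // In CI a missing or unreadable snapshot is a failure, not a reason to write a new baseline
    assert.ok(!process.env.CI, 'Snapshot missing or unreadable in CI. Commit test/schema-snapshot.json.');
    // First local run: write the snapshot
    writeFileSync(SNAPSHOT_PATH, JSON.stringify({ hash: currentHash }, null, 2));
    console.log('Snapshot created. Commit test/schema-snapshot.json.');
    return;
  }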
  assert.strictEqual(currentHash, snapshot.hash,
    'Tool schema changed. If intentional, delete test/schema-snapshot.json and re-run to update.');
});

Workflow: the first run writes the snapshot. You review and commit it. Future runs compare against the committed hash. When you intentionally change a tool (add a parameter, rename a tool), delete the snapshot and re-run to create a new baseline. This creates a review moment for every schema change — which is what you want, since schema changes can break clients in production.
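
One caveat with hashing JSON.stringify output is that key order inside each inputSchema affects the hash, so a server that emits the same schema with reordered keys would look like drift. A possible remedy, sketched below under the assumption that tool definitions are plain JSON-compatible values, is to serialize with sorted keys before hashing. The canonicalize name is illustrative and not part of any SDK.

```javascript
// Serialize a JSON-compatible value with object keys in sorted order, so that
// { a: 1, b: 2 } and { b: 2, a: 1 } produce the same string (and the same hash).
function canonicalize(value) {
  if (Array.isArray(value)) {
    // Array order is meaningful and is preserved; undefined becomes null as in JSON.stringify
    return `[${value.map((v) => canonicalize(v === undefined ? null : v)).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const entries = Object.keys(value)
      .filter((key) => value[key] !== undefined) // JSON.stringify also drops undefined properties
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalize(value[key])}`);
    return `{${entries.join(',')}}`;
  }
  return JSON.stringify(value); // strings, numbers, booleans, null
}
```

To use it, pass canonicalize(sorted) to the hash instead of JSON.stringify(sorted). Changing the serialization changes the hash, so regenerate and commit the baseline once after switching.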

See schema drift in MCP tool definitions for why unexpected schema changes cause silent breakage in clients that cache tool schemas.

Layer 3: Session integration tests

Integration tests run the full session sequence from the client's perspective. Use the official MCP SDK's client to connect, initialize, list tools, and call each tool with minimal valid inputs. These tests catch bugs in the session lifecycle that unit tests can't reach — session state corruption, missing session termination, tool call results that don't match the declared output schema.

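import { it } from 'node:test';
import assert from 'node:assert';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
// Streamable HTTP matches the POST-to-/mcp endpoint used in the earlier examples.
// For a server that only speaks the older HTTP+SSE transport, use SSEClientTransport
// from '@modelcontextprotocol/sdk/client/sse.js' instead.
import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp.js';

it('full session: initialize, list, call', async () => {
  const client = new Client({ name: 'test', version: '1' }, { capabilities: {} });
  const transport = new StreamableHTTPClientTransport(new URL('http://localhost:3001/mcp'));
  await client.connect(transport); // performs the initialize handshake

  try {
    // tools/list
    const { tools } = await client.listTools();
    assert.ok(tools.some(t => t.name === 'my_expected_tool'), 'expected tool not found');

    // tool call
    const result = await client.callTool({
      name: 'my_expected_tool',
      arguments: { param: 'test-value' }
    });
    assert.ok(result.content?.length > 0, 'empty tool result');
    assert.strictEqual(result.content[0].type, 'text', 'expected text result');
  } finally {
    // Always release the session, even when an assertion fails
    await client.close();
  }
});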

Write one integration test per tool, with both success and error paths. For error paths, test that the server returns a well-formed JSON-RPC error (not a 500 HTTP error) when given invalid arguments. See MCP server error rate for how to distinguish client errors from server errors in production.
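
Calling every tool with minimal valid inputs is easier when the inputs are derived from each tool's inputSchema instead of being written by hand. The sketch below is one way to do that. It handles only required properties and a few common JSON Schema keywords (type, enum, default, minimum), and the minimalArgs and sampleValue names are illustrative.

```javascript
// Pick a smallest-plausible value for one property schema.
// Covers common cases only; it is not a full JSON Schema generator.
function sampleValue(schema = {}) {
  if (schema.default !== undefined) return schema.default;
  if (Array.isArray(schema.enum) && schema.enum.length > 0) return schema.enum[0];
  switch (schema.type) {
    case 'string': return 'test-value';
    case 'integer':
    case 'number': return schema.minimum ?? 0;
    case 'boolean': return false;
    case 'array': return [];
    case 'object': return minimalArgs(schema);
    default: return null;
  }
}

// Build an argument object that fills only the required properties.
function minimalArgs(inputSchema = {}) {
  const args = {};
  for (const key of inputSchema.required ?? []) {
    args[key] = sampleValue(inputSchema.properties?.[key]);
  }
  return args;
}
```

In an integration test, you could loop over the tools returned by listTools and call each one with minimalArgs(tool.inputSchema). Tools whose schemas use patterns, formats, or cross-field constraints would still need hand-written inputs.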

Testing error paths

A server that only handles the happy path is brittle in production. Test the error cases explicitly:
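
For the invalid-arguments case mentioned earlier, it helps to assert on the shape of the reply and not only on the fact that the call failed. The sketch below is one possible classifier for the JSON body returned by a tools/call request. The function name and labels are illustrative. Whether invalid arguments surface as a JSON-RPC error or as a tool result with isError set can depend on the SDK and spec revision, so the label a test expects is an assumption to confirm against your server.

```javascript
// Classify the JSON-RPC body returned for a tools/call request.
function classifyResponse(body) {
  if (body === null || typeof body !== 'object' || body.jsonrpc !== '2.0') {
    return 'malformed';
  }
  const hasResult = 'result' in body;
  const hasError = 'error' in body;
  if (hasResult === hasError) return 'malformed'; // exactly one must be present

  if (hasError) {
    // A well-formed JSON-RPC error carries an integer code and a string message
    const { code, message } = body.error ?? {};
    return Number.isInteger(code) && typeof message === 'string'
      ? 'protocol-error'
      : 'malformed';
  }
  // Assumption: a tool result always carries a content array
  if (!Array.isArray(body.result?.content)) return 'malformed';
  return body.result.isError === true ? 'tool-error' : 'success';
}
```

A test can then assert that a call with bad arguments yields 'protocol-error' or 'tool-error' and never 'malformed', alongside a check that the HTTP status is not 500.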

CI integration

Wire all three test layers into CI. A minimal GitHub Actions workflow:

name: MCP server tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '22' }
      - run: npm ci
      - run: npm run build --if-present
      - name: Start MCP server
        run: node index.js &
        env:
          PORT: 3001
          NODE_ENV: test
      - name: Wait for server ready
        run: |
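          for i in $(seq 1 20); do
            if curl -sf --max-time 5 -X POST http://localhost:3001/mcp \
              -H 'Content-Type: application/json' \
              -H 'Accept: application/json, text/event-stream' \
              -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"ci","version":"1"}}}' \
              | grep -q protocolVersion; then
              exit 0
            fi
            sleep 1
          done
          # Fail the job instead of running tests against a server that never came up
          echo 'MCP server did not become ready within 20 attempts' >&2
          exit 1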
      - run: npm test

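Run the protocol compliance test first (fastest feedback), then schema snapshot, then session integration (slowest), so a failure in an early layer stops the run before the slower ones start. The workflow above runs everything through a single npm test step; to enforce that order, give each layer its own script or step. It also assumes the tests connect to the server the workflow started, so a suite that starts its own server on port 3001, as the Layer 1 example does, should skip the start and wait steps or use a different port. The schema snapshot file (test/schema-snapshot.json) is committed to the repository; CI fails if the hash doesn't match, which prevents unreviewed schema changes from reaching production.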

After CI passes and the deploy is live, AliveMCP takes over: it runs the initialize → tools/list probe every 60 seconds in production. Think of CI tests as pre-production verification and AliveMCP as post-deployment continuous verification. See MCP server deployment for the post-deploy verification checklist.

Testing against multiple MCP clients

The official MCP SDK's test client is authoritative for protocol compliance, but production users connect via Claude Desktop, the Anthropic Agent SDK, or third-party clients. Each client has slightly different behavior: some clients reconnect aggressively on session drop; some cache the tools/list response and don't re-fetch; some send additional fields in the initialize request that your server shouldn't reject.

The safest approach: test with the SDK client in CI for correctness, and do periodic manual smoke tests with Claude Desktop and the Agent SDK to catch client-specific compatibility issues. If users report a specific client breaking, add a test that simulates that client's initialization behavior.
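
Simulating a client's initialization behavior can be as simple as replaying variations of the initialize request. The sketch below builds a few such variants. The extra fields shown are hypothetical placeholders, not payloads captured from any real client, so replace them with what your users' clients actually send.

```javascript
// Build an initialize request, letting a test layer client-specific extras on top.
function buildInitialize({
  protocolVersion = '2024-11-05',
  capabilities = {},
  clientInfo = { name: 'test', version: '1' },
  extraParams = {},
} = {}) {
  return {
    jsonrpc: '2.0',
    id: 1,
    method: 'initialize',
    params: { protocolVersion, capabilities, clientInfo, ...extraParams },
  };
}

// Hypothetical variants; swap in payloads observed from real clients.
const variants = [
  { label: 'baseline', request: buildInitialize() },
  {
    label: 'extra capabilities',
    request: buildInitialize({
      capabilities: { roots: { listChanged: true }, experimental: { exampleFlag: {} } },
    }),
  },
  {
    label: 'unknown param field',
    request: buildInitialize({ extraParams: { vendorHint: 'example' } }),
  },
];
```

Each variant can be POSTed with the same fetch call as the Layer 1 compliance test, asserting that the server still returns a result with a protocolVersion instead of rejecting the request.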

Related questions

Should I mock the MCP SDK in unit tests?

Mock the tool implementation logic, not the MCP SDK itself. A unit test that mocks server.addTool tells you nothing about whether the tool integrates correctly with the session lifecycle. Unit-test the functions your tools call (business logic, API clients, data transformations). Integration-test the tool as called via the MCP SDK. The compliance and snapshot tests verify the SDK integration itself.
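
One way to keep that split clean is to write the tool's logic as a plain function and wrap it in a thin adapter that produces the tool result. The sketch below uses made-up names (summarizeTemperatures, toToolResult) and assumes a text-only result. Only the first function needs unit tests; the adapter is exercised by the integration tests.

```javascript
// Business logic: a pure function with no MCP dependency, easy to unit-test.
function summarizeTemperatures(readings) {
  if (readings.length === 0) throw new RangeError('no readings');
  const sum = readings.reduce((total, r) => total + r, 0);
  return {
    min: Math.min(...readings),
    max: Math.max(...readings),
    mean: sum / readings.length,
  };
}

// Thin adapter: converts the logic's output (or failure) into an MCP tool result.
function toToolResult(fn, args) {
  try {
    return { content: [{ type: 'text', text: JSON.stringify(fn(args)) }] };
  } catch (err) {
    return { isError: true, content: [{ type: 'text', text: err.message }] };
  }
}
```

The handler you register with the SDK then reduces to a single call to toToolResult, which leaves nothing in it worth mocking.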

How do I test tools that call external APIs?

In unit tests, mock the external API call. In integration tests, use a test-mode flag or environment variable to swap the real API for a local fixture server. Don't make real external API calls in CI — network failures make tests flaky, and most APIs have rate limits or cost money per call. For smoke tests against a production-like environment, consider a staging environment with a real but sandboxed API key.
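
A minimal sketch of the environment-variable swap, assuming the tool reads its upstream base URL from one place. The variable names MCP_TEST_MODE, FIXTURE_API_URL, and API_BASE_URL and the default fixture port are illustrative choices, not conventions of any SDK.

```javascript
// Pick the upstream API base URL from the environment.
// In test mode the tool talks to a local fixture server instead of the real API.
function resolveApiBaseUrl(env = process.env) {
  if (env.MCP_TEST_MODE === '1') {
    return env.FIXTURE_API_URL ?? 'http://127.0.0.1:4010';
  }
  if (!env.API_BASE_URL) {
    // Fail loudly so a misconfigured deploy never silently hits a fixture
    throw new Error('API_BASE_URL must be set outside test mode');
  }
  return env.API_BASE_URL;
}
```

Taking env as a parameter keeps the function itself testable without mutating process.env.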

What's the difference between testing and load testing?

Testing verifies correctness — the right answer is returned for given inputs. Load testing verifies performance under concurrent load — the server returns answers within acceptable latency when N sessions run simultaneously. You need both: a functionally correct server that collapses under 10 concurrent sessions is a production incident waiting to happen.
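
A dedicated load-testing tool is usually the better choice, but a rough first check fits in a few lines. The sketch below runs N copies of an async session function at once and reports failures and nearest-rank latency percentiles. runSession is a placeholder for whatever opens a session and calls a tool in your suite, and the function name is illustrative.

```javascript
// Run `sessionCount` copies of an async task at once and summarize latency (ms).
async function runConcurrentSessions(sessionCount, runSession) {
  const timed = async (index) => {
    const start = performance.now();
    await runSession(index);
    return performance.now() - start;
  };

  // allSettled keeps going when individual sessions fail, so failures can be counted
  const settled = await Promise.allSettled(
    Array.from({ length: sessionCount }, (_, index) => timed(index)),
  );
  const latencies = settled
    .filter((s) => s.status === 'fulfilled')
    .map((s) => s.value)
    .sort((a, b) => a - b);

  // Nearest-rank percentile over successful sessions only
  const percentile = (p) =>
    latencies.length === 0
      ? null
      : latencies[Math.min(latencies.length - 1, Math.ceil(p * latencies.length) - 1)];

  return {
    failures: settled.length - latencies.length,
    p50: percentile(0.5),
    p95: percentile(0.95),
  };
}
```

A test could assert that failures is zero and p95 stays under a latency budget you choose for, say, 10 concurrent sessions.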

How do I test schema drift detection?

Add a test that intentionally changes a tool's name or schema, runs tools/list, and verifies the hash differs from the baseline. Then restore the change and verify the hash matches again. This confirms your snapshot mechanism works. In production, AliveMCP detects the same drift and alerts you — the schema snapshot in CI catches intentional changes before deploy; AliveMCP catches unintended drift after deploy.
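
The self-check described above can be sketched without a running server: mutate a copy of a baseline tool list and confirm the comparison notices. To stay free of imports, this sketch compares serialized definitions directly instead of hashes. The schemaFingerprint and detectsDrift names are illustrative, and in a real suite the fingerprint would be the same function that feeds your snapshot hash.

```javascript
// Reduce a tool list to a comparable string: name-sorted, fixed field set.
function schemaFingerprint(tools) {
  const byName = (a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0);
  return JSON.stringify(
    [...tools].sort(byName).map((t) => ({
      name: t.name,
      description: t.description,
      inputSchema: t.inputSchema,
    })),
  );
}

// Apply a mutation to a deep copy and report whether the fingerprint noticed it.
function detectsDrift(baselineTools, mutate) {
  const copy = structuredClone(baselineTools);
  mutate(copy);
  return schemaFingerprint(copy) !== schemaFingerprint(baselineTools);
}
```

Renaming a tool or changing a parameter type should make detectsDrift return true, while a no-op mutation or a reordering of the list should return false. If either expectation fails, the snapshot mechanism is not doing its job.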
