MCP server testing

MCP server testing has three layers that don't exist for REST APIs: protocol compliance (does your initialize response include the required fields, such as capabilities?), schema stability (does the hash of your tools/list response match what you deployed yesterday?), and session correctness (does the full initialize → tools/list → tools/call → result sequence complete without error?). Unit tests on your tool logic alone miss all three.

TL;DR

Run three types of tests: (1) a protocol compliance check that verifies the shape of your initialize response against the MCP spec; (2) a schema snapshot test that fails if tools/list changes unexpectedly; (3) a session integration test that runs the full initialize → tool call → result sequence against a locally running server. Wire all three into CI so schema drift and protocol regressions fail the build before reaching production. Use AliveMCP as the production layer that runs the same probe sequence every 60 seconds after deployment.

Layer 1: Protocol compliance testing

The MCP specification defines the required fields in each message type. A server that omits required fields works fine with forgiving clients but breaks against strict clients. The initialize response must include:

protocolVersion — a string matching one of the spec-defined versions (e.g., "2024-11-05").
capabilities — an object (may be empty {} but must be present).
serverInfo — an object with at least name and version fields.

A compliance test in Node.js using the built-in fetch:

import { describe, it, before, after } from 'node:test';
import assert from 'node:assert';
import { startServer, stopServer } from './server.js';

describe('MCP protocol compliance', () => {
  let server;
  before(async () => { server = await startServer(3001); });
  after(async () => { await stopServer(server); });

  it('initialize returns required fields', async () => {
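    const res = await fetch('http://localhost:3001/mcp', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        // Streamable HTTP clients must accept both JSON and SSE responses
        Accept: 'application/json, text/event-stream'
      },
      body: JSON.stringify({
        jsonrpc: '2.0', id: 1, method: 'initialize',
        params: {
          protocolVersion: '2024-11-05', capabilities: {},
          clientInfo: { name: 'test', version: '1' }
        }
      })
    });
    // Assumes the server answers initialize with a plain JSON body, not an SSE stream
    const { result } = await res.json();
    assert.ok(result, 'initialize returned no result');
    assert.ok(result.protocolVersion, 'protocolVersion missing');
    // typeof null is 'object', so rule null out explicitly
    assert.ok(result.capabilities !== null && typeof result.capabilities === 'object',
      'capabilities must be object');
    assert.ok(result.serverInfo?.name, 'serverInfo.name missing');
    assert.ok(result.serverInfo?.version, 'serverInfo.version missing');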
  });

  it('tools/list returns array', async () => {
    // tools/list requires an initialized session, so a bare POST is not enough.
    // getToolsViaSession is a helper you provide: it initializes a session
    // (for example through your SDK's client) and returns the tools array.
    const tools = await getToolsViaSession('http://localhost:3001/mcp');
    assert.ok(Array.isArray(tools), 'tools must be an array');
    assert.ok(tools.length > 0, 'server has no tools');
  });
});

Run this against your server in the same CI job that runs unit tests. A protocol regression is often introduced by updating the MCP SDK without reading the changelog — a compliance test catches it before the deploy. See JSON-RPC health checks vs HTTP probes for the distinction between protocol-level and HTTP-level checking.
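The tools/list test above calls getToolsViaSession without defining it. One way such a helper could look is sketched below, using only the built-in fetch. It assumes the server speaks the Streamable HTTP transport on a single endpoint, may return a session ID in the Mcp-Session-Id response header, and answers with either a JSON body or a short SSE stream. The readRpcResponse helper and the injectable fetchImpl parameter are illustrative additions, not part of any SDK.

```javascript
// Extract the JSON-RPC message with the given id from a JSON or SSE body.
async function readRpcResponse(res, id) {
  const contentType = res.headers.get('content-type') ?? '';
  const text = await res.text();
  if (!contentType.includes('text/event-stream')) return JSON.parse(text);
  // Simplification: each SSE event carries one single-line "data:" payload.
  const messages = text
    .split(/\r?\n/)
    .filter((line) => line.startsWith('data:'))
    .map((line) => JSON.parse(line.slice(5)));
  const match = messages.find((m) => m.id === id);
  if (!match) throw new Error(`no response with id ${id} in event stream`);
  return match;
}

// fetchImpl is injectable so the helper can be exercised without a server.
async function getToolsViaSession(serverUrl, fetchImpl = fetch) {
  let sessionId;
  const post = (message) => {
    const headers = {
      'Content-Type': 'application/json',
      Accept: 'application/json, text/event-stream',
    };
    if (sessionId) headers['Mcp-Session-Id'] = sessionId;
    return fetchImpl(serverUrl, { method: 'POST', headers, body: JSON.stringify(message) });
  };

  // 1. initialize, remembering the session ID if the server issues one
  const initRes = await post({
    jsonrpc: '2.0', id: 1, method: 'initialize',
    params: {
      protocolVersion: '2024-11-05', capabilities: {},
      clientInfo: { name: 'test', version: '1' },
    },
  });
  sessionId = initRes.headers.get('mcp-session-id') ?? undefined;
  const init = await readRpcResponse(initRes, 1);
  if (init.error) throw new Error(`initialize failed: ${init.error.message}`);

  // 2. signal that the handshake is complete (a notification, so no id)
  await post({ jsonrpc: '2.0', method: 'notifications/initialized' });

  // 3. tools/list within the same session
  const listRes = await post({ jsonrpc: '2.0', id: 2, method: 'tools/list' });
  const list = await readRpcResponse(listRes, 2);
  if (list.error) throw new Error(`tools/list failed: ${list.error.message}`);
  return list.result.tools;
}
```

If your server uses the older HTTP+SSE transport, or you already depend on an SDK client, wrapping that client's list-tools call is likely simpler than hand-rolling the requests.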

Layer 2: Schema snapshot testing

Schema snapshot tests lock the set of tools your server exposes. The test connects, runs tools/list, computes a deterministic hash of the sorted tool definitions, and compares it to a committed baseline. If the hash differs, the test fails.

import { it } from 'node:test';
import assert from 'node:assert';
import { createHash } from 'node:crypto';
import { readFileSync, writeFileSync } from 'node:fs';

const SNAPSHOT_PATH = './test/schema-snapshot.json';

async function getSchemaHash(serverUrl) {
  const tools = await getToolsViaSession(serverUrl);
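  // Sort a copy by tool name (sort() mutates in place), then keep only the
  // fields that define the schema. Note: JSON.stringify preserves key order,
  // so reordered keys inside inputSchema also change the hash.
  const sorted = [...tools]
    .sort((a, b) => a.name.localeCompare(b.name))
    .map(t => ({ name: t.name, description: t.description, inputSchema: t.inputSchema }));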
  return createHash('sha256').update(JSON.stringify(sorted)).digest('hex');
}

it('tools/list matches committed snapshot', async () => {
  const currentHash = await getSchemaHash('http://localhost:3001/mcp');
  let snapshot;
  try {
    snapshot = JSON.parse(readFileSync(SNAPSHOT_PATH, 'utf8'));
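  } catch {
    // In CI a missing snapshot must fail the build instead of silently passing
    assert.ok(!process.env.CI,
      'test/schema-snapshot.json is missing; create it locally and commit it');
    // First local run: write the snapshot
    writeFileSync(SNAPSHOT_PATH, JSON.stringify({ hash: currentHash }, null, 2));
    console.log('Snapshot created. Commit test/schema-snapshot.json.');
    return;
  }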
  assert.strictEqual(currentHash, snapshot.hash,
    'Tool schema changed. If intentional, delete test/schema-snapshot.json and re-run to update.');
});

Workflow: the first run writes the snapshot. You review and commit it. Future runs compare against the committed hash. When you intentionally change a tool (add a parameter, rename a tool), delete the snapshot and re-run to create a new baseline. This creates a review moment for every schema change — which is what you want, since schema changes can break clients in production.
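A hash tells you that something changed, but not what. If the snapshot file stores the full tool definitions rather than only a hash, the test can name the affected tools. The sketch below is one possible approach, not part of any MCP SDK: canonicalJson and diffTools are hypothetical helpers, and the comparison ignores object key order so that a reordered inputSchema is not reported as a change.

```javascript
// Serialize with object keys sorted so key order never affects the result.
function canonicalJson(value) {
  if (Array.isArray(value)) return `[${value.map((item) => canonicalJson(item)).join(',')}]`;
  if (value !== null && typeof value === 'object') {
    const entries = Object.keys(value)
      .filter((key) => value[key] !== undefined)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalJson(value[key])}`);
    return `{${entries.join(',')}}`;
  }
  return JSON.stringify(value);
}

// Compare two tools/list results and name the tools that differ.
function diffTools(baseline, current) {
  const index = (tools) => new Map(tools.map((t) => [
    t.name,
    canonicalJson({ description: t.description, inputSchema: t.inputSchema }),
  ]));
  const before = index(baseline);
  const after = index(current);
  return {
    added: [...after.keys()].filter((name) => !before.has(name)).sort(),
    removed: [...before.keys()].filter((name) => !after.has(name)).sort(),
    changed: [...after.keys()]
      .filter((name) => before.has(name) && before.get(name) !== after.get(name))
      .sort(),
  };
}

// Usage: "search" only has its keys reordered, so just "fetch" is reported.
const previousTools = [
  { name: 'search', description: 'Search docs', inputSchema: { type: 'object', properties: { q: { type: 'string' } } } },
];
const currentTools = [
  { name: 'search', description: 'Search docs', inputSchema: { properties: { q: { type: 'string' } }, type: 'object' } },
  { name: 'fetch', description: 'Fetch a page', inputSchema: { type: 'object' } },
];
console.log(diffTools(previousTools, currentTools));
```

The trade-off is a larger snapshot file in exchange for failure messages and pull request diffs that show exactly which tool definition moved.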

See schema drift in MCP tool definitions for why unexpected schema changes cause silent breakage in clients that cache tool schemas.

Layer 3: Session integration tests

Integration tests run the full session sequence from the client's perspective. Use the official MCP SDK's client to connect, initialize, list tools, and call each tool with minimal valid inputs. These tests catch bugs in the session lifecycle that unit tests can't reach — session state corruption, missing session termination, tool call results that don't match the declared output schema.

import { it } from 'node:test';
import assert from 'node:assert';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp.js';

it('full session: initialize, list, call', async () => {
  const client = new Client({ name: 'test', version: '1' }, { capabilities: {} });
  // The earlier tests POST directly to /mcp, which is the Streamable HTTP transport.
  // For a legacy HTTP+SSE server, use SSEClientTransport from
  // '@modelcontextprotocol/sdk/client/sse.js' instead.
  const transport = new StreamableHTTPClientTransport(new URL('http://localhost:3001/mcp'));
  await client.connect(transport); // performs the initialize handshake

  try {
    // tools/list
    const { tools } = await client.listTools();
    assert.ok(tools.some(t => t.name === 'my_expected_tool'), 'expected tool not found');

    // tool call
    const result = await client.callTool({
      name: 'my_expected_tool',
      arguments: { param: 'test-value' }
    });
    assert.ok(result.content?.length > 0, 'empty tool result');
    assert.strictEqual(result.content[0].type, 'text', 'expected text result');
  } finally {
    // Close the session even when an assertion fails
    await client.close();
  }
});

Write one integration test per tool, with both success and error paths. For error paths, test that the server returns a well-formed JSON-RPC error (not a 500 HTTP error) when given invalid arguments. See MCP server error rate for how to distinguish client errors from server errors in production.

Testing error paths

A server that only handles the happy path is brittle in production. Test the error cases explicitly:

Invalid method name: send a request with "method": "tools/nonexistent". Expect a JSON-RPC error with code -32601 (Method not found). If the server returns a 404 HTTP error instead, it's violating the protocol — clients that expect a JSON-RPC envelope will break.
Missing required parameters: call a tool without its required input parameters. Expect a JSON-RPC error code -32602 (Invalid params). Do not expect a 400 HTTP error.
Downstream dependency failure: mock the downstream API your tool calls to return an error. Expect the tool to return a structured error result, not an unhandled exception that crashes the session.
Oversized input: if a tool accepts text input, test what happens with a 1 MB string. Does the server reject it gracefully, or does it run out of memory? Set input size limits in your tool implementation and test that they are enforced.
Concurrent calls: call the same tool concurrently from two sessions. Verify there's no state leakage between sessions (e.g., session A's data appearing in session B's results).
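The first two cases in the list above come down to checking the shape of a JSON-RPC error response. A minimal sketch of such a check follows. checkJsonRpcError and expectMethodNotFound are hypothetical helpers, and the usage function assumes an already initialized session plus a server that replies with a plain JSON body.

```javascript
// Standard JSON-RPC 2.0 error codes referenced in the list above.
const METHOD_NOT_FOUND = -32601;
const INVALID_PARAMS = -32602;

// Returns a list of problems; an empty list means `body` is a well-formed
// JSON-RPC error response carrying the expected code.
function checkJsonRpcError(body, expectedCode) {
  if (body === null || typeof body !== 'object') return ['response body is not a JSON object'];
  const problems = [];
  if (body.jsonrpc !== '2.0') problems.push('jsonrpc must be "2.0"');
  if (!('id' in body)) problems.push('id is missing');
  if ('result' in body) problems.push('error responses must not include result');
  const err = body.error;
  if (err === null || typeof err !== 'object') {
    problems.push('error object is missing');
    return problems;
  }
  if (!Number.isInteger(err.code)) problems.push('error.code must be an integer');
  else if (err.code !== expectedCode) problems.push(`expected code ${expectedCode}, got ${err.code}`);
  if (typeof err.message !== 'string') problems.push('error.message must be a string');
  return problems;
}

// Hypothetical usage against the local test server from this guide.
// sessionHeaders would carry whatever session identifier your server issued.
async function expectMethodNotFound(serverUrl, sessionHeaders = {}) {
  const res = await fetch(serverUrl, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Accept: 'application/json, text/event-stream',
      ...sessionHeaders,
    },
    body: JSON.stringify({ jsonrpc: '2.0', id: 99, method: 'tools/nonexistent' }),
  });
  const problems = checkJsonRpcError(await res.json(), METHOD_NOT_FOUND);
  if (problems.length > 0) throw new Error(problems.join('; '));
}
```

Returning a list of problems instead of throwing on the first one makes a failing test report every protocol violation at once. The same checker covers the missing-parameters case by passing INVALID_PARAMS.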

CI integration

Wire all three test layers into CI. A minimal GitHub Actions workflow:

name: MCP server tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '22' }
      - run: npm ci
      - run: npm run build --if-present
      - name: Start MCP server
        run: node index.js &
        env:
          PORT: 3001
          NODE_ENV: test
      - name: Wait for server ready
        run: |
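          for i in $(seq 1 20); do
            if curl -sf -X POST http://localhost:3001/mcp \
              -H 'Content-Type: application/json' \
              -H 'Accept: application/json, text/event-stream' \
              -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"ci","version":"1"}}}' \
              | grep -q protocolVersion; then
              exit 0
            fi
            sleep 1
          done
          # Fail the step instead of running tests against a server that never started
          echo 'MCP server did not become ready after 20 attempts' >&2
          exit 1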
      - run: npm test

Run the protocol compliance test first (fastest feedback), then schema snapshot, then session integration (slowest). The workflow above runs everything through a single npm test step; to fail fast between layers, split them into separate workflow steps, since a failed step stops the steps after it by default. The schema snapshot file (test/schema-snapshot.json) is committed to the repository; CI fails if the hash doesn't match, which prevents unreviewed schema changes from reaching production.

After CI passes and the deploy is live, AliveMCP takes over: it runs the initialize → tools/list probe every 60 seconds in production. Think of CI tests as pre-production verification and AliveMCP as post-deployment continuous verification. See MCP server deployment for the post-deploy verification checklist.

Testing against multiple MCP clients

The official MCP SDK's test client is authoritative for protocol compliance, but production users connect via Claude Desktop, the Anthropic Agent SDK, or third-party clients. Each client has slightly different behavior: some clients reconnect aggressively on session drop; some cache the tools/list response and don't re-fetch; some send additional fields in the initialize request that your server shouldn't reject.

The safest approach: test with the SDK client in CI for correctness, and do periodic manual smoke tests with Claude Desktop and the Agent SDK to catch client-specific compatibility issues. If users report a specific client breaking, add a test that simulates that client's initialization behavior.
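One way to simulate client-specific initialization is to keep a small table of request variations and replay each against the server. The sketch below assumes nothing about what any real client sends: the profile names and extra fields are illustrative placeholders to be replaced with behavior you have actually observed, and send is a hypothetical callback that posts a request and returns the parsed JSON-RPC response.

```javascript
// Baseline initialize params, matching the requests used elsewhere in this guide.
const BASE_PARAMS = {
  protocolVersion: '2024-11-05',
  capabilities: {},
  clientInfo: { name: 'test', version: '1' },
};

// Hypothetical profiles: each overrides or extends the baseline params.
// Replace these placeholders with quirks reported for real clients.
const CLIENT_PROFILES = {
  baseline: {},
  extraTopLevelField: { vendorExtension: { traceId: 'abc123' } },
  extraCapability: { capabilities: { experimental: { someFlag: {} } } },
  extraClientInfo: { clientInfo: { name: 'test', version: '1', title: 'Test Client' } },
};

// Build the initialize request a given profile would send.
function buildInitializeRequest(profileName, id = 1) {
  const overrides = CLIENT_PROFILES[profileName];
  if (!overrides) throw new Error(`unknown profile: ${profileName}`);
  return {
    jsonrpc: '2.0', id, method: 'initialize',
    params: { ...BASE_PARAMS, ...overrides },
  };
}

// Replay every profile and return the names the server failed to accept.
async function findRejectedProfiles(send) {
  const rejected = [];
  for (const name of Object.keys(CLIENT_PROFILES)) {
    const response = await send(buildInitializeRequest(name));
    if (!response.result?.protocolVersion) rejected.push(name);
  }
  return rejected;
}
```

In a test, assert that findRejectedProfiles returns an empty array. When a user reports a client that breaks, capture its initialize request and add it as a new profile so the regression stays covered.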