MCP server integration testing

Unit tests exercise individual tool handlers in isolation, but they don't catch bugs at the boundary where your tool registry, request dispatcher, and protocol handling interact. Integration tests connect a real Server instance to a real Client through the full MCP protocol stack: the initialize handshake fires, capabilities are negotiated, tools are listed, and tool calls go through the actual dispatcher. These tests catch wiring bugs (wrong handler name, missing tool registration, malformed inputSchema) before your users do.

TL;DR

Create a real Server instance with your actual handlers registered. Create an InMemoryTransport linked pair. Connect server and client. Call client.listTools() to verify registration, then client.callTool({ name, arguments }) to exercise the full call path. Swap external dependencies (database, HTTP clients) for fakes via constructor injection so tests run without network access. In CI, run a real PostgreSQL or Redis container alongside the tests if your handlers need it for final confidence.

Unit tests vs. integration tests for MCP servers

The distinction matters for MCP servers because the protocol stack has multiple layers. A unit test that calls your handler function directly skips the MCP request-routing layer — it never tests that the tool was registered with the correct name, that the inputSchema is wired correctly, or that the handler function is invoked for the right method name.

Test layer	What it tests	What it misses
Handler unit test (direct call)	Handler logic, input validation, output formatting	Tool registration, name routing, protocol handshake
Integration test (InMemoryTransport)	Full protocol stack: handshake, listTools, callTool routing	Network failures, TLS, HTTP server binding
End-to-end test (real HTTP)	HTTP server, port binding, real network	Little in the stack, but slow to run and hard to isolate failures
AliveMCP probe (production)	Live endpoint reachability, MCP initialize over real network	Handler logic (it tests only the infrastructure)

The integration test layer using InMemoryTransport is the highest-value layer for most MCP servers: it covers the full protocol stack without requiring a running HTTP server, making it fast enough to run in CI on every commit.
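To make the wiring gap concrete, here is a minimal sketch built on a hypothetical in-process registry (a plain object, not the SDK's dispatcher). The handler passes when called directly, but a typo in the registered name means a lookup by the advertised name fails. That is the class of bug only a routed call surfaces. The names toolRegistry, getUserHandler, and dispatchTool are illustrative assumptions.

```typescript
// Hypothetical stand-in for a tool registry; NOT the MCP SDK's dispatcher.
type ToolHandler = (args: { userId: string }) => string;

const getUserHandler: ToolHandler = ({ userId }) => `user:${userId}`;

// Wiring bug: registered as "getuser", but advertised to clients as "get_user".
const toolRegistry: Record<string, ToolHandler> = { getuser: getUserHandler };

// Looks the tool up by name, the way a routed call would.
function dispatchTool(toolName: string, args: { userId: string }): string {
  const handler = toolRegistry[toolName];
  if (!handler) throw new Error(`Unknown tool: ${toolName}`);
  return handler(args);
}
```

A direct call such as getUserHandler({ userId: 'user-1' }) succeeds and hides the typo, while dispatchTool('get_user', ...) throws, which is what an integration test through the real protocol stack would report.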

Basic integration test setup

InMemoryTransport.createLinkedPair() returns two linked transports that route messages to each other in-process. The two ends are symmetric, so it does not matter which one you name serverTransport and which clientTransport: pass one to server.connect() and the other to client.connect().

// server.ts — your actual server factory
import { Server } from '@modelcontextprotocol/sdk/server/index.js';
import {
  CallToolRequestSchema,
  ListToolsRequestSchema,
} from '@modelcontextprotocol/sdk/types.js';

export interface Deps {
  db: { getUser: (id: string) => Promise<{ name: string } | null> };
}

export function createServer(deps: Deps) {
  const server = new Server(
    { name: 'user-service-mcp', version: '1.0.0' },
    { capabilities: { tools: {} } }
  );

  server.setRequestHandler(ListToolsRequestSchema, async () => ({
    tools: [
      {
        name: 'get_user',
        description: 'Fetch a user by their ID.',
        inputSchema: {
          type: 'object',
          properties: { userId: { type: 'string', description: 'The user UUID' } },
          required: ['userId'],
        },
      },
    ],
  }));

  server.setRequestHandler(CallToolRequestSchema, async (request) => {
    if (request.params.name === 'get_user') {
      const { userId } = request.params.arguments as { userId: string };
      const user = await deps.db.getUser(userId);
      if (!user) {
        return { content: [{ type: 'text', text: `No user found for ID ${userId}` }], isError: true };
      }
      return { content: [{ type: 'text', text: JSON.stringify(user) }] };
    }
    throw new Error(`Unknown tool: ${request.params.name}`);
  });

  return server;
}

// server.integration.test.ts
import { describe, it, expect, afterEach } from 'vitest';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { InMemoryTransport } from '@modelcontextprotocol/sdk/inMemory.js';
import { createServer } from './server.js';

// Fake dependency — no real database
const fakeDb = {
  getUser: async (id: string) =>
    id === 'user-1' ? { name: 'Alice' } : null,
};

describe('user-service-mcp integration', () => {
  let client: Client;

  afterEach(async () => {
    await client?.close();
  });

  async function connect() {
    const [serverTransport, clientTransport] = InMemoryTransport.createLinkedPair();
    const server = createServer({ db: fakeDb });
    await server.connect(serverTransport);
    client = new Client({ name: 'test-client', version: '1.0.0' }, { capabilities: {} });
    await client.connect(clientTransport);
    return client;
  }

  it('lists the get_user tool', async () => {
    const c = await connect();
    const { tools } = await c.listTools();
    expect(tools.map(t => t.name)).toContain('get_user');
  });

  it('returns user data for a known ID', async () => {
    const c = await connect();
    const result = await c.callTool({ name: 'get_user', arguments: { userId: 'user-1' } });
    expect(result.isError).toBeFalsy();
    const text = (result.content[0] as { text: string }).text;
    expect(JSON.parse(text)).toEqual({ name: 'Alice' });
  });

  it('returns isError for unknown user', async () => {
    const c = await connect();
    const result = await c.callTool({ name: 'get_user', arguments: { userId: 'does-not-exist' } });
    expect(result.isError).toBe(true);
  });
});
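The cast on result.content[0] in the tests above assumes the first content block is text. A small helper can make that assumption explicit and fail with a readable message when it does not hold. This is a sketch with locally defined structural types (ContentBlock and ToolResult are illustrative, not the SDK's exported types), and textOf is a hypothetical helper name.

```typescript
// Local structural types for illustration; the SDK exports richer ones.
type ContentBlock = { type: string; text?: string };
type ToolResult = { content?: ContentBlock[]; isError?: boolean };

// Returns the text of all text blocks joined by newlines,
// or throws if the result contains no text block at all.
function textOf(result: ToolResult): string {
  const texts: string[] = [];
  for (const block of result.content ?? []) {
    if (block.type === 'text' && typeof block.text === 'string') {
      texts.push(block.text);
    }
  }
  if (texts.length === 0) {
    throw new Error('Tool result contains no text content');
  }
  return texts.join('\n');
}
```

In a test, JSON.parse(textOf(result)) then replaces the inline cast, and a tool that unexpectedly returns an image or an empty content array produces a clear failure instead of a TypeError.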

Testing the initialize handshake

The MCP initialize handshake happens automatically when client.connect() is called: it negotiates the protocol version and capabilities between client and server. Most integration tests don't need to inspect it directly, but if your server advertises specific capabilities (like resources or prompts), verify they appear in the capabilities the client received during initialization.

it('advertises the tools capability', async () => {
  const c = await connect();
  // Populated from the server's initialize response once connect() resolves
  const serverCapabilities = c.getServerCapabilities();
  expect(serverCapabilities?.tools).toBeDefined();
  // getServerVersion() returns the name and version the server reported
  expect(c.getServerVersion()?.name).toBe('user-service-mcp');
});

A more practical test of the handshake is that client.connect() completes without throwing and client.listTools() returns without an error. If initialization fails (for example, the client and server cannot agree on a protocol version), the error surfaces here.

Testing tool input schema

Integration tests let you verify that the inputSchema your server advertises matches what it actually accepts. Fetch the schema from listTools() and assert the properties you depend on.

it('get_user inputSchema requires userId as string', async () => {
  const c = await connect();
  const { tools } = await c.listTools();
  const getUserTool = tools.find(t => t.name === 'get_user')!;
  expect(getUserTool.inputSchema).toMatchObject({
    type: 'object',
    properties: {
      userId: { type: 'string' },
    },
    required: ['userId'],
  });
});

This is the simplest form of contract testing — the test will fail if a future change accidentally removes userId from the required array or changes its type. For more systematic schema drift detection, see contract testing patterns that compare the current schema to a stored baseline.
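A baseline comparison can be sketched as a pure function that takes the stored schema and the one returned by listTools() and reports the differences. The ToolSchema type and the schemaDrift name are assumptions for illustration, and the check only covers top-level property types and the required array, not nested schemas.

```typescript
// Simplified view of a tool's inputSchema; real schemas can be richer.
type ToolSchema = {
  type: string;
  properties?: Record<string, { type?: string }>;
  required?: string[];
};

// Lists differences between a stored baseline and the current schema.
function schemaDrift(baseline: ToolSchema, current: ToolSchema): string[] {
  const problems: string[] = [];
  const before = baseline.properties ?? {};
  const after = current.properties ?? {};
  for (const prop of Object.keys(before)) {
    if (!(prop in after)) problems.push(`property removed: ${prop}`);
    else if (after[prop].type !== before[prop].type) problems.push(`type changed: ${prop}`);
  }
  const wasRequired = baseline.required ?? [];
  const nowRequired = current.required ?? [];
  for (const prop of wasRequired) {
    if (nowRequired.indexOf(prop) === -1) problems.push(`no longer required: ${prop}`);
  }
  for (const prop of nowRequired) {
    if (wasRequired.indexOf(prop) === -1) problems.push(`newly required: ${prop}`);
  }
  return problems;
}
```

A test would then assert that schemaDrift(storedBaseline, getUserTool.inputSchema) is an empty array, so any drift shows up as a list of named differences in the failure output.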

Dependency injection for integration tests

Pass dependencies through a constructor argument (the Deps interface in the example above) so tests can swap in fakes without patching modules. The real server in production receives the real database client; the test receives a fake object that implements the same interface.

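// Production bootstrap (your entry point): real deps
import { createServer } from './server.js';
import { createPool } from './db.js'; // your own module; its result must satisfy Deps['db']
const pool = await createPool(process.env.DATABASE_URL!);
const server = createServer({ db: pool });

// Test bootstrap (a separate file): fake deps
import { createServer } from './server.js';
const fakeDb = { getUser: async (_id: string) => ({ name: 'Test User' }) };
const server = createServer({ db: fakeDb });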

Fakes are preferable to mocks (spy-based assertions) for integration tests because they produce realistic behavior across the full call stack, not just a single assert that the mock was called. See MCP server test doubles for the distinction between stubs, fakes, and spies and when to use each.
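As an illustration of what "realistic behavior" means here, the sketch below is a stateful in-memory fake whose reads reflect earlier writes. getUser mirrors the Deps interface from the example above; saveUser and the createFakeDb factory are hypothetical additions for illustration only.

```typescript
type User = { name: string };

// Builds an independent in-memory fake. Each call copies the seed data,
// so two fakes never share state by accident.
function createFakeDb(seed: Record<string, User> = {}) {
  const users: Record<string, User> = { ...seed };
  return {
    // Same signature as Deps['db'].getUser in the server example
    getUser: async (id: string): Promise<User | null> => users[id] ?? null,
    // Hypothetical write method, not part of the original Deps interface
    saveUser: async (id: string, user: User): Promise<void> => {
      users[id] = user;
    },
  };
}
```

Calling createFakeDb() inside each test (or in a beforeEach hook) gives every test its own state, which a spy that only records calls cannot provide.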

Integration tests with real external dependencies in CI

If your handlers access a real PostgreSQL database or Redis cache, you have two options in CI: use a fake that implements the same interface (fast, no infra), or run the real service as a container next to the test job (slow but realistic). GitHub Actions supports service containers for exactly this.

# .github/workflows/test.yml
jobs:
  test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_DB: testdb
          POSTGRES_USER: testuser
          POSTGRES_PASSWORD: testpass
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '22' }
      - run: npm ci
      - run: npm test
        env:
          DATABASE_URL: postgres://testuser:testpass@localhost:5432/testdb

Keep the real-database tests in a separate suite, or gate them on the environment variable (for example with Vitest's describe.skipIf(!process.env.DATABASE_URL)), so the fast fake-based tests still run on every commit and the slow real-db tests run only on PRs or scheduled jobs.
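One way to keep that gating decision in a single place is a small predicate over the environment. This is a sketch: the realDbTestsEnabled name is hypothetical, and checking that the value looks like a PostgreSQL URL is an extra safety assumption rather than anything Vitest or GitHub Actions requires.

```typescript
// Decides whether the slow real-database suite should run.
// Takes the environment as a parameter so the function is easy to test.
function realDbTestsEnabled(env: Record<string, string | undefined>): boolean {
  const url = env.DATABASE_URL ?? '';
  // Accept only values that look like a PostgreSQL connection URL.
  return /^postgres(ql)?:\/\/.+/.test(url);
}
```

With Vitest, the real-database suite could then be wrapped as describe.skipIf(!realDbTestsEnabled(process.env))('real database', () => { ... }), so it is skipped locally and runs wherever DATABASE_URL is set.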

Testing long-running tools with progress notifications

Some MCP tools report incremental progress during a long-running call rather than staying silent until the final response. Progress travels as notifications/progress messages at the protocol level, so it works over InMemoryTransport the same way it does over the network: client.callTool() still resolves only when the call completes, but you can pass an onprogress callback in the request options to observe intermediate updates.

// Assumes your server registers an export_report tool that sends
// notifications/progress using the progressToken from the request's _meta.
it('reports progress during long-running export', async () => {
  const c = await connect();
  const progressUpdates: number[] = [];

  const result = await c.callTool(
    { name: 'export_report', arguments: { format: 'csv' } },
    undefined, // fall back to the default result schema
    {
      onprogress: (progress) => {
        // The callback receives the numeric progress value (and an optional total)
        progressUpdates.push(progress.progress);
      },
    }
  );

  expect(result.isError).toBeFalsy();
  // At least one progress event was emitted during the export
  expect(progressUpdates.length).toBeGreaterThan(0);
});
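If you collect the numeric progress field from each notification, a stricter assertion is possible: the MCP specification expects the progress value to increase with every notification. The helper below is a minimal sketch of that check; isStrictlyIncreasing is a hypothetical name, not an SDK function.

```typescript
// Returns true when every value is strictly greater than the one before it.
// An empty or single-element list counts as increasing.
function isStrictlyIncreasing(values: number[]): boolean {
  let previous = -Infinity;
  for (const value of values) {
    if (value <= previous) return false;
    previous = value;
  }
  return true;
}
```

In the progress test this becomes expect(isStrictlyIncreasing(collectedValues)).toBe(true), which catches a handler that reports the same value twice or resets its counter mid-call.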

Keeping integration tests fast

Integration tests can be slow if they set up real infrastructure per test. Keep them fast with three techniques: share one server and client per test file (not per test) when tests only read state, use fake dependencies for the default test suite and real dependencies only for a tagged subset, and give each test its own InMemoryTransport pair when you want tests within a file to run in parallel.

// Share one client across an entire describe block
import { describe, it, beforeAll, afterAll } from 'vitest';

describe('user-service-mcp', () => {
  let client: Client;

  beforeAll(async () => {
    const [serverTransport, clientTransport] = InMemoryTransport.createLinkedPair();
    // fakeDb is the same fake used in the basic setup example
    await createServer({ db: fakeDb }).connect(serverTransport);
    client = new Client({ name: 'test', version: '1.0.0' }, { capabilities: {} });
    await client.connect(clientTransport);
  });

  afterAll(async () => client.close());

  it('...', async () => { /* uses shared client */ });
  it('...', async () => { /* uses shared client */ });
});

For tests that mutate shared state (e.g., create then delete a record in the fake db), create a separate client per test to avoid inter-test interference. See MCP server parallel testing for how to shard a large test suite across workers.

What integration tests miss and AliveMCP catches in production

Integration tests with InMemoryTransport run entirely in-process. They can detect: wrong tool names in the registry, incorrect inputSchema declarations, handler logic bugs, and missing error handling. What they cannot detect is whether the deployed server accepts real TCP connections, whether the TLS certificate is valid, whether the HTTP process started on the expected port, or whether a cloud deployment broke the network path between the registry and the endpoint. AliveMCP probes the live MCP initialize handshake over the network every 60 seconds and alerts you when the infrastructure fails — the gap integration tests leave uncovered.