MCP server test doubles

MCP server tests need to run fast, without real databases or external APIs. Test doubles — fakes, stubs, and spies — replace real dependencies so tool handlers can be tested in isolation. With InMemoryTransport providing a real MCP protocol layer, you only need test doubles for the external dependencies your handlers call: databases, HTTP APIs, file systems, and queues. The protocol itself doesn't need to be mocked.

TL;DR

Use fakes (an in-memory Map or array that implements the same interface as your real database) as the default across your test suites: they're fast and don't require module patching. Use stubs (vi.fn().mockResolvedValue()) when a dependency only needs to return a canned value and doesn't have to hold state. Use spies (vi.fn() or vi.spyOn()) when you need to assert that an external call was made with the right arguments, such as email sends, webhook fires, and audit log writes. Avoid mocking the MCP SDK itself; use a real Server + Client + InMemoryTransport for the protocol layer.

Types of test doubles

Type	What it is	When to use for MCP
Fake	A working mini-implementation — e.g., an in-memory Map that acts like a database	Default. Use for all handler dependencies across your entire test suite.
Stub	A function that returns a hardcoded value, ignoring its inputs	Single-use dependencies: a pricing API that always returns $9.99 in tests.
Spy	A wrapper that records calls (arguments, count) and optionally controls return values	Side-effect dependencies: assert that a webhook was fired with the correct payload.
Mock	A spy with pre-programmed behavior expectations (verify at the end)	Rare. Use when call order matters; prefer fakes + spies for most MCP tests.
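The "Mock" row is about call order, which is easy to check without a mocking library. Here is a minimal sketch, assuming a hypothetical createCallLog helper and an illustrative createUser function (neither comes from the MCP SDK or Vitest): every wrapped dependency appends to one shared log, and the test compares the recorded order.

```typescript
// Hypothetical sketch: verify call order with a shared log instead of a mocking library.
type CallRecord = { name: string; args: unknown[] };

function createCallLog() {
  const calls: CallRecord[] = [];
  return {
    calls,
    // Wraps a function so every invocation is appended to the shared log.
    wrap<A extends unknown[], R>(name: string, fn: (...args: A) => R) {
      return (...args: A): R => {
        calls.push({ name, args });
        return fn(...args);
      };
    },
    // Dependency names in the order they were called.
    order(): string[] {
      return calls.map((c) => c.name);
    },
  };
}

// Illustrative handler logic: persist first, then send the email.
async function createUser(
  deps: {
    saveUser: (email: string) => Promise<void>;
    sendWelcome: (to: string) => Promise<void>;
  },
  email: string,
): Promise<void> {
  await deps.saveUser(email);
  await deps.sendWelcome(email);
}

const log = createCallLog();
const orderedDeps = {
  saveUser: log.wrap('saveUser', async (_email: string) => {}),
  sendWelcome: log.wrap('sendWelcome', async (_to: string) => {}),
};
```

After awaiting createUser(orderedDeps, 'a@example.com'), log.order() lists saveUser before sendWelcome. Reach for this only when the order is part of the contract; otherwise a fake plus a spy is usually enough.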

Dependency injection: the prerequisite

Test doubles only work if your server accepts its dependencies from outside rather than constructing them internally. The pattern: define a Deps interface, accept it as a parameter of your server factory function, and pass real implementations in production and test doubles in tests.

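// server.ts
import { Server } from '@modelcontextprotocol/sdk/server/index.js';
import { CallToolRequestSchema } from '@modelcontextprotocol/sdk/types.js';

// Shape inferred from the create_user handler below.
export interface User {
  id: string;
  name: string;
  email: string;
}

export interface Deps {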
  db: {
    getUser: (id: string) => Promise<User | null>;
    saveUser: (user: User) => Promise<void>;
  };
  email: {
    sendWelcome: (to: string) => Promise<void>;
  };
  analytics: {
    track: (event: string, props: Record<string, unknown>) => void;
  };
}

export function createServer(deps: Deps): Server {
  const server = new Server(/* ... */);

  server.setRequestHandler(CallToolRequestSchema, async (request) => {
    if (request.params.name === 'create_user') {
      const { name, email } = request.params.arguments as { name: string; email: string };
      const user = { id: crypto.randomUUID(), name, email };
      await deps.db.saveUser(user);
      await deps.email.sendWelcome(email);
      deps.analytics.track('user_created', { userId: user.id });
      return { content: [{ type: 'text', text: JSON.stringify(user) }] };
    }
    // ...
  });

  return server;
}
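Because every test has to pass a complete Deps object, a small defaults helper can keep setup short. The sketch below is one way to do it, not part of the SDK: makeTestDeps is a hypothetical name, and Deps and User are repeated (with User's shape inferred from the handler above) so the block stands alone.

```typescript
// Hypothetical helper: no-op defaults for every dependency, overridable per test.
interface User { id: string; name: string; email: string }

interface Deps {
  db: {
    getUser: (id: string) => Promise<User | null>;
    saveUser: (user: User) => Promise<void>;
  };
  email: { sendWelcome: (to: string) => Promise<void> };
  analytics: { track: (event: string, props: Record<string, unknown>) => void };
}

function makeTestDeps(overrides: Partial<Deps> = {}): Deps {
  return {
    db: { getUser: async () => null, saveUser: async () => {} },
    email: { sendWelcome: async () => {} },
    analytics: { track: () => {} },
    ...overrides, // a test replaces only the dependency it cares about
  };
}

// Usage: record analytics events, keep everything else silent.
const tracked: string[] = [];
const deps = makeTestDeps({
  analytics: { track: (event) => { tracked.push(event); } },
});
deps.analytics.track('user_created', { userId: 'u1' });
```

Overrides replace a whole dependency (the entire db object, for instance), which keeps the helper simple and avoids half-real, half-fake objects.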

Fakes: the default test double

A fake is a lightweight working implementation that behaves like the real thing but uses in-memory storage. The real database uses PostgreSQL; the fake uses a JavaScript Map. The key property: a fake holds state and returns realistic values, so you can test multiple tools interacting through shared state.

// test/fakes/fake-db.ts
// Path assumes server.ts sits at the project root and also exports the User type.
import type { Deps, User } from '../../server.js';

export function createFakeDb(): Deps['db'] {
  const users = new Map<string, User>();

  return {
    async getUser(id) {
      return users.get(id) ?? null;
    },
    async saveUser(user) {
      users.set(user.id, user);
    },
  };
}

The fake can be reset between tests by creating a new instance per test or by clearing the Map in beforeEach. With createFakeDb(), a new instance per createServer() call gives full test isolation without module-level cleanup.

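import { afterEach, beforeEach, describe, expect, it } from 'vitest';
// McpTestClient and createMcpTestClient are test helpers not shown in this section.

describe('user tools', () => {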
  let client: McpTestClient;

  beforeEach(async () => {
    // Fresh fake db per test — no state leaks
    const db = createFakeDb();
    client = await createMcpTestClient(createServer, {
      db,
      email: { sendWelcome: async () => {} }, // silent stub
      analytics: { track: () => {} },          // no-op stub
    });
  });

  afterEach(() => client.close());

  it('creates and retrieves a user via two separate tool calls', async () => {
    // create_user stores into the fake db
    const created = await client.callToolJson<User>(
      'create_user', { name: 'Alice', email: 'alice@example.com' }
    );
    // get_user reads from the same fake db instance
    const fetched = await client.callToolJson<User>(
      'get_user', { userId: created.id }
    );
    expect(fetched.name).toBe('Alice');
  });
});
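One pitfall with Map-backed fakes: a Map stores object references, so a test that mutates a returned user silently changes the "database" too, which a real database would not allow. A possible guard is to copy on write and on read. The sketch below uses structuredClone (a global in Node 17 and later); createCopyingFakeDb is a hypothetical name and User's shape is assumed.

```typescript
interface User { id: string; name: string; email: string }

// Hypothetical variant of the fake db that never shares object references with callers.
function createCopyingFakeDb() {
  const users = new Map<string, User>();
  return {
    async getUser(id: string): Promise<User | null> {
      const found = users.get(id);
      // Return a copy so callers cannot mutate stored state.
      return found ? structuredClone(found) : null;
    },
    async saveUser(user: User): Promise<void> {
      // Store a copy so later changes to the caller's object are not persisted.
      users.set(user.id, structuredClone(user));
    },
  };
}
```

With this variant, changing a user object after saveUser, or after getUser returns it, leaves the stored record untouched, which is closer to how a real database round trip behaves.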

Stubs: single-dependency canned returns

A stub returns a fixed value regardless of its input. Use Vitest's vi.fn() when a dependency only needs to return a canned value and doesn't have to track state, for example an external pricing API that your tool calls to look up the current subscription price. The example below assumes Deps has been extended with a pricing dependency.

import { expect, it, vi } from 'vitest';

it('formats the price returned by the pricing API', async () => {
  const stubPricing = {
    // Returns $9.99 for any plan ID
    getPrice: vi.fn().mockResolvedValue({ amount: 9.99, currency: 'USD' }),
  };

  const client = await createMcpTestClient(createServer, {
    db: createFakeDb(),
    pricing: stubPricing,
    // ...
  });

  const result = await client.callToolText('get_plan_price', { planId: 'author' });
  expect(result).toContain('$9.99');
});

Stubs are intentionally dumb — they don't validate that planId was passed correctly. If you need to assert that the stub was called with the right arguments, upgrade to a spy.
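The handler behind get_plan_price isn't shown in this guide, so here is a minimal sketch of the kind of logic the stub feeds. PricingDep, Price, and formatPlanPrice are assumptions made for illustration, and the stub is a plain async function rather than vi.fn() so the block runs without Vitest.

```typescript
interface Price { amount: number; currency: string }
interface PricingDep { getPrice: (planId: string) => Promise<Price> }

// Hypothetical handler logic: look up the price and format it for display.
async function formatPlanPrice(pricing: PricingDep, planId: string): Promise<string> {
  const { amount, currency } = await pricing.getPrice(planId);
  const formatted = new Intl.NumberFormat('en-US', {
    style: 'currency',
    currency,
  }).format(amount);
  return `Plan ${planId}: ${formatted}`;
}

// A hand-written stub: the same canned answer for every plan ID.
const stubPricing: PricingDep = {
  getPrice: async () => ({ amount: 9.99, currency: 'USD' }),
};
```

Calling formatPlanPrice(stubPricing, 'author') resolves to a string containing $9.99, and so does any other plan ID, which is exactly the blind spot described above.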

Spies: asserting side effects

MCP tools often have side effects: sending an email, firing a webhook, writing to an audit log. Spies let you assert these side effects happened with the right arguments without needing a real email server or webhook receiver.

import { expect, it, vi } from 'vitest';

it('sends a welcome email when create_user is called', async () => {
  const emailSpy = {
    sendWelcome: vi.fn().mockResolvedValue(undefined),
  };

  const client = await createMcpTestClient(createServer, {
    db: createFakeDb(),
    email: emailSpy,
    analytics: { track: () => {} },
  });

  await client.callToolText('create_user', { name: 'Bob', email: 'bob@example.com' });

  expect(emailSpy.sendWelcome).toHaveBeenCalledOnce();
  expect(emailSpy.sendWelcome).toHaveBeenCalledWith('bob@example.com');
});

it('tracks an analytics event on user creation', async () => {
  const trackSpy = vi.fn();
  const client = await createMcpTestClient(createServer, {
    db: createFakeDb(),
    email: { sendWelcome: async () => {} },
    analytics: { track: trackSpy },
  });

  const user = await client.callToolJson<User>(
    'create_user', { name: 'Carol', email: 'carol@example.com' }
  );

  expect(trackSpy).toHaveBeenCalledWith('user_created', {
    userId: user.id,
  });
});
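A spy and a fake can also be combined: record the calls, but still delegate to the fake so state is kept. That is roughly the idea behind vi.spyOn, which records calls while keeping the original behavior by default. The sketch below does it by hand; withSaveSpy is a hypothetical helper and the fake is repeated so the block stands alone.

```typescript
interface User { id: string; name: string; email: string }

function createFakeDb() {
  const users = new Map<string, User>();
  return {
    async getUser(id: string): Promise<User | null> {
      return users.get(id) ?? null;
    },
    async saveUser(user: User): Promise<void> {
      users.set(user.id, user);
    },
  };
}

// Hypothetical helper: records every saveUser call, then delegates to the fake
// so the user is still stored and later reads see it.
function withSaveSpy(db: ReturnType<typeof createFakeDb>) {
  const saved: User[] = [];
  return {
    saved,
    db: {
      getUser: db.getUser,
      saveUser: async (user: User): Promise<void> => {
        saved.push(user);
        await db.saveUser(user);
      },
    },
  };
}
```

Pass the returned db into your server factory and assert on saved afterwards: you get call assertions without losing the cross-tool state that makes fakes useful.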

Spy on error paths

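Spies are especially useful for testing that error paths trigger the right side effects. If your handler calls a Sentry-like error reporter on unexpected exceptions, assert that the reporter was called with the error instance. The test below assumes Deps also carries an errorTracker dependency and that the handler catches the failure and returns an isError result.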

it('reports unexpected errors to the error tracker', async () => {
  const reportError = vi.fn();
  const breakingDb = {
    getUser: vi.fn().mockRejectedValue(new Error('ECONNRESET')),
    saveUser: vi.fn(),
  };

  const client = await createMcpTestClient(createServer, {
    db: breakingDb,
    email: { sendWelcome: async () => {} },
    analytics: { track: () => {} },
    errorTracker: { report: reportError },
  });

  // The tool call should handle the error and return isError
  const result = await client.client.callTool({ name: 'get_user', arguments: { userId: 'x' } });
  expect(result.isError).toBe(true);

  // The error was reported upstream
  expect(reportError).toHaveBeenCalledOnce();
  expect(reportError.mock.calls[0][0]).toBeInstanceOf(Error);
});
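For that test to pass, something in the handler has to catch the exception, report it, and turn it into an isError result. One possible shape is a small wrapper around the tool logic. This is a sketch under assumptions: withErrorReporting, ErrorTracker, and the simplified ToolResult type are hypothetical names, not SDK exports.

```typescript
// Simplified stand-in for a tool result; the real SDK type has more content variants.
type ToolResult = {
  content: { type: 'text'; text: string }[];
  isError?: boolean;
};

interface ErrorTracker { report: (error: unknown) => void }

// Hypothetical wrapper: report unexpected exceptions, then return an isError
// result instead of letting the exception escape the handler.
function withErrorReporting<A>(
  tracker: ErrorTracker,
  handler: (args: A) => Promise<ToolResult>,
): (args: A) => Promise<ToolResult> {
  return async (args) => {
    try {
      return await handler(args);
    } catch (error) {
      tracker.report(error);
      const message = error instanceof Error ? error.message : String(error);
      return { content: [{ type: 'text', text: `Error: ${message}` }], isError: true };
    }
  };
}
```

Keeping the reporting in one wrapper means a single spy-based test covers the error path for every tool that uses it.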

Why you don't need to mock the protocol layer

A common mistake when first testing MCP servers is to mock the Server class or the transport layer. Don't. InMemoryTransport is already a test double for the real network transport: it routes messages in-process, with no network latency. Using it gives you a real protocol stack (real JSON-RPC messages, real request routing, real capability negotiation) with the speed of an in-process call. Mocking the Server itself only tests your test setup, not your server code.

Layer	Test double to use	Rationale
MCP transport (network)	`InMemoryTransport` (SDK built-in)	Provided by the SDK; no need to mock
Database	Fake (in-memory Map)	Stateful; needs to behave like the real thing
External HTTP API	Stub (canned returns) or Spy (assert calls)	No state; call assertions are the goal
Email / Slack sender	Spy (verify it was called)	Side effect — must assert it fired
Cache (Redis)	Fake (in-memory Map with TTL simulation)	Read/write state needs to persist across calls
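The cache row mentions TTL simulation. A minimal sketch of such a fake, assuming a hypothetical get/set interface (your real cache wrapper may look different): expiry is driven by an injectable clock, so tests can advance time without sleeping.

```typescript
// Hypothetical fake cache: a Map plus expiry timestamps, driven by an injectable clock.
function createFakeCache(now: () => number = Date.now) {
  const entries = new Map<string, { value: string; expiresAt: number }>();
  return {
    async set(key: string, value: string, ttlMs: number): Promise<void> {
      entries.set(key, { value, expiresAt: now() + ttlMs });
    },
    async get(key: string): Promise<string | null> {
      const entry = entries.get(key);
      if (!entry) return null;
      if (now() >= entry.expiresAt) {
        entries.delete(key); // lazily evict expired entries on read
        return null;
      }
      return entry.value;
    },
  };
}

// Usage: a manual clock that the test controls.
let currentTime = 0;
const cache = createFakeCache(() => currentTime);
```

A test can set a key with a 1000 ms TTL, move currentTime forward, and check that the value disappears once the TTL has elapsed, with no timers or real waiting involved.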

Where test doubles stop and AliveMCP starts

Test doubles replace real dependencies to make tests fast and deterministic. But the gap they leave is the real dependencies themselves — the production database, the real email provider, the actual network path to your deployed MCP endpoint. AliveMCP probes the live production endpoint every 60 seconds, verifying that the real infrastructure your server depends on is healthy. Tests assert your logic is correct; AliveMCP asserts the deployed system is reachable.