
MCP server test doubles

MCP server tests need to run fast, without real databases or external APIs. Test doubles — fakes, stubs, and spies — replace real dependencies so tool handlers can be tested in isolation. With InMemoryTransport providing a real MCP protocol layer, you only need test doubles for the external dependencies your handlers call: databases, HTTP APIs, file systems, and queues. The protocol itself doesn't need to be mocked.

TL;DR

Use fakes (in-memory Map or array that implements the same interface as your real database) as the default for all test suites — they're fast and don't require module patching. Use stubs (vi.fn().mockResolvedValue()) when a dependency is only called once and you don't need it to hold state. Use spies (vi.spyOn()) when you need to assert that an external call was made with the right arguments — email sends, webhook fires, audit log writes. Avoid mocking the MCP SDK itself; use real Server + Client + InMemoryTransport for the protocol layer.

Types of test doubles

| Type | What it is | When to use for MCP |
|---|---|---|
| Fake | A working mini-implementation, e.g. an in-memory Map that acts like a database | Default choice for handler dependencies across your test suite. |
| Stub | A function that returns a hardcoded value, ignoring its inputs | Single-use dependencies: a pricing API that always returns $9.99 in tests. |
| Spy | A wrapper that records calls (arguments, count) and optionally controls return values | Side-effect dependencies: assert that a webhook was fired with the correct payload. |
| Mock | A spy programmed in advance with the calls it expects, verified at the end of the test | Rare. Use when call order matters; prefer fakes + spies for most MCP tests. |
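Mocks are the only type in this table without an example later in the guide. The sketch below is a dependency-free illustration of the idea, not Vitest's API: the `AuditLog` interface and the `createMockAudit`, `expectCall`, and `verify` names are hypothetical. Expected calls are programmed up front, actual calls are recorded, and `verify()` fails if the two sequences differ, including in order.

```typescript
// A hypothetical audit-log dependency that a tool handler might call.
interface AuditLog {
  write(action: string, userId: string): void;
}

// A hand-rolled mock: expectations are programmed up front, calls are
// recorded as they happen, and verify() compares the two at the end.
function createMockAudit() {
  const expected: Array<[string, string]> = [];
  const actual: Array<[string, string]> = [];

  const mock: AuditLog & {
    expectCall(action: string, userId: string): void;
    verify(): void;
  } = {
    write(action, userId) {
      actual.push([action, userId]);
    },
    expectCall(action, userId) {
      expected.push([action, userId]);
    },
    verify() {
      // Order matters: the recorded sequence must match the programmed one exactly.
      const want = JSON.stringify(expected);
      const got = JSON.stringify(actual);
      if (want !== got) {
        throw new Error(`Expected calls ${want} but got ${got}`);
      }
    },
  };
  return mock;
}

// Usage: program the expected order, run the code under test, then verify.
const audit = createMockAudit();
audit.expectCall('lock', 'u1');
audit.expectCall('delete', 'u1');

audit.write('lock', 'u1');
audit.write('delete', 'u1');
audit.verify(); // passes; swapping the two write() calls would make it throw
```

This extra ceremony is why the table calls mocks rare: a spy plus a couple of targeted assertions usually says the same thing with less setup.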

Dependency injection: the prerequisite

Test doubles only work if your server accepts its dependencies from outside rather than constructing them internally. The pattern: define a Deps interface, accept it as a parameter of your server factory function, and pass the real implementations in production and test doubles in tests.

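// server.ts
import { randomUUID } from 'node:crypto';
import { Server } from '@modelcontextprotocol/sdk/server/index.js';
import { CallToolRequestSchema } from '@modelcontextprotocol/sdk/types.js';

export interface User {
  id: string;
  name: string;
  email: string;
}

export interface Deps {
  db: {
    getUser: (id: string) => Promise<User | null>;
    saveUser: (user: User) => Promise<void>;
  };
  email: {
    sendWelcome: (to: string) => Promise<void>;
  };
  analytics: {
    track: (event: string, props: Record<string, unknown>) => void;
  };
  // Optional dependencies used by the get_plan_price and error-path examples
  pricing?: {
    getPrice: (planId: string) => Promise<{ amount: number; currency: string }>;
  };
  errorTracker?: {
    report: (error: unknown) => void;
  };
}

export function createServer(deps: Deps): Server {
  const server = new Server(/* ... */);

  server.setRequestHandler(CallToolRequestSchema, async (request) => {
    if (request.params.name === 'create_user') {
      const { name, email } = request.params.arguments as { name: string; email: string };
      const user: User = { id: randomUUID(), name, email };
      await deps.db.saveUser(user);                              // persistence
      await deps.email.sendWelcome(email);                       // side effect: email
      deps.analytics.track('user_created', { userId: user.id }); // fire-and-forget
      return { content: [{ type: 'text', text: JSON.stringify(user) }] };
    }
    // ...
  });

  return server;
}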
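Because the handler only talks to the Deps interface, the tool logic can also be pulled out into a plain function that is testable with no MCP plumbing at all. This is a sketch under that assumption: `createUserTool` and the trimmed-down `UserDeps` type are hypothetical names, and the SDK registration would stay in createServer.

```typescript
import { randomUUID } from 'node:crypto';

interface User {
  id: string;
  name: string;
  email: string;
}

// A trimmed-down Deps containing only what this one tool uses.
// The full Deps object is structurally assignable to it.
interface UserDeps {
  db: { saveUser: (user: User) => Promise<void> };
  email: { sendWelcome: (to: string) => Promise<void> };
  analytics: { track: (event: string, props: Record<string, unknown>) => void };
}

// The create_user body as a plain async function: arguments in, MCP-shaped result out.
export async function createUserTool(
  deps: UserDeps,
  args: { name: string; email: string },
) {
  const user: User = { id: randomUUID(), name: args.name, email: args.email };
  await deps.db.saveUser(user);
  await deps.email.sendWelcome(args.email);
  deps.analytics.track('user_created', { userId: user.id });
  return { content: [{ type: 'text' as const, text: JSON.stringify(user) }] };
}
```

With this split, the create_user branch in createServer reduces to a single call such as `return createUserTool(deps, request.params.arguments as { name: string; email: string });`, and the protocol-level tests keep covering argument parsing and routing.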

Fakes: the default test double

A fake is a lightweight working implementation that behaves like the real thing but uses in-memory storage. The real database uses PostgreSQL; the fake uses a JavaScript Map. The key property: because a fake keeps state, it returns realistic values and lets you test multiple tools that interact through that shared state.

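// test/fakes/fake-db.ts
// Assumes server.ts sits at the project root and exports User alongside Deps.
import type { Deps, User } from '../../server.js';

export function createFakeDb(): Deps['db'] {
  // Backing store (user id -> user record), private to this fake instance
  const users = new Map<string, User>();

  return {
    async getUser(id) {
      // Mirror the real contract: null when the user doesn't exist
      return users.get(id) ?? null;
    },
    async saveUser(user) {
      users.set(user.id, user);
    },
  };
}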

You can reset a fake between tests in two ways: create a new instance per test, or expose a method that clears its Map and call it in beforeEach. Because createFakeDb() builds a new Map on every call, creating a fresh fake for each createServer() call gives full test isolation without module-level cleanup.

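import { describe, it, expect, beforeEach, afterEach } from 'vitest';
// Paths assume this test file lives in test/, next to test/fakes/
import { createServer, type User } from '../server.js';
import { createFakeDb } from './fakes/fake-db.js';
// createMcpTestClient / McpTestClient: your InMemoryTransport-based test helper (not shown here)

describe('user tools', () => {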
  let client: McpTestClient;

  beforeEach(async () => {
    // Fresh fake db per test — no state leaks
    const db = createFakeDb();
    client = await createMcpTestClient(createServer, {
      db,
      email: { sendWelcome: async () => {} }, // silent stub
      analytics: { track: () => {} },          // no-op stub
    });
  });

  afterEach(() => client.close());

  it('creates and retrieves a user via two separate tool calls', async () => {
    // create_user stores into the fake db
    const created = await client.callToolJson<User>(
      'create_user', { name: 'Alice', email: 'alice@example.com' }
    );
    // get_user reads from the same fake db instance
    const fetched = await client.callToolJson<User>(
      'get_user', { userId: created.id }
    );
    expect(fetched.name).toBe('Alice');
  });
});
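The other reset strategy, one fake per file cleared in beforeEach, only works if the fake exposes a way to empty its storage, because the Map inside createFakeDb() is private to its closure. A minimal sketch with a hypothetical `reset()` method and local copies of the User and db types:

```typescript
interface User {
  id: string;
  name: string;
  email: string;
}

interface UserDb {
  getUser: (id: string) => Promise<User | null>;
  saveUser: (user: User) => Promise<void>;
}

// Same storage approach as createFakeDb, plus a test-only reset() that empties the Map.
export function createResettableFakeDb(): UserDb & { reset(): void } {
  const users = new Map<string, User>();

  return {
    async getUser(id) {
      return users.get(id) ?? null;
    },
    async saveUser(user) {
      users.set(user.id, user);
    },
    reset() {
      users.clear();
    },
  };
}
```

A test file would then create the fake once and register `beforeEach(() => db.reset())`. A fresh instance per test, as in the example above, remains the simpler default because it cannot be forgotten.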

Stubs: single-dependency canned returns

A stub returns a fixed value regardless of its input. Use Vitest's vi.fn() when a dependency is only exercised once in a test and you don't need it to track state — for example, an external pricing API that your tool calls to look up the current subscription price.

import { vi, it, expect } from 'vitest';

it('formats the price returned by the pricing API', async () => {
  const stubPricing = {
    // Returns $9.99 for any plan ID
    getPrice: vi.fn().mockResolvedValue({ amount: 9.99, currency: 'USD' }),
  };

  const client = await createMcpTestClient(createServer, {
    db: createFakeDb(),
    pricing: stubPricing,
    // ...
  });

  const result = await client.callToolText('get_plan_price', { planId: 'author' });
  expect(result).toContain('$9.99');
});

Stubs are intentionally dumb: they don't check whether planId was passed correctly. Because vi.fn() also records its calls, upgrading this stub to a spy only takes an assertion such as expect(stubPricing.getPrice).toHaveBeenCalledWith('author').
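Under the hood, the only difference between a stub and a spy is whether calls are recorded. This dependency-free sketch makes that concrete; `createSpy` is a hypothetical helper, not a Vitest API. It wraps an implementation, returns whatever the implementation returns (a canned value, stub-style), and stores each call's arguments the way vi.fn() does in its `mock.calls` array.

```typescript
// Wraps an implementation and records the arguments of every call.
export function createSpy<A extends unknown[], R>(impl: (...args: A) => R) {
  const calls: A[] = [];
  const spy = (...args: A): R => {
    calls.push(args);
    return impl(...args);
  };
  return Object.assign(spy, { calls });
}

// Stub behavior: ignores the plan ID and always resolves to the same price...
const getPrice = createSpy(async (_planId: string) => ({ amount: 9.99, currency: 'USD' }));

// ...spy behavior: it can also answer "was I called, and with what?"
async function demo() {
  const price = await getPrice('author');
  console.log(price.amount, getPrice.calls); // 9.99 [ [ 'author' ] ]
}
demo();
```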

Spies: asserting side effects

MCP tools often have side effects: sending an email, firing a webhook, writing to an audit log. Spies let you assert these side effects happened with the right arguments without needing a real email server or webhook receiver.

import { vi, it, expect } from 'vitest';

it('sends a welcome email when create_user is called', async () => {
  const emailSpy = {
    sendWelcome: vi.fn().mockResolvedValue(undefined),
  };

  const client = await createMcpTestClient(createServer, {
    db: createFakeDb(),
    email: emailSpy,
    analytics: { track: () => {} },
  });

  await client.callToolText('create_user', { name: 'Bob', email: 'bob@example.com' });

  expect(emailSpy.sendWelcome).toHaveBeenCalledOnce();
  expect(emailSpy.sendWelcome).toHaveBeenCalledWith('bob@example.com');
});

it('tracks an analytics event on user creation', async () => {
  const trackSpy = vi.fn();
  const client = await createMcpTestClient(createServer, {
    db: createFakeDb(),
    email: { sendWelcome: async () => {} },
    analytics: { track: trackSpy },
  });

  const user = await client.callToolJson<User>(
    'create_user', { name: 'Carol', email: 'carol@example.com' }
  );

  expect(trackSpy).toHaveBeenCalledWith('user_created', {
    userId: user.id,
  });
});

Spy on error paths

Spies are especially useful for testing that error paths trigger the right side effects. If your handler calls a Sentry-like error reporter on unexpected exceptions, assert that the reporter was called with the error instance.

it('reports unexpected errors to the error tracker', async () => {
  const reportError = vi.fn();
  const breakingDb = {
    getUser: vi.fn().mockRejectedValue(new Error('ECONNRESET')),
    saveUser: vi.fn(),
  };

  const client = await createMcpTestClient(createServer, {
    db: breakingDb,
    email: { sendWelcome: async () => {} },
    analytics: { track: () => {} },
    errorTracker: { report: reportError },
  });

  // The tool call should handle the error and return isError
  const result = await client.client.callTool({ name: 'get_user', arguments: { userId: 'x' } });
  expect(result.isError).toBe(true);

  // The error was reported upstream
  expect(reportError).toHaveBeenCalledOnce();
  expect(reportError.mock.calls[0][0]).toBeInstanceOf(Error);
});
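The test above assumes the handler catches its own exceptions, reports them, and returns a tool result with isError: true instead of letting the error escape as a protocol error. One way to get that behavior consistently across tools is a small wrapper. This is a sketch with hypothetical names (`withErrorReporting`, `ToolResult`, `ErrorTracker`), not an SDK API:

```typescript
interface ToolResult {
  content: Array<{ type: 'text'; text: string }>;
  isError?: boolean;
}

interface ErrorTracker {
  report: (error: unknown) => void;
}

// Wraps a tool body so unexpected exceptions are reported upstream and
// converted into a tool-level error result that the client can inspect.
export function withErrorReporting<A>(
  tracker: ErrorTracker,
  body: (args: A) => Promise<ToolResult>,
): (args: A) => Promise<ToolResult> {
  return async (args) => {
    try {
      return await body(args);
    } catch (error) {
      tracker.report(error);
      const message = error instanceof Error ? error.message : String(error);
      return { isError: true, content: [{ type: 'text', text: `Tool failed: ${message}` }] };
    }
  };
}
```

Keeping the catch-and-report logic in one wrapper means the spy assertion in the test above checks one code path rather than a try/catch duplicated in every tool.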

Why you don't need to mock InMemoryTransport

A common mistake when first testing MCP servers is to mock the Server class or the transport layer. Don't. InMemoryTransport is already a test double for the real network transport — it routes messages in-process with zero latency. Using it gives you a real protocol stack (real JSON-RPC encoding, real request routing, real capability negotiation) with the speed of an in-process call. Mocking the Server itself only tests your test setup, not your server code.

| Layer | Test double to use | Rationale |
|---|---|---|
| MCP transport (network) | InMemoryTransport (SDK built-in) | Provided by the SDK; no need to mock |
| Database | Fake (in-memory Map) | Stateful; needs to behave like the real thing |
| External HTTP API | Stub (canned returns) or spy (assert calls) | Usually stateless; a stub supplies return values, a spy lets you assert the call itself |
| Email / Slack sender | Spy (verify it was called) | Side effect; the test must assert that it fired |
| Cache (Redis) | Fake (in-memory Map with TTL simulation) | Read/write state needs to persist across calls |
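The cache row is the least obvious of these. A minimal sketch of a fake Redis-style cache with TTL simulation, assuming a hypothetical `Cache` interface (`get`, and `set` with a `ttlMs` argument) and an injected `now()` clock so that expiry is deterministic instead of depending on real time:

```typescript
interface Cache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlMs: number): Promise<void>;
}

// In-memory fake: each entry stores its expiry time, measured by the
// injected clock rather than the wall clock.
export function createFakeCache(now: () => number): Cache {
  const entries = new Map<string, { value: string; expiresAt: number }>();

  return {
    async get(key) {
      const entry = entries.get(key);
      if (!entry) return null;
      if (now() >= entry.expiresAt) {
        entries.delete(key); // evict lazily when an expired key is read
        return null;
      }
      return entry.value;
    },
    async set(key, value, ttlMs) {
      entries.set(key, { value, expiresAt: now() + ttlMs });
    },
  };
}

// Usage: advance a manual clock instead of sleeping.
let clock = 0;
const cache = createFakeCache(() => clock);

async function demo() {
  await cache.set('session:1', 'alice', 1_000);
  clock = 999;
  console.log(await cache.get('session:1')); // alice
  clock = 1_000;
  console.log(await cache.get('session:1')); // null
}
demo();
```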

Where test doubles stop and AliveMCP starts

Test doubles replace real dependencies to make tests fast and deterministic. But the gap they leave is the real dependencies themselves — the production database, the real email provider, the actual network path to your deployed MCP endpoint. AliveMCP probes the live production endpoint every 60 seconds, verifying that the real infrastructure your server depends on is healthy. Tests assert your logic is correct; AliveMCP asserts the deployed system is reachable.

Related questions

When should I prefer a fake over a stub?

Use a fake when the dependency holds state that multiple operations share. A fake database lets you call create_user and then get_user in the same test, and the second call finds the record the first call saved. A stub returns canned values regardless of inputs — it can't simulate "save then retrieve". Rule of thumb: if two tool calls need to interact through the dependency, use a fake. If only one tool call uses the dependency and you just need it to return something, use a stub.

Should I use vi.mock() to mock the database module?

Avoid module-level mocking (vi.mock('./db.js')) when you have dependency injection. Module mocks apply globally to the entire test file and require careful reset logic. DI fakes are explicit per-test and don't interact with each other. Use vi.mock() only for modules you can't inject — legacy code with static imports, third-party libraries that construct their own dependencies internally, or Node built-ins like fs and crypto.

How do I test a tool that uses setTimeout or Date.now()?

Pass time-related functions as dependencies: deps.now = () => Date.now() and deps.sleep = (ms: number) => new Promise(r => setTimeout(r, ms)). In tests, replace them with deterministic stubs: now: vi.fn().mockReturnValue(1700000000000) and sleep: vi.fn().mockResolvedValue(undefined). This is cleaner than Vitest's fake timers for MCP handlers because the async call stack in handlers can interact unexpectedly with global timer mocking.
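A sketch of that pattern, assuming a hypothetical helper that retries a flaky call with a delay and stamps the result with the completion time. `TimeDeps`, `realTime`, `fetchWithRetry`, and the retry policy are illustrative; the point is that `now` and `sleep` arrive as parameters, so a test can swap in deterministic versions without touching global timers.

```typescript
interface TimeDeps {
  now: () => number;
  sleep: (ms: number) => Promise<void>;
}

// Production wiring: real clock, real timers.
export const realTime: TimeDeps = {
  now: () => Date.now(),
  sleep: (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
};

// Retries `call` up to `attempts` times, waiting `delayMs` between tries,
// and records when the successful attempt finished.
export async function fetchWithRetry<T>(
  time: TimeDeps,
  call: () => Promise<T>,
  attempts = 3,
  delayMs = 500,
): Promise<{ value: T; fetchedAt: number }> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      const value = await call();
      return { value, fetchedAt: time.now() };
    } catch (error) {
      lastError = error;
      // No sleep after the final failed attempt
      if (i < attempts - 1) await time.sleep(delayMs);
    }
  }
  throw lastError;
}
```

In a test, passing `{ now: () => 1700000000000, sleep: async () => {} }` (or the vi.fn() stubs described above) makes the retries complete instantly with a fixed timestamp.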

Can I reuse fakes across test files?

Yes. Define fakes in test/fakes/ and import them. The key is that fakes are stateful — each import gives you the factory function; each call to the factory gives a fresh instance. Sharing a single fake instance across files leads to inter-test state leaks. Share the factory, not the instance: import { createFakeDb } from '@test/fakes/fake-db.js'; const db = createFakeDb();.

Further reading