
MCP server end-to-end testing

Most MCP testing guides tell you to use InMemoryTransport: it's fast, it's deterministic, and it tests the full JSON-RPC protocol in-process. But in-process transports leave a gap: your server can pass every unit test and still fail when a real MCP client connects over SSE or stdio. E2E tests close that gap by spawning the actual server process, connecting a real Client from @modelcontextprotocol/sdk, and driving the full initialize → tools/list → tools/call sequence over the wire. The non-obvious part: SSE and stdio transports fail in different ways, and a server that passes SSE E2E tests can still fail stdio E2E tests. If you support both transports, you need both test modes.

TL;DR

Transport mocks (including InMemoryTransport) skip the actual network and process layer, so bugs in SSE event framing, CORS headers, HTTP keep-alive, stdio process lifecycle, and JSON-RPC framing go undetected until a real client hits your server. True MCP server E2E testing spawns the server as a child process, connects an SDK Client via SSEClientTransport or StdioClientTransport, and tears the process down after each test. Parameterize the same test suite across both transport types with a factory function so you get transport-independent coverage with no duplicated test logic.

Why transport mocks miss whole bug classes

Unit tests with InMemoryTransport and integration tests share one property: the transport layer is bypassed. Messages travel through an in-process channel rather than a socket, a pipe, or an HTTP connection. That's ideal for speed and isolation — but it means a range of failure modes never get exercised:

| Failure mode | InMemoryTransport | Real SSE transport | Real stdio transport |
|---|---|---|---|
| Wrong SSE event format (data: prefix, double newline) | Never triggered | Caught | N/A |
| Missing or incorrect CORS headers | Never triggered | Caught | N/A |
| HTTP keep-alive or chunked transfer encoding issues | Never triggered | Caught | N/A |
| JSON-RPC framing errors (missing newline delimiter in stdio) | Never triggered | N/A | Caught |
| Server process exits on startup (bad env var, missing file) | Never triggered | Caught | Caught |
| Server crashes mid-session (memory error, unhandled promise) | Never triggered | Caught | Caught |
| Port binding race condition on startup | Never triggered | Caught | N/A |
| Stdio pipe closure not handled gracefully | Never triggered | N/A | Caught |
| Tool handler logic bugs (wrong computation, bad output) | Caught | Caught | Caught |
| JSON-RPC protocol negotiation bugs | Caught | Caught | Caught |

The SSE-specific failures are particularly insidious. An MCP server that works perfectly when you call it with curl can still fail when a real SSE client connects — because curl reads the raw stream while the SSE client expects strictly formatted data: ...\n\n events. A missing newline at the end of an event silently stalls the client. A response without Content-Type: text/event-stream causes the browser's EventSource (and the SDK's SSE client) to reject the connection immediately. None of these surface in in-process transport tests.
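
To make the framing rule concrete, here is a minimal sketch of how one SSE event is serialized and how a client decides that an event is complete. The helper names formatSseEvent and splitSseEvents are hypothetical and are not part of the MCP SDK. Real SSE parsers also handle CRLF line endings, comments, and id/retry fields, which this sketch ignores.

```typescript
// Hypothetical helpers for illustration; not part of @modelcontextprotocol/sdk.

/** Serialize one event: optional "event:" line, one "data:" line per payload line, then a blank line. */
function formatSseEvent(data: string, event?: string): string {
	const lines: string[] = [];
	if (event !== undefined) lines.push(`event: ${event}`);
	for (const line of data.split('\n')) lines.push(`data: ${line}`);
	return lines.join('\n') + '\n\n';
}

/** Split buffered stream text into complete events; `rest` is an event still waiting for its blank line. */
function splitSseEvents(buffer: string): { events: string[]; rest: string } {
	const parts = buffer.split('\n\n');
	const rest = parts.pop() ?? '';
	return { events: parts, rest };
}

const good = formatSseEvent('{"jsonrpc":"2.0","id":1,"result":{}}', 'message');
const bad = good.slice(0, -1); // drop the final newline, leaving a single "\n"

console.log(splitSseEvents(good).events.length); // 1: the event is complete and gets dispatched
console.log(splitSseEvents(bad).events.length);  // 0: the client keeps buffering and appears stalled
```

The second case is the silent stall described above: nothing is malformed enough to raise an error, the client simply never sees the end of the event.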

Stdio transport has its own failure class. The MCP stdio protocol delimits messages with newlines: each JSON-RPC message must be a single line terminated by \n. If your server writes a pretty-printed JSON object (multi-line), the stdio client reads it as multiple partial messages and fails to parse any of them. This is one of the most common debugging scenarios for stdio servers — it never manifests in InMemoryTransport tests.
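
The failure is easy to reproduce in isolation. The sketch below uses a hypothetical line parser (parseStdioLines, not the SDK's actual read buffer) to show why a compact message parses and a pretty-printed one does not.

```typescript
// Hypothetical helper for illustration; the SDK uses its own read buffer internally.

/** Parse newline-delimited JSON-RPC: every non-empty line must be one whole JSON message. */
function parseStdioLines(chunk: string): { messages: unknown[]; errors: number } {
	const messages: unknown[] = [];
	let errors = 0;
	for (const line of chunk.split('\n')) {
		if (line.trim() === '') continue;
		try {
			messages.push(JSON.parse(line));
		} catch {
			errors += 1; // a fragment of a multi-line object is not valid JSON on its own
		}
	}
	return { messages, errors };
}

const response = { jsonrpc: '2.0', id: 1, result: { tools: [] } };

const compact = JSON.stringify(response) + '\n';          // one message, one line
const pretty = JSON.stringify(response, null, 2) + '\n';  // same message spread over several lines

console.log(parseStdioLines(compact)); // one parsed message, no errors
console.log(parseStdioLines(pretty));  // no parsed messages, one error per line
```

On the server side the fix is to always serialize with JSON.stringify(message) and append exactly one "\n", and to keep any other logging off stdout.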

Setting up a real E2E test harness

A real E2E test harness needs to: (1) spawn the server process, (2) wait for it to be ready to accept connections, (3) connect an SDK client, (4) run assertions, and (5) kill the process and clean up. The tricky part is step 2 — you can't connect before the server is listening. For SSE servers, poll the health endpoint or the MCP endpoint; for stdio servers, the process being alive is sufficient.

First, install the dependencies:

npm install --save-dev vitest @types/node
# The MCP SDK is already in your dependencies
# npm install @modelcontextprotocol/sdk

Here is a complete SSE E2E harness:

// test/e2e/harness-sse.ts
import { spawn, ChildProcess } from 'node:child_process';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { SSEClientTransport } from '@modelcontextprotocol/sdk/client/sse.js';

export interface E2EHandle {
	client: Client;
	baseUrl: string;
	teardown: () => Promise<void>;
}

/** Wait until the server responds to a GET on the given URL, or throw on timeout */
async function waitForServer(url: string, timeoutMs = 10_000): Promise<void> {
	const deadline = Date.now() + timeoutMs;
	while (Date.now() < deadline) {
		try {
			const res = await fetch(url, { signal: AbortSignal.timeout(500) });
			if (res.status < 500) return; // anything that isn't a server crash
		} catch {
			// ECONNREFUSED or timeout — server not up yet
		}
		await new Promise(r => setTimeout(r, 150));
	}
	throw new Error(`Server at ${url} did not become ready within ${timeoutMs}ms`);
}

export async function startSseServer(
	serverScript: string,
	port: number,
	env: Record<string, string> = {}
): Promise<E2EHandle> {
	const proc: ChildProcess = spawn(
		process.execPath, // node binary
		[serverScript],
		{
			env: { ...process.env, PORT: String(port), ...env },
			// stdout is discarded: a piped stream that nobody reads can fill up and block the server
			stdio: ['ignore', 'ignore', 'pipe'],
		}
	);

	// Capture stderr for diagnostics if the process dies
	const stderrLines: string[] = [];
	proc.stderr?.on('data', (chunk: Buffer) => {
		stderrLines.push(chunk.toString());
	});

	const baseUrl = `http://127.0.0.1:${port}`;

	// Kill the server if this process exits
	const onExit = () => proc.kill('SIGTERM');
	process.once('exit', onExit);

	try {
		await waitForServer(`${baseUrl}/health`);
	} catch (err) {
		proc.kill('SIGTERM');
		throw new Error(
			`Server failed to start.\nStderr:\n${stderrLines.join('')}\n${err}`
		);
	}

	const transport = new SSEClientTransport(new URL(`${baseUrl}/sse`));
	const client = new Client(
		{ name: 'e2e-test-client', version: '0.0.0' },
		{ capabilities: {} }
	);
	await client.connect(transport);

	return {
		client,
		baseUrl,
		teardown: async () => {
			process.removeListener('exit', onExit);
			await client.close();
			proc.kill('SIGTERM');
			// Wait for process to exit so the port is freed for the next test
			await new Promise<void>(resolve => {
				proc.once('exit', () => resolve());
				setTimeout(resolve, 2_000); // safety timeout
			});
		},
	};
}

And the stdio harness, which is simpler because there's no port to wait on:

// test/e2e/harness-stdio.ts
import { spawn } from 'node:child_process';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';
import type { E2EHandle } from './harness-sse.js';

export async function startStdioServer(
	serverScript: string,
	env: Record<string, string> = {}
): Promise<E2EHandle> {
	const transport = new StdioClientTransport({
		command: process.execPath,
		args: [serverScript],
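		// process.env values are typed string | undefined, but the transport expects Record<string, string>
		env: { ...process.env, ...env } as Record<string, string>,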
	});

	const client = new Client(
		{ name: 'e2e-test-client', version: '0.0.0' },
		{ capabilities: {} }
	);

	// StdioClientTransport spawns the process and connects in one step
	await client.connect(transport);

	return {
		client,
		baseUrl: 'stdio://',
		teardown: async () => {
			// client.close() also closes the transport, which shuts down the spawned server process
			await client.close();
		},
	};
}

Testing both stdio and SSE transports

The key insight for end-to-end MCP server tests across both transport types is to write the test logic once and parameterize it over a factory function. Each factory function returns an E2EHandle — same shape, different underlying transport. The test suite runs identically against both.

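// test/e2e/mcp-server.e2e.test.ts
// (the .test.ts suffix matches Vitest's default include pattern; a bare .e2e.ts needs a custom include)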
import { describe, it, expect, beforeEach, afterEach } from 'vitest';
import { startSseServer } from './harness-sse.js';
import { startStdioServer } from './harness-stdio.js';
import type { E2EHandle } from './harness-sse.js';
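import type { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { fileURLToPath } from 'node:url';

// Path to the built server entry points.
// fileURLToPath handles Windows drive letters and percent-encoded characters, which .pathname does not.
const SSE_SERVER   = fileURLToPath(new URL('../../dist/server-sse.js',   import.meta.url));
const STDIO_SERVER = fileURLToPath(new URL('../../dist/server-stdio.js', import.meta.url));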

// Test factory: same suite, different transport
function describeTransport(
	name: string,
	factory: () => Promise<E2EHandle>
) {
	describe(name, () => {
		let handle: E2EHandle | undefined;
		let client: Client;

		beforeEach(async () => {
			handle = await factory();
			client = handle.client;
		}, 15_000); // allow up to 15s for server startup

		afterEach(async () => {
			// handle is undefined if the factory threw in beforeEach
			await handle?.teardown();
			handle = undefined;
		}, 5_000);

		it('completes the initialize handshake', async () => {
			// If beforeEach succeeded, initialize already completed.
			// Verify the server info is populated.
			const info = client.getServerVersion();
			expect(info).toBeDefined();
			expect(info?.name).toBeTruthy();
		});

		it('returns a non-empty tools list', async () => {
			const { tools } = await client.listTools();

			expect(Array.isArray(tools)).toBe(true);
			expect(tools.length).toBeGreaterThan(0);
		});

		it('tools have required schema fields', async () => {
			const { tools } = await client.listTools();

			for (const tool of tools) {
				expect(typeof tool.name).toBe('string');
				expect(tool.name.length).toBeGreaterThan(0);
				expect(typeof tool.description).toBe('string');
				expect(tool.inputSchema).toBeDefined();
				expect(tool.inputSchema.type).toBe('object');
			}
		});

		it('executes a tool call and returns text content', async () => {
			const result = await client.callTool({
				name: 'echo',
				arguments: { message: 'hello from e2e' },
			});

			expect(result.isError).toBeFalsy();
			expect(result.content).toHaveLength(1);
			expect(result.content[0].type).toBe('text');
			expect((result.content[0] as { type: 'text'; text: string }).text)
				.toContain('hello from e2e');
		});

		it('returns isError for a missing required argument', async () => {
			const result = await client.callTool({
				name: 'echo',
				arguments: {}, // 'message' is required
			});

			expect(result.isError).toBe(true);
		});

		it('returns an error response for an unknown tool name', async () => {
			await expect(
				client.callTool({ name: 'nonexistent_tool_xyz', arguments: {} })
			).rejects.toThrow(); // protocol-level error, not isError:true
		});
	});
}

// Run the full suite against both transports
describeTransport('SSE transport E2E', () =>
	startSseServer(SSE_SERVER, 47821)
);

describeTransport('stdio transport E2E', () =>
	startStdioServer(STDIO_SERVER)
);

The describeTransport factory produces two full describe blocks — one for SSE, one for stdio — each spawning a real server process and exercising the real protocol. When a test fails in the SSE block but passes in the stdio block, you have a transport-specific bug: the tool logic is correct but something in the SSE layer is broken. That's the exact diagnostic signal transport mocks can never give you.

Note the port choice: pick a fixed, high-numbered port (47821 in this example) and document it in your project. Use a different port per test file if you run multiple test files in parallel. Alternatively, pick a random available port at test startup using the OS (listen(0)) and pass it through the environment.
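
For the dynamic option, a minimal sketch of asking the OS for a free port with Node's built-in net module follows. The helper name getFreePort is hypothetical. There is a small window between releasing the port and the server binding it in which another process could claim it, so treat this as a pragmatic approach rather than a guarantee.

```typescript
import { createServer } from 'node:net';

/** Ask the OS for an unused TCP port by binding to port 0, then release it again. */
function getFreePort(host = '127.0.0.1'): Promise<number> {
	return new Promise((resolve, reject) => {
		const srv = createServer();
		srv.once('error', reject);
		srv.listen(0, host, () => {
			const addr = srv.address();
			if (addr === null || typeof addr === 'string') {
				srv.close();
				reject(new Error('Unexpected address type from net.Server'));
				return;
			}
			const { port } = addr;
			// Resolve only after the socket is closed so the port is free for the server under test
			srv.close(err => (err ? reject(err) : resolve(port)));
		});
	});
}

const freePort = getFreePort();
freePort.then(port => console.log(`OS assigned port ${port}`));
```

In the suite above, the factory would then become something like `async () => startSseServer(SSE_SERVER, await getFreePort())`.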

What to assert in E2E tests

MCP protocol integration testing should assert at four levels: the handshake, the tool contract, the output format, and the error behavior.

1. The initialize handshake

The MCP initialize exchange negotiates protocol version and capabilities. If this fails, no further calls succeed. Your E2E tests implicitly test this — client.connect(transport) performs the handshake and throws if it fails. Make it explicit by asserting on the server version and capabilities after connect:

it('completes the handshake and advertises tool support', async () => {
	const version = client.getServerVersion();
	expect(version).toBeDefined();

	const caps = client.getServerCapabilities();
	expect(caps?.tools).toBeDefined(); // server must advertise tool support
});

2. The tools/list contract

Assert on the exact tool names and their inputSchema required fields. This is different from a schema snapshot (which detects drift) — the E2E assertion validates the contract in terms a consumer understands:

it('exposes the expected tool set', async () => {
	const { tools } = await client.listTools();
	const names = tools.map(t => t.name).sort();

	expect(names).toEqual(['create_issue', 'echo', 'get_issue', 'list_issues']);
});

it('create_issue has required project and title arguments', async () => {
	const { tools } = await client.listTools();
	const createIssue = tools.find(t => t.name === 'create_issue');

	expect(createIssue?.inputSchema.required).toContain('project');
	expect(createIssue?.inputSchema.required).toContain('title');
	expect(createIssue?.inputSchema.properties?.project?.type).toBe('string');
});

3. Tool call output format

Assert the output shape, not just the status. MCP tool responses are content arrays — each item has a type field. Test that your tool returns the expected content type and that the payload is parseable:

it('list_issues returns JSON-parseable content', async () => {
	const result = await client.callTool({
		name: 'list_issues',
		arguments: { project: 'test-project', limit: 10 },
	});

	expect(result.isError).toBeFalsy();

	const text = (result.content[0] as { type: 'text'; text: string }).text;
	let parsed: unknown;
	expect(() => { parsed = JSON.parse(text); }).not.toThrow();
	expect(Array.isArray(parsed)).toBe(true);
});

4. Negative tests (error responses)

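Distinguish between the two error modes. A tool that catches a bad input and returns { isError: true } resolves the call promise normally. A protocol-level error rejects the promise with an McpError. Which mode a given failure lands in (an unknown tool name, a missing argument, a wrong argument type) can depend on where validation runs and on the SDK version the server uses, so confirm the actual behavior of your server and pin it in a test. Your E2E tests should explicitly cover both: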

it('get_issue with non-existent ID returns isError content', async () => {
	// Tool caught the error and returned it as content
	const result = await client.callTool({
		name: 'get_issue',
		arguments: { id: 'does-not-exist-99999' },
	});
	expect(result.isError).toBe(true);
	expect((result.content[0] as any).text).toMatch(/not found/i);
});

it('passing wrong argument type throws an McpError', async () => {
	// Schema validation at the protocol layer — rejects the promise
	await expect(
		client.callTool({
			name: 'get_issue',
			arguments: { id: 12345 }, // should be string, not number
		})
	).rejects.toThrow();
});

CI integration

E2E tests that spawn real server processes need a bit more care in CI than unit tests. The main concerns are: process cleanup on test failure, port conflicts, and build ordering (you must build the server before the E2E tests can run).

Here is a complete GitHub Actions job that handles all three:

# .github/workflows/e2e.yml
name: E2E Tests

on: [push, pull_request]

jobs:
  e2e:
    runs-on: ubuntu-latest
    timeout-minutes: 10

    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: '22'
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Build server
        run: npm run build
        # Must produce dist/server-sse.js and dist/server-stdio.js

      - name: Run unit and integration tests
        run: npm run test:unit

      - name: Run E2E tests
        run: npm run test:e2e
        env:
          # Provide any secrets the server needs at startup
          DB_URL: ${{ secrets.E2E_DB_URL }}
          API_KEY: test-api-key-for-e2e
        timeout-minutes: 5

In package.json, separate the E2E script from the unit test script so they can be run independently:

{
  "scripts": {
    "test:unit": "vitest run --exclude 'test/e2e/**'",
    "test:e2e": "vitest run test/e2e --reporter=verbose",
    "test": "npm run test:unit && npm run test:e2e"
  }
}

In a Docker-based CI environment, run the E2E tests inside the container image that will be deployed. This catches the failure mode where the server works locally but crashes at container startup because a required environment variable or file is missing:

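# In your Dockerfile, add an E2E test stage
FROM node:22-alpine AS e2e
WORKDIR /app
# package.json provides the test:e2e script; node_modules must include devDependencies (vitest)
COPY --from=build /app/package.json ./
COPY --from=build /app/dist ./dist
COPY --from=build /app/node_modules ./node_modules
COPY test ./test
RUN npm run test:e2e

The Docker build fails if E2E tests fail, so the image is never pushed. This only holds when the e2e stage is actually built: BuildKit skips stages that the final target does not depend on, so either build it explicitly with --target e2e or make the final stage depend on it. This is stronger than a post-deploy probe because it prevents bad images from ever reaching the registry.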

For process cleanup on test failure, Vitest's afterEach always runs even when a test throws. The teardown() call in afterEach sends SIGTERM to the server process. In CI, orphaned processes are killed when the job ends anyway, but cleaning up promptly prevents port conflicts between tests in the same run.
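
If a server ignores SIGTERM, a teardown that only waits on a timer can return while the process still holds the port. A minimal sketch of a stricter stop routine follows. stopProcess is a hypothetical helper, and the signal behavior shown assumes a POSIX platform; signals are emulated differently on Windows.

```typescript
import { spawn, type ChildProcess } from 'node:child_process';

/** Send SIGTERM, wait for exit, and escalate to SIGKILL if the process does not stop in time. */
function stopProcess(proc: ChildProcess, graceMs = 2_000): Promise<void> {
	return new Promise(resolve => {
		if (proc.exitCode !== null || proc.signalCode !== null) {
			resolve(); // already exited, nothing to wait for
			return;
		}
		const timer = setTimeout(() => proc.kill('SIGKILL'), graceMs);
		proc.once('exit', () => {
			clearTimeout(timer);
			resolve();
		});
		proc.kill('SIGTERM');
	});
}

// Usage: start a long-running child process, then stop it.
const child = spawn(process.execPath, ['-e', 'setInterval(() => {}, 1000)'], { stdio: 'ignore' });
const stopped = stopProcess(child, 500);
stopped.then(() => console.log('child stopped'));
```

Resolving only on the exit event means the next test cannot start while the previous server still owns the port.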

E2E tests and AliveMCP monitoring

E2E tests and AliveMCP monitoring are complementary — they operate at different points in the lifecycle and catch different things.

E2E tests catch pre-deploy regressions. They run in CI against a locally-built server binary before the server is deployed. They verify that the build artifact is correct: the MCP protocol handshake succeeds, the tool list matches the expected contract, tools return the right output shape. A failed E2E test blocks the deploy. By the time the server reaches production, E2E tests have already confirmed it speaks valid MCP over both transport modes.

AliveMCP catches post-deploy regressions. Once the server is deployed, E2E tests are no longer running. Configuration changes, infrastructure failures, upstream API outages, memory leaks causing process crashes, and certificate expirations all happen after the deploy and are invisible to pre-deploy tests. AliveMCP probes the live initialize + tools/list endpoint every 60 seconds, checks for protocol-correct responses, and pages your team the moment a check fails.

| | E2E tests (pre-deploy) | AliveMCP (post-deploy) |
|---|---|---|
| When it runs | CI, on every push and PR | Continuously, every 60 seconds |
| Against what | Local build artifact | Live production endpoint |
| Catches | Code regressions, transport bugs, protocol errors in new code | Infrastructure failures, config drift, upstream outages, crashes |
| Action on failure | Blocks merge / deploy | Pages on-call, triggers incident response |
| Scope | All tool calls and error paths | initialize + tools/list handshake |

A common architecture: the E2E test suite in CI verifies correctness before deploy; the post-deploy CI step runs a single initialize + tools/list probe to confirm the server came up healthy; and AliveMCP runs continuously thereafter to catch anything that changes after the post-deploy probe passes. This is the same layered approach your general MCP server testing strategy should follow: each layer catches what the others miss.

One practical tip: export the same tool-list hash from your E2E suite that your post-deploy probe and AliveMCP check against. If the hash is in version control, any deployment that changes the tool contract is immediately visible in the AliveMCP monitoring dashboard — the probe response changes even if the server is technically "up."
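
One way to compute such a hash is sketched below, assuming the contract is defined by each tool's name and inputSchema only. The function names and the choice of SHA-256 over a key-sorted JSON encoding are illustrative assumptions, not an AliveMCP or MCP SDK API.

```typescript
import { createHash } from 'node:crypto';

// Minimal shape of a tools/list entry: only the fields assumed to define the contract.
interface ToolSummary {
	name: string;
	inputSchema: unknown;
}

/** Like JSON.stringify, but with object keys sorted so key order does not affect the result. */
function canonicalJson(value: unknown): string {
	if (Array.isArray(value)) return `[${value.map(canonicalJson).join(',')}]`;
	if (value !== null && typeof value === 'object') {
		const obj = value as Record<string, unknown>;
		const entries = Object.keys(obj)
			.sort()
			.map(key => `${JSON.stringify(key)}:${canonicalJson(obj[key])}`);
		return `{${entries.join(',')}}`;
	}
	return JSON.stringify(value) ?? 'null'; // undefined has no JSON form
}

/** Hash the tool contract independently of tool order and schema key order. */
function toolListHash(tools: ToolSummary[]): string {
	const normalized = tools
		.map(t => ({ name: t.name, inputSchema: t.inputSchema }))
		.sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
	return createHash('sha256').update(canonicalJson(normalized)).digest('hex');
}

const hash = toolListHash([
	{ name: 'get_issue', inputSchema: { type: 'object', required: ['id'] } },
	{ name: 'echo', inputSchema: { type: 'object', required: ['message'] } },
]);
console.log(hash);
```

In an E2E test, the result of `toolListHash((await client.listTools()).tools)` can be compared against a value committed to the repository, so a contract change shows up as a one-line diff.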

Related questions

Should E2E tests use a real database or a test database?

A test database — either an in-memory SQLite database seeded with known fixtures, or a separate test schema in Postgres. The E2E harness should set a DATABASE_URL environment variable pointing at the test database. This gives you realistic behavior (real SQL, real query plans) without polluting production data. Wipe and reseed the database in beforeEach or beforeAll depending on whether tests are isolated. If your tool calls modify state, test isolation at the database level is essential for deterministic results.

How do I pick a port for the SSE server in E2E tests?

Two approaches: a fixed high-numbered port per test file (e.g. 47821 for the main E2E suite, 47822 for the auth E2E suite), or a dynamic port. For dynamic ports, have your server process write its bound port to stdout on startup and parse it in the harness before the waitForServer step. The fixed-port approach is simpler and works as long as no two test files that run at the same time share a port. Dynamic ports are the safer choice when many test files run in parallel, which Vitest does by default.
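
A minimal sketch of the stdout-parsing side is shown below. It assumes the server prints a line of the form MCP_PORT=<number> on startup; that format and the helper name readAnnouncedPort are hypothetical, so adapt them to whatever your server actually logs. The harness would also need to spawn the server with stdout set to 'pipe'.

```typescript
import { PassThrough, type Readable } from 'node:stream';

// Assumed startup line (hypothetical format): "MCP_PORT=<number>" followed by a newline.
// Requiring the trailing newline avoids matching a port number that was split across chunks.
const PORT_LINE = /^MCP_PORT=(\d+)\r?\n/m;

/** Resolve with the port once the server announces it on stdout, or reject on timeout. */
function readAnnouncedPort(stdout: Readable, timeoutMs = 10_000): Promise<number> {
	return new Promise((resolve, reject) => {
		let buffer = '';
		const onData = (chunk: Buffer | string): void => {
			buffer += chunk.toString();
			const match = PORT_LINE.exec(buffer);
			if (match) {
				cleanup();
				resolve(Number(match[1]));
			}
		};
		const timer = setTimeout(() => {
			cleanup();
			reject(new Error(`No port announced within ${timeoutMs}ms`));
		}, timeoutMs);
		function cleanup(): void {
			clearTimeout(timer);
			stdout.off('data', onData);
		}
		stdout.on('data', onData);
	});
}

// Simulated server output, with the port line split across two chunks.
const fakeStdout = new PassThrough();
const announced = readAnnouncedPort(fakeStdout, 1_000);
fakeStdout.write('starting up\nMCP_PORT=47');
fakeStdout.write('821\n');
announced.then(port => console.log(port)); // logs 47821
```

With a real child process, pass proc.stdout in place of the simulated stream and use the resolved port to build baseUrl.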

How is E2E testing different from the MCP Inspector?

The MCP Inspector is an interactive tool for manual exploration and debugging — you open a browser UI, connect to a server, and call tools by hand. E2E tests are automated and run in CI: they spawn the server programmatically, call tools with predefined arguments, and assert on specific outputs. Use the Inspector for development and debugging; use E2E tests for regression prevention in CI. They complement each other — the Inspector helps you understand failure modes that your E2E tests then codify as permanent regression guards.

Do I need E2E tests if I already have integration tests?

Integration tests with InMemoryTransport cover protocol correctness and tool handler behavior in-process. E2E tests cover the real transport layer. If your server only supports a single transport and you've confirmed the SSE or stdio framing is straightforward, you might deprioritize E2E tests. But if you support both transports, or if you've ever hit a real-client connection bug that only appeared in production, E2E tests pay for themselves quickly. The harness setup is roughly 50–100 lines of code; the ongoing maintenance cost is low since the tests run against the real binary.

Can I run E2E tests against a staging environment instead of a local server?

Yes — replace the startSseServer call with a version that skips process spawning and connects directly to the staging URL. This is useful as a post-deploy smoke test. The risk is that staging state (databases, external APIs) can cause non-deterministic failures. A hybrid approach: run the full E2E suite against a locally-spawned server with a seeded test database in CI, and run a subset of read-only smoke tests against staging after deploy. AliveMCP's continuous probes cover the gap between smoke tests.
