Testing Python MCP servers — pytest, anyio, and the MCP client

Testing a Python MCP server has two distinct layers. Unit tests call your tool handler functions directly as plain Python async functions: no protocol overhead, fast feedback, easy mocking. Integration tests connect a real MCP client to your server over the actual transport (stdio or SSE) and run the full initialize, tools/list, tools/call sequence. Both layers are necessary: unit tests verify business logic quickly, while integration tests verify the MCP protocol contract, schema registration, and transport behavior. This guide covers the setup for both, plus fixture patterns, mocking external dependencies, and running tests in CI.

TL;DR

Install pytest, pytest-asyncio, and anyio (add anyio[trio] only if you also want to run tests on trio). Mark async tests with @pytest.mark.asyncio (pytest-asyncio) or @pytest.mark.anyio (the pytest plugin that ships with anyio). Unit-test tool handlers by calling them directly as async functions. For integration tests, use the MCP Python SDK's stdio_client + ClientSession to connect to your server process and call tools over the real protocol. Mock external dependencies with unittest.mock.patch and AsyncMock (pytest-mock's mocker.patch wraps the same API). Use async pytest fixtures with yield for shared async resources (database connections, HTTP sessions) so teardown always runs.

Installation

pip install pytest pytest-asyncio anyio mcp
# or with uv
uv add --dev pytest pytest-asyncio anyio mcp

# Optional but recommended
pip install pytest-mock  # for mocker fixture
pip install aiosqlite    # if testing SQLite tools

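pytest-asyncio and anyio are configured differently. asyncio_mode is a pytest-asyncio setting, not an anyio one: in "auto" mode every async def test and fixture runs on asyncio without needing a marker. anyio needs no pyproject.toml entry; you mark tests with @pytest.mark.anyio and choose the backend through an anyio_backend fixture in conftest.py. pytest-asyncio's auto mode is intended for asyncio-only suites, so if you also use anyio-marked tests, its default strict mode is the safer choice.

[tool.pytest.ini_options]
# pytest-asyncio only: treat every async test/fixture as an asyncio test
asyncio_mode = "auto"
# anyio: nothing to set here; use @pytest.mark.anyio per test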

Unit testing tool handlers directly

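FastMCP tool handlers are plain async Python functions: the SDK's @mcp.tool() decorator registers the function with the server and returns it unchanged. You can therefore call handlers directly in tests without starting the MCP server, which keeps unit tests fast and easy to set up: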

# server.py
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("calculator")

@mcp.tool()
async def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b

@mcp.tool()
async def divide(a: float, b: float) -> float:
    """Divide a by b."""
    if b == 0:
        raise ValueError("Division by zero is not allowed")
    return a / b

# tests/test_tools.py
import pytest
from server import add, divide

@pytest.mark.asyncio
async def test_add_positive():
    result = await add(3, 4)
    assert result == 7

@pytest.mark.asyncio
async def test_add_negative():
    result = await add(-5, 3)
    assert result == -2

@pytest.mark.asyncio
async def test_divide_normal():
    result = await divide(10.0, 4.0)
    assert result == pytest.approx(2.5)

@pytest.mark.asyncio
async def test_divide_by_zero():
    with pytest.raises(ValueError, match="Division by zero"):
        await divide(10.0, 0.0)

This pattern works even for tools with complex business logic — you call the function, assert the return value or expected exception, and the test runs in milliseconds. No MCP client, no transport overhead.

Mocking external dependencies

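Tool handlers that call databases, HTTP APIs, or other external services need those dependencies mocked in unit tests. The examples below use unittest.mock.patch with AsyncMock; pytest-mock's mocker fixture wraps the same API (mocker.patch(...)) and undoes the patch automatically at the end of each test: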

# server.py
import aiohttp
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("weather-server")

async def _fetch_weather(city: str) -> dict:
    async with aiohttp.ClientSession() as s:
        async with s.get(f"https://api.weather.example/v1/{city}") as resp:
            resp.raise_for_status()
            return await resp.json()

@mcp.tool()
async def get_weather(city: str, units: str = "celsius") -> dict:
    """Get current weather for a city."""
    data = await _fetch_weather(city)
    temp = data["temperature"]
    if units == "fahrenheit":
        temp = temp * 9 / 5 + 32
    return {"city": city, "temperature": temp, "units": units, "condition": data["condition"]}

# tests/test_weather.py
import pytest
from unittest.mock import AsyncMock, patch
from server import get_weather

@pytest.mark.asyncio
async def test_get_weather_celsius():
    mock_data = {"temperature": 20.0, "condition": "sunny"}

    with patch("server._fetch_weather", new_callable=AsyncMock) as mock_fetch:
        mock_fetch.return_value = mock_data
        result = await get_weather("London")

    mock_fetch.assert_called_once_with("London")
    assert result["temperature"] == 20.0
    assert result["units"] == "celsius"

@pytest.mark.asyncio
async def test_get_weather_fahrenheit():
    mock_data = {"temperature": 20.0, "condition": "sunny"}

    with patch("server._fetch_weather", new_callable=AsyncMock) as mock_fetch:
        mock_fetch.return_value = mock_data
        result = await get_weather("London", units="fahrenheit")

    assert result["temperature"] == pytest.approx(68.0)

@pytest.mark.asyncio
async def test_get_weather_api_error():
    with patch("server._fetch_weather", new_callable=AsyncMock) as mock_fetch:
        mock_fetch.side_effect = RuntimeError("API unavailable")
        with pytest.raises(RuntimeError, match="API unavailable"):
            await get_weather("London")

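Use AsyncMock (from unittest.mock) to mock async functions. A plain MagicMock is not awaitable: awaiting the value it returns raises TypeError: object MagicMock can't be used in 'await' expression. Since Python 3.8, patch() substitutes an AsyncMock automatically when the target is an async function, but passing new_callable=AsyncMock, as above, makes the intent explicit.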
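
A small, self-contained check of how the mock types behave with await, using only unittest.mock. The WeatherClient class is a hypothetical stand-in for server.py's network helper; the third step shows that patch() on an async target already produces an AsyncMock (Python 3.8+):

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock, patch


class WeatherClient:
    """Hypothetical stand-in for the async network helper in server.py."""

    async def fetch(self, city: str) -> dict:
        raise RuntimeError("real network call: tests must patch this")


async def demo() -> dict:
    results = {}

    # 1. Calling a MagicMock returns another MagicMock, which cannot be awaited
    try:
        await MagicMock()("London")
    except TypeError as exc:
        results["magicmock_error"] = str(exc)

    # 2. Calling an AsyncMock returns a coroutine that resolves to return_value
    fetch = AsyncMock(return_value={"temperature": 20.0})
    results["asyncmock_value"] = await fetch("London")
    fetch.assert_awaited_once_with("London")  # AsyncMock also records awaits

    # 3. Without new_callable, patch() picks AsyncMock for an async target
    with patch.object(WeatherClient, "fetch") as mocked:
        results["auto_is_asyncmock"] = isinstance(mocked, AsyncMock)

    return results


print(asyncio.run(demo()))
```

The assert_awaited_* family (assert_awaited_once_with, assert_not_awaited, await_count) is the async counterpart of assert_called_*, and is usually the more precise check for coroutine dependencies.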

Integration tests with the MCP client

Integration tests verify that your server works correctly over the real MCP protocol — that tools are registered correctly, that the initialize handshake succeeds, and that inputs and outputs serialize correctly through the protocol layer. Use the MCP Python SDK's client:

# tests/test_integration.py
import json
import sys

import pytest
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

SERVER_PARAMS = StdioServerParameters(
    # sys.executable runs the server with the same interpreter (and virtualenv)
    # as pytest; a bare "python" may resolve to a different installation in CI
    command=sys.executable,
    args=["server.py"],  # relative to the directory pytest is started from
    env={"DATABASE_URL": "sqlite:///test.db"},
)

@pytest.mark.anyio
async def test_tools_list():
    """Verify all expected tools are registered."""
    async with stdio_client(SERVER_PARAMS) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            tool_names = {t.name for t in tools.tools}
            # Adjust to the tools your server actually registers
            assert "get_weather" in tool_names
            assert "search_docs" in tool_names

@pytest.mark.anyio
async def test_get_weather_tool():
    """Call the get_weather tool over the real protocol."""
    # Note: with server.py as written, this makes a real request to the
    # weather API, so it needs network access and a reachable endpoint
    async with stdio_client(SERVER_PARAMS) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "get_weather",
                arguments={"city": "London", "units": "celsius"},
            )
            assert not result.isError
            # result.content is a list of content blocks (TextContent, ImageContent, ...)
            content = result.content[0]
            assert content.type == "text"
            # FastMCP serializes the returned dict as JSON text
            data = json.loads(content.text)
            assert data["city"] == "London"
            assert "temperature" in data

@pytest.mark.anyio
async def test_tool_validation_error():
    """Verify argument validation failures return isError=True, not protocol errors."""
    async with stdio_client(SERVER_PARAMS) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # "city" is a required parameter of get_weather, so omitting it fails
            # argument validation before the handler runs. (units is a plain str,
            # so an unknown unit such as "invalid_unit" would NOT be rejected.)
            result = await session.call_tool(
                "get_weather",
                arguments={"units": "celsius"},
            )
            # The failure arrives as a tool result, not as a raised exception
            assert result.isError
            error_text = result.content[0].text
            assert "city" in error_text.lower()
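
The integration tests above repeat the same steps to unpack a result. A helper like the hypothetical tool_json below (not part of the MCP SDK) can centralize them. It relies only on the isError and content attributes of the SDK's CallToolResult, so this demo uses SimpleNamespace stand-ins instead of a live server:

```python
import json
from types import SimpleNamespace
from typing import Any


def tool_json(result: Any) -> Any:
    """Assert a tool call succeeded and parse its first text block as JSON.

    Hypothetical helper: uses only .isError, .content, .type and .text, so it
    works with CallToolResult objects and with simple stand-ins.
    """
    assert not result.isError, f"tool returned an error: {result.content}"
    assert result.content, "tool returned no content blocks"
    first = result.content[0]
    assert first.type == "text", f"expected a text block, got {first.type!r}"
    return json.loads(first.text)


# Stand-in shaped like a successful get_weather result
ok = SimpleNamespace(
    isError=False,
    content=[SimpleNamespace(type="text", text='{"city": "London", "temperature": 20.0}')],
)
print(tool_json(ok)["city"])
```

In a test, data = tool_json(await session.call_tool(...)) then replaces the isError check, the type check, and the json.loads call.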

Integration tests are slower than unit tests (they spawn a subprocess) but catch issues that unit tests miss: tool registration bugs, schema mismatches between what Pydantic generates and what the protocol expects, stdout contamination (a stray print() breaking the stdio transport), and environment issues.
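
One way to guard against stdout contamination, sketched with the standard library only: send logs to stderr, then assert that a handler writes nothing to stdout. The logger name and the get_weather_stub handler are hypothetical; under pytest, the built-in capsys fixture can replace the redirect_stdout call:

```python
import asyncio
import contextlib
import io
import logging
import sys

# Log to stderr: under the stdio transport, stdout carries only JSON-RPC messages
log = logging.getLogger("weather-server")  # hypothetical logger name
handler = logging.StreamHandler(sys.stderr)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
log.addHandler(handler)
log.setLevel(logging.INFO)
log.propagate = False  # keep records away from any root handler that might use stdout


async def get_weather_stub(city: str) -> dict:
    """Hypothetical handler that emits diagnostics without touching stdout."""
    log.info("fetching weather for %s", city)  # stderr via the handler above
    print(f"debug: city={city}", file=sys.stderr)  # explicit stderr is also safe
    return {"city": city}


# The check a test would make: the handler must not write anything to stdout
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    result = asyncio.run(get_weather_stub("London"))

assert captured.getvalue() == "", f"stdout was written: {captured.getvalue()!r}"
print("stdout clean:", result)
```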

Shared fixtures for async resources

Use pytest fixtures with yield to share expensive setup across tests and ensure proper teardown:

# conftest.py
import pytest
import aiosqlite

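# Used only by tests marked @pytest.mark.anyio: run them on asyncio
@pytest.fixture(scope="session")
def anyio_backend():
    return "asyncio"

# A plain @pytest.fixture async fixture works under pytest-asyncio's auto mode
# (and for anyio-marked tests); in pytest-asyncio strict mode, decorate it
# with @pytest_asyncio.fixture instead.
@pytest.fixture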
async def db():
    """Provide a fresh in-memory SQLite database for each test."""
    async with aiosqlite.connect(":memory:") as conn:
        await conn.execute("""
            CREATE TABLE users (
                id TEXT PRIMARY KEY,
                name TEXT NOT NULL,
                email TEXT UNIQUE NOT NULL
            )
        """)
        await conn.commit()
        yield conn
        # Connection closes automatically when context exits

@pytest.fixture
async def populated_db(db):
    """Database pre-loaded with test data."""
    await db.executemany(
        "INSERT INTO users VALUES (?, ?, ?)",
        [
            ("usr_001", "Alice", "alice@example.com"),
            ("usr_002", "Bob", "bob@example.com"),
        ]
    )
    await db.commit()
    yield db
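
# tests/test_user_tools.py
# Assumes server.py (not shown above) defines a get_user() tool that queries a
# module-level db_pool connection and raises KeyError for unknown IDs.
import pytest
from unittest.mock import patch

from server import get_user  # tool handler function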

@pytest.mark.asyncio
async def test_get_user_found(populated_db):
    with patch("server.db_pool", populated_db):
        result = await get_user("usr_001")
    assert result["name"] == "Alice"
    assert result["email"] == "alice@example.com"

@pytest.mark.asyncio
async def test_get_user_not_found(populated_db):
    with patch("server.db_pool", populated_db):
        with pytest.raises(KeyError, match="usr_999"):
            await get_user("usr_999")

Testing Pydantic validation in tool inputs

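When tools take a Pydantic model as input, test both the happy path and the validation failures. Over MCP, argument validation happens before your tool handler runs; in a unit test you trigger the same checks by constructing the model directly. The tests below assume server.py defines a CreateIssueInput model (a minimum title length, a fixed set of priority values, and a rule that critical issues need a body of at least 50 characters) and a create_issue tool that passes it to a tracker client: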

# tests/test_validation.py
import pytest
from pydantic import ValidationError
from server import CreateIssueInput, create_issue
from unittest.mock import AsyncMock, patch

def test_create_issue_valid_input():
    """Model validates correctly for a valid input."""
    issue = CreateIssueInput(
        title="Fix login bug",
        body="The login form breaks when email contains a plus sign.",
        labels=["bug", "auth"],
        priority="high"
    )
    assert issue.title == "Fix login bug"
    assert issue.priority == "high"

def test_create_issue_title_too_short():
    with pytest.raises(ValidationError) as exc_info:
        CreateIssueInput(title="Bug", body="Short title")
    errors = exc_info.value.errors()
    assert any(e["loc"] == ("title",) for e in errors)

def test_create_issue_invalid_priority():
    with pytest.raises(ValidationError):
        CreateIssueInput(title="Fix something important", body="Details here.", priority="urgent")

def test_create_issue_critical_requires_body():
    with pytest.raises(ValidationError, match="at least 50 characters"):
        CreateIssueInput(title="Critical bug found", body="Short.", priority="critical")

@pytest.mark.asyncio
async def test_create_issue_tool_calls_tracker():
    issue = CreateIssueInput(
        title="Fix the thing",
        body="A detailed description of the issue that needs fixing in production.",
        priority="normal"
    )
    mock_tracker = AsyncMock(return_value=type("R", (), {"id": "ISS-123", "url": "https://..."})())
    with patch("server.tracker.create", mock_tracker):
        result = await create_issue(issue)
    assert result["id"] == "ISS-123"
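
The type(...)() trick in the last test builds the fake tracker response inline; types.SimpleNamespace does the same more readably, and AsyncMock also records awaits, so a test can check exactly what the tool sent. A self-contained sketch with hypothetical IssueTracker and create_issue stand-ins (a SimpleNamespace replaces the Pydantic model for the issue):

```python
import asyncio
from types import SimpleNamespace
from unittest.mock import AsyncMock, patch


class IssueTracker:
    """Hypothetical stand-in for the tracker client server.py talks to."""

    async def create(self, *, title: str, body: str, priority: str):
        raise RuntimeError("real network call: tests must patch this")


tracker = IssueTracker()


async def create_issue(issue) -> dict:
    """Hypothetical handler shaped like the one the tests above assume."""
    created = await tracker.create(title=issue.title, body=issue.body, priority=issue.priority)
    return {"id": created.id, "url": created.url}


async def run_check() -> dict:
    issue = SimpleNamespace(title="Fix the thing", body="Steps to reproduce...", priority="normal")
    # One readable line instead of type("R", (), {...})()
    fake_response = SimpleNamespace(id="ISS-123", url="https://tracker.example/ISS-123")
    with patch.object(tracker, "create", new_callable=AsyncMock, return_value=fake_response) as mock_create:
        result = await create_issue(issue)
    # AsyncMock records the await, including keyword arguments
    mock_create.assert_awaited_once_with(
        title="Fix the thing", body="Steps to reproduce...", priority="normal"
    )
    return result


print(asyncio.run(run_check()))
```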

CI pipeline with GitHub Actions

A minimal GitHub Actions workflow for testing a Python MCP server:

# .github/workflows/test.yml
name: Test MCP Server

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install uv
        uses: astral-sh/setup-uv@v3

      - name: Install dependencies
        run: uv sync --dev

      - name: Run unit tests
        run: uv run pytest tests/unit/ -v

      - name: Run integration tests
        run: uv run pytest tests/integration/ -v
        env:
          DATABASE_URL: sqlite:///test.db

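      - name: Check tool schemas
        # A block scalar (|) keeps the script's newlines; a multi-line quoted
        # value would have its newlines folded into spaces, breaking the Python
        run: |
          uv run python - <<'EOF'
          import asyncio
          from server import mcp

          tools = asyncio.run(mcp.list_tools())
          print(f"Registered {len(tools)} tools")
          for t in tools:
              # MCP expects each tool's inputSchema to be a JSON Schema object
              assert t.inputSchema, f"Empty schema for tool {t.name}"
              assert t.inputSchema.get("type") == "object", f"{t.name}: inputSchema is not an object schema"
          print("All tool schemas valid")
          EOF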

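Separate unit and integration tests into different directories and run them with different pytest markers or path arguments, as the workflow above does with tests/unit/ and tests/integration/. Unit tests should run in under 5 seconds; integration tests that spawn subprocesses can take 30–60 seconds. The workflow above runs both suites on every push and pull request; once the integration suite gets slow, a common split is to run unit tests on every commit and integration tests only on pull requests and pushes to main.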

Related questions

How do I test SSE transport instead of stdio in integration tests?

Start the FastMCP server with the SSE transport as a subprocess in a pytest fixture, wait until it is ready (poll a health route if your server exposes one, or the SSE endpoint itself), then use the MCP SDK's sse_client context manager (from mcp.client.sse) instead of stdio_client. Terminate the subprocess in the fixture's teardown. Alternatively, test the transport endpoints directly with httpx plus the httpx-sse package, without the full MCP client.
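
A minimal readiness check for that fixture, using only the standard library. wait_until_ready is a hypothetical helper that treats any successful response (2xx, or 3xx after redirects) as ready. The commented fixture assumes server.py calls mcp.run(transport="sse") with FastMCP's default host, port, and path (127.0.0.1:8000 and /sse in recent SDK versions); adjust to your settings:

```python
import time
import urllib.request

# Bypass any HTTP(S)_PROXY environment settings: the server under test is local
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_until_ready(url: str, timeout: float = 10.0, interval: float = 0.1) -> None:
    """Poll url until it answers successfully, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    last_error = None
    while time.monotonic() < deadline:
        try:
            # open() raises URLError (an OSError) while nothing is listening,
            # and HTTPError (also a URLError) for 4xx/5xx responses
            with _opener.open(url, timeout=1.0):
                return
        except OSError as exc:
            last_error = exc
            time.sleep(interval)
    raise TimeoutError(f"{url} not ready after {timeout}s (last error: {last_error!r})")


# Hypothetical fixture (assumes pytest, subprocess and sys are imported and
# server.py serves SSE on the default 127.0.0.1:8000):
#
# @pytest.fixture
# def sse_url():
#     proc = subprocess.Popen([sys.executable, "server.py"])
#     try:
#         wait_until_ready("http://127.0.0.1:8000/sse")
#         yield "http://127.0.0.1:8000/sse"
#     finally:
#         proc.terminate()
#         proc.wait(timeout=10)
```

Polling with a deadline, rather than a fixed sleep, keeps the fixture fast on a quick machine and still tolerant of a slow CI runner.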

Can I test tool registration without starting the server?

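Yes. Call asyncio.run(mcp.list_tools()) in a script, or await mcp.list_tools() in an async test. FastMCP builds the tool list from the registered decorators without an active transport, so you can assert the expected tool names and inspect each inputSchema. Note that FastMCP does not raise on a duplicate tool name (by default it logs a warning and keeps the first registration), so check the expected set of names explicitly rather than relying on an error. This is useful in CI to catch registration problems before running integration tests.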

What's the difference between pytest-asyncio and anyio for async tests?

pytest-asyncio runs async tests on the asyncio event loop specifically. anyio with @pytest.mark.anyio runs tests on whichever async backend anyio is configured for (asyncio or trio). FastMCP uses anyio internally, so using anyio markers gives you closer alignment with the framework's own async runtime. Both work in practice — use anyio if you want trio compatibility or are already using anyio in your server code.

How do I test tools that modify state (writes, deletes)?

Use a fresh in-memory database fixture (SQLite :memory:) for each test, as shown in the shared fixtures example above. Each test gets a clean slate and can perform writes without affecting other tests. For PostgreSQL, use pytest-postgresql or wrap each test in a transaction that rolls back after the test. The key principle: tests should be order-independent and leave no state behind.
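
The rollback pattern, sketched with the standard library's sqlite3 so it runs anywhere; for PostgreSQL the idea is the same with your driver's transaction calls. rolled_back is a hypothetical helper; in a pytest fixture you would yield the connection from inside it:

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute("INSERT INTO users VALUES ('usr_001', 'Alice')")
    conn.commit()  # committed baseline that every test starts from
    return conn


@contextmanager
def rolled_back(conn: sqlite3.Connection) -> Iterator[sqlite3.Connection]:
    """Hand the connection to a test, then discard whatever it wrote."""
    try:
        yield conn
    finally:
        conn.rollback()


conn = make_db()
with rolled_back(conn) as c:
    # sqlite3 opens a transaction implicitly before these DML statements
    c.execute("INSERT INTO users VALUES ('usr_002', 'Bob')")
    c.execute("DELETE FROM users WHERE id = 'usr_001'")
    inside_count = c.execute("SELECT COUNT(*) FROM users").fetchone()[0]

print(inside_count, conn.execute("SELECT id FROM users").fetchall())
```

This only works if the code under test never commits on its own; a tool that calls commit() persists its writes past the rollback, which is one reason a fresh :memory: database per test is often simpler for SQLite.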

Further reading