MCP server AutoGen integration

Microsoft AutoGen structures multi-agent conversations where AssistantAgent and UserProxyAgent instances exchange messages, with agents able to call registered tools and return results into the conversation. MCP servers integrate as the tool implementation layer: you wrap each MCP tool call in an async Python function and register it on the appropriate agent. AutoGen's tool system is framework-agnostic — any Python callable with type annotations becomes a tool — which makes MCP integration straightforward but puts connection lifecycle management and error handling entirely in your code. The critical considerations are maintaining a persistent connection pool, returning error strings (not raising exceptions) so AutoGen's conversation continues, and monitoring MCP servers that power long-running unattended AutoGen workflows.

TL;DR

Using AutoGen's register_function API (the v0.2-style API, which the AG2 fork also maintains), create an async Python function that calls your MCP server tool via httpx or the MCP SDK client, then register it with register_function(fn, caller=assistant, executor=proxy, name="...", description="..."). Return error information as strings rather than raising exceptions: AutoGen injects error strings back into the conversation, while uncaught exceptions abort the current turn. Reuse one MCP client per server across the conversation rather than reconnecting per call. Monitor MCP servers with AliveMCP, since AutoGen workflows often run for many turns before a dead server surfaces as a failure.

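AutoGen architecture

The examples in this guide use register_function with explicit caller and executor roles. That API belongs to the AutoGen v0.2 line (the autogen / pyautogen package) and is also maintained in the AG2 fork. AutoGen v0.4, previewed in late 2024, is an asynchronous rewrite split across the autogen-core, autogen-agentchat, and autogen-ext packages; there, tools are passed directly to AssistantAgent instead of being registered against a separate executor. The key concepts relevant to MCP integration with the API used here: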

Concept | Description | MCP relevance
--- | --- | ---
AssistantAgent | LLM-backed agent that reasons and selects tools | Registered as tool caller; decides when to call MCP tools
UserProxyAgent | Executes tool calls and manages conversation termination | Registered as tool executor; actually runs the MCP call
register_function | Binds a Python callable to a tool name and description | The bridge between Python wrappers and the MCP protocol
GroupChat | Multi-agent conversation with speaker selection | Different agents can have different MCP tools registered
Streaming | a_initiate_chat async interface | MCP progress notifications interleave with conversation turns

Registering an MCP tool in AutoGen

The integration pattern is to wrap each MCP tool as an async Python function with type annotations, then register it. AutoGen infers the tool schema from the function signature — accurate type annotations and a clear docstring are critical because that's what the LLM uses to decide when and how to call the tool.

import asyncio
import os
import httpx
import json
import autogen

# Persistent HTTP session — reuse across all tool calls
_http_session: httpx.AsyncClient | None = None

def get_http_session() -> httpx.AsyncClient:
    global _http_session
    if _http_session is None or _http_session.is_closed:
        _http_session = httpx.AsyncClient(
            base_url="https://search.internal/mcp",
            headers={"Authorization": f"Bearer {os.environ['SEARCH_TOKEN']}"},
            timeout=30.0,
        )
    return _http_session

async def search_papers(query: str, max_results: int = 10) -> str:
    """Search academic papers by keyword or topic.

    Returns a JSON list of papers with title, authors, abstract, and URL.
    Use when the user asks for research papers, academic citations, or
    literature on a technical topic.

    Args:
        query: Search terms or natural language question
        max_results: Maximum number of results to return (1-50)
    """
    try:
        session = get_http_session()
        resp = await session.post("/", json={
            "jsonrpc": "2.0",
            "method": "tools/call",
            "params": {
                "name": "search_papers",
                "arguments": {"query": query, "max_results": max_results},
            },
            "id": 1,
        })
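        resp.raise_for_status()  # surface HTTP 4xx/5xx before parsing the body
        data = resp.json()
        if "error" in data:
            return f"Search error: {data['error']['message']}"
        result = data["result"]
        if result.get("isError"):
            return f"Tool error: {result['content'][0]['text']}"
        return result["content"][0]["text"]
    except httpx.HTTPError as e:
        return f"Connection error: {e}"  # return string, never raise
    except Exception as e:
        # Malformed or unexpected response shape; still return a string
        return f"Unexpected error: {type(e).__name__}: {e}"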

# AutoGen agent setup
llm_config = {
    "model": "claude-sonnet-4-6",
    "api_key": os.environ["ANTHROPIC_API_KEY"],
    "api_type": "anthropic",
}

assistant = autogen.AssistantAgent(
    name="research_assistant",
    llm_config=llm_config,
    system_message="You are a research assistant that finds and summarises academic papers.",
)

user_proxy = autogen.UserProxyAgent(
    name="user",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,
    code_execution_config=False,
)

# Register: caller decides when to call; executor runs the function
autogen.register_function(
    search_papers,
    caller=assistant,
    executor=user_proxy,
    name="search_papers",
    description="Search academic papers by keyword or topic",
)

The function signature drives schema generation: query: str becomes a required string parameter; max_results: int = 10 becomes an optional integer with default 10. Keep parameter types simple — AutoGen's schema inference handles primitives well; complex nested types may generate malformed schemas.
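
To make that mapping concrete, here is a minimal sketch of signature-based schema inference using only the standard library. It illustrates the idea and is not AutoGen's actual implementation; sketch_tool_schema and the _JSON_TYPES table are hypothetical names introduced for this example.

```python
import inspect
from typing import Any, get_type_hints

# Hypothetical mapping from Python primitives to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def sketch_tool_schema(fn) -> dict[str, Any]:
    """Build a JSON-Schema-like parameter description from a function signature."""
    hints = get_type_hints(fn)
    properties: dict[str, Any] = {}
    required: list[str] = []
    for name, param in inspect.signature(fn).parameters.items():
        # Unknown or complex annotations fall back to "string" in this sketch.
        prop: dict[str, Any] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)            # no default: required parameter
        else:
            prop["default"] = param.default  # has default: optional parameter
        properties[name] = prop
    return {"type": "object", "properties": properties, "required": required}


async def search_papers(query: str, max_results: int = 10) -> str:
    """Stand-in with the same signature as the tool above."""
    return "[]"


schema = sketch_tool_schema(search_papers)
```

In this sketch, query ends up in the required list and max_results carries its default of 10, which mirrors the behaviour described above.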

Error handling: return strings, not exceptions

In AutoGen, tool functions must return strings. An uncaught exception in a tool function aborts the current agent turn and may corrupt the conversation state. An error string is injected back into the conversation as a tool result, allowing the LLM to reason about the error and retry or escalate.

async def query_database(sql: str) -> str:
    """Execute a read-only SQL query against the analytics database.

    Returns query results as a JSON array of row objects.
    Only SELECT statements are allowed.
    """
    if not sql.strip().upper().startswith("SELECT"):
        return "Error: only SELECT queries are permitted"

    try:
        result = await call_mcp_tool("run_query", {"sql": sql})
        if result.get("isError"):
            error_text = result["content"][0]["text"]
            # Return useful context so the LLM can self-correct
            return f"Query failed: {error_text}. Check that table names are correct with list_tables()."
        return result["content"][0]["text"]
    except httpx.TimeoutException:
        return "Query timed out after 30 seconds. Try a more specific WHERE clause to reduce result size."
    except httpx.ConnectError:
        return "Cannot reach database server. The service may be temporarily unavailable."
    except Exception as e:
        return f"Unexpected error: {type(e).__name__}: {e}"
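
The query_database example relies on a call_mcp_tool helper that this guide does not define. The sketch below covers the transport-independent half of such a helper: building the tools/call request with a unique JSON-RPC id and unwrapping the response. The names build_tool_call, unwrap_tool_response, and McpToolError are assumptions for illustration; actually sending the request (via httpx or the MCP SDK) is left out.

```python
import itertools
from typing import Any

_request_ids = itertools.count(1)  # unique JSON-RPC id per request in this process


class McpToolError(Exception):
    """Hypothetical error type for JSON-RPC-level failures."""


def build_tool_call(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 'tools/call' request body."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def unwrap_tool_response(data: dict[str, Any]) -> dict[str, Any]:
    """Return the 'result' object, raising McpToolError on a JSON-RPC error."""
    if "error" in data:
        raise McpToolError(data["error"].get("message", "unknown error"))
    return data["result"]
```

A call_mcp_tool built on these two functions would post build_tool_call(...) to the server and return unwrap_tool_response(resp.json()), leaving the isError check to the calling tool as in query_database.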

Error messages should guide the LLM's next step. "Error: connection refused" tells the LLM nothing actionable. "Cannot reach database server — try the cached results tool instead" gives the LLM a concrete alternative to try. Good error messages reduce the number of retry turns and lower token cost.
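
One way to apply this consistently is a decorator that converts exceptions into hint-bearing strings. This is a minimal sketch: errors_as_hints and the _HINTS table are hypothetical, and it uses built-in exception types so it runs without httpx. In practice you would map httpx.TimeoutException and httpx.ConnectError the same way.

```python
import asyncio
import functools
from typing import Awaitable, Callable

# Hypothetical mapping from exception type to an actionable hint for the LLM.
_HINTS: dict[type[BaseException], str] = {
    TimeoutError: "The call timed out. Try a narrower query.",
    ConnectionError: "The server is unreachable. Try a different tool or report the outage.",
}


def errors_as_hints(fn: Callable[..., Awaitable[str]]) -> Callable[..., Awaitable[str]]:
    """Wrap an async tool so it always returns a string and never raises."""

    @functools.wraps(fn)  # keeps name, docstring, and annotations for schema inference
    async def wrapper(*args, **kwargs) -> str:
        try:
            return await fn(*args, **kwargs)
        except Exception as exc:
            for exc_type, hint in _HINTS.items():
                if isinstance(exc, exc_type):
                    return f"Error: {hint}"
            return f"Unexpected error: {type(exc).__name__}: {exc}"

    return wrapper


@errors_as_hints
async def flaky_tool(query: str) -> str:
    """Stand-in tool that always fails to connect."""
    raise ConnectionRefusedError("connection refused")


result = asyncio.run(flaky_tool("mcp"))
```

Because the hint text lives in one table, you can tune the wording for the LLM without touching each tool function.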

Multi-agent GroupChat with multiple MCP servers

In a GroupChat, different agents can have different MCP tools registered. The GroupChatManager selects which agent speaks next based on the conversation context, so the agent holding the right tools is called for each subtask:

researcher = autogen.AssistantAgent("researcher", llm_config=llm_config,
    system_message="Search for and retrieve research papers on the given topic.")
analyst = autogen.AssistantAgent("analyst", llm_config=llm_config,
    system_message="Analyze and summarize research findings with citations.")
writer = autogen.AssistantAgent("writer", llm_config=llm_config,
    system_message="Write structured reports from research findings.")
user_proxy = autogen.UserProxyAgent("user", human_input_mode="NEVER",
    max_consecutive_auto_reply=20, code_execution_config=False)

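# Each agent gets its relevant MCP tools. fetch_full_text, compute_citation_metrics,
# and save_report are assumed to be async wrappers written like search_papers above.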
autogen.register_function(search_papers, caller=researcher, executor=user_proxy,
    name="search_papers", description="Search academic papers")
autogen.register_function(fetch_full_text, caller=researcher, executor=user_proxy,
    name="fetch_full_text", description="Retrieve full paper text by DOI or URL")
autogen.register_function(compute_citation_metrics, caller=analyst, executor=user_proxy,
    name="compute_citation_metrics", description="Calculate citation impact and h-index")
autogen.register_function(save_report, caller=writer, executor=user_proxy,
    name="save_report", description="Save the final report to document storage")

groupchat = autogen.GroupChat(
    agents=[user_proxy, researcher, analyst, writer],
    messages=[],
    max_round=30,
    speaker_selection_method="auto",  # manager LLM selects next speaker
)
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)

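async def main() -> None:
    # await is only valid inside a coroutine, so wrap the entry point
    await user_proxy.a_initiate_chat(
        manager,
        message="Research the latest developments in MCP server security and write a technical report",
    )

asyncio.run(main())  # asyncio is imported in the first example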

Connection pooling across conversation turns

Each MCP tool call over HTTP involves a JSON-RPC request to the server. If you create a new httpx.AsyncClient per call, each call opens a new TCP connection — adding 50–200 ms overhead and increasing load on the MCP server. For a 30-turn AutoGen conversation with 3 tool calls per turn, that's 90 unnecessary connection establishments.

Use a module-level shared client that persists across turns:

# At module level — shared across all tool calls in the process
_mcp_clients: dict[str, httpx.AsyncClient] = {}

def get_mcp_client(server_name: str, url: str, token: str) -> httpx.AsyncClient:
    if server_name not in _mcp_clients or _mcp_clients[server_name].is_closed:
        _mcp_clients[server_name] = httpx.AsyncClient(
            base_url=url,
            headers={"Authorization": f"Bearer {token}"},
            timeout=httpx.Timeout(connect=5.0, read=30.0, write=10.0, pool=5.0),
            limits=httpx.Limits(max_keepalive_connections=5, max_connections=10),
        )
    return _mcp_clients[server_name]

async def cleanup_mcp_clients():
    """Call at shutdown — close all persistent connections."""
    for client in _mcp_clients.values():
        await client.aclose()
    _mcp_clients.clear()

HTTP keep-alive connections are reused automatically when the server supports them, which is the default for servers behind Caddy or nginx. Keeping one client per server means each server gets a small pool of persistent connections instead of a new connection for every tool call.
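
The overhead estimate at the start of this section works out as follows, taking the 50 to 200 ms per-connection figure quoted there as given:

```python
turns = 30
calls_per_turn = 3
connections = turns * calls_per_turn  # connections opened without pooling

low_ms, high_ms = 50, 200  # per-connection setup cost quoted in the text
overhead_low_s = connections * low_ms / 1000    # lower bound, in seconds
overhead_high_s = connections * high_ms / 1000  # upper bound, in seconds
```

That puts the avoidable connection setup time for one conversation between 4.5 and 18 seconds, before counting the extra load on the server.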

Monitoring MCP servers in AutoGen workflows

AutoGen workflows are often long-running and unattended: a GroupChat orchestrating research might run for 20–45 minutes, making dozens of MCP tool calls. A server that fails at turn 28 of a 30-turn conversation results in a partial result with no clear failure signal — the conversation ends with the LLM saying it cannot complete the task, and the actual cause (MCP server down) is buried in logs.

Two practices reduce this risk. First, verify MCP server health at the start of each AutoGen conversation before spending any tokens:

async def run_research_conversation(topic: str):
    # Pre-flight check before a potentially expensive conversation
    for server in [
        ("search", "https://search.internal/mcp", os.environ["SEARCH_TOKEN"]),
        ("database", "https://db.internal/mcp", os.environ["DB_TOKEN"]),
    ]:
        client = get_mcp_client(*server)
        try:
            resp = await client.post("/", json={
                "jsonrpc": "2.0", "method": "initialize",
                "params": {"protocolVersion": "2025-03-26", "capabilities": {},
                           "clientInfo": {"name": "preflight", "version": "1"}}, "id": 0,
            })
            if resp.status_code != 200:
                raise RuntimeError(f"MCP server {server[0]} unhealthy: HTTP {resp.status_code}")
        except httpx.ConnectError as e:
            raise RuntimeError(f"Cannot connect to MCP server {server[0]}: {e}") from e

    await user_proxy.a_initiate_chat(manager, message=f"Research: {topic}")

Second, run AliveMCP continuous monitoring on each MCP server. AliveMCP probes every minute independently of your AutoGen code — you get an alert within 60 seconds of a server failure, long before your next scheduled conversation attempt. Monitoring MCP infrastructure separately from AutoGen application logic means failures are visible at the infrastructure layer, not buried in agent conversation logs.
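
The pre-flight check shown earlier treats any HTTP 200 as healthy, but a server can answer 200 with a JSON-RPC error body. Below is a minimal sketch of a stricter check, assuming the server replies with plain JSON and not an SSE stream; initialize_ok is a hypothetical helper name.

```python
import json


def initialize_ok(status_code: int, body: str) -> tuple[bool, str]:
    """Return (healthy, reason) for the response to an MCP 'initialize' request."""
    if status_code != 200:
        return False, f"HTTP {status_code}"
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return False, "response is not valid JSON"
    if not isinstance(data, dict):
        return False, "response is not a JSON object"
    if "error" in data:
        err = data["error"]
        message = err.get("message", "unknown") if isinstance(err, dict) else str(err)
        return False, f"JSON-RPC error: {message}"
    result = data.get("result")
    # A successful initialize result carries the negotiated protocol version.
    if not isinstance(result, dict) or "protocolVersion" not in result:
        return False, "missing initialize result"
    return True, "ok"
```

In the pre-flight loop you would call initialize_ok(resp.status_code, resp.text) and raise RuntimeError with the returned reason when the first element is False.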

Frequently asked questions

What is the difference between AutoGen v0.2 and v0.4 for MCP integration?

Early AutoGen v0.2 releases registered tools through a function_map dict on ConversableAgent. Later v0.2 releases added register_function with explicit caller and executor roles, separating reasoning (AssistantAgent) from execution (UserProxyAgent); that is the API this guide uses, and the AG2 fork continues it. AutoGen v0.4 is a rewrite with a different tool API: tools are passed to AssistantAgent directly, and the autogen-ext package ships MCP tool adapters. The MCP integration pattern is similar in every case (wrap MCP calls in Python functions and expose them as tools), but the caller/executor model lets you control which agent actually runs the MCP call, which matters for security and auditing.

Can I use the MCP Python SDK directly instead of httpx?

Yes. The MCP Python SDK provides ClientSession with typed methods like session.call_tool(name, arguments). Using the SDK gives you typed responses and handles the JSON-RPC framing. The downside is managing the SDK's async context manager lifecycle alongside AutoGen's conversation lifecycle. Using httpx directly is simpler for most cases — you control the connection lifecycle explicitly and the HTTP protocol layer is straightforward for MCP over HTTP/SSE.
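
The lifecycle concern can be handled with contextlib.AsyncExitStack from the standard library, which keeps an async context manager open for a whole conversation and closes it once at the end. A minimal sketch follows; fake_mcp_session is a hypothetical stand-in for the SDK's session context manager so the example runs without the SDK installed.

```python
import asyncio
from contextlib import AsyncExitStack, asynccontextmanager

events: list[str] = []  # records the order of open, use, and close


@asynccontextmanager
async def fake_mcp_session(url: str):
    """Hypothetical stand-in for an SDK session context manager."""
    events.append(f"open {url}")
    try:
        yield {"url": url}
    finally:
        events.append(f"close {url}")


async def run_conversation() -> None:
    async with AsyncExitStack() as stack:
        # Opened once; stays open across every tool call in the conversation.
        session = await stack.enter_async_context(
            fake_mcp_session("https://search.internal/mcp")
        )
        for turn in range(3):
            events.append(f"turn {turn} uses {session['url']}")
    # Leaving the block closes the session exactly once.


asyncio.run(run_conversation())
```

The same structure should apply with a real SDK session in place of the stand-in: enter it on the stack before starting the chat, and let the stack close it when the conversation finishes or fails.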

How do I handle MCP authentication in AutoGen when each user has different credentials?

Create separate tool function closures per user, each closing over the user's credentials: def make_search_tool(token): async def search_papers(query: str) -> str: return await call_mcp("search_papers", {"query": query}, token=token); return search_papers. Register the user-specific function for each conversation. This ensures MCP credentials are scoped to the conversation and never bleed between users. For service-level auth (the MCP server uses a service account, not user credentials), a shared client is fine.
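
Written out in full, the closure pattern described above might look like the sketch below. call_mcp here is a hypothetical stub that only echoes which token it received; a real version would send the tools/call request with that token in the Authorization header.

```python
import asyncio


async def call_mcp(tool: str, arguments: dict, *, token: str) -> str:
    """Hypothetical transport stub; reports which credential was used."""
    return f"{tool} called with token ending {token[-4:]}"


def make_search_tool(token: str):
    """Return a search_papers tool bound to one user's credentials."""

    async def search_papers(query: str) -> str:
        """Search academic papers by keyword or topic."""
        return await call_mcp("search_papers", {"query": query}, token=token)

    return search_papers


alice_tool = make_search_tool("alice-token-1111")
bob_tool = make_search_tool("bob-token-2222")
alice_result = asyncio.run(alice_tool("mcp security"))
bob_result = asyncio.run(bob_tool("mcp security"))
```

The token never appears in the tool's signature, so it is not part of the schema shown to the LLM and cannot be overridden by a tool call.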

Can AutoGen tools call multiple MCP servers in a single function?

Yes. A single Python tool function can call multiple MCP servers, combine their results, and return a merged response. This is the "aggregator tool" pattern — useful when two MCP servers together answer a question that neither can answer alone. The downside: if either server is down, the combined tool fails. Consider separate tool functions and let the LLM compose them, or implement a fallback path in the aggregator function for when one server is unavailable.
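
A fallback path in the aggregator can be sketched with asyncio.gather and return_exceptions=True, so that one failing server degrades the result without failing the whole tool. search_primary and search_secondary are hypothetical stand-ins for two MCP-backed calls.

```python
import asyncio


async def search_primary(query: str) -> str:
    """Hypothetical stand-in for a call to the first MCP server."""
    return f"primary results for {query}"


async def search_secondary(query: str) -> str:
    """Hypothetical stand-in for a second server that is currently down."""
    raise ConnectionError("secondary server unreachable")


async def search_all(query: str) -> str:
    """Query both servers concurrently; degrade gracefully if one fails."""
    results = await asyncio.gather(
        search_primary(query), search_secondary(query), return_exceptions=True
    )
    parts = []
    for name, res in zip(("primary", "secondary"), results):
        if isinstance(res, Exception):
            parts.append(f"[{name} unavailable: {res}]")  # tell the LLM what is missing
        else:
            parts.append(res)
    return "\n".join(parts)


combined = asyncio.run(search_all("mcp"))
```

The bracketed note keeps the partial failure visible to the LLM, so it can qualify its answer or try another tool.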

AutoGen is terminating my conversation too early. Is this an MCP issue?

Probably not. AutoGen conversation termination is controlled by max_consecutive_auto_reply on the UserProxyAgent and the is_termination_msg function. If the conversation ends before tasks complete, increase max_consecutive_auto_reply or check that your termination message check is not triggering on tool error strings. MCP errors appear as tool result strings in the conversation — if your is_termination_msg pattern matches error text like "Connection error:", the conversation will terminate on the first MCP failure.
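
A termination check that ignores tool error strings could look like the sketch below. It assumes the assistant is instructed to end its final message with the TERMINATE marker, and the prefixes are the ones used by the example tools in this guide; adjust both to your own setup.

```python
# Prefixes produced by the example tool wrappers in this guide.
ERROR_PREFIXES = ("Connection error:", "Tool error:", "Search error:", "Query failed:")


def is_termination_msg(message: dict) -> bool:
    """Terminate only on an explicit TERMINATE marker, never on a tool error."""
    content = (message.get("content") or "").strip()
    if content.startswith(ERROR_PREFIXES):
        return False
    return content.endswith("TERMINATE")
```

Pass the function as is_termination_msg when constructing the UserProxyAgent.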
