MCP server smolagents integration

HuggingFace smolagents is a lightweight Python agent framework built around two agent architectures: ToolCallingAgent, which asks the LLM to emit structured tool-call JSON and then executes the call, and CodeAgent, which asks the LLM to write Python code that calls tools as functions and then runs that code in a sandboxed interpreter. Both agent types take tools as Tool objects; a plain Python function can be turned into one with the @tool decorator. MCP servers integrate by wrapping each MCP tool in a Tool subclass: a smolagents object with a name, a description, typed inputs, and a forward() method. The built-in MCPClient (available in smolagents ≥ 1.9) discovers the tools on a live MCP server and wraps them automatically, so you do not have to write per-tool wrappers by hand. The main operational concerns are the MCP connection lifecycle, error handling that keeps the agent loop alive when a tool fails, and monitoring of the MCP endpoints your smolagents pipeline depends on.

TL;DR

Use smolagents.MCPClient with either a stdio server spec or an HTTP URL to auto-discover and wrap all MCP tools: with MCPClient({"url": "https://search.internal/sse", ...}) as client: tools = client.get_tools(). Pass the tools list to ToolCallingAgent(tools=tools, model=model). Keep the MCPClient context manager open for the entire agent run. For code agents, smolagents exposes each tool as a callable Python function inside the sandboxed execution environment. Monitor your MCP endpoints with AliveMCP, because smolagents batch pipelines can run for minutes before a dead server surfaces as a tool error.

ToolCallingAgent vs CodeAgent with MCP tools

smolagents provides two agent architectures, plus ManagedAgent for composing agents hierarchically. Each suits different use cases:

Architecture	How tools are called	Best fit	MCP consideration
`ToolCallingAgent`	LLM outputs structured JSON; framework calls the matching tool	Simple tool workflows; LLMs with strong function-calling support (GPT-4o, Claude, Gemini)	Tool name + description quality is critical; LLM must choose the right MCP tool from the list
`CodeAgent`	LLM writes Python code; sandbox executes it, calling tools as functions	Complex multi-step logic; arithmetic; conditional tool use; loops over results	MCP tools become Python functions in the sandbox — the LLM can compose them with Python logic
`ManagedAgent`	Sub-agent is itself a tool callable by a parent agent	Hierarchical pipelines with specialized agents per MCP server	Each sub-agent can have its own MCPClient connected to a different MCP server

Using MCPClient for auto-discovery

smolagents ≥ 1.9 ships with MCPClient, which calls tools/list on the server at startup and wraps each tool as a smolagents Tool object automatically:

import os
from smolagents import MCPClient, ToolCallingAgent, LiteLLMModel

# HTTP/SSE transport — production MCP servers
with MCPClient({
    "url": "https://search.internal/sse",
    "headers": {"Authorization": f"Bearer {os.environ['SEARCH_TOKEN']}"},
}) as mcp_client:
    tools = mcp_client.get_tools()  # auto-discovers all MCP tools

    model = LiteLLMModel(
        model_id="anthropic/claude-sonnet-4-6",
        api_key=os.environ["ANTHROPIC_API_KEY"],
    )
    agent = ToolCallingAgent(tools=tools, model=model, max_steps=15)
    result = agent.run("Find the three most-cited papers on MCP server security published in 2026.")
    print(result)

# Multiple MCP servers: combine tool lists
with (
    MCPClient({"url": os.environ["SEARCH_URL"], "headers": {"Authorization": f"Bearer {os.environ['SEARCH_TOKEN']}"}}) as search_client,
    MCPClient({"url": os.environ["DB_URL"],     "headers": {"Authorization": f"Bearer {os.environ['DB_TOKEN']}"}})     as db_client,
):
    tools = search_client.get_tools() + db_client.get_tools()
    agent = ToolCallingAgent(tools=tools, model=model)

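The with MCPClient(...) as client context manager opens the connection, performs MCP initialization, and closes the connection cleanly on exit. All agent steps must run inside the context: a tool call made after __exit__ fails because the underlying connection is already closed, typically surfacing as a connection error, though the exact exception type depends on the transport and library version.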
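
When the number of MCP servers is only known at runtime, nested with statements become awkward. The following is a minimal sketch of one way to keep every connection open for the whole run, using contextlib.ExitStack from the standard library. FakeClient is a hypothetical stand-in for MCPClient (assumed only to be a context manager with a get_tools() method, as in the examples above) so that the sketch runs without a live server; run_with_servers and the config keys are likewise illustrative names.

```python
from contextlib import ExitStack


class FakeClient:
    """Hypothetical stand-in for MCPClient: a context manager with get_tools()."""

    def __init__(self, config: dict):
        self.config = config
        self.is_open = False

    def __enter__(self):
        self.is_open = True
        return self

    def __exit__(self, exc_type, exc, tb):
        self.is_open = False
        return False  # do not swallow exceptions

    def get_tools(self) -> list:
        if not self.is_open:
            raise RuntimeError("client is closed")
        return [f"{self.config['name']}_tool"]


def run_with_servers(configs: list, task, client_factory=FakeClient):
    """Open one client per config, run task(tools), then close all clients."""
    with ExitStack() as stack:
        clients = [stack.enter_context(client_factory(cfg)) for cfg in configs]
        tools = [tool for client in clients for tool in client.get_tools()]
        # Everything that uses the tools must happen inside this block.
        return task(tools)


result = run_with_servers(
    [{"name": "search"}, {"name": "db"}],
    task=lambda tools: sorted(tools),
)
print(result)  # ['db_tool', 'search_tool']
```

ExitStack closes the clients in reverse order of opening, even if the task raises, which mirrors what the nested with form does for a fixed number of servers.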

Manual Tool subclass for custom wrapping

For cases where you need custom error handling, argument preprocessing, or response transformation beyond what MCPClient.get_tools() provides, subclass Tool directly:

from smolagents import Tool
from mcp import ClientSession
from mcp.client.sse import sse_client
import asyncio

class MCPSearchTool(Tool):
    name = "search_papers"
    description = (
        "Search academic papers on a topic. Returns a list of papers with titles, "
        "authors, abstracts, and URLs. Use when the user asks for research papers, "
        "scientific literature, or academic citations."
    )
    inputs = {
        "query": {
            "type": "string",
            "description": "Search terms or a natural language question about the topic",
        },
        "max_results": {
            "type": "integer",
            "description": "Maximum number of papers to return (default 10, max 50)",
            "nullable": True,
        },
    }
    output_type = "string"

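    def __init__(self, session: ClientSession, loop: asyncio.AbstractEventLoop):
        super().__init__()
        self._session = session
        # Pass in the event loop that owns the session. It must not be running
        # in this thread, or run_until_complete() in forward() raises RuntimeError.
        self._loop = loop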

    def forward(self, query: str, max_results: int = 10) -> str:
        try:
            result = self._loop.run_until_complete(
                self._session.call_tool(
                    "search_papers",
                    arguments={"query": query, "max_results": max_results},
                )
            )
            if result.isError:
                return f"Search error: {result.content[0].text}. Try a more specific query."
            return result.content[0].text
        except Exception as e:
            # Return error string — never raise from forward()
            return f"MCP search unavailable: {type(e).__name__}: {e}. The server may be down."

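smolagents calls Tool.forward() synchronously, while the Python MCP SDK exposes an async API. To bridge the two, run the coroutine to completion on the event loop that owns the session, for example with loop.run_until_complete(). That call only works when the loop is not already running in the current thread, and asyncio.run() is not a drop-in substitute, because it creates a fresh loop that an existing session is not bound to. In applications that are already async (FastAPI, async task queues), one option is to run the agent in a worker thread and submit tool coroutines to the session's loop from there. Whether async def forward is supported depends on the smolagents version and agent type, so verify it against the release you have installed before relying on it.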
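
One common way to call async code from a synchronous forward() is to run a dedicated event loop in a background thread and submit coroutines to it. The sketch below shows that pattern with the standard library only. AsyncBridge and fake_call_tool are hypothetical names; fake_call_tool stands in for session.call_tool(...), and in a real setup the MCP session would also have to be created on the bridge's loop.

```python
import asyncio
import threading


class AsyncBridge:
    """Runs coroutines on a dedicated background event loop from sync code."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def run(self, coro, timeout: float = 30.0):
        """Submit a coroutine to the background loop and block until it finishes."""
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result(timeout=timeout)

    def close(self):
        """Stop the loop, wait for the thread to exit, then release the loop."""
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


async def fake_call_tool(name: str, arguments: dict) -> str:
    """Hypothetical stand-in for an async MCP tool call."""
    await asyncio.sleep(0.01)
    return f"{name} called with {arguments}"


bridge = AsyncBridge()
try:
    # This is the line a synchronous forward() would contain.
    text = bridge.run(fake_call_tool("search_papers", {"query": "mcp"}))
finally:
    bridge.close()
print(text)  # search_papers called with {'query': 'mcp'}
```

Because the loop lives in its own thread, this approach also works when the calling thread already has a running event loop, which is the case where run_until_complete() fails.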

CodeAgent with MCP tools

In CodeAgent, the LLM writes Python code that calls tools as functions. smolagents injects each tool into the sandbox namespace using the tool's name attribute as the function name. MCP tools work identically to any other smolagents tool in this context:

import os

from smolagents import CodeAgent, LiteLLMModel, MCPClient

with MCPClient({"url": os.environ["DATA_MCP_URL"], ...}) as client:
    tools = client.get_tools()
    model = LiteLLMModel("anthropic/claude-sonnet-4-6")

    agent = CodeAgent(
        tools=tools,
        model=model,
        additional_authorized_imports=["json", "re"],  # allow in sandbox
        max_steps=20,
    )

    # The LLM can write code like:
    # results = search_papers("MCP server security", max_results=20)
    # parsed = json.loads(results)
    # titles = [r["title"] for r in parsed if r["year"] == 2026]
    result = agent.run(
        "Find all MCP server security papers from 2026, extract their titles and "
        "citation counts, and return a ranked list sorted by citations descending."
    )

CodeAgent's Python sandbox approach is powerful for tasks that require iteration, filtering, or arithmetic over tool results: work that would take many sequential tool calls in a ToolCallingAgent becomes a single code block. The trade-off is that the sandbox executes arbitrary LLM-generated Python, so restrict additional_authorized_imports to the minimum needed and run the agent in an isolated environment such as a Docker container or a virtual machine. smolagents' built-in LocalPythonExecutor restricts imports and builtins, but treat it as a guard rail rather than a complete security boundary.
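
To make the import allow-list idea concrete, here is a small static check that lists the modules a piece of generated code tries to import outside an allowed set. This is an illustrative sketch, not how smolagents enforces additional_authorized_imports internally; find_disallowed_imports is a hypothetical helper, and a static scan like this does not catch dynamic imports such as __import__() or importlib.

```python
import ast


def find_disallowed_imports(source: str, allowed: set) -> list:
    """Return top-level module names imported by `source` that are not allowed."""
    disallowed = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports have no module name; flag them explicitly.
            modules = [node.module] if node.module else ["<relative>"]
        else:
            continue
        for module in modules:
            root = module.split(".")[0]
            if root not in allowed and root not in disallowed:
                disallowed.append(root)
    return disallowed


generated = """
import json
import os.path
from subprocess import run
titles = json.loads("[]")
"""
print(find_disallowed_imports(generated, allowed={"json", "re"}))
# ['os', 'subprocess']
```

A check like this can run as an extra audit step in logs or tests; it complements, rather than replaces, process-level isolation.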

ManagedAgent for hierarchical MCP workflows

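ManagedAgent wraps an agent as a tool callable by a parent orchestrator agent. This enables hierarchical workflows where different specialized agents each connect to their own MCP server. Note that newer smolagents releases deprecate the ManagedAgent wrapper: there, you set name and description on the sub-agent itself and pass it to the parent through the managed_agents argument, so check which form your installed version expects.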

import os

from smolagents import ManagedAgent, ToolCallingAgent, LiteLLMModel, MCPClient

model = LiteLLMModel("anthropic/claude-sonnet-4-6")

# Specialist sub-agents, each with their own MCP connection
with (
    MCPClient({"url": os.environ["SEARCH_URL"], ...}) as search_client,
    MCPClient({"url": os.environ["CODE_URL"],   ...}) as code_client,
    MCPClient({"url": os.environ["DATA_URL"],   ...}) as data_client,
):
    search_agent = ToolCallingAgent(tools=search_client.get_tools(), model=model)
    code_agent   = ToolCallingAgent(tools=code_client.get_tools(),   model=model)
    data_agent   = ToolCallingAgent(tools=data_client.get_tools(),   model=model)

    # Wrap each specialist as a managed tool for the orchestrator
    managed_search = ManagedAgent(
        agent=search_agent,
        name="web_researcher",
        description="Researches current information on a topic using a live web search MCP server. "
                    "Call with a research question; returns a detailed research summary.",
    )
    managed_code = ManagedAgent(
        agent=code_agent,
        name="code_executor",
        description="Executes code analysis and transformation tasks using a code execution MCP server. "
                    "Call with a code task description; returns the execution result.",
    )
    managed_data = ManagedAgent(
        agent=data_agent,
        name="data_analyst",
        description="Queries and analyzes structured datasets using a database MCP server. "
                    "Call with a data question; returns a formatted analysis.",
    )

    orchestrator = ToolCallingAgent(
        tools=[managed_search, managed_code, managed_data],
        model=model,
        max_steps=10,
    )
    result = orchestrator.run(
        "Research the current state of MCP server adoption, analyse the registry data, "
        "and produce a summary report with statistics."
    )

The orchestrator sees web_researcher, code_executor, and data_analyst as tools — it calls them by name with a task description, and each managed agent runs its own ReAct loop using its dedicated MCP server. This isolation means a failure in the search MCP server only affects the web_researcher sub-agent; the data and code agents continue operating.

Monitoring MCP servers in smolagents pipelines

smolagents is commonly used for batch automation: scheduled pipelines, continuous data enrichment, on-demand research workflows triggered by external events. These pipelines can run for minutes or hours. An MCP server that goes down mid-pipeline produces error strings in tool results — the agent may loop, retry, or produce a partial answer without a clear failure signal to the calling system.
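
Because error strings do not raise, the calling system needs some other signal that a run was degraded. One possible approach, sketched below under the assumption that each tool wrapper can report whether its call succeeded, is a small ledger that the pipeline inspects after agent.run() returns. FailureLedger is a hypothetical helper, not part of smolagents.

```python
class FailureLedger:
    """Tracks tool-call outcomes so the caller can detect a degraded run."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.consecutive = 0
        self.total_failures = 0

    def record(self, ok: bool) -> None:
        """Record one tool call; a success resets the consecutive counter."""
        if ok:
            self.consecutive = 0
        else:
            self.consecutive += 1
            self.total_failures += 1

    @property
    def tripped(self) -> bool:
        """True while `threshold` or more failures have occurred back to back."""
        return self.consecutive >= self.threshold


ledger = FailureLedger(threshold=3)
for ok in [True, False, False, False]:
    ledger.record(ok)
print(ledger.tripped, ledger.total_failures)  # True 3
```

A wrapper's forward() would call ledger.record(False) on its error path and ledger.record(True) otherwise; the pipeline can then mark the output as partial or fail the job when ledger.tripped is set.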

Combine a preflight check at pipeline start with AliveMCP continuous monitoring:

import os

from smolagents import MCPClient

def verify_mcp_servers(server_configs: list[dict]) -> None:
    """Raise if any MCP server is unreachable or exposes no tools."""
    for config in server_configs:
        label = config.get("url", "stdio")  # stdio configs have no "url" key
        try:
            with MCPClient(config) as client:
                tool_count = len(client.get_tools())
        except Exception as e:
            raise RuntimeError(
                f"MCP server preflight failed for {label}: {e}\n"
                "Check server status at https://alivemcp.com"
            ) from e
        # Checked outside the try block so this error is not re-wrapped above.
        if tool_count == 0:
            raise RuntimeError(f"MCP server at {label} returned no tools")

# At pipeline entry point — fail fast before any LLM spend
verify_mcp_servers([
    {"url": os.environ["SEARCH_URL"], "headers": {"Authorization": f"Bearer {os.environ['SEARCH_TOKEN']}"}},
    {"url": os.environ["DB_URL"],     "headers": {"Authorization": f"Bearer {os.environ['DB_TOKEN']}"}},
])

The preflight opens a fresh connection, calls tools/list, verifies at least one tool exists, and closes the connection — all before the main pipeline starts. AliveMCP probes independently every 60 seconds, alerting you within a minute of any failure so you can investigate before the next pipeline run encounters a dead server.
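
A single failed probe can be a transient blip rather than an outage. The sketch below shows one way to retry a preflight with exponential backoff before giving up. It is transport-agnostic: probe is a hypothetical callable that returns the number of tools a server exposes (for example, a function that opens an MCPClient and returns len(client.get_tools())), and the sleep parameter exists only so the delays can be observed without waiting.

```python
import time
from typing import Callable


def preflight_with_retry(
    probe: Callable[[], int],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call probe() until it returns a positive tool count or attempts run out."""
    last_error = None
    for attempt in range(attempts):
        try:
            tool_count = probe()
            if tool_count > 0:
                return tool_count
            last_error = RuntimeError("server returned no tools")
        except Exception as exc:  # network, protocol, and auth errors land here
            last_error = exc
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
    raise RuntimeError(f"preflight failed after {attempts} attempts") from last_error


calls = {"n": 0}


def flaky_probe() -> int:
    """Fails twice, then reports four tools."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("server not ready")
    return 4


delays = []
print(preflight_with_retry(flaky_probe, sleep=delays.append))  # 4
print(delays)  # [0.5, 1.0]
```

Keep the attempt count small: the point of a preflight is to fail fast, so a few seconds of total backoff is usually the upper bound worth paying.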

Frequently asked questions

When should I use ToolCallingAgent vs CodeAgent for MCP tool workflows?

Use ToolCallingAgent when each task maps cleanly to a single tool call or a short linear sequence of calls — search, retrieve, summarize. Use CodeAgent when the task requires iteration, filtering, sorting, or arithmetic over tool results that would need many sequential tool calls to accomplish. CodeAgent is more powerful but riskier: it executes LLM-generated Python in a sandbox. Always restrict additional_authorized_imports to what the task actually requires and run the agent in an isolated container for untrusted workloads.

Does smolagents MCPClient support stdio transport?

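Yes. Pass a stdio server spec with the command, args, and env of the server process instead of a URL. The smolagents documentation shows this spec as a StdioServerParameters object from the mcp package, for example StdioServerParameters(command="node", args=["path/to/server.js"], env={...}); a plain dict is generally interpreted as an HTTP configuration, so verify the accepted form against your installed version. MCPClient wraps tools from HTTP/SSE and stdio servers as Tool objects identically, so you can use stdio for local development and switch to HTTP/SSE in production without changing agent code.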

How do I handle MCP tool errors in smolagents without stopping the agent loop?

Return error strings from forward() rather than raising exceptions. smolagents agents treat the return value of any tool call as an observation that feeds the next reasoning step. An error string like "Search server unavailable: try again or use cached data" lets the LLM reason about the failure and attempt an alternative. What happens to an uncaught exception inside forward() depends on the smolagents version: it may be recorded as a step error and shown to the model, or it may surface as an agent-level error that ends the run. Either way you lose control over the message the model sees. Defensive error handling matters especially for MCP tools because network errors, protocol errors, and server-side tool errors all arrive as different exception types.
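
The pattern can be factored into a decorator so that every wrapper maps exceptions to observation strings the same way. This is a minimal sketch applied to a plain function; errors_as_observations is a hypothetical helper, and whether it can be applied directly to a Tool subclass's forward() depends on how your smolagents version validates the method signature.

```python
import functools


def errors_as_observations(tool_name: str):
    """Decorator: turn exceptions from a tool function into error strings."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except TimeoutError as exc:
                return f"{tool_name} timed out: {exc}. Retry once or narrow the request."
            except ConnectionError as exc:
                return f"{tool_name} unavailable: {exc}. The server may be down."
            except Exception as exc:
                return f"{tool_name} failed: {type(exc).__name__}: {exc}."

        return wrapper

    return decorator


@errors_as_observations("search_papers")
def forward(query: str) -> str:
    """Toy tool body that always fails, to show both error paths."""
    if not query:
        raise ValueError("query must not be empty")
    raise ConnectionError("connection refused")


print(forward(""))     # search_papers failed: ValueError: query must not be empty.
print(forward("mcp"))  # search_papers unavailable: connection refused. The server may be down.
```

Distinct messages per failure class give the model a hint about what to do next: retry on a timeout, fall back to another tool when the server is unreachable, or fix its arguments on a validation error.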

Can I use smolagents with self-hosted open-source models via HuggingFace Inference Endpoints?

Yes. smolagents is model-agnostic. Use HfApiModel (named InferenceClientModel in newer releases) with a model ID hosted on HuggingFace Inference Endpoints, or LiteLLMModel with any LiteLLM-supported provider. For MCP tool calling with open-source models, ToolCallingAgent requires a model that supports structured tool-call output (Mistral, Qwen, Llama 3.x fine-tuned for tool calling). CodeAgent works with any instruction-following model that can write Python, including models without native tool-calling support.

How does smolagents handle MCP tools that return non-text content (images, binary)?

MCP tool results can contain ImageContent, AudioContent, and EmbeddedResource items (which may carry binary blobs) in addition to TextContent. smolagents MCPClient.get_tools() currently extracts the TextContent portion of multi-content results for ToolCallingAgent. For tools that return image data, write a manual Tool subclass that reads ImageContent and either encodes it as base64 in the return string or saves it to disk and returns the file path, depending on whether the downstream model supports vision input.
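
As an illustration of the manual approach, the sketch below flattens a mixed content list into a single observation string, replacing each image with a short placeholder. TextItem and ImageItem are hypothetical stand-ins that mirror the type, text, data, and mimeType fields of MCP's TextContent and ImageContent, so the example runs without the mcp package; summarize_content is likewise an illustrative name.

```python
import base64
from dataclasses import dataclass


@dataclass
class TextItem:
    """Hypothetical stand-in for MCP TextContent."""
    text: str
    type: str = "text"


@dataclass
class ImageItem:
    """Hypothetical stand-in for MCP ImageContent (data is base64-encoded)."""
    data: str
    mimeType: str
    type: str = "image"


def summarize_content(items: list) -> str:
    """Flatten a mixed content list into one string for the agent's observation."""
    parts = []
    for item in items:
        if item.type == "text":
            parts.append(item.text)
        elif item.type == "image":
            size = len(base64.b64decode(item.data))
            parts.append(f"[image {item.mimeType}, {size} bytes]")
        else:
            parts.append(f"[unsupported content type: {item.type}]")
    return "\n".join(parts)


png_stub = base64.b64encode(b"\x89PNG").decode("ascii")
print(summarize_content([TextItem("Chart generated."), ImageItem(png_stub, "image/png")]))
# Chart generated.
# [image image/png, 4 bytes]
```

For a vision-capable model, the image branch would instead save the decoded bytes to disk and return the path, as described above.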