Framework guide · 2026-06-25 · MCP + AI Frameworks
MCP Tools Across Five AI Frameworks
The MCP protocol is the same three steps everywhere: initialize, tools/list, tools/call. A tool registered on your MCP server has no idea whether it is being called by a LlamaIndex FunctionCallingAgent, a Semantic Kernel ChatCompletionAgent, a smolagents ToolCallingAgent, a Mastra workflow step, or a Google ADK LlmAgent. The JSON-RPC wire format is identical. What differs across frameworks is everything above the wire: how each framework discovers and wraps tools, how it manages the connection lifecycle across multiple invocations, how MCP errors surface into the framework's own error model, and — critically — what silent failure mode appears when a dependent MCP server goes down mid-workflow. This post synthesizes five deep-dives — LlamaIndex, Semantic Kernel, smolagents, Mastra, and Google ADK — into a unified picture of what differs and why all five require external monitoring to avoid silent failures.
Five frameworks at a glance
The table below lists each framework's integration entry point, whether it discovers tools automatically, and the silent failure mode that makes external monitoring necessary.
| Framework | Integration entry point | Auto-discovery? | Silent failure mode |
|---|---|---|---|
| LlamaIndex | MCPToolSpec from llama-index-tools-mcp | Yes: to_tool_list_async() calls tools/list | Unattended pipeline produces partial answer; error is buried in ReAct observation log |
| Semantic Kernel | Manual @kernel_function wrappers in a plugin class | No built-in auto-discovery; community packages may exist | LLM loops retrying failed tool; maximum_auto_invoke_attempts must be set |
| smolagents | MCPClient.get_tools() (≥ v1.9) | Yes: context manager calls tools/list | Batch pipeline runs for minutes before dead-server error string appears in last step |
| Mastra | MCPConfiguration + getToolset() | Yes: each getToolset() calls tools/list | Workflow step returns opaque error; retry of step reconnects but prior work is lost |
| Google ADK | FunctionTool(async_fn); built-in MCPToolset also available | Manual by default; MCPToolset provides auto-discovery | Vertex AI logs show generic FunctionTool error; root cause requires log correlation |
LlamaIndex: MCPToolSpec and the session pool pattern
LlamaIndex's MCP integration lives in the llama-index-tools-mcp package, which provides MCPToolSpec. The spec wraps a live ClientSession and translates each discovered MCP tool into a LlamaIndex AsyncTool with a ToolMetadata object derived from the MCP tool's JSON Schema. You pass those tools to FunctionCallingAgent.from_tools() or ReActAgent.from_tools(), and the framework routes LLM tool calls to the appropriate MCP server automatically.
The critical lifecycle rule: pass one persistent ClientSession to MCPToolSpec, keep it alive for the entire agent run, and do not reconnect per call. A ReActAgent may execute 10–20 tool calls over several reasoning steps. Reconnecting per call adds 50–200 ms of handshake overhead per invocation and multiplies load on the MCP server. The solution is a session pool:
import asyncio
import os

from mcp import ClientSession
from mcp.client.sse import sse_client
from llama_index.tools.mcp import MCPToolSpec
from llama_index.core.agent import FunctionCallingAgent
from llama_index.llms.anthropic import Anthropic

# MCPSessionPool is the application's own pool class, defined elsewhere.
async def build_agent_with_pool(pool: "MCPSessionPool") -> FunctionCallingAgent:
    session = await pool.acquire(
        os.environ["SEARCH_URL"], os.environ["SEARCH_TOKEN"]
    )
    # Preflight: fail fast before any LLM cost
    result = await asyncio.wait_for(session.list_tools(), timeout=5.0)
    if not result.tools:
        raise RuntimeError("MCP server returned empty tool list")
    spec = MCPToolSpec(session=session)
    tools = await spec.to_tool_list_async()
    return FunctionCallingAgent.from_tools(
        tools=tools,
        llm=Anthropic(model="claude-sonnet-4-6"),
        system_prompt="You are a research assistant with access to live search.",
    )
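The pool class itself is left to the application. A minimal sketch of what it might look like follows, assuming a hypothetical `connect` factory coroutine that opens and initializes one session for a given URL and token. The method names `acquire` and `discard` are illustrative and not taken from any library.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Tuple

# Hypothetical factory type: given (url, token), open and initialize one
# MCP session and return it. A real factory would wrap the SSE client and
# ClientSession setup.
Connect = Callable[[str, str], Awaitable[Any]]


class MCPSessionPool:
    """Caches one live session per (url, token) pair for the process lifetime."""

    def __init__(self, connect: Connect) -> None:
        self._connect = connect
        self._sessions: Dict[Tuple[str, str], Any] = {}
        self._lock = asyncio.Lock()

    async def acquire(self, url: str, token: str) -> Any:
        key = (url, token)
        # The lock stops two concurrent callers from both opening a session.
        async with self._lock:
            if key not in self._sessions:
                self._sessions[key] = await self._connect(url, token)
            return self._sessions[key]

    async def discard(self, url: str, token: str) -> None:
        """Drop a session known to be broken so the next acquire reconnects."""
        async with self._lock:
            self._sessions.pop((url, token), None)
```

Keying on the token as well as the URL keeps sessions with different credentials apart, which matters if one process serves callers with different permissions.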
LlamaIndex also exposes MCP beyond tools: session.list_resources() and session.read_resource(uri) let you ingest MCP resources directly as Document objects for a VectorStoreIndex. The hybrid pattern — vector index for corpus retrieval plus live MCP tools for real-time execution — gives LlamaIndex pipelines a combination that no static embedding approach can match. Combine them by wrapping the index as a QueryEngineTool and including it alongside your MCP AsyncTool objects in FunctionCallingAgent.from_tools().
Error handling in LlamaIndex is explicit: wrap each tool's acall() to catch McpError (protocol errors) and httpx.ConnectError (transport failures) and return error strings. Both FunctionCallingAgent and ReActAgent inject tool results — including error strings — back into the LLM context as observations, letting the model reason about the failure and attempt an alternative path. An uncaught exception propagates out of the agent loop and discards all prior reasoning.
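One way to apply that wrapping is a small decorator around whatever coroutine performs the tool call. The sketch below is framework-independent and makes two assumptions: the decorator name `errors_as_observations` is hypothetical, and it catches a broad `Exception` only to stay dependency-free. In a real integration you would narrow that clause to the protocol and transport errors named above.

```python
import asyncio
import functools
from typing import Any, Awaitable, Callable

AsyncCall = Callable[..., Awaitable[Any]]


def errors_as_observations(
    tool_name: str, timeout: float = 10.0
) -> Callable[[AsyncCall], AsyncCall]:
    """Decorator: turn failures of an async tool call into error strings."""

    def decorate(call: AsyncCall) -> AsyncCall:
        @functools.wraps(call)
        async def wrapper(*args: Any, **kwargs: Any) -> Any:
            try:
                return await asyncio.wait_for(call(*args, **kwargs), timeout)
            except asyncio.TimeoutError:
                return f"ERROR: tool '{tool_name}' timed out after {timeout}s"
            except Exception as exc:  # narrow this in real code
                return f"ERROR: tool '{tool_name}' failed: {type(exc).__name__}: {exc}"

        return wrapper

    return decorate
```

Because the wrapper always returns a value, the agent loop receives an observation it can reason about instead of an exception that ends the run.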
Silent failure mode: LlamaIndex pipelines often run unattended on a schedule — nightly report generation, continuous document ingestion. A dead MCP server mid-run causes the agent loop to exhaust retries, producing a partial answer with no clear error signal. The calling application sees a valid-looking but incomplete response. The only reliable defense is a preflight tools/list check at pipeline start plus external monitoring via AliveMCP to catch servers that go down between runs.
Semantic Kernel: manual wrappers, maximum control
Semantic Kernel takes the opposite approach from LlamaIndex: there is no built-in MCPToolSpec-equivalent that auto-discovers tools from a live server. Instead, you write a plugin class with methods decorated with @kernel_function (Python) or [KernelFunction] (.NET), one method per MCP tool you want to expose. You then call session.call_tool() explicitly inside each method body.
from semantic_kernel.functions import kernel_function
from semantic_kernel import Kernel
from mcp import ClientSession

class SearchMCPPlugin:
    def __init__(self, session: ClientSession):
        self._session = session

    @kernel_function(
        name="search_web",
        description="Search the web for recent information on a topic. "
                    "Use for current events or facts that may have changed since the training cutoff.",
    )
    async def search_web(self, query: str, max_results: int = 5) -> str:
        try:
            result = await self._session.call_tool(
                "search_web", arguments={"query": query, "max_results": max_results}
            )
            if result.isError:
                return f"Search error: {result.content[0].text}"
            return result.content[0].text
        except Exception as e:
            return f"MCP search unavailable: {e}"

# `session` is an initialized ClientSession created elsewhere
kernel = Kernel()
kernel.add_plugin(SearchMCPPlugin(session), plugin_name="search")
The @kernel_function description is the primary signal the LLM uses to decide when and whether to call each function — write it as an intent declaration ("Use when the user asks for X") rather than a capability description ("Returns X"). Docstring Args: sections are parsed by SK into parameter descriptions in the tool schema.
The .NET SDK uses [KernelFunction] attributes and [Description(...)] on method parameters, with an important advantage: the CancellationToken parameter propagates request cancellations into the MCP call, allowing the HTTP connection to abort cleanly on timeout rather than hanging until server timeout.
Enable auto function calling with FunctionChoiceBehavior.Auto() in your OpenAIChatPromptExecutionSettings. Without it, SK will not invoke plugin functions automatically — the LLM will see the tool schemas but SK will not dispatch calls. For production pipelines, set maximum_auto_invoke_attempts to cap the loop at 5–10 calls per turn; without this limit, a tool that persistently returns error strings can cause the LLM to keep retrying indefinitely.
Session lifetime requires a resilient wrapper when using SK in long-lived web applications. The MCP session must outlast the kernel's lifetime, but network partitions and idle-connection timeouts will occasionally break it. A ResilientMCPSession class that reconnects on failure and retries with exponential backoff keeps plugin methods simple — they always get a live session without reconnection logic scattered through every kernel_function.
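A minimal sketch of such a wrapper is shown here, under stated assumptions: `connect` is a hypothetical coroutine that returns an object exposing `call_tool(name, arguments)`, and the retry parameters are illustrative defaults rather than recommendations from Semantic Kernel or the MCP SDK.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional


class ResilientMCPSession:
    """Reconnects and retries with exponential backoff around call_tool.

    `connect` is a hypothetical coroutine that returns an object exposing
    `call_tool(name, arguments)`, such as an initialized MCP ClientSession.
    """

    def __init__(
        self,
        connect: Callable[[], Awaitable[Any]],
        max_attempts: int = 4,
        base_delay: float = 0.5,
    ) -> None:
        self._connect = connect
        self._max_attempts = max_attempts
        self._base_delay = base_delay
        self._session: Optional[Any] = None

    async def call_tool(self, name: str, arguments: dict) -> Any:
        last_error: Optional[Exception] = None
        for attempt in range(self._max_attempts):
            try:
                if self._session is None:
                    self._session = await self._connect()
                return await self._session.call_tool(name, arguments)
            except Exception as exc:  # narrow to transport errors in real code
                last_error = exc
                self._session = None  # force a reconnect on the next attempt
                if attempt < self._max_attempts - 1:
                    # Delays grow as base, 2*base, 4*base, ...
                    await asyncio.sleep(self._base_delay * (2 ** attempt))
        raise ConnectionError(
            f"MCP call '{name}' failed after {self._max_attempts} attempts"
        ) from last_error
```

A plugin method can then call `await self._session.call_tool(...)` on this wrapper exactly as it would on a raw session. Note that retrying is only safe for tools that are idempotent.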
Silent failure mode: Unlike LlamaIndex where tool failures surface as error strings in the ReAct observation log, SK's ChatCompletionAgent loop can cause the LLM to retry a broken tool repeatedly until maximum_auto_invoke_attempts is exhausted. SK's FunctionInvocationContext filter gives you latency metrics and error counts at the plugin level — pair this application telemetry with AliveMCP infrastructure monitoring to correlate "search_web returned error at 14:23" with "search MCP server went down at 14:22."
smolagents: MCPClient, CodeAgent, and hierarchical MCP
smolagents (≥ v1.9) provides MCPClient, which handles auto-discovery similarly to LlamaIndex's MCPToolSpec: it calls tools/list at startup and wraps each tool as a smolagents Tool object. The integration is clean:
import os

from smolagents import MCPClient, ToolCallingAgent, LiteLLMModel

with MCPClient({
    "url": "https://search.internal/sse",
    "headers": {"Authorization": f"Bearer {os.environ['SEARCH_TOKEN']}"},
}) as mcp_client:
    tools = mcp_client.get_tools()
    agent = ToolCallingAgent(
        tools=tools,
        model=LiteLLMModel("anthropic/claude-sonnet-4-6"),
        max_steps=15,
    )
    # Run inside the context manager so the MCP connection is still open
    result = agent.run("Find the latest MCP server security research.")
The with MCPClient(...) as client context manager is the lifecycle boundary — all agent steps must occur inside it. Tool calls made after __exit__ raise ConnectionError. For multiple MCP servers, combine tool lists with +: tools = search_client.get_tools() + db_client.get_tools().
smolagents is the only framework in this comparison that natively supports two fundamentally different execution modes for tool calling. ToolCallingAgent asks the LLM to output structured tool-call JSON and the framework dispatches it — the standard pattern shared with all other frameworks here. CodeAgent asks the LLM to write Python code that calls tools as functions, then executes that code in a sandboxed interpreter. MCP tools work in CodeAgent identically to any other tool — they become callable Python functions in the sandbox namespace.
The CodeAgent trade-off is significant: tasks that would require 10 sequential tool calls in ToolCallingAgent become a single code block with a loop. Filtering, sorting, and arithmetic over tool results are trivial in code. The cost is sandbox security — restrict additional_authorized_imports to the minimum required and run in an isolated container for any untrusted workload.
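To make the trade-off concrete, the sketch below shows the kind of Python a CodeAgent might emit for a multi-topic task. Here `search_web` is a local stub standing in for an MCP tool exposed in the sandbox namespace; its name, signature, and return shape are assumptions for illustration only.

```python
from typing import Dict, List


def search_web(query: str, max_results: int = 5) -> List[Dict[str, object]]:
    """Stub standing in for an MCP-backed tool (hypothetical return shape)."""
    return [
        {"title": f"{query} result {i}", "score": max_results - i}
        for i in range(max_results)
    ]


def top_hits(topics: List[str], per_topic: int = 2) -> Dict[str, List[str]]:
    """One code block replaces len(topics) separate tool-call turns."""
    summary: Dict[str, List[str]] = {}
    for topic in topics:
        hits = search_web(topic, max_results=5)
        # Sorting and slicing happen in code, not in extra LLM turns.
        best = sorted(hits, key=lambda h: h["score"], reverse=True)[:per_topic]
        summary[topic] = [str(h["title"]) for h in best]
    return summary
```

With a ToolCallingAgent, each `search_web` call and each ranking decision would be a separate model turn. Here the model writes the loop once and the interpreter does the rest.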
smolagents' ManagedAgent wraps an agent as a tool callable by a parent orchestrator. This enables hierarchical architectures where different specialists each connect to their own MCP server:
managed_search = ManagedAgent(
    agent=ToolCallingAgent(tools=search_client.get_tools(), model=model),
    name="web_researcher",
    description="Researches current information using a live web search MCP server.",
)
managed_data = ManagedAgent(
    agent=ToolCallingAgent(tools=db_client.get_tools(), model=model),
    name="data_analyst",
    description="Queries structured datasets via a database MCP server.",
)
orchestrator = ToolCallingAgent(
    tools=[managed_search, managed_data], model=model, max_steps=10
)
The isolation is the key benefit: a failure in the search MCP server only affects the web_researcher sub-agent. The data agent continues operating. The orchestrator receives the error string as the sub-agent's return value and can route around it.
Silent failure mode: smolagents is commonly used for batch automation — scheduled pipelines, continuous data enrichment. These pipelines can run for minutes. An MCP server that goes down mid-pipeline produces error strings in tool results, but the agent may loop, retry, or produce a partial answer without any clear failure signal reaching the calling system. The preflight check — opening a fresh MCPClient, calling get_tools(), verifying at least one tool exists, and closing — takes under a second and catches dead servers before any LLM spend.
Mastra: TypeScript-native MCPConfiguration and per-step toolsets
Mastra is the only TypeScript framework in this comparison and it treats MCP as a first-class native protocol, not a bolt-on adapter. The central abstraction is MCPConfiguration — a declarative registry of MCP server connection specs that Mastra uses to connect, discover tools, and handle transport. You do not write connection management code; you describe endpoints:
import { MCPConfiguration } from "@mastra/mcp";
import { createAgent } from "@mastra/core";

const mcpConfig = new MCPConfiguration({
  servers: {
    search: {
      url: new URL(process.env.SEARCH_MCP_URL!),
      requestInit: { headers: { Authorization: `Bearer ${process.env.SEARCH_MCP_TOKEN}` } },
    },
    database: {
      command: "node",
      args: ["./db-server/build/index.js"],
      env: { DATABASE_URL: process.env.DATABASE_URL! },
    },
  },
});

// getToolset() connects to all servers and calls tools/list on each
const toolset = await mcpConfig.getToolset();

const agent = createAgent({
  name: "ResearchAgent",
  model: { provider: "ANTHROPIC", name: "claude-sonnet-4-6", toolChoice: "auto" },
  tools: { toolsets: [toolset] },
  instructions: "...",
});
Tool names are automatically prefixed with the server key: a search_web tool from the search server becomes search_search_web in Mastra. This prevents name collisions when multiple servers expose tools with identical names and helps the LLM understand which server it is targeting.
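The naming rule is simple enough to model in a few lines. The following is a language-neutral illustration written in Python, not Mastra's implementation; the function name `namespace_tools` and the sample server keys are hypothetical.

```python
from typing import Dict, List


def namespace_tools(servers: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each prefixed tool name to the server key that owns it."""
    owners: Dict[str, str] = {}
    for server_key, tool_names in servers.items():
        for tool_name in tool_names:
            # Prefixing with the server key keeps identical tool names apart.
            owners[f"{server_key}_{tool_name}"] = server_key
    return owners
```

Two servers that both expose a `status` tool end up with `search_status` and `database_status`, so neither shadows the other.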
The architectural insight in Mastra's workflow integration is the separation between MCPConfiguration (static spec, created once at startup) and Toolset (live connection, obtained per agent or per step). In workflows, calling await mcpConfig.getToolset() inside each step's execute function scopes the MCP connection to that step's lifetime:
const researchWorkflow = createWorkflow({ id: "research", ... })
  .addStep(createStep({
    id: "gather-sources",
    async execute({ inputData }) {
      // Fresh connection scoped to this step
      const toolset = await mcpConfig.getToolset();
      const searchTool = toolset.getTools().find(t => t.name === "search_search_web");
      // find() returns undefined if discovery did not yield the tool
      if (!searchTool) throw new Error("search_search_web not found in toolset");
      const result = await searchTool.execute({ query: inputData.topic });
      return { rawResults: result.text };
    },
  }))
Per-step toolsets are safer than a shared toolset across all steps: if a step fails and is retried, it gets a fresh connection rather than inheriting a potentially broken one from a prior attempt. The cost is slightly more connection overhead, which is acceptable for workflows where each step is individually significant.
Mastra's agent hooks (onToolCall, onToolResult, onToolError) fire on every MCP tool call with a startTime timestamp, enabling precise latency measurement. Emit these to any metrics system (Datadog, OpenTelemetry, Prometheus) without modifying the tool implementation itself.
Silent failure mode: Mastra agents are often deployed as HTTP API servers. When an MCP server goes down, tool calls fail and the agent's final text typically says something like "I was unable to retrieve that information", which is indistinguishable from a query the model simply chose not to answer with tools. A startup preflight (calling getToolset(), checking that at least one tool was discovered, then disconnecting) catches this class of failure before the server starts accepting requests. Call process.exit(1) on preflight failure to prevent a broken deployment from going live.
Google ADK: FunctionTool bridges and Vertex AI deployment
Google ADK wraps any Python async callable as a FunctionTool and presents it to Gemini's native function-calling API. The function's docstring and type annotations drive the schema that appears in the tool-use context — Gemini reads these to decide when to invoke the tool. Write docstrings as intent declarations with explicit Args: sections:
import json

from google.adk.tools import FunctionTool
from mcp import ClientSession

_search_session: ClientSession | None = None

# get_search_session() is assumed to be defined elsewhere: it returns the
# shared ClientSession, connecting on first use.
async def search_web(query: str, max_results: int = 5) -> dict:
    """Search the web for recent information on a topic.

    Use when the user asks about current events, recent releases, or
    information that may have changed since the model's knowledge cutoff.

    Args:
        query: The search query. Narrow queries return better results.
        max_results: Number of results to return (1-20, default 5).

    Returns:
        Dict with "results" (list of {title, url, snippet}) and "total_found".
    """
    try:
        session = await get_search_session()
        result = await session.call_tool("search_web", {"query": query, "max_results": max_results})
        if result.isError:
            return {"error": result.content[0].text, "results": []}
        return json.loads(result.content[0].text)
    except Exception as e:
        return {"error": f"MCP unavailable: {e}", "results": []}

search_tool = FunctionTool(search_web)
Returning dicts rather than strings is the ADK-specific idiom: Gemini can reason about dict fields directly, and including an "error" key gives the model a structured signal for failure cases that is more reliable than parsing error strings heuristically.
ADK's multi-agent primitives compose MCP-backed sub-agents naturally. SequentialAgent chains agents in order, passing session state between them. ParallelAgent runs sub-agents concurrently, which is useful for fanning out to multiple MCP servers at once: the data-gathering phase then takes roughly as long as its slowest sub-agent rather than the sum of all of them.
parallel_gather = ParallelAgent(
    name="ParallelGather",
    sub_agents=[
        LlmAgent(name="WebSearcher", tools=[search_tool, page_tool], ...),
        LlmAgent(name="DataAnalyst", tools=[db_query_tool, db_schema_tool], ...),
    ],
)
synthesis_agent = LlmAgent(
    name="Synthesizer",
    model=Gemini(model="gemini-2.0-pro"),
    tools=[],  # works from session state populated by parallel step
    ...
)
research_pipeline = SequentialAgent(
    name="ResearchPipeline",
    sub_agents=[parallel_gather, synthesis_agent],
)
A critical subtlety with ParallelAgent: if an MCP tool call times out and raises asyncio.TimeoutError, catching it in the wrapper function returns an error dict and lets the other sub-agents continue. Letting the exception propagate cancels the entire parallel execution — all sub-agents are cancelled, not just the one that timed out. Always handle timeouts explicitly in MCP wrapper functions used in parallel contexts.
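The wrapper side of that advice can be sketched with plain asyncio. In the example below, `asyncio.gather` stands in for the framework's parallel execution and does not reproduce ADK's own cancellation behavior; `call_with_timeout` and `fan_out` are hypothetical names, and the two inner coroutines simulate a healthy and an unresponsive MCP server.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List


async def call_with_timeout(
    call: Callable[[], Awaitable[Dict[str, Any]]], timeout: float
) -> Dict[str, Any]:
    """Run one MCP-backed call; convert a timeout into an error dict."""
    try:
        return await asyncio.wait_for(call(), timeout=timeout)
    except asyncio.TimeoutError:
        return {"error": f"MCP call timed out after {timeout}s", "results": []}


async def fan_out(timeout: float = 0.1) -> List[Dict[str, Any]]:
    async def fast() -> Dict[str, Any]:
        return {"results": ["row-1"]}

    async def hung() -> Dict[str, Any]:
        await asyncio.sleep(5)  # simulates an unresponsive MCP server
        return {"results": ["never returned"]}

    # Both branches complete: the hung one yields an error dict instead of
    # raising, so the fast branch's result is preserved.
    return await asyncio.gather(
        call_with_timeout(fast, timeout),
        call_with_timeout(hung, timeout),
    )
```

The error dict keeps the same shape as a successful result, so downstream agents can check for an "error" key without special-casing timeouts.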
ADK agents deploy to Vertex AI Agent Engine with the AdkApp wrapper. MCP connections are established at runtime inside the deployed container — credentials come from environment variables or Vertex AI Secret Manager. For multi-tenant deployments, do not share module-level sessions across users with different permission levels. Per-session MCP connections are the correct pattern when users have distinct MCP server authorization.
ADK v1.x also ships with a built-in MCPToolset (in google.adk.tools.mcp_tool) that auto-discovers tools similarly to LlamaIndex's MCPToolSpec. The manual FunctionTool wrapper pattern described above gives you full control over error handling and argument mapping; MCPToolset reduces boilerplate at the cost of less control per tool.
Silent failure mode: ADK agents on Vertex AI produce generic FunctionTool error log entries when MCP tool calls fail. Determining whether the root cause is "MCP server down" vs. "bad arguments" vs. "server logic error" requires correlating ADK application logs with infrastructure events. AliveMCP monitors your MCP endpoint independently from outside your VPC, producing a minute-resolution alert that makes log correlation straightforward: "AliveMCP alert fired at 14:22" + "ADK FunctionTool errors started at 14:23" = MCP server outage, not a code bug.
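The correlation step is a timestamp comparison. The sketch below assumes you have already extracted the alert time and the tool-error times from their respective systems; the function name and the five-minute default window are illustrative choices, not part of ADK or any monitoring product.

```python
from datetime import datetime, timedelta
from typing import Iterable


def errors_explained_by_outage(
    alert_time: datetime,
    error_times: Iterable[datetime],
    window: timedelta = timedelta(minutes=5),
) -> bool:
    """True if the first tool error falls within `window` of the outage alert."""
    first_error = min(error_times, default=None)
    if first_error is None:
        return False
    # abs() allows for errors that begin shortly before the probe notices.
    return abs(first_error - alert_time) <= window
```

If the first error falls outside the window, the outage is an unlikely explanation and the usual suspects (bad arguments, server logic) deserve attention first.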
The five silent failure modes side by side
Every framework in this comparison shares a common structural problem: when an MCP server goes down, the framework-level error handling produces a response that looks like a partial or degraded answer rather than an infrastructure alert. The failure is silent from the calling system's perspective.
| Framework | What the application sees when MCP server goes down | How long before discovery |
|---|---|---|
| LlamaIndex | Agent returns a partial answer; error string in ReAct observation log (verbose mode only) | At the end of the pipeline run — could be minutes for complex RAG pipelines |
| Semantic Kernel | LLM retries the tool up to maximum_auto_invoke_attempts times, then returns a degraded response | After N retries × LLM inference time (could be 30–120 seconds of wasted spend) |
| smolagents | Error string in tool result; agent may loop, produce a partial answer, or silently omit data | At the end of the pipeline — batch jobs often run for minutes before failing |
| Mastra | Workflow step fails; retry starts with fresh connection (good) but prior step work is lost (bad) | On the failing step — but identifying root cause requires tracing through workflow logs |
| Google ADK | Generic FunctionTool error in Cloud Logging; final agent response says data was unavailable | Immediately logged, but identifying MCP vs. application error requires log correlation |
The common thread: all five frameworks treat MCP tool failures as agent-level events, not infrastructure events. The agent sees an error string or exception and handles it according to the framework's error model. None of them alert you at the infrastructure level that a dependent MCP server is unreachable. For that, you need monitoring that runs outside the agent process — probing the MCP endpoint independently on a fixed interval.
The preflight pattern across all five frameworks
Every framework benefits from the same pattern: before spending any LLM tokens, verify that each dependent MCP server is reachable and returning a non-empty tool list. The implementation differs by framework, but the logic is identical:
- Open a connection to the MCP server
- Call tools/list (or the framework's equivalent)
- Assert that at least one tool was returned
- Close the probe connection (or keep it as the production connection)
- Fail fast with a clear error if any check fails
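The steps above can be sketched once, independent of framework. This version assumes a hypothetical `list_tools` coroutine that wraps whatever discovery call your framework provides and returns a sequence of tools; `PreflightError` and `preflight` are illustrative names.

```python
import asyncio
from typing import Any, Awaitable, Callable, Sequence


class PreflightError(RuntimeError):
    """Raised when a dependent MCP server fails its startup check."""


async def preflight(
    server_name: str,
    list_tools: Callable[[], Awaitable[Sequence[Any]]],
    timeout: float = 5.0,
) -> int:
    """Return the number of tools discovered, or raise PreflightError."""
    try:
        tools = await asyncio.wait_for(list_tools(), timeout=timeout)
    except asyncio.TimeoutError as exc:
        raise PreflightError(
            f"{server_name}: tools/list timed out after {timeout}s"
        ) from exc
    except Exception as exc:
        raise PreflightError(f"{server_name}: tools/list failed: {exc}") from exc
    if not tools:
        raise PreflightError(f"{server_name}: server returned an empty tool list")
    return len(tools)
```

Raising one dedicated exception type lets the caller distinguish "dependency is down" from ordinary application errors and exit before any model call is made.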
In LlamaIndex, wrap this as an async preflight_check(session, server_name) that calls await asyncio.wait_for(session.list_tools(), timeout=5.0) and raises RuntimeError on failure. In Semantic Kernel, run the same session.list_tools() check on the ClientSession before registering the plugin. In smolagents, open a temporary MCPClient, call get_tools(), and close it. In Mastra, call await mcpConfig.getToolset(), fail if the returned tool list is empty, and call await mcpConfig.disconnect(). In ADK, send a raw initialize JSON-RPC request and check the HTTP response code.
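A raw initialize probe needs nothing beyond the standard library. The sketch below assumes a server that accepts JSON-RPC over HTTP POST at a single endpoint; a server using a different transport would need a different probe. The protocol version string, client name, and function names are assumptions to adjust for your deployment.

```python
import json
import urllib.error
import urllib.request
from typing import Optional


def build_initialize_request(request_id: int = 1) -> bytes:
    """Encode a JSON-RPC 2.0 `initialize` request body for an MCP server."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",  # assumed; match your server
            "capabilities": {},
            "clientInfo": {"name": "preflight-probe", "version": "0.1.0"},
        },
    }
    return json.dumps(payload).encode("utf-8")


def probe(url: str, token: Optional[str] = None, timeout: float = 5.0) -> bool:
    """POST `initialize` and report whether the server answered with HTTP 2xx."""
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
    }
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(
        url, data=build_initialize_request(), headers=headers, method="POST"
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and HTTP 4xx/5xx responses.
        return False
```

A 2xx response only shows that the endpoint is reachable and speaks HTTP. Parsing the response body and following up with tools/list gives a stronger guarantee.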
The preflight catches "server completely unreachable" failures before any LLM spend. It does not catch failures that occur mid-pipeline — for those, external monitoring is the only defense. AliveMCP probes your MCP endpoint every 60 seconds from outside your deployment, alerting you within a minute of any protocol-level failure. That alert reaches you long before your next scheduled pipeline run or your users notice degraded responses.
Choosing the right framework for your MCP use case
The frameworks in this post overlap in capability but differ in emphasis. Here is a decision guide:
| If you need… | Reach for… | Why |
|---|---|---|
| Python RAG + live MCP tools, minimal boilerplate | LlamaIndex + MCPToolSpec | Auto-discovery, MCP resources as Document objects, hybrid RAG + tool agents |
| Enterprise .NET or Python, Azure OpenAI, fine-grained tool control | Semantic Kernel | Python + .NET SDKs, KernelPlugin model, Azure AI integration, cancellation tokens (.NET) |
| HuggingFace models, open-source stack, code-generation agents | smolagents | MCPClient auto-discovery, CodeAgent for complex iteration, ManagedAgent for hierarchical workflows |
| TypeScript-first, workflow state machines, production TypeScript APIs | Mastra | First-class MCPConfiguration, per-step toolset scoping, TypeScript type safety end-to-end |
| Vertex AI deployment, Gemini models, parallel multi-agent pipelines | Google ADK | SequentialAgent + ParallelAgent, Vertex AI Agent Engine, Gemini native function calling |
If you are already invested in one of these frameworks, the choice has been made and the integration pattern is well-defined. If you are starting fresh, prioritize the language and deployment target: LlamaIndex and smolagents are Python-native; Mastra is TypeScript-native; Semantic Kernel spans Python and .NET; ADK is Python with Vertex AI as the production target. Secondary considerations are auto-discovery (Mastra, LlamaIndex, and smolagents have it by default; ADK offers it through MCPToolset; SK requires manual wrappers) and whether you need the CodeAgent pattern (smolagents only).
Whichever framework you choose, the MCP connection lifecycle and monitoring requirements are the same. One persistent session per MCP server per process. Preflight checks before LLM spend. External protocol monitoring via AliveMCP to catch mid-pipeline failures that your agent's error handling cannot surface as infrastructure alerts.
Further reading
- MCP server LlamaIndex integration — MCPToolSpec, FunctionCallingAgent, and resource indexing
- MCP server Semantic Kernel integration — KernelPlugin, FunctionChoiceBehavior, Python and .NET
- MCP server smolagents integration — MCPClient, ToolCallingAgent vs CodeAgent, ManagedAgent
- MCP server Mastra integration — MCPConfiguration, per-step toolsets, workflow orchestration
- MCP server Google ADK integration — FunctionTool, SequentialAgent, Vertex AI deployment
- MCP in Python Agentic Frameworks: LangChain, LangGraph, CrewAI, AutoGen, Pydantic AI
- MCP server health check — designing a robust /health endpoint
- AliveMCP — continuous protocol monitoring for MCP servers