Platform guide · 2026-06-15 · MCP + AI Platform Integration

MCP Servers Across AI Inference Platforms

The MCP wire protocol is the same regardless of which AI inference platform calls it. initialize, tools/list, tools/call: the same JSON-RPC sequence runs under every integration. What differs across OpenAI Agents SDK, AWS Bedrock, Google Gemini, Ollama, and Groq is the adapter layer each platform requires to bridge between its native function-calling interface and MCP's JSON-RPC protocol.

OpenAI Agents SDK ships native MCP support and abstracts the adapter entirely. AWS Bedrock requires a hand-written conversion loop from MCP tool definitions to Bedrock's ToolSpec format, with a second pattern for Lambda-based action groups. Google Gemini converts MCP inputSchema to FunctionDeclaration objects and routinely returns multiple function calls per turn, so parallel dispatch is effectively mandatory for acceptable latency. Ollama and Groq both expose an OpenAI-compatible API, so a single adapter function handles both, but each has characteristics that affect how you integrate MCP tools: Ollama's local inference means unattended deployments can silently lose remote MCP connectivity; Groq's ultra-fast inference means MCP round-trips become a disproportionate share of total latency.

All five share one more thing: none of them distinguishes an MCP server failure from an application-layer error, so external monitoring is the only mechanism that catches MCP downtime before it appears as agent misbehavior.

Five platforms at a glance

The table below captures the integration approach, the adapter each platform requires, the key performance or architecture consideration, and the silent failure mode that makes external MCP server monitoring necessary for each one.

| Platform | Integration approach | Adapter type | Key consideration | Silent failure mode |
| --- | --- | --- | --- | --- |
| OpenAI Agents SDK | Native MCP support | HTTP or stdio MCP server objects passed to Agent(mcp_servers=[...]) | Open a persistent connection in the FastAPI lifespan; tool list fetched once and cached for the connection lifetime | Server down while the persistent connection is live → agent sees no tools and hallucinates or loops mid-run |
| AWS Bedrock | Manual adapter (two patterns) | boto3 Converse API loop with MCP SDK, or Lambda proxy for Bedrock Agents action group | Converse API requires a hand-written ToolUseBlock dispatch loop; a Lambda action group can't discover tools at runtime, so the schema must be committed manually | Bedrock errors and MCP errors surface through the same call path; one dead MCP server makes the entire Converse loop fail without indicating which tier caused it |
| Google Gemini | Manual adapter or Google ADK | FunctionDeclaration conversion or Google ADK MCPToolset | Gemini returns multiple function calls per turn, so parallel asyncio.gather dispatch is mandatory, not optional; latency = max of individual calls | One degraded MCP server in a parallel batch holds the entire batch at its latency; there is no per-call timeout isolation |
| Ollama | OpenAI-compatible adapter | openai.AsyncOpenAI(base_url="http://localhost:11434/v1") with MCP-to-OpenAI conversion | Verify tool-calling capability before building, because not all Ollama models support tools reliably; inference latency dominates over MCP round-trips | Local LLM inference with remote MCP servers: Ollama process restarts silently drop all MCP server connections; no process manager = no alert |
| Groq | OpenAI-compatible adapter | groq.AsyncGroq or openai.AsyncOpenAI(base_url="https://api.groq.com/openai/v1") | MCP round-trips are 25–35% of total run time (vs <5% on GPT-4o), so parallel dispatch and rate-limit context budgeting are mandatory | Slow MCP server eliminates Groq's speed advantage before any timeout fires; response-time degradation is invisible until the Groq rate limit hits |

The shared protocol layer

Every platform in this post calls the same MCP protocol to discover and invoke tools. No matter what adapter layer wraps it, the wire protocol is identical:

// 1. Client connects and negotiates the protocol version
{ "jsonrpc": "2.0", "id": 1, "method": "initialize",
  "params": { "protocolVersion": "2024-11-05", "capabilities": {}, "clientInfo": { "name": "my-client", "version": "1.0" } } }

// 2. Server acknowledges with the agreed protocol version and its capabilities
{ "jsonrpc": "2.0", "id": 1, "result": { "protocolVersion": "2024-11-05", "serverInfo": { "name": "my-tools", "version": "1.0" }, "capabilities": { "tools": {} } } }

// 3. Client confirms it is ready (a notification: no id, no response)
{ "jsonrpc": "2.0", "method": "notifications/initialized" }

// 4. Client fetches the tool list
{ "jsonrpc": "2.0", "id": 2, "method": "tools/list" }

// 5. Client calls a specific tool
{ "jsonrpc": "2.0", "id": 3, "method": "tools/call",
  "params": { "name": "search_docs", "arguments": { "query": "MCP monitoring" } } }

// 6. Server returns the result
{ "jsonrpc": "2.0", "id": 3, "result": { "content": [{ "type": "text", "text": "..." }], "isError": false } }

What differs across the five platforms is everything above this wire protocol: how the platform's function-calling API is structured, how tool definitions from MCP's inputSchema format must be converted to the platform's native format, how the platform dispatches multiple tool calls within a single model turn, and how errors from the MCP layer are surfaced (or absorbed) by the platform's orchestration code.
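The schema-conversion step can be sketched in plain Python. This is a minimal illustration rather than any SDK's API: the function names (to_openai_tool, to_bedrock_tool) and the sample search_docs definition are hypothetical, and the input is assumed to be an MCP tool definition as a plain dict, shaped like one entry of a tools/list result.

```python
from typing import Any


def to_openai_tool(tool: dict[str, Any]) -> dict[str, Any]:
    """OpenAI-style function tool; the same shape is accepted by Ollama and Groq."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description") or "",
            "parameters": tool["inputSchema"],  # JSON Schema passes through unchanged
        },
    }


def to_bedrock_tool(tool: dict[str, Any]) -> dict[str, Any]:
    """Bedrock Converse toolSpec; the JSON Schema is nested under a "json" key."""
    return {
        "toolSpec": {
            "name": tool["name"],
            # Assumption: fall back to the tool name when no description is given
            "description": tool.get("description") or tool["name"],
            "inputSchema": {"json": tool["inputSchema"]},
        }
    }


# Hypothetical MCP tool definition used only for illustration
SEARCH_DOCS = {
    "name": "search_docs",
    "description": "Search the documentation index.",
    "inputSchema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}
```

Both converters leave the JSON Schema itself untouched; only the envelope around it changes, which is why a flat, well-described inputSchema pays off on every platform.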

The adapter layer is the integration surface, but skills learned on one platform still transfer directly: flat inputSchema designs perform better than nested schemas on every platform (the LLM fills fewer levels, and validation errors are clearer); connection reuse matters on every platform where each request would otherwise pay the MCP handshake; and MCP server uptime is critical on all five regardless of their orchestration differences.

OpenAI Agents SDK — native MCP and the persistent-connection lifecycle

The OpenAI Agents SDK is the only platform in this group with native MCP support built into its core. You pass MCP server objects such as MCPServerStreamableHttp or MCPServerStdio directly to the Agent constructor; once a server is connected, the SDK handles tools/list and tools/call without any adapter code:

import asyncio
from agents import Agent, Runner
from agents.mcp import MCPServerStreamableHttp

# Names follow the openai-agents package (import name: agents);
# verify them against the version you have installed.
async def main():
    # The async context manager connects on entry and cleans up on exit
    async with MCPServerStreamableHttp(
        name="search",
        params={
            "url": "https://search.internal/mcp",
            "headers": {"Authorization": "Bearer sk-..."},
            "timeout": 30,
        },
    ) as search_server:
        research_agent = Agent(
            name="ResearchAgent",
            model="gpt-4o",
            instructions="Use the search and fetch tools to answer questions thoroughly.",
            mcp_servers=[search_server],
        )
        result = await Runner.run(research_agent, "What are common MCP server failure modes?")
        print(result.final_output)

asyncio.run(main())

If a FastAPI service repeats this pattern inside each request handler, every request pays one MCP handshake, typically 50–300 ms of overhead. The remedy is the same as in LangChain and Pydantic AI: open the connection once at service startup and keep it for the life of the process:

from contextlib import asynccontextmanager
from fastapi import FastAPI
from agents import Agent, Runner
from agents.mcp import MCPServerStreamableHttp

search_server = MCPServerStreamableHttp(
    name="search",
    params={"url": "https://search.internal/mcp"},
    cache_tools_list=True,  # Fetch tools/list once and reuse it
)

search_agent = Agent(
    name="SearchAgent",
    model="gpt-4o",
    instructions="Answer questions using the search tools.",
    mcp_servers=[search_server],
)

@asynccontextmanager
async def lifespan(app: FastAPI):
    async with search_server:
        yield  # Connection stays open for all requests

app = FastAPI(lifespan=lifespan)

@app.post("/ask")
async def ask(question: str):
    result = await Runner.run(search_agent, question)
    return {"answer": result.final_output}

With cache_tools_list=True, the tool list is fetched once and cached for the connection's lifetime. If the MCP server adds or removes tools while the service is running, those changes are invisible until the cache is invalidated or the service restarts. Build MCP servers so that tool changes are backward-compatible: adding tools is safe, but removing them breaks cached tool lists.
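One way to notice a stale cached tool list is to compare it against a fresh tools/list result on a schedule. The sketch below assumes you can obtain both lists as plain tool names; ToolDrift and diff_tool_names are hypothetical helpers, not part of any SDK.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ToolDrift:
    added: frozenset[str]
    removed: frozenset[str]

    @property
    def breaking(self) -> bool:
        # Removed tools break callers that still hold the cached list;
        # added tools are merely invisible until the cache refreshes.
        return bool(self.removed)


def diff_tool_names(cached: Iterable[str], fresh: Iterable[str]) -> ToolDrift:
    """Compare the cached tool names with a freshly fetched tools/list result."""
    cached_set, fresh_set = set(cached), set(fresh)
    return ToolDrift(
        added=frozenset(fresh_set - cached_set),
        removed=frozenset(cached_set - fresh_set),
    )


drift = diff_tool_names(["search_docs", "fetch_page"], ["search_docs", "summarize"])
print(sorted(drift.added), sorted(drift.removed), drift.breaking)
```

A non-empty removed set is the signal worth alerting on, since the agent may still try to call a tool the server no longer exposes.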

Handoffs, the feature that routes a conversation to specialist agents, add a startup consideration: each agent in a handoff graph carries its own mcp_servers list, and each of those servers needs its own connection. If you're using persistent connections, open them for every agent in the handoff graph at startup, not just for the entry-point agent. Otherwise an agent that receives a handoff mid-conversation pays the connection handshake at the worst possible moment.
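Finding every server that needs a startup connection is a graph walk. The sketch below uses a hypothetical AgentNode stand-in rather than the SDK's Agent class, and identifies servers by URL purely for illustration; it also assumes handoff graphs may contain cycles.

```python
from dataclasses import dataclass, field


@dataclass
class AgentNode:
    """Hypothetical stand-in for an agent: a name, its MCP server URLs, its handoffs."""
    name: str
    mcp_urls: list[str] = field(default_factory=list)
    handoffs: list["AgentNode"] = field(default_factory=list)


def collect_mcp_urls(entry: AgentNode) -> list[str]:
    """Walk the handoff graph from the entry agent and return each MCP URL once."""
    seen_agents: set[int] = set()
    urls: list[str] = []
    stack = [entry]
    while stack:
        node = stack.pop()
        if id(node) in seen_agents:  # guards against cycles such as A -> B -> A
            continue
        seen_agents.add(id(node))
        for url in node.mcp_urls:
            if url not in urls:
                urls.append(url)
        stack.extend(node.handoffs)
    return urls
```

The resulting list is what a startup hook would iterate over to open connections before the first request arrives.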

The SDK's silent failure mode: when the MCP server goes down while a persistent connection is live, the SDK's next tools/call attempt fails mid-run. The agent receives no tool results and may hallucinate answers or enter a loop. Neither failure is distinguishable from the SDK's side without external observability — the failure looks like an application-layer issue, not an infrastructure failure.

AWS Bedrock — two adapter patterns and structured error isolation

AWS Bedrock has no native MCP support. Connecting MCP tools to Bedrock requires writing one of two adapter patterns: a Converse API loop that manages the tool-call cycle in your own code, or a Lambda proxy that bridges Bedrock Agents' action groups to an MCP server.

The Converse API pattern gives you full control. You call bedrock_client.converse(), inspect the stopReason in the response, and dispatch MCP tool calls when the model requests them:

import asyncio, boto3, json
from mcp import ClientSession
from mcp.client.sse import sse_client

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

async def run_with_mcp(prompt: str) -> str:
    async with sse_client("https://tools.internal/mcp") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools_result = await session.list_tools()

            # Convert MCP tool definitions to Bedrock ToolSpec format
            bedrock_tools = [
                {
                    "toolSpec": {
                        "name": t.name,
                        "description": t.description or t.name,  # Bedrock rejects a null or empty description
                        "inputSchema": { "json": t.inputSchema },  # Bedrock wraps in {"json": ...}
                    }
                }
                for t in tools_result.tools
            ]

            messages = [{"role": "user", "content": [{"text": prompt}]}]

            while True:
                # boto3 is synchronous; run it in a worker thread so the event loop stays free
                response = await asyncio.to_thread(
                    bedrock.converse,
                    modelId="anthropic.claude-sonnet-4-6-v1:0",
                    messages=messages,
                    toolConfig={"tools": bedrock_tools},
                )
                messages.append({"role": "assistant", "content": response["output"]["message"]["content"]})

                if response["stopReason"] != "tool_use":
                    # end_turn, max_tokens, and similar: return whatever text the model produced
                    return "".join(
                        block["text"]
                        for block in response["output"]["message"]["content"]
                        if "text" in block
                    )

                elif response["stopReason"] == "tool_use":
                    # Dispatch all tool calls in parallel
                    tool_results = []
                    tool_use_blocks = [b for b in response["output"]["message"]["content"] if "toolUse" in b]

                    async def call_tool(block):
                        tool_use = block["toolUse"]
                        try:
                            result = await session.call_tool(tool_use["name"], tool_use["input"])
                            return {
                                "toolResult": {
                                    "toolUseId": tool_use["toolUseId"],
                                    "content": [{"text": result.content[0].text}],
                                    "status": "error" if result.isError else "success",
                                }
                            }
                        except Exception as e:
                            return {
                                "toolResult": {
                                    "toolUseId": tool_use["toolUseId"],
                                    "content": [{"text": f"MCP tool error: {e}"}],
                                    "status": "error",
                                }
                            }

                    tool_results = await asyncio.gather(*[call_tool(b) for b in tool_use_blocks])
                    messages.append({"role": "user", "content": tool_results})

asyncio.run(run_with_mcp("Summarize the latest MCP reliability data."))

The critical difference in the Bedrock ToolSpec format is the inputSchema wrapping: where MCP's tool definition has "inputSchema": { "type": "object", "properties": {...} }, Bedrock requires "inputSchema": { "json": { "type": "object", "properties": {...} } } — the same JSON Schema, but wrapped one level deeper. Missing the wrapper produces a Bedrock validation error that looks like a schema problem, not an adapter problem.

The Lambda proxy pattern serves a different architecture: when you're using Bedrock Agents (not the Converse API directly), Bedrock calls your Lambda function as an action group. The Lambda in turn calls the MCP server. The limitation is that Bedrock Agents' action group schema is defined statically in the Bedrock console or CloudFormation — there is no runtime tools/list discovery. When the MCP server adds or removes tools, the Bedrock action group schema must be updated manually and the agent alias republished. This eliminates one of MCP's key operational advantages (dynamic tool registration) in exchange for Bedrock Agents' orchestration capabilities.

Structured error logging is especially important in the Bedrock integration because boto3 exceptions and MCP SDK exceptions can both arise from the same converse() call path. Without explicit logging at each layer, a dead MCP server produces a Python exception that is indistinguishable from a Bedrock API error, a quota exhaustion, or a network timeout. The try/except blocks in the call_tool function above are a start; pairing them with structured log fields (error_source: "mcp" | "bedrock" | "network") enables log-based alerting that distinguishes MCP infrastructure failures from Bedrock service issues.
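A minimal sketch of that tagging, assuming JSON-formatted log lines and a hypothetical logger name (agent.errors): wrap each layer's call in a context manager that records which tier raised, then re-raise so the surrounding handling is unchanged.

```python
import json
import logging
from contextlib import contextmanager

logger = logging.getLogger("agent.errors")  # hypothetical logger name


@contextmanager
def error_source(source: str, **fields):
    """Log a structured record naming the failing tier, then re-raise the exception."""
    try:
        yield
    except Exception as exc:
        logger.error(json.dumps({
            "error_source": source,  # "mcp" | "bedrock" | "network"
            "error_type": type(exc).__name__,
            "message": str(exc),
            **fields,
        }, default=str))
        raise
```

Inside the dispatch loop this would wrap each layer separately, for example `with error_source("mcp", tool=tool_use["name"]):` around session.call_tool(...) and `with error_source("bedrock"):` around the converse() call, so an alert rule can filter on error_source alone.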

Google Gemini — parallel dispatch and the ADK shortcut

Google Gemini requires converting MCP tool definitions to FunctionDeclaration objects — Gemini's native tool format. The conversion is straightforward, but the dispatch pattern is not: Gemini's function-calling model frequently returns multiple function calls in a single model turn, which means your dispatch loop must call MCP tools in parallel, not sequentially:

import asyncio
import google.generativeai as genai
from mcp import ClientSession
from mcp.client.sse import sse_client

async def run_with_gemini_mcp(prompt: str) -> str:
    async with sse_client("https://tools.internal/mcp") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools_result = await session.list_tools()

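            # Map JSON Schema scalar types to Gemini types; anything else falls back to STRING.
            # Arrays and nested objects need a recursive conversion that this example omits.
            type_map = {
                "string": genai.protos.Type.STRING,
                "number": genai.protos.Type.NUMBER,
                "integer": genai.protos.Type.INTEGER,
                "boolean": genai.protos.Type.BOOLEAN,
            }

            def gemini_type(prop):
                json_type = prop.get("type")
                if isinstance(json_type, str):
                    return type_map.get(json_type, genai.protos.Type.STRING)
                return genai.protos.Type.STRING

            # Convert MCP inputSchema to Gemini FunctionDeclaration format
            gemini_tools = [
                genai.protos.Tool(function_declarations=[
                    genai.protos.FunctionDeclaration(
                        name=t.name,
                        description=t.description or "",
                        parameters=genai.protos.Schema(
                            type=genai.protos.Type.OBJECT,
                            properties={
                                name: genai.protos.Schema(
                                    type=gemini_type(prop),
                                    description=prop.get("description", ""),
                                )
                                for name, prop in t.inputSchema.get("properties", {}).items()
                            },
                            required=t.inputSchema.get("required", []),
                        ),
                    )
                    for t in tools_result.tools
                ])
            ]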

            model = genai.GenerativeModel("gemini-1.5-pro", tools=gemini_tools)
            chat = model.start_chat()
            messages = [{"role": "user", "parts": [prompt]}]

            while True:
                response = await asyncio.to_thread(chat.send_message, messages[-1]["parts"])
                candidate = response.candidates[0]

                # Check if the model made function calls
                function_calls = [
                    part.function_call
                    for part in candidate.content.parts
                    if hasattr(part, "function_call") and part.function_call.name
                ]

                if not function_calls:
                    # Final text response
                    return "".join(
                        part.text for part in candidate.content.parts if hasattr(part, "text")
                    )

                # Dispatch ALL function calls in parallel — latency = max, not sum
                async def dispatch(fc):
                    try:
                        result = await session.call_tool(fc.name, dict(fc.args))
                        return genai.protos.Part(
                            function_response=genai.protos.FunctionResponse(
                                name=fc.name,
                                response={"result": result.content[0].text if result.content else ""},
                            )
                        )
                    except Exception as e:
                        return genai.protos.Part(
                            function_response=genai.protos.FunctionResponse(
                                name=fc.name,
                                response={"error": str(e)},
                            )
                        )

                parts = await asyncio.gather(*[dispatch(fc) for fc in function_calls])
                messages.append({"role": "model", "parts": list(candidate.content.parts)})
                messages.append({"role": "user", "parts": list(parts)})

asyncio.run(run_with_gemini_mcp("Compare the uptime of MCP servers across major registries."))

The parallel dispatch pattern is not an optimization here — it is architecturally correct behavior. When Gemini returns multiple function calls in a single turn, it expects all results before generating the next response. Sequential dispatch (calling each MCP tool one at a time) works functionally but multiplies latency: if Gemini requests three tools and each takes 200 ms, sequential dispatch takes 600 ms; parallel dispatch takes 200 ms. The failure mode is the inverse: one degraded MCP server in a parallel batch blocks all results at that server's latency. A server that normally completes in 200 ms and starts timing out at 5 seconds turns a typical three-tool response from 200 ms into 5 seconds.
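Giving each call its own deadline keeps one slow server from holding the whole batch. The sketch below uses stand-in coroutines instead of real MCP calls; call_with_timeout and dispatch_all are hypothetical names, and the timeout values are arbitrary.

```python
import asyncio
from typing import Awaitable, Callable


async def call_with_timeout(name: str, call: Callable[[], Awaitable[str]], timeout_s: float) -> dict:
    """Run one tool call with its own deadline and always return a result dict."""
    try:
        content = await asyncio.wait_for(call(), timeout=timeout_s)
        return {"name": name, "ok": True, "content": content}
    except asyncio.TimeoutError:
        return {"name": name, "ok": False, "content": f"timed out after {timeout_s}s"}
    except Exception as exc:
        return {"name": name, "ok": False, "content": f"tool error: {exc}"}


async def dispatch_all(calls: dict[str, Callable[[], Awaitable[str]]], timeout_s: float) -> list[dict]:
    """Dispatch every call in parallel; batch latency is capped at timeout_s."""
    return await asyncio.gather(
        *(call_with_timeout(name, call, timeout_s) for name, call in calls.items())
    )


async def fast_tool() -> str:  # stand-in for a healthy MCP server
    await asyncio.sleep(0.01)
    return "fast result"


async def slow_tool() -> str:  # stand-in for a degraded MCP server
    await asyncio.sleep(2)
    return "slow result"


if __name__ == "__main__":
    print(asyncio.run(dispatch_all({"fast": fast_tool, "slow": slow_tool}, timeout_s=0.1)))
```

The timed-out call is reported back to the model as an error result rather than raised, so the healthy results in the same batch still reach the next turn.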

For teams already using Google's Agent Development Kit (ADK), the MCPToolset class provides native integration without manual adapter code:

from google.adk.agents import Agent
from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset, SseServerParams

# ADK agent with MCP tools — no manual FunctionDeclaration conversion
research_agent = Agent(
    name="research_agent",
    model="gemini-1.5-pro",
    description="Research agent with access to external search and data tools.",
    instruction="Answer questions thoroughly using available tools.",
    tools=[
        MCPToolset(
            connection_params=SseServerParams(
                url="https://search.internal/mcp",
                headers={"Authorization": "Bearer sk-..."},
            )
        )
    ],
)

from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService

runner = Runner(agent=research_agent, app_name="research", session_service=InMemorySessionService())
# runner.run_async() handles the tool dispatch loop, including parallel calls

The ADK's MCPToolset handles FunctionDeclaration conversion and parallel dispatch internally. The trade-off vs the manual adapter is flexibility: the manual adapter lets you customize error handling, add structured logging, implement per-tool timeouts, and control the connection lifecycle independently. The ADK takes conversion and dispatch off your hands at the cost of that control over the internals.

Ollama — local inference with remote MCP tools

Ollama exposes an OpenAI-compatible REST API, which means the same adapter code that works with the OpenAI API works with Ollama by changing only the base URL. The critical first step is confirming that the Ollama model you've chosen actually supports tool calling — not all do, and silent failures are the common outcome when they don't:

import asyncio
import json
from openai import AsyncOpenAI
from mcp import ClientSession
from mcp.client.sse import sse_client

# Ollama's OpenAI-compatible API
ollama = AsyncOpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

async def verify_tool_capability(model: str) -> bool:
    """Verify the model responds to tool calls rather than ignoring them."""
    probe_tool = [{
        "type": "function",
        "function": {
            "name": "health_check",
            "description": "Return the string 'ok'.",
            "parameters": { "type": "object", "properties": {} },
        }
    }]
    try:
        response = await ollama.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": "Call the health_check tool."}],
            tools=probe_tool,
            tool_choice="required",  # Force a tool call — models that don't support tools respond with plain text
        )
        # An empty list also means the model ignored the tool, so test truthiness
        return bool(response.choices[0].message.tool_calls)
    except Exception:
        return False

# Tool-capable models as of mid-2026:
# llama3.1:8b     — reliable tool calls, 1-2s on M3 GPU
# llama3.1:70b    — excellent, requires 40+ GB VRAM
# qwen2.5:7b      — reliable, good JSON adherence
# qwen2.5:72b     — excellent, best open-source option for complex tasks
# gemma2:9b       — limited, frequent plain-text fallbacks

async def run_with_ollama_mcp(model: str, prompt: str) -> str:
    if not await verify_tool_capability(model):
        raise RuntimeError(f"Model {model} does not support tool calling — check model selection")

    async with sse_client("https://tools.internal/mcp") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools_result = await session.list_tools()

            # Convert MCP tools to OpenAI function format — same conversion works for Groq too
            openai_tools = [
                {
                    "type": "function",
                    "function": {
                        "name": t.name,
                        "description": t.description,
                        "parameters": t.inputSchema,
                    }
                }
                for t in tools_result.tools
            ]

            messages = [{"role": "user", "content": prompt}]

            while True:
                response = await ollama.chat.completions.create(
                    model=model,
                    messages=messages,
                    tools=openai_tools,
                )
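                choice = response.choices[0]
                if not choice.message.tool_calls:
                    return choice.message.content or ""

                # Serialize tool calls to plain dicts before echoing the assistant turn back
                messages.append({
                    "role": "assistant",
                    "content": choice.message.content,
                    "tool_calls": [tc.model_dump() for tc in choice.message.tool_calls],
                })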

                # Ollama models rarely return multiple tool calls per turn — sequential is fine
                for tool_call in choice.message.tool_calls:
                    args = json.loads(tool_call.function.arguments)
                    try:
                        result = await session.call_tool(tool_call.function.name, args)
                        tool_content = result.content[0].text if result.content else ""
                        if result.isError:
                            tool_content = f"Error: {tool_content}"
                    except Exception as e:
                        tool_content = f"MCP tool unavailable: {e}"
                    messages.append({
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "content": tool_content,
                    })

asyncio.run(run_with_ollama_mcp("llama3.1:8b", "What MCP servers are currently healthy?"))

The latency profile for Ollama integrations is the inverse of every other platform in this group. For cloud platforms, MCP round-trips (50–300 ms) can represent 25–35% of total agent latency because LLM inference is fast (200–500 ms for GPT-4o, <200 ms for Groq). For Ollama on consumer hardware, LLM inference is 1–30 seconds depending on the model and hardware, and MCP round-trips (50–300 ms) are typically under 10% of total latency. Optimizing MCP connection pooling matters less; optimizing model size for the task matters more.
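The comparison can be made concrete with a one-line model: MCP share = MCP time / (inference time + MCP time) for a single turn. The figures below are illustrative picks from inside the ranges quoted above, not measurements, and mcp_latency_share is a hypothetical helper.

```python
def mcp_latency_share(inference_ms: float, mcp_ms: float) -> float:
    """Fraction of one turn's latency spent on MCP round-trips."""
    total = inference_ms + mcp_ms
    if total <= 0:
        raise ValueError("total latency must be positive")
    return mcp_ms / total


# Illustrative values only: 150 ms of MCP time per turn in both cases
cloud_share = mcp_latency_share(inference_ms=300, mcp_ms=150)   # fast cloud inference
local_share = mcp_latency_share(inference_ms=5000, mcp_ms=150)  # local model on consumer hardware
print(f"cloud: {cloud_share:.0%}, local: {local_share:.0%}")
```

With the same MCP cost, the share drops from about a third of the turn to a few percent once inference takes seconds, which is why model size is the first thing to tune on Ollama.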

The monitoring gap specific to Ollama is the local + remote split: Ollama runs locally (or on a local network machine) while MCP servers typically run on the public internet or a cloud VPC. The MCP client connections live in the agent process rather than in Ollama itself, so a restart of the local stack (OS update, crash, container recycle) that takes the agent down with it drops those connections, and nothing restores them automatically. Any running agent sessions lose their MCP tool access silently. In production setups without a process manager (systemd, supervisord, Docker restart policies) watching Ollama, crashes go undetected and unrecovered. AliveMCP monitors the remote MCP servers themselves; local Ollama process health needs a separate watchdog.

Groq — ultra-fast inference and the MCP round-trip budget

Groq uses the same OpenAI-compatible adapter pattern as Ollama, so the same openai_tools conversion code works unchanged. The difference is context: Groq's inference is so fast (50–200 ms for most completions) that MCP round-trips become a meaningful share of total agent latency, making parallel dispatch and connection management more important than on slower platforms:

import asyncio, json
from groq import AsyncGroq
from mcp import ClientSession
from mcp.client.sse import sse_client

groq = AsyncGroq()  # Uses GROQ_API_KEY env var

async def run_with_groq_mcp(prompt: str) -> str:
    async with sse_client("https://tools.internal/mcp") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools_result = await session.list_tools()

            openai_tools = [
                {
                    "type": "function",
                    "function": {
                        "name": t.name,
                        "description": t.description,
                        "parameters": t.inputSchema,
                    }
                }
                for t in tools_result.tools
            ]

            messages = [{"role": "user", "content": prompt}]
            # Token budget tracking: Groq rate limits are per-minute, not per-request.
            # usage.total_tokens already counts prompt tokens, so start from zero.
            total_tokens = 0

            while True:
                response = await groq.chat.completions.create(
                    model="llama-3.3-70b-versatile",
                    messages=messages,
                    tools=openai_tools,
                    max_tokens=4096,
                )
                choice = response.choices[0]
                total_tokens += response.usage.total_tokens

                messages.append({
                    "role": "assistant",
                    "content": choice.message.content,
                    "tool_calls": [tc.model_dump() for tc in (choice.message.tool_calls or [])],
                })

                if not choice.message.tool_calls:
                    return choice.message.content or ""

                # Parallel MCP dispatch — essential for Groq where LLM completes in <200ms
                async def call_mcp(tool_call):
                    args = json.loads(tool_call.function.arguments)
                    try:
                        result = await session.call_tool(tool_call.function.name, args)
                        content = result.content[0].text if result.content else ""
                        return {
                            "role": "tool",
                            "tool_call_id": tool_call.id,
                            "content": f"Error: {content}" if result.isError else content,
                        }
                    except Exception as e:
                        # Return error as string — never raise; uncaught exceptions break the loop
                        return {"role": "tool", "tool_call_id": tool_call.id, "content": f"Tool error: {e}"}

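                tool_results = await asyncio.gather(
                    *[call_mcp(tc) for tc in choice.message.tool_calls],
                    return_exceptions=True,
                )
                # call_mcp catches MCP errors itself; anything that still escapes (for example,
                # malformed JSON arguments) must be answered with the matching tool_call_id
                for tc, r in zip(choice.message.tool_calls, tool_results):
                    if isinstance(r, Exception):
                        messages.append({"role": "tool", "tool_call_id": tc.id, "content": f"Dispatch error: {r}"})
                    else:
                        messages.append(r)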

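                # Rolling context trim: keep the original prompt plus the most recent messages
                if len(messages) > 16:
                    tail = messages[-15:]
                    # Never start the window on a tool result whose assistant tool_calls turn was cut
                    while tail and tail[0]["role"] == "tool":
                        tail = tail[1:]
                    messages = [messages[0]] + tail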

asyncio.run(run_with_groq_mcp("Analyze MCP server uptime trends over the past 30 days."))

Groq's rate limits are structured around tokens per minute (TPM) rather than requests per minute — at the free tier, approximately 14,400 TPM for Llama 3.3-70B-Versatile. In an agent loop that calls tools on every turn, token consumption accumulates quickly: a prompt of 200 tokens that triggers two tool calls returning 500 tokens each plus a 300-token model response costs roughly 1,500 tokens per turn. At 14,400 TPM, that's roughly 9 turns per minute. Rolling context trimming (keeping the system prompt plus the most recent N turns) prevents the context window from growing linearly with conversation length while staying within rate limits.
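The arithmetic above can be checked with a short sketch. The helper names are hypothetical, the 14,400 TPM figure is the one quoted in the text, and the model is deliberately simple: it assumes every turn costs the same and ignores the growth that comes from resending earlier turns as context.

```python
def tokens_per_turn(prompt_tokens: int, tool_result_tokens: list[int], response_tokens: int) -> int:
    """Tokens consumed by one agent turn: prompt + all tool results + model response."""
    return prompt_tokens + sum(tool_result_tokens) + response_tokens


def turns_per_minute(tpm_limit: int, per_turn: int) -> int:
    """Whole turns that fit inside a tokens-per-minute budget."""
    return tpm_limit // per_turn


per_turn = tokens_per_turn(prompt_tokens=200, tool_result_tokens=[500, 500], response_tokens=300)
print(per_turn, turns_per_minute(14_400, per_turn))
```

Because real conversations resend prior turns, the per-turn cost rises over time, so the sustainable turn rate in practice sits below this estimate unless the context is trimmed.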

The Groq-specific monitoring concern is the interaction between MCP server response time and Groq's speed advantage. When a slow or intermittently degraded MCP server adds 2–5 seconds of latency per tool call, Groq's 100–200 ms inference advantage becomes irrelevant — the total agent latency is dominated by the slow MCP server. Because Groq's inference completes before the MCP tool has even started returning data, the degraded MCP server is the only bottleneck, but nothing in the Groq API or error messages identifies it as such. External monitoring that tracks per-server response time provides the signal that separates "Groq is slow today" from "an MCP dependency is degraded."

The shared failure mode: all five platforms absorb MCP failures silently

Despite their different architectures, all five platforms share a structural blind spot: when an MCP server becomes unavailable mid-run, the failure does not surface immediately as an unambiguous "MCP server down" error. Each platform absorbs the failure in its own way, and each absorption mechanism costs time and compute before the root cause becomes visible:

The pattern is the same across all five: MCP server downtime does not produce an immediate, unambiguous platform-level failure. It produces a series of partial failures that each platform's orchestration layer attempts to absorb, generating LLM token spend in the process, before eventually surfacing as a high-level agent error. The error message that arrives says something about the agent's response, not about the MCP server.

AliveMCP closes this gap by monitoring the MCP server independently of any platform. Probes run the full protocol sequence — initialize, tools/list, and actual tools/call invocations — every 60 seconds. When a probe fails, an alert fires within one check interval. The monitoring is platform-agnostic: one AliveMCP monitor per MCP server endpoint detects failures before any platform's retry cycle has a chance to waste LLM tokens on a server that isn't coming back.
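The core of any such probe is classifying what comes back. The sketch below is an illustration of that idea only, not AliveMCP's implementation: classify_response and its four labels are hypothetical, and the input is assumed to be an already-parsed JSON-RPC response.

```python
from typing import Any


def classify_response(response: dict[str, Any]) -> str:
    """Classify a parsed JSON-RPC response from an MCP server."""
    if response.get("jsonrpc") != "2.0":
        return "malformed"
    if "error" in response:
        return "protocol_error"  # JSON-RPC level failure, e.g. method not found
    result = response.get("result")
    if not isinstance(result, dict):
        return "malformed"
    if result.get("isError"):
        return "tool_error"  # the tool ran but reported a failure
    return "ok"


healthy = {"jsonrpc": "2.0", "id": 3,
           "result": {"content": [{"type": "text", "text": "..."}], "isError": False}}
print(classify_response(healthy))
```

Transport failures (connection refused, TLS errors, timeouts) never reach this function, so a probe would report those separately before parsing anything.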

Choosing a platform for a new MCP-backed project

The five platforms cover different positions on the control-vs-abstraction tradeoff. A few heuristics that emerge from their MCP-specific integration characteristics:

All five choices share the same operational requirement: external MCP server monitoring. The platform determines your inference model and orchestration architecture; the MCP server monitoring determines how quickly you know when the tools that architecture depends on become unavailable.

Further reading

Each of the five platform integrations has a dedicated deep-dive with complete code examples, adapter patterns, error handling details, and production monitoring configurations:

For MCP integration with Python agentic frameworks (LangChain, LangGraph, CrewAI, AutoGen, and Pydantic AI) rather than raw inference platforms, see MCP Servers in Python Agentic Frameworks. The protocol-level capabilities underlying all integrations — progress notifications, cancellation, binary content, sessions, and multi-server aggregation — are covered in Beyond Tool Calls: MCP's Full Protocol Surface.
