MCP servers with AWS Bedrock
AWS Bedrock offers two distinct integration points for MCP servers, and which one you pick changes the architecture significantly. The Converse API gives you a direct agentic loop — you call bedrock-runtime.converse(), handle ToolUseBlock responses, dispatch to the MCP server via tools/call, and loop. You own the orchestration but get maximum control over retry, timeout, and error handling. Bedrock Agents is Bedrock's managed orchestration layer — it handles the agent loop for you, but MCP servers plug in through Lambda-backed action groups, not natively. Each pattern suits a different team: the Converse API loop is right for teams who want full control and are already writing application code; Bedrock Agents is right for teams who want AWS to manage the agent state and don't want to maintain an orchestration loop. Both patterns require monitoring the underlying MCP servers independently of Bedrock.
TL;DR
For most use cases: use the Converse API loop. Extract MCP tools via tools/list, convert to Bedrock ToolSpec format, pass in toolConfig, loop until stopReason == "end_turn", dispatch each ToolUseBlock to the MCP server, inject ToolResultBlock back into the conversation. For managed orchestration: Bedrock Agents + Lambda action group that forwards to the MCP server's HTTP endpoint. Monitor both patterns with AliveMCP — Bedrock does not surface MCP server failures in a way that distinguishes them from tool logic errors.
Pattern 1 — Converse API with MCP tools
The Converse API provides a model-agnostic chat interface with native tool-calling support. This is the most direct way to connect MCP servers to any Bedrock model that supports tool use through Converse (Claude 3 and later, Llama 3.1 and later, Amazon Nova, Command R+, and others).
    import asyncio

    import boto3
    from mcp import ClientSession
    from mcp.client.streamable_http import streamablehttp_client

    # Cross-region inference profile ID; confirm the exact ID available to your account
    BEDROCK_MODEL = "us.anthropic.claude-sonnet-4-5-20250929-v1:0"
    MCP_SERVER_URL = "https://search.internal/mcp"


    async def run_bedrock_agent(question: str) -> str:
        bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

        # The Streamable HTTP client yields (read, write, get_session_id); the third item is unused here
        async with streamablehttp_client(MCP_SERVER_URL) as (read, write, _):
            async with ClientSession(read, write) as session:
                await session.initialize()

                # Convert MCP tools to Bedrock ToolSpec format
                tools_result = await session.list_tools()
                tool_config = {
                    "tools": [
                        {
                            "toolSpec": {
                                "name": tool.name,
                                # Fall back to the name: Bedrock expects a non-empty description
                                "description": tool.description or tool.name,
                                "inputSchema": {"json": tool.inputSchema},
                            }
                        }
                        for tool in tools_result.tools
                    ]
                }

                messages = [{"role": "user", "content": [{"text": question}]}]

                # Agentic loop
                while True:
                    # boto3 is synchronous; run the call in a thread so the event loop is not blocked
                    response = await asyncio.to_thread(
                        bedrock.converse,
                        modelId=BEDROCK_MODEL,
                        messages=messages,
                        toolConfig=tool_config,
                        inferenceConfig={"maxTokens": 4096, "temperature": 0},
                    )

                    output_message = response["output"]["message"]
                    messages.append(output_message)
                    stop_reason = response["stopReason"]

                    if stop_reason == "end_turn":
                        # Extract final text response
                        for block in output_message["content"]:
                            if "text" in block:
                                return block["text"]
                        return ""

                    if stop_reason != "tool_use":
                        # max_tokens, stop_sequence, and filter stops must not fall through and loop forever
                        raise RuntimeError(f"Unhandled stopReason: {stop_reason}")

                    # Dispatch all tool calls to MCP server
                    tool_results = []
                    for block in output_message["content"]:
                        if "toolUse" in block:
                            tool_use = block["toolUse"]
                            mcp_result = await session.call_tool(
                                tool_use["name"],
                                arguments=tool_use["input"],
                            )

                            # Convert MCP result content to Bedrock format (text items only)
                            result_text = "\n".join(
                                c.text for c in mcp_result.content
                                if hasattr(c, "text")
                            )

                            tool_results.append({
                                "toolResult": {
                                    "toolUseId": tool_use["toolUseId"],
                                    "content": [{"text": result_text}],
                                    "status": "error" if mcp_result.isError else "success",
                                }
                            })

                    # Tool results go back to the model as a user message
                    messages.append({
                        "role": "user",
                        "content": tool_results,
                    })


    print(asyncio.run(run_bedrock_agent("What MCP servers are currently down?")))
The loop continues until stopReason is "end_turn". Bedrock can also return "max_tokens" (the response was cut off; raise maxTokens or ask the model to continue), "stop_sequence", "guardrail_intervened", and "content_filtered". Handle every stop reason in production: an unhandled "max_tokens" leaves you with a truncated response that is easy to mistake for a complete one.
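One way to make the stop-reason handling explicit is a small dispatcher that the loop consults after every converse() call. This is a minimal sketch: `next_action` and `STOP_DETAILS` are hypothetical names, and treating "stop_sequence" as a normal finish is an assumption that may not suit every application.

```python
# Hypothetical helper: decide what the agent loop should do with a Converse response.
STOP_DETAILS = {
    "max_tokens": "response truncated; raise maxTokens or ask the model to continue",
    "guardrail_intervened": "a guardrail blocked the request or the response",
    "content_filtered": "the response was removed by content filtering",
}


def next_action(response: dict) -> str:
    """Return 'finish' or 'call_tools', or raise for stop reasons that need attention."""
    stop_reason = response["stopReason"]
    if stop_reason in ("end_turn", "stop_sequence"):
        return "finish"
    if stop_reason == "tool_use":
        return "call_tools"
    # Anything else is surfaced loudly instead of being treated as a final answer
    detail = STOP_DETAILS.get(stop_reason, "unrecognized stop reason")
    raise RuntimeError(f"Converse stopped with {stop_reason!r}: {detail}")
```

Raising on the truncation and filter cases keeps a cut-off answer from being returned to the user as if it were complete.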
ToolSpec format and inputSchema conversion
Bedrock's ToolSpec uses a slightly different schema structure than MCP's inputSchema. The conversion is simple but required — Bedrock wraps the JSON Schema in an extra json key:
| Field | MCP format | Bedrock ToolSpec format |
|---|---|---|
| Tool name | tool.name | toolSpec.name |
| Description | tool.description | toolSpec.description |
| Input schema | tool.inputSchema (JSON Schema object) | toolSpec.inputSchema.json (wrapped) |
| Tool call ID | not required | toolUse.toolUseId (must echo in result) |
| Error signal | CallToolResult.isError: true | toolResult.status: "error" |
Bedrock requires that each ToolResultBlock includes the toolUseId from the corresponding ToolUseBlock. Match them by toolUseId, not by position — the model can return multiple tool calls in a single turn, and the order of results doesn't need to match the order of calls.
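A cheap pre-flight check before sending the next converse() request is to confirm that every toolUseId the model issued has a matching result. The sketch below assumes the message shapes used in the loop above; `missing_tool_results` is a hypothetical helper name.

```python
def missing_tool_results(assistant_message: dict, user_message: dict) -> set:
    """Return the toolUseIds that were requested but have no toolResult yet."""
    requested = {
        block["toolUse"]["toolUseId"]
        for block in assistant_message["content"]
        if "toolUse" in block
    }
    answered = {
        block["toolResult"]["toolUseId"]
        for block in user_message["content"]
        if "toolResult" in block
    }
    # Set difference ignores ordering, so results may arrive in any order
    return requested - answered
```

If the returned set is non-empty, add an error result for each missing ID rather than sending an incomplete message.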
Pattern 2 — Bedrock Agents with Lambda action groups
Bedrock Agents manages the agent loop, memory, and orchestration. MCP servers connect through Lambda-backed action groups — the Lambda function receives the agent's tool call and proxies it to the MCP server:
    # lambda_handler.py: Lambda function proxying Bedrock Agent calls to an MCP server
    import asyncio
    import os

    from mcp import ClientSession
    from mcp.client.streamable_http import streamablehttp_client

    MCP_URL = os.environ["MCP_SERVER_URL"]


    def lambda_handler(event, context):
        """Bedrock Agents calls this Lambda with the tool use request."""
        action_group = event.get("actionGroup", "")
        function_name = event.get("function", "")
        parameters = event.get("parameters", [])

        # Convert Bedrock parameter list to MCP arguments dict.
        # Bedrock passes every value as a string; coerce using p["type"] if a tool expects numbers or booleans.
        mcp_args = {p["name"]: p["value"] for p in parameters}

        # The Lambda handler is synchronous, so run the async MCP call to completion here
        result = asyncio.run(call_mcp(function_name, mcp_args))

        # Bedrock Agents expects the function result wrapped in messageVersion/response
        return {
            "messageVersion": event.get("messageVersion", "1.0"),
            "response": {
                "actionGroup": action_group,
                "function": function_name,
                "functionResponse": {
                    "responseBody": {
                        "TEXT": {"body": result}
                    }
                },
            },
        }


    async def call_mcp(tool_name: str, arguments: dict) -> str:
        # The Streamable HTTP client yields (read, write, get_session_id); the third item is unused here
        async with streamablehttp_client(MCP_URL) as (read, write, _):
            async with ClientSession(read, write) as session:
                await session.initialize()
                result = await session.call_tool(tool_name, arguments=arguments)
                if result.isError:
                    raise RuntimeError(f"MCP tool error: {result.content}")
                # Keep only the text items from the MCP result
                return "\n".join(c.text for c in result.content if hasattr(c, "text"))
The action group maps to MCP tool names through its function schema. You declare each MCP tool as a function in the action group (a name, a description, and a map of typed parameters). This mirrors the MCP tools/list output, but in Bedrock's own function-details format rather than JSON Schema. If the MCP server adds tools, you must update the action group schema and prepare the agent again, because Bedrock Agents does not discover tools at runtime.
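Because the action group schema has to be kept in sync by hand, it can help to generate it from the MCP tools/list output. The following is a rough sketch, not Bedrock's official mapping: `mcp_tool_to_function` is a hypothetical helper, the output field names (`name`, `description`, `parameters` with `type`, `description`, `required`) reflect the function-details structure as understood here, and the fallback to `"string"` for unsupported types is an assumption. Verify the result against the Bedrock Agents API reference and its quotas before using it.

```python
# Parameter types assumed to be accepted by Bedrock function details
BEDROCK_TYPES = {"string", "number", "integer", "boolean", "array"}


def mcp_tool_to_function(tool: dict) -> dict:
    """Convert one MCP tool definition (as a dict) into a function-details entry."""
    schema = tool.get("inputSchema", {})
    required = set(schema.get("required", []))
    parameters = {}
    for name, prop in schema.get("properties", {}).items():
        json_type = prop.get("type", "string")
        if not isinstance(json_type, str) or json_type not in BEDROCK_TYPES:
            # Nested objects and union types have no direct equivalent;
            # pass them as a string that the Lambda proxy can parse
            json_type = "string"
        parameters[name] = {
            "type": json_type,
            "description": prop.get("description", ""),
            "required": name in required,
        }
    return {
        "name": tool["name"],
        "description": tool.get("description", ""),
        "parameters": parameters,
    }
```

Running this in CI against the live tools/list output and diffing it with the deployed schema is one way to catch drift when the MCP server gains or changes a tool.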
IAM configuration for Bedrock + MCP
Both patterns require IAM configuration. The Converse API pattern needs model invocation permissions on your application role. The Bedrock Agents pattern additionally needs a Lambda execution role (logging, plus VPC networking permissions if the MCP server is private) and a resource-based policy on the Lambda function that allows the Bedrock service to invoke it:
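    # Converse API pattern: your application role needs the policy below.
    # converse() is authorized by bedrock:InvokeModel and converse_stream() by
    # bedrock:InvokeModelWithResponseStream; there is no separate bedrock:Converse action.
    # The inference-profile ARN is required because the model ID is a cross-region profile (us.*).
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"],
          "Resource": [
            "arn:aws:bedrock:us-east-1:ACCOUNT_ID:inference-profile/us.anthropic.claude-*",
            "arn:aws:bedrock:*::foundation-model/anthropic.claude-*"
          ]
        }
      ]
    }

    # Bedrock Agents pattern: the Lambda execution role needs the policy below.
    # If the MCP server is in a private VPC, also allow ec2:CreateNetworkInterface,
    # ec2:DescribeNetworkInterfaces, and ec2:DeleteNetworkInterface
    # (or attach the AWSLambdaVPCAccessExecutionRole managed policy).
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": "logs:CreateLogGroup",
          "Resource": "arn:aws:logs:us-east-1:ACCOUNT_ID:*"
        },
        {
          "Effect": "Allow",
          "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
          "Resource": "arn:aws:logs:us-east-1:ACCOUNT_ID:log-group:/aws/lambda/mcp-proxy:*"
        }
      ]
    }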
For private MCP servers inside a VPC, configure the Lambda function to run inside the same VPC with a security group that allows outbound HTTPS to the MCP server. Use interface VPC endpoints (AWS PrivateLink) for Bedrock to keep the model call path off the public internet. This matters for compliance-conscious teams and can reduce per-call latency (roughly 20–80 ms in most regions, though the exact saving depends on the network path).
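The permission that lets a Bedrock agent invoke the proxy function is a resource-based policy statement on the Lambda function itself. A minimal sketch of the arguments for the Lambda `add_permission` API follows; `agent_invoke_permission`, the statement ID, and the example ARN are placeholders chosen for illustration.

```python
def agent_invoke_permission(function_name: str, agent_arn: str) -> dict:
    """Build keyword arguments for the Lambda add_permission call."""
    return {
        "FunctionName": function_name,
        "StatementId": "allow-bedrock-agent-invoke",  # placeholder statement ID
        "Action": "lambda:InvokeFunction",
        "Principal": "bedrock.amazonaws.com",
        # Restrict the grant to a single agent rather than the whole service
        "SourceArn": agent_arn,
    }


# Usage (requires AWS credentials, so it is left as a comment):
#   import boto3
#   boto3.client("lambda").add_permission(
#       **agent_invoke_permission("mcp-proxy", "arn:aws:bedrock:us-east-1:ACCOUNT_ID:agent/AGENT_ID")
#   )
```

Scoping `SourceArn` to one agent keeps other agents in the account from invoking the proxy.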
Region latency and model availability
Bedrock model availability varies by AWS region. Cross-region inference (using inference profiles like us.anthropic.claude-...) routes automatically to the region with available capacity, adding 10–100 ms of variable latency. For latency-sensitive pipelines, run both the MCP server and the Bedrock call from the same region:
| Deployment scenario | Latency profile | Recommendation |
|---|---|---|
| MCP server on EC2 (same region as Bedrock) | 5–20 ms MCP round-trip | Best for latency-sensitive agents |
| MCP server on external host (different region) | 50–200 ms MCP round-trip | Acceptable; monitor P99 latency |
| MCP server on Lambda (cold start) | 300–3000 ms first call | Use provisioned concurrency or keep-warm |
| Cross-region Bedrock inference | +10–100 ms per inference call | Use for availability, not latency |
In agentic loops with 3–8 tool calls per run, MCP round-trip latency compounds — a 200 ms MCP server adds 0.6–1.6 seconds to the total run time before accounting for inference time. Use MCP server timeout configuration to avoid blocking the entire Bedrock converse loop on a single slow tool call.
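The compounding estimate is simple arithmetic, sketched here under the assumption that tool calls run one after another (parallel dispatch would lower the total). The function name is illustrative.

```python
def added_tool_latency_s(tool_calls: int, mcp_round_trip_ms: float) -> float:
    """Total seconds spent waiting on the MCP server for sequential tool calls."""
    return tool_calls * mcp_round_trip_ms / 1000


low = added_tool_latency_s(3, 200)   # 3 calls at 200 ms each
high = added_tool_latency_s(8, 200)  # 8 calls at 200 ms each
print(f"{low:.1f}-{high:.1f} s added per run")
```

The same function gives a worst-case bound if you pass the per-call timeout instead of the typical round-trip time.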
Monitoring MCP servers in Bedrock pipelines
Bedrock Converse API errors and MCP server errors look similar from the application layer. A ModelTimeoutException from Bedrock and an asyncio.TimeoutError from the MCP server are both exceptions — the stack trace differs, but in a busy pipeline under load, distinguishing them requires structured logging. Add the failure source to every caught exception:
    # Assumes a structlog-style logger; the stdlib logging module does not accept arbitrary keyword fields
    try:
        response = bedrock.converse(
            modelId=BEDROCK_MODEL,
            messages=messages,
            # ...
        )
    except Exception as e:
        logger.error("bedrock_converse_failed", error=str(e), source="bedrock")
        raise

    # vs.
    try:
        mcp_result = await session.call_tool(tool_name, arguments=tool_use["input"])
    except Exception as e:
        logger.error("mcp_tool_call_failed", tool=tool_name, error=str(e), source="mcp_server")
        raise
AliveMCP monitors the MCP server independently of your Bedrock pipeline. When AliveMCP fires an alert on your MCP server's health, you know within 60 seconds that the MCP layer is the failure domain — before your Bedrock pipeline's error logs start accumulating. For Lambda-proxied Bedrock Agents, AliveMCP's alerts reach you before Lambda's retry budget is exhausted on a dead MCP server, preventing unnecessary invocation costs.
Frequently asked questions
Which Bedrock models support tool calling for MCP integration?
All Bedrock models that support tool use through the Converse API work with the MCP adapter pattern, including Anthropic Claude 3 and later, Meta Llama 3.1 and later, Cohere Command R and R+, Mistral Large, and Amazon Nova. The Converse API abstracts model differences, so the same tool-calling loop works across all of them. Not every model in Bedrock's catalog accepts toolConfig in converse() (Amazon Titan Text models, for example, do not), so check the Bedrock documentation for the current list.
Can I use Bedrock's streaming Converse API with MCP tool calls?
Yes, via converse_stream(). With streaming, the agent's text output streams token by token, but tool calls still require the complete ToolUseBlock before dispatch — you accumulate the streamed blocks until contentBlockStop on a toolUse block, then dispatch to the MCP server. Streaming is most useful for the final text response to the user, not for the tool-call dispatch path.
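The accumulation step can be sketched as a fold over the stream events. This assumes the ConverseStream event shapes (`contentBlockStart`, `contentBlockDelta`, `contentBlockStop`, `messageStop`), where the tool input arrives as fragments of a JSON string; `collect_stream` is a hypothetical helper name.

```python
import json
from typing import Iterable, List, Optional, Tuple


def collect_stream(events: Iterable[dict]) -> Tuple[str, List[dict], Optional[str]]:
    """Fold ConverseStream events into (text, complete tool uses, stop reason)."""
    text_parts = []
    pending = {}  # contentBlockIndex -> partially streamed toolUse
    tool_uses = []
    stop_reason = None
    for event in events:
        if "contentBlockStart" in event:
            start = event["contentBlockStart"]
            tool = start.get("start", {}).get("toolUse")
            if tool:
                pending[start["contentBlockIndex"]] = {
                    "toolUseId": tool["toolUseId"],
                    "name": tool["name"],
                    "input_json": "",
                }
        elif "contentBlockDelta" in event:
            block = event["contentBlockDelta"]
            delta = block["delta"]
            if "text" in delta:
                text_parts.append(delta["text"])  # forward this to the user as it arrives
            elif "toolUse" in delta:
                # Tool input streams as JSON text fragments; buffer until the block closes
                pending[block["contentBlockIndex"]]["input_json"] += delta["toolUse"]["input"]
        elif "contentBlockStop" in event:
            done = pending.pop(event["contentBlockStop"]["contentBlockIndex"], None)
            if done:
                # Only now is the tool call complete and safe to dispatch
                tool_uses.append({
                    "toolUseId": done["toolUseId"],
                    "name": done["name"],
                    "input": json.loads(done["input_json"] or "{}"),
                })
        elif "messageStop" in event:
            stop_reason = event["messageStop"]["stopReason"]
    return "".join(text_parts), tool_uses, stop_reason
```

In use, pass `bedrock.converse_stream(...)["stream"]` as `events`, then dispatch each returned tool use to the MCP server exactly as in the non-streaming loop.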
How do I handle MCP server timeouts in the Bedrock Converse loop?
Wrap each session.call_tool() call in asyncio.wait_for(session.call_tool(...), timeout=10.0). On timeout, inject a ToolResultBlock with status: "error" and a message like "Tool timed out after 10 seconds" — this lets the Bedrock model decide to retry, use a different tool, or report the failure. Do not raise the timeout exception uncaught, as that terminates the entire agent loop without producing a response.
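A minimal sketch of that wrapper is shown below. `call_tool_with_timeout` is a hypothetical helper; `call_tool` is expected to be `session.call_tool` (or anything with the same call shape), and the 10-second default simply mirrors the figure above.

```python
import asyncio


async def call_tool_with_timeout(call_tool, tool_use: dict, timeout_s: float = 10.0) -> dict:
    """Run one MCP tool call and always return a Bedrock toolResult block."""
    try:
        result = await asyncio.wait_for(
            call_tool(tool_use["name"], arguments=tool_use["input"]),
            timeout=timeout_s,
        )
        text = "\n".join(c.text for c in result.content if hasattr(c, "text"))
        status = "error" if result.isError else "success"
    except asyncio.TimeoutError:
        # Report the timeout to the model instead of aborting the whole loop
        text = f"Tool timed out after {timeout_s:g} seconds"
        status = "error"
    return {
        "toolResult": {
            "toolUseId": tool_use["toolUseId"],
            "content": [{"text": text}],
            "status": status,
        }
    }
```

Inside the loop this would be called as `await call_tool_with_timeout(session.call_tool, tool_use)`, and the returned block appended to the tool results for that turn.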
Does Bedrock Agents support streaming MCP tool responses?
Bedrock Agents action groups use a synchronous request/response model via Lambda. The Lambda function must complete within its timeout (max 15 minutes for Lambda, but Bedrock Agents has its own orchestration timeout). MCP server streaming responses (Server-Sent Events for long-running tools) are not natively supported in the Lambda proxy pattern — you'd need to buffer the full MCP streaming response in Lambda and return it as a complete string. For very long-running tools, consider returning a polling reference and having the agent call a status-check tool instead.
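The polling-reference idea can be sketched as two tools on the MCP server side: one that starts the work and returns an ID immediately, and one that reports status. Everything here is illustrative: the tool names, the in-memory registry, and `complete_job` (standing in for whatever background worker finishes the job) are assumptions, and a real server would need durable storage.

```python
import uuid

# In-memory job registry; a real server would use a database or queue
_jobs = {}


def start_job(query: str) -> dict:
    """Tool 1: register the long-running work and return a reference right away."""
    job_id = uuid.uuid4().hex
    _jobs[job_id] = {"query": query, "status": "running", "result": None}
    return {"job_id": job_id, "status": "running"}


def complete_job(job_id: str, result: str) -> None:
    """Called by the background worker when the work finishes."""
    _jobs[job_id].update(status="done", result=result)


def job_status(job_id: str) -> dict:
    """Tool 2: the agent polls this with the reference it was given."""
    job = _jobs.get(job_id)
    if job is None:
        return {"job_id": job_id, "status": "unknown", "result": None}
    return {"job_id": job_id, "status": job["status"], "result": job["result"]}
```

Each tool call then returns well within the Lambda timeout, and the agent decides how often to check back.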
What's the cost difference between Converse API + MCP and Bedrock Agents + Lambda + MCP?
Converse API: you pay for model input/output tokens, with no additional orchestration cost. MCP server costs are outside Bedrock. Bedrock Agents: you pay for model tokens across every orchestration step (the orchestration prompts add tokens to each step), plus any agent-specific charges listed on the current Bedrock pricing page. Lambda adds execution costs, which are typically negligible at milliseconds per tool call. For high-volume workloads (>10,000 agent runs/month), the Converse API with a custom loop is usually cheaper. For low-volume workloads where managed orchestration saves engineering time, the Bedrock Agents cost premium is usually worth it.
Further reading
- MCP servers with OpenAI Agents SDK — native MCPServerHTTP integration
- MCP servers with Google Gemini — function calling and ADK integration
- MCP server timeout configuration — preventing agent loop stalls
- MCP server SSE transport — HTTP streaming for production deployments
- AWS-hosted MCP server monitoring — CloudWatch and AliveMCP integration
- AliveMCP — continuous protocol monitoring for MCP servers