Guide · Agentic Frameworks

MCP server Semantic Kernel integration

Microsoft Semantic Kernel (SK) is an enterprise-oriented AI SDK available in Python and .NET that structures AI applications around a Kernel — a registry of plugins, services, and memory stores through which an LLM orchestrates work. MCP servers integrate as SK plugins: you wrap each MCP tool call as a kernel_function (Python) or a method decorated with [KernelFunction] (.NET), group them into a KernelPlugin, and register the plugin with the kernel. SK's auto function calling then lets the LLM discover and invoke MCP tools during chat completion without any additional routing code. The non-obvious decisions are maintaining the MCP session lifetime across multiple kernel invocations, writing function descriptions that guide the LLM to the right tool, and monitoring the external MCP servers that your SK pipeline depends on — failures in external MCP dependencies show up as obscure LLM errors, not clear infrastructure alerts.

TL;DR

In SK Python, open a ClientSession to the MCP server, wrap each tool call as an async method decorated with @kernel_function(name=..., description=...) inside a plugin class, then call kernel.add_plugin(MyMCPPlugin(session), plugin_name="search"). Enable auto function calling with OpenAIChatPromptExecutionSettings(function_choice_behavior=FunctionChoiceBehavior.Auto()). Keep the session alive for all kernel invocations; do not reconnect per call. Monitor the MCP endpoint with AliveMCP so you learn about failures before your SK pipeline silently produces incomplete answers.

Semantic Kernel plugin model

SK organizes capabilities as plugins — named collections of functions that the LLM can discover and invoke. The KernelPlugin abstraction maps directly to MCP: one MCP server becomes one SK plugin, and each MCP tool becomes one KernelFunction.

SK concept	MCP equivalent	How it maps
`KernelPlugin`	MCP server	One plugin per MCP server, named after the server's domain
`KernelFunction`	MCP tool	One function per MCP tool; description drives LLM tool selection
`KernelArguments`	Tool `arguments` dict	SK passes typed arguments; wrapper converts to MCP dict
Auto function calling	MCP `tools/call`	SK calls the kernel function; wrapper calls MCP over the session
`FunctionResult`	MCP `CallToolResult`	Wrapper extracts `content[0].text` → string result
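The last row of the table hides a detail: a CallToolResult carries a list of content items, and only text items have a text field. Below is a minimal, duck-typed sketch of that conversion. It assumes the attribute names of the MCP Python SDK's CallToolResult (content, isError) and TextContent (type, text), and the helper name call_tool_result_to_text is hypothetical. Joining every text part, instead of reading only content[0], avoids silently dropping output when a server returns several text blocks.

```python
from types import SimpleNamespace


def call_tool_result_to_text(result) -> str:
    """Flatten an MCP CallToolResult into the string a KernelFunction returns.

    Duck-typed: expects `result.content` (items with a `type` attribute) and
    `result.isError`, the attribute names used by the MCP Python SDK.
    """
    # Only text items have `.text`; images and embedded resources are skipped.
    parts = [item.text for item in result.content if getattr(item, "type", None) == "text"]
    text = "\n".join(parts) if parts else "(tool returned no text content)"
    if getattr(result, "isError", False):
        # Keep the error visible to the LLM instead of raising.
        return f"Tool error: {text}"
    return text


if __name__ == "__main__":
    # Stand-in object shaped like an mcp.types result (hypothetical data).
    demo = SimpleNamespace(
        isError=False,
        content=[
            SimpleNamespace(type="text", text="first result"),
            SimpleNamespace(type="image", data="..."),
            SimpleNamespace(type="text", text="second result"),
        ],
    )
    print(call_tool_result_to_text(demo))
```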

Python: wrapping MCP tools as KernelFunctions

Each MCP tool becomes an async method on a plugin class. The @kernel_function decorator provides the name and description that SK exposes to the LLM's function calling API. The method body calls the MCP session:

import asyncio
import os
from typing import Annotated

from mcp import ClientSession
from mcp.client.sse import sse_client
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.function_choice_behavior import FunctionChoiceBehavior
from semantic_kernel.connectors.ai.open_ai import (
    AzureChatCompletion,
    OpenAIChatPromptExecutionSettings,
)
from semantic_kernel.contents import ChatHistory
from semantic_kernel.functions import kernel_function


def _text(result) -> str:
    # Join the text parts of an MCP CallToolResult; non-text parts are skipped.
    return "\n".join(c.text for c in result.content if getattr(c, "type", None) == "text")


class SearchMCPPlugin:
    """SK plugin backed by a live MCP search server."""

    def __init__(self, session: ClientSession):
        self._session = session

    @kernel_function(
        name="search_web",
        description="Search the web for recent information on a topic. "
                    "Use for current events, recent research, or facts that may have changed since the LLM training cutoff.",
    )
    async def search_web(
        self,
        # Annotated metadata becomes the parameter description in the tool schema.
        query: Annotated[str, "The search query; be specific for better results"],
        max_results: Annotated[int, "Number of results to return (1-20)"] = 5,
    ) -> str:
        try:
            result = await self._session.call_tool(
                "search_web",
                arguments={"query": query, "max_results": max_results},
            )
            if result.isError:
                return f"Search error: {_text(result)}"
            return _text(result)
        except Exception as e:
            # Returned as text so the LLM can tell the user that search failed.
            return f"MCP search unavailable: {e}"

    @kernel_function(
        name="get_page_content",
        description="Retrieve and extract the main text content of a webpage. "
                    "Use after search_web to get the full content of a specific result URL.",
    )
    async def get_page_content(
        self,
        url: Annotated[str, "URL of the page to fetch, typically one returned by search_web"],
    ) -> str:
        try:
            result = await self._session.call_tool("get_page_content", arguments={"url": url})
            if result.isError:
                return f"Fetch error: {_text(result)}"
            return _text(result)
        except Exception as e:
            return f"MCP fetch unavailable: {e}"


async def run_sk_with_mcp(question: str) -> str:
    # The session stays open for the whole kernel invocation, including every
    # tool call that SK auto-invokes.
    async with sse_client(
        url=os.environ["SEARCH_MCP_URL"],
        headers={"Authorization": f"Bearer {os.environ['SEARCH_MCP_TOKEN']}"},
    ) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            kernel = Kernel()
            chat_service = AzureChatCompletion(
                deployment_name=os.environ["AZURE_DEPLOYMENT"],
                endpoint=os.environ["AZURE_ENDPOINT"],
                api_key=os.environ["AZURE_API_KEY"],
            )
            kernel.add_service(chat_service)
            kernel.add_plugin(SearchMCPPlugin(session), plugin_name="search")

            settings = OpenAIChatPromptExecutionSettings(
                function_choice_behavior=FunctionChoiceBehavior.Auto(),
                max_tokens=2048,
            )
            history = ChatHistory()
            history.add_user_message(question)

            # Passing the kernel lets SK auto-invoke the plugin's functions.
            result = await chat_service.get_chat_message_content(
                chat_history=history,
                settings=settings,
                kernel=kernel,
            )
            return str(result)


if __name__ == "__main__":
    print(asyncio.run(run_sk_with_mcp("What changed in the latest MCP specification?")))

The @kernel_function description is the primary signal the LLM uses to decide when and whether to call each function. Write descriptions as explicit intent declarations: "Use when the user asks for X" reduces ambiguity more than "Returns X." Parameter descriptions come from typing.Annotated metadata on each parameter, and SK includes them in the tool schema sent to the LLM. A docstring is used as the function description only when description= is omitted; SK does not parse its Args: section into parameter descriptions.
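In SK Python, parameter descriptions in the tool schema come from typing.Annotated metadata. The stdlib-only sketch below puts that metadata on a signature shaped like the search_web wrapper and reads it back with typing.get_type_hints(include_extras=True). parameter_descriptions is a hypothetical helper for inspecting your own wrappers, not SK's parser.

```python
from typing import Annotated, get_args, get_origin, get_type_hints


async def search_web(
    query: Annotated[str, "The search query; be specific for better results"],
    max_results: Annotated[int, "Number of results to return (1-20)"] = 5,
) -> str:
    """Search the web for recent information on a topic."""
    return ""  # body omitted; only the signature matters here


def parameter_descriptions(func) -> dict:
    """Return {parameter name: first string in its Annotated metadata}."""
    hints = get_type_hints(func, include_extras=True)
    descriptions = {}
    for name, hint in hints.items():
        if name == "return" or get_origin(hint) is not Annotated:
            continue  # skip the return type and parameters without Annotated
        _, *metadata = get_args(hint)  # Annotated[T, m1, ...] -> (T, m1, ...)
        descriptions[name] = next((m for m in metadata if isinstance(m, str)), "")
    return descriptions


if __name__ == "__main__":
    print(parameter_descriptions(search_web))
```

A check like this in a unit test catches wrapper parameters that would reach the LLM with no description at all.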

.NET: KernelFunction attribute pattern

The .NET SK SDK uses C# attributes instead of Python decorators. The MCP .NET SDK (ModelContextProtocol) provides an IMcpClient that you inject into your plugin class:

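// NuGet: Microsoft.SemanticKernel, ModelContextProtocol
// (ModelContextProtocol type names such as IMcpClient have changed across preview
// releases; check the version you install)
using System.Collections.Generic;
using System.ComponentModel;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;
using ModelContextProtocol.Client;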

public class SearchMcpPlugin
{
    private readonly IMcpClient _mcpClient;

    public SearchMcpPlugin(IMcpClient mcpClient)
    {
        _mcpClient = mcpClient;
    }

    [KernelFunction("search_web")]
    [Description("Search the web for recent information. Use for current events or facts " +
                 "that may have changed since the model's training cutoff.")]
    public async Task<string> SearchWebAsync(
        [Description("The search query — be specific for better results")] string query,
        [Description("Maximum results to return, 1-20")] int maxResults = 5,
        CancellationToken cancellationToken = default)
    {
        try
        {
            var result = await _mcpClient.CallToolAsync(
                "search_web",
                new Dictionary<string, object?> { ["query"] = query, ["max_results"] = maxResults },
                cancellationToken: cancellationToken);

            // `== true` compiles whether IsError is declared as bool or bool? in your SDK version.
            if (result.IsError == true)
                return $"Search error: {result.Content.FirstOrDefault()?.Text}";

            return result.Content.FirstOrDefault()?.Text ?? string.Empty;
        }
        catch (Exception ex)
        {
            return $"MCP search unavailable: {ex.Message}";
        }
    }
}

// Registration in DI / startup:
// var mcpClient = await McpClientFactory.CreateAsync(...);
// kernel.Plugins.AddFromObject(new SearchMcpPlugin(mcpClient), "search");

Accepting a CancellationToken parameter is worthwhile in .NET: SK supplies the invocation's token to that parameter, so if the orchestration loop is cancelled (e.g., by a request timeout), the cancellation propagates into the MCP call and the underlying HTTP request can be aborted cleanly instead of hanging until the server times out.
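Python gets the same propagation from asyncio cancellation: when a timeout cancels the task running the kernel invocation, CancelledError is raised at whichever await is in progress, including a pending call_tool. A minimal sketch, in which slow_tool_call is a hypothetical stand-in for a hung MCP server rather than a real session:

```python
import asyncio


async def slow_tool_call(log: list) -> str:
    """Hypothetical stand-in for `await session.call_tool(...)` on a hung server."""
    try:
        await asyncio.sleep(10)  # the server never answers within the timeout
        return "result"
    except asyncio.CancelledError:
        log.append("tool call cancelled")  # connection cleanup would run here
        raise  # re-raise so cancellation keeps propagating


async def search_web_wrapper(log: list) -> str:
    # Plays the role of the @kernel_function method awaiting the MCP call.
    return await slow_tool_call(log)


async def handle_request(timeout_s: float) -> list:
    log = []
    try:
        await asyncio.wait_for(search_web_wrapper(log), timeout=timeout_s)
    except asyncio.TimeoutError:
        log.append("request timed out")
    return log


if __name__ == "__main__":
    print(asyncio.run(handle_request(0.05)))
```

Because CancelledError derives from BaseException (Python 3.8+), the `except Exception` blocks in the plugin methods do not swallow it.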

Auto function calling with ChatCompletionAgent

SK's ChatCompletionAgent provides a higher-level abstraction: it manages the chat history, handles multi-turn function calling loops, and collects final responses. Register your MCP plugin on the agent's kernel:

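# Continues the earlier example: kernel, AzureChatCompletion, FunctionChoiceBehavior,
# and OpenAIChatPromptExecutionSettings are already defined or imported.
# Uses the thread-based agent API of recent SK Python releases.
from semantic_kernel.agents import ChatCompletionAgent, ChatHistoryAgentThread
from semantic_kernel.functions import KernelArguments

agent = ChatCompletionAgent(
    service=AzureChatCompletion(...),
    kernel=kernel,  # kernel already has SearchMCPPlugin registered
    name="ResearchAgent",
    instructions=(
        "You are a research agent with access to a live web search MCP server. "
        "Always search for current information before answering questions about "
        "recent events, releases, or statistics. Cite your sources."
    ),
    arguments=KernelArguments(
        settings=OpenAIChatPromptExecutionSettings(
            function_choice_behavior=FunctionChoiceBehavior.Auto(),
        )
    ),
)


async def ask(question: str, thread: ChatHistoryAgentThread | None = None) -> ChatHistoryAgentThread:
    # The thread carries chat history between turns; pass it back in for multi-turn use.
    thread = thread or ChatHistoryAgentThread()
    async for response in agent.invoke(messages=question, thread=thread):
        print(response.message.content)
        thread = response.thread
    return thread

# Multi-turn conversation (inside an async context):
# thread = await ask("What MCP servers are trending this week?")
# thread = await ask("Which of those support SSE transport?", thread)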

FunctionChoiceBehavior.Auto() lets the LLM decide when to call functions and has SK invoke them automatically. Alternatives are FunctionChoiceBehavior.Required(), which forces the model to call at least one of the advertised functions, and FunctionChoiceBehavior.NoneInvoke(), which still describes the functions to the model but tells it not to call them (useful when testing prompts). To keep the set of MCP calls small enough to audit, pass an allow-list: FunctionChoiceBehavior.Auto(filters={"included_plugins": ["search"]}) advertises only the search plugin's functions to the LLM.
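To make the effect of a plugin filter concrete, here is a toy model of an allow-list and deny-list applied to the functions registered on a kernel. It is a simplified illustration, not SK's implementation: it assumes SK's fully qualified "plugin-function" naming (for example search-search_web) and covers only the included_plugins and excluded_plugins keys.

```python
def visible_functions(registered: dict, filters: dict) -> list:
    """Toy model: which plugin functions would be advertised to the LLM."""
    included = set(filters.get("included_plugins", registered.keys()))
    excluded = set(filters.get("excluded_plugins", []))
    visible = []
    for plugin, functions in registered.items():
        if plugin in included and plugin not in excluded:
            # Fully qualified names use the "plugin-function" form.
            visible.extend(f"{plugin}-{name}" for name in functions)
    return visible


if __name__ == "__main__":
    registered = {
        "search": ["search_web", "get_page_content"],
        "email": ["send_email"],  # hypothetical plugin with side effects
    }
    print(visible_functions(registered, {"included_plugins": ["search"]}))
```

A function the LLM never sees is a function it cannot call, which is why an allow-list is a cheaper audit boundary than reviewing every call after the fact.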

Session lifecycle across multiple kernel invocations

SK kernels are often created once and reused across many requests in a web application. The MCP session (a long-lived connection to the server; for SSE, an open HTTP event stream) must last at least as long as the kernel. The safest pattern for long-lived applications is a session wrapper that recreates the session after a connection error:

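import asyncio
from contextlib import AsyncExitStack
from typing import Optional

from mcp import ClientSession
from mcp.client.sse import sse_client
from mcp.types import CallToolResult


class ResilientMCPSession:
    """Owns one MCP session and reconnects it after a failed call.

    Note: anyio-based transports may require being closed by the task that opened
    them; if requests run in separate tasks, consider owning the session in one
    background task instead.
    """

    def __init__(self, url: str, token: str):
        self._url = url
        self._token = token
        self._session: Optional[ClientSession] = None
        self._stack: Optional[AsyncExitStack] = None
        self._lock = asyncio.Lock()  # one reconnect at a time across concurrent requests

    async def get_session(self) -> ClientSession:
        async with self._lock:
            if self._session is None:
                await self._connect()
            return self._session

    async def _close(self) -> None:
        # Closes the ClientSession and the SSE transport together, in reverse order.
        stack, self._stack, self._session = self._stack, None, None
        if stack is not None:
            try:
                await stack.aclose()
            except Exception:
                pass  # the connection is already broken; ignore cleanup errors

    async def _connect(self) -> None:
        await self._close()
        stack = AsyncExitStack()
        try:
            read, write = await stack.enter_async_context(
                sse_client(url=self._url, headers={"Authorization": f"Bearer {self._token}"})
            )
            session = await stack.enter_async_context(ClientSession(read, write))
            await session.initialize()
        except BaseException:
            await stack.aclose()  # do not leak a half-open transport
            raise
        self._stack, self._session = stack, session

    async def call_tool(self, tool_name: str, arguments: dict) -> CallToolResult:
        for attempt in range(3):
            try:
                session = await self.get_session()
                return await session.call_tool(tool_name, arguments=arguments)
            except Exception:
                async with self._lock:
                    await self._close()  # force a reconnect on the next attempt
                if attempt == 2:
                    raise
                await asyncio.sleep(0.5 * (2 ** attempt))  # 0.5 s, then 1 s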

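This pattern keeps reconnection logic out of the SK plugin class: the plugin calls ResilientMCPSession.call_tool instead of holding a ClientSession directly. Server restarts, dropped connections, and idle-connection timeouts turn into a reconnect plus up to two retries with exponential backoff (0.5 s, then 1 s), rather than re-initialization code scattered through every kernel_function. Retry only tools that are safe to repeat: a tool with side effects (sending a message, writing a record) may already have run once before the connection dropped.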

Monitoring MCP servers in Semantic Kernel pipelines

Enterprise SK deployments often integrate with Azure Monitor, Application Insights, or other observability stacks. MCP server health is a separate concern from SK pipeline health: the SK telemetry traces show that a KernelFunction failed, but not why the MCP server returned an error or stopped responding. Monitoring the MCP endpoint directly with AliveMCP gives you infrastructure-level visibility that complements SK-level tracing:

import logging
import time

from semantic_kernel.filters import FunctionInvocationContext

MCP_PLUGINS = {"search"}  # names of plugins backed by MCP servers


# SK function invocation filter: logs latency and failures of MCP-backed functions
async def mcp_latency_filter(context: FunctionInvocationContext, next):
    if context.function.plugin_name not in MCP_PLUGINS:
        await next(context)  # not an MCP call; pass through without logging
        return
    start = time.perf_counter()
    try:
        await next(context)
    except Exception as e:
        logging.error(
            "MCP function failed",
            extra={
                "plugin": context.function.plugin_name,
                "function": context.function.name,
                "error": str(e),
                "latency_ms": (time.perf_counter() - start) * 1000,
            },
        )
        raise
    else:
        logging.info(
            "MCP function succeeded",
            extra={
                "plugin": context.function.plugin_name,
                "function": context.function.name,
                "latency_ms": (time.perf_counter() - start) * 1000,
            },
        )

kernel.add_filter("function_invocation", mcp_latency_filter)

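SK filters fire on every function invocation, so they are the natural place to track per-tool latency and error rates. One caveat: the plugin methods in the Python example catch exceptions and return error strings, so an MCP failure reaches the filter as a successful invocation and is logged as a success. To count those failures, either re-raise in the wrapper or inspect context.result after next(context) returns. Pair this application-level telemetry with AliveMCP uptime monitoring on your MCP endpoint so you can correlate application errors ("search_web returned connection error at 14:23") with infrastructure events ("search MCP server went down at 14:22").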
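If you want per-tool numbers before wiring up a full metrics backend, a small in-process aggregator is enough to start with. This sketch assumes the filter calls record() after next(context) returns. ToolMetrics, looks_like_mcp_error, and ERROR_PREFIXES are hypothetical (the prefixes match the strings the example plugin returns), and the 95th percentile uses the nearest-rank method.

```python
import math
from collections import defaultdict
from dataclasses import dataclass, field

# Prefixes the example plugin methods use when they return an error as text.
ERROR_PREFIXES = ("Search error:", "Fetch error:", "MCP search unavailable:", "MCP fetch unavailable:")


def looks_like_mcp_error(value) -> bool:
    return isinstance(value, str) and value.startswith(ERROR_PREFIXES)


@dataclass
class ToolStats:
    latencies_ms: list = field(default_factory=list)
    errors: int = 0

    @property
    def calls(self) -> int:
        return len(self.latencies_ms)

    @property
    def error_rate(self) -> float:
        return self.errors / self.calls if self.calls else 0.0

    def p95_ms(self) -> float:
        """Nearest-rank 95th percentile latency."""
        if not self.latencies_ms:
            return 0.0
        ordered = sorted(self.latencies_ms)
        return ordered[math.ceil(0.95 * len(ordered)) - 1]


class ToolMetrics:
    def __init__(self) -> None:
        self._stats = defaultdict(ToolStats)  # keyed by "plugin-function"

    def record(self, plugin: str, function: str, latency_ms: float, ok: bool) -> None:
        stats = self._stats[f"{plugin}-{function}"]
        stats.latencies_ms.append(latency_ms)
        if not ok:
            stats.errors += 1

    def snapshot(self) -> dict:
        return {
            name: {"calls": s.calls, "error_rate": s.error_rate, "p95_ms": s.p95_ms()}
            for name, s in self._stats.items()
        }


if __name__ == "__main__":
    metrics = ToolMetrics()
    # In the filter, `value` would come from context.result (assumed attribute path).
    for i, value in enumerate(["result A", "Search error: rate limited", "result B"]):
        metrics.record("search", "search_web", 100.0 + i, ok=not looks_like_mcp_error(value))
    print(metrics.snapshot())
```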

Frequently asked questions

Does Semantic Kernel have built-in MCP support, or do I need to write wrappers?

It depends on the SDK version. Recent SK Python releases ship an MCP connector (the semantic_kernel.connectors.mcp module, with plugin classes such as MCPStdioPlugin and MCPSsePlugin) that builds a plugin from a server's tools/list, and in .NET the MCP client's tools can be converted to kernel functions with AsKernelFunction(). Older 1.x releases have no built-in MCP support, so there you write wrapper classes with decorated methods as shown in this guide: the MCP SDK provides the session and protocol layer, and you provide the glue. Hand-written wrappers remain useful even where a connector exists, because they give you full control over error handling, logging, and the descriptions the LLM sees. Check the SK release notes for the version you use.
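Auto-discovery, whether done by a connector package or a helper you write, boils down to mapping each entry returned by tools/list onto a function definition the LLM can see. A minimal sketch of that mapping, assuming the field names of the MCP Python SDK's Tool type (name, description, inputSchema) and an OpenAI-style function-calling schema. tool_to_function_spec is hypothetical and is not the code of any SK connector.

```python
from types import SimpleNamespace


def tool_to_function_spec(tool, plugin_name: str) -> dict:
    """Map one MCP tools/list entry to an OpenAI-style function definition."""
    # MCP already describes arguments with JSON Schema, so it passes through unchanged.
    if tool.inputSchema:
        schema = dict(tool.inputSchema)
    else:
        schema = {"type": "object", "properties": {}}
    return {
        "type": "function",
        "function": {
            "name": f"{plugin_name}-{tool.name}",
            # A missing description leaves the LLM guessing when to call the tool.
            "description": tool.description or "",
            "parameters": schema,
        },
    }


if __name__ == "__main__":
    listed = [  # stand-ins for entries of (await session.list_tools()).tools
        SimpleNamespace(
            name="search_web",
            description="Search the web for recent information.",
            inputSchema={
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        ),
    ]
    for spec in (tool_to_function_spec(t, "search") for t in listed):
        print(spec["function"]["name"], spec["function"]["parameters"]["required"])
```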

Can I use Semantic Kernel with local stdio MCP servers?

Yes — use the MCP Python SDK's stdio_client or the .NET SDK's StdioClientTransport to connect to a local process. The plugin wrapper code is identical; only the transport changes. Stdio is convenient during development when the MCP server runs as a subprocess. For production, switch to HTTP/SSE transport so the MCP server can be deployed independently of the SK application.

How do I expose MCP resources (not just tools) to Semantic Kernel?

SK's memory and document store abstractions (IVectorStore, ITextSearch) are the natural home for MCP resources. Write a custom ITextSearch implementation backed by MCP's resources/list and resources/read calls. Alternatively, ingest MCP resources once at startup into a local SK memory store (using TextMemoryPlugin or an IVectorStore) and treat the memory store as the retrieval layer. This avoids per-query MCP resource reads for data that changes infrequently.

Does SK's auto function calling loop infinitely if an MCP tool keeps failing?

SK's chat completion loop continues until the LLM returns a response without requesting another function call, or until the auto-invoke limit is reached. If an MCP tool always returns an error string, the LLM may keep retrying (from its perspective, the tool is available but returning bad results). In SK Python the limit is the maximum_auto_invoke_attempts field on the FunctionChoiceBehavior object rather than a top-level execution setting: create behavior = FunctionChoiceBehavior.Auto(), set behavior.maximum_auto_invoke_attempts, and pass behavior as function_choice_behavior. A value of 5–10 is typical for production; set it lower (2–3) during testing so retry loops surface quickly.
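Capping auto-invoke attempts bounds a single turn, but it does not stop the next turn from calling the same dead server again. A complementary technique, not part of SK, is a circuit breaker in the plugin wrapper. After a few consecutive failures it returns a fixed "unavailable" message for a cooldown period without calling MCP at all. Here is a sketch with a hypothetical ToolCircuitBreaker class and an injectable clock so it can be tested:

```python
import time
from typing import Callable, Optional

UNAVAILABLE_MESSAGE = (
    "search_web is temporarily unavailable. Do not call it again in this turn; "
    "answer from what you already have and say that live search was unavailable."
)


class ToolCircuitBreaker:
    """Skip calls to a failing MCP tool for a cooldown period (hypothetical helper)."""

    def __init__(self, failure_threshold: int = 3, cooldown_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        """True if the wrapper should call MCP; False means return UNAVAILABLE_MESSAGE."""
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self.cooldown_s:
            # Half-open: allow calls again; a single further failure re-opens the circuit.
            self._opened_at = None
            self._failures = self.failure_threshold - 1
            return True
        return False

    def record(self, ok: bool) -> None:
        if ok:
            self._failures = 0
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()


# Inside a kernel_function wrapper (sketch):
#     if not breaker.allow():
#         return UNAVAILABLE_MESSAGE
#     result = await session.call_tool("search_web", arguments=...)
#     breaker.record(ok=not result.isError)
```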

Is Semantic Kernel appropriate for high-throughput MCP tool calling?

SK adds minimal overhead over direct MCP calls; the kernel function dispatch is a thin Python or .NET method call. The bottleneck is LLM inference time (often several seconds per turn) and MCP tool execution time, not SK itself. For very high throughput (hundreds of concurrent agent conversations), focus on MCP connection pooling (one persistent connection per server per process, not per conversation), rate limiting on the MCP server side, and horizontal scaling of both the SK application and the MCP servers.
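One persistent session per server can serve many conversations at once. MCP is JSON-RPC based, so each response carries the id of its request, which lets a single session carry concurrent calls. What you usually still want is a cap on in-flight calls so a burst of conversations cannot overload the server. Below is a sketch using asyncio.Semaphore; BoundedToolCaller is a hypothetical name, and call_tool would be the bound session.call_tool method of a long-lived ClientSession.

```python
import asyncio


class BoundedToolCaller:
    """Share one MCP session across conversations with a cap on in-flight calls."""

    def __init__(self, call_tool, max_in_flight: int = 16) -> None:
        self._call_tool = call_tool  # e.g. session.call_tool of one long-lived ClientSession
        self._semaphore = asyncio.Semaphore(max_in_flight)
        self.in_flight = 0
        self.peak_in_flight = 0

    async def __call__(self, name: str, arguments: dict):
        async with self._semaphore:  # extra callers wait here instead of hitting the server
            self.in_flight += 1
            self.peak_in_flight = max(self.peak_in_flight, self.in_flight)
            try:
                return await self._call_tool(name, arguments=arguments)
            finally:
                self.in_flight -= 1


async def _fake_call_tool(name: str, arguments: dict) -> str:
    """Hypothetical stand-in for ClientSession.call_tool with 10 ms latency."""
    await asyncio.sleep(0.01)
    return f"{name}:{arguments['query']}"


async def simulate(conversations: int, limit: int):
    # Created inside the running event loop, which older Python versions require
    # for asyncio primitives such as Semaphore.
    caller = BoundedToolCaller(_fake_call_tool, max_in_flight=limit)
    results = await asyncio.gather(
        *(caller("search_web", {"query": str(i)}) for i in range(conversations))
    )
    return results, caller.peak_in_flight


if __name__ == "__main__":
    results, peak = asyncio.run(simulate(conversations=50, limit=8))
    print(len(results), peak)
```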