MCP server sampling
MCP sampling inverts the normal tool call flow: instead of the LLM calling your server, your server asks the LLM a question — routed through the client, with user approval. This is the mechanism that enables agentic loops, self-verification, and recursive reasoning inside MCP tools. sampling/createMessage sends a message array to the client, which presents it to the LLM (and optionally to the user for approval), and returns the model's response back to your tool handler.
TL;DR
Access sampling via the server.server.createMessage() method (on the low-level server that McpServer wraps) or through the context object passed to tool handlers. A request needs a messages array and maxTokens; model preferences and a system prompt are optional. The client may show the user the request before sending it to the LLM, and this human-in-the-loop approval model is a core property of sampling. Always handle the case where sampling is denied or unsupported (check getClientCapabilities()?.sampling). Sampling is powerful but adds latency and depends on client support, so design tools to degrade gracefully without it.
Why sampling exists
Normal MCP tool execution: LLM → tool call → your server → result → LLM. Sampling adds a new direction: your server → LLM (via client) → result → your server. This enables patterns impossible with tools alone:
- Self-verification — generate an output, then ask the LLM to review it for errors before returning it to the user
- Multi-step reasoning — break a complex tool into sub-steps where each step's result informs the next
- Autonomous sub-tasks — delegate subtasks to the model without requiring the user to prompt for each
- Structured extraction — use the LLM to parse unstructured data into structured output inside a tool handler
Sampling is not a backdoor for the server to run arbitrary model calls invisibly — the client controls whether to approve each request. This is the human-in-the-loop guarantee.
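To make the reversed direction concrete, here is a rough sketch of the two JSON-RPC messages involved. The field names follow the sampling/createMessage shape used throughout this guide. The helper name buildSamplingRequest, the id value, the model placeholder, and the local types are illustrative assumptions, not SDK exports.

```typescript
// Local stand-in types for illustration; the real SDK ships its own schema types.
type TextContent = { type: 'text'; text: string };

interface SamplingMessage {
  role: 'user' | 'assistant';
  content: TextContent;
}

interface SamplingRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'sampling/createMessage';
  params: { messages: SamplingMessage[]; maxTokens: number; systemPrompt?: string };
}

// Hypothetical helper: wraps a single user prompt in a server-to-client request.
function buildSamplingRequest(id: number, prompt: string, maxTokens: number): SamplingRequest {
  return {
    jsonrpc: '2.0',
    id,
    method: 'sampling/createMessage',
    params: {
      messages: [{ role: 'user', content: { type: 'text', text: prompt } }],
      maxTokens,
    },
  };
}

// What the server sends to the client...
const request = buildSamplingRequest(7, 'Is this summary accurate?', 256);

// ...and the general shape of the reply once the client has run the model.
const exampleResult = {
  jsonrpc: '2.0',
  id: 7, // matches the request id
  result: {
    role: 'assistant',
    content: { type: 'text', text: 'Yes, the summary is accurate.' },
    model: 'example-model', // placeholder: the client reports whichever model it used
    stopReason: 'endTurn',
  },
};
```

In practice the SDK builds and correlates these messages for you; the sketch only shows what travels in each direction.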
Checking sampling capability
Not all clients support sampling. Check the capability before trying to use it — fall back gracefully if unsupported.
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { z } from 'zod';

const server = new McpServer({ name: 'my-server', version: '1.0.0' });

// analyzeWithoutLLM and performAnalysis are application-specific helpers defined elsewhere.
server.tool(
  'analyze-and-verify',
  'Analyze data and verify the result using the model',
  { data: z.string() },
  async ({ data }) => {
    // Check if the connected client declared the sampling capability.
    // Capabilities live on the low-level Server that McpServer wraps.
    const samplingSupported =
      server.server.getClientCapabilities()?.sampling !== undefined;
    if (!samplingSupported) {
      // Degrade to non-sampling path
      return {
        content: [{ type: 'text', text: analyzeWithoutLLM(data) }],
      };
    }
    // Sampling path
    const analysis = await performAnalysis(data);
    const verified = await verifySamplingResult(server, analysis);
    return {
      content: [{ type: 'text', text: verified }],
    };
  }
);
Calling sampling/createMessage
Access the sampling API through the createMessage method on the underlying low-level server (server.server):
async function verifySamplingResult(mcp: McpServer, analysis: string): Promise<string> {
  // mcp.server is the low-level Server, which exposes createMessage
  const response = await mcp.server.createMessage({
    messages: [
      {
        role: 'user',
        content: {
          type: 'text',
          text: [
            'Review the following analysis for accuracy and completeness.',
            'If it is correct, respond with "VERIFIED: " followed by the analysis.',
            'If it has errors, respond with "CORRECTION: " followed by the corrected version.',
            '',
            analysis,
          ].join('\n'),
        },
      },
    ],
    maxTokens: 1024,
    systemPrompt: 'You are a careful reviewer. Your job is to catch errors, not to add new information.',
  });
  if (response.stopReason === 'endTurn' && response.content.type === 'text') {
    return response.content.text;
  }
  // Sampling was cut off or produced non-text content: fall back to the original
  return analysis;
}
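The helper above returns the model's reply with its VERIFIED: or CORRECTION: prefix still attached, so a caller will usually want to separate the verdict from the text. One possible way to do that is sketched below. It assumes the model follows the prompt's format, which it may not, and parseReviewReply and ReviewOutcome are hypothetical names introduced for illustration.

```typescript
type ReviewOutcome =
  | { verdict: 'verified'; text: string }
  | { verdict: 'corrected'; text: string }
  | { verdict: 'unparsed'; text: string };

// Hypothetical helper: splits the reviewer's reply into a verdict and the remaining text.
function parseReviewReply(reply: string): ReviewOutcome {
  const trimmed = reply.trim();
  if (trimmed.startsWith('VERIFIED:')) {
    return { verdict: 'verified', text: trimmed.slice('VERIFIED:'.length).trim() };
  }
  if (trimmed.startsWith('CORRECTION:')) {
    return { verdict: 'corrected', text: trimmed.slice('CORRECTION:'.length).trim() };
  }
  // The model ignored the requested format; keep the raw reply so the caller can decide.
  return { verdict: 'unparsed', text: trimmed };
}
```

Treating an unparsed reply as its own case, rather than assuming it is a correction, keeps a misbehaving model from silently overwriting the original analysis.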
Sampling request parameters
The full parameter set for createMessage:
| Parameter | Type | Description |
|---|---|---|
| messages | SamplingMessage[] | Required. Array of user/assistant messages forming the prompt |
| maxTokens | number | Required. Maximum tokens in the model's response |
| systemPrompt | string | Optional. System prompt for the request (client may modify or omit it) |
| modelPreferences | ModelPreferences | Optional. Hints for model selection (see below) |
| stopSequences | string[] | Optional. Stop sequences for early termination |
| temperature | number | Optional. Sampling temperature (client may ignore) |
| includeContext | 'none' \| 'thisServer' \| 'allServers' | Optional. Which MCP server context to attach to the prompt (client may ignore) |
Model preferences
Sampling model preferences express hints, not requirements. The client chooses the actual model — your server cannot force a specific model. Preferences guide the client toward models suited to your task.
const response = await server.server.createMessage({
  messages: [...],
  maxTokens: 512,
  modelPreferences: {
    // Hints: model-name substrings, tried in order
    hints: [
      { name: 'claude-opus' }, // prefer Opus-class
      { name: 'claude-sonnet' }, // fall back to Sonnet-class
    ],
    // Priority weights, each from 0 to 1 and optional (0 = not important, 1 = most important)
    costPriority: 0.2, // cost matters little for this task
    speedPriority: 0.5, // moderate speed preference
    intelligencePriority: 0.8, // prefer a more capable model for this task
  },
});
The hints array lists model name substrings in preference order. The client matches them against the names of the models it has available and, if none match, falls back to a default. The three priority weights are independent values between 0 and 1 and do not need to sum to 1; each says how much that factor should count, and the client decides how to balance them when picking a model.
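How a client turns these preferences into a model choice is up to the client. Purely as an illustration, a client might try the hints in order as substring matches and otherwise score its models by the priority weights. Everything in this sketch (the ModelInfo shape, the 0-to-1 ratings, the scoring formula, and the model ids) is a made-up assumption and is not taken from any particular client.

```typescript
interface ModelPreferences {
  hints?: { name?: string }[];
  costPriority?: number;
  speedPriority?: number;
  intelligencePriority?: number;
}

// Hypothetical per-model ratings a client might keep, each in the range 0..1.
interface ModelInfo {
  id: string;
  cheapness: number;
  speed: number;
  intelligence: number;
}

function selectModel(models: ModelInfo[], prefs: ModelPreferences): ModelInfo {
  if (models.length === 0) throw new Error('no models available');
  // 1. Hints are tried in order; the first substring match wins.
  for (const hint of prefs.hints ?? []) {
    const wanted = hint.name;
    if (!wanted) continue;
    const match = models.find((m) => m.id.includes(wanted));
    if (match) return match;
  }
  // 2. No hint matched: pick the model with the highest weighted score.
  const score = (m: ModelInfo): number =>
    (prefs.costPriority ?? 0) * m.cheapness +
    (prefs.speedPriority ?? 0) * m.speed +
    (prefs.intelligencePriority ?? 0) * m.intelligence;
  return models.reduce((best, m) => (score(m) > score(best) ? m : best));
}

// Example catalogue with invented ids and ratings.
const available: ModelInfo[] = [
  { id: 'small-fast-v1', cheapness: 0.9, speed: 0.9, intelligence: 0.4 },
  { id: 'large-smart-v1', cheapness: 0.2, speed: 0.4, intelligence: 0.95 },
];
```

The point for server authors is that a hint naming a model the client does not have is not an error; selection simply moves on to the weights.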
The human-in-the-loop model
Sampling goes through the client's approval flow. Depending on the client implementation:
- Full approval — the user sees the sampling request and must explicitly approve before it goes to the model
- Implicit approval — the client shows a notification and proceeds automatically
- Auto-approve — the client sends the request without user interaction (less common)
Claude Desktop, for example, shows a confirmation dialog for sampling requests that include a system prompt or that request models other than the active one. Design sampling requests to be transparent — if a user would be surprised by what you're asking the model, they should see it.
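Seen from the client's side, the three modes listed above amount to a policy decision made before the request reaches the model. The sketch below shows one way a client could structure that decision. ApprovalMode, handleSampling, and the injected askUser, notifyUser, and callModel callbacks are hypothetical names, and real clients will differ.

```typescript
type ApprovalMode = 'full' | 'implicit' | 'auto';

interface SamplingHooks {
  askUser: (prompt: string) => Promise<boolean>; // blocking confirmation dialog
  notifyUser: (prompt: string) => void; // non-blocking notice
  callModel: (prompt: string) => Promise<string>; // forwards the prompt to the LLM
}

async function handleSampling(
  mode: ApprovalMode,
  prompt: string,
  hooks: SamplingHooks,
): Promise<string> {
  if (mode === 'full') {
    // Full approval: nothing reaches the model unless the user says yes.
    const approved = await hooks.askUser(prompt);
    if (!approved) throw new Error('User rejected sampling request');
  } else if (mode === 'implicit') {
    // Implicit approval: tell the user, then proceed.
    hooks.notifyUser(prompt);
  }
  // 'auto' falls through with no user interaction.
  return hooks.callModel(prompt);
}
```

A rejection surfaces on the server as a failed createMessage call, which is why tool handlers need a fallback path.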
Agentic loop inside a tool
A common pattern: use sampling to drive a multi-step resolution loop inside a single tool call.
// db is a placeholder for your application's data layer.
server.tool(
  'resolve-issue',
  'Investigate and resolve a reported issue through iterative analysis',
  { issueId: z.string() },
  async ({ issueId }) => {
    const issue = await db.issues.findById(issueId);
    let currentContext = `Issue: ${issue.title}\nDescription: ${issue.description}`;
    const steps: string[] = [];
    for (let i = 0; i < 3; i++) {
      const response = await server.server.createMessage({
        messages: [
          {
            role: 'user',
            content: {
              type: 'text',
              text: `${currentContext}\n\nWhat is the next diagnostic step? Reply with STEP: <action> or RESOLVED: <solution>.`,
            },
          },
        ],
        maxTokens: 256,
      });
      const text = response.content.type === 'text' ? response.content.text.trim() : '';
      steps.push(text);
      if (text.startsWith('RESOLVED:')) {
        break;
      }
      // Feed the suggested step back into the prompt for the next iteration.
      // A real tool would carry out the step here and append its result as well.
      currentContext += `\nStep ${i + 1}: ${text}`;
    }
    return {
      content: [{ type: 'text', text: steps.join('\n\n') }],
    };
  }
);
Cap the loop — the example above runs at most three iterations. Unbounded loops will make your tool time out or exhaust the user's model quota.
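The capping logic can also be factored out of the handler so it can be tested without a connected client. The following is a sketch under that assumption. runBoundedLoop, Sampler, and LoopResult are hypothetical names, and the sample argument stands in for a call such as createMessage.

```typescript
// Stand-in for a sampling call: takes the prompt text, resolves to the model's reply.
type Sampler = (prompt: string) => Promise<string>;

interface LoopResult {
  steps: string[];
  resolved: boolean;
}

async function runBoundedLoop(
  initial: string,
  sample: Sampler,
  maxIterations = 3,
): Promise<LoopResult> {
  const steps: string[] = [];
  let transcript = initial;
  for (let i = 0; i < maxIterations; i++) {
    const reply = (await sample(transcript)).trim();
    steps.push(reply);
    if (reply.startsWith('RESOLVED:')) {
      return { steps, resolved: true };
    }
    // Carry the suggestion forward so the next prompt builds on it.
    transcript += `\nStep ${i + 1}: ${reply}`;
  }
  // Hit the cap without a resolution; report that instead of looping further.
  return { steps, resolved: false };
}
```

Returning a resolved flag lets the tool tell the user that the iteration budget ran out, rather than presenting an unfinished investigation as a result.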
Error handling for sampling
Sampling can fail for several reasons: the client denies the request, the model returns a stop reason other than endTurn, or the client doesn't support sampling at all. Always handle these cases:
// Inside a tool handler
try {
  const response = await server.server.createMessage({
    messages: [...],
    maxTokens: 512,
  });
  if (response.stopReason !== 'endTurn') {
    // Truncated (maxTokens hit) or other stop condition
    return { content: [{ type: 'text', text: 'Analysis incomplete: model response truncated.' }] };
  }
  if (response.content.type !== 'text') {
    // The model returned non-text content, so there is no text to read
    return { content: [{ type: 'text', text: 'Analysis incomplete: model returned non-text content.' }] };
  }
  return { content: [{ type: 'text', text: response.content.text }] };
} catch (err) {
  // Client rejected the sampling request or sampling unsupported
  return {
    content: [{ type: 'text', text: 'Could not perform model-assisted analysis.' }],
    isError: true,
  };
}
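Since every sampling call needs the same checks (the call failed, the response stopped early, the content was not text), one option is to fold them into a single helper that returns a tagged result. This is only a sketch. trySample, SampleOutcome, and the simplified SamplingResponse type are hypothetical, and the call argument stands in for a bound createMessage invocation.

```typescript
// Simplified stand-in for the sampling result; the real SDK type has more fields.
interface SamplingResponse {
  stopReason?: string;
  content: { type: 'text'; text: string } | { type: 'image'; data: string; mimeType: string };
}

type SampleOutcome =
  | { ok: true; text: string }
  | { ok: false; reason: 'failed' | 'incomplete' | 'non-text'; detail?: string };

async function trySample(call: () => Promise<SamplingResponse>): Promise<SampleOutcome> {
  let response: SamplingResponse;
  try {
    response = await call();
  } catch (err) {
    // Covers a denied request as well as a client without sampling support.
    return { ok: false, reason: 'failed', detail: err instanceof Error ? err.message : String(err) };
  }
  if (response.stopReason !== 'endTurn') {
    return { ok: false, reason: 'incomplete', detail: response.stopReason };
  }
  if (response.content.type !== 'text') {
    return { ok: false, reason: 'non-text' };
  }
  return { ok: true, text: response.content.text };
}
```

A handler can then branch on ok once and choose between the model-assisted result and its non-sampling fallback.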