
MCP server sampling

MCP sampling inverts the normal tool call flow: instead of the LLM calling your server, your server asks the LLM a question, routed through the client and subject to user approval. This is the mechanism that enables agentic loops, self-verification, and recursive reasoning inside MCP tools. A sampling/createMessage request sends a message array to the client, which presents it to the LLM (and optionally to the user for approval) and returns the model's response to your tool handler.

TL;DR

Call createMessage() on the low-level Server that McpServer wraps (server.server.createMessage()). Recent SDK versions also let a tool handler send a sampling/createMessage request through the sendRequest function on its second (extra) argument. Pass a messages array and the required maxTokens, plus optional model preferences and a system prompt. The client may show the user the request before sending it to the LLM; this human-in-the-loop approval model is a core property of sampling. Always handle the case where sampling is denied or unsupported (check getClientCapabilities()?.sampling and catch errors from the call). Sampling is powerful, but it adds latency and depends on client support, so design tools to degrade gracefully without it.

Why sampling exists

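Normal MCP tool execution: LLM → tool call → your server → result → LLM. Sampling adds a new direction: your server → LLM (via client) → result → your server. This enables patterns that tools alone cannot express, such as agentic loops, self-verification of a tool's own output, and recursive reasoning inside a single tool call.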

Sampling is not a backdoor for the server to run arbitrary model calls invisibly — the client controls whether to approve each request. This is the human-in-the-loop guarantee.

Checking sampling capability

Not all clients support sampling. Check the capability before trying to use it — fall back gracefully if unsupported.

import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { z } from 'zod';

const server = new McpServer({ name: 'my-server', version: '1.0.0' });

server.tool(
  'analyze-and-verify',
  'Analyze data and verify the result using the model',
  { data: z.string() },
  async ({ data }) => {
    // The handler's second argument does not expose the server, so read the
    // client's declared capabilities from the low-level Server (server.server)
    const samplingSupported =
      server.server.getClientCapabilities()?.sampling !== undefined;

    if (!samplingSupported) {
      // Degrade to the non-sampling path (analyzeWithoutLLM is app-specific, not shown)
      return {
        content: [{ type: 'text', text: analyzeWithoutLLM(data) }],
      };
    }

    // Sampling path (performAnalysis is app-specific, not shown)
    const analysis = await performAnalysis(data);
    const verified = await verifySamplingResult(server, analysis);
    return {
      content: [{ type: 'text', text: verified }],
    };
  }
);
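For context, sampling support is something the client declares during the initialize handshake: a client that supports it includes a `sampling` key (an empty object is enough) in its `capabilities`. The sketch below inspects that shape directly rather than going through the SDK. `InitializeParamsLike`, `ClientCapabilitiesLike`, and `supportsSampling` are hypothetical, trimmed-down stand-ins that model only the fields shown here.

```typescript
// Hypothetical, trimmed-down shape of the params a client sends in its
// `initialize` request; only the fields this sketch reads are modelled.
interface ClientCapabilitiesLike {
  sampling?: Record<string, unknown>;
  roots?: { listChanged?: boolean };
}

interface InitializeParamsLike {
  protocolVersion: string;
  capabilities: ClientCapabilitiesLike;
  clientInfo: { name: string; version: string };
}

// Sampling counts as supported when the client declared the `sampling` key
// at all, even as an empty object.
function supportsSampling(params: InitializeParamsLike | undefined): boolean {
  return params?.capabilities.sampling !== undefined;
}

// A client that supports sampling declares it like this...
const withSampling: InitializeParamsLike = {
  protocolVersion: '2025-06-18',
  capabilities: { sampling: {} },
  clientInfo: { name: 'example-client', version: '1.0.0' },
};

// ...and a client that does not simply omits the key.
const withoutSampling: InitializeParamsLike = {
  protocolVersion: '2025-06-18',
  capabilities: { roots: { listChanged: true } },
  clientInfo: { name: 'example-client', version: '1.0.0' },
};

console.log(supportsSampling(withSampling));    // true
console.log(supportsSampling(withoutSampling)); // false
```

Because the check is on presence rather than content, a client sending `sampling: {}` is treated as fully capable; there are no sub-flags to inspect in this simplified model.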

Calling sampling/createMessage

McpServer does not expose createMessage itself; call it on the low-level Server it wraps, available as server.server:

async function verifySamplingResult(server: McpServer, analysis: string): Promise<string> {
  const response = await server.server.createMessage({
    messages: [
      {
        role: 'user',
        content: {
          type: 'text',
          text: [
            'Review the following analysis for accuracy and completeness.',
            'If it is correct, respond with "VERIFIED: " followed by the analysis.',
            'If it has errors, respond with "CORRECTION: " followed by the corrected version.',
            '',
            analysis,
          ].join('\n'),
        },
      },
    ],
    maxTokens: 1024,
    systemPrompt: 'You are a careful reviewer. Your job is to catch errors, not to add new information.',
  });

  if (response.stopReason === 'endTurn' && response.content.type === 'text') {
    return response.content.text;
  }
  // Sampling was cut off or produced non-text — fall back to original
  return analysis;
}
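The reviewer prompt asks the model to answer with a `VERIFIED: ` or `CORRECTION: ` prefix, but `verifySamplingResult` returns the raw text, so a caller usually still has to branch on that prefix. A minimal sketch of such a parser follows; `ReviewOutcome` and `parseReview` are hypothetical names, and the sketch assumes the model may add leading whitespace but otherwise follows the requested format.

```typescript
// Hypothetical result type for the reviewer's reply.
type ReviewOutcome =
  | { kind: 'verified'; text: string }
  | { kind: 'corrected'; text: string }
  | { kind: 'unparsed'; text: string }; // the model ignored the requested format

function parseReview(reply: string): ReviewOutcome {
  const trimmed = reply.trimStart();
  if (trimmed.startsWith('VERIFIED:')) {
    return { kind: 'verified', text: trimmed.slice('VERIFIED:'.length).trim() };
  }
  if (trimmed.startsWith('CORRECTION:')) {
    return { kind: 'corrected', text: trimmed.slice('CORRECTION:'.length).trim() };
  }
  // Keep the raw reply so the caller can decide what to do with it.
  return { kind: 'unparsed', text: reply };
}

console.log(parseReview('VERIFIED: totals match').kind);  // verified
console.log(parseReview('  CORRECTION: Q3 is 42').text);  // Q3 is 42
```

Treating `unparsed` as "not verified" is the conservative choice, since `verifySamplingResult` also returns the unreviewed analysis when sampling is cut off.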

Sampling request parameters

The main parameters for createMessage:

| Parameter | Type | Description |
|---|---|---|
| messages | SamplingMessage[] | Required. Array of user/assistant messages forming the prompt |
| maxTokens | number | Required. Maximum number of tokens in the model's response |
| systemPrompt | string | Optional. System prompt for the request; the client may modify or omit it |
| modelPreferences | ModelPreferences | Optional. Hints for model selection (see below) |
| stopSequences | string[] | Optional. Sequences that end generation early |
| temperature | number | Optional. Sampling temperature (the client may ignore it) |
| includeContext | 'none', 'thisServer', or 'allServers' | Optional. Asks the client to attach context from this server or from all connected MCP servers; the client may ignore it |
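On the wire, createMessage becomes a JSON-RPC request from server to client with the method `sampling/createMessage` and the table's parameters as `params`. The SDK builds and sends this for you; the sketch below builds it by hand only to show its shape. `CreateMessageParamsLike`, `JsonRpcRequest`, and `buildSamplingRequest` are hypothetical and cover a subset of the table, and the `maxTokens` check is this sketch's own sanity check, not a rule quoted from the spec.

```typescript
// Hypothetical subset of the sampling/createMessage params listed in the table.
interface TextContentLike {
  type: 'text';
  text: string;
}

interface SamplingMessageLike {
  role: 'user' | 'assistant';
  content: TextContentLike;
}

interface CreateMessageParamsLike {
  messages: SamplingMessageLike[];
  maxTokens: number;
  systemPrompt?: string;
  stopSequences?: string[];
  temperature?: number;
  includeContext?: 'none' | 'thisServer' | 'allServers';
}

// Generic JSON-RPC 2.0 request envelope.
interface JsonRpcRequest<P> {
  jsonrpc: '2.0';
  id: number | string;
  method: string;
  params: P;
}

function buildSamplingRequest(
  id: number,
  params: CreateMessageParamsLike,
): JsonRpcRequest<CreateMessageParamsLike> {
  // maxTokens is required; reject obviously invalid values before sending.
  if (!Number.isInteger(params.maxTokens) || params.maxTokens <= 0) {
    throw new Error('maxTokens must be a positive integer');
  }
  return { jsonrpc: '2.0', id, method: 'sampling/createMessage', params };
}

const request = buildSamplingRequest(1, {
  messages: [{ role: 'user', content: { type: 'text', text: 'Summarize this log.' } }],
  maxTokens: 256,
  systemPrompt: 'Be brief.',
  includeContext: 'none',
});

console.log(JSON.stringify(request, null, 2));
```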

Model preferences

Sampling model preferences express hints, not requirements. The client chooses the actual model — your server cannot force a specific model. Preferences guide the client toward models suited to your task.

const response = await server.server.createMessage({
  messages: [
    { role: 'user', content: { type: 'text', text: 'Review this function for off-by-one errors.' } },
  ],
  maxTokens: 512,
  modelPreferences: {
    // Hints are model-name substrings, in order of preference
    hints: [
      { name: 'claude-opus' },   // prefer an Opus-class model
      { name: 'claude-sonnet' }, // otherwise a Sonnet-class model
    ],
    // Priorities are optional, each from 0 to 1 (higher = matters more)
    costPriority: 0.2,          // cost is a minor concern for this task
    speedPriority: 0.5,         // moderate preference for speed
    intelligencePriority: 0.8,  // capability matters most
  },
});

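The hints array lists model-name substrings in order of preference. The client may match them against the models it has, may map them to an equivalent model from another provider, and otherwise makes its own choice. Each priority is an independent value between 0 and 1, where a higher value means that factor matters more; the three do not need to sum to anything in particular, and how the client weighs them against each other and against the hints is up to the client.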
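To make these semantics concrete, here is one way a client *could* resolve preferences. This is a hypothetical client-side policy, not behavior the spec mandates: it tries each hint in order as a substring of the available model names, and only if no hint matches does it score models on the three priorities. `ModelInfo` and its ratings are made-up, per-model values between 0 and 1 that a client would have to maintain itself; `selectModel` is likewise a hypothetical name.

```typescript
interface ModelPreferencesLike {
  hints?: { name?: string }[];
  costPriority?: number;         // 0..1, higher = cost matters more
  speedPriority?: number;        // 0..1, higher = latency matters more
  intelligencePriority?: number; // 0..1, higher = capability matters more
}

// Hypothetical per-model ratings the client keeps for itself, each 0..1.
// cost: higher = more expensive; speed and intelligence: higher = better.
interface ModelInfo {
  name: string;
  cost: number;
  speed: number;
  intelligence: number;
}

function selectModel(prefs: ModelPreferencesLike, available: ModelInfo[]): ModelInfo {
  if (available.length === 0) throw new Error('no models available');

  // 1. Try each hint in the server's order, as a substring of a model name.
  for (const hint of prefs.hints ?? []) {
    const name = hint.name;
    if (!name) continue;
    const match = available.find((m) => m.name.includes(name));
    if (match) return match;
  }

  // 2. No hint matched: weigh the three priorities against the ratings.
  const score = (m: ModelInfo): number =>
    (prefs.costPriority ?? 0) * (1 - m.cost) +
    (prefs.speedPriority ?? 0) * m.speed +
    (prefs.intelligencePriority ?? 0) * m.intelligence;

  return available.reduce((best, m) => (score(m) > score(best) ? m : best));
}

const catalog: ModelInfo[] = [
  { name: 'small-fast-model', cost: 0.1, speed: 0.9, intelligence: 0.4 },
  { name: 'large-smart-model', cost: 0.8, speed: 0.4, intelligence: 0.95 },
];

// The hints match nothing in this catalog, so the priorities decide.
const chosen = selectModel(
  {
    hints: [{ name: 'claude-opus' }, { name: 'claude-sonnet' }],
    costPriority: 0.2,
    speedPriority: 0.5,
    intelligencePriority: 0.8,
  },
  catalog,
);
console.log(chosen.name); // large-smart-model
```

A real client might instead map a hint such as `claude-opus` to a comparable model from another provider, which the spec explicitly allows; the server sees only which model was used, via the `model` field of the result.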

The human-in-the-loop model

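Sampling goes through the client's approval flow, and what that flow looks like depends on the client implementation. The MCP specification says there SHOULD always be a human in the loop with the ability to deny sampling requests, and recommends that clients let users review and edit the prompt before it is sent and review the generated response before it is returned to the server.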

Clients differ in when they actually ask for confirmation, so design sampling requests to be transparent: if a user would be surprised by what you're asking the model, they should see it.
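The sketch below models the client side of that flow as plain functions, to show where the human sits: the user can edit or deny the request before any model runs, and can deny the response before it goes back to the server. The request and response shapes are simplified, and `handleSamplingRequest`, `approveRequest`, `callModel`, and `approveResponse` are hypothetical stand-ins for a real client's UI and model provider.

```typescript
interface SamplingRequestLike {
  systemPrompt?: string;
  prompt: string;
  maxTokens: number;
}

interface SamplingResponseLike {
  model: string;
  text: string;
}

// A reviewer returns the (possibly edited) value to continue, or null to deny.
type Reviewer<T> = (value: T) => Promise<T | null>;

async function handleSamplingRequest(
  request: SamplingRequestLike,
  approveRequest: Reviewer<SamplingRequestLike>,
  callModel: (req: SamplingRequestLike) => Promise<SamplingResponseLike>,
  approveResponse: Reviewer<SamplingResponseLike>,
): Promise<SamplingResponseLike> {
  // 1. The user sees (and may edit) the prompt before any model call happens.
  const approved = await approveRequest(request);
  if (approved === null) throw new Error('User rejected sampling request');

  // 2. The client, not the server, decides which model actually runs.
  const response = await callModel(approved);

  // 3. The user reviews the output before the server ever sees it.
  const released = await approveResponse(response);
  if (released === null) throw new Error('User rejected sampling response');
  return released;
}
```

In a real client, a thrown rejection travels back to the server as a JSON-RPC error response, which is why server-side sampling calls need a try/catch.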

Agentic loop inside a tool

A common pattern: use sampling to drive a multi-step resolution loop inside a single tool call.

server.tool(
  'resolve-issue',
  'Investigate and resolve a reported issue through iterative analysis',
  { issueId: z.string() },
  async ({ issueId }) => {
    // db is your application's data layer (not shown)
    const issue = await db.issues.findById(issueId);
    if (!issue) {
      return { content: [{ type: 'text', text: `Issue ${issueId} not found.` }], isError: true };
    }
    let currentContext = `Issue: ${issue.title}\nDescription: ${issue.description}`;
    const steps: string[] = [];

    // Hard cap of three sampling rounds
    for (let i = 0; i < 3; i++) {
      const response = await server.server.createMessage({
        messages: [
          {
            role: 'user',
            content: {
              type: 'text',
              text: `${currentContext}\n\nWhat is the next diagnostic step? Reply with STEP: <action> or RESOLVED: <solution>.`,
            },
          },
        ],
        maxTokens: 256,
      });

      const text = response.content.type === 'text' ? response.content.text : '';
      steps.push(text);

      if (text.startsWith('RESOLVED:')) {
        break;
      }
      // Feed the suggestion back as context for the next round. A real tool
      // would also run the suggested step here and append its output.
      currentContext += `\nStep ${i + 1}: ${text}`;
    }

    return {
      content: [{ type: 'text', text: steps.join('\n\n') }],
    };
  }
);

Cap the loop — the example above runs at most three iterations. Unbounded loops will make your tool time out or exhaust the user's model quota.
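The loop logic itself does not depend on MCP, which makes it easy to test in isolation. Below is a minimal sketch that pulls it into a function taking the sampling call as a parameter; `Sampler`, `LoopResult`, and `runDiagnosticLoop` are hypothetical names. In a tool handler, the sampler would wrap the createMessage call and return the response text; in a test, a fake sampler lets you check the iteration cap and the `RESOLVED:` exit without a client.

```typescript
// Takes the prompt, returns the model's text reply.
type Sampler = (prompt: string) => Promise<string>;

interface LoopResult {
  steps: string[];
  resolved: boolean; // false means the loop stopped because it hit the cap
}

async function runDiagnosticLoop(
  initialContext: string,
  sample: Sampler,
  maxIterations = 3, // hard cap: each iteration is one model call
): Promise<LoopResult> {
  let context = initialContext;
  const steps: string[] = [];

  for (let i = 0; i < maxIterations; i++) {
    const text = await sample(
      `${context}\n\nWhat is the next diagnostic step? Reply with STEP: <action> or RESOLVED: <solution>.`,
    );
    steps.push(text);
    if (text.startsWith('RESOLVED:')) {
      return { steps, resolved: true };
    }
    // Feed the suggestion back as context for the next round.
    context += `\nStep ${i + 1}: ${text}`;
  }
  return { steps, resolved: false };
}
```

Returning `resolved` separately lets the tool tell the user whether the loop actually finished or just ran out of iterations.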

Error handling for sampling

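Sampling can fail in several ways: the client doesn't support sampling at all, the user or client denies the request (in both cases the call rejects with an error), or the model stops for a reason other than endTurn, typically maxTokens when the response was truncated. Always handle these cases: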

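import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import type { CallToolResult } from '@modelcontextprotocol/sdk/types.js';

async function sampleOrFallback(server: McpServer, prompt: string): Promise<CallToolResult> {
  try {
    const response = await server.server.createMessage({
      messages: [{ role: 'user', content: { type: 'text', text: prompt } }],
      maxTokens: 512,
    });

    if (response.stopReason !== 'endTurn' || response.content.type !== 'text') {
      // Truncated (maxTokens hit), another stop condition, or non-text content
      return { content: [{ type: 'text', text: 'Analysis incomplete: model response truncated or not text.' }] };
    }

    return { content: [{ type: 'text', text: response.content.text }] };
  } catch (err) {
    // Client rejected the sampling request, or sampling is unsupported
    const reason = err instanceof Error ? err.message : String(err);
    return {
      content: [{ type: 'text', text: `Could not perform model-assisted analysis: ${reason}` }],
      isError: true,
    };
  }
}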
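If several tools sample, it can help to centralize these checks. A minimal sketch, assuming a simplified result shape that mirrors the fields used above (`stopReason` and a single `content` item): `sampleText` runs any sampling call and turns the failure modes into a tagged result instead of a mix of exceptions and special cases. `SamplingResultLike`, `SampleOutcome`, and `sampleText` are hypothetical names.

```typescript
// Simplified stand-in for the result of createMessage.
interface SamplingResultLike {
  model: string;
  stopReason?: string; // e.g. 'endTurn', 'stopSequence', 'maxTokens'
  content:
    | { type: 'text'; text: string }
    | { type: 'image' | 'audio'; data: string; mimeType: string };
}

type SampleOutcome =
  | { ok: true; text: string }
  | { ok: false; reason: 'rejected'; error: string }
  | { ok: false; reason: 'truncated'; partial: string }
  | { ok: false; reason: 'non-text' };

async function sampleText(call: () => Promise<SamplingResultLike>): Promise<SampleOutcome> {
  let result: SamplingResultLike;
  try {
    result = await call();
  } catch (err) {
    // Denied by the user or client, unsupported, or a transport failure.
    return { ok: false, reason: 'rejected', error: err instanceof Error ? err.message : String(err) };
  }

  const content = result.content;
  if (content.type !== 'text') return { ok: false, reason: 'non-text' };
  if (result.stopReason === 'maxTokens') {
    return { ok: false, reason: 'truncated', partial: content.text };
  }
  return { ok: true, text: content.text };
}
```

This sketch treats only `maxTokens` as truncation, so a `stopSequence` stop, or a client that omits `stopReason`, still counts as success. That is a deliberately looser check than the `endTurn` test above; pick whichever matches how much you trust partial output.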

Further reading