MCP server web search tools

Web fetch and search tools are some of the most powerful MCP capabilities — they give LLMs access to live information, current documentation, and real-time data beyond their training cutoff. They also introduce significant risks: SSRF attacks, rate-limit violations against target sites, raw HTML flooding the LLM context, and accidental data exfiltration. This guide covers how to build fetch_url, search_web, and extract_content tools with SSRF prevention, HTML-to-text extraction, rate limiting, robots.txt compliance, and response caching.

TL;DR

Block requests to private IP ranges (10.x, 172.16-31.x, 192.168.x, localhost, and link-local addresses) before making any outbound HTTP call — this prevents SSRF attacks where a malicious prompt tricks the server into probing internal infrastructure. Use a dedicated HTTP client with explicit timeouts and response size limits. Strip HTML to clean text before returning to the LLM — raw HTML is mostly noise and wastes context tokens. Cache responses keyed on URL to avoid hammering the same site repeatedly across tool calls in the same session.

SSRF prevention: blocking private network access

Server-Side Request Forgery (SSRF) is the primary security risk in web fetch tools. A prompt like "fetch the contents of http://169.254.169.254/latest/meta-data/" (AWS instance metadata) or "fetch http://10.0.0.1/admin" (internal services) would expose your cloud infrastructure to the LLM and anyone who can craft prompts. Resolve the hostname and reject private or reserved addresses before making any outbound request:

import dns from 'dns/promises';
import net from 'net';

const BLOCKED_CIDRS = [
  { start: ip2int('0.0.0.0'),       end: ip2int('0.255.255.255') },   // "this" network
  { start: ip2int('10.0.0.0'),      end: ip2int('10.255.255.255') },   // private
  { start: ip2int('127.0.0.0'),     end: ip2int('127.255.255.255') },  // loopback
  { start: ip2int('169.254.0.0'),   end: ip2int('169.254.255.255') },  // link-local / AWS metadata
  { start: ip2int('172.16.0.0'),    end: ip2int('172.31.255.255') },   // private
  { start: ip2int('192.168.0.0'),   end: ip2int('192.168.255.255') },  // private
  { start: ip2int('240.0.0.0'),     end: ip2int('255.255.255.255') },  // reserved / broadcast
];

function ip2int(ip: string): number {
  return ip.split('.').reduce((acc, oct) => (acc << 8) + parseInt(oct, 10), 0) >>> 0;
}

async function assertSafeUrl(rawUrl: string): Promise<URL> {
  let parsed: URL;
  try {
    parsed = new URL(rawUrl);
  } catch {
    throw new Error(`Invalid URL: ${rawUrl}`);
  }

  if (!['http:', 'https:'].includes(parsed.protocol)) {
    throw new Error(`Unsupported protocol: ${parsed.protocol} (only http/https allowed)`);
  }

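  // Resolve the hostname (or use an IP literal directly) and check every address
  const host = parsed.hostname.replace(/^\[|\]$/g, ''); // URL keeps brackets around IPv6 literals
  let addresses: string[];
  if (net.isIP(host)) {
    addresses = [host];
  } else {
    // A host may have only A or only AAAA records, so neither lookup is fatal on its own
    const [v4, v6] = await Promise.all([
      dns.resolve4(host).catch(() => [] as string[]),
      dns.resolve6(host).catch(() => [] as string[]),
    ]);
    addresses = [...v4, ...v6];
  }
  if (addresses.length === 0) {
    throw new Error(`Could not resolve hostname: ${parsed.hostname}`);
  }

  for (const addr of addresses) {
    const blocked = net.isIPv4(addr)
      ? BLOCKED_CIDRS.some(r => { const n = ip2int(addr); return n >= r.start && n <= r.end; })
      // IPv6: unspecified/loopback, unique-local fc00::/7, link-local fe80::/10, IPv4-mapped
      : /^(::1?$|f[cd][0-9a-f]{2}:|fe[89ab][0-9a-f]:|::ffff:)/i.test(addr);
    if (blocked) {
      throw new Error(`Access denied: ${parsed.hostname} resolves to a private/reserved address`);
    }
  }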

  return parsed;
}

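Resolving the hostname before the request catches a public domain whose DNS records point at a private IP. It does not fully close the DNS rebinding window: the HTTP client resolves the name again when it connects, and an attacker-controlled record can change between the two lookups. Re-check the IP after the TCP connection is established if your HTTP client supports it, or connect to the address you already validated. Redirects are a separate bypass, because fetch follows them by default and a public URL can redirect to an internal one, so validate every hop.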
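As an alternative to a hand-rolled range table, Node's built-in net.BlockList can hold IPv4 and IPv6 ranges in one structure. The following is a minimal sketch rather than a complete denylist: the helper name isBlockedAddress and the exact set of subnets are assumptions made for illustration, and you would likely extend the list for your own network.

```typescript
import * as net from 'node:net';

// One BlockList for both address families (the subnet selection is illustrative)
const blocked = new net.BlockList();

// IPv4 ranges matching the table above
blocked.addSubnet('0.0.0.0', 8, 'ipv4');       // "this" network
blocked.addSubnet('10.0.0.0', 8, 'ipv4');      // private
blocked.addSubnet('127.0.0.0', 8, 'ipv4');     // loopback
blocked.addSubnet('169.254.0.0', 16, 'ipv4');  // link-local / cloud metadata
blocked.addSubnet('172.16.0.0', 12, 'ipv4');   // private
blocked.addSubnet('192.168.0.0', 16, 'ipv4');  // private
blocked.addSubnet('240.0.0.0', 4, 'ipv4');     // reserved / broadcast

// IPv6 counterparts
blocked.addAddress('::', 'ipv6');              // unspecified
blocked.addAddress('::1', 'ipv6');             // loopback
blocked.addSubnet('fc00::', 7, 'ipv6');        // unique-local
blocked.addSubnet('fe80::', 10, 'ipv6');       // link-local

// Hypothetical helper: true when the address must not be fetched
function isBlockedAddress(addr: string): boolean {
  if (net.isIPv4(addr)) return blocked.check(addr, 'ipv4');
  if (net.isIPv6(addr)) return blocked.check(addr, 'ipv6');
  return true; // not an IP address at all: fail closed
}
```

A resolver-based check could call isBlockedAddress on every resolved address and reject the URL if any of them returns true.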

The fetch_url tool

The core web fetch tool: retrieve a URL and return clean text or raw content. Always set explicit timeouts and response size limits — a slow site should not hold a tool call open indefinitely, and a large binary response should not exhaust memory:

import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { z } from 'zod';

const server = new McpServer({ name: 'web-server', version: '1.0.0' });

const HTTP_TIMEOUT_MS = 10_000;    // 10 second total timeout
const MAX_RESPONSE_BYTES = 500_000; // 500 KB max response

server.tool(
  'fetch_url',
  'Fetch a web page and return its text content',
  {
    url: z.string().url().describe('URL to fetch'),
    extract_text: z.boolean().default(true).describe('Strip HTML tags and return plain text'),
    max_chars: z.number().int().min(100).max(50_000).default(10_000),
  },
  async ({ url, extract_text, max_chars }) => {
    let safeUrl: URL;
    try {
      safeUrl = await assertSafeUrl(url);
    } catch (e) {
      return { isError: true, content: [{ type: 'text', text: `Blocked: ${(e as Error).message}` }] };
    }

    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), HTTP_TIMEOUT_MS);

    try {
      const res = await fetch(safeUrl.toString(), {
        signal: controller.signal,
        headers: { 'User-Agent': 'AliveMCP-Bot/1.0 (+https://alivemcp.com)' },
      });

      if (!res.ok) {
        return { isError: true, content: [{ type: 'text', text: `HTTP ${res.status}: ${res.statusText}` }] };
      }

      const contentType = res.headers.get('content-type') ?? '';
      const isText = contentType.includes('text') || contentType.includes('json');
      if (!isText) {
        return { isError: true, content: [{ type: 'text', text: `Non-text content-type: ${contentType}` }] };
      }

      // Stream the body and stop at the limit, so an oversized response is never fully buffered.
      // The timer stays armed until the body has been read, so the timeout covers the whole request.
      const reader = res.body?.getReader();
      const chunks: Uint8Array[] = [];
      let received = 0;
      while (reader) {
        const { done, value } = await reader.read();
        if (done) break;
        received += value.byteLength;
        if (received > MAX_RESPONSE_BYTES) {
          await reader.cancel();
          return { isError: true, content: [{ type: 'text', text: `Response too large: over ${MAX_RESPONSE_BYTES} bytes` }] };
        }
        chunks.push(value);
      }

      const raw = new TextDecoder().decode(Buffer.concat(chunks));
      // Measure truncation against the text that is actually returned, not the raw HTML
      const full = extract_text ? htmlToText(raw) : raw;
      const truncated = full.length > max_chars;

      return {
        content: [{ type: 'text', text: full.slice(0, max_chars) + (truncated ? `\n\n[truncated: ${full.length} chars total]` : '') }]
      };
    } catch (e) {
      const msg = (e as Error).name === 'AbortError' ? 'Request timed out' : (e as Error).message;
      return { isError: true, content: [{ type: 'text', text: `Fetch failed: ${msg}` }] };
    } finally {
      clearTimeout(timer);
    }
  }
);
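One gap worth illustrating: fetch follows redirects by default, so a URL that passes the safety check can still redirect to an internal address. The sketch below shows one way to follow redirects by hand and re-validate each hop. The function name fetchWithCheckedRedirects, the injected validator (for example assertSafeUrl), and the hop limit of 5 are assumptions for illustration. It also relies on Node's built-in fetch exposing the 3xx status and Location header when redirect is set to 'manual'; browsers return an opaque response in that mode.

```typescript
// Hypothetical types: a validator such as assertSafeUrl, and a fetch-compatible function
type UrlValidator = (rawUrl: string) => Promise<URL>;
type FetchFn = (url: string, init?: RequestInit) => Promise<Response>;

async function fetchWithCheckedRedirects(
  rawUrl: string,
  validate: UrlValidator,
  fetchFn: FetchFn = fetch,
  maxRedirects = 5,
  init: RequestInit = {},
): Promise<Response> {
  let current = await validate(rawUrl);

  for (let hop = 0; hop <= maxRedirects; hop++) {
    // 'manual' hands the 3xx response back instead of following it automatically
    const res = await fetchFn(current.toString(), { ...init, redirect: 'manual' });
    const location = res.headers.get('location');

    // Anything that is not a redirect with a Location header is the final response
    if (res.status < 300 || res.status >= 400 || !location) return res;

    // Resolve relative Location values against the current URL, then re-validate the target
    current = await validate(new URL(location, current).toString());
  }

  throw new Error(`Too many redirects (limit: ${maxRedirects})`);
}
```

In the handler, this would replace the direct fetch call, with the abort signal and User-Agent header passed through the init argument.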

HTML-to-text extraction

Raw HTML is mostly boilerplate: navigation, scripts, styles, cookie banners, and ads. Passing raw HTML to an LLM wastes thousands of context tokens on noise. A simple extraction function handles the common cases without a heavy DOM library:

function htmlToText(html: string): string {
  return html
    // Remove script and style blocks entirely (including their content)
    .replace(/<script[\s\S]*?<\/script>/gi, '')
    .replace(/<style[\s\S]*?<\/style>/gi, '')
    // Convert block-level elements to newlines for readability
    .replace(/<\/(p|div|section|article|li|h[1-6]|tr|blockquote)>/gi, '\n')
    .replace(/<br\s*\/?>/gi, '\n')
    // Strip all remaining HTML tags
    .replace(/<[^>]+>/g, '')
    // Decode common HTML entities (&amp; goes last, so "&amp;lt;" becomes "&lt;" rather than "<")
    .replace(/&lt;/g, '<').replace(/&gt;/g, '>').replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'").replace(/&nbsp;/g, ' ').replace(/&amp;/g, '&')
    // Collapse whitespace
    .replace(/[ \t]+/g, ' ')
    .replace(/\n{3,}/g, '\n\n')
    .trim();
}

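For richer extraction, use a library like @mozilla/readability, the standalone version of the algorithm behind Firefox Reader View: it applies article-body heuristics to strip navigation and ads. It needs a DOM to work on (jsdom is a common choice in Node) and returns cleaned HTML plus plain text, so preserving heading hierarchy or rendering tables as Markdown takes an additional HTML-to-Markdown step.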
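The regex-based extractor only decodes a handful of named entities, while real pages also use numeric references such as &#8217; or &#x2014;. As a small, dependency-free sketch, a helper like the one below could be run over the text before the final &amp; replacement. The name decodeNumericEntities is an assumption for illustration, and named entities beyond the common few would still need a lookup table or a library.

```typescript
// Hypothetical helper: decodes decimal (&#8217;) and hexadecimal (&#x2019;) character references
function decodeNumericEntities(text: string): string {
  return text.replace(/&#(x[0-9a-f]+|[0-9]+);/gi, (match: string, body: string) => {
    const isHex = body[0] === 'x' || body[0] === 'X';
    const code = parseInt(isHex ? body.slice(1) : body, isHex ? 16 : 10);

    // Leave the reference untouched if it is NUL, a surrogate, or beyond the Unicode range
    const isSurrogate = code >= 0xd800 && code <= 0xdfff;
    if (code === 0 || code > 0x10ffff || isSurrogate) return match;

    return String.fromCodePoint(code);
  });
}
```

Invalid references are passed through unchanged rather than dropped, so the LLM still sees what the page contained.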

Response caching to avoid hammering sites

LLMs often call the same tool multiple times in one session — re-fetching the same documentation page or the same API reference. A simple in-memory TTL cache avoids redundant HTTP requests and speeds up responses:

interface CacheEntry { text: string; expires: number; }
const responseCache = new Map<string, CacheEntry>();
const CACHE_TTL_MS = 5 * 60 * 1_000; // 5 minutes

function getCached(url: string): string | null {
  const entry = responseCache.get(url);
  if (!entry || Date.now() > entry.expires) {
    responseCache.delete(url);
    return null;
  }
  return entry.text;
}

function setCache(url: string, text: string): void {
  // Evict oldest entries if cache grows too large
  if (responseCache.size > 500) {
    const oldest = [...responseCache.entries()].sort((a, b) => a[1].expires - b[1].expires)[0];
    responseCache.delete(oldest[0]);
  }
  responseCache.set(url, { text, expires: Date.now() + CACHE_TTL_MS });
}

Cache only successful responses. Never cache isError: true results — a transient 503 that gets cached means the LLM sees a stale error for the next 5 minutes even after the target site recovers.
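The rule above can be sketched as a small wrapper that a tool handler calls around its real fetch logic. Everything named here (withCache, ToolResult, the load callback, the injectable clock) is a hypothetical illustration and is not taken from the handler shown earlier; the block carries its own tiny cache so it stays self-contained.

```typescript
interface Entry { text: string; expires: number; }
type ToolResult = { isError?: boolean; content: { type: 'text'; text: string }[] };

const cache = new Map<string, Entry>();
const TTL_MS = 5 * 60 * 1_000; // 5 minutes, matching the TTL above

// Hypothetical wrapper: `load` performs the real fetch and returns a tool result
async function withCache(
  key: string,
  load: () => Promise<ToolResult>,
  now: () => number = Date.now,
): Promise<ToolResult> {
  const hit = cache.get(key);
  if (hit && now() <= hit.expires) {
    return { content: [{ type: 'text', text: hit.text }] };
  }
  cache.delete(key); // drop the expired entry, if there was one

  const result = await load();

  // Store only successful results, so a transient failure is retried on the next call
  if (!result.isError) {
    const text = result.content.map(c => c.text).join('\n');
    cache.set(key, { text, expires: now() + TTL_MS });
  }
  return result;
}
```

If the tool takes arguments that change its output, such as extract_text or max_chars, the key probably needs to include them as well as the URL; otherwise a cached plain-text result could be returned for a raw-HTML request.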

Web search via an API

Most production MCP servers use a search API (Brave Search, SerpAPI, Tavily, or Bing Web Search) rather than scraping search results directly. Direct scraping of Google/Bing violates their ToS and their anti-bot measures evolve constantly. A search API gives consistent JSON results:

server.tool(
  'search_web',
  'Search the web and return top results with titles, URLs, and snippets',
  {
    query: z.string().min(1).max(500).describe('Search query'),
    num_results: z.number().int().min(1).max(10).default(5),
    site_restrict: z.string().optional().describe('Restrict to domain (e.g. "docs.python.org")'),
  },
  async ({ query, num_results, site_restrict }) => {
    const apiKey = process.env.BRAVE_SEARCH_API_KEY;
    if (!apiKey) return { isError: true, content: [{ type: 'text', text: 'Search API key not configured' }] };

    const q = site_restrict ? `site:${site_restrict} ${query}` : query;
    const url = `https://api.search.brave.com/res/v1/web/search?q=${encodeURIComponent(q)}&count=${num_results}`;

    const res = await fetch(url, {
      headers: { 'Accept': 'application/json', 'X-Subscription-Token': apiKey },
      signal: AbortSignal.timeout(8_000),
    });

    if (!res.ok) return { isError: true, content: [{ type: 'text', text: `Search API error: ${res.status}` }] };

    const data = await res.json() as { web?: { results: { title: string; url: string; description: string }[] } };
    const results = data.web?.results ?? [];
    if (results.length === 0) return { content: [{ type: 'text', text: 'No results found.' }] };

    const text = results.map((r, i) =>
      `${i + 1}. ${r.title}\n   ${r.url}\n   ${r.description}`
    ).join('\n\n');

    return { content: [{ type: 'text', text }] };
  }
);
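The handler above interpolates site_restrict straight into the query, so a value such as "example.com OR site:other.com" would smuggle extra search operators in. One optional hardening step, sketched here under the assumption that only bare hostnames should be accepted, is to validate the value first. The helper name buildSearchQuery and the hostname pattern are illustrative choices; the pattern is deliberately strict and would reject punycode top-level domains.

```typescript
// Hypothetical helper: builds the query string sent to the search API
function buildSearchQuery(query: string, siteRestrict?: string): string {
  if (!siteRestrict) return query;

  const host = siteRestrict.trim().toLowerCase();

  // Accept only a bare hostname: dot-separated labels of letters, digits, and hyphens,
  // ending in an alphabetic top-level domain. No scheme, path, spaces, or operators.
  const hostname = /^(?=.{1,253}$)([a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$/;
  if (!hostname.test(host)) {
    throw new Error('Invalid site_restrict: expected a bare domain such as "docs.python.org"');
  }

  return `site:${host} ${query}`;
}
```

The handler could call this in place of the inline template string and return an isError result when it throws.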

Rate limiting and polite crawling

When fetching multiple pages from the same domain — following links, crawling documentation — add a per-domain rate limit. Without it, a single tool call chain can hammer a site hard enough to trigger IP bans or alert their abuse team:

const domainLastFetch = new Map<string, number>();
const MIN_FETCH_INTERVAL_MS = 1_000; // 1 request per second per domain

async function throttledFetch(url: URL): Promise<Response> {
  const host = url.hostname;
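  // Reserve the next slot before awaiting, so concurrent calls to the same host
  // queue up one interval apart instead of all waking at the same moment
  const now = Date.now();
  const slot = Math.max(now, (domainLastFetch.get(host) ?? 0) + MIN_FETCH_INTERVAL_MS);
  domainLastFetch.set(host, slot);
  if (slot > now) await new Promise(r => setTimeout(r, slot - now));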
  return fetch(url.toString(), {
    headers: { 'User-Agent': 'AliveMCP-Bot/1.0 (+https://alivemcp.com/robots.txt)' },
    signal: AbortSignal.timeout(HTTP_TIMEOUT_MS),
  });
}

Include a real User-Agent with a contact URL. Sites that want to allow your bot can allowlist it in robots.txt or reach out directly if they see excessive traffic. Anonymous curl User-Agents are the first to get blocked.
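Polite crawling also means honouring robots.txt before fetching a path. The sketch below is a deliberately simplified checker, not a full implementation of the Robots Exclusion Protocol: it matches rules by plain path prefix, ignores the * and $ wildcards, and picks a group by exact user-agent token with a fallback to *. The function name isAllowedByRobots and its signature are assumptions for illustration; fetching and caching the robots.txt file itself is left out.

```typescript
interface RobotsRule { allow: boolean; path: string; }

// Simplified sketch: prefix matching only, no * or $ wildcard support
function isAllowedByRobots(robotsTxt: string, agentToken: string, path: string): boolean {
  const groups = new Map<string, RobotsRule[]>();
  let currentAgents: string[] = [];
  let lastWasAgent = false;

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, '').trim(); // strip comments
    const sep = line.indexOf(':');
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === 'user-agent') {
      // Consecutive User-agent lines share one group of rules
      if (!lastWasAgent) currentAgents = [];
      const agent = value.toLowerCase();
      currentAgents.push(agent);
      if (!groups.has(agent)) groups.set(agent, []);
      lastWasAgent = true;
    } else {
      lastWasAgent = false;
      if ((field !== 'allow' && field !== 'disallow') || value === '') continue;
      for (const agent of currentAgents) {
        groups.get(agent)!.push({ allow: field === 'allow', path: value });
      }
    }
  }

  // Prefer the group for this bot's token, then the wildcard group
  const rules = groups.get(agentToken.toLowerCase()) ?? groups.get('*') ?? [];

  // The longest matching path wins; Allow wins a tie
  let best: RobotsRule | undefined;
  for (const rule of rules) {
    if (!path.startsWith(rule.path)) continue;
    const longer = !best || rule.path.length > best.path.length;
    const tieAllow = !!best && rule.path.length === best.path.length && rule.allow;
    if (longer || tieAllow) best = rule;
  }
  return best ? best.allow : true; // no matching rule means the path is allowed
}
```

A fetch tool could call this with the cached robots.txt body, the token from its User-Agent header (AliveMCP-Bot in the examples above), and the URL's pathname, and return an isError result when the path is disallowed.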

Monitoring web-fetching MCP servers

Web fetch tools fail in two distinct layers: the MCP transport layer and the external HTTP layer. A network policy change that blocks outbound HTTP makes every fetch_url call return isError: true, but the server still responds normally to the MCP protocol handshake. A rotated or expired search API key makes every search_web call return an HTTP 401 error, yet tools/list still reports the tool as available, so a protocol-level check alone will not notice.

Use structured health checks that actually exercise the external dependencies: a canary tool call that fetches a known-good URL (your own homepage, a stable docs page) confirms end-to-end connectivity. AliveMCP probes your MCP endpoint every 60 seconds using the full protocol handshake, catching transport-level failures before users encounter broken web search in their agents.
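A canary of that kind can be as small as the sketch below. It is a hypothetical helper rather than part of any monitoring product: the name runFetchCanary, the result shape, and the 5 second default timeout are all assumptions, and the fetch function is injectable so the check can be exercised without network access.

```typescript
type CanaryFetch = (url: string, init?: RequestInit) => Promise<Response>;

interface CanaryResult { ok: boolean; latencyMs: number; detail: string; }

// Hypothetical canary: fetches a known-good URL and reports whether outbound HTTP works
async function runFetchCanary(
  canaryUrl: string,
  fetchFn: CanaryFetch = fetch,
  timeoutMs = 5_000,
): Promise<CanaryResult> {
  const started = Date.now();
  try {
    const res = await fetchFn(canaryUrl, { signal: AbortSignal.timeout(timeoutMs) });
    const detail = res.ok ? `HTTP ${res.status}` : `unexpected status ${res.status}`;
    return { ok: res.ok, latencyMs: Date.now() - started, detail };
  } catch (e) {
    // Network errors, DNS failures, and timeouts all land here
    return { ok: false, latencyMs: Date.now() - started, detail: (e as Error).message };
  }
}
```

Exposing the result through a health endpoint or a dedicated diagnostic tool lets an external monitor distinguish "the MCP server is up" from "the MCP server can actually reach the web".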