MCP server web search tools
Web fetch and search tools are some of the most powerful MCP capabilities — they give LLMs access to live information, current documentation, and real-time data beyond their training cutoff. They also introduce significant risks: SSRF attacks, rate-limit violations against target sites, raw HTML flooding the LLM context, and accidental data exfiltration. This guide covers how to build fetch_url, search_web, and extract_content tools with SSRF prevention, HTML-to-text extraction, rate limiting, robots.txt compliance, and response caching.
TL;DR
Block requests to private IP ranges (10.x, 172.16-31.x, 192.168.x, localhost, and link-local addresses) before making any outbound HTTP call — this prevents SSRF attacks where a malicious prompt tricks the server into probing internal infrastructure. Use a dedicated HTTP client with explicit timeouts and response size limits. Strip HTML to clean text before returning to the LLM — raw HTML is mostly noise and wastes context tokens. Cache responses keyed on URL to avoid hammering the same site repeatedly across tool calls in the same session.
SSRF prevention: blocking private network access
Server-Side Request Forgery (SSRF) is the primary security risk in web fetch tools. A prompt like "fetch the contents of http://169.254.169.254/latest/meta-data/" (AWS instance metadata) or "fetch http://10.0.0.1/admin" (internal services) would expose your cloud infrastructure to the LLM and to anyone who can craft prompts. Resolve the hostname and check every resulting address against the blocked ranges before opening any connection:
import dns from 'dns/promises';
import net from 'net';

// IPv4 ranges that a fetch tool must never reach
const BLOCKED_CIDRS = [
  { start: ip2int('0.0.0.0'), end: ip2int('0.255.255.255') }, // "this" network
  { start: ip2int('10.0.0.0'), end: ip2int('10.255.255.255') }, // private
  { start: ip2int('127.0.0.0'), end: ip2int('127.255.255.255') }, // loopback
  { start: ip2int('169.254.0.0'), end: ip2int('169.254.255.255') }, // link-local / AWS metadata
  { start: ip2int('172.16.0.0'), end: ip2int('172.31.255.255') }, // private
  { start: ip2int('192.168.0.0'), end: ip2int('192.168.255.255') }, // private
  { start: ip2int('240.0.0.0'), end: ip2int('255.255.255.255') }, // reserved / broadcast
];

// Convert a dotted-quad IPv4 address to an unsigned 32-bit integer
function ip2int(ip: string): number {
  return ip.split('.').reduce((acc, oct) => (acc << 8) + parseInt(oct, 10), 0) >>> 0;
}

// Conservative IPv6 check: loopback, unspecified, unique-local (fc00::/7),
// link-local (fe80::/10), and IPv4-mapped (::ffff:0:0/96) addresses
function isBlockedIPv6(addr: string): boolean {
  const a = addr.toLowerCase();
  return a === '::1' || a === '::' || a.startsWith('::ffff:')
    || /^f[cd][0-9a-f]{2}:/.test(a) || /^fe[89ab][0-9a-f]:/.test(a);
}

async function assertSafeUrl(rawUrl: string): Promise<URL> {
  let parsed: URL;
  try {
    parsed = new URL(rawUrl);
  } catch {
    throw new Error(`Invalid URL: ${rawUrl}`);
  }
  if (!['http:', 'https:'].includes(parsed.protocol)) {
    throw new Error(`Unsupported protocol: ${parsed.protocol} (only http/https allowed)`);
  }
  // URL.hostname wraps IPv6 literals in brackets; strip them before checking
  const host = parsed.hostname.replace(/^\[|\]$/g, '');
  // IP literals need no lookup; otherwise resolve both A and AAAA records
  const addresses = net.isIP(host)
    ? [host]
    : [
        ...(await dns.resolve4(host).catch(() => [] as string[])),
        ...(await dns.resolve6(host).catch(() => [] as string[])),
      ];
  if (addresses.length === 0) {
    throw new Error(`Could not resolve hostname: ${parsed.hostname}`);
  }
  // Reject the URL if any resolved address falls in a blocked range
  for (const addr of addresses) {
    const blocked = net.isIPv4(addr)
      ? BLOCKED_CIDRS.some(r => ip2int(addr) >= r.start && ip2int(addr) <= r.end)
      : isBlockedIPv6(addr);
    if (blocked) {
      throw new Error(`Access denied: ${parsed.hostname} resolves to a private/reserved address`);
    }
  }
  return parsed;
}
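As an alternative sketch, Node's built-in `net.BlockList` can hold the same ranges as CIDR subnets and check IPv4 and IPv6 addresses through one call. The names `privateRanges` and `isBlockedAddress` are illustrative and not part of the code above, and the IPv6 subnets are an assumption about what a deployment would want to block.

```typescript
import { BlockList, isIPv4 } from 'node:net';

// Hypothetical helper: the IPv4 subnets mirror the ranges listed above,
// and the IPv6 entries are assumed equivalents (loopback, unique-local, link-local).
const privateRanges = new BlockList();
privateRanges.addSubnet('0.0.0.0', 8, 'ipv4');
privateRanges.addSubnet('10.0.0.0', 8, 'ipv4');
privateRanges.addSubnet('127.0.0.0', 8, 'ipv4');
privateRanges.addSubnet('169.254.0.0', 16, 'ipv4');
privateRanges.addSubnet('172.16.0.0', 12, 'ipv4');
privateRanges.addSubnet('192.168.0.0', 16, 'ipv4');
privateRanges.addSubnet('240.0.0.0', 4, 'ipv4');
privateRanges.addAddress('::1', 'ipv6');
privateRanges.addSubnet('fc00::', 7, 'ipv6');
privateRanges.addSubnet('fe80::', 10, 'ipv6');

// Returns true when the address falls inside any blocked range.
function isBlockedAddress(addr: string): boolean {
  return privateRanges.check(addr, isIPv4(addr) ? 'ipv4' : 'ipv6');
}
```

Expressing ranges as CIDR prefixes avoids hand-written start and end addresses, which is where off-by-one mistakes tend to creep in.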
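Resolving the hostname before the request catches public domains whose DNS records point at private IPs. It does not fully close the gap. The HTTP client performs its own lookup when it connects, so a DNS rebinding attacker can return a public address to the check and a private one to the client. An HTTP redirect can also point at an internal address after the check has passed. Re-check the IP after the TCP connection is established if your HTTP client supports it, and validate every redirect target before following it.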
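Redirects deserve the same validation as the initial URL, because a public page can answer with a `Location` header that points at an internal address. The sketch below follows redirects by hand and re-validates each hop. The function name `fetchWithCheckedRedirects` and its injectable `validate` and `fetchFn` parameters are hypothetical, and it assumes a runtime such as Node's built-in `fetch`, where `redirect: 'manual'` exposes the 3xx response and its headers.

```typescript
type Validator = (url: string) => Promise<URL>;
type Fetcher = (url: string, init: RequestInit) => Promise<Response>;

// Follow redirects one hop at a time, validating every target before requesting it.
async function fetchWithCheckedRedirects(
  rawUrl: string,
  validate: Validator,          // e.g. an SSRF check that throws on private addresses
  fetchFn: Fetcher = fetch,     // injectable so the logic can be exercised without a network
  maxRedirects = 5,
): Promise<Response> {
  let current = rawUrl;
  for (let hop = 0; hop <= maxRedirects; hop++) {
    const safe = await validate(current);
    const res = await fetchFn(safe.toString(), { redirect: 'manual' });
    const location = res.headers.get('location');
    // Anything that is not a redirect with a target is the final response
    if (res.status < 300 || res.status >= 400 || !location) return res;
    // Discard the redirect body and resolve relative targets against the current URL
    await res.body?.cancel();
    current = new URL(location, safe).toString();
  }
  throw new Error(`Too many redirects (limit: ${maxRedirects})`);
}
```

Capping the number of hops also guards against redirect loops, which would otherwise keep a tool call busy until the timeout fires.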
The fetch_url tool
The core web fetch tool: retrieve a URL and return clean text or raw content. Always set explicit timeouts and response size limits — a slow site should not hold a tool call open indefinitely, and a large binary response should not exhaust memory:
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { z } from 'zod';

const server = new McpServer({ name: 'web-server', version: '1.0.0' });
const HTTP_TIMEOUT_MS = 10_000; // 10 second total timeout (headers and body)
const MAX_RESPONSE_BYTES = 500_000; // 500 KB max response

server.tool(
  'fetch_url',
  'Fetch a web page and return its text content',
  {
    url: z.string().url().describe('URL to fetch'),
    extract_text: z.boolean().default(true).describe('Strip HTML tags and return plain text'),
    max_chars: z.number().int().min(100).max(50_000).default(10_000),
  },
  async ({ url, extract_text, max_chars }) => {
    let safeUrl: URL;
    try {
      safeUrl = await assertSafeUrl(url);
    } catch (e) {
      return { isError: true, content: [{ type: 'text', text: `Blocked: ${(e as Error).message}` }] };
    }
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), HTTP_TIMEOUT_MS);
    try {
      const res = await fetch(safeUrl.toString(), {
        signal: controller.signal,
        headers: { 'User-Agent': 'AliveMCP-Bot/1.0 (+https://alivemcp.com)' },
      });
      if (!res.ok) {
        return { isError: true, content: [{ type: 'text', text: `HTTP ${res.status}: ${res.statusText}` }] };
      }
      const contentType = res.headers.get('content-type') ?? '';
      const isText = contentType.includes('text') || contentType.includes('json');
      if (!isText) {
        return { isError: true, content: [{ type: 'text', text: `Non-text content-type: ${contentType}` }] };
      }
      // Reject early when the server declares an oversized body
      const declared = Number(res.headers.get('content-length') ?? 0);
      if (declared > MAX_RESPONSE_BYTES) {
        return { isError: true, content: [{ type: 'text', text: `Response too large: ${declared} bytes (limit: ${MAX_RESPONSE_BYTES})` }] };
      }
      // arrayBuffer() buffers the whole body, so this check is a backstop, not a hard memory bound
      const buffer = await res.arrayBuffer();
      if (buffer.byteLength > MAX_RESPONSE_BYTES) {
        return { isError: true, content: [{ type: 'text', text: `Response too large: ${buffer.byteLength} bytes (limit: ${MAX_RESPONSE_BYTES})` }] };
      }
      const raw = new TextDecoder().decode(buffer);
      // Measure truncation against the text actually returned, not the raw HTML
      const full = extract_text ? htmlToText(raw) : raw;
      const truncated = full.length > max_chars;
      return {
        content: [{ type: 'text', text: full.slice(0, max_chars) + (truncated ? `\n\n[truncated: ${full.length} chars total]` : '') }]
      };
    } catch (e) {
      const msg = (e as Error).name === 'AbortError' ? 'Request timed out' : (e as Error).message;
      return { isError: true, content: [{ type: 'text', text: `Fetch failed: ${msg}` }] };
    } finally {
      // Keep the timer armed until the body is read so the timeout covers the whole request
      clearTimeout(timer);
    }
  }
);
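A size check that runs after the body has been fully buffered limits what reaches the LLM but not what lands in memory. One way to enforce the cap while reading is to stream the body and stop once the byte budget is spent. The helper name `readTextWithLimit` is hypothetical; the sketch assumes a `Response` from the standard Fetch API with a readable `body` stream.

```typescript
// Read a response body as text, aborting as soon as it exceeds maxBytes.
async function readTextWithLimit(res: Response, maxBytes: number): Promise<string> {
  // Trust Content-Length only as an early exit; servers may omit or misstate it
  const declared = Number(res.headers.get('content-length') ?? 0);
  if (declared > maxBytes) {
    throw new Error(`Response too large: ${declared} bytes declared (limit: ${maxBytes})`);
  }
  if (!res.body) return '';

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let received = 0;
  let text = '';
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    received += value.byteLength;
    if (received > maxBytes) {
      // Stop the download instead of draining the rest of the body
      await reader.cancel();
      throw new Error(`Response too large: more than ${maxBytes} bytes (limit: ${maxBytes})`);
    }
    // stream: true keeps multi-byte characters intact across chunk boundaries
    text += decoder.decode(value, { stream: true });
  }
  return text + decoder.decode();
}
```

In a handler like the one above, a call such as `readTextWithLimit(res, MAX_RESPONSE_BYTES)` could stand in for the `arrayBuffer()` and `TextDecoder` steps, with the thrown error mapped to an `isError` result.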
HTML-to-text extraction
Raw HTML is mostly boilerplate: navigation, scripts, styles, cookie banners, and ads. Passing raw HTML to an LLM wastes thousands of context tokens on noise. A simple extraction function handles the common cases without a heavy DOM library:
function htmlToText(html: string): string {
  return html
    // Remove script and style blocks entirely (including their content)
    .replace(/<script[\s\S]*?<\/script>/gi, '')
    .replace(/<style[\s\S]*?<\/style>/gi, '')
    // Convert block-level elements to newlines for readability
    .replace(/<\/(p|div|section|article|li|h[1-6]|tr|blockquote)>/gi, '\n')
    .replace(/<br\s*\/?>/gi, '\n')
    // Strip all remaining HTML tags
    .replace(/<[^>]+>/g, '')
    // Decode common HTML entities; &amp; goes last so "&amp;lt;" is not decoded twice
    .replace(/&lt;/g, '<').replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"').replace(/&#39;/g, "'").replace(/&nbsp;/g, ' ')
    .replace(/&amp;/g, '&')
    // Collapse whitespace
    .replace(/[ \t]+/g, ' ')
    .replace(/\n{3,}/g, '\n\n')
    .trim();
}
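For richer extraction (preserving heading hierarchy, keeping tables, following article-body heuristics), use a library like @mozilla/readability, the standalone version of the algorithm behind Firefox Reader View, which strips navigation and ads from articles. It needs a DOM implementation such as jsdom on the server and returns cleaned HTML and text, so add an HTML-to-Markdown converter if you want tables as Markdown.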
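Short of adopting a full library, one gap in a regex-based extractor is numeric character references such as `&#233;`, which chained replacements for a handful of named entities leave untouched. The sketch below decodes named and numeric entities in a single pass, so already-decoded output is never decoded a second time. The name `decodeEntities` and the small `NAMED_ENTITIES` table are illustrative and cover only a few common entities.

```typescript
// Minimal table of named entities; extend it as needed.
const NAMED_ENTITIES: Record<string, string> = {
  amp: '&', lt: '<', gt: '>', quot: '"', apos: "'", nbsp: ' ',
};

// Decode named, decimal (&#233;) and hexadecimal (&#xE9;) entities in one pass.
function decodeEntities(text: string): string {
  return text.replace(/&(#x[0-9a-f]+|#\d+|[a-z]+);/gi, (match: string, body: string) => {
    if (body[0] === '#') {
      const isHex = body[1].toLowerCase() === 'x';
      const code = isHex ? parseInt(body.slice(2), 16) : parseInt(body.slice(1), 10);
      // Leave out-of-range values and surrogate code points untouched
      const valid = code > 0 && code <= 0x10ffff && !(code >= 0xd800 && code <= 0xdfff);
      return valid ? String.fromCodePoint(code) : match;
    }
    // Unknown names are returned unchanged rather than dropped
    return NAMED_ENTITIES[body] ?? match;
  });
}
```

Because each match is replaced exactly once, an escaped entity such as `&amp;lt;` comes out as the literal text `&lt;` rather than `<`.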
Response caching to avoid hammering sites
LLMs often call the same tool multiple times in one session — re-fetching the same documentation page or the same API reference. A simple in-memory TTL cache avoids redundant HTTP requests and speeds up responses:
interface CacheEntry { text: string; expires: number; }

const responseCache = new Map<string, CacheEntry>();
const CACHE_TTL_MS = 5 * 60 * 1_000; // 5 minutes
const CACHE_MAX_ENTRIES = 500;

function getCached(url: string): string | null {
  const entry = responseCache.get(url);
  if (!entry || Date.now() > entry.expires) {
    // Drop missing or expired entries so they do not linger in the map
    responseCache.delete(url);
    return null;
  }
  return entry.text;
}

function setCache(url: string, text: string): void {
  // Re-inserting moves the key to the end, so Map order matches expiry order
  responseCache.delete(url);
  // Every entry shares one TTL, so the first key is always the oldest
  if (responseCache.size >= CACHE_MAX_ENTRIES) {
    const oldest = responseCache.keys().next().value;
    if (oldest !== undefined) responseCache.delete(oldest);
  }
  responseCache.set(url, { text, expires: Date.now() + CACHE_TTL_MS });
}
Cache only successful responses. Never cache isError: true results — a transient 503 that gets cached means the LLM sees a stale error for the next 5 minutes even after the target site recovers.
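The rule can be enforced in one place by wrapping the fetch logic in a helper that stores a result only when it is not an error. The sketch below is self-contained and uses hypothetical names (`withCache`, `cacheKey`, `ToolResult`); it also strips the URL fragment from the cache key, on the assumption that `#section` anchors do not change what the server returns.

```typescript
interface ToolResult {
  isError?: boolean;
  content: { type: 'text'; text: string }[];
}

const resultCache = new Map<string, { result: ToolResult; expires: number }>();
const RESULT_TTL_MS = 5 * 60 * 1_000; // 5 minutes, matching the TTL used above

// Normalize the URL so that fragment-only differences share one cache entry.
function cacheKey(rawUrl: string): string {
  const u = new URL(rawUrl);
  u.hash = '';
  return u.toString();
}

// Return a cached result when fresh; otherwise call produce() and cache only successes.
async function withCache(
  rawUrl: string,
  produce: (url: string) => Promise<ToolResult>,
): Promise<ToolResult> {
  const key = cacheKey(rawUrl);
  const hit = resultCache.get(key);
  if (hit && Date.now() <= hit.expires) return hit.result;

  const result = await produce(rawUrl);
  if (!result.isError) {
    resultCache.set(key, { result, expires: Date.now() + RESULT_TTL_MS });
  }
  return result;
}
```

With this shape, a transient failure is retried on the very next call, while a successful page is served from memory until its TTL lapses.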
Web search via an API
Most production MCP servers use a search API (Brave Search, SerpAPI, Tavily, or Bing Web Search) rather than scraping search results directly. Direct scraping of Google/Bing violates their ToS and their anti-bot measures evolve constantly. A search API gives consistent JSON results:
server.tool(
  'search_web',
  'Search the web and return top results with titles, URLs, and snippets',
  {
    query: z.string().min(1).max(500).describe('Search query'),
    num_results: z.number().int().min(1).max(10).default(5),
    site_restrict: z.string().optional().describe('Restrict to domain (e.g. "docs.python.org")'),
  },
  async ({ query, num_results, site_restrict }) => {
    const apiKey = process.env.BRAVE_SEARCH_API_KEY;
    if (!apiKey) return { isError: true, content: [{ type: 'text', text: 'Search API key not configured' }] };
    const q = site_restrict ? `site:${site_restrict} ${query}` : query;
    const url = `https://api.search.brave.com/res/v1/web/search?q=${encodeURIComponent(q)}&count=${num_results}`;
    try {
      const res = await fetch(url, {
        headers: { 'Accept': 'application/json', 'X-Subscription-Token': apiKey },
        signal: AbortSignal.timeout(8_000),
      });
      if (!res.ok) return { isError: true, content: [{ type: 'text', text: `Search API error: ${res.status}` }] };
      const data = await res.json() as { web?: { results: { title: string; url: string; description: string }[] } };
      const results = data.web?.results ?? [];
      if (results.length === 0) return { content: [{ type: 'text', text: 'No results found.' }] };
      // One numbered entry per result: title, URL, snippet
      const text = results.map((r, i) =>
        `${i + 1}. ${r.title}\n ${r.url}\n ${r.description}`
      ).join('\n\n');
      return { content: [{ type: 'text', text }] };
    } catch (e) {
      // Timeouts and network failures become explicit tool errors
      const msg = (e as Error).name === 'TimeoutError' ? 'Search request timed out' : (e as Error).message;
      return { isError: true, content: [{ type: 'text', text: `Search failed: ${msg}` }] };
    }
  }
);
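A free-form `site_restrict` string is concatenated into the query, so a value like `docs.python.org OR site:evil.example` would smuggle in extra search operators. One hedged way to tighten this is to accept only bare domain names before building the query. The helper `buildSearchQuery` and the `DOMAIN_RE` pattern below are illustrative and deliberately strict (ASCII hostnames only).

```typescript
// Labels of letters, digits, and hyphens, followed by an alphabetic TLD.
const DOMAIN_RE = /^(?=.{1,253}$)([a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$/i;

// Build the final query string, rejecting anything that is not a plain domain.
function buildSearchQuery(query: string, siteRestrict?: string): string {
  const q = query.trim();
  if (!siteRestrict) return q;
  const domain = siteRestrict.trim().toLowerCase();
  if (!DOMAIN_RE.test(domain)) {
    throw new Error(`Invalid site_restrict domain: ${siteRestrict}`);
  }
  return `site:${domain} ${q}`;
}
```

The same constraint could instead be expressed in the tool's zod schema with a `.regex(...)` check, which would surface the problem as a validation error before the handler runs.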
Rate limiting and polite crawling
When fetching multiple pages from the same domain — following links, crawling documentation — add a per-domain rate limit. Without it, a single tool call chain can hammer a site hard enough to trigger IP bans or alert their abuse team:
const domainLastFetch = new Map<string, number>();
const MIN_FETCH_INTERVAL_MS = 1_000; // 1 request per second per domain

async function throttledFetch(url: URL): Promise<Response> {
  const host = url.hostname;
  const now = Date.now();
  // Reserve the next free slot before waiting so concurrent calls queue up instead of firing together
  const slot = Math.max(now, (domainLastFetch.get(host) ?? 0) + MIN_FETCH_INTERVAL_MS);
  domainLastFetch.set(host, slot);
  if (slot > now) await new Promise(r => setTimeout(r, slot - now));
  return fetch(url.toString(), {
    headers: { 'User-Agent': 'AliveMCP-Bot/1.0 (+https://alivemcp.com/robots.txt)' },
    signal: AbortSignal.timeout(HTTP_TIMEOUT_MS),
  });
}
Include a real User-Agent with a contact URL. Sites that want to allow your bot can allowlist it in robots.txt or reach out directly if they see excessive traffic. Anonymous curl User-Agents are the first to get blocked.
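Honoring robots.txt is the other half of polite crawling. The sketch below parses a robots.txt body and answers whether a path may be fetched, using longest-prefix matching with ties going to `Allow`. It is a simplified reading of the format: the function names are hypothetical, `*` and `$` wildcards in paths are not handled, user-agent names are matched exactly, and fetching and caching the file per host is left out.

```typescript
interface RobotsRules { allow: string[]; disallow: string[] }

// Collect Allow/Disallow prefixes for the given bot, falling back to the "*" group.
function parseRobots(robotsTxt: string, botName: string): RobotsRules {
  const groups = new Map<string, RobotsRules>();
  let currentAgents: string[] = [];
  let lastWasAgent = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, '').trim(); // strip comments
    const sep = line.indexOf(':');
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === 'user-agent') {
      // Consecutive User-agent lines share one group; a new run starts a new group
      if (!lastWasAgent) currentAgents = [];
      const agent = value.toLowerCase();
      currentAgents.push(agent);
      if (!groups.has(agent)) groups.set(agent, { allow: [], disallow: [] });
      lastWasAgent = true;
    } else if (field === 'allow' || field === 'disallow') {
      lastWasAgent = false;
      if (value === '') continue; // an empty rule matches nothing
      for (const agent of currentAgents) groups.get(agent)![field].push(value);
    }
    // Other fields (Sitemap, Crawl-delay, ...) are ignored in this sketch
  }
  return groups.get(botName.toLowerCase()) ?? groups.get('*') ?? { allow: [], disallow: [] };
}

// Longest matching prefix wins; when lengths tie, Allow takes precedence.
function isPathAllowed(rules: RobotsRules, path: string): boolean {
  const longest = (prefixes: string[]): number =>
    prefixes.filter(p => path.startsWith(p)).reduce((max, p) => Math.max(max, p.length), -1);
  return longest(rules.allow) >= longest(rules.disallow);
}
```

A fetch tool could call `isPathAllowed(rules, url.pathname)` before each request and return an `isError` result naming robots.txt as the reason when the answer is no.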
Monitoring web-fetching MCP servers
Web fetch tools fail in two distinct layers: the MCP transport layer and the external HTTP layer. A network policy change that blocks outbound HTTP makes every fetch_url call return isError: true, but the server responds normally to the MCP protocol handshake. A rotated or expired search API key makes every search_web call fail with HTTP 401, yet tools/list still returns the tool as available, so the failure stays invisible to protocol-level checks.
Use structured health checks that actually exercise the external dependencies: a canary tool call that fetches a known-good URL (your own homepage, a stable docs page) confirms end-to-end connectivity. AliveMCP probes your MCP endpoint every 60 seconds using the full protocol handshake, catching transport-level failures before users encounter broken web search in their agents.
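The two failure layers can be told apart by probing them in order: first the protocol, then a canary tool call. The sketch below takes both probes as injected functions so it stays independent of any particular MCP client; `probeServer`, `ProbeStatus`, and the parameter names are hypothetical.

```typescript
type ProbeStatus = 'healthy' | 'transport_down' | 'external_down';

interface CanaryResult { isError?: boolean }

// Classify server health by checking the MCP layer first, then the external HTTP layer.
async function probeServer(
  listTools: () => Promise<unknown>,        // e.g. a tools/list request through an MCP client
  callCanary: () => Promise<CanaryResult>,  // e.g. fetch_url against a known-good page
): Promise<ProbeStatus> {
  try {
    await listTools();
  } catch {
    // The server did not answer at the protocol level
    return 'transport_down';
  }
  try {
    const result = await callCanary();
    // The protocol works; an error here points at outbound HTTP or an API key
    return result.isError ? 'external_down' : 'healthy';
  } catch {
    return 'external_down';
  }
}
```

Reporting the two states separately tells whoever is on call whether to look at the MCP deployment or at outbound network policy and API credentials.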