Implementation guide · 2026-06-12 · Real-world MCP tools
Building Real-World MCP Tools: Filesystem, Web, Databases, Code Execution, and APIs
Most MCP tutorial examples are self-contained: a get_weather tool that calls an API, a calculator tool that does arithmetic. These are clean to reason about. Real MCP tools are different — they reach outside the process boundary to touch the filesystem, the network, a database, a container runtime, or a third-party API. When tool inputs arrive as LLM-generated strings, each of those external interactions becomes a potential attack vector: path traversal for filesystem tools, SSRF for web fetchers, SQL injection for database queries, sandbox escape for code execution, credential leakage for API wrappers. Every tool category has a different attack surface but they all reduce to the same root cause: unsanitized LLM-provided input reaching an external system. This guide synthesizes the filesystem, web search, code execution, database, and API wrapper tool patterns into a unified framework — and explains a second cross-cutting problem that security alone cannot solve: invisible failure modes that break tool execution while leaving the MCP transport layer healthy.
TL;DR
- Filesystem tools: path traversal is the primary risk. Every path argument must pass through assertSafePath(): path.resolve(arg) + check that the result starts with ALLOWED_ROOT + path.sep. Atomic writes via .tmp-{pid}-{timestamp} rename prevent partial-file reads. See the full filesystem guide for the read guard, depth-limited listing, and file resources pattern.
- Web / fetch tools: SSRF is the primary risk. Never trust the URL argument: resolve the hostname to an IP address first, then block RFC 1918 + link-local + loopback ranges before connecting. Content-type validation + byte limit enforcement prevent response exfiltration and OOM. See the full web-fetch guide for DNS rebinding defense and the 5-minute LRU response cache.
- Code execution tools: sandbox escape is the primary risk. eval() provides no isolation; vm.Script can be escaped via prototype chains. Real isolation requires Docker with six explicit flags: --network none, --memory 256m --memory-swap 256m, --cpus 0.5, --read-only, --security-opt no-new-privileges, --pids-limit 64. See the full code execution guide for the per-language container setup and partial-result streaming pattern.
- Database tools: SQL injection is the primary risk. Parameterized queries at the driver level eliminate injection; never build SQL strings by concatenation. Pair with a read-only database user and LIMIT injection (SELECT * FROM (user_sql) _q LIMIT max_rows). See the full database guide for the schema-as-resource pattern and EXPLAIN cost guard.
- API wrapper tools: credential leakage is the primary risk. Never accept API keys as tool arguments, because tool inputs are retained in context windows and call histories. Inject credentials server-side in the shared apiFetch() wrapper. Pair with a circuit breaker to avoid cascading failures when upstream APIs degrade. See the full API wrapper guide for the rate limiter and mapApiError() pattern.
- Universal failure modes: all five categories share an invisible failure pattern. When an external dependency breaks, tool calls return isError: true but the MCP transport (initialize, tools/list) remains healthy. Any monitor that only checks transport liveness will show green while every tool is broken. External protocol monitoring that calls the tools is the only way to catch this.
Pattern 1: Unsanitized LLM input reaching an external system
The common thread across all real-world MCP tool categories is not the technology stack — it is the trust boundary. In each case, input that the LLM generates (a file path, a URL, a code string, a SQL query fragment, a key name) crosses a boundary into an external system that enforces its own rules. If that input is not validated before crossing the boundary, the external system does what it does with any input — which may not be what the developer intended.
The five categories map to five distinct external system types, each with a characteristic attack surface:
| Tool category | External system | Primary attack vector | Root cause | Defense |
|---|---|---|---|---|
| Filesystem | OS file API | Path traversal (../../etc/passwd) | Relative path segments bypass intended root | path.resolve() + allowed-root prefix check with path.sep suffix |
| Web / fetch | HTTP client + DNS | SSRF to internal network or metadata APIs | DNS resolution happens after URL validation, enabling rebinding | Resolve hostname to IP first; block RFC 1918 + link-local + loopback after resolution |
| Database | SQL engine | SQL injection via string interpolation | Query built by concatenation instead of parameterized binding | Parameterized queries at driver level; read-only DB user; LIMIT injection wrapper |
| Code execution | Runtime / OS | Sandbox escape to host filesystem or network | eval()/vm.Script share process + prototype chain with host | Docker with --network none, --read-only, --security-opt no-new-privileges, memory + PID limits |
| API wrapper | Third-party API | Credential leakage via tool argument logging | API keys passed as arguments appear in context window + call history | Server-side auth injection; never accept keys as tool parameters |
The unifying frame: LLMs are good at generating structured output, but they are not security-aware. A model asked to "read the user's SSH keys" may generate ../../../home/user/.ssh/id_rsa as a file path argument — not because it is malicious, but because that is the answer to the question. The MCP server is responsible for rejecting that input before it reaches the OS. The attack surface is the gap between what the LLM can generate and what the external system will accept.
Filesystem tools: path traversal defense in depth
A filesystem MCP server gives an LLM read and write access to files — which is exactly why it needs the most conservative input validation of any tool category. The core invariant is simple: every file operation must stay within an explicitly declared root directory. The implementation is subtle: path.startsWith(ALLOWED_ROOT) is wrong because /workspace-evil starts with /workspace. The correct check appends path.sep to the root before comparing:
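const path = require('node:path');

// Allowed root as an absolute path (WORKSPACE_DIR is assumed to be set at startup)
const WORKSPACE = path.resolve(process.env.WORKSPACE_DIR);

function assertSafePath(userInput) {
  // Resolve relative to the workspace, not the process working directory
  const resolved = path.resolve(WORKSPACE, userInput);
  if (!resolved.startsWith(WORKSPACE + path.sep) && resolved !== WORKSPACE) {
    throw new Error('Access denied: path outside workspace');
  }
  return resolved;
}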
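path.resolve() collapses all ../ segments before the prefix check runs, so no number of ../ levels can carry the result past the check. The path.sep suffix prevents /workspace-evil from being accepted as a match for /workspace. The check is purely lexical, so a symlink inside the workspace that points outside it needs separate handling, for example by running the same prefix check on the output of fs.realpath().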
Beyond path validation, production filesystem tools need three additional patterns: stat-before-read guards that check file size before reading (prevents OOM from multi-GB files), atomic write patterns using .tmp-{pid}-{timestamp} rename (prevents partial-file reads during writes), and depth-limited directory listings that return relative paths (absolute paths leak server directory structure to the LLM context). The MCP resources API is the right transport for file content injection into context — use file:// URIs as resources rather than read_file tool calls for large reference documents that the LLM needs throughout a session.
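A minimal sketch of the stat-before-read guard and the atomic write pattern described above. The function names and the 1 MiB default are illustrative assumptions, not values from the filesystem guide, and both functions assume the path has already passed assertSafePath().

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Assumed cap for illustration; tune per deployment.
const MAX_READ_BYTES = 1024 * 1024;

// Stat-before-read: refuse oversized files before any bytes are loaded.
function readFileGuarded(safePath, maxBytes = MAX_READ_BYTES) {
  const stat = fs.statSync(safePath);
  if (!stat.isFile()) {
    throw new Error('Not a regular file');
  }
  if (stat.size > maxBytes) {
    throw new Error(`File too large: ${stat.size} bytes (limit ${maxBytes})`);
  }
  return fs.readFileSync(safePath, 'utf8');
}

// Atomic write: write a sibling temp file, then rename it over the target.
// The temp file lives in the same directory so the rename stays on one filesystem.
function writeFileAtomic(safePath, contents) {
  const tmp = path.join(
    path.dirname(safePath),
    `.tmp-${process.pid}-${Date.now()}`
  );
  try {
    fs.writeFileSync(tmp, contents);
    fs.renameSync(tmp, safePath);
  } catch (err) {
    fs.rmSync(tmp, { force: true }); // best-effort cleanup of the temp file
    throw err;
  }
}
```

Readers of the target path see either the old contents or the new contents, because the rename replaces the directory entry in one step on POSIX filesystems. Behavior on other platforms and on network filesystems may differ and is worth testing separately.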
Web fetch tools: SSRF and DNS rebinding
SSRF (Server-Side Request Forgery) is the web-fetch equivalent of path traversal: an attacker supplies a URL that causes the server to make requests to internal infrastructure the attacker cannot reach directly. The classic defense — checking whether the URL's hostname is a private IP address — fails against DNS rebinding attacks, where a public hostname is resolved to a private IP after the hostname check passes.
The correct defense resolves the hostname to an IP address first, then checks the IP against blocked ranges before making the connection:
const dns = require('node:dns');

async function assertSafeUrl(rawUrl) {
  const url = new URL(rawUrl); // throws on malformed input
  if (!['http:', 'https:'].includes(url.protocol)) {
    throw new Error('Only http/https allowed');
  }
  // Check every A record returned, not just the first one
  const ips = await dns.promises.resolve4(url.hostname);
  const blocked = ips.find(isBlockedIp);
  if (blocked) {
    throw new Error(`Blocked: ${url.hostname} resolves to private IP ${blocked}`);
  }
  return url;
}

function isBlockedIp(ip) {
  // RFC 1918, loopback, link-local, and the reserved 240.0.0.0/4 block
  return /^(10\.|172\.(1[6-9]|2\d|3[01])\.|192\.168\.|127\.|169\.254\.|24[0-9]\.|25[0-5]\.)/.test(ip);
}
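The regex covers the RFC 1918 private ranges, loopback, link-local addresses (169.254.x.x, which includes the cloud instance metadata endpoint at 169.254.169.254), and the reserved 240.0.0.0/4 block. Resolving inside the security check closes the gap left by hostname-only validation, but DNS rebinding is fully defeated only if the subsequent connection goes to the IP address that was validated. If the HTTP client resolves the hostname a second time on its own, that second lookup can return a different address. See the SSRF prevention guide for the full blocked CIDR list and the IPv6 equivalent patterns.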
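As an alternative to a hand-written regex, the same IPv4 ranges can be expressed as CIDR blocks with Node's built-in net.BlockList. This is a sketch under the assumption that the tool only connects over IPv4; the function name isBlockedIpv4 is hypothetical, and the list mirrors only the ranges named above, not the full blocklist from the SSRF guide.

```javascript
const net = require('node:net');

// CIDR form of the ranges covered by the regex version.
const blockedRanges = new net.BlockList();
blockedRanges.addSubnet('10.0.0.0', 8, 'ipv4');     // RFC 1918
blockedRanges.addSubnet('172.16.0.0', 12, 'ipv4');  // RFC 1918
blockedRanges.addSubnet('192.168.0.0', 16, 'ipv4'); // RFC 1918
blockedRanges.addSubnet('127.0.0.0', 8, 'ipv4');    // loopback
blockedRanges.addSubnet('169.254.0.0', 16, 'ipv4'); // link-local, incl. metadata endpoint
blockedRanges.addSubnet('240.0.0.0', 4, 'ipv4');    // reserved

function isBlockedIpv4(ip) {
  // Fail closed: anything that is not a well-formed IPv4 literal is rejected.
  if (net.isIP(ip) !== 4) {
    return true;
  }
  return blockedRanges.check(ip, 'ipv4');
}
```

Working with parsed addresses instead of string prefixes makes each range easy to audit against its CIDR definition, and the fail-closed branch means a malformed resolver result is treated as blocked.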
Beyond SSRF defense, production web-fetch tools need response size limits (500 KB default, checked against Content-Length before reading), AbortController timeouts (10s default), content-type validation, HTML-to-text stripping that removes <script> and <style> blocks before tag stripping, and a URL-keyed LRU response cache (5-minute TTL, 500-entry limit, never caching errors). Per-domain rate limiting at 1 request per second prevents the tool from being used to DoS third-party sites.
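The response cache described above can be sketched with a plain Map, which preserves insertion order and therefore doubles as a recency list. The class name TtlLruCache and the injectable clock are assumptions made for illustration; the defaults follow the 5-minute TTL and 500-entry limit mentioned in the text.

```javascript
class TtlLruCache {
  constructor({ maxEntries = 500, ttlMs = 5 * 60 * 1000, now = Date.now } = {}) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, useful for tests
    this.map = new Map(); // first key is always the least recently used
  }

  get(url) {
    const entry = this.map.get(url);
    if (!entry) {
      return undefined;
    }
    if (this.now() - entry.storedAt > this.ttlMs) {
      this.map.delete(url); // expired
      return undefined;
    }
    // Re-insert to mark this key as most recently used.
    this.map.delete(url);
    this.map.set(url, entry);
    return entry.value;
  }

  set(url, value) {
    this.map.delete(url);
    this.map.set(url, { value, storedAt: this.now() });
    if (this.map.size > this.maxEntries) {
      const oldestKey = this.map.keys().next().value;
      this.map.delete(oldestKey);
    }
  }
}
```

Errors stay out of the cache as long as the fetch tool only calls set() after a successful response, so a transient upstream failure is retried on the next call instead of being served for five minutes.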
Code execution tools: why eval() and vm.Script are not sandboxes
The temptation with code execution tools is to reach for eval() or Node.js's built-in vm.Script because they are readily available and easy to implement. Both provide weak or no isolation:
| Approach | Isolation level | Known escape | Filesystem access | Network access |
|---|---|---|---|---|
| eval() | None | Full process access by definition | Full | Full |
| vm.Script (sandbox: false) | Scope only | Prototype chain escape via constructor.constructor | Full | Full |
| vm.Script (sandbox: true) | Scope + prototype | Context isolation escape via shared built-ins | Full (if require exposed) | Full (if require exposed) |
| Docker with flags | Full container isolation | None (with correct flags) | Read-only volume only | None (--network none) |
The six Docker flags that make container isolation real: --network none (no egress; blocks all network calls from executed code), --memory 256m --memory-swap 256m (equal values leave no swap, so the container is OOM-killed instead of swapping to disk), --cpus 0.5 (CPU quota prevents host starvation), --read-only (root filesystem immutable except explicit volume mounts), --security-opt no-new-privileges (no setuid/setgid escalation; docker run takes this as a security option), --pids-limit 64 (fork bomb prevention). All six are necessary: omitting any one reopens a specific escape or resource-exhaustion path.
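One way to keep the flag set from drifting is to build the docker run argument list in a single function. The sketch below assumes a hypothetical buildDockerArgs() helper and a host directory that holds the code to run; the argument array is intended for child_process.execFile, which passes arguments without shell expansion.

```javascript
// Builds the argv for `docker run` with the isolation flags in one place.
// image, hostCodeDir, and command are supplied by the caller (illustrative names).
function buildDockerArgs({ image, hostCodeDir, command }) {
  return [
    'run', '--rm',
    '--network', 'none',                          // no egress
    '--memory', '256m', '--memory-swap', '256m',  // hard memory cap, no swap
    '--cpus', '0.5',                              // CPU quota
    '--read-only',                                // immutable root filesystem
    '--security-opt', 'no-new-privileges',        // no setuid/setgid escalation
    '--pids-limit', '64',                         // fork bomb prevention
    '-v', `${hostCodeDir}:/sandbox:ro`,           // code mounted read-only
    image,
    ...command,
  ];
}
```

A caller would pass the result to execFile('docker', args, { timeout, maxBuffer }, callback) with limits of its own choosing. Keeping the flags in one array also makes it straightforward to assert in a unit test that none of them has been dropped.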
The full code execution guide covers the per-language image setup (Python, Node, bash), the execFile-with-timeout pattern (no shell expansion), volume mounting code read-only at /sandbox, the finally block for tmpDir cleanup, and pre-pulling images at server startup to avoid cold-start latency on the first tool call. For long-running computations, the partial-result pattern emits intermediate output as MCP notifications rather than blocking the tool call until completion.
Database tools: parameterized queries and the read-only user
SQL injection is one of the oldest vulnerabilities in software, and it appears in MCP database tools through the same mechanism it always has: a query built by string interpolation instead of parameterized binding. The correct defense is driver-level parameterized queries — no string concatenation, no template literals embedding user input into SQL:
// Wrong: SQL injection vector (values and identifiers interpolated into the SQL text)
const unsafeRows = await db.query(`SELECT * FROM ${table} WHERE id = ${id}`);

// Correct: parameterized binding (pg driver, $n placeholders; result rows are on .rows)
const { rows: pgRows } = await client.query(
  'SELECT * FROM items WHERE id = $1 AND owner = $2',
  [id, ownerId]
);

// Correct: better-sqlite3 (synchronous, positional ? bindings)
const sqliteRows = db.prepare(
  'SELECT * FROM items WHERE id = ? AND owner = ?'
).all(id, ownerId);
The parameterized form is not optional based on risk assessment — it applies to every query including queries that look safe, because LLM-generated inputs are not predictable and injection can be composed across multiple tool calls within a session. See the full database guide for the parameterized query syntax table across postgres.js, node-postgres, better-sqlite3, Prisma, and Drizzle.
Beyond parameterized queries, three additional patterns complete the production database tool: a read-only database user (the MCP server's credentials cannot write if they are read-only at the database level — injection that attempts to write fails at the permission level even if it bypasses the parameterized query check), LIMIT injection wrapping any user-supplied query in SELECT * FROM (user_sql) _q LIMIT max_rows (LLMs generating unbounded scans is a real operational problem), and the schema-as-resource pattern exposing table definitions via db://schema/overview and db://schema/tables/{tableName} URIs so the LLM can explore the schema without making tool calls that count against rate limits.
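The LIMIT wrapper and the SELECT-only guard can be sketched as two small string functions. The names wrapWithLimit and isSelectStatement follow the checklist later in this guide, but the implementation here is an assumption: it is a deliberately coarse filter, and the read-only database user remains the real backstop.

```javascript
function stripTrailingSemicolon(sql) {
  return sql.trim().replace(/;\s*$/, '').trim();
}

// Coarse guard: a single statement that starts with SELECT.
// Any remaining semicolon is rejected, even inside a string literal,
// which is conservative but avoids parsing SQL by hand.
function isSelectStatement(sql) {
  const inner = stripTrailingSemicolon(sql);
  if (inner.includes(';')) {
    return false;
  }
  return /^select\b/i.test(inner);
}

function wrapWithLimit(userSql, maxRows = 100) {
  if (!Number.isInteger(maxRows) || maxRows <= 0) {
    throw new RangeError('maxRows must be a positive integer');
  }
  if (!isSelectStatement(userSql)) {
    throw new Error('Only a single SELECT statement is allowed');
  }
  const inner = stripTrailingSemicolon(userSql);
  // Newlines around the inner query keep a trailing "--" comment in the
  // user SQL from swallowing the closing parenthesis and the LIMIT clause.
  return `SELECT * FROM (\n${inner}\n) _q LIMIT ${maxRows}`;
}
```

maxRows is a server-side integer that is validated before interpolation, so it never carries model-generated text into the SQL string.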
API wrapper tools: server-side auth injection
API wrapper tools are the most tempting to implement insecurely because the natural API design, accepting api_key as a tool parameter, matches how users think about authentication. The problem is that tool arguments are recorded throughout the LLM stack. API keys passed as tool arguments appear in:
- The LLM context window (visible to subsequent reasoning steps and potentially included in summarized outputs)
- Structured call logs from every major LLM provider's API logging
- MCP server logs if the server logs tool call arguments for debugging
- Any observability tooling that captures tool inputs (tracing, metrics, replay tools)
The correct pattern injects credentials server-side in a shared fetch wrapper, never surfacing them to the tool parameter schema:
// Credentials loaded once at server startup, never in tool parameters
const GITHUB_TOKEN = process.env.GITHUB_TOKEN;

async function githubFetch(path, options = {}) {
  const response = await fetch(`https://api.github.com${path}`, {
    ...options,
    headers: {
      'Authorization': `Bearer ${GITHUB_TOKEN}`,
      'Accept': 'application/vnd.github.v3+json',
      ...options.headers,
    },
  });
  return mapGithubError(response);
}

// Tool parameter schema has no api_key field
server.tool('list_github_issues', {
  // Constrain repo to owner/repo so it cannot add path segments or a query string
  // (conservative: also rejects names that start with a dot)
  repo: z.string()
    .regex(/^[A-Za-z0-9_-][\w.-]*\/[A-Za-z0-9_-][\w.-]*$/)
    .describe('owner/repo format'),
  state: z.enum(['open', 'closed', 'all']).default('open'),
}, async ({ repo, state }) => {
  const issues = await githubFetch(`/repos/${repo}/issues?state=${state}`);
  return { content: [{ type: 'text', text: JSON.stringify(issues) }] };
});
See the authentication guide and the full API wrapper guide for the complete error mapping pattern, rate limiter implementation (token bucket at 60 req/min), and the circuit breaker pattern (closed/open/half-open states, failure threshold of 5, 30-second recovery window) that prevents a degraded upstream API from causing every tool call in a session to hang until timeout.
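The circuit breaker state machine mentioned above can be sketched in a few lines. This is a simplified illustration, not the implementation from the API wrapper guide: the class and method names are assumptions, and the half-open state here lets every request through until one result is recorded, where a stricter version would admit a single trial request.

```javascript
class CircuitBreaker {
  constructor({ failureThreshold = 5, recoveryMs = 30_000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.recoveryMs = recoveryMs;
    this.now = now; // injectable clock, useful for tests
    this.state = 'closed';
    this.failures = 0;
    this.openedAt = 0;
  }

  // Call before each upstream request; false means fail fast.
  canRequest() {
    if (this.state === 'open' && this.now() - this.openedAt >= this.recoveryMs) {
      this.state = 'half-open'; // recovery window elapsed, allow a trial
    }
    return this.state !== 'open';
  }

  recordSuccess() {
    this.state = 'closed';
    this.failures = 0;
  }

  recordFailure() {
    this.failures += 1;
    // A failed trial reopens immediately; otherwise open at the threshold.
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.state = 'open';
      this.openedAt = this.now();
    }
  }
}
```

A shared fetch wrapper would check canRequest() first and return an LLM-readable "upstream temporarily unavailable" error when it is false, then call recordSuccess() or recordFailure() based on the upstream response.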
Pattern 2: Invisible failure modes
Security hardening addresses what happens when inputs are malicious. The second cross-cutting concern addresses what happens when external dependencies break — not due to attack, but due to normal operational failures: a disk fills up, a network policy changes, a database password rotates, a Docker daemon crashes, a third-party API subscription lapses. Each of these breaks tool execution in a way that is invisible to standard health checks.
The MCP protocol has a natural internal health surface: the initialize handshake and the tools/list response. Both are serviced by the MCP server process itself, with no dependency on any external system. A server with a full disk, a blocked outbound network, a broken database connection, a crashed Docker daemon, and an expired API key will still:
- Accept initialize and return its server info
- Respond to tools/list with a full list of registered tools
- Return HTTP 200 to any health check endpoint the server exposes
Only when a tool is actually called does the external dependency failure surface — as an isError: true response. The failure-to-health-check gap for each category:
| Tool category | External dependency | Failure scenario | Tool response | Transport response |
|---|---|---|---|---|
| Filesystem | OS filesystem | Disk full (ENOSPC on write) | isError: true | initialize: healthy |
| Filesystem | Workspace mount | WORKSPACE_DIR env var misconfigured | All calls: isError: true | tools/list: healthy |
| Web / fetch | Outbound network | Egress firewall rule blocks outbound HTTP | isError: true (timeout or connection refused) | initialize: healthy |
| Database | Database server | Password rotated; connection pool exhausted | All queries: isError: true | tools/list: healthy |
| Code execution | Docker daemon | Docker daemon crashed or socket permissions changed | isError: true (ENOENT on /var/run/docker.sock) | initialize: healthy |
| API wrapper | Third-party API | API key expired or subscription lapsed (HTTP 401/403) | All calls: isError: true | tools/list: healthy |
The pattern is consistent across all five categories: the MCP transport layer is decoupled from the external dependencies that tools rely on. This is correct MCP protocol behavior — a server that can still process the protocol should still respond to protocol-level requests. But it means that the de facto "is this server up?" check — pinging initialize — is an incomplete health signal for real-world tool servers.
What internal health checks cannot see
The standard recommendation for MCP server health checks is to expose an HTTP /health endpoint that verifies transport liveness and, optionally, database connectivity. This catches a useful class of failures — the server process has crashed, the database is unreachable at the network level. It does not catch:
- Permission-level database failures: if the database is reachable but the user's password has been rotated, a SELECT 1 health check using a separate admin connection will pass while all tool queries using the application user fail with authentication errors.
- Misconfigured environment variables: if WORKSPACE_DIR points to a path that does not exist, the server starts cleanly, passes its health check, and fails every filesystem tool call.
- Container daemon failures: if the Docker daemon crashes after the server starts, /health returns 200 while every execute_code call fails.
- Upstream API outages: if a third-party API returns 503, the MCP server health check is unaffected. Every tool call that reaches that API fails until the upstream recovers.
- Network policy regressions: if a firewall rule change blocks outbound HTTP on a new deployment but not the existing deployment that passed health checks at startup, web fetch tools fail silently on the new instance.
The only way to detect these failures is by calling the actual tools with representative inputs and observing whether they succeed or return isError: true. This is fundamentally an external probe — it cannot be done from inside the server process, because the server process cannot observe whether its own tool calls are working without making them. It requires a monitor that speaks the MCP protocol, connects as a client, calls tools with safe test inputs, and alerts when the tool response is an error or when the tool takes longer than expected.
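The decision logic of such a probe is small enough to sketch. In the example below, callTool is a hypothetical function supplied by the caller that wraps whatever MCP client is in use and resolves to the tool result; the status names and the 5-second default latency budget are also assumptions for illustration.

```javascript
// Classifies one probe observation. `result` is the tool call result object.
function evaluateProbe(toolName, result, latencyMs, { maxLatencyMs = 5000 } = {}) {
  if (!result || result.isError === true) {
    return { tool: toolName, status: 'failing', reason: 'tool returned isError: true' };
  }
  if (latencyMs > maxLatencyMs) {
    return { tool: toolName, status: 'degraded', reason: `latency ${latencyMs}ms over ${maxLatencyMs}ms` };
  }
  return { tool: toolName, status: 'healthy', reason: null };
}

// Runs one probe. callTool(toolName, args) is provided by the caller
// (hypothetical signature) so this sketch does not depend on a client library.
async function probeTool(callTool, toolName, safeArgs, options) {
  const started = Date.now();
  try {
    const result = await callTool(toolName, safeArgs);
    return evaluateProbe(toolName, result, Date.now() - started, options);
  } catch (err) {
    // Transport-level failures count as failing too.
    return { tool: toolName, status: 'failing', reason: `call threw: ${err.message}` };
  }
}
```

Keeping the classification separate from the call makes the thresholds testable without a live server, and the per-tool result objects map directly onto a per-tool health matrix.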
This is the architecture behind AliveMCP's monitoring: rather than pinging initialize and declaring the server healthy, AliveMCP's probes speak the full MCP protocol, call registered tools with safe test inputs, measure response latency per tool, and alert when any tool's error rate or latency exceeds its configured threshold. The probe results appear on the server's public status page as a per-tool health matrix, not just a single green/red indicator for the server.
Building the two-layer validation strategy
Given the two cross-cutting concerns — input security and invisible failure modes — a production real-world MCP server needs a two-layer validation strategy at development time and a two-layer monitoring strategy at production time.
Development: security validation + behavior validation
Security validation tests that bad inputs are correctly rejected. For each tool category, the critical tests are:
- Filesystem: ../../../etc/passwd → throws path-outside-workspace error
- Web fetch: http://169.254.169.254/latest/meta-data/ → throws SSRF blocked error
- Database: '; DROP TABLE items; -- as query fragment → parameterized binding makes this a literal string value, not SQL
- Code execution: code that reads /etc/passwd → sees only the container image's copy, never the host's; code that writes to the root filesystem or opens a network connection → fails inside the container
- API wrapper: the tool schema exposes no api_key parameter, and a call without one passes validation
Behavior validation tests that valid inputs succeed and that error responses are LLM-readable. An isError: true response that says "Internal error: ENOENT" is correct but not useful. An isError: true response that says "File not found: /workspace/reports/q2.csv — check that the filename is correct and the file has been uploaded" is what an LLM can act on.
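A small mapping function is usually enough to turn raw error codes into messages an LLM can act on. The sketch below assumes Node-style errors with a code property and a hypothetical toToolError() helper; the wording of each hint is illustrative, and displayPath is expected to be the workspace-relative path so absolute server paths stay out of the context.

```javascript
// Maps a filesystem error to an MCP tool result the model can act on.
function toToolError(err, { displayPath }) {
  const hints = {
    ENOENT: `File not found: ${displayPath}. Check that the filename is correct and the file has been uploaded.`,
    EACCES: `Permission denied: ${displayPath}. The server is not allowed to access this file.`,
    ENOSPC: `Could not write ${displayPath}: the workspace disk is full. Delete unused files or ask the operator to free space.`,
  };
  const text = hints[err.code]
    ?? `Unexpected error while accessing ${displayPath}. Retrying is unlikely to help; report this to the server operator.`;
  return { isError: true, content: [{ type: 'text', text }] };
}
```

The fallback branch deliberately omits the raw error message, which may contain absolute paths or other internals; the full error can still be written to the server log.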
Production: transport liveness + tool execution monitoring
Transport liveness monitoring — the /health endpoint check — catches process crashes and database network failures. Tool execution monitoring catches everything else. The practical implementation for self-monitoring is a startup probe that calls each tool with safe test inputs and logs the results at INFO level. This catches misconfigured environment variables and permission regressions at deploy time rather than during the first real user tool call.
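For the filesystem category, a startup probe can be as simple as a write/read round trip in the workspace. This is a minimal sketch with a hypothetical probeWorkspace() helper; probes for the other categories would follow the same shape with their own safe test inputs.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Writes, reads back, and removes a small file to verify that the
// workspace exists, is mounted, and is writable by the server process.
function probeWorkspace(workspaceDir) {
  const probeFile = path.join(workspaceDir, `.probe-${process.pid}-${Date.now()}`);
  try {
    fs.writeFileSync(probeFile, 'ok');
    const roundTrip = fs.readFileSync(probeFile, 'utf8');
    return roundTrip === 'ok'
      ? { ok: true, detail: 'workspace writable' }
      : { ok: false, detail: 'read-back mismatch' };
  } catch (err) {
    return { ok: false, detail: `${err.code ?? 'ERROR'}: ${err.message}` };
  } finally {
    fs.rmSync(probeFile, { force: true }); // no-op if the write never happened
  }
}
```

Logging the returned object at startup surfaces a misconfigured WORKSPACE_DIR before the first real tool call, and failing the deploy on ok: false is a reasonable stricter option.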
For ongoing production monitoring, the startup probe pattern does not scale — it only runs at startup, not continuously, and it does not notify anyone when a tool starts failing mid-deployment. External monitoring is necessary for tools that depend on external systems that can fail independently of the server process. See the error handling guide for the structured error taxonomy and the mapping from tool-level errors to alert severity levels.
Implementation checklist by tool category
Use this checklist when building any real-world MCP tool. Each item addresses either the security pattern or the invisible failure pattern for that category.
Filesystem tools
- ☐ assertSafePath() with path.resolve() + ALLOWED_ROOT + path.sep prefix check on every path argument
- ☐ Stat-before-read guard checking file size against MAX_READ_BYTES
- ☐ Atomic writes using .tmp-{pid}-{timestamp} → fs.rename()
- ☐ Directory listing returns relative paths with depth limit (1–5)
- ☐ delete_file requires confirm: true literal parameter
- ☐ Startup probe writes and reads a test file to verify workspace is writable and correctly mounted
Web / fetch tools
- ☐ assertSafeUrl() resolves hostname to IP via dns.promises.resolve4() before connection
- ☐ isBlockedIp() rejects RFC 1918, loopback, 169.254.x.x, reserved ranges
- ☐ Protocol allowlist: only http: and https:
- ☐ AbortController with 10s timeout on every fetch
- ☐ Response size limit checked against Content-Length + streaming truncation
- ☐ URL-keyed LRU response cache (5-min TTL, 500 entries, errors not cached)
- ☐ Per-domain rate limiter at 1 req/sec
- ☐ Startup probe fetches a known-good URL to verify egress is allowed
Database tools
- ☐ All queries use parameterized bindings — zero string interpolation
- ☐ Database user is read-only (SELECT only, no INSERT/UPDATE/DELETE/DDL)
- ☐ User-supplied query wrapped in SELECT * FROM (…) _q LIMIT max_rows
- ☐ isSelectStatement() guard rejects non-SELECT queries
- ☐ SET LOCAL statement_timeout = '5s' inside each query's transaction (SET LOCAL lasts only until the transaction ends)
- ☐ Schema exposed as MCP resources (db://schema/overview, db://schema/tables/{name})
- ☐ Startup probe executes SELECT 1 with application credentials (not admin credentials)
Code execution tools
- ☐ Docker with all six flags: --network none, --memory 256m --memory-swap 256m, --cpus 0.5, --read-only, --security-opt no-new-privileges, --pids-limit 64
- ☐ Code mounted read-only at /sandbox, not copied into image
- ☐ execFile with explicit timeout and maxBuffer limit (no shell expansion)
- ☐ Tmpdir cleaned in finally block regardless of success/failure
- ☐ Container images pre-pulled at startup to avoid cold-start latency
- ☐ Startup probe runs a trivial program (print 42) to verify Docker daemon is accessible and functional
API wrapper tools
- ☐ Credentials loaded from environment variables at startup — never accepted as tool parameters
- ☐ Shared apiFetch() wrapper injects auth headers server-side
- ☐ One tool per API operation (not a generic call_api tool)
- ☐ mapApiError() converts HTTP status codes to LLM-readable error messages
- ☐ Rate limiter (token bucket) per API to stay within upstream rate limits
- ☐ Circuit breaker (failure threshold 5, 30s recovery) to fail fast during sustained outages
- ☐ Startup probe makes a low-cost API call (e.g., GET /user, GET /ping) to verify credential validity
Related guides
- Filesystem MCP server guide — path traversal defense, atomic writes, file resources
- Web search and HTTP fetch MCP tools — SSRF defense, DNS rebinding, LRU caching
- Code execution MCP server — Docker isolation, sandbox flags, partial results
- Database tools MCP server — parameterized queries, read-only user, schema resources
- API wrapper MCP server — server-side auth, circuit breaker, rate limiting
- SSRF prevention for MCP servers — full CIDR blocklist, IPv6 defense
- MCP server authentication — Bearer tokens, API key validation, mTLS
- Circuit breaker pattern for MCP servers — state machine, failure threshold, recovery
- MCP server error handling — isError taxonomy, LLM-readable messages, structured errors
- MCP server health checks — /health endpoint, liveness vs readiness, monitoring gap
- MCP resources API — file:// URIs, db:// URIs, resource subscriptions