MCP server audit logging

Audit logging records every significant action on your MCP server — which user called which tool, with what arguments, what the result was, and how long it took. This trail is indispensable for security reviews, incident forensics, compliance reporting, and diagnosing unexpected behavior in production. For MCP servers specifically, tool calls are the most important events to capture: they are the interface between an LLM agent and your backend, and they carry real authority (read, write, delete, send).

TL;DR

Wrap every tool handler in middleware that emits a structured JSON log line with: timestamp, actor (the authenticated user/token identity), tool name, args (PII-redacted), outcome (ok or error), durationMs, and requestId for correlation. Redact fields like email, password, token, and ssn before writing. Ship logs to a separate storage location so a compromised server process cannot erase its own trail. Retain for 90 days minimum; 1 year for compliance workloads.

Why MCP servers need audit logs

MCP tool calls are not ordinary HTTP requests. An agent can chain dozens of tool calls in a single session — reading files, querying databases, sending messages, triggering deploys — with minimal human review of each individual step. This autonomy makes audit logs more important, not less:

Post-incident forensics — when a production record is deleted unexpectedly, audit logs tell you which agent session called which tool, with which arguments, at what time
Compliance — SOC 2, HIPAA, and ISO 27001 controls require evidence that privileged actions are logged and monitored
Abuse detection — anomalous tool call volumes, unusual argument patterns, or calls from unexpected IP ranges show up in logs before they show up in user complaints
Debugging agent behavior — when an agent produces a surprising output, the audit log of its tool calls is the ground truth of what actually happened

What to capture per tool call

Every audit log entry should contain enough information to answer: who did what to what, when, and what happened? The minimum viable field set:

Field	Type	Purpose
`timestamp`	ISO 8601 UTC	When the tool call was received (not completed)
`requestId`	UUID	Correlation ID, taken from an incoming HTTP header or generated per call; ties all log lines for one tool call together
`actor.id`	string	Authenticated user ID, API key fingerprint, or token sub claim — never the raw token
`actor.ip`	string	Client IP (trust X-Forwarded-For only behind a known proxy)
`tool`	string	Exact tool name as registered (e.g. `delete_file`)
`args`	object	Sanitized argument object — PII fields replaced with `[REDACTED]`
`outcome`	`ok` \| `error`	Whether the tool returned normally or threw
`error`	string \| null	Error message when outcome is `error` (truncate at 500 chars)
`durationMs`	integer	Tool execution time in milliseconds
`serverVersion`	string	Your server's version string — helps correlate behavior changes after deploys
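
The field set above maps naturally onto a type. The following is a minimal sketch in TypeScript, matching the rest of this guide; the `AuditEntry` name and the `makeEntry` helper are illustrative and not part of the MCP SDK:

```typescript
// Illustrative shape of one audit log line; field names follow the table above.
interface AuditEntry {
  timestamp: string; // ISO 8601 UTC, when the call was received
  requestId: string;
  actor: { id: string; ip: string };
  tool: string;
  args: Record<string, unknown>; // already redacted
  outcome: 'ok' | 'error';
  error: string | null;
  durationMs: number;
  serverVersion: string;
}

// Hypothetical helper: builds an entry from the receive time, the finish time,
// and the caught value (omit `failure` when the tool returned normally).
function makeEntry(
  base: Pick<AuditEntry, 'requestId' | 'actor' | 'tool' | 'args' | 'serverVersion'>,
  receivedAt: Date,
  finishedAt: Date,
  failure?: unknown,
): AuditEntry {
  const failed = failure !== undefined;
  return {
    ...base,
    timestamp: receivedAt.toISOString(),
    outcome: failed ? 'error' : 'ok',
    error: failed
      ? (failure instanceof Error ? failure.message : String(failure)).slice(0, 500)
      : null,
    durationMs: finishedAt.getTime() - receivedAt.getTime(),
  };
}
```

Typing the entry up front lets the compiler flag a log line that omits a field, which is easier than discovering the gap during an incident.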

Middleware pattern

Rather than adding logging to each individual tool handler, wrap handlers at registration time. The MCP SDK does not provide a built-in middleware hook, but you can achieve the same result by wrapping each handler function:

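import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { randomUUID } from 'node:crypto';
import { promises as fs } from 'node:fs'; // used by the delete_file example
import { z } from 'zod'; // used for the tool's argument schema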

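function auditLog(entry: object) {
  // Write to stdout as newline-delimited JSON (NDJSON).
  // Docker / systemd captures stdout and ships it to your log sink.
  // Caveat: with the stdio transport, stdout carries the MCP protocol
  // messages, so write audit lines to process.stderr instead.
  process.stdout.write(JSON.stringify(entry) + '\n');
}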

// Entries must be lowercase: keys are lowercased before the lookup
const PII_KEYS = new Set(['email', 'password', 'token', 'secret', 'ssn', 'phone', 'creditcard']);

function redactArgs(args: Record<string, unknown>): Record<string, unknown> {
  const result: Record<string, unknown> = {};
  for (const [key, val] of Object.entries(args)) {
    // Redact by key name match
    if (PII_KEYS.has(key.toLowerCase())) {
      result[key] = '[REDACTED]';
    } else if (typeof val === 'string' && val.length > 500) {
      // Truncate large blobs — likely file content, not useful in logs
      result[key] = val.slice(0, 200) + ' ... [TRUNCATED]';
    } else {
      result[key] = val;
    }
  }
  return result;
}

// Wrap a tool handler to emit audit log entries
function withAudit<TArgs extends object, TResult>(
  toolName: string,
  handler: (args: TArgs, context: any) => Promise<TResult>
): (args: TArgs, context: any) => Promise<TResult> {
  return async (args, context) => {
    // requestId and actor are assumed to be supplied by your transport/auth
    // layer; fall back to safe defaults when they are absent
    const requestId = String(context.requestId ?? randomUUID());
    const actor = context.actor ?? { id: 'anonymous', ip: 'unknown' };
    const start = Date.now();
    let outcome: 'ok' | 'error' = 'ok';
    let error: string | null = null;

    try {
      return await handler(args, context);
    } catch (err) {
      outcome = 'error';
      // Truncate Error messages and non-Error throwables alike
      error = (err instanceof Error ? err.message : String(err)).slice(0, 500);
      throw err;
    } finally {
      // finally runs on success and on failure, so each call logs exactly once
      auditLog({
        // Receive time, not completion time, per the field table
        timestamp: new Date(start).toISOString(),
        requestId,
        actor,
        tool: toolName,
        args: redactArgs(args as Record<string, unknown>),
        outcome,
        error,
        durationMs: Date.now() - start,
        serverVersion: process.env.SERVER_VERSION ?? 'unknown',
      });
    }
  };
}

// Usage
const server = new McpServer({ name: 'my-server', version: '1.0.0' });

server.tool(
  'delete_file',
  'Permanently delete a file from disk',
  { path: z.string() },
  withAudit('delete_file', async ({ path: filePath }, context) => {
    await fs.unlink(filePath);
    return { content: [{ type: 'text', text: `Deleted: ${filePath}` }] };
  })
);

PII redaction patterns

Arguments passed to MCP tools often contain user-supplied data. Before writing to the audit log, redact fields that could contain personal information. Key-name matching covers most cases, but pattern matching on values catches data that arrives in generically named fields:

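// Heuristic patterns: expect some false positives and tune them for your data
const EMAIL_RE = /\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b/g;
// Any 16 digits in groups of four, optionally separated by spaces or dashes
const CREDIT_CARD_RE = /\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b/g;
// Common credential prefixes followed by the secret itself
const TOKEN_RE = /\b(ghp_|sk-|Bearer |xoxb-)\S+/g;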

function redactStringValues(s: string): string {
  return s
    .replace(EMAIL_RE, '[EMAIL]')
    .replace(CREDIT_CARD_RE, '[CARD]')
    .replace(TOKEN_RE, '[TOKEN]');
}

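// Extended redactArgs: key-name matching plus value pattern matching.
// Replaces the earlier definition and keeps its truncation of long strings.
function redactArgs(args: Record<string, unknown>): Record<string, unknown> {
  const result: Record<string, unknown> = {};
  for (const [key, val] of Object.entries(args)) {
    if (PII_KEYS.has(key.toLowerCase())) {
      result[key] = '[REDACTED]';
    } else if (typeof val === 'string') {
      // Redact first, then truncate, so a secret is never cut in half and missed
      const clean = redactStringValues(val);
      result[key] = clean.length > 500 ? clean.slice(0, 200) + ' ... [TRUNCATED]' : clean;
    } else {
      // Non-string values pass through unchanged; nested objects are not inspected
      result[key] = val;
    }
  }
  return result;
}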
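
The `redactArgs` function above inspects only top-level keys, so a secret inside a nested object or array would pass through. One possible extension is sketched here, assuming arguments are plain JSON values. `redactDeep`, its depth limit, and the reduced key set and pattern are illustrative choices made so the block runs on its own:

```typescript
// Reduced copies of the key set and email pattern, for a self-contained example.
const SENSITIVE_KEYS = new Set(['email', 'password', 'token', 'secret', 'ssn']);
const EMAIL_PATTERN = /\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b/g;

// Recursively redacts sensitive keys and email-like strings in a JSON value.
function redactDeep(value: unknown, depth = 0): unknown {
  // Guard against pathologically deep input
  if (depth > 8) return '[TOO_DEEP]';
  if (typeof value === 'string') return value.replace(EMAIL_PATTERN, '[EMAIL]');
  if (Array.isArray(value)) return value.map((item) => redactDeep(item, depth + 1));
  if (value !== null && typeof value === 'object') {
    const out: Record<string, unknown> = {};
    for (const [key, val] of Object.entries(value)) {
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase())
        ? '[REDACTED]'
        : redactDeep(val, depth + 1);
    }
    return out;
  }
  // Numbers, booleans, null, and undefined pass through unchanged
  return value;
}
```

A sensitive key redacts its whole value, including any nested structure, which errs on the side of logging too little.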

Never log raw JWTs, API keys, or passwords, not even a truncated prefix. Log the sub claim from a verified JWT, or a fingerprint of an API key (for example, the first 8 hex characters of its SHA-256 hash), not the key itself.
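
One way to derive a fingerprint that reveals no characters of the key is to hash it first. This is a minimal sketch using Node's built-in `crypto` module; the `keyFingerprint` name and the 8-character length are illustrative choices:

```typescript
import { createHash } from 'node:crypto';

// Hypothetical helper: a short, stable identifier for an API key that
// contains none of the key's own characters.
function keyFingerprint(apiKey: string): string {
  return createHash('sha256').update(apiKey).digest('hex').slice(0, 8);
}
```

The same key always produces the same fingerprint, so log lines from one key can still be grouped. Eight hex characters are enough to tell keys apart in a log, though not enough to rule out collisions across a very large key population.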

Protecting the audit trail

An audit log that a compromised process can overwrite provides no forensic value. Several measures protect the trail:

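Write to stdout, not a local file: your container runtime or systemd captures stdout and ships it to a central log store outside the application's reach. The process cannot retroactively modify captured stdout. If your server uses the stdio transport, stdout is the protocol channel, so write audit lines to stderr instead.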
Separate log store — ship to a log aggregation service (Loki, Elasticsearch, CloudWatch Logs) where the MCP server process has append-only credentials. Even if the server is fully compromised, past log entries remain intact.
Immutable S3/GCS bucket — for compliance workloads, enable object lock on the log bucket so entries cannot be deleted within the retention window.
Separate process for sensitive writes — a side-car process with append-only disk access can receive log events over a Unix socket and write them, preventing the main process from corrupting its own trail.
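
The sidecar approach in the last item can be sketched as follows. This is an illustration only: `startAuditSidecar`, `splitLines`, and the socket and log paths are hypothetical names. The append flag by itself does not stop a compromised main process from editing the file; that protection comes from running the sidecar under a separate OS user that owns the file.

```typescript
import { createServer, Server } from 'node:net';
import { appendFileSync } from 'node:fs';

// Splits buffered socket data into complete lines and returns the unfinished tail.
function splitLines(buffered: string): { lines: string[]; rest: string } {
  const parts = buffered.split('\n');
  const rest = parts.pop() ?? '';
  return { lines: parts.filter((line) => line.length > 0), rest };
}

// Hypothetical sidecar: accepts NDJSON over a Unix socket and appends it to a file.
function startAuditSidecar(socketPath: string, logPath: string): Server {
  const server = createServer((conn) => {
    let pending = '';
    conn.setEncoding('utf8'); // deliver chunks as strings, split on character boundaries
    conn.on('data', (chunk: Buffer | string) => {
      const { lines, rest } = splitLines(pending + String(chunk));
      pending = rest;
      for (const line of lines) {
        // flag 'a' opens in append mode; this process never rewrites earlier entries
        appendFileSync(logPath, line + '\n', { flag: 'a' });
      }
    });
  });
  server.listen(socketPath);
  return server;
}
```

The main process would connect to the socket path and write one JSON line per tool call. Buffering the unfinished tail matters because a socket may deliver a log line in more than one chunk.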

Log retention and volume

Audit log volume depends on your call rate. Each log entry is roughly 500 bytes of NDJSON. At 100 tool calls/minute (moderate agent workload) that's 3 MB/hour or ~2 GB/month — manageable for any log store.
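
To rerun that estimate for your own call rate, a small helper is enough. This sketch assumes the same 500-byte average entry and a 30-day month; `estimateLogVolume` is an illustrative name:

```typescript
// Rough storage estimate for audit logs, in decimal megabytes and gigabytes.
function estimateLogVolume(callsPerMinute: number, bytesPerEntry = 500) {
  const bytesPerHour = callsPerMinute * 60 * bytesPerEntry;
  const bytesPerMonth = bytesPerHour * 24 * 30; // 30-day month
  return {
    mbPerHour: bytesPerHour / 1e6,
    gbPerMonth: bytesPerMonth / 1e9,
  };
}

// 100 calls/minute gives 3 MB/hour and about 2.16 GB/month
const moderate = estimateLogVolume(100);
```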

Workload type	Minimum retention	Recommended
Indie / hobby project	30 days	90 days
B2B SaaS / team plan	90 days	1 year
Healthcare / finance	6 years (HIPAA documentation retention) / 7 years (SOX)	7 years + immutable

Set a log rotation policy at your aggregation layer. Most log stores support TTL-based deletion that satisfies "retain for N days" without manual cleanup.

Querying audit logs for security review

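If your logs are in a queryable store (e.g. Loki with LogQL, or a SQLite archive), a few queries cover most security reviews. The examples below use SQLite syntax and assume an `audit_log` table in which `actor.id` is flattened to an `actor_id` column: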

-- Destructive tool calls in the last 24 hours
SELECT timestamp, actor_id, tool, args
FROM audit_log
WHERE outcome = 'ok'
  AND tool IN ('delete_file', 'drop_table', 'send_email')
  AND timestamp > datetime('now', '-1 day')
ORDER BY timestamp DESC;

-- High-frequency callers (possible abuse)
SELECT actor_id, COUNT(*) AS call_count
FROM audit_log
WHERE timestamp > datetime('now', '-1 hour')
GROUP BY actor_id
HAVING call_count > 500
ORDER BY call_count DESC;

-- Error rate by tool (detect broken tools before users notice)
SELECT tool,
       SUM(CASE WHEN outcome='error' THEN 1 ELSE 0 END) * 100.0 / COUNT(*) AS error_pct
FROM audit_log
WHERE timestamp > datetime('now', '-1 day')
GROUP BY tool
HAVING error_pct > 5
ORDER BY error_pct DESC;

Pair these queries with alerts: if destructive-tool call volume doubles in an hour, or any actor exceeds 1,000 calls in a minute, page the on-call engineer.
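
The per-actor threshold can also be checked in application code, for example in a job that reads recent log entries. This sketch assumes entries have already been parsed into memory; `CallRecord` and `actorsOverLimit` are hypothetical names, and paging the on-call engineer is left to your alerting system:

```typescript
interface CallRecord {
  actorId: string;
  timestamp: string; // ISO 8601 UTC, as written by the audit logger
}

// Returns the actors whose call count inside the trailing window exceeds the limit.
function actorsOverLimit(
  records: CallRecord[],
  now: Date,
  windowMs: number,
  limit: number,
): string[] {
  const end = now.getTime();
  const cutoff = end - windowMs;
  const counts = new Map<string, number>();
  for (const record of records) {
    const t = Date.parse(record.timestamp);
    // Count only calls inside (cutoff, end]
    if (t > cutoff && t <= end) {
      counts.set(record.actorId, (counts.get(record.actorId) ?? 0) + 1);
    }
  }
  return Array.from(counts.entries())
    .filter(([, count]) => count > limit)
    .map(([actorId]) => actorId);
}
```

Calling `actorsOverLimit(records, new Date(), 60_000, 1000)` returns every actor with more than 1,000 calls in the last minute.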

Correlating audit logs with uptime events

Your audit logs are most powerful when correlated with uptime events. When AliveMCP detects that your server went down, you can query the audit log for the last tool call executed before the failure — often revealing an unhandled exception, a memory-exhausting argument, or a destructive operation that corrupted internal state.

Store a requestId in every log line and propagate it to your structured application logs so you can reconstruct the full execution trace for any tool call that preceded an outage.
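
One way to propagate the ID without threading it through every function signature is Node's `AsyncLocalStorage`. The sketch below assumes a Node.js server; `requestContext`, `runWithRequestId`, and `appLog` are hypothetical names, and the logger writes to stderr by default so it stays clear of a stdio transport:

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

// Holds the requestId for the tool call currently executing.
const requestContext = new AsyncLocalStorage<{ requestId: string }>();

// Runs a handler so that everything it calls, including async continuations,
// sees the same requestId.
function runWithRequestId<T>(requestId: string, fn: () => T): T {
  return requestContext.run({ requestId }, fn);
}

// Hypothetical application logger: stamps each line with the current requestId.
function appLog(
  message: string,
  sink: (line: string) => void = (line) => { process.stderr.write(line + '\n'); },
): void {
  const requestId = requestContext.getStore()?.requestId ?? 'none';
  sink(JSON.stringify({ timestamp: new Date().toISOString(), requestId, message }));
}
```

Inside an audit wrapper, the handler call would be wrapped in `runWithRequestId(requestId, ...)`, so any `appLog` call made while the tool runs carries the same ID as the audit entry.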