MCP server audit logging

Audit logging records every significant action on your MCP server — which user called which tool, with what arguments, what the result was, and how long it took. This trail is indispensable for security reviews, incident forensics, compliance reporting, and diagnosing unexpected behavior in production. For MCP servers specifically, tool calls are the most important events to capture: they are the interface between an LLM agent and your backend, and they carry real authority (read, write, delete, send).

TL;DR

Wrap every tool handler in middleware that emits a structured JSON log line with: timestamp, actor (the authenticated user/token identity), tool name, args (PII-redacted), outcome (ok or error), durationMs, and requestId for correlation. Redact fields like email, password, token, and ssn before writing. Ship logs to a separate storage location so a compromised server process cannot erase its own trail. Retain for 90 days minimum; 1 year for compliance workloads.

Why MCP servers need audit logs

MCP tool calls are not ordinary HTTP requests. An agent can chain dozens of tool calls in a single session (reading files, querying databases, sending messages, triggering deploys) with minimal human review of each individual step. This autonomy makes audit logs more important, not less.

What to capture per tool call

Every audit log entry should contain enough information to answer: who did what to what, when, and what happened? The minimum viable field set:

| Field | Type | Purpose |
| --- | --- | --- |
| timestamp | ISO 8601 UTC | When the tool call was received (not completed) |
| requestId | UUID | Correlation ID, taken from an HTTP header or generated; ties log lines to the same session |
| actor.id | string | Authenticated user ID, API key fingerprint, or token sub claim; never the raw token |
| actor.ip | string | Client IP (trust X-Forwarded-For only behind a known proxy) |
| tool | string | Exact tool name as registered (e.g. delete_file) |
| args | object | Sanitized argument object, with PII fields replaced by [REDACTED] |
| outcome | ok or error | Whether the tool returned normally or threw |
| error | string or null | Error message when outcome is error (truncate at 500 chars) |
| durationMs | integer | Tool execution time in milliseconds |
| serverVersion | string | Your server's version string; helps correlate behavior changes after deploys |
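
The field set above can be written down as a type so that every handler emits the same shape. This is a minimal sketch: the names AuditActor, AuditEntry, and toLogLine are hypothetical, and the sample values are made up for illustration.

```typescript
// Hypothetical type names; the field names follow the table above.
interface AuditActor {
  id: string; // user ID, key fingerprint, or token `sub`; never the raw token
  ip: string;
}

interface AuditEntry {
  timestamp: string; // ISO 8601 UTC, taken when the call is received
  requestId: string;
  actor: AuditActor;
  tool: string;
  args: Record<string, unknown>; // already sanitized
  outcome: 'ok' | 'error';
  error: string | null;
  durationMs: number;
  serverVersion: string;
}

// Serialize one entry as a single NDJSON line.
function toLogLine(entry: AuditEntry): string {
  return JSON.stringify(entry) + '\n';
}

// Illustrative values only.
const exampleEntry: AuditEntry = {
  timestamp: '2025-01-15T09:30:00.000Z',
  requestId: '00000000-0000-4000-8000-000000000000',
  actor: { id: 'user_123', ip: '203.0.113.7' },
  tool: 'delete_file',
  args: { path: '/tmp/report.txt' },
  outcome: 'ok',
  error: null,
  durationMs: 12,
  serverVersion: '1.0.0',
};

console.log(toLogLine(exampleEntry));
```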

Middleware pattern

Rather than adding logging to each individual tool handler, wrap the tool registration at the SDK level. The MCP SDK does not provide a built-in middleware hook, but you can achieve the same result by wrapping each handler function:

import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { randomUUID } from 'node:crypto';
import { promises as fs } from 'node:fs';
import { z } from 'zod';

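function auditLog(entry: object) {
  // Write newline-delimited JSON (NDJSON) to stdout, where Docker or systemd
  // can capture it and ship it to your log sink.
  // NOTE: with the stdio transport, stdout carries the MCP protocol messages,
  // so write audit lines to stderr or a file instead.
  process.stdout.write(JSON.stringify(entry) + '\n');
}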

// Entries must be lowercase: keys are lowercased before the lookup below
const PII_KEYS = new Set(['email', 'password', 'token', 'secret', 'ssn', 'phone', 'creditcard']);

function redactArgs(args: Record<string, unknown>): Record<string, unknown> {
  const result: Record<string, unknown> = {};
  for (const [key, val] of Object.entries(args)) {
    // Redact by key name match
    if (PII_KEYS.has(key.toLowerCase())) {
      result[key] = '[REDACTED]';
    } else if (typeof val === 'string' && val.length > 500) {
      // Truncate large blobs — likely file content, not useful in logs
      result[key] = val.slice(0, 200) + ' ... [TRUNCATED]';
    } else {
      result[key] = val;
    }
  }
  return result;
}

// Wrap a tool handler to emit audit log entries
function withAudit<TArgs extends object, TResult>(
  toolName: string,
  handler: (args: TArgs, context: any) => Promise<TResult>
): (args: TArgs, context: any) => Promise<TResult> {
  return async (args, context) => {
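    // requestId and actor are assumed to be attached to the context by your
    // transport or auth layer; fall back to safe defaults when they are missing
    const requestId = (context.requestId as string | undefined) ?? randomUUID();
    const actor = context.actor ?? { id: 'anonymous', ip: 'unknown' };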
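    const start = Date.now();
    // Take the timestamp on receipt, not completion, to match the field definition
    const timestamp = new Date(start).toISOString();
    let outcome: 'ok' | 'error' = 'ok';
    let error: string | null = null;

    try {
      const result = await handler(args, context);
      return result;
    } catch (err) {
      outcome = 'error';
      // Truncate every error message, including non-Error throwables
      error = (err instanceof Error ? err.message : String(err)).slice(0, 500);
      throw err;
    } finally {
      auditLog({
        timestamp,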
        requestId,
        actor,
        tool: toolName,
        args: redactArgs(args as Record<string, unknown>),
        outcome,
        error,
        durationMs: Date.now() - start,
        serverVersion: process.env.SERVER_VERSION ?? 'unknown',
      });
    }
  };
}

// Usage
const server = new McpServer({ name: 'my-server', version: '1.0.0' });

server.tool(
  'delete_file',
  'Permanently delete a file from disk',
  { path: z.string() },
  withAudit('delete_file', async ({ path: filePath }, context) => {
    await fs.unlink(filePath);
    return { content: [{ type: 'text', text: `Deleted: ${filePath}` }] };
  })
);

PII redaction patterns

Arguments passed to MCP tools often contain user-supplied data. Before writing to the audit log, redact fields that could contain personal information. Key-name matching covers most cases, but pattern matching on values catches data that arrives in generically named fields. The revised redactArgs below replaces the earlier version:

const EMAIL_RE = /\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b/g;
const CREDIT_CARD_RE = /\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b/g;
const TOKEN_RE = /\b(ghp_|sk-|Bearer |xoxb-)\S+/g;

function redactStringValues(s: string): string {
  return s
    .replace(EMAIL_RE, '[EMAIL]')
    .replace(CREDIT_CARD_RE, '[CARD]')
    .replace(TOKEN_RE, '[TOKEN]');
}

function redactArgs(args: Record<string, unknown>): Record<string, unknown> {
  const result: Record<string, unknown> = {};
  for (const [key, val] of Object.entries(args)) {
    if (PII_KEYS.has(key.toLowerCase())) {
      result[key] = '[REDACTED]';
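    } else if (typeof val === 'string') {
      // Redact first, then truncate large blobs as the earlier version did
      const redacted = redactStringValues(val);
      result[key] = redacted.length > 500 ? redacted.slice(0, 200) + ' ... [TRUNCATED]' : redacted;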
    } else {
      result[key] = val;
    }
  }
  return result;
}
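
The redactArgs function above inspects only top-level keys, while tool arguments are often nested objects or arrays. A recursive variant could look like the following sketch. The names SENSITIVE_KEYS, EMAIL_PATTERN, and redactDeep are hypothetical, the depth limit of 8 is an arbitrary assumption, and only the email pattern is applied to keep the example short.

```typescript
// Lowercase key names, so the lookup is case-insensitive after toLowerCase().
const SENSITIVE_KEYS = new Set(['email', 'password', 'token', 'secret', 'ssn', 'phone', 'creditcard']);
const EMAIL_PATTERN = /\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b/g;

// Walk nested objects and arrays, redacting by key name and by value pattern.
function redactDeep(value: unknown, depth = 0): unknown {
  // Guard against very deep or cyclic structures (assumed limit).
  if (depth > 8) return '[MAX_DEPTH]';
  if (typeof value === 'string') return value.replace(EMAIL_PATTERN, '[EMAIL]');
  if (Array.isArray(value)) return value.map((item) => redactDeep(item, depth + 1));
  if (value !== null && typeof value === 'object') {
    const out: Record<string, unknown> = {};
    for (const [key, val] of Object.entries(value as Record<string, unknown>)) {
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase())
        ? '[REDACTED]'
        : redactDeep(val, depth + 1);
    }
    return out;
  }
  // Numbers, booleans, null, and undefined pass through unchanged.
  return value;
}
```

Redacting by key before descending means a sensitive key hides its whole subtree, which errs on the side of logging less.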

Never log raw JWTs, API keys, or passwords, not even truncated. Log the sub claim from a decoded JWT, or a fingerprint of an API key (for example, the first 8 hex characters of its SHA-256 hash), not the key itself.
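
One way to produce a key fingerprint that exposes none of the key's own characters is to hash it and keep a short prefix of the digest. The sketch below assumes SHA-256 from Node's built-in crypto module; keyFingerprint is a hypothetical helper name, not part of the MCP SDK.

```typescript
import { createHash } from 'node:crypto';

// Derive a short, non-reversible identifier for an API key.
// Assumes keys are long random strings; short or guessable keys could be
// brute-forced from the fingerprint and would need a keyed hash (HMAC) instead.
function keyFingerprint(apiKey: string): string {
  return createHash('sha256').update(apiKey, 'utf8').digest('hex').slice(0, 8);
}

// The same key always yields the same fingerprint, so log lines stay correlatable.
console.log(keyFingerprint('example-api-key-not-real'));
```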

Protecting the audit trail

An audit log that a compromised process can overwrite provides little forensic value. At a minimum, ship logs to a separate storage location so that a compromised server process cannot erase its own trail.
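
As one illustration of tamper evidence (a general technique, not necessarily one this guide's author had in mind), each log line can carry a hash of the previous line, so that editing or deleting an earlier entry breaks the chain. The names ChainedLine, appendChained, and verifyChain are hypothetical. The check is only meaningful if the most recent hash is also stored somewhere the server process cannot rewrite.

```typescript
import { createHash } from 'node:crypto';

interface ChainedLine {
  entry: object;
  prevHash: string; // hash of the previous line, or GENESIS for the first
  hash: string;     // SHA-256 over prevHash + serialized entry
}

const GENESIS = '0'.repeat(64);

function chainHash(prevHash: string, entry: object): string {
  return createHash('sha256').update(prevHash).update(JSON.stringify(entry)).digest('hex');
}

// Return a new log with the entry appended and linked to its predecessor.
function appendChained(log: ChainedLine[], entry: object): ChainedLine[] {
  const last = log[log.length - 1];
  const prevHash = last ? last.hash : GENESIS;
  return [...log, { entry, prevHash, hash: chainHash(prevHash, entry) }];
}

// Recompute every link; any edited or removed line makes this return false.
function verifyChain(log: ChainedLine[]): boolean {
  let prev = GENESIS;
  for (const line of log) {
    if (line.prevHash !== prev || line.hash !== chainHash(prev, line.entry)) return false;
    prev = line.hash;
  }
  return true;
}
```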

Log retention and volume

Audit log volume depends on your call rate. Each log entry is roughly 500 bytes of NDJSON. At 100 tool calls/minute (moderate agent workload) that's 3 MB/hour or ~2 GB/month — manageable for any log store.
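
The arithmetic above can be reproduced for your own call rate. This is a small sketch; estimateVolume is a hypothetical helper, and the 30-day month is an assumption.

```typescript
// Estimate audit log volume from call rate and average entry size.
function estimateVolume(callsPerMinute: number, bytesPerEntry: number, daysPerMonth = 30) {
  const bytesPerHour = callsPerMinute * 60 * bytesPerEntry;
  const bytesPerMonth = bytesPerHour * 24 * daysPerMonth;
  return { bytesPerHour, bytesPerMonth };
}

const volume = estimateVolume(100, 500);
console.log((volume.bytesPerHour / 1e6).toFixed(1), 'MB/hour');   // 3.0 MB/hour
console.log((volume.bytesPerMonth / 1e9).toFixed(2), 'GB/month'); // 2.16 GB/month
```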

| Workload type | Minimum retention | Recommended |
| --- | --- | --- |
| Indie / hobby project | 30 days | 90 days |
| B2B SaaS / team plan | 90 days | 1 year |
| Healthcare / finance | 1 year (HIPAA) / 7 years (SOX) | 7 years + immutable |

Set a log rotation policy at your aggregation layer. Most log stores support TTL-based deletion that satisfies "retain for N days" without manual cleanup.

Querying audit logs for security review

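If your logs are in a queryable store (e.g. Loki with LogQL, or a SQLite archive), a few queries are especially useful for security review. The examples below use SQLite syntax and assume a flattened audit_log table with an actor_id column: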

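-- Destructive tool calls in the last 24 hours
-- (strftime builds the cutoff in the same ISO 8601 format as the stored
-- timestamps, so the string comparison is accurate)
SELECT timestamp, actor_id, tool, args
FROM audit_log
WHERE outcome = 'ok'
  AND tool IN ('delete_file', 'drop_table', 'send_email')
  AND timestamp > strftime('%Y-%m-%dT%H:%M:%fZ', 'now', '-1 day')
ORDER BY timestamp DESC;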

-- High-frequency callers (possible abuse)
SELECT actor_id, COUNT(*) AS call_count
FROM audit_log
WHERE timestamp > strftime('%Y-%m-%dT%H:%M:%fZ', 'now', '-1 hour')
GROUP BY actor_id
HAVING call_count > 500
ORDER BY call_count DESC;

-- Error rate by tool (detect broken tools before users notice)
SELECT tool,
       SUM(CASE WHEN outcome='error' THEN 1 ELSE 0 END) * 100.0 / COUNT(*) AS error_pct
FROM audit_log
WHERE timestamp > strftime('%Y-%m-%dT%H:%M:%fZ', 'now', '-1 day')
GROUP BY tool
HAVING error_pct > 5
ORDER BY error_pct DESC;

Pair these queries with alerts: if destructive-tool call volume doubles in an hour, or any actor exceeds 1,000 calls in a minute, page the on-call engineer.
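
The two alert conditions above can be sketched as plain functions over recent log records. CallRecord, actorsOverLimit, and volumeDoubled are hypothetical names, and how the records are fetched and how the page is sent are left to your log store and alerting tool.

```typescript
interface CallRecord {
  actorId: string;
  timestampMs: number; // epoch milliseconds of the tool call
}

// Return actors whose call count inside the window ending at nowMs exceeds the limit.
function actorsOverLimit(
  records: CallRecord[],
  nowMs: number,
  windowMs: number,
  limit: number,
): string[] {
  const counts = new Map<string, number>();
  for (const r of records) {
    if (r.timestampMs > nowMs - windowMs && r.timestampMs <= nowMs) {
      counts.set(r.actorId, (counts.get(r.actorId) ?? 0) + 1);
    }
  }
  return Array.from(counts.entries())
    .filter(([, count]) => count > limit)
    .map(([actorId]) => actorId);
}

// True when destructive-call volume has at least doubled hour over hour.
// With no calls in the previous hour there is no baseline; treating that
// case as "not doubled" is an assumption.
function volumeDoubled(previousHourCount: number, currentHourCount: number): boolean {
  return previousHourCount > 0 && currentHourCount >= 2 * previousHourCount;
}
```

For the thresholds in the text, the per-actor check would be called with windowMs = 60_000 and limit = 1000.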

Correlating audit logs with uptime events

Your audit logs are most powerful when correlated with uptime events. When AliveMCP detects that your server went down, you can query the audit log for the last tool call executed before the failure — often revealing an unhandled exception, a memory-exhausting argument, or a destructive operation that corrupted internal state.

Store a requestId in every log line and propagate it to your structured application logs so you can reconstruct the full execution trace for any tool call that preceded an outage.
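
Finding the last tool call before an outage can be sketched as a scan over NDJSON audit lines. LoggedCall and lastCallBefore are hypothetical names, and the outage time is assumed to arrive as an ISO 8601 string from whatever uptime monitor you use.

```typescript
interface LoggedCall {
  timestamp: string; // ISO 8601 UTC
  requestId: string;
  tool: string;
  outcome: 'ok' | 'error';
}

// Return the most recent call received at or before the outage time, or null.
function lastCallBefore(ndjson: string, downAtIso: string): LoggedCall | null {
  const downAt = Date.parse(downAtIso);
  let latest: LoggedCall | null = null;
  let latestMs = -Infinity;
  for (const line of ndjson.split('\n')) {
    if (line.trim() === '') continue;
    let entry: LoggedCall;
    try {
      entry = JSON.parse(line) as LoggedCall;
    } catch {
      continue; // skip malformed lines rather than abort the scan
    }
    const t = Date.parse(entry.timestamp);
    // Entries with unparseable timestamps yield NaN and fail this comparison.
    if (t <= downAt && t > latestMs) {
      latest = entry;
      latestMs = t;
    }
  }
  return latest;
}
```

The returned requestId can then be used to pull the matching lines from your application logs.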

Further reading