MCP server security monitoring

Security monitoring and uptime monitoring ask different questions. Uptime monitoring asks: is the server running? Security monitoring asks: is the server being attacked, compromised, or misused? The two are complementary, not substitutable. A server can be fully available but actively under credential stuffing attack. A server can be healthy from an external probe perspective while a dependency vulnerability allows privilege escalation from inside. This guide covers the security-specific signals MCP server operators should monitor, how to set baselines and alert thresholds for each, and where external probe monitoring like AliveMCP fits in the broader security picture.

TL;DR

Four security monitoring areas for MCP servers: (1) auth failure rate — a spike above your normal 2–5% baseline signals credential stuffing or misconfigured clients; (2) rate anomalies — abnormal call volume per session_id signals automated abuse; (3) tool schema integrity — a changed tools/list hash signals an unexpected update or dependency tampering; (4) TLS certificate expiry — AliveMCP's protocol-layer probing catches certificate issues at the handshake level, not just the port. External probing monitors availability, not security — use it as one layer in a layered security posture, not as a SIEM replacement.

Auth failure rate monitoring

Authentication failures are a normal part of any API's operation — misconfigured clients, expired tokens, and integration bugs all generate 401 and 403 responses. The question isn't whether auth failures occur; it's whether the rate is normal or elevated above baseline.

Establishing your baseline

Log every initialize request with its auth result: auth_result: "success" or auth_result: "failed" plus the failure reason (token_expired, token_invalid, scope_insufficient). Over one to two weeks of normal operation, measure your baseline auth failure rate as a percentage of total initialize attempts. Most well-configured MCP servers see a 2–5% auth failure rate (primarily from token expiry and first-time integration setup). Document this baseline.
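The baseline measurement described above can be sketched as a small aggregation over your auth logs. This is a minimal sketch, not a definitive implementation: the record shape (`method`, `auth_result`, `reason`) and the function name `authFailureStats` are assumptions for illustration, so adapt them to whatever your logger actually emits.

```javascript
// Hypothetical log record shape (an assumption, not a standard):
//   { method: 'initialize', auth_result: 'success' | 'failed', reason?: string }
function authFailureStats(records) {
  const attempts = records.filter((r) => r.method === 'initialize');
  const failures = attempts.filter((r) => r.auth_result === 'failed');

  // Count failures per reason (token_expired, token_invalid, scope_insufficient).
  const byReason = {};
  for (const f of failures) {
    const reason = f.reason ?? 'unknown';
    byReason[reason] = (byReason[reason] ?? 0) + 1;
  }

  return {
    attempts: attempts.length,
    failures: failures.length,
    // Failure rate as a percentage of all initialize attempts; 0 when there is no traffic.
    failureRatePct: attempts.length === 0 ? 0 : (failures.length * 100) / attempts.length,
    byReason,
  };
}
```

Run this over one to two weeks of logs and record `failureRatePct` as your baseline. The per-reason breakdown helps separate routine token expiry from failures that deserve a closer look.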

Anomaly detection and alerting

Alert when auth failure rate exceeds your baseline by a significant factor:

Credential stuffing against MCP servers is less common than against web applications — MCP endpoints aren't browser-accessible and require protocol-level interaction. But for authenticated MCP servers that hold sensitive data or expensive-to-use capabilities, the attack surface is real. See MCP authentication primer for the full authentication pattern coverage, including the OAuth 2.0 Client Credentials flow that's most common for server-to-server MCP authentication.

Origin and client diversity monitoring

A normal MCP server traffic pattern shows requests from a consistent set of known agent client identifiers or IP ranges. The sudden appearance of many new, unknown client IDs, especially with rapid-fire initialize attempts in sequence, is a behavioral signal that goes beyond the failure rate alone. Log client_id and source IP on every initialize, and alert when the unique-source count in a 5-minute window exceeds 3× the 30-day average for a window of the same length.
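One way to express the unique-source check is shown below. It is a sketch under stated assumptions: the event shape (`ts`, `clientId`, `ip`) and both function names are hypothetical, and `baselineCount` is whatever 30-day baseline you have already computed for a comparable window.

```javascript
const WINDOW_MS = 5 * 60 * 1000; // 5-minute window, as in the text above

// events: [{ ts: epoch milliseconds, clientId: string, ip: string }] (hypothetical shape)
function uniqueSourcesInWindow(events, nowMs, windowMs = WINDOW_MS) {
  const sources = new Set();
  for (const e of events) {
    // Keep only initialize events inside the window ending at nowMs.
    if (e.ts > nowMs - windowMs && e.ts <= nowMs) {
      sources.add(`${e.clientId}|${e.ip}`);
    }
  }
  return sources.size;
}

// True when the current count exceeds factor × baseline (3× by default).
function isSourceSpike(currentCount, baselineCount, factor = 3) {
  return currentCount > baselineCount * factor;
}
```

Keying the set on the client ID and IP pair counts a known client arriving from a new address as a new source. Whether that is desirable depends on how stable your clients' egress IPs are.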

Rate anomaly detection

After authentication, a legitimate agent session has a characteristic tool call pattern: a burst of tool calls (2–10 in a few seconds as the LLM reasons through a task), then a quiet period, then another burst. An automated abuse pattern looks different: sustained high-cadence tool calls from a single session_id with no inter-call pauses, or many concurrent sessions all calling the same expensive tool simultaneously.

Per-session call rate

Track cumulative tool call count per session_id over the session lifetime. Alert when:

When a session hits the threshold, you have two options: return a JSON-RPC error with code -32001 and a "rate limit exceeded" message, or log and alert without blocking (useful in a monitoring-only phase before you enforce limits).
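The two responses to a breached threshold can be sketched as a small per-session counter. Everything named here (`SessionCallTracker`, `maxCalls`, `enforce`) is hypothetical, and the threshold value is yours to choose. The error object follows the JSON-RPC 2.0 error shape with the -32001 code mentioned above.

```javascript
class SessionCallTracker {
  // maxCalls: cumulative tool calls allowed per session (your chosen threshold).
  // enforce: false = monitoring-only (log and alert), true = block with an error.
  constructor({ maxCalls, enforce = false }) {
    this.maxCalls = maxCalls;
    this.enforce = enforce;
    this.counts = new Map();
  }

  // Returns null when the call may proceed, or a JSON-RPC error response when blocked.
  record(sessionId, requestId) {
    const count = (this.counts.get(sessionId) ?? 0) + 1;
    this.counts.set(sessionId, count);
    if (count <= this.maxCalls) return null;

    console.warn(`session ${sessionId} exceeded ${this.maxCalls} tool calls (now ${count})`);
    if (!this.enforce) return null; // monitoring-only mode: alert but do not block

    return {
      jsonrpc: '2.0',
      id: requestId,
      error: { code: -32001, message: 'rate limit exceeded' },
    };
  }

  // Drop the counter when the session closes so the map does not grow without bound.
  end(sessionId) {
    this.counts.delete(sessionId);
  }
}
```

Starting with `enforce: false` lets you see how often real sessions cross the threshold before any client is blocked.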

Cross-session fleet anomalies

A coordinated attack may use many short sessions to stay below per-session thresholds. Monitor the aggregate fleet rate: total tool calls per minute across all sessions. Alert when this exceeds 3× your expected peak. If you have IP-level data, check whether the aggregate spike is concentrated in a single IP or ASN, which points to automated abuse rather than legitimate viral traffic.

Note: legitimate viral traffic (your server gets featured in a blog post) can produce the same aggregate rate spike as a coordinated abuse event. Distinguish them by checking auth failure rate simultaneously — legitimate new users have a normal auth failure rate; credential stuffing has an elevated one. Real viral traffic also tends to produce varied tool call patterns; automated abuse tends to call the same tool repeatedly.
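The triage logic above (aggregate spike, then auth failure rate, then tool call variety) could be sketched as follows. The function name, the input field names, and the two cut-offs `authFactor` and `topToolShare` are placeholder assumptions rather than recommended values. Only the 3× peak multiplier comes from the text.

```javascript
function classifyFleetSpike(
  { callsPerMinute, expectedPeak, authFailureRatePct, baselineAuthFailurePct, toolCallCounts },
  { authFactor = 2, topToolShare = 0.8 } = {} // placeholder thresholds, tune for your traffic
) {
  // No spike: aggregate rate is within 3x the expected peak.
  if (callsPerMinute <= expectedPeak * 3) return 'normal';

  // Share of all calls that went to the single most-called tool.
  const counts = Object.values(toolCallCounts);
  const total = counts.reduce((sum, n) => sum + n, 0);
  const share = total === 0 ? 0 : Math.max(0, ...counts) / total;

  const authElevated = authFailureRatePct > baselineAuthFailurePct * authFactor;
  const sameToolRepeated = share > topToolShare;

  // Elevated auth failures or one tool dominating the spike both point toward abuse.
  return authElevated || sameToolRepeated ? 'suspected_abuse' : 'likely_viral';
}
```

Treat the result as a hint for whoever is on call, not a verdict. A `likely_viral` label still deserves a look at capacity, and a `suspected_abuse` label still needs correlation with source IP data.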

Tool schema integrity monitoring

Your server's tools/list response defines the tool surface your clients see. A change to tools/list is expected when you deploy a new version, but an unexpected change (between deployments, when no deploy occurred) is a signal worth investigating. It could indicate an unexpected update or dependency tampering.

Schema hash monitoring

On every tools/list response, compute a hash of the canonical tool definitions (sorted tool names, sorted parameter schemas, stringified). Store the hash with a timestamp. Alert when the hash changes outside of a known deployment window:

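const crypto = require('node:crypto');

// Sort a copy by tool name so the caller's array is not mutated.
// Note: JSON.stringify preserves key order, so nested schema keys must
// already be emitted in a stable order for the hash to be stable.
const sortedTools = [...toolList.tools].sort((a, b) => a.name.localeCompare(b.name));

const schemaHash = crypto
  .createHash('sha256')
  .update(JSON.stringify(sortedTools))
  .digest('hex')
  .slice(0, 16); // keep a short 64-bit prefix for display and comparison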

AliveMCP's probe collects the tools/list response on every probe cycle and tracks schema drift. An unexpected tools/list change generates a schema_drift_detected event in the monitoring dashboard. This isn't a security alert in isolation — it's an investigation trigger. Check your deployment history first; if no deploy occurred in the window, escalate.
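If you want to run the same check server-side, a fuller sketch might canonicalize nested schema keys before hashing and then compare the change time against known deployment windows. The helper names (`canonicalize`, `hashToolList`, `isUnexpectedDrift`) and the `deployWindows` shape are assumptions for illustration. This is not AliveMCP's implementation.

```javascript
const crypto = require('node:crypto');

// Recursively sort object keys so property order cannot change the hash.
// Array order is preserved, since it can be meaningful inside a schema.
function canonicalize(value) {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.keys(value).sort().map((key) => [key, canonicalize(value[key])])
    );
  }
  return value;
}

function hashToolList(toolList) {
  // Sort a copy of the tools by name using a locale-independent comparison.
  const tools = [...toolList.tools]
    .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0))
    .map(canonicalize);
  return crypto.createHash('sha256').update(JSON.stringify(tools)).digest('hex').slice(0, 16);
}

// deployWindows: [{ start, end }] in epoch milliseconds (hypothetical shape).
function isUnexpectedDrift(previousHash, currentHash, observedAtMs, deployWindows) {
  if (previousHash === currentHash) return false;
  // A change is expected only if it was observed inside a known deployment window.
  return !deployWindows.some((w) => observedAtMs >= w.start && observedAtMs <= w.end);
}
```

A `true` result from `isUnexpectedDrift` is an investigation trigger in the same sense as the schema_drift_detected event: check deployment history before escalating.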

See schema drift in MCP tool definitions for the full schema drift detection and response pattern.

TLS certificate monitoring

An expired TLS certificate produces the same failure signature as a server that is completely down: TLS handshake failure → transport-layer probe failure → alert. The difference is the error message and the remediation (renew the certificate vs restart the process). AliveMCP's protocol-layer probe completes the TLS handshake before the MCP protocol exchange begins, so it detects an expired certificate that a port-level check would miss.

AliveMCP Author tier shows certificate expiry date in the server monitoring dashboard and generates a warning alert 14 days before expiry and a critical alert 3 days before expiry. This gives you time to renew before the certificate actually expires, avoiding the outage.

For certificates with auto-renewal (Let's Encrypt via certbot or Caddy's built-in renewal, or AWS Certificate Manager's managed renewal), certificate expiry monitoring is a belt-and-suspenders check on whether the auto-renewal worked. Let's Encrypt certificates have a 90-day validity period; auto-renewal typically fires 30 days before expiry. If your monitoring shows fewer than 30 days remaining on a certificate that should have renewed, your renewal process has failed silently. See MCP server SSL certificate for the full TLS monitoring and renewal pattern.
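For a self-hosted version of the same check, a minimal sketch using Node's built-in tls module is shown below. The function names are hypothetical. The 14-day and 3-day defaults mirror the warning and critical alert timings described above, and `rejectUnauthorized: false` is used only so that an already-expired certificate can still be inspected.

```javascript
const tls = require('node:tls');

const DAY_MS = 24 * 60 * 60 * 1000;

// Classify a certificate by whole days remaining until its expiry date.
function classifyExpiry(validTo, now = new Date(), warnDays = 14, criticalDays = 3) {
  const daysLeft = Math.floor((new Date(validTo).getTime() - now.getTime()) / DAY_MS);
  if (daysLeft < 0) return { daysLeft, level: 'expired' };
  if (daysLeft <= criticalDays) return { daysLeft, level: 'critical' };
  if (daysLeft <= warnDays) return { daysLeft, level: 'warning' };
  return { daysLeft, level: 'ok' };
}

// Open a TLS connection and resolve with the peer certificate's valid_to string.
function fetchCertExpiry(host, port = 443) {
  return new Promise((resolve, reject) => {
    const socket = tls.connect(
      // Inspection only: do not reuse this socket for real traffic.
      { host, port, servername: host, rejectUnauthorized: false },
      () => {
        const cert = socket.getPeerCertificate();
        socket.end();
        resolve(cert.valid_to);
      }
    );
    socket.on('error', reject);
  });
}
```

A scheduled job could call `fetchCertExpiry('mcp.example.com').then((validTo) => classifyExpiry(validTo))`, where the hostname is a placeholder, and page someone on `critical` or `expired`.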

Dependency vulnerability scanning

Your MCP server's npm or pip dependencies are part of its attack surface. A high-severity vulnerability in a transitive dependency can expose your server to remote code execution even if your own code is clean.

Minimum viable dependency security:

If you're running an MCP server that handles sensitive data or has privileged access to downstream systems (calendar, email, financial APIs), treat dependency vulnerabilities as production incidents, not development backlog items.
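For npm-based servers, one way to turn that policy into a gate is to count high and critical findings in the JSON report from `npm audit --json`. This is a sketch that assumes the report carries per-severity counts under `metadata.vulnerabilities`. The function name and the choice of blocking severities are illustrative, and pip-based projects would need an equivalent scanner.

```javascript
// report: the parsed JSON output of `npm audit --json`.
// blockOn: severities that should be treated as production incidents.
function blockingVulnerabilities(report, blockOn = ['high', 'critical']) {
  const counts = report?.metadata?.vulnerabilities ?? {};
  // Sum the counts for each blocking severity; missing severities count as zero.
  return blockOn.reduce((sum, severity) => sum + (counts[severity] ?? 0), 0);
}
```

In CI, a script could parse the saved report, call `blockingVulnerabilities`, and exit non-zero when the result is greater than zero so the finding blocks the release instead of landing in a backlog.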

Supply chain health monitoring

If your agents pull third-party MCP servers from registries (MCP.so, Smithery, Glama, the Official Registry), those third-party servers are part of your supply chain. A third-party MCP server that's been dormant for 6 months, has a compromised maintainer account, or is silently returning malformed tool definitions is a risk to your agent workflows.

AliveMCP's public registry audit monitors every listed MCP endpoint and tracks health over time. The Q2 2026 audit found 91% of public MCP endpoints either dead or returning protocol errors — see State of the MCP Registry Q2 2026 for the full methodology. For teams that depend on specific third-party MCP servers, monitoring those servers' health status in AliveMCP's public dashboard gives you advance warning when a dependency is degrading — before your agents start failing silently.

Supply chain security for MCP goes beyond uptime. Verify third-party MCP servers you depend on:

What external probing cannot tell you

AliveMCP monitors the availability and protocol health of your MCP endpoint from outside. It is not a Security Information and Event Management (SIEM) system. It cannot:

For these capabilities, you need a dedicated security tool: server-side log analysis (Splunk, Elastic SIEM), runtime security monitoring (Falco for containers), or a managed security service. External probe monitoring from AliveMCP sits at the availability layer — the bottom of the security stack, not the top. A complete security posture layers external availability monitoring, server-side auth and rate monitoring, vulnerability scanning, and (for sensitive workloads) runtime security monitoring.

Related questions

How do I know if my MCP server has been compromised?

The clearest indicators: unexpected changes in tools/list (schema drift outside of deployments); auth failure rate spike followed by a period of new, unknown session IDs authenticating successfully; unusual tool call patterns from previously inactive accounts; or unexpected changes in downstream API usage patterns (more calls, from different times, to different endpoints than usual). None of these are definitive on their own — each requires correlation with deployment history and known traffic patterns. If you suspect a compromise: revoke and rotate all API keys and OAuth client secrets immediately, check your deployment pipeline for unauthorized changes, and audit the server's dependency tree against known vulnerability databases. See MCP server observability for the logging foundations that make this kind of audit possible.

Should I disable AliveMCP probing during a security incident?

Generally no. The probe provides a real-time, outside-in view of whether your server is up and responding to the MCP protocol — useful during incident investigation to confirm whether a remediation step (restart, re-deploy, certificate renewal) actually succeeded from the public internet perspective. However, if your incident response involves taking the server completely offline intentionally, set a maintenance window in AliveMCP to suppress alerts during the offline period. This prevents probe alerts from adding noise to your incident timeline while the server is deliberately down.

Do I need to worry about the AliveMCP probe itself as a security risk?

The probe only performs the initialize handshake and a tools/list call — the minimum valid MCP protocol sequence. It uses a read-only monitoring credential (configurable in Author tier) that has only the scopes required for initialize and tools/list. The probe does not call individual tools, does not send data to your server beyond the standard handshake, and does not retain tool call arguments. If your security policy requires allowlisting probe origins, AliveMCP publishes the IP ranges used by its probe network and will add your monitoring credential to an allowlist on request. For air-gapped or VPC-internal servers, see private MCP monitoring for the agent-based collector pattern that keeps all probe data inside your network boundary.
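To make the footprint of such a probe concrete, the sketch below builds the JSON-RPC messages for an initialize handshake followed by a tools/list call, as defined by the MCP specification. It illustrates the protocol sequence and is not AliveMCP's actual probe code. The function name, client name, and default protocol version string are example assumptions.

```javascript
function buildProbeMessages(protocolVersion = '2025-06-18') {
  return [
    // 1. initialize request: announces the protocol version and client identity.
    {
      jsonrpc: '2.0',
      id: 1,
      method: 'initialize',
      params: {
        protocolVersion,
        capabilities: {},
        clientInfo: { name: 'example-probe', version: '0.0.0' }, // hypothetical client
      },
    },
    // 2. initialized notification: sent after the server's initialize response (no id).
    { jsonrpc: '2.0', method: 'notifications/initialized' },
    // 3. tools/list request: read-only listing, no tool is invoked.
    { jsonrpc: '2.0', id: 2, method: 'tools/list' },
  ];
}
```

No tools/call message appears in the sequence, which is why a credential scoped to initialize and tools/list is sufficient for this kind of monitoring.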

Further reading