Deep dive · 2026-04-25 · Drift detection
Schema drift in MCP tool definitions — the silent breakage no HTTP probe can catch
Servers don't only fail by going down. They also fail by quietly changing shape — a tool removed in a refactor, a parameter renamed in v0.7, a description rewritten on a whim — while every HTTP probe pointed at the server keeps returning a green dot. This is schema drift. It is the failure mode that ends with a downstream agent calling a tool that no longer exists, with parameters the server hasn't seen since last Tuesday. The probe was up. The tool wasn't. This post is what schema drift looks like in MCP specifically, why it's invisible to every monitor that isn't reading the response body, and the canonical-JSON hash that catches all four shapes drift takes.
TL;DR
An MCP server's tools/list response is its public contract. When that response changes between probes — a tool added, removed, renamed, or its signature rewritten — every cached tool definition in every downstream agent goes stale. We measured this empirically across the 196 healthy servers from the Q2 2026 audit: 14 of them showed a tool-list hash change inside a 48-hour probe window — a 7.1% drift rate, naively extrapolating to roughly a two-in-three chance of drift over 30 days for any given server. Schema drift is invisible to HTTP probes, partly invisible to JSON-RPC envelope probes, and only fully visible to a probe that round-trips tools/list and hashes the canonical-JSON shape of the result. The rest of this post is the four shapes drift takes, what each one breaks for downstream agents, the hash routine that detects all of them, and how to alert on drift without alerting on noise.
What "schema drift" means in the MCP context
"Schema drift" is a phrase the database world uses for the moment when a column gets renamed in production and three downstream ETL jobs break before anyone notices. The MCP version is the same idea, applied to the tool catalogue an MCP server publishes via its tools/list method.
In MCP, every server is required to respond to tools/list with an array of tool definitions. Each definition has at minimum a name, a description, and an inputSchema — the JSON Schema that tells a calling client which arguments are valid. Many also publish an outputSchema. This array is the server's contract. Anything that calls a tool — an LLM agent, a CLI, a workflow runner, a cached prompt template — depends on the contract being stable enough between calls that the call still validates against the schema the caller saw last.
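For concreteness, a single entry in that array looks roughly like this (the tool and its schema are hypothetical):

```python
# A hypothetical tools/list entry: name, description, and inputSchema
# are the fields a downstream caller caches and validates against.
example_tool = {
    "name": "search_kb",
    "description": "Search the knowledge base by query.",
    "inputSchema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}
```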
Drift happens when that contract changes without anyone declaring a release. Sometimes drift is intentional but unannounced: a developer ships v0.6 of their MCP, removes a tool that nobody had used, and forgets to bump the version in the registry. Sometimes drift is accidental: a code-gen step regenerates the tool list with a different parameter ordering on every deploy, and although the semantic schema is identical, the hash isn't. Sometimes drift is a regression: a refactor accidentally drops a tool, and the next deploy reduces the catalogue from twelve tools to eleven without anyone testing the difference. From the outside — from the perspective of a downstream agent or a client cache — all three cases look the same. Yesterday the server published a contract. Today it published a different one. The cache is stale.
This is mode #7 of the seven failure modes laid out in Why MCP servers die silently — and it's the one we ranked as the most underestimated, because it's the only failure mode where the server is technically working perfectly. It's serving JSON-RPC. The envelope is correct. The tool list response parses. The transport is up. The contract just isn't the contract you saw last week.
Why drift is invisible to every probe except one
Walk up the probe-depth ladder. At every layer below the body of the tools/list response, drift is unobservable.
- TCP probe. Sees that the socket answers. Tool list is irrelevant.
- TLS probe. Sees that the handshake completes and the cert is valid. Tool list is irrelevant.
- HTTP-status probe (UptimeRobot's free tier, Pingdom's default, BetterStack's free tier). Sees that the URL returns 2xx. Tool list is irrelevant. Even an HTTP-with-body-substring probe — the slightly fancier one that grep-matches the response body for a known string — only catches drift if you happened to pre-configure a substring that the drifted version doesn't include, which means you have to know in advance what's going to drift, which defeats the point.
- JSON-RPC envelope probe. Sees that the response parses as JSON-RPC 2.0 and contains result and id. Tool list is parsed but its shape isn't compared to anything historical. Drift passes.
- MCP handshake probe. Sees that initialize succeeds and tools/list returns a well-shaped array. Drift still passes — every drifted version still returns a well-shaped array, it's just a different array.
- Schema-drift probe. Compares the canonical-JSON hash of the tool list against the previous probe's hash. This is the only layer where drift becomes visible.
A probe that doesn't get to the last rung is a probe that ships drift to production the same day the server does. MCP server uptime monitoring covers the full stack up to that last rung; this post is the rung itself.
The hash diff — the smallest unit of drift detection
The detection routine is conceptually trivial: every probe, hash the tool list; on every probe, compare the hash against the previous probe's hash; if the hash changed, that's a drift event. The whole thing fits in three lines of pseudocode. The trick is making the hash canonical — the same logical tool list has to produce the same bytes every time — because any non-canonical hash will produce false-positive drift events on every reorder of map keys or every change in JSON-formatter whitespace, and a probe that pages someone every 30 seconds with "drift detected" because the server's JSON encoder switched from sorted to insertion-order keys is worse than no probe at all.
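In sketch form, with the fetch and storage helpers left hypothetical and the hash routine defined just below:

```python
# The per-probe routine in sketch form. fetch_tool_list, last_hash,
# and record_drift are hypothetical helpers; tool_list_hash is the
# canonical-JSON routine sketched just below.
tools = fetch_tool_list(server)        # the tools/list round-trip
digest = tool_list_hash(tools)         # canonical JSON -> SHA-256
if digest != last_hash(server):        # compare against previous probe
    record_drift(server, digest)       # a drift event, not an outage
```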
Canonical-JSON is a small spec — a few rules that pick a single byte sequence for any JSON document. The relevant ones for this purpose:
- Object keys are emitted in sorted (lexicographic) order at every level.
- No whitespace anywhere outside of strings.
- Numbers are emitted in their shortest faithful form (no trailing zeros, no leading zeros, no exponential notation unless required).
- Strings are UTF-8 with the minimum legal escaping.
Run the tool list through a canonical-JSON serializer, run the resulting bytes through SHA-256, store the hex digest. Any future probe whose hash matches has the same logical tool list. Any future probe whose hash differs has drifted by exactly one of the four shapes laid out below — there's no false-positive class left once the canonical step is in place. Most ecosystems have a one-import canonical-JSON library; for environments that don't, the rule set above is twenty lines of code.
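A minimal sketch of the hash step in Python, assuming the tool list arrives as parsed JSON: the standard library covers the key-ordering and whitespace rules, and a full canonical-JSON implementation (RFC 8785, the JSON Canonicalization Scheme) adds the number and escaping rules.

```python
import hashlib
import json

def tool_list_hash(tools: list) -> str:
    """Canonical-JSON hash of a tools/list result.

    sort_keys orders object keys lexicographically at every level;
    compact separators strip all inter-token whitespace. Number and
    string normalization are left to a full canonical-JSON library.
    """
    canonical = json.dumps(tools, sort_keys=True,
                           separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```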
The cost is one SHA-256 per probe. At our 60-second cadence (covered in the JSON-RPC post linked from the previous deep-dive) that is rounding-error CPU on the probe collector and one extra column in the per-probe row in storage. The benefit is that drift becomes a first-class signal alongside latency and availability, observable on the same time-series, alertable on the same channel.
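In storage terms, the extra column is one more field on the per-probe row (field names hypothetical):

```python
from dataclasses import dataclass

@dataclass
class ProbeRow:
    """One row per probe; field names are illustrative."""
    server_id: str
    probed_at: float          # unix timestamp
    latency_ms: float
    ok: bool
    tool_list_sha256: str     # the one extra column that makes drift queryable
```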
The four shapes drift takes — and what each one breaks
A drift event is a hash mismatch. Once you have it, the question becomes "what kind of drift." The diff between the old tool list and the new one always falls into one of four buckets, in increasing order of impact on downstream agents:
1. Description-only drift — a tool's description rewritten
The smallest shape. A tool's name and inputSchema are unchanged, but its description field has been rewritten. From a strict-contract perspective this is innocuous — every existing call still validates. From an agent-routing perspective it's not innocuous at all: the agent's tool selection decision was made on the basis of the old description, and a rewritten description can flip routing for borderline calls. We have seen rewrites where a tool went from "search the knowledge base by query" to "search the knowledge base by query (deprecated; use search_v2 instead)" — the description literally announced the deprecation, but every cached prompt template was still routing through the old name.
Operationally: log it, don't page on it. The author probably knows. But the drift event lets the author confirm the publish made it to production, and lets a downstream consumer notice that they should re-read the description before the next agent run.
2. Schema-shape drift — a parameter added, removed, or renamed
The most common shape we see. The tool itself still exists, but its inputSchema has changed: a required parameter became optional, an optional parameter became required, a parameter was renamed, a new parameter was added with a default, an enum got a new variant. Each of these can be benign or breaking depending on what the calling agent does. Removed-parameter drift is the most common ship-breaking change — every cached call that passed the now-removed parameter will fail server-side validation (on schemas that reject unknown properties) with a JSON-RPC -32602 invalid params error.
Operationally: page on it, but not at 3am — drift events are pre-incident, not in-flight. A drift queue that batches into a daily digest with the per-tool diff is the right channel. Authors who want it real-time can opt in via Slack alerts; teams typically prefer the digest.
3. Tool-removed drift — the tool no longer exists
The tool's name is no longer present in the array. Every downstream call to that name will now return -32601 method not found. This is the bug class that produced the post that started this whole inquiry: a server lost a tool in a refactor, the registry didn't notice, and three downstream agents kept calling the removed name for two weeks before anyone realised the calls were silently failing into the agent's error-handler.
Operationally: page on it. A tool removal is the closest thing to "down" that drift can be — the contract has lost a member.
4. Tool-added drift — a new tool appeared
The tool array has a new entry that wasn't in the previous probe. This is the rarest shape and the lowest-impact one for callers — nothing was depending on a tool that didn't exist before. But it is the highest-impact one for catalogue-and-discovery — the existence of a new tool changes which agents could plausibly route to this server, and so the tool-added event is the right trigger for downstream catalogue refreshes (registry indexers, agent tool-selection caches, embedding stores keyed on tool descriptions).
Operationally: log it loudly, page on it for catalogue consumers (registries, agent platforms), don't page on it for the author themselves — they're the ones who shipped it.
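Bucketing a drift event is then a dict diff over the two tool lists. A sketch, assuming each definition carries the name, description, and inputSchema fields from tools/list:

```python
def classify_drift(old_tools: list[dict], new_tools: list[dict]) -> list[tuple[str, str]]:
    """Bucket a drift event into the four shapes, per affected tool."""
    old = {t["name"]: t for t in old_tools}
    new = {t["name"]: t for t in new_tools}
    events = []
    for name in old.keys() - new.keys():
        events.append((name, "tool-removed"))          # shape 3: page
    for name in new.keys() - old.keys():
        events.append((name, "tool-added"))            # shape 4: refresh catalogues
    for name in old.keys() & new.keys():
        if old[name].get("inputSchema") != new[name].get("inputSchema"):
            events.append((name, "schema-shape"))      # shape 2: daily digest
        elif old[name].get("description") != new[name].get("description"):
            events.append((name, "description-only"))  # shape 1: log only
    return events
```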
What we've measured — 7.1% drift over 48 hours
The headline number from the Q2 audit was that 9% of public MCP endpoints answered correctly on a real initialize + tools/list handshake. The follow-up question was: of the 196 servers that did answer correctly, how stable were their tool lists? We re-probed all 196 servers 48 hours after the first probe, hashed the tool list at both points, and compared.
14 of the 196 servers had a different hash on the second probe. That's 7.1%. Of those 14:
- 4 were description-only drift (shape #1).
- 7 were schema-shape drift (shape #2) — six added a parameter, one removed one.
- 2 were tool-removed drift (shape #3) — one server lost two tools, one lost one.
- 1 was tool-added drift (shape #4) — a new tool appeared.
7.1% over 48 hours, naively extrapolated to 30 days, is roughly a two-in-three chance that any given server has drifted at least once over a month. The naive extrapolation is wrong in both directions — most authors don't deploy daily, so drift events cluster around release windows; on the other hand, the 48-hour window we sampled is the window after a registry update, which selects for servers that just deployed, so it overstates the rate for the long-tail of stable-for-six-months servers. The honest reading is "drift on the timescale of weeks is normal; drift on the timescale of hours is rare but real." Neither version is acceptable to ship to a downstream agent without a detection layer between the two.
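For the record, the naive arithmetic treats each 48-hour window as an independent draw:

```python
p_48h = 14 / 196                     # observed drift rate per 48-hour window
windows = 30 / 2                     # 48-hour windows in 30 days
p_30d = 1 - (1 - p_48h) ** windows   # P(at least one drift event in a month)
print(f"{p_48h:.3f} {p_30d:.2f}")    # 0.071 0.67
```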
We will re-run this probe with the Q3 2026 audit. The baseline is now public and replicable — anyone with the same probe routine and the same registry crawl can reproduce the number on their own probe collector.
Why drift is more dangerous than downtime
An MCP server that is fully down generates loud feedback. Every downstream agent fails the call, the error propagates to the agent's error-handler, and within minutes someone notices that the integration is broken. Downtime is self-announcing. The user who asked the agent for something and got an error is the smoke detector.
An MCP server that has drifted does not generate loud feedback. Every downstream agent that calls the now-removed tool gets -32601; every agent that omits the now-mandatory parameter gets -32602. These are not 5xx errors at the transport layer; they are well-formed JSON-RPC error envelopes. Many agents catch -32601 and -32602, log them as recoverable warnings, and fall back to a different tool or a different server. The user gets a slightly worse answer. Nobody pages.
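For concreteness, the envelope behind one of those recoverable warnings (tool name hypothetical; message text varies by server):

```python
# What the agent actually receives after tool-removed drift: HTTP 200,
# a well-formed JSON-RPC error envelope, nothing for an HTTP probe to see.
drifted_response = {
    "jsonrpc": "2.0",
    "id": 7,
    "error": {"code": -32601, "message": "Method not found: search_kb"},
}
```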
From the author's perspective the consequence is invisible: traffic to the drifted tool drops to zero, but traffic-per-tool isn't a metric most authors are monitoring. The first time an author finds out is when a user files an issue saying "your tool stopped working" — and at that point, the user has been silently routed elsewhere for days or weeks. This is the silent-death-spectrum framing from the Q2 audit: drift is the rightmost end of the spectrum, the failure mode where the server does not announce that it has failed and the user does not know they are on the worse side of the failure.
The defense is to make the server's drift events as observable as its downtime events. A health-check stack that surfaces both — uptime on one panel, drift events on another — is the operational floor for any MCP author who cares about the user-experience side of their server, not just the up/down side.
What to do this week if you ship MCP
Three concrete steps, ordered by cost-of-effort:
- Hash your own tool list once, locally (see the one-off sketch after this list). Run tools/list against your production server, run the response array through a canonical-JSON serializer, run that through SHA-256, write the hex digest in your README. The next time you ship, repeat. If the hash changed and your changelog doesn't mention what changed, your changelog is incomplete — that's a signal to audit and document. Cost: five minutes.
- If you operate any health check today, store the hash on every probe. One column. Storing it makes drift queryable retroactively, even before you wire alerts on it. Cost: one schema migration. The probe sequence post covers the full check that produces the hash; the storage step is appending one field.
- Decide whether you want drift events as a first-class signal. If you're running fewer than three MCPs, your engineering time is worth more than the $9/mo Author tier costs — drift events arrive in your inbox or your Slack with the full diff attached. If you're running more than three, run them under a hosted probe so the hash storage and the diff rendering are someone else's problem. The comparison with UptimeRobot covers when an HTTP-only probe is still the right choice — the answer for static-HTML endpoints is "yes," the answer for any RPC service is "no, because drift is a real failure mode and HTTP-only probes can't see it."
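And the one-off check from step one, as a sketch: it assumes a streamable-HTTP endpoint that answers a bare JSON-RPC POST, where a fully spec-compliant server may insist on an initialize round-trip first.

```python
import hashlib
import json
import urllib.request

# One-off local check. Assumes a streamable-HTTP endpoint that answers
# tools/list on a plain JSON-RPC POST; spec-compliant servers may
# require an initialize round-trip and session headers first, in which
# case fetch the list through your MCP client library instead.
URL = "https://mcp.example.com/mcp"  # hypothetical endpoint

req = urllib.request.Request(
    URL,
    data=json.dumps({"jsonrpc": "2.0", "id": 1, "method": "tools/list"}).encode(),
    headers={"Content-Type": "application/json", "Accept": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    tools = json.load(resp)["result"]["tools"]

canonical = json.dumps(tools, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
print(hashlib.sha256(canonical.encode("utf-8")).hexdigest())
```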
What we'll cover next
Schema drift was the last of the four blog posts the seven failure modes generated as direct follow-ups (Q2 audit, failure-mode taxonomy, JSON-RPC vs HTTP probes, schema drift). The next planned post is the MCP authentication primer — what the auth-walled 16.8% bucket from the Q2 audit actually tells us about how to publish a private MCP without losing it from public discovery. After that, the Q3 2026 registry audit (mid-July) will re-run all of this against a fresh crawl and report what changed.
If you want a heads-up the first time your own server's tool list hash changes, the easiest path is to claim it on the public dashboard. Free for the public-tier alert; $9/mo if you want Slack or webhook delivery the moment the diff lands.
Further reading
- Why MCP servers die silently — 7 failure modes — the taxonomy where schema drift is mode #7.
- JSON-RPC health checks vs HTTP probes — the probe layer schema drift sits on top of.
- State of the MCP Registry — Q2 2026 — the dataset, including the 196 healthy servers the 7.1% number is drawn from.
- MCP server health check — probe sequence explained — the full check that produces the canonical-JSON hash.
- Monitoring an MCP server — signals worth watching — uptime, latency, drift, and where each fits.
- MCP server Slack alerts — payload shape — the format drift events ship in.
- MCP server uptime monitoring — the whole stack — drift detection in the broader monitoring context.
- UptimeRobot vs AliveMCP — when an HTTP-only probe is still the right choice.