Deep dive · 2026-05-01 · Q3 2026 audit pre-work

How We Run the Quarterly MCP Registry Audit

The Q3 2026 MCP registry audit runs in mid-July. It will be the first end-to-end exercise of the full scale stack — collector, archiver, alert router, and operator dashboard — under load, operated by the routines we documented in the small-team-companion arc. This post explains the methodology, what changed since the Q2 baseline, and the three new measurement buckets the Q3 run introduces.

TL;DR

The Q3 2026 audit re-probes every public MCP endpoint we have discovered, this time through the same four-layer stack that runs production monitoring: multi-tenant collector, shared-state archiver, per-tenant alert router, and operator dashboard. It adds three new outcome buckets (regionally degraded, credentialed-probe degraded, and schema drift confirmed), and we expect the globally healthy rate to land around 11–14%, up from 9% in Q2. The audit window opens in mid-July; server authors who want into the healthy bucket have until then to fix what the probe finds.

Why we re-run every quarter

The Q2 2026 audit scanned 2,181 remote MCP endpoints across six registries between April 14 and 21. Only 9% returned a valid initialize + tools/list response on all three probe attempts. The other 91% were dead for seven recurring reasons: DNS expired, free-tier hosting reaped, TLS cert lapsed, route moved without a redirect, auth-walled on every tool call, malformed JSON-RPC, or schema-shape violation.

A quarterly cadence matters because the 9% number is meaningless unless it moves. A single snapshot tells you the ecosystem is in poor shape. Three snapshots in a calendar year tell you whether it is getting better, staying flat, or getting worse — and which registries are driving the change. If the Official MCP Registry's 17.2% health rate is rising because the registry team added real-time health gating, that is a different story from a registry growing its listing count while keeping a flat or falling health rate. One data point cannot separate those stories.

We also added something since Q2 that makes the Q3 run structurally different: the full scale stack. The Q2 audit was a single-run scan from one probe region, assembled from raw results. The Q3 audit runs through the same machinery we use for ongoing production monitoring — the multi-tenant probe collector, the shared-state archiver, the per-tenant alert router, and the operator dashboard — all running end-to-end under real registry-scale load for the first time. The audit is not only a data exercise; it is the first stress test of the complete production architecture.

The four-layer audit machinery

The Q3 audit is not a script that runs once and outputs a CSV. It is the same probe-and-verdict pipeline the product runs every 60 seconds for registered tenants, pointed at the entire public registry for a structured audit window. Here is what each layer contributes during the run.

Layer 1 — Multi-tenant probe collector

The multi-tenant collector handles the fan-out. For the Q3 audit, every endpoint we have discovered across MCP.so, Glama, PulseMCP, Smithery, the Official MCP Registry, and GitHub topic feeds is enqueued as a probe job for a single "audit tenant." Workers pull from per-region Redis work queues, with one worker process per endpoint to eliminate noisy-neighbour interference — a slow-responding server cannot block the probe of the one behind it in the queue. The supervisor enforces a 50-second wall-clock cap per probe, SIGKILLs anything that runs long, and writes an audit-log row with CPU, memory, and stdout/stderr byte-count for every kill.
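
To make the supervisor behaviour concrete, here is a minimal sketch of the per-endpoint probe loop, assuming a hypothetical probe_worker.py script that prints its verdict to stdout. The names, the subprocess layout, and the exact audit-log fields are illustrative, not the production code.

```python
import resource
import subprocess
import time

WALL_CLOCK_CAP_S = 50  # per-probe wall-clock cap described above

def run_probe(endpoint_url: str) -> dict:
    start = time.monotonic()
    proc = subprocess.Popen(
        ["python", "probe_worker.py", endpoint_url],  # hypothetical worker script
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    try:
        stdout, stderr = proc.communicate(timeout=WALL_CLOCK_CAP_S)
        killed = False
    except subprocess.TimeoutExpired:
        proc.kill()                                   # SIGKILL on overrun
        stdout, stderr = proc.communicate()
        killed = True
    # RUSAGE_CHILDREN is cumulative across finished children, so with one
    # worker process per probe it approximates that worker's usage.
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {                                          # the audit-log row
        "endpoint": endpoint_url,
        "killed": killed,
        "wall_s": round(time.monotonic() - start, 2),
        "cpu_s": round(usage.ru_utime + usage.ru_stime, 2),
        "max_rss_kb": usage.ru_maxrss,
        "stdout_bytes": len(stdout),
        "stderr_bytes": len(stderr),
    }
```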

The key difference from an ad-hoc scan: the collector runs all five probe regions in parallel — us-east, us-west, eu-west, ap-southeast, sa-east — and the verdict is sealed only when at least two regions have returned, applying the two-of-N aggregation rule. This surfaces a new failure category the Q2 methodology could not reach: regionally degraded.

The small-team routines for operating the collector at this scale are documented in the collector companion post. During the Q3 audit window the supervisor's queue-depth alert will be calibrated specifically to avoid firing on the expected load spike: per the guidance in that post, the alert uses a percentile-and-rate-of-change rule rather than an absolute threshold, which keeps the audit run from generating a page storm against the on-call rotation every night it runs.
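
As an illustration of why that distinction matters, here is a sketch of a percentile-and-rate-of-change check of the kind described; the window size, growth factor, and class shape are assumptions for the example, not the companion post's actual parameters.

```python
from collections import deque
from statistics import quantiles

class QueueDepthAlert:
    def __init__(self, window: int = 60, growth_factor: float = 2.0):
        self.samples = deque(maxlen=window)   # one queue-depth sample per minute
        self.growth_factor = growth_factor

    def observe(self, depth: int) -> bool:
        """Return True if this sample should page the on-call rotation."""
        fire = False
        if len(self.samples) >= 10:
            p95 = quantiles(self.samples, n=20)[18]   # ~95th percentile of recent depths
            growth = depth - self.samples[-1]         # change since the last sample
            typical = max(1.0, (self.samples[-1] - self.samples[0]) / len(self.samples))
            # Page only when depth is above the recent p95 AND growing much
            # faster than the recent trend, so a steady audit-window plateau
            # stays quiet while a runaway backlog still pages.
            fire = depth > p95 and growth > self.growth_factor * typical
        self.samples.append(depth)
        return fire
```

A fixed absolute threshold would trip every night of the audit window; the rate-of-change form only trips when the backlog departs from its own recent behaviour.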

Layer 2 — Shared-state archiver

The collector emits one verdict-minute Redis key per endpoint per probe round. The shared-state archiver drains those keys into a long-term Postgres history table partitioned by month. During the audit window, the archiver writes every verdict to both the per-minute probe table and the daily rollup, so post-audit analysis can see whether an endpoint's verdict was stable across the three 24-hour-apart probes or whether it flapped. Q2 classified endpoints as healthy only if they responded correctly on all three probes; Q3 uses the same three-probe window but records the per-probe verdict, so a server that passed twice and failed once on day two is classified differently from a server that failed all three times.
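
A minimal sketch of that write path, assuming redis-py and a psycopg-style connection, with illustrative table and column names (probe_history, probe_daily_rollup) rather than the production schema:

```python
import json

PROBE_INSERT = """
    INSERT INTO probe_history (endpoint_id, probed_at, region, verdict)
    VALUES (%(endpoint_id)s, %(probed_at)s, %(region)s, %(verdict)s)
"""

ROLLUP_UPSERT = """
    INSERT INTO probe_daily_rollup (endpoint_id, day, pass_count, fail_count)
    VALUES (%(endpoint_id)s, %(day)s, %(passes)s, %(fails)s)
    ON CONFLICT (endpoint_id, day) DO UPDATE
    SET pass_count = probe_daily_rollup.pass_count + EXCLUDED.pass_count,
        fail_count = probe_daily_rollup.fail_count + EXCLUDED.fail_count
"""

def drain_verdict_keys(redis_client, pg_conn):
    """Move sealed verdict-minute keys from Redis into Postgres."""
    with pg_conn.cursor() as cur:
        for key in redis_client.scan_iter("verdict:*"):
            row = json.loads(redis_client.get(key))
            cur.execute(PROBE_INSERT, row)            # per-minute probe history
            cur.execute(ROLLUP_UPSERT, {              # daily rollup
                "endpoint_id": row["endpoint_id"],
                "day": row["probed_at"][:10],
                "passes": 1 if row["verdict"] == "healthy" else 0,
                "fails": 0 if row["verdict"] == "healthy" else 1,
            })
            redis_client.delete(key)                  # drained; safe to drop
    pg_conn.commit()
```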

The archiver also maintains the suppression-cluster materialised view — the cross-endpoint structure that groups failures by error kind, ASN, and registry of origin. During the Q3 audit, that view will tell us whether a spike in dead endpoints on a given day was a registry-wide event driven by one root cause on one ASN, or whether it was individual failures distributed across registries. Q2 had no equivalent signal. If a Render.com or Railway outage takes a cluster of hosted MCPs offline on the same day, the suppression-cluster view will surface it in the Q3 data, where the Q2 snapshot would have absorbed it silently into the DNS-or-transport-dead bucket.
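
For a sense of the view's shape, here is roughly the grouping a materialised view like this could use; the table and column names are assumptions, not the production schema.

```python
# Illustrative DDL, kept alongside the archiver code as a constant.
SUPPRESSION_CLUSTER_VIEW = """
CREATE MATERIALIZED VIEW IF NOT EXISTS suppression_cluster AS
SELECT
    date_trunc('day', probed_at) AS failure_day,
    error_kind,                  -- e.g. dns_dead, tls_expired, schema_malformed
    origin_asn,                  -- ASN the failing endpoint resolves into
    registry,                    -- registry the listing was discovered in
    count(*)                     AS failed_endpoints
FROM probe_history
WHERE verdict <> 'healthy'
GROUP BY 1, 2, 3, 4
"""

def refresh_suppression_clusters(pg_conn):
    with pg_conn.cursor() as cur:
        cur.execute("REFRESH MATERIALIZED VIEW suppression_cluster")
    pg_conn.commit()
```

A registry-wide event driven by one root cause on one ASN shows up as a single row with a large failed_endpoints count; distributed individual failures show up as many small rows.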

Layer 3 — Per-tenant alert routing

Alert routing during the audit serves two distinct audiences. For MCP authors who have registered their server with AliveMCP, the audit window is when they will learn whether their server is in the dead or healthy bucket before the report goes public. We run the same multi-tenant paging stack during the audit as during normal monitoring — sink-ownership-verified Slack and webhook channels, per-tenant alert budgets, and the cross-tenant suppression rule that collapses a registry-wide outage into one global notice rather than paging every registered author when a CDN edge fails.

The suppression rule is the component most likely to fire during the audit itself. The Q2 data showed several registries with health rates clustered around a single ASN. If the same pattern repeats in Q3, and more than 10% of our registered servers fail on the same root cause within the same probe minute, the cross-tenant suppression rule will fire — and the audit will capture its first real-world exercise at registry scale. That is data for the report too: we will note in the Q3 analysis how many registry-wide failure events were observed and how many paging-storm scenarios the suppression rule absorbed.
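
Here is an illustrative version of that check. The 10% ratio comes from the rule as described above; the data shapes and names are assumptions made for the sketch.

```python
from collections import Counter

SUPPRESSION_RATIO = 0.10  # more than 10% of registered servers, same root cause

def route_alerts(failures: list[dict], registered_total: int) -> dict:
    """failures: [{'tenant': ..., 'endpoint': ..., 'root_cause': ...}, ...]
    observed within a single probe minute."""
    by_cause = Counter(f["root_cause"] for f in failures)
    suppressed_causes = {
        cause for cause, n in by_cause.items()
        if n / registered_total > SUPPRESSION_RATIO
    }
    # Individual pages for isolated failures; one global notice per
    # registry-wide root cause instead of paging every affected author.
    pages = [f for f in failures if f["root_cause"] not in suppressed_causes]
    global_notices = [
        {"root_cause": cause, "affected": by_cause[cause]}
        for cause in suppressed_causes
    ]
    return {"pages": pages, "global_notices": global_notices}
```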

Layer 4 — Operator dashboard

The operator dashboard is how the audit team monitors the run. Every probe verdict, every collector SIGKILL, every archiver watermark advance, and every alert-routing suppression event is surfaced in the dashboard in real time. The audit window is also the first time the dashboard's audit log will record a meaningful volume of operator actions — verdict reviews, threshold adjustments, suppression overrides — all written with the uniform seven-year retention that the architecture requires.

The dashboard's impersonation primitive gets its first practical exercise here. Audit-team operators will impersonate registered tenants to verify that the per-registry health badges in the report look correct from a tenant's perspective before the results are finalised. Every impersonation is bound to a 30-minute hard expiry, a justification field, and a second-approver gate for any read-write action, per the operator companion routines — the same controls that protect tenant data during normal operation apply during the audit window too.
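
As a sketch of how those three controls compose, under assumed field names (this is not the dashboard's actual model):

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

IMPERSONATION_TTL = timedelta(minutes=30)   # hard expiry described above

@dataclass
class ImpersonationGrant:
    operator: str
    tenant: str
    justification: str
    approver: str | None = None             # second approver, if any
    started_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self):
        if not self.justification.strip():
            raise ValueError("justification is required")

    @property
    def expired(self) -> bool:
        return datetime.now(timezone.utc) - self.started_at > IMPERSONATION_TTL

    def allow(self, action: str, read_write: bool) -> bool:
        if self.expired:
            return False
        if read_write and self.approver is None:
            return False                     # second-approver gate for writes
        return True                          # action is also audit-logged
```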

Three new measurement buckets

The Q2 audit had five outcome buckets: Healthy, DNS/transport dead, HTTP alive / MCP dead, Auth-walled on every tool call, and Schema-malformed. Q3 adds three buckets that the Q2 methodology could not populate because the probe infrastructure did not yet support them.

Regionally degraded

An endpoint that responds correctly in us-east and eu-west but times out consistently in ap-southeast is not healthy — but it is not in the same category as an endpoint whose DNS has lapsed. The regionally degraded bucket captures endpoints that pass the two-of-N threshold (answering correctly in at least two of five regions) but fail in at least one region on all three probe attempts. Q2 had no multi-region data, so this failure category was entirely invisible in the results. Given the multi-region probe deployment work that went into the practical-routine series, we expect this bucket to be non-trivial: somewhere around 3–4% of the servers that appeared healthy in Q2 are likely regionally degraded rather than globally healthy, judging by the 3.4%-per-24-hours regional divergence rate we observed across the 196 healthy-bucket endpoints.
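
In simplified form, the classification rule looks roughly like this; the bucket names and the result shape are assumptions made for the sketch.

```python
REGIONS = ["us-east", "us-west", "eu-west", "ap-southeast", "sa-east"]

def classify_regional(results: dict[str, list[bool]]) -> str:
    """results maps region -> pass/fail for each of the three probe rounds."""
    passing = [r for r in REGIONS if results.get(r) and all(results[r])]
    always_failing = [r for r in REGIONS if results.get(r) and not any(results[r])]
    if len(passing) < 2:
        return "not_healthy"             # falls into one of the original buckets
    if always_failing:
        return "regionally_degraded"     # passes two-of-N, dead somewhere
    return "healthy"
```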

The practical consequence for an agent platform pulling registry feeds: a server in the regionally degraded bucket will succeed for users in some geographies and fail silently for users in others. That is a subtler failure than a globally dead server, and it is exactly the class of failure that single-region uptime monitoring — HTTP-based or MCP-protocol-aware — cannot surface.

Credentialed-probe degraded

The Q2 auth-walled bucket captured endpoints where initialize succeeded without credentials but every tool call returned 401 or JSON-RPC -32001. Q3 adds a different and subtler bucket: endpoints where unauthenticated initialize + tools/list succeeds, and the server also has a published demo token in its registry listing, but the authenticated probe with that demo token fails. This is a distinct failure mode — the server is technically healthy to an anonymous prober but broken to the users who have credentials, typically because the demo token expired or was rotated without updating the registry listing. We expect this bucket to be small, but its presence in the Q3 data will say something specific about which registries have a demo-credential freshness problem in their listing metadata.
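
The check itself is a simple predicate. Assuming a demo_token field in the listing metadata (a hypothetical name), it is roughly:

```python
def credentialed_probe_degraded(listing: dict,
                                anon_probe_ok: bool,
                                token_probe_ok: bool | None) -> bool:
    """True when the anonymous handshake works, a demo token is published,
    but the authenticated probe with that token fails."""
    has_demo_token = bool(listing.get("demo_token"))   # hypothetical field
    return has_demo_token and anon_probe_ok and token_probe_ok is False
```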

Schema drift confirmed

The Q2 audit treated schema drift as a separate measurement — 7.1% of healthy servers had their tool-list hash change within 48 hours of the Q2 probe window. In Q3, schema drift becomes a first-class outcome bucket in the registry report. Every endpoint classified as initially healthy gets its tool list hashed on each of the three 24-hour-apart probe rounds. Any endpoint whose canonical-JSON SHA-256 hash changes between rounds is classified as schema drift confirmed rather than simply healthy. This is not necessarily a failure — a server that adds a new tool between probe one and probe two in a backwards-compatible way is still usable — but it is a concrete signal that downstream agents and agent platforms depending on that server should be watching the hash, not assuming stability. Agents that cache the tool list from probe one will be calling a stale schema by probe three.
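
A minimal sketch of the hashing step, assuming the canonicalisation is "sorted keys, compact separators" — the production prober's exact canonical-JSON rules may differ:

```python
import hashlib
import json

def tool_list_hash(tools: list[dict]) -> str:
    """SHA-256 over a canonical JSON rendering of the tools/list result."""
    canonical = json.dumps(
        sorted(tools, key=lambda t: t.get("name", "")),
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def drift_confirmed(round_hashes: list[str]) -> bool:
    """Schema drift confirmed if the hash changes between any two of the
    three 24-hour-apart probe rounds."""
    return len(set(round_hashes)) > 1
```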

What the Q3 numbers are likely to show

Three predictions, grounded in continuous monitoring since the Q2 audit and the ecosystem's trajectory:

The 91% dead rate will fall, but not dramatically. The ecosystem's net death rate is driven primarily by free-tier hosting reaping and DNS expiry — both of which operate on a timescale of months, not weeks. New registry listings add to the denominator faster than the dead pile shrinks. Our best estimate is a globally healthy rate in the 11–14% range for Q3, with the improvement concentrated in the Official MCP Registry (where the curation team added active health gating after the Q2 report) and Smithery (which added a public health-badge widget that gives authors a visible incentive to verify their listings are alive).

The auth-walled bucket will shrink. The 16.8% auth-walled figure in Q2 was driven heavily by listing-posture mismatch — servers listed as publicly accessible that in practice require credentials nobody issued. Several registries added an explicit auth_required flag to their listing format following the Q2 report, reducing the number of servers that appear public but require auth. We expect 13–15% auth-walled in Q3, with the reduction concentrated in the Official Registry and Smithery where listing metadata is best maintained.

The regionally degraded bucket will be the report's most surprising finding. Nobody expects it because nobody measured it before. If the 3.4%-per-24-hours divergence rate holds at registry scale over the three-day probe window, the regionally degraded bucket will be larger than the schema-malformed bucket from Q2. That shifts the top-line narrative from "dead versus alive" toward "globally available versus regionally spotty" — a more nuanced and actionable framing for both indie authors (who need to know which CDN edge or cloud region is causing divergence) and agent platforms (who need to know whether their geographic user base is experiencing a different health picture than the US-East headline number suggests).

How to get your server into the healthy bucket before July

The audit window opens in mid-July. Ten weeks is enough time to fix almost anything. Three concrete things to do now:

  1. Run the probe against your own server today. The same initialize + tools/list handshake the audit uses takes about thirty seconds from a terminal. See the probe sequence for the exact JSON-RPC request shape; a minimal sketch also follows this list. If you see anything other than a valid serverInfo block and a parseable, non-empty tool list, you are in the dead column today.
  2. Check from a second geography, not just your own machine. The Q3 audit will classify servers as regionally degraded that a single-origin probe would call healthy. If your server sits behind a CDN or deploys to a single cloud region, run the same probe from a second location — a $5-per-month VPS in Singapore or London is enough to surface the class of CDN-localisation failure that lands in the regionally-degraded bucket. The curl command is identical; only the source IP changes.
  3. Set up tool-list hash monitoring before the audit window. Schema drift is a first-class Q3 metric. If your tool list changes between the three probe rounds, your server will appear in the schema drift confirmed bucket even if it was healthy on every availability check. AliveMCP monitors the tool-list hash automatically for every registered server on the free tier and sends a diff alert the minute the hash changes — you get notified before the quarterly report does.
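
A minimal version of the step-1 self-check, using the requests library over Streamable HTTP. The endpoint URL is a placeholder, and the protocolVersion string, headers, and session handling are assumptions you may need to adjust for your server (some servers also expect a notifications/initialized notification between the two calls).

```python
import requests

ENDPOINT = "https://your-mcp-server.example.com/mcp"   # placeholder URL
HEADERS = {
    "Content-Type": "application/json",
    "Accept": "application/json, text/event-stream",
}

def rpc(method: str, params: dict, rpc_id: int, session_id: str | None = None):
    headers = dict(HEADERS)
    if session_id:
        headers["Mcp-Session-Id"] = session_id
    body = {"jsonrpc": "2.0", "id": rpc_id, "method": method, "params": params}
    return requests.post(ENDPOINT, headers=headers, json=body, timeout=30)

init = rpc("initialize", {
    "protocolVersion": "2025-03-26",
    "capabilities": {},
    "clientInfo": {"name": "self-probe", "version": "0.0.1"},
}, rpc_id=1)
print("initialize:", init.status_code)
# A healthy server answers with a result carrying a serverInfo block; inspect
# init.text, which may arrive as plain JSON or as a text/event-stream frame.

tools = rpc("tools/list", {}, rpc_id=2, session_id=init.headers.get("Mcp-Session-Id"))
print("tools/list:", tools.status_code)
# Anything other than a parseable, non-empty result.tools array here means
# you are in the dead column today.
```

Running the same script from a second VPS in another geography is exactly the step-2 check; nothing changes but the source IP.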

The Q3 report ships in July

The report will follow the same format as Q2: full methodology, per-registry breakdown against the Q2 baseline, all five original buckets plus the three new ones, failure-mode taxonomy, and the raw anonymised dataset under a research license. We will add a quarter-over-quarter movement table for registries where the Q2 and Q3 numbers are directly comparable, and the first-ever data on the regionally degraded and schema drift confirmed buckets. If the suppression rule fires during the audit window, we will report the registry-wide failure events it absorbed.

Join the waitlist to receive the numbers the morning they publish — one email, no marketing, no newsletter cadence.

Further reading