MCP server cold start

Most public MCP servers run on serverless or free-tier PaaS platforms — Vercel, Railway, Render, Fly.io — that spin instances down after a period of inactivity. The first probe after the idle timeout triggers a cold start: the instance allocates, the runtime initializes, and your probe times out waiting. To a monitor, this looks exactly like an outage. It isn't. Here's how to tell them apart, and what to do about it.

TL;DR

Cold-start failures are single-probe timeouts after an idle gap, followed by immediate recovery on the next probe. Real outages are multi-probe failures that don't self-resolve. The monitoring fix: use N=3 consecutive-failure hysteresis (so a single cold-start timeout never fires an alert) and set your probe timeout to 30 seconds on serverless endpoints. The server-side fix: use a keep-alive probe, a scheduled ping, or upgrade to a platform tier that doesn't idle out.

Why serverless MCP servers cold-start

Serverless and free-tier PaaS platforms are designed to scale to zero: when no traffic hits your endpoint for a configured idle timeout, the platform deallocates the running instance. The next request (or probe) hits an empty slot and has to wait for a new instance to allocate, the Node/Python/JVM runtime to initialize, and your application code to complete its startup sequence.

Cold-start latency breaks down by platform and runtime:

How cold-start failure looks in a probe log vs. a real outage

The two patterns are visually distinct once you know what to look for:

Cold-start signature:

14:00:00  ✓ tools/list — 312ms
14:01:00  ✓ tools/list — 289ms
...
14:16:00  ✗ tools/list — TIMEOUT (30001ms) ← idle timeout fired, cold start
14:17:00  ✓ tools/list — 8,342ms           ← cold start latency, but succeeded
14:18:00  ✓ tools/list — 301ms             ← warm instance

Real outage signature:

14:00:00  ✓ tools/list — 312ms
14:01:00  ✗ tools/list — TIMEOUT (30001ms)
14:02:00  ✗ initialize — connection refused
14:03:00  ✗ initialize — connection refused
14:04:00  ✗ initialize — connection refused
...

The differences: cold start produces exactly one failed probe (the timeout), followed by a success with elevated latency, then a return to baseline. A real outage produces consecutive failures with consistent error modes (connection refused, TLS failure, or repeated timeouts without recovery).

Monitoring cold-start servers correctly

Three adjustments to your monitoring config make cold-start servers behave correctly in your alert pipeline:

1. Use N=3 consecutive-failure hysteresis

A single failed probe never fires an alert. Require 3 consecutive failed probes before transitioning the server from UP to DOWN. A cold-start timeout is, by definition, exactly one failed probe, and it self-resolves on the next probe. With N=3, cold-start failures are invisible in your alert channel, while genuine outages (3+ consecutive failures) still fire within about 3 minutes at a 60-second probe cadence.
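The hysteresis rule above can be sketched as a small state machine. This is a minimal illustration, not AliveMCP's implementation; the class name HysteresisState and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HysteresisState:
    """Tracks UP/DOWN status with a consecutive-failure threshold."""
    threshold: int = 3              # N consecutive failures required to go DOWN
    status: str = "UP"
    consecutive_failures: int = 0

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result; return True if an alert should fire now."""
        if probe_ok:
            # Any success resets the streak and marks the server UP again.
            self.consecutive_failures = 0
            self.status = "UP"
            return False
        self.consecutive_failures += 1
        if self.status == "UP" and self.consecutive_failures >= self.threshold:
            self.status = "DOWN"
            return True             # fire once, on the UP -> DOWN transition
        return False


# One isolated failure (a cold start), then a run of four failures (an outage).
state = HysteresisState()
alerts = [state.record(ok) for ok in [True, False, True, False, False, False, False]]
print(alerts)  # [False, False, False, False, False, True, False]
```

The isolated failure never reaches the threshold, and the outage alerts exactly once, on its third consecutive failure.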

2. Set probe timeout to 30 seconds for serverless endpoints

A 10-second probe timeout on a Render free-tier server will time out on every cold start (Render cold starts average 15–30 seconds). Set 30 seconds. The cost: a probe that hits a cold instance can take up to 30 seconds instead of 10. On a 60-second probe cadence that still fits, so you keep two probes per 2-minute window, and on a 90-second cadence there is even more headroom. The benefit: you stop classifying cold starts as failures.

3. Flag the post-idle probe separately

If your last successful probe was more than T minutes ago (matching the platform's known idle timeout), mark the next probe as "post-idle" and exclude it from uptime SLO calculations even if it times out. AliveMCP does this automatically for recognized serverless domains: *.vercel.app, *.railway.app, *.render.com, *.onrender.com, *.fly.dev. The post-idle probe is logged and visible in the probe timeline, but doesn't count as a downtime event unless it's followed by additional failures.
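A rough sketch of the post-idle check described above, assuming you know the platform's idle timeout T. The suffix list is copied from the paragraph above. The function names and the example hostname are hypothetical, and this is not AliveMCP's actual logic.

```python
from datetime import datetime, timedelta
from typing import Optional

# Suffixes taken from the list above.
SERVERLESS_SUFFIXES = (
    ".vercel.app", ".railway.app", ".render.com", ".onrender.com", ".fly.dev",
)


def is_serverless_host(hostname: str) -> bool:
    """True if the hostname ends with one of the recognized suffixes."""
    return hostname.lower().endswith(SERVERLESS_SUFFIXES)


def is_post_idle(hostname: str, last_success: Optional[datetime],
                 now: datetime, idle_timeout: timedelta) -> bool:
    """Mark a probe as post-idle when the last success is older than T."""
    if not is_serverless_host(hostname) or last_success is None:
        return False
    return (now - last_success) > idle_timeout


# Hypothetical host; a 15-minute timeout is assumed for the example.
t0 = datetime(2024, 1, 1, 14, 0, 0)
print(is_post_idle("demo.onrender.com", t0,
                   t0 + timedelta(minutes=16), timedelta(minutes=15)))  # True
```

A probe flagged this way would still be logged. The flag only decides whether a timeout on that probe counts against the SLO.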

Server-side mitigations

Monitoring config changes help, but the root cause is the idle timeout. If cold-start latency is causing real user-visible latency (not just probe failures), the server-side options are a keep-alive probe, a scheduled ping, or an upgrade to a platform tier that doesn't idle out.
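One of those options, a scheduled keep-alive ping, could be sketched as follows. This assumes the server answers a plain HTTP GET on some lightweight path and that any request resets the platform's idle timer; both are assumptions to verify for your platform. The function names and the URL in the usage comment are hypothetical.

```python
import time
import urllib.request
from typing import Callable


def ping(url: str, timeout_s: float = 30.0) -> bool:
    """Send one GET and report whether a 2xx response came back."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError, and socket timeouts all derive from OSError.
        return False


def keep_warm(url: str, interval_s: float, rounds: int,
              do_ping: Callable[[str], bool] = ping,
              sleep: Callable[[float], None] = time.sleep) -> int:
    """Ping `url` `rounds` times, `interval_s` apart; return the success count."""
    successes = 0
    for i in range(rounds):
        if do_ping(url):
            successes += 1
        if i < rounds - 1:
            sleep(interval_s)   # no sleep after the final ping
    return successes


# Example (not executed here): ping every 5 minutes for an hour.
# keep_warm("https://your-server.example/health", interval_s=300, rounds=12)
```

The interval has to be shorter than the platform's idle timeout, or the instance will still spin down between pings.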

Cold start vs. schema drift vs. real outage: the decision tree

When you see a failed probe on a serverless MCP server, run this tree:

  1. Did exactly one probe fail, followed by immediate recovery? → Likely cold start. Check if the recovery probe had elevated latency (>2× baseline). If yes, cold start confirmed. No action needed.
  2. Did multiple consecutive probes fail? → Genuine outage. Fire P1 alert.
  3. Did probes succeed but tools/list changed? → Schema drift, not a cold start. P2 alert.
  4. Did the gap between last success and the failed probe match the platform's idle timeout? → Cold start. Check platform dashboard for instance status. If the instance shows as running but probes are failing, escalate to genuine outage investigation.
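The tree above can be written down as a small triage function. This is an illustrative sketch: the ProbeWindow fields and the returned labels are hypothetical names, and the final "pending" branch (a single failure with no recovery probe yet and no idle-gap match) is an assumption that the tree itself does not cover.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeWindow:
    consecutive_failures: int             # failed probes in the current streak
    recovered: bool                       # did the probe after the streak succeed?
    recovery_latency_ms: Optional[float]  # latency of that recovery probe, if any
    baseline_latency_ms: float            # typical warm-instance latency
    tools_changed: bool                   # tools/list differs from the last snapshot
    idle_gap_matches: bool                # gap before the failure matches idle timeout


def triage(w: ProbeWindow) -> str:
    if w.consecutive_failures == 0:
        # Step 3: probes succeed, but the schema may have moved.
        return "schema drift (P2)" if w.tools_changed else "healthy"
    if w.consecutive_failures > 1:
        # Step 2: multiple consecutive failures.
        return "outage (P1)"
    if w.recovered:
        # Step 1: one failure, then recovery; elevated latency confirms it.
        elevated = (w.recovery_latency_ms is not None
                    and w.recovery_latency_ms > 2 * w.baseline_latency_ms)
        return "cold start (confirmed)" if elevated else "cold start (likely)"
    if w.idle_gap_matches:
        # Step 4: the gap matches the idle timeout; verify on the dashboard.
        return "cold start (check platform dashboard)"
    return "pending (wait for next probe)"  # assumption, not part of the tree


# Values taken from the cold-start log earlier in this guide.
print(triage(ProbeWindow(1, True, 8342, 300, False, True)))  # cold start (confirmed)
```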

Related questions

Does AliveMCP's 60-second probe cadence help keep my server warm?

For most platforms with idle timeouts ≥ 5 minutes (Render: 15 min, Railway: 10 min, Fly.io default: 5 min), yes — the 60-second probe interval keeps the instance warm as a side effect. Vercel Functions with a shorter TTL may still cold-start between probes. If keeping the instance warm is a hard requirement, use the server-side mitigations above rather than relying on probe timing.

My server is on Render free tier and shows 94% uptime. Is that cold starts?

Probably. Render's 15-minute idle timeout allows up to 96 cold-start events in 24 hours (one per 15-minute idle cycle). If your probe log shows single-probe timeouts spaced roughly 15 minutes apart, each followed by immediate recovery, those are cold starts. Upgrade to Render Starter ($7/mo) to eliminate the idle timeout, or use the keep-alive ping pattern described above.

How does AliveMCP calculate SLO uptime for serverless endpoints?

Cold-start timeouts (single probe, followed by recovery with elevated latency, after a gap matching the platform's idle timeout) are logged as "post-idle probe" events and excluded from SLO calculations. Consecutive failures — regardless of platform — count as downtime. The SLO calculation is transparent: your AliveMCP dashboard shows both the "wall-clock uptime" (all failed probes counted) and the "SLO uptime" (cold-start events excluded).
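To make the difference between the two numbers concrete, here is a minimal sketch of how they could be computed from a probe log. It assumes each probe is stored as an (ok, post_idle) pair and that an excluded probe is dropped from both the numerator and the denominator. That exclusion rule is an assumption; AliveMCP's exact formula is not given here.

```python
from typing import Iterable, Tuple


def uptime_percentages(probes: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Each probe is (ok, post_idle). Returns (wall_clock_pct, slo_pct)."""
    probes = list(probes)
    if not probes:
        raise ValueError("no probes recorded")
    ok_count = sum(1 for ok, _ in probes if ok)
    # Wall-clock uptime: every failed probe counts against you.
    wall_clock = 100.0 * ok_count / len(probes)
    # SLO uptime (assumed rule): failed probes flagged post-idle are dropped.
    counted = [p for p in probes if p[0] or not p[1]]
    slo = 100.0 * ok_count / len(counted) if counted else 100.0
    return wall_clock, slo


# One day at a 60-second cadence with 96 post-idle timeouts.
day = [(True, False)] * 1344 + [(False, True)] * 96
wall, slo = uptime_percentages(day)
print(round(wall, 2), slo)  # 93.33 100.0
```

The flagging is expected to happen upstream: a post-idle timeout that is followed by additional failures should be stored with post_idle set to False so that it counts as downtime.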

Further reading