MCP server cold start

Most public MCP servers run on serverless or free-tier PaaS platforms — Vercel, Railway, Render, Fly.io — that spin instances down after a period of inactivity. The first probe after the idle timeout triggers a cold start: the instance allocates, the runtime initializes, and your probe times out waiting. To a monitor, this looks exactly like an outage. It isn't. Here's how to tell them apart, and what to do about it.

TL;DR

Cold-start failures are single-probe timeouts after an idle gap, followed by immediate recovery on the next probe. Real outages are multi-probe failures that don't self-resolve. The monitoring fix: use N=3 consecutive-failure hysteresis (so a single cold-start timeout never fires an alert) and set your probe timeout to 30 seconds on serverless endpoints. The server-side fix: use a keep-alive probe, a scheduled ping, or upgrade to a platform tier that doesn't idle out.

Why serverless MCP servers cold-start

Serverless and free-tier PaaS platforms are designed to scale to zero: when no traffic hits your endpoint for a configured idle timeout, the platform deallocates the running instance. The next request (or probe) hits an empty slot and has to wait for a new instance to allocate, the Node/Python/JVM runtime to initialize, and your application code to complete its startup sequence.

Cold-start latency breaks down by platform and runtime:

Vercel Serverless Functions (Node 20): 200–600ms cold start in the same region as the request. Cross-region requests add propagation latency. A typical MCP server startup (import SDK, open SQLite or connect to a KV store, register tools) adds another 200–800ms.
Railway (free tier, shared runner): 3–8 seconds cold start after the 10-minute idle timeout. Shared runner queuing can push this to 15 seconds under load.
Render (free tier): 15-minute idle timeout, 10–30 second cold start. This is the most common source of cold-start probe failures in the MCP ecosystem — Render's free tier is popular for hobby MCPs, and 30-second cold starts routinely exceed standard 10-second probe timeouts.
Fly.io (free tier, Machines): 5-minute idle timeout by default; 1–3 second cold start for pre-built images. Faster than Render, but still enough to trip a 2-second probe timeout.
AWS Lambda (with SnapStart off, JVM runtime): 2–15 seconds. Among language runtimes, the JVM has the slowest cold starts.
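
As a rough check, the upper bounds in the list above can be put in a table and compared against a probe timeout. This is a minimal sketch: the dictionary keys and the helper names `covers_cold_start` and `at_risk` are made up for illustration, and the numbers are the worst-case figures quoted above, not measurements.

```python
# Worst-case cold-start latency in seconds, copied from the list above.
WORST_CASE_COLD_START_S = {
    "vercel": 1.4,       # 600 ms function cold start + 800 ms MCP server startup
    "railway": 15.0,     # shared-runner queuing under load
    "render": 30.0,
    "fly": 3.0,
    "lambda-jvm": 15.0,
}


def covers_cold_start(platform: str, probe_timeout_s: float) -> bool:
    """True if the probe timeout is at least the platform's worst-case cold start."""
    return probe_timeout_s >= WORST_CASE_COLD_START_S[platform]


def at_risk(probe_timeout_s: float) -> list[str]:
    """Platforms whose worst-case cold start can outlast the given timeout."""
    return sorted(
        p for p in WORST_CASE_COLD_START_S
        if not covers_cold_start(p, probe_timeout_s)
    )
```

With these figures, `at_risk(10)` lists the JVM Lambda, Railway, and Render entries, while `at_risk(30)` is empty.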

How cold-start failure looks in a probe log vs. a real outage

The two patterns are visually distinct once you know what to look for:

Cold-start signature:

14:00:00  ✓ tools/list — 312ms
14:01:00  ✓ tools/list — 289ms
...
14:16:00  ✗ tools/list — TIMEOUT (30001ms) ← idle timeout fired, cold start
14:17:00  ✓ tools/list — 8,342ms           ← cold start latency, but succeeded
14:18:00  ✓ tools/list — 301ms             ← warm instance

Real outage signature:

14:00:00  ✓ tools/list — 312ms
14:01:00  ✗ tools/list — TIMEOUT (30001ms)
14:02:00  ✗ initialize — connection refused
14:03:00  ✗ initialize — connection refused
14:04:00  ✗ initialize — connection refused
...

The differences: cold start produces exactly one failed probe (the timeout), followed by a success with elevated latency, then a return to baseline. A real outage produces consecutive failures with consistent error modes (connection refused, TLS failure, or repeated timeouts without recovery).
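
The two signatures can be told apart mechanically. The sketch below labels each run of consecutive failures in a probe timeline. The `Probe` record and the function name are hypothetical, and the "slower than 2× baseline" test for the recovery probe is one possible reading of "elevated latency", not a fixed rule.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    ok: bool
    latency_ms: float


def classify_failure_runs(probes: list[Probe], baseline_ms: float) -> list[str]:
    """Return one label per run of consecutive failed probes.

    "outage":     two or more consecutive failures
    "cold-start": one failure, then a success slower than 2x baseline
    "transient":  one failure, then a success at normal latency
    "open":       one failure at the end of the timeline, outcome unknown
    """
    labels = []
    i = 0
    while i < len(probes):
        if probes[i].ok:
            i += 1
            continue
        # Find the end of this run of failures.
        j = i
        while j < len(probes) and not probes[j].ok:
            j += 1
        if j - i >= 2:
            labels.append("outage")
        elif j == len(probes):
            labels.append("open")
        elif probes[j].latency_ms > 2 * baseline_ms:
            labels.append("cold-start")
        else:
            labels.append("transient")
        i = j
    return labels
```

Fed the cold-start log above with a 300 ms baseline, the single timeout followed by the 8,342 ms success is labelled "cold-start"; the outage log produces "outage".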

Monitoring cold-start servers correctly

Three adjustments to your monitoring config make cold-start servers behave correctly in your alert pipeline:

1. Use N=3 consecutive-failure hysteresis

A single failed probe never fires an alert. Require 3 consecutive failed probes before transitioning the server from UP to DOWN. A cold-start timeout is, by definition, exactly one failed probe that self-resolves on the next probe. With N=3, cold-start failures are invisible in your alert channel, while genuine outages (3+ consecutive failures) still fire within 3 minutes on a 60-second probe cadence.
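
A minimal sketch of this hysteresis as a small state machine. The class name is hypothetical, and recovering to UP on the first successful probe is an assumption of this sketch; the text above only specifies the UP-to-DOWN rule.

```python
from typing import Optional


class Hysteresis:
    """Flip UP -> DOWN only after `threshold` consecutive failed probes."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0
        self.state = "UP"

    def record(self, ok: bool) -> Optional[str]:
        """Feed one probe result; return the new state on a transition, else None."""
        if ok:
            self.failures = 0
            if self.state == "DOWN":
                # Assumption: a single success is enough to recover.
                self.state = "UP"
                return "UP"
            return None
        self.failures += 1
        if self.state == "UP" and self.failures >= self.threshold:
            self.state = "DOWN"
            return "DOWN"
        return None
```

A lone cold-start timeout increments the counter to 1 and is reset by the next success, so nothing is emitted. Only the third consecutive failure returns "DOWN", which is the point at which an alert would fire.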

2. Set probe timeout to 30 seconds for serverless endpoints

A 10-second probe timeout on a Render free-tier server will time out on almost every cold start (Render free-tier cold starts take 10–30 seconds). Set it to 30 seconds. The cost: a probe that lands on a cold start can take up to 30 seconds to complete instead of 10. On a 60-second probe cadence that still finishes before the next probe is due, and a 90-second cadence leaves more headroom. The benefit: you stop classifying cold starts as failures.

3. Flag the post-idle probe separately

If your last successful probe was more than T minutes ago (matching the platform's known idle timeout), mark the next probe as "post-idle" and exclude it from uptime SLO calculations even if it times out. AliveMCP does this automatically for recognized serverless domains: *.vercel.app, *.railway.app, *.render.com, *.onrender.com, *.fly.dev. The post-idle probe is logged and visible in the probe timeline, but doesn't count as a downtime event unless it's followed by additional failures.
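
One way the post-idle rule could be expressed, as a sketch only: this is not AliveMCP's implementation, and the function names are hypothetical. The domain suffixes are the ones listed above and the idle timeouts are the figures quoted earlier in this guide. No idle timeout is given for Vercel, so this sketch never flags a Vercel probe as post-idle.

```python
from datetime import datetime, timedelta
from typing import Optional
from urllib.parse import urlparse

# Idle timeout in minutes per recognized domain suffix (None = not stated here).
IDLE_TIMEOUT_MIN = {
    ".vercel.app": None,
    ".railway.app": 10,
    ".render.com": 15,
    ".onrender.com": 15,
    ".fly.dev": 5,
}


def idle_timeout_for(url: str) -> Optional[timedelta]:
    """Look up the idle timeout for a URL's host, or None if unknown."""
    host = urlparse(url).hostname or ""
    for suffix, minutes in IDLE_TIMEOUT_MIN.items():
        if host.endswith(suffix) and minutes is not None:
            return timedelta(minutes=minutes)
    return None


def is_post_idle(url: str, last_success: datetime, now: datetime) -> bool:
    """True if the gap since the last successful probe reaches the idle timeout."""
    timeout = idle_timeout_for(url)
    return timeout is not None and now - last_success >= timeout
```

A probe flagged this way would still be logged. It would only be left out of the uptime calculation if the probes after it succeed.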

Server-side mitigations

Monitoring config changes help, but the root cause is the idle timeout. If cold-start latency is causing real user-visible latency (not just probe failures), the server-side options are:

Keep-alive ping. Schedule a no-op request to your own endpoint every 5 minutes (below the idle timeout). On Railway and Render, a simple GET to your health route resets the idle timer and keeps the instance warm. Cost: near-zero (one HTTP request per 5 minutes, within all free-tier limits). Limitation: this only works while your scheduler is running — if the scheduler is also on a serverless platform, it may cold-start too.
Use AliveMCP's public probes as your keep-alive. AliveMCP probes every public endpoint every 60 seconds. For servers hosted on platforms with idle timeouts longer than 60 seconds (Render's free tier: 15 minutes, Railway: 10 minutes), the 60-second probe cadence is sufficient to keep instances warm. This is a side effect, not a guarantee — but for the majority of free-tier MCP servers in the registry, AliveMCP's monitoring incidentally prevents idle timeouts.
Upgrade to a non-zero-scale plan. Railway's Starter plan ($5/mo) allows pinning an instance to always-on. Render's Starter plan ($7/mo) disables the idle timeout. Vercel Edge Functions are warmer than Serverless Functions for low-traffic endpoints. The cost is small relative to the uptime improvement.
Use Fly.io with min_machines_running = 1. Fly.io lets you set the minimum number of machines kept running in your app's primary region. With min_machines_running = 1, that instance is never shut down, so cold starts are eliminated. Fly's free tier includes 3 shared-CPU machines, so one always-on machine is within the free allocation for most hobby MCPs.
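
The keep-alive ping from the first option can be sketched in a few lines of standard-library Python. The function names, the injectable `ping_fn` and `sleep_fn` parameters, and the example URL are assumptions for illustration; the 300-second default matches the 5-minute interval suggested above.

```python
import time
import urllib.request
from typing import Callable, Optional


def ping(url: str, timeout_s: float = 30.0) -> bool:
    """GET the health route; True on a 2xx response, False on a network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError, and socket timeouts all derive from OSError.
        return False


def keep_alive(
    url: str,
    interval_s: float = 300,
    rounds: Optional[int] = None,
    ping_fn: Callable[[str], bool] = ping,
    sleep_fn: Callable[[float], None] = time.sleep,
) -> int:
    """Ping `url` every `interval_s` seconds; return the number of failed pings.

    rounds=None loops forever, which is the normal mode for a scheduler process.
    """
    failures = 0
    done = 0
    while rounds is None or done < rounds:
        if not ping_fn(url):
            failures += 1
        done += 1
        if rounds is None or done < rounds:
            sleep_fn(interval_s)
    return failures
```

Running `keep_alive("https://your-server.example/health")` on an always-on machine would keep the target warm. As noted above, the loop is only as reliable as the host it runs on.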

Cold start vs. schema drift vs. real outage: the decision tree

When you see a failed probe on a serverless MCP server, run this tree:

Did exactly one probe fail, followed by immediate recovery? → Likely cold start. Check if the recovery probe had elevated latency (>2× baseline). If yes, cold start confirmed. No action needed.
Did multiple consecutive probes fail? → Genuine outage. Fire P1 alert.
Did probes succeed but tools/list changed? → Schema drift, not a cold start. P2 alert.
Did the gap between last success and the failed probe match the platform's idle timeout? → Cold start. Check platform dashboard for instance status. If the instance shows as running but probes are failing, escalate to genuine outage investigation.
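
The tree can also be written as a single function. This is a sketch under stated assumptions: the `Verdict` names, the parameter names, and the order in which the branches are checked are choices made here, not part of the tree above. `recovery_latency_ratio` is the recovery probe's latency divided by baseline.

```python
from enum import Enum


class Verdict(Enum):
    HEALTHY = "no action"
    COLD_START = "cold start confirmed, no action"
    LIKELY_COLD_START = "likely cold start, check platform dashboard"
    OUTAGE = "genuine outage, fire P1"
    SCHEMA_DRIFT = "schema drift, fire P2"
    PENDING = "single failure, wait for the next probe"


def triage(
    failed_streak: int,
    recovered: bool = False,
    recovery_latency_ratio: float = 1.0,
    tools_changed: bool = False,
    gap_matches_idle_timeout: bool = False,
) -> Verdict:
    """Map probe observations onto the decision tree's outcomes."""
    if failed_streak >= 2:
        return Verdict.OUTAGE
    if failed_streak == 1:
        # One failure plus a slow recovery probe (>2x baseline) confirms cold start.
        if recovered and recovery_latency_ratio > 2.0:
            return Verdict.COLD_START
        if recovered or gap_matches_idle_timeout:
            return Verdict.LIKELY_COLD_START
        return Verdict.PENDING
    # No failed probes: the only remaining signal is a changed tools/list.
    return Verdict.SCHEMA_DRIFT if tools_changed else Verdict.HEALTHY
```

Checking the multi-failure branch first means a streak of two or more failures is always treated as an outage, even when the gap before the first failure matches the platform's idle timeout.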