MCP server cold start
Most public MCP servers run on serverless or free-tier PaaS platforms — Vercel, Railway, Render, Fly.io — that spin instances down after a period of inactivity. The first probe after the idle timeout triggers a cold start: the instance allocates, the runtime initializes, and your probe times out waiting. To a monitor, this looks exactly like an outage. It isn't. Here's how to tell them apart, and what to do about it.
TL;DR
Cold-start failures are single-probe timeouts after an idle gap, followed by immediate recovery on the next probe. Real outages are multi-probe failures that don't self-resolve. The monitoring fix: use N=3 consecutive-failure hysteresis (so a single cold-start timeout never fires an alert) and set your probe timeout to 30 seconds on serverless endpoints. The server-side fix: schedule a keep-alive ping, or move to a platform tier or setting that doesn't idle out.
Why serverless MCP servers cold-start
Serverless and free-tier PaaS platforms are designed to scale to zero: when no traffic hits your endpoint for a configured idle timeout, the platform deallocates the running instance. The next request (or probe) hits an empty slot and has to wait for a new instance to allocate, the Node/Python/JVM runtime to initialize, and your application code to complete its startup sequence.
Cold-start latency breaks down by platform and runtime:
- Vercel Serverless Functions (Node 20): 200–600ms cold start in the same region as the request. Cross-region requests add propagation latency. A typical MCP server startup (import SDK, open SQLite or connect to a KV store, register tools) adds another 200–800ms.
- Railway (free tier, shared runner): 3–8 seconds cold start after the 10-minute idle timeout. Shared runner queuing can push this to 15 seconds under load.
- Render (free tier): 15-minute idle timeout, 10–30 second cold start. This is the most common source of cold-start probe failures in the MCP ecosystem — Render's free tier is popular for hobby MCPs, and 30-second cold starts routinely exceed standard 10-second probe timeouts.
- Fly.io (free tier, Machines): 5-minute idle timeout by default; 1–3 second cold start for pre-built images. Faster than Render, but still enough to trip a 2-second probe timeout.
- AWS Lambda (with SnapStart off, JVM runtime): 2–15 seconds. JVM cold starts are typically the slowest among common serverless runtimes.
How cold-start failure looks in a probe log vs. a real outage
The two patterns are visually distinct once you know what to look for:
Cold-start signature:
14:00:00 ✓ tools/list — 312ms
14:01:00 ✓ tools/list — 289ms
...
14:16:00 ✗ tools/list — TIMEOUT (30001ms) ← idle timeout fired, cold start
14:17:00 ✓ tools/list — 8342ms ← cold start latency, but succeeded
14:18:00 ✓ tools/list — 301ms ← warm instance
Real outage signature:
14:00:00 ✓ tools/list — 312ms
14:01:00 ✗ tools/list — TIMEOUT (30001ms)
14:02:00 ✗ initialize — connection refused
14:03:00 ✗ initialize — connection refused
14:04:00 ✗ initialize — connection refused
...
The differences: cold start produces exactly one failed probe (the timeout), followed by a success with elevated latency, then a return to baseline. A real outage produces consecutive failures with consistent error modes (connection refused, TLS failure, or repeated timeouts without recovery).
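The distinction above can be expressed as a small classifier over a probe history. This is a minimal sketch, not any monitor's actual implementation: the `Probe` type, the `classify_failure` function, and the label strings are hypothetical names, and the "more than 2× baseline" latency test is borrowed from the decision tree later in this guide.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    ok: bool
    latency_ms: float


def classify_failure(probes: list[Probe], fail_index: int, baseline_ms: float) -> str:
    """Label the run of failed probes that contains probes[fail_index]."""
    if probes[fail_index].ok:
        raise ValueError("probe at fail_index succeeded")

    # Expand to the full run of consecutive failures around fail_index.
    start = end = fail_index
    while start > 0 and not probes[start - 1].ok:
        start -= 1
    while end + 1 < len(probes) and not probes[end + 1].ok:
        end += 1

    if end - start + 1 >= 2:
        return "outage"        # consecutive failures, no self-recovery
    if end + 1 == len(probes):
        return "undetermined"  # no follow-up probe recorded yet
    # Single failure: a cold start shows up as a slow first success afterwards.
    if probes[end + 1].latency_ms > 2 * baseline_ms:
        return "cold_start"
    return "transient"
```

Fed the cold-start log above (one timeout, then a success at 8342ms against a baseline near 300ms), this returns "cold_start". Fed the outage log, it returns "outage".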
Monitoring cold-start servers correctly
Three adjustments to your monitoring config make cold-start servers behave correctly in your alert pipeline:
1. Use N=3 consecutive-failure hysteresis
A single failed probe should never fire an alert. Require 3 consecutive failed probes before transitioning the server from UP to DOWN. A cold-start timeout is typically exactly one failed probe that self-resolves on the next probe. With N=3, cold-start failures are invisible in your alert channel, while genuine outages (3+ consecutive failures) still fire within about 3 minutes at a 60-second probe cadence.
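A consecutive-failure counter is enough to implement this rule. The sketch below assumes a hypothetical `FailureHysteresis` class with string states "UP" and "DOWN"; a real alert pipeline would attach notification logic to the transition.

```python
class FailureHysteresis:
    """Flip from UP to DOWN only after `threshold` consecutive failed probes."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.consecutive_failures = 0
        self.state = "UP"

    def record(self, ok: bool) -> bool:
        """Record one probe result. Return True only on the UP -> DOWN transition."""
        if ok:
            # Any success resets the counter and marks the server UP again.
            self.consecutive_failures = 0
            self.state = "UP"
            return False
        self.consecutive_failures += 1
        if self.state == "UP" and self.consecutive_failures >= self.threshold:
            self.state = "DOWN"
            return True  # fire the alert exactly once per outage
        return False
```

A lone cold-start timeout increments the counter to 1 and the next success resets it, so `record` never returns True for that pattern.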
2. Set probe timeout to 30 seconds for serverless endpoints
A 10-second probe timeout on a Render free-tier server will time out on most cold starts (Render cold starts run 10–30 seconds). Set 30 seconds. The cost: a probe that hits a cold start can take up to 30 seconds to complete instead of 10. On a 60-second probe cadence, that still finishes before the next probe is due, and a 90-second cadence leaves more headroom. The benefit: you stop classifying most cold starts as failures.
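To see which platforms a given timeout would trip, compare it against the worst-case cold-start figures quoted earlier in this guide. The table and the `platforms_exceeding` helper below are illustrative names, and the numbers are this guide's estimates rather than measurements.

```python
# Worst-case cold-start figures quoted earlier in this guide, in seconds.
WORST_CASE_COLD_START_S = {
    "Vercel Serverless Functions": 1.4,       # 600ms cold start + 800ms app startup
    "Railway free tier": 15.0,                # shared-runner queuing under load
    "Render free tier": 30.0,
    "Fly.io Machines": 3.0,
    "AWS Lambda (JVM, SnapStart off)": 15.0,
}


def platforms_exceeding(timeout_s: float) -> list[str]:
    """Return platforms whose worst-case cold start is longer than the timeout."""
    return sorted(
        name
        for name, worst_s in WORST_CASE_COLD_START_S.items()
        if worst_s > timeout_s
    )
```

With `timeout_s=10`, Railway, Render, and Lambda all appear in the result. With `timeout_s=30`, the list is empty, although Render's 30-second worst case leaves no margin.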
3. Flag the post-idle probe separately
If your last successful probe was more than T minutes ago (matching the platform's known idle timeout), mark the next probe as "post-idle" and exclude it from uptime SLO calculations even if it times out. AliveMCP does this automatically for recognized serverless domains: *.vercel.app, *.railway.app, *.render.com, *.onrender.com, *.fly.dev. The post-idle probe is logged and visible in the probe timeline, but doesn't count as a downtime event unless it's followed by additional failures.
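If your monitor does not flag post-idle probes for you, the check is straightforward to approximate. This sketch is not AliveMCP's implementation: the suffix table and function names are hypothetical, the idle timeouts are the ones quoted in this guide, and *.vercel.app is left out because no idle timeout is given for it here.

```python
from datetime import datetime, timedelta
from typing import Optional

# Idle timeouts (minutes) as quoted in this guide, keyed by hostname suffix.
IDLE_TIMEOUT_MIN = {
    ".railway.app": 10,
    ".onrender.com": 15,
    ".render.com": 15,
    ".fly.dev": 5,
}


def idle_timeout_for(hostname: str) -> Optional[timedelta]:
    """Look up the platform idle timeout from the hostname suffix, if known."""
    for suffix, minutes in IDLE_TIMEOUT_MIN.items():
        if hostname.endswith(suffix):
            return timedelta(minutes=minutes)
    return None


def is_post_idle(hostname: str, last_success: datetime, now: datetime) -> bool:
    """True when the gap since the last successful probe covers the idle timeout."""
    timeout = idle_timeout_for(hostname)
    if timeout is None:
        return False  # unknown platform: never excuse the probe
    return now - last_success >= timeout
```

A probe flagged this way should still be logged. Excluding it from SLO math only makes sense if the following probes succeed.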
Server-side mitigations
Monitoring config changes help, but the root cause is the idle timeout. If cold-start latency is causing real user-visible latency (not just probe failures), the server-side options are:
- Keep-alive ping. Schedule a no-op request to your own endpoint every 5 minutes (below the idle timeout). On Railway and Render, a simple GET to your health route resets the idle timer and keeps the instance warm. Cost: near-zero (one HTTP request per 5 minutes, within all free-tier limits). Limitation: this only works while your scheduler is running — if the scheduler is also on a serverless platform, it may cold-start too.
- Use AliveMCP's public probes as your keep-alive. AliveMCP probes every public endpoint every 60 seconds. For servers hosted on platforms with idle timeouts longer than 60 seconds (Render's free tier: 15 minutes, Railway: 10 minutes), the 60-second probe cadence is sufficient to keep instances warm. This is a side effect, not a guarantee — but for the majority of free-tier MCP servers in the registry, AliveMCP's monitoring incidentally prevents idle timeouts.
- Upgrade to a non-zero-scale plan. Railway's Starter plan ($5/mo) allows pinning an instance to always-on. Render's Starter plan ($7/mo) disables the idle timeout. Vercel Edge Functions are warmer than Serverless Functions for low-traffic endpoints. The cost is small relative to the uptime improvement.
- Use Fly.io with min_machines_running = 1. Fly.io allows you to set the minimum number of running machines per region. With min_machines_running = 1, the instance is never shut down, so cold start is eliminated. Fly's free tier includes 3 shared-CPU machines, so one always-on machine is within the free allocation for most hobby MCPs.
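The keep-alive ping from the first option can be as small as the loop below. It is a sketch under assumptions: the health URL is a placeholder, `keep_warm` and `http_get_ok` are hypothetical names, and the `fetch` and `sleep` parameters exist only so the loop can be exercised without network access.

```python
import http.client
import time
import urllib.request
from typing import Callable, Optional


def http_get_ok(url: str, timeout_s: float = 30.0) -> bool:
    """Return True if a GET to url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def keep_warm(
    url: str,
    interval_s: float = 300.0,          # 5 minutes, below the idle timeouts above
    rounds: Optional[int] = None,       # None means run until interrupted
    fetch: Callable[[str], bool] = http_get_ok,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Ping url every interval_s seconds and return the number of failed pings."""
    failures = 0
    done = 0
    while rounds is None or done < rounds:
        if not fetch(url):
            failures += 1
        done += 1
        sleep(interval_s)
    return failures
```

Running `keep_warm("https://your-server.example/health")` from an always-on host pings every 5 minutes. As noted above, a scheduler that itself scales to zero defeats the purpose.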
Cold start vs. schema drift vs. real outage: the decision tree
When you see a failed probe on a serverless MCP server, run this tree:
- Did exactly one probe fail, followed by immediate recovery? → Likely cold start. Check if the recovery probe had elevated latency (>2× baseline). If yes, cold start confirmed. No action needed.
- Did multiple consecutive probes fail? → Genuine outage. Fire P1 alert.
- Did probes succeed but tools/list changed? → Schema drift, not a cold start. P2 alert.
- Did the gap between last success and the failed probe match the platform's idle timeout? → Cold start. Check platform dashboard for instance status. If the instance shows as running but probes are failing, escalate to genuine outage investigation.
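The first three branches of the tree translate directly into code. The sketch below uses hypothetical names (`Verdict`, `triage`) and leaves out the idle-gap and platform-dashboard check, which needs data a probe log alone does not carry.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    HEALTHY = "healthy"
    PENDING = "single failure, wait for next probe"
    COLD_START = "cold start confirmed, no action"
    LIKELY_COLD_START = "likely cold start, latency not elevated"
    OUTAGE = "genuine outage, fire P1"
    SCHEMA_DRIFT = "schema drift, fire P2"


def triage(
    consecutive_failures: int,
    recovered: bool,
    recovery_latency_ms: Optional[float],
    baseline_ms: float,
    tools_changed: bool,
) -> Verdict:
    """Walk the decision tree for the most recent run of probes."""
    if consecutive_failures >= 2:
        return Verdict.OUTAGE
    if consecutive_failures == 1:
        if not recovered:
            return Verdict.PENDING
        # Elevated latency (>2x baseline) on the recovery probe confirms it.
        if recovery_latency_ms is not None and recovery_latency_ms > 2 * baseline_ms:
            return Verdict.COLD_START
        return Verdict.LIKELY_COLD_START
    if tools_changed:
        return Verdict.SCHEMA_DRIFT
    return Verdict.HEALTHY
```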
Related questions
Does AliveMCP's 60-second probe cadence help keep my server warm?
For most platforms with idle timeouts ≥ 5 minutes (Render: 15 min, Railway: 10 min, Fly.io default: 5 min), yes — the 60-second probe interval keeps the instance warm as a side effect. Vercel Functions with a shorter TTL may still cold-start between probes. If keeping the instance warm is a hard requirement, use the server-side mitigations above rather than relying on probe timing.
My server is on Render free tier and shows 94% uptime. Is that cold starts?
Probably. Render's 15-minute idle timeout allows up to 96 cold-start events in 24 hours (at most one per 15-minute idle cycle). If your probe log shows single-probe timeouts spaced roughly 15 minutes apart followed by immediate recovery, that's cold start. Upgrade to Render Starter ($7/mo) to eliminate the idle timeout, or use the keep-alive ping pattern described above.
How does AliveMCP calculate SLO uptime for serverless endpoints?
Cold-start timeouts (single probe, followed by recovery with elevated latency, after a gap matching the platform's idle timeout) are logged as "post-idle probe" events and excluded from SLO calculations. Consecutive failures — regardless of platform — count as downtime. The SLO calculation is transparent: your AliveMCP dashboard shows both the "wall-clock uptime" (all failed probes counted) and the "SLO uptime" (cold-start events excluded).
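The two figures differ only in which failed probes are counted. The sketch below is one plausible reading, not AliveMCP's documented formula: it assumes excluded post-idle failures are dropped from both numerator and denominator, and `ProbeRecord`, `wall_clock_uptime`, and `slo_uptime` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ProbeRecord:
    ok: bool
    post_idle: bool = False  # failed probe attributed to a cold start


def wall_clock_uptime(records: list[ProbeRecord]) -> float:
    """Fraction of all probes that succeeded."""
    if not records:
        raise ValueError("no probes recorded")
    return sum(r.ok for r in records) / len(records)


def slo_uptime(records: list[ProbeRecord]) -> float:
    """Same ratio, with failed post-idle probes dropped entirely (assumption)."""
    counted = [r for r in records if r.ok or not r.post_idle]
    if not counted:
        raise ValueError("no probes left after exclusions")
    return sum(r.ok for r in counted) / len(counted)
```

For 100 probes with 95 successes, 4 post-idle timeouts, and 1 other failure, wall-clock uptime is 95/100 and SLO uptime is 95/96.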