
MCP server on-call

On-call for an MCP server means having a clear answer to the question: "When my server goes down at 2am, who gets woken up, what do they do, and what's the escalation path if they don't respond?" Most MCP authors have never formally answered it. Alerts go to a shared email inbox, nothing is documented, and the mean time to recovery (MTTR) for after-hours incidents is "whenever someone notices in the morning." This guide gives you a right-sized on-call structure for each stage: solo indie dev, two-person side project, and small team with paid SLAs.

TL;DR

Right-size your on-call before setting up alerts. Solo indie MCP authors: don't do 24/7 on-call. Accept a high after-hours mean time to detect (MTTD), invest in automatic restart to minimize MTTR instead, and route only P1 (highest-severity) alerts to push notifications. Two-person teams: an informal weekly rotation, with one person primary each week. Teams of five or more with paid SLAs: formalize the rotation in PagerDuty or Opsgenie with escalation policies. Every team size benefits from cold-start suppression (to avoid false 3am pages), severity-based routing (P3 never pages anyone), and a weekly rotation handoff checklist.

Right-sizing on-call by team size

Solo indie MCP author

Staying on call 24/7 as a solo developer is a fast path to burnout. A solo MCP server author has no one to escalate to and no one to rotate with. The correct approach: accept that after-hours MTTD will be high (hours, not minutes) and compensate by reducing the impact of high MTTD. Specifically:

The goal isn't zero-downtime nights; it's sustainable operations. A server that is occasionally down for 4 hours overnight (say, once a month) but restarts itself after most crashes is far less painful to run than one that sends a false-positive push notification at 3am every week.

Two-person team

With two engineers, an informal weekly rotation becomes viable. One person is "primary on-call" Sunday through Saturday; the other is secondary. The primary gets P1 push notifications; the secondary only gets notified if the primary doesn't acknowledge within 15 minutes (configurable in PagerDuty, Opsgenie, or AliveMCP's Team tier escalation policy).

This rotation halves the on-call burden per person: each engineer is primary for roughly 26 weeks per year instead of 52. More importantly, it creates a documented handoff pattern that you can scale to larger rotations later.
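
A minimal sketch of the weekly alternation, assuming a Sunday-to-Saturday week and a fixed starting Sunday. The roster names, `ROTATION_START`, and `on_call_for` are hypothetical; dedicated paging tools manage schedules like this themselves, so a helper like this is mostly useful for a shared calendar or a handoff reminder.

```python
from datetime import date
from typing import List, Tuple

# Hypothetical two-person roster; list order decides who starts as primary.
ROSTER: List[str] = ["alice", "bob"]

# Assumed anchor: a Sunday on which ROSTER[0] began a primary week.
ROTATION_START = date(2024, 1, 7)  # 7 January 2024 was a Sunday


def on_call_for(day: date, roster: List[str] = ROSTER,
                start: date = ROTATION_START) -> Tuple[str, str]:
    """Return (primary, secondary) for the Sunday-to-Saturday week containing `day`."""
    # Floor division keeps week boundaries correct even for dates before `start`.
    weeks_elapsed = (day - start).days // 7
    primary = weeks_elapsed % len(roster)
    secondary = (primary + 1) % len(roster)
    return roster[primary], roster[secondary]


if __name__ == "__main__":
    print(on_call_for(date(2024, 1, 10)))  # ('alice', 'bob'): Wednesday of week 0
```

Over any 52 consecutive weeks, each of the two engineers comes up as primary 26 times, which is the halving of on-call burden described above.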

Five-plus-person team with paid SLAs

When you're charging Team-tier users $49/mo or more for private monitoring and SLA guarantees, you need a formal on-call rotation with escalation policies, acknowledgment-time commitments, and incident post-mortems. At this scale:

Escalation policy design

An escalation policy answers three questions: who gets notified, how, and in what order if the previous level doesn't respond?

Tier mapping for MCP servers

Map failure severity to escalation tiers using the protocol layer where the failure occurs:

See MCP server downtime alerting for the full tier-to-layer mapping with false-positive probability math.
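
As an illustration only, a layer-based classifier could look like the sketch below. The layer names and their tier assignments are assumptions made for this example, not the mapping from the downtime-alerting guide; what matters is the shape: the first layer that fails determines severity, and unrecognized failures fall back to a tier that never pages.

```python
from enum import Enum
from typing import Optional


class Severity(Enum):
    P1 = 1  # pages the on-call engineer
    P2 = 2  # notification channel, handled during working hours
    P3 = 3  # logged for review, never pages anyone


# Illustrative only: layer names and tier assignments are assumptions,
# not the mapping from the downtime-alerting guide.
LAYER_SEVERITY = {
    "transport": Severity.P1,   # DNS, TCP, or TLS failure: nothing is reachable
    "http": Severity.P1,        # no response or a 5xx from the endpoint
    "initialize": Severity.P1,  # the MCP initialize handshake fails
    "tools_list": Severity.P2,  # handshake succeeds but tools/list errors
    "latency": Severity.P3,     # calls succeed but exceed the latency target
}


def classify(failed_layer: Optional[str]) -> Optional[Severity]:
    """Map the first failing layer of a probe to a severity; None means healthy."""
    if failed_layer is None:
        return None
    # Unrecognized layers default to P2: visible to the team, but never a page.
    return LAYER_SEVERITY.get(failed_layer, Severity.P2)
```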

Escalation sequence

A typical P1 escalation sequence with 5-minute intervals:

Escalation intervals are a judgment call. Compress the intervals for commercial SLAs (customers are impacted now); expand them for internal tools where brief after-hours downtime is acceptable.
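
An escalation policy boils down to an ordered list of levels plus an interval. The sketch below, with hypothetical level names and channels, computes who has been notified after a given number of unacknowledged minutes. The same functions cover the two-person case (two levels, a 15-minute interval) and a P1 sequence with 5-minute intervals.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Level:
    target: str   # who is notified (hypothetical names in the example)
    channel: str  # how they are notified


def schedule(levels: List[Level], interval_min: int) -> List[Tuple[int, Level]]:
    """Level i fires i * interval_min minutes after the alert opens."""
    return [(i * interval_min, level) for i, level in enumerate(levels)]


def notified_after(levels: List[Level], interval_min: int, unacked_min: int) -> List[Level]:
    """Levels notified once an alert has gone `unacked_min` minutes without acknowledgment."""
    return [level for offset, level in schedule(levels, interval_min) if offset <= unacked_min]


# Hypothetical P1 policy with 5-minute intervals.
P1_POLICY = [
    Level("primary on-call", "push"),
    Level("secondary on-call", "push + SMS"),
    Level("engineering lead", "phone call"),
]

if __name__ == "__main__":
    for offset, level in schedule(P1_POLICY, 5):
        print(f"t+{offset:>2} min: {level.target} via {level.channel}")
```

For the two-person rotation, `notified_after(P1_POLICY[:2], 15, 10)` returns only the primary level: the secondary is not contacted until 15 minutes pass without an acknowledgment.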

Alert fatigue and how to prevent it

Alert fatigue happens when engineers receive so many alerts — especially false positives and low-priority noise — that they start ignoring them. Preventable causes in MCP server monitoring:

Cold-start false positives

MCP servers on serverless or scale-to-zero platforms (Vercel, Railway, Render, Fly.io) take 3–30 seconds to cold-start from an idle state. If a probe arrives while the server is still cold-starting after an idle period, it may exceed your timeout and generate a P1 alert that self-resolves on the next probe cycle. This trains engineers to ignore P1 alerts ("it's probably just a cold start"). Mitigations:

See MCP server cold start for cold-start benchmarks and the platform-specific idle detection thresholds.
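
One way to keep cold starts from paging anyone is to treat a first timeout as provisional and retry once with a timeout longer than the worst expected cold start. This sketch assumes a probe is any callable that takes a timeout in seconds and returns whether the server answered correctly; the 10-second and 45-second defaults are assumptions, the second chosen to exceed the 3 to 30 second cold-start range quoted above.

```python
from typing import Callable

# Assumed probe interface: call with a timeout in seconds, get True if the
# server returned a valid response in time. Wrap your real HTTP/MCP check.
Probe = Callable[[float], bool]


def probe_with_cold_start_grace(probe: Probe,
                                timeout_s: float = 10.0,
                                cold_start_timeout_s: float = 45.0) -> bool:
    """Report failure only if a second, longer attempt also fails.

    A first-attempt timeout is treated as provisional: it may simply have
    woken an idle instance. The retry's timeout is assumed to exceed the
    platform's worst-case cold start, so a booting server can still answer.
    """
    if probe(timeout_s):
        return True
    return probe(cold_start_timeout_s)


if __name__ == "__main__":
    def cold_server(timeout: float) -> bool:
        return timeout >= 30  # simulated server needing 30 s to answer while cold

    print(probe_with_cold_start_grace(cold_server))  # True: the retry absorbs the cold start
```

This complements a consecutive-failure confirmation window rather than replacing it: the retry absorbs a single cold start, while the window absorbs brief network blips.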

Flapping alerts

Flapping happens when a server alternates between passing and failing on consecutive probes. Each state change triggers a new alert: downtime, recovery, downtime, recovery. The on-call inbox fills with alert-recovery pairs for a server that's technically available 50% of the time. Configure a minimum-stable-duration requirement: only send a recovery alert after N consecutive passing probes (same logic as the downtime confirmation window). And only send a new downtime alert if the server has been healthy for at least M minutes since the last recovery. See MCP server flapping.
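
The hysteresis described above can be sketched as a small state machine. `FlapGuard` and its defaults are hypothetical: three consecutive failures confirm downtime, three consecutive passes (N) confirm recovery, and a relapse within 15 minutes (M) of the last recovery is recorded as a non-paging "reopened" update on the existing incident instead of a fresh downtime page. That last choice is one option among several; dropping the relapse silently would also avoid the page, but would leave the on-call engineer believing the server is still healthy.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class FlapGuard:
    """Turn raw probe results into alerts with hysteresis (names are hypothetical)."""
    confirm_failures: int = 3          # consecutive failures to confirm downtime
    recovery_passes: int = 3           # N: consecutive passes to confirm recovery
    min_healthy_minutes: float = 15.0  # M: healthy time required before a new page
    down: bool = False
    fail_streak: int = 0
    pass_streak: int = 0
    last_recovery: Optional[float] = None
    events: List[Tuple[str, float]] = field(default_factory=list)

    def observe(self, ok: bool, t: float) -> None:
        """Feed one probe result; t is the probe time in minutes."""
        if ok:
            self.pass_streak, self.fail_streak = self.pass_streak + 1, 0
            if self.down and self.pass_streak >= self.recovery_passes:
                self.down, self.last_recovery = False, t
                self.events.append(("recovered", t))
            return
        self.fail_streak, self.pass_streak = self.fail_streak + 1, 0
        if self.down or self.fail_streak < self.confirm_failures:
            return
        self.down = True
        flapping = (self.last_recovery is not None
                    and t - self.last_recovery < self.min_healthy_minutes)
        # A relapse inside the M-minute window reopens the existing incident
        # as a non-paging update instead of sending a fresh downtime page.
        self.events.append(("reopened" if flapping else "page_down", t))


if __name__ == "__main__":
    guard = FlapGuard()
    # One probe per minute: outage, recovery, quick relapse, recovery, later outage.
    results = [False] * 3 + [True] * 3 + [False] * 3 + [True] * 21 + [False] * 3
    for minute, ok in enumerate(results):
        guard.observe(ok, float(minute))
    print(guard.events)
    # [('page_down', 2.0), ('recovered', 5.0), ('reopened', 8.0), ('recovered', 11.0), ('page_down', 32.0)]
```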

Maintenance window suppression

Planned deployments generate alerts if your monitoring doesn't know a deployment is in progress. Before any deployment, set a maintenance window in your monitoring configuration. AliveMCP pauses alerting for the window duration while continuing to probe — you can see whether the deployment succeeded without being paged for the expected restart downtime. Cap maintenance windows at 4 hours; if a deployment takes longer than 4 hours, something is wrong and you want to be notified.
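
A minimal sketch of maintenance-window suppression with the 4-hour cap, assuming timezone-aware `datetime` values. `MaintenanceWindows` and `should_alert` are hypothetical helpers, not AliveMCP's API. The window only gates alert delivery while probing continues, and requests longer than the cap are clamped, so alerts resume automatically once 4 hours have passed.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

MAX_WINDOW = timedelta(hours=4)  # cap recommended above


class MaintenanceWindows:
    """Hypothetical helper: suppresses alert delivery, never probing."""

    def __init__(self) -> None:
        self._windows: List[Tuple[datetime, datetime]] = []

    def declare(self, start: datetime, duration: timedelta) -> datetime:
        """Register a window, clamping it to MAX_WINDOW; returns the effective end."""
        end = start + min(duration, MAX_WINDOW)
        self._windows.append((start, end))
        return end

    def active(self, at: datetime) -> bool:
        return any(start <= at < end for start, end in self._windows)


def should_alert(confirmed_down: bool, at: datetime, windows: MaintenanceWindows) -> bool:
    """Probe results are still recorded elsewhere; only the notification is gated."""
    return confirmed_down and not windows.active(at)


if __name__ == "__main__":
    deploy = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    windows = MaintenanceWindows()
    end = windows.declare(deploy, timedelta(hours=6))  # asks for 6 h, gets 4 h
    print(end.isoformat())
    print(should_alert(True, deploy + timedelta(hours=1), windows))
    print(should_alert(True, deploy + timedelta(hours=5), windows))
```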

On-call tooling recommendations

Tool choice by team size:

Avoid using Slack as your primary on-call tool for P1 alerts. Slack is excellent for P2/P3 notification channels but unreliable for wake-up alerts — Slack mobile notifications have delivery delays, are silenced by Do Not Disturb settings, and have no escalation or acknowledgment tracking built in.
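
Severity-based routing can be a plain lookup table. In this sketch the channel names are placeholders ("pager" stands for PagerDuty, Opsgenie, or a similar tool with acknowledgment tracking), and `paging_enabled` is a hypothetical switch for the stage where you have no paying users yet and even P1 alerts wait for your next working session.

```python
from enum import Enum
from typing import Dict, List


class Severity(Enum):
    P1 = 1
    P2 = 2
    P3 = 3


# Placeholder channel names: "pager" means PagerDuty, Opsgenie, or another tool
# with acknowledgment tracking and escalation; "slack" is a team channel.
ROUTES: Dict[Severity, List[str]] = {
    Severity.P1: ["pager", "slack"],  # wake someone up, keep the team informed
    Severity.P2: ["slack"],           # handled during working hours
    Severity.P3: ["slack"],           # informational; never pages anyone
}


def route(severity: Severity, paging_enabled: bool = True) -> List[str]:
    """Return the notification channels for an alert of the given severity."""
    channels = list(ROUTES[severity])
    if not paging_enabled:
        # e.g. no paying users yet: even P1 waits for the next working session.
        channels = [c for c in channels if c != "pager"]
    return channels
```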

On-call handoff checklist

At the start of each on-call shift, the incoming engineer needs to know:

A two-minute handoff call or async message covering these five points reduces the incoming engineer's ramp-up time from "rediscovering context under pressure at 3am" to "executing a runbook with full context already loaded."

Related questions

Do I need on-call for an MCP server with zero paying users?

No — you need monitoring so you know when the server is down, but you don't need to wake yourself up about it. Configure email or Slack alerts for all severity levels and review them when you're working. Invest in automatic restart (systemd/Docker restart policy) so most crash-induced outages self-resolve without human involvement. When you have paying users with uptime expectations, that's when you add push/SMS for P1 events. Until then, morning review + auto-restart gives you the operational visibility without the burnout risk of pseudo-on-call with no real business need.

How do I know when to add a second person to my on-call rotation?

Two signals: (1) you've been woken up at least twice in a quarter for incidents that required human intervention (not auto-resolved crashes); (2) you're making operational compromises, such as delaying deployments to avoid being on call alone or skipping weekend travel because you're the only one who can respond. Either signal means one-person coverage is creating operational risk. Adding a second engineer halves the paging burden and removes the single point of failure in your incident response. Even if the second engineer is a co-founder with limited MCP knowledge, having someone to escalate to during an incident is more valuable than having them know every runbook.

What's a reasonable P1 response time target for an indie MCP server?

During business hours: <15 minutes. After hours: accept that response will be slower — 30–60 minutes is realistic for a solo developer without 24/7 on-call coverage. Be transparent with users about after-hours response time expectations. If your commercial SLA requires 5-minute P1 response 24/7, you need at least two engineers in different time zones to cover the rotation sustainably.

Further reading