
MCP server on-call

On-call for an MCP server means having a clear answer to the question: "when my server goes down at 2am, who gets woken up, what do they do, and what's the escalation path if they don't respond?" Most MCP authors have never formally answered that question — alerts go to a shared email inbox, nothing is documented, and the MTTR for after-hours incidents is "whenever someone notices in the morning." This guide gives you a right-sized on-call structure at every stage: solo indie dev, two-person side project, and small team with paid SLAs.

TL;DR

Right-size your on-call before setting up alerts. Solo indie MCP authors: don't do 24/7 on-call. Accept high after-hours MTTD, invest in automatic restart to minimize MTTR instead, and reserve push notifications for P1 events only. Two-person teams: run an informal rotation with one person primary per week. Teams of five or more with paid SLAs: formalize in PagerDuty or Opsgenie with escalation policies. All tiers benefit from cold-start suppression (avoid false 3am pages), severity-based routing (P3 never pages anyone), and a weekly rotation handoff checklist.

Right-sizing on-call by team size

Solo indie MCP author

Running a 24/7 on-call rotation as a solo developer is a fast path to burnout. A solo MCP server author has no one to escalate to and no one to rotate with. The correct approach: accept that after-hours MTTD will be high (hours, not minutes) and compensate by reducing the impact of high MTTD. Specifically:

Automatic restart: configure Restart=on-failure in systemd, --restart=on-failure:5 in Docker, or the platform-equivalent. Most crash-induced outages self-resolve in under 10 seconds without any human involvement, making after-hours MTTR a non-issue for the most common failure mode.
P1-only push notifications: only send push/SMS for transport-layer failures that can't self-heal. Initialize and tools/list failures often resolve after a restart — configure these as P2 (Slack) or P3 (email) rather than P1 (push). You should only be woken up for failures that require human intervention.
Morning review cadence: check alert history each morning. AliveMCP's Author tier dashboard shows every probe failure from the past 24 hours so you can review overnight events without being paged for each one.

The goal isn't zero-downtime nights — it's sustainable operations. A server that's down for 4 hours overnight once a month and restarts automatically is far less painful than a false-positive push notification at 3am every week.

Two-person team

With two engineers, an informal weekly rotation becomes viable. One person is "primary on-call" Sunday through Saturday; the other is secondary. The primary gets P1 push notifications; the secondary only gets notified if the primary doesn't acknowledge within 15 minutes (configurable in PagerDuty, Opsgenie, or AliveMCP's Team tier escalation policy).

This rotation halves the on-call burden per person: each engineer is primary for roughly 26 weeks per year instead of 52. More importantly, it creates a documented handoff pattern that you can scale to larger rotations later.
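The alternating weekly rotation can be sketched in a few lines. This is a minimal illustration, not how PagerDuty, Opsgenie, or AliveMCP compute schedules: the roster names and the anchor date are hypothetical, and the only assumption carried over from the text is that weeks run Sunday through Saturday.

```python
from datetime import date, timedelta

# Hypothetical roster; replace with your own names.
ROSTER = ["alice", "bob"]
# Assumed anchor: any Sunday on which ROSTER[0] starts a week as primary.
ANCHOR_SUNDAY = date(2024, 1, 7)

def on_call_for(day, roster=ROSTER, anchor=ANCHOR_SUNDAY):
    """Return (primary, secondary) for the Sunday-to-Saturday week containing `day`."""
    # Floor division also handles dates before the anchor correctly.
    weeks_elapsed = (day - anchor).days // 7
    primary = roster[weeks_elapsed % len(roster)]
    secondary = roster[(weeks_elapsed + 1) % len(roster)]
    return primary, secondary

# Print the next four weeks of the rotation, starting at the anchor.
for week in range(4):
    start = ANCHOR_SUNDAY + timedelta(weeks=week)
    primary, secondary = on_call_for(start)
    print(f"week of {start}: primary={primary}, secondary={secondary}")
```

Because the function indexes by roster length, adding a third name to the roster extends the same handoff pattern to a three-person rotation without other changes.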

Five-plus-person team with paid SLAs

When customers are paying $49/mo or more for a Team tier with private monitoring and SLA guarantees, you need a formal on-call rotation with escalation policies, acknowledgment time commitments, and incident post-mortems. At this scale:

Use PagerDuty, Opsgenie, or a similar dedicated on-call tool. The overhead of managing schedules, escalations, and acknowledgment SLAs in a general Slack channel doesn't scale past five people.
Define acknowledgment time targets per severity: P1 <5 minutes, P2 <30 minutes, P3 next business day. These internal targets become the basis for the SLAs you offer customers.
Run monthly incident rehearsal: a table-top or live-fire drill where someone triggers a test alert and the on-call engineer works the incident end-to-end against documented runbooks. The bottleneck in most incident responses is finding the right information under pressure — rehearsal surfaces documentation gaps before they cost you during a real incident.

Escalation policy design

An escalation policy answers three questions: who gets notified, how, and in what order if the previous level doesn't respond?

Tier mapping for MCP servers

Map failure severity to escalation tiers using the protocol layer where the failure occurs:

P1 (immediate page): transport-layer failure (TCP refused, TLS handshake failure, DNS failure) or complete HTTP failure (connection reset, 5xx on every request). The server is unreachable entirely. Notification path: push notification to on-call engineer, SMS if unacknowledged after 5 minutes, secondary on-call after 15 minutes.
P2 (Slack notification, 30-minute escalation window): initialize failure (HTTP/transport healthy but JSON-RPC handshake fails) or tools/list failure (initialize succeeds but tool definitions can't be fetched). The server is reachable but broken at the MCP protocol layer. Notification path: Slack post to #mcp-alerts, escalate to P1 if unacknowledged after 30 minutes.
P3 (async, next-day review): SLO warning (error budget burn rate elevated but not exhausted), latency degradation (p95 above threshold but not causing failures), schema drift detected. Notification path: email digest, logged for morning review. No paging.
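The tier mapping above can be expressed as a small classification function. This is a sketch under stated assumptions: the `ProbeResult` fields are hypothetical names for the checks described in the text, not the schema of any particular monitoring tool.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Severity(Enum):
    P1 = "page"          # push, then SMS, then secondary on-call
    P2 = "slack"         # post to the alerts channel
    P3 = "email-digest"  # async, next-day review

@dataclass
class ProbeResult:
    # Hypothetical field names, one per layer described in the tier mapping.
    transport_ok: bool      # TCP, TLS, and DNS all succeeded
    http_ok: bool           # no connection reset or blanket 5xx
    initialize_ok: bool     # JSON-RPC initialize handshake succeeded
    tools_list_ok: bool     # tools/list returned tool definitions
    slo_warning: bool = False
    latency_degraded: bool = False
    schema_drift: bool = False

def classify(result: ProbeResult) -> Optional[Severity]:
    """Map a probe result to a severity, checking the lowest layer first."""
    if not (result.transport_ok and result.http_ok):
        return Severity.P1  # server unreachable entirely
    if not (result.initialize_ok and result.tools_list_ok):
        return Severity.P2  # reachable but broken at the MCP layer
    if result.slo_warning or result.latency_degraded or result.schema_drift:
        return Severity.P3  # degraded, not failing
    return None  # healthy: no alert
```

Checking the layers from the bottom up means a transport failure is never misreported as an initialize failure, since higher-layer checks cannot pass when the server is unreachable.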

See MCP server downtime alerting for the full tier-to-layer mapping with false-positive probability math.

Escalation sequence

A typical P1 escalation sequence, starting with a 5-minute interval and widening at each step:

T+0: Push notification to primary on-call engineer.
T+5: SMS to primary on-call (in case push wasn't seen).
T+15: Push + SMS to secondary on-call. Primary's alert remains open.
T+30: Notify engineering manager or team lead. Both primary and secondary alerts remain open.
T+60: If still unacknowledged, this is a process failure — the post-mortem should address on-call coverage gaps, not just the technical incident.

Escalation intervals are a judgment call. Compress the intervals for commercial SLAs (customers are impacted now); expand them for internal tools where brief after-hours downtime is acceptable.
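One way to reason about compressing or expanding the intervals is to treat the sequence as data and apply a scale factor. The sketch below mirrors the P1 sequence above; the `scale` parameter and the function name are illustrative assumptions, not a feature of any specific on-call tool.

```python
# (minutes after the alert, action) pairs mirroring the P1 sequence above.
P1_ESCALATION = [
    (0, "push to primary"),
    (5, "SMS to primary"),
    (15, "push + SMS to secondary"),
    (30, "notify engineering manager or team lead"),
    (60, "flag as process failure for the post-mortem"),
]

def steps_fired(minutes_unacknowledged, scale=1.0, policy=P1_ESCALATION):
    """Return the actions that should have fired for an unacknowledged alert.

    scale < 1 compresses the intervals (commercial SLAs);
    scale > 1 expands them (internal tools).
    """
    return [
        action
        for offset, action in policy
        if offset * scale <= minutes_unacknowledged
    ]

# Eight minutes in, default intervals versus intervals compressed by half.
print(steps_fired(8))
print(steps_fired(8, scale=0.5))
```

With the default intervals, eight unacknowledged minutes reaches only the primary engineer; halving the intervals brings the secondary in at 7.5 minutes.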

Alert fatigue and how to prevent it

Alert fatigue happens when engineers receive so many alerts — especially false positives and low-priority noise — that they start ignoring them. Preventable causes in MCP server monitoring:

Cold-start false positives

MCP servers on serverless or scale-to-zero platforms (Vercel, Railway, Render, Fly.io) take 3–30 seconds to cold-start from an idle state. If your probe happens to be the first request after an idle period, the response may exceed your probe timeout and generate a P1 alert that self-resolves on the next probe cycle. This trains engineers to ignore P1 alerts ("it's probably just a cold start"). Mitigations:

Use an N=3 consecutive-failure confirmation window: only alert after 3 consecutive probe failures. At 60-second cadence, the third failure lands two minutes after the first, so an alert fires only after roughly 2 to 3 minutes of sustained downtime. A cold start resolves in seconds, not minutes, so it never completes the confirmation window.
Configure a post-idle probe suppression flag: AliveMCP suppresses the first probe after an idle period exceeding your cold-start duration for recognized serverless platforms.
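The consecutive-failure confirmation window is simple enough to sketch directly. This is a minimal illustration of the counting logic only; the class and method names are hypothetical and do not describe AliveMCP's internals.

```python
class FailureConfirmation:
    """Fire an alert only after `threshold` consecutive probe failures."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, probe_passed):
        """Record one probe result; return True exactly when the alert should fire."""
        if probe_passed:
            # Any passing probe resets the window, so a cold start that
            # fails once and then recovers never reaches the threshold.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        # Fire once, on the probe that completes the window, not on every
        # failure after it.
        return self.consecutive_failures == self.threshold

# A cold start: one slow probe, then healthy.
cold_start = FailureConfirmation()
print([cold_start.record(p) for p in [False, True, True]])

# A real outage: the alert fires on the third consecutive failure.
outage = FailureConfirmation()
print([outage.record(p) for p in [False, False, False, False]])
```

Returning True only on the probe that completes the window keeps a long outage from generating a fresh alert on every subsequent failed probe.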

See MCP server cold start for cold-start benchmarks and the platform-specific idle detection thresholds.

Flapping alerts

Flapping happens when a server alternates between passing and failing on consecutive probes. Each state change triggers a new alert: downtime, recovery, downtime, recovery. The on-call inbox fills with alert-recovery pairs for a server that's technically available 50% of the time. Configure a minimum-stable-duration requirement: only send a recovery alert after N consecutive passing probes (same logic as the downtime confirmation window). And only send a new downtime alert if the server has been healthy for at least M minutes since the last recovery. See MCP server flapping.
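The two damping rules (N consecutive probes to confirm a state change, and M healthy minutes before a new downtime alert) can be combined into one small state machine. This is a sketch under stated assumptions: the class name, parameter names, and default values are hypothetical, and a suppressed downtime also suppresses its matching recovery so the inbox never sees an unpaired alert.

```python
class FlapDamper:
    """Suppress alert-recovery churn from a flapping server."""

    def __init__(self, n_confirm=3, min_healthy_minutes=10):
        self.n = n_confirm
        self.m = min_healthy_minutes
        self.state = "up"             # last confirmed state
        self.streak = 0               # consecutive probes contradicting it
        self.last_recovery_at = None  # minute of the last confirmed recovery
        self.alerted_down = False     # did the current outage send an alert?

    def record(self, minute, passed):
        """Record one probe; return "down", "recovered", or None."""
        contradicts = passed == (self.state == "down")
        if not contradicts:
            self.streak = 0
            return None
        self.streak += 1
        if self.streak < self.n:
            return None
        self.streak = 0
        if self.state == "down":
            self.state = "up"
            self.last_recovery_at = minute
            # Only announce recovery for an outage that was announced.
            return "recovered" if self.alerted_down else None
        self.state = "down"
        recently_recovered = (
            self.last_recovery_at is not None
            and minute - self.last_recovery_at < self.m
        )
        self.alerted_down = not recently_recovered
        return "down" if self.alerted_down else None

# A server alternating pass/fail on every probe never confirms either state.
damper = FlapDamper(n_confirm=2)
print([damper.record(i, i % 2 == 0) for i in range(10)])
```

The trade-off is that a genuine outage starting within M minutes of a recovery is not announced, so M is best kept short, and suppressed transitions are worth logging for the morning review rather than dropping entirely.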

Maintenance window suppression

Planned deployments generate alerts if your monitoring doesn't know a deployment is in progress. Before any deployment, set a maintenance window in your monitoring configuration. AliveMCP pauses alerting for the window duration while continuing to probe — you can see whether the deployment succeeded without being paged for the expected restart downtime. Cap maintenance windows at 4 hours; if a deployment takes longer than 4 hours, something is wrong and you want to be notified.
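The 4-hour cap can be enforced at evaluation time rather than trusting whatever end time was entered. The following sketch assumes a hypothetical `alerting_suppressed` helper and is not AliveMCP's actual configuration interface; probing continues regardless, and only the alerting decision is gated.

```python
from datetime import datetime, timedelta

# Cap from the text: a window longer than this stops suppressing alerts.
MAX_WINDOW = timedelta(hours=4)

def alerting_suppressed(now, window_start, window_end, max_window=MAX_WINDOW):
    """Return True if `now` falls inside the maintenance window.

    The window is clamped to `max_window` from its start, so a deployment
    that overruns the cap begins alerting again even if the configured
    end time is later.
    """
    effective_end = min(window_end, window_start + max_window)
    return window_start <= now < effective_end

# A window mistakenly configured for six hours is treated as four.
start = datetime(2024, 1, 7, 22, 0)
end = start + timedelta(hours=6)
print(alerting_suppressed(start + timedelta(hours=1), start, end))
print(alerting_suppressed(start + timedelta(hours=5), start, end))
```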

On-call tooling recommendations

Tool choice by team size:

Solo: AliveMCP Author tier direct push via webhook → mobile notification, or a free PagerDuty single-user account. No rotation tooling needed.
2–4 people: PagerDuty free tier (up to 5 users) handles basic rotation and escalation policies; a self-hosted, open-source alternative such as Grafana OnCall also works. AliveMCP Team tier webhook routes to PagerDuty automatically.
5+ people: PagerDuty Growth/Business or Opsgenie. The cost is justified by schedule management, on-call compensation tracking, and stakeholder communication features. Integrate AliveMCP via webhook to route MCP-specific alerts into your existing incident management flow.

Avoid using Slack as your primary on-call tool for P1 alerts. Slack is excellent for P2/P3 notification channels but unreliable for wake-up alerts — Slack mobile notifications have delivery delays, are silenced by Do Not Disturb settings, and have no escalation or acknowledgment tracking built in.

On-call handoff checklist

At the start of each on-call shift, the incoming engineer needs to know:

Current server health: any open incidents or known degraded states from the previous shift.
Scheduled deployments: any deployments or maintenance windows planned for the shift, with expected timing.
Unresolved alerts from prior shift: P3 items logged for follow-up that weren't actioned yet.
Access verification: confirm SSH access, platform dashboard credentials, runbook location, and PagerDuty acknowledgment app installed on phone.
Recent incidents: any incidents from the past 7 days worth flagging, especially if they indicate a recurring pattern that hasn't been root-caused yet.

A two-minute handoff call or async message covering these five points reduces the incoming engineer's ramp-up time from "rediscovering context under pressure at 3am" to "executing a runbook with full context already loaded."