MCP server deployment
Deploying an MCP server differs from deploying a REST API in two ways that matter: the session-scoped connection model means rolling restarts can terminate active sessions mid-call, and the protocol handshake means your load balancer's TCP health check is insufficient — you need a readiness probe that completes the full initialize sequence before traffic is routed to a new instance.
TL;DR
Use HTTP/SSE transport for any deployment that runs behind a reverse proxy, load balancer, or container orchestrator — stdio only works when the client spawns the server process directly. Write a readiness probe that sends initialize and verifies the protocolVersion response before the instance receives live traffic. For zero-session-loss deploys, prefer blue-green over rolling. After every deploy, run a full post-deploy check: initialize → tools/list → hash comparison with the pre-deploy snapshot. AliveMCP monitors this sequence for you every 60 seconds once you're live.
Choosing your transport: stdio vs HTTP/SSE
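The MCP specification defines two standard transports, stdio and HTTP/SSE, and they have fundamentally different deployment models. (Spec revision 2025-03-26 replaced HTTP/SSE with Streamable HTTP, which serves both directions from a single endpoint; the network-deployment advice in this guide applies to either.)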
stdio transport is the right choice when:
- The client (Claude Desktop, a local agent script) spawns your server process directly.
- The server runs on the same machine as the client — no network hop.
- You need zero infrastructure: no port, no firewall rule, no reverse proxy.
With stdio, the client launches your server as a child process, writes JSON-RPC messages to its stdin, and reads responses from its stdout. The connection is a pipe, not a socket. This means there's nothing to deploy in the traditional sense: the server is installed as a binary or npm package and the client configuration points at it.
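You can play the client's role by hand to see this pipe in action. The sketch below assumes the stdio transport's newline-delimited framing (one JSON-RPC message per line); stdio_initialize is a hypothetical helper name, and node index.js in the usage comment stands in for your server's start command. Because stdin closes right after the request, a server that exits on EOF may quit before replying, so treat this as a quick manual check rather than a client.

```shell
# stdio_initialize CMD [ARGS...]
# Plays the client's role for the stdio transport: starts the server command,
# writes one newline-delimited JSON-RPC initialize request to its stdin, and
# prints the first line the server writes to stdout (its response).
stdio_initialize() {
  printf '%s\n' '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"manual-test","version":"1.0"}}}' \
    | "$@" | head -n 1
}

# Hypothetical usage: stdio_initialize node index.js
```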
HTTP/SSE transport is required when:
- The client connects over a network — the server lives on a different host or in a container.
- Multiple clients need to connect to the same server instance.
- The server is deployed behind a reverse proxy, load balancer, or container orchestrator (Docker, Kubernetes, Fly.io, Railway).
- You want uptime monitoring — external probes can only reach HTTP endpoints, not stdio pipes.
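HTTP/SSE uses two endpoints: a POST endpoint for client-to-server messages and an SSE stream for server-to-client events. The client opens the SSE stream with an HTTP GET and holds it open for the duration of the session; the first event on that stream (endpoint) tells the client which URL to POST its messages to.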
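Under the HTTP/SSE transport, the server's first event on a new stream is an endpoint event whose data is the URL the client must POST its messages to. A minimal sketch of a parser for that event, reading the stream on stdin; the sse_endpoint name and the /sse path in the usage comment are assumptions, not part of any SDK.

```shell
# sse_endpoint: read a server-sent event stream on stdin and print the data of
# the first "endpoint" event, i.e. the URI this session must POST its
# JSON-RPC messages to. Returns 1 if the stream ends without one.
sse_endpoint() {
  cr=$(printf '\r')
  event=""
  while IFS= read -r line; do
    line=${line%"$cr"}                    # tolerate CRLF line endings
    case $line in
      event:*)
        event=${line#event:}
        event=${event#" "} ;;             # SSE drops one leading space
      data:*)
        if [ "$event" = "endpoint" ]; then
          data=${line#data:}
          printf '%s\n' "${data#" "}"
          return 0
        fi ;;
      "")
        event="" ;;                       # a blank line ends the event
    esac
  done
  return 1
}

# Hypothetical usage (the /sse path is an assumption about your server):
#   curl -sN --max-time 5 http://localhost:3000/sse | sse_endpoint
```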
If you're deploying to anything beyond a developer's local machine, use HTTP/SSE. External monitoring tools, including AliveMCP, reach your server over the network, so they need an HTTP/SSE endpoint to probe.
The three probe types: startup, readiness, liveness
A standard HTTP service health check sends a GET to /health and expects a 200. This is insufficient for MCP servers because a 200 from /health tells you only that the HTTP layer is up — it says nothing about whether the MCP protocol handshake succeeds, which is the actual thing clients care about.
Startup probe: runs when the process starts and repeats until it first succeeds. Its job is to confirm that initialization is complete before the process accepts traffic. For an MCP server this means: connect, send initialize, verify protocolVersion in the response, send tools/list, and verify that the tool list is non-empty. Until this sequence succeeds, no traffic should be routed to the instance.
Readiness probe: runs periodically while the instance is live. When the readiness probe fails, the orchestrator stops routing new sessions to this instance but does not kill it — existing sessions continue. A failing readiness probe usually indicates temporary overload or a downstream dependency failure. The probe should run the same initialize + tools/list sequence as the startup probe, with a timeout of 5–10 seconds.
Liveness probe: runs periodically. When it fails, the orchestrator kills and restarts the process. Reserve liveness for detecting true deadlock or unrecoverable state — not transient failures. A liveness probe that's too aggressive kills sessions that would have recovered. A reasonable liveness probe sends tools/list to an already-initialized internal connection; if no response within 30 seconds, assume deadlock.
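```shell
#!/bin/sh
# Example readiness probe script (run by the orchestrator inside the container).
# Sends initialize and succeeds only if the response contains protocolVersion.
# Assumes the server answers on the POST itself (as Streamable HTTP servers
# do); with the older HTTP/SSE transport the response arrives on the SSE
# stream instead. TLS usually terminates at the proxy, so probe over http://.
RESPONSE=$(curl -s -X POST http://localhost:3000/mcp \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"probe","version":"1.0"}}}' \
  --max-time 10)
echo "$RESPONSE" | grep -q '"protocolVersion"' && exit 0
exit 1
```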
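The script above stops at initialize. A sketch of the full startup/readiness sequence described earlier (initialize, then the notifications/initialized notification the protocol expects before other requests, then tools/list), assuming a Streamable HTTP-style server that answers on the POST and may assign a session through the Mcp-Session-Id response header. MCP_URL, the helper names, and the non-empty check are illustrative; later spec revisions add further headers, so check the revision your SDK implements.

```shell
#!/bin/sh
# Sketch: initialize -> notifications/initialized -> tools/list, succeeding
# only if at least one tool is listed. MCP_URL's default is a placeholder.
MCP_URL="${MCP_URL:-http://localhost:3000/mcp}"
INIT='{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"probe","version":"1.0"}}}'

# Print the Mcp-Session-Id value from an HTTP header dump on stdin.
session_id_from_headers() {
  tr -d '\r' | awk 'tolower($1) == "mcp-session-id:" { print $2; exit }'
}

# True if a tools/list response body contains at least one tool object.
has_tools() {
  printf '%s' "$1" | grep -q '"tools" *: *\[ *{'
}

# POST one JSON-RPC message; $1 = session id (may be empty), $2 = body.
mcp_post() {
  post_sid=$1
  post_body=$2
  set -- -H "Content-Type: application/json" \
         -H "Accept: application/json, text/event-stream"
  if [ -n "$post_sid" ]; then
    set -- "$@" -H "Mcp-Session-Id: $post_sid"
  fi
  curl -sS --max-time 10 -X POST "$MCP_URL" "$@" -d "$post_body"
}

startup_check() {
  headers=$(mktemp) || return 1
  body=$(curl -sS --max-time 10 -D "$headers" -X POST "$MCP_URL" \
    -H "Content-Type: application/json" \
    -H "Accept: application/json, text/event-stream" \
    -d "$INIT") || { rm -f "$headers"; return 1; }
  sid=$(session_id_from_headers < "$headers")
  rm -f "$headers"
  printf '%s' "$body" | grep -q '"protocolVersion"' || return 1
  mcp_post "$sid" '{"jsonrpc":"2.0","method":"notifications/initialized"}' > /dev/null || return 1
  tools=$(mcp_post "$sid" '{"jsonrpc":"2.0","id":2,"method":"tools/list"}') || return 1
  has_tools "$tools"
}

# As a probe script: startup_check && exit 0; exit 1
```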
See MCP server health checks for the full sequence including schema-drift detection.
Environment variables and secrets management
MCP servers often need credentials for the tools they expose — API keys, database connection strings, OAuth tokens. The correct pattern is always environment variables injected at runtime, never baked into the image or committed to the repository.
Deployment-time variable management by platform:
- Fly.io: fly secrets set MY_API_KEY=xxx. Secrets are injected as environment variables at container start and are not visible in logs or the Fly dashboard after setting.
- Railway: Variables tab in the service configuration. Values are encrypted at rest and injected at deploy time.
- Docker Compose: a .env file loaded with the env_file: directive in compose.yml. Never commit .env; add it to .gitignore.
- Kubernetes: Secret objects referenced via envFrom. Use a secrets manager (AWS Secrets Manager, GCP Secret Manager, HashiCorp Vault) rather than base64-encoded Secrets for sensitive values.
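Secrets that change (OAuth refresh tokens, rotating API keys) need a reload mechanism that avoids a full restart, since a restart would terminate active sessions. A running process's environment cannot be updated from outside, so re-reading environment variables on SIGHUP only returns the values the process started with. Instead, deliver rotating secrets as files (for example, a Kubernetes Secret mounted as a volume) that the server re-reads on SIGHUP, or implement a hot-reload endpoint that replaces the in-process credential.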
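A minimal shell sketch of the file-plus-SIGHUP pattern, assuming the rotating credential is delivered as a file (for example, a mounted Kubernetes Secret or a file written by a secrets-manager agent). SECRET_FILE, MY_API_KEY, and load_secret are hypothetical names; a server written in another language would do the same thing inside its own SIGHUP handler.

```shell
# Re-read a rotating credential from a file on SIGHUP instead of restarting.
SECRET_FILE="${SECRET_FILE:-/run/secrets/my_api_key}"   # hypothetical path

load_secret() {
  [ -r "$SECRET_FILE" ] || return 1
  new_value=$(cat "$SECRET_FILE")      # $(...) strips the trailing newline
  [ -n "$new_value" ] || return 1      # keep the old value if the file is empty
  MY_API_KEY=$new_value
}

load_secret || echo "no readable secret at $SECRET_FILE yet" >&2   # start-up load
trap 'load_secret' HUP                 # reload on: kill -HUP <pid>
```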
Rolling vs blue-green deploys
Rolling deploys replace instances one at a time. For stateless HTTP services, this is low-risk: a request that arrives at a draining old instance is retried against a new one. MCP sessions are not retryable — the client has already sent initialize, negotiated capabilities, and begun a tool sequence. Terminating an instance mid-session loses the session.
The safe rolling deploy process for MCP servers:
- Stop routing new sessions to the instance being replaced, for example by making its readiness probe fail (in Kubernetes, a pod that starts terminating is also removed from Service endpoints).
- Wait for in-flight sessions to complete. Set a session drain timeout of 60 to 300 seconds, depending on typical session duration. After the timeout, forcibly terminate.
- Kill the old instance and start the new one.
- Wait for the startup probe to succeed on the new instance before routing traffic.
- Repeat for the next instance.
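The wait-then-force step can be scripted on the deploy side. This sketch assumes your server exposes its active-session count somewhere a deploy script can read it; the COUNT_CMD argument, the metrics URL in the usage comment, and the wait_for_drain name are all hypothetical.

```shell
# wait_for_drain TIMEOUT_SECONDS COUNT_CMD
# Polls COUNT_CMD once per second. COUNT_CMD is a hypothetical command that
# prints the number of active MCP sessions on the instance being replaced.
# Returns 0 once it prints 0, or 1 if the timeout expires first, in which case
# the caller proceeds to force-terminate the instance.
wait_for_drain() {
  drain_timeout=$1
  count_cmd=$2
  elapsed=0
  while [ "$elapsed" -lt "$drain_timeout" ]; do
    active=$(eval "$count_cmd" 2>/dev/null)
    if [ "$active" = "0" ]; then
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 1
}

# Hypothetical usage in a deploy script:
#   wait_for_drain 120 'curl -s http://old-instance:3000/metrics/active-sessions' \
#     || echo "drain timeout reached; terminating with sessions still open"
```

Treating anything other than an explicit 0 (including an unreachable metrics endpoint) as "not drained" errs toward waiting the full timeout rather than cutting sessions early.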
Blue-green deploys are safer for session-heavy MCP servers. You deploy the new version in parallel (green), run full verification against it, then switch the load balancer to route new sessions to green. Old sessions on blue drain naturally. No active session is terminated. The tradeoff is double the infrastructure cost during the transition window — typically 5–15 minutes.
For most indie MCP server authors, the practical answer is: deploy during low-traffic periods, use a short rolling drain window (30 seconds), and accept that a small number of sessions will be terminated. Monitor session termination events in your server logs. If termination rate is unacceptably high, invest in blue-green.
See MCP server reliability for MTTR engineering and the deploy-failure failure mode.
Post-deploy verification
After every deploy, run a full verification sequence before marking the deploy complete:
- Initialize handshake: connect to the production URL, send initialize, and verify that protocolVersion matches what you expect. A version mismatch usually means the wrong image was deployed; if you set serverInfo.version to your build version, the same response also confirms which build is answering.
- Tools list hash: send tools/list and compute a SHA-256 over the tool names and schemas, sorted by tool name. Compare it against the pre-deploy snapshot. A changed hash means a tool was added, removed, or had its schema changed, any of which may break clients that depend on the previous schema. See schema drift in MCP tool definitions.
- Tool invocation smoke test: invoke one lightweight tool with known inputs and verify the response structure. This catches cases where a tool's implementation fails to load even though its tools/list registration succeeded.
- Latency baseline: compare the initialize response time against the pre-deploy P95. A new dependency or a regression in the initialization path often shows up first as increased startup latency. See MCP server latency for acceptable baselines.
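A sketch of the tools-list hash, assuming jq and coreutils sha256sum are available and the tools/list response has been saved as plain JSON. Hashing only name and inputSchema, with tools sorted by name and keys sorted, means a server that merely reorders its tools does not register as drift; whether descriptions should also count is a judgment call left out here.

```shell
# tools_hash: read a tools/list JSON-RPC response on stdin and print a
# SHA-256 over each tool's name and inputSchema, with tools sorted by name and
# object keys sorted, so reordering alone does not change the hash.
# Requires jq and sha256sum (on macOS, shasum -a 256 is the equivalent).
tools_hash() {
  jq -c -S '.result.tools | map({name, inputSchema}) | sort_by(.name)' \
    | sha256sum | cut -d ' ' -f 1
}

# Hypothetical usage, with a snapshot hash saved before the deploy:
#   [ "$(tools_hash < tools-after.json)" = "$(cat tools-before.sha256)" ] \
#     || echo "tools/list drift detected"
```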
Automate this check in CI — run it as a post-deploy step against the production URL before the deploy pipeline marks success. If any check fails, roll back.
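A sketch of such a CI gate covering the initialize handshake and latency baseline checks. The endpoint, expected protocol version, baseline, and 50% tolerance are placeholder assumptions to replace with your own values; a single request is a noisy latency sample, so you may prefer to take several and compare their median.

```shell
#!/bin/sh
# CI post-deploy gate: initialize handshake plus latency baseline.
# All defaults are hypothetical placeholders.
MCP_URL="${MCP_URL:-https://mcp.example.com/mcp}"
EXPECTED_PROTOCOL="${EXPECTED_PROTOCOL:-2024-11-05}"
BASELINE_MS="${BASELINE_MS:-300}"
MAX_REGRESSION_PCT="${MAX_REGRESSION_PCT:-50}"
INIT_BODY='{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"post-deploy-check","version":"1.0"}}}'

# True if the response body ($1) carries the expected protocolVersion ($2).
has_protocol_version() {
  printf '%s' "$1" | grep -q "\"protocolVersion\" *: *\"$2\""
}

# True if latency $1 (ms) is at most MAX_REGRESSION_PCT percent above baseline $2 (ms).
within_latency_budget() {
  limit_ms=$(( $2 + $2 * MAX_REGRESSION_PCT / 100 ))
  [ "$1" -le "$limit_ms" ]
}

post_deploy_check() {
  # -w appends the total request time in seconds (e.g. 0.231) on a final line.
  out=$(curl -sS --max-time 10 -X POST "$MCP_URL" \
    -H "Content-Type: application/json" \
    -H "Accept: application/json, text/event-stream" \
    -d "$INIT_BODY" -w '\n%{time_total}') || return 1
  body=$(printf '%s\n' "$out" | sed '$d')
  secs=$(printf '%s\n' "$out" | tail -n 1)
  ms=$(awk -v s="$secs" 'BEGIN { printf "%d", s * 1000 }')

  if ! has_protocol_version "$body" "$EXPECTED_PROTOCOL"; then
    echo "FAIL: protocolVersion is not $EXPECTED_PROTOCOL" >&2
    return 1
  fi
  if ! within_latency_budget "$ms" "$BASELINE_MS"; then
    echo "FAIL: initialize took ${ms}ms (baseline ${BASELINE_MS}ms)" >&2
    return 1
  fi
  echo "post-deploy check passed (initialize ${ms}ms)"
}

# In the pipeline: post_deploy_check || exit 1   (non-zero triggers rollback)
```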
Containerized deployment quick reference
For authors deploying their first MCP server to a container host:
- Fly.io: run fly launch from the project root. Fly detects the Dockerfile, provisions an instance, and handles TLS. Set internal_port = 3000 and force_https = true in fly.toml. Add a health check that hits your MCP initialization endpoint.
- Railway: connect your GitHub repo, set environment variables in the Variables tab, and set the start command to node index.js. Railway provides a public HTTPS URL automatically.
- Docker + VPS: build the image, push it to a registry, pull it on the VPS, and run it with docker run --env-file .env -p 3000:3000 <image>. Use Caddy as a reverse proxy for automatic TLS. See MCP server Docker setup for the full Dockerfile.
- Kubernetes: use a Deployment with 2+ replicas, a readiness probe that runs the initialize handshake, and a pod disruption budget with maxUnavailable of 1. Kubernetes httpGet probes can only send a GET request, so run the handshake from an exec probe (a script like the readiness probe above, with curl available in the image) or expose a GET endpoint that performs it internally. See MCP server on Kubernetes.
All of these require an HTTP/SSE transport implementation. If your server only supports stdio today, adding HTTP/SSE is the first step before any container deployment.
Monitoring after deployment
A deployed server that no one is watching is an unmonitored server. AliveMCP runs the full initialize → tools/list probe sequence against your server every 60 seconds. If the server goes down, returns a broken response, or drifts in its tool schema, AliveMCP alerts you before your users notice.
Public MCP servers listed in the major registries (MCP.so, Glama, PulseMCP, Smithery) are automatically included in the AliveMCP public dashboard — no sign-up required. For private endpoints, custom alert webhooks, and SLA reports, see Author and Team tiers. See MCP server uptime monitoring for what to expect after you're wired up.
Related questions
Can I deploy an MCP server to a serverless platform?
Yes, with caveats. Serverless platforms (AWS Lambda, Vercel Functions, Cloudflare Workers) support HTTP/SSE but introduce cold starts and strict execution time limits. An MCP session that involves a slow tool call can exceed the function timeout, terminating the session mid-call. Cold starts add latency to the initialize step — often 500ms to 2s for Node.js. For lightweight, latency-tolerant MCP servers, serverless is fine. For servers with long-running tools or strict latency requirements, use a persistent process. See MCP server cold starts.
How do I handle zero-downtime deploys?
Blue-green is the safest pattern: spin up the new version in parallel, verify it with a full post-deploy check, then switch the load balancer. For rolling deploys, implement a drain period: stop accepting new sessions on the instance being replaced, wait for active sessions to finish (up to your configured drain timeout), then terminate. Most platforms send SIGTERM before stopping an instance, so put the drain in a SIGTERM handler: catch SIGTERM, set a flag to reject new sessions, wait for active sessions to complete with a timeout, then exit.
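The flag-and-wait logic belongs in your server's own SIGTERM handler. If the server is started from a shell entrypoint, though, that shell receives the signal and has to pass it on. A sketch of a helper that forwards SIGTERM and falls back to SIGKILL after a drain timeout, assuming the child drains on SIGTERM; keep the timeout below the platform's own grace period (in Kubernetes, terminationGracePeriodSeconds) so the wrapper finishes before the platform kills it. drain_child and DRAIN_TIMEOUT are hypothetical names.

```shell
#!/bin/sh
# drain_child PID TIMEOUT_SECONDS
# Sends SIGTERM to a child process (the server, assumed to stop accepting new
# sessions and finish active ones on SIGTERM), waits for it to exit, and sends
# SIGKILL if it is still running after TIMEOUT_SECONDS. Returns the child's
# exit status (128 + signal number if a signal killed it).
drain_child() {
  child_pid=$1
  drain_timeout=$2
  kill -TERM "$child_pid" 2>/dev/null
  # Background timer that force-kills the child if the drain takes too long.
  ( sleep "$drain_timeout"; kill -KILL "$child_pid" 2>/dev/null ) > /dev/null 2>&1 &
  timer_pid=$!
  wait "$child_pid"
  child_status=$?
  kill "$timer_pid" 2>/dev/null        # child is gone; cancel the timer
  return "$child_status"
}

# Hypothetical entrypoint usage (node index.js stands in for your server):
#   node index.js &
#   child=$!
#   trap 'drain_child "$child" "${DRAIN_TIMEOUT:-25}"; exit $?' TERM INT
#   wait "$child"
```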
What happens if a deploy fails mid-session?
The client loses its session and must reconnect. The MCP protocol doesn't include session resumption — reconnecting starts a fresh initialize sequence. From the user's perspective, their agent tool call fails with a connection error. Design your deploy process to minimize this: drain sessions before replacing instances, deploy during low-traffic windows, and use blue-green for session-heavy servers. Monitor session termination events to understand the real impact of your deploy strategy.
Should I run multiple instances for availability?
Yes, if you have paid users or SLA commitments. A single instance means one hardware failure or one bad deploy takes your server offline. Two instances behind a load balancer give you a rolling-deploy path and basic redundancy. MCP sessions are usually held in the memory of the instance that handled initialize (with HTTP/SSE, the SSE stream itself is bound to one instance), and the same applies to any per-session state you keep beyond the protocol, such as in-memory context for tool chains. Session affinity at the load balancer keeps a session's requests on that instance instead of routing them to one that lacks the state. See multi-region MCP deployment for geographic distribution.
Further reading
- MCP server Docker — Dockerfile, signal handling, and HEALTHCHECK
- MCP server on Kubernetes — readiness probes, PDBs, and session affinity
- MCP server testing — protocol compliance, schema snapshots, and CI integration
- MCP server health checks — the full initialize probe sequence
- MCP server reliability — MTTD and MTTR targets
- Schema drift in MCP tool definitions — detection and rollback
- AliveMCP — post-deploy monitoring that runs the initialize probe every 60 seconds