
Enterprise MCP server deployment

A single-instance MCP server is fine for a personal project. An enterprise team running internal MCP servers as part of agent pipelines used by hundreds of engineers has different requirements: planned maintenance must not interrupt running agent sessions; a schema change to an existing tool must not silently break agents that cached the previous tool list; an availability zone failure must not take down the entire MCP surface. These aren't theoretical risks — they're the categories of incidents that enterprise teams discover in their first production quarter. This guide covers the deployment patterns that prevent them.

TL;DR

Run at least two MCP server replicas behind a load balancer with health-check gates — use HTTP probes, not TCP probes, because a TCP-open port doesn't mean the MCP initialize handshake will succeed. Use blue-green deployments for schema changes: spin up the new version alongside the old, verify it with a protocol probe, then shift traffic. Add AliveMCP monitoring immediately after deployment to verify the new version is serving correctly from outside your network — internal health checks miss network-layer failures that external users hit first.

High availability: replicas and health-check gates

The minimum HA configuration for a production MCP server is two replicas in separate availability zones with a health-checking load balancer. The critical detail: configure the load balancer health check to probe the MCP protocol endpoint, not just the HTTP port.

# Nginx upstream with MCP-aware health check (Nginx Plus / OpenResty)
upstream mcp_search {
    zone mcp_search 64k;
    server mcp-search-1.internal:8080 max_fails=2 fail_timeout=10s;
    server mcp-search-2.internal:8080 max_fails=2 fail_timeout=10s;
    keepalive 32;
}

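# Custom health check: send an MCP initialize request, expect a protocolVersion in the reply.
# Nginx Plus feature; community Nginx uses passive checks only.
# send/expect are stream-module match directives, so this block belongs in a stream {}
# context with "health_check match=mcp_health;" on the proxying server. In an http {}
# context, match takes status/header/body conditions instead.
# Content-Length must equal the byte length of the JSON body (155 here).
match mcp_health {
    send "POST /mcp HTTP/1.1\r\nHost: mcp-search.internal\r\nContent-Type: application/json\r\nAccept: application/json, text/event-stream\r\nContent-Length: 155\r\n\r\n{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"initialize\",\"params\":{\"protocolVersion\":\"2024-11-05\",\"capabilities\":{},\"clientInfo\":{\"name\":\"lb-probe\",\"version\":\"1.0\"}}}";
    expect ~ "\"protocolVersion\"";
}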

server {
    listen 443 ssl;
    server_name mcp-search.internal;

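    location /mcp {
        proxy_pass http://mcp_search;
        proxy_http_version 1.1;          # upstream keepalive needs HTTP/1.1
        proxy_set_header Connection "";  # clear the Connection header so upstream connections are reused
        proxy_read_timeout 60s;
    }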
}

If you're using Caddy (common for MCP servers on Fly.io or a VPS), expose an application-level health route that confirms the server's tool list is populated, and point the proxy's health check at it:

// health.ts: expose a /health route that verifies the MCP server is ready
import type { Express } from "express";

// Assumption: the caller supplies listToolNames (for example, the names recorded
// when tools were registered), since McpServer does not document a public
// accessor for its registered tool list.
export function addHealthRoute(app: Express, listToolNames: () => Promise<string[]>) {
    app.get("/health", async (_req, res) => {
        try {
            // An empty tool list means registration failed or has not finished
            const tools = await listToolNames();
            if (tools.length === 0) {
                return res.status(503).json({ status: "degraded", reason: "no tools registered" });
            }
            res.json({ status: "ok", tool_count: tools.length });
        } catch (err) {
            res.status(503).json({ status: "error", reason: String(err) });
        }
    });
}

| Health check type | What it verifies | What it misses |
|---|---|---|
| TCP probe (port open) | Process is running | MCP server startup errors, tool registration failures, auth misconfiguration |
| HTTP 200 on /health | HTTP server is responding | Tool list empty, database connection down, schema not loaded |
| MCP initialize probe | Full protocol handshake succeeds, tool list non-empty | Individual tool failures (tools/call errors caught separately) |
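
The last row of the table depends on a probe that can tell a real handshake from a proxy error page. The following is a minimal sketch of that check, assuming the server answers the POST with a plain JSON body (a server that replies with an SSE stream would need the `data:` line extracted first). The function names and the `health-probe` client name are illustrative, not part of any SDK.

```typescript
// Shape of the parts of a JSON-RPC reply that the probe inspects.
interface InitializeReply {
    id?: number | string | null;
    result?: { protocolVersion?: unknown };
    error?: { code: number; message: string };
}

// Build the JSON-RPC initialize request a probe would POST to the MCP endpoint.
function buildInitializeRequest(id: number, protocolVersion: string): string {
    return JSON.stringify({
        jsonrpc: "2.0",
        id,
        method: "initialize",
        params: {
            protocolVersion,
            capabilities: {},
            clientInfo: { name: "health-probe", version: "1.0" },
        },
    });
}

// Healthy means: the body parses as a JSON object, echoes the request id,
// carries no error member, and reports a protocolVersion string.
function isHealthyInitializeResponse(body: string, expectedId: number): boolean {
    let parsed: unknown;
    try {
        parsed = JSON.parse(body);
    } catch {
        return false; // not JSON, e.g. an HTML error page from a proxy
    }
    if (typeof parsed !== "object" || parsed === null) return false;
    const reply = parsed as InitializeReply;
    if (reply.error !== undefined) return false;
    if (reply.id !== expectedId) return false;
    return typeof reply.result?.protocolVersion === "string";
}
```

A TCP probe would pass in every failing case this function rejects, which is the gap the table describes.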

Blue-green deployment for schema changes

MCP tool schema changes (adding tools, removing tools, changing inputSchema) are high-risk deployments. Agent frameworks typically cache the tool list from the initialize handshake for the duration of a connection. If you update the schema mid-run, the agent is using a stale tool list and may call tools with arguments the new schema rejects, or fail to discover newly added tools.

Blue-green deployment solves this by running old (blue) and new (green) versions simultaneously:

#!/bin/bash
# blue-green-deploy.sh: safe schema change deployment
# Usage: ./blue-green-deploy.sh <new-image>
set -euo pipefail

SERVICE_NAME="mcp-search"
NEW_IMAGE="${1:?usage: blue-green-deploy.sh <new-image>}"

echo "=== Blue-green deployment: $SERVICE_NAME ==="

# 1. Start green instance (new version) on alternate port
docker run -d \
    --name ${SERVICE_NAME}-green \
    -p 8081:8080 \
    -e MCP_ENV=production \
    "$NEW_IMAGE"

# 2. Wait for green to be healthy
echo "Waiting for green instance health..."
for i in $(seq 1 30); do
    if curl -sf http://localhost:8081/health >/dev/null; then
        echo "Green instance healthy after ${i}s"
        break
    fi
    if [ $i -eq 30 ]; then
        echo "ERROR: Green instance failed health check after 30s"
        docker rm -f ${SERVICE_NAME}-green
        exit 1
    fi
    sleep 1
done

# 3. Verify MCP protocol handshake on green
# "|| true" stops set -e from aborting here, so the cleanup below still runs on failure
echo "Verifying MCP initialize on green..."
INIT_RESPONSE=$(curl -sf -X POST http://localhost:8081/mcp \
    -H "Content-Type: application/json" \
    -H "Accept: application/json, text/event-stream" \
    -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"deploy-check","version":"1.0"}}}' || true)

if ! echo "$INIT_RESPONSE" | grep -q '"protocolVersion"'; then
    echo "ERROR: MCP initialize failed on green: $INIT_RESPONSE"
    docker rm -f ${SERVICE_NAME}-green
    exit 1
fi
echo "MCP initialize succeeded on green"

# 4. Shift load balancer to green (Nginx upstream weight update)
# Replace with your LB API call (AWS ALB, Nginx Plus, etc.)
echo "Shifting traffic to green..."
./update-lb-weight.sh blue=0 green=100

# 5. Wait for in-flight blue requests to drain (30s grace period)
sleep 30

# 6. Stop blue
docker rm -f ${SERVICE_NAME}-blue
docker rename ${SERVICE_NAME}-green ${SERVICE_NAME}-blue

echo "=== Deployment complete ==="

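Step 3 (the MCP initialize check) is the gate that keeps a broken build away from production traffic: if the new version cannot complete the protocol handshake, you find out before any traffic is shifted. It does not compare tool schemas, so pair it with a tools/list diff when the release changes the schema.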

Multi-region active-active

Enterprise teams with globally distributed agent workloads run MCP servers in multiple regions to reduce latency and provide regional fault isolation. The key design decision: are regions active-active (all regions serve traffic) or active-passive (one primary, others warm standby)?

Active-active is preferred for MCP servers because MCP protocol calls are stateless at the tool level — each tools/call is independent. Route clients to their nearest region for minimum latency, with automatic failover to a healthy region on errors:

# Caddy Caddyfile — global reverse proxy with health-check routing
# (Deploy on Fly.io, which provides anycast routing by default)

{
    # Fly.io handles anycast routing — all regions serve via the same IP
    # Region selection happens at the edge based on client location
}

mcp-search.fly.dev {
    reverse_proxy {
        # Primary: same region as client (Fly.io resolves this via anycast)
        to http://mcp-search.internal:8080

        # Health check — probes the MCP health endpoint
        health_uri /health
        health_interval 10s
        health_timeout 5s

        # Failover: if local region unhealthy, Fly.io routes to next closest
        lb_policy first
        fail_duration 30s
    }
}

One consideration for active-active MCP: tool schemas must be identical across all regions, and schema changes must be deployed to all regions before traffic is shifted. A region running an old schema version while another runs a new one creates inconsistent tool lists across connections.
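
The "all regions first, then shift traffic" rule can be expressed as a small gate. This is a sketch under the assumption that each region can report which schema version it serves; the `RegionSchemaStatus` shape, the region labels, and the version strings are hypothetical.

```typescript
// Hypothetical shape: what each region reports about the schema it serves.
interface RegionSchemaStatus {
    region: string;
    schemaVersion: string;
}

// Return the regions that are not yet serving the target schema version.
// Traffic should shift only when this list is empty.
function regionsBlockingShift(statuses: RegionSchemaStatus[], targetVersion: string): string[] {
    return statuses
        .filter((s) => s.schemaVersion !== targetVersion)
        .map((s) => s.region)
        .sort();
}

const regionStatuses: RegionSchemaStatus[] = [
    { region: "fra", schemaVersion: "v2.3.0" },
    { region: "iad", schemaVersion: "v2.2.1" },
    { region: "sin", schemaVersion: "v2.3.0" },
];

const blocking = regionsBlockingShift(regionStatuses, "v2.3.0");
console.log(
    blocking.length === 0
        ? "safe to shift traffic"
        : `hold: ${blocking.join(", ")} still on the old schema`,
);
```

Running the gate before the load balancer change keeps a half-deployed schema from reaching clients through whichever region they happen to land in.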

| Region | Latency to EU clients | Latency to US clients | Failover target |
|---|---|---|---|
| Frankfurt (primary EU) | 10–30 ms | 80–120 ms | Amsterdam or London |
| US East (primary US) | 80–120 ms | 10–20 ms | US West |
| Singapore (primary APAC) | 150–200 ms | 160–220 ms | Tokyo or Sydney |
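
The failover column can be read as an ordered preference list per primary region. The sketch below encodes the table that way and picks the first healthy candidate; the lowercase region labels are illustrative identifiers, not provider region codes, and in a Fly.io setup the platform's own routing would normally make this choice.

```typescript
// Failover order per primary region, taken from the table above.
const FAILOVER_ORDER: Record<string, string[]> = {
    "frankfurt": ["amsterdam", "london"],
    "us-east": ["us-west"],
    "singapore": ["tokyo", "sydney"],
};

// Return the primary if it is healthy, otherwise the first healthy failover
// target, otherwise undefined (no region in the preference list is healthy).
function pickRegion(primary: string, healthy: ReadonlySet<string>): string | undefined {
    const candidates = [primary, ...(FAILOVER_ORDER[primary] ?? [])];
    return candidates.find((region) => healthy.has(region));
}
```

Returning `undefined` rather than an arbitrary far-away region makes the "nothing healthy nearby" case explicit, so the caller can decide whether cross-continent latency is acceptable.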

Change management integration

Enterprise IT environments typically require changes to production systems to go through a formal change management process — a Change Advisory Board (CAB) review, a ticket in ServiceNow or Jira, and a defined approval workflow. MCP server deployments, especially schema changes, should be treated as production changes.

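# Makefile: change management gate before MCP deploy
.PHONY: deploy-production check-change-ticket verify-schema-diff deploy-notify

deploy-production: check-change-ticket verify-schema-diff deploy-notify
	@# Assumes the blue-green script shown earlier sits next to this Makefile
	@./blue-green-deploy.sh "$(NEW_IMAGE)"
	@echo "Production deployment complete"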

check-change-ticket:
	@# Require a JIRA ticket number to be set
	@test -n "$(CHANGE_TICKET)" || (echo "ERROR: Set CHANGE_TICKET=PLAT-XXXX" && exit 1)
	@# Verify ticket is in 'Approved' state via JIRA API
	@./scripts/verify-change-ticket.sh "$(CHANGE_TICKET)"

verify-schema-diff:
	@# Generate diff of MCP tool list between current and new version.
	@# Both sides pass through the same jq normalization (sorted keys, tools sorted
	@# by name) so formatting differences do not show up as schema changes.
	@docker run --rm "$(NEW_IMAGE)" node -e "require('./dist').listTools().then(t => console.log(JSON.stringify(t)))" \
	    | jq -S 'sort_by(.name)' > /tmp/new-schema.json
	@curl -sf -X POST https://mcp-search.internal/mcp \
	    -H "Content-Type: application/json" \
	    -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}' \
	    | jq -S '.result.tools | sort_by(.name)' > /tmp/current-schema.json
	@diff /tmp/current-schema.json /tmp/new-schema.json \
	    | tee /tmp/schema-diff.txt
	@# Fail if any tools were REMOVED (breaking change)
	@if grep "^<" /tmp/schema-diff.txt | grep -q '"name"'; then \
	    echo "ERROR: Schema diff shows tool removal; requires extended review"; \
	    exit 1; \
	fi

deploy-notify:
	@# Post deployment notification to ops Slack channel
	@./scripts/slack-notify.sh "#mcp-deployments" \
	    "Deploying $(NEW_IMAGE) for $(CHANGE_TICKET) — schema diff: $$(wc -l /tmp/schema-diff.txt | awk '{print $$1}') lines"

The verify-schema-diff target catches tool removals, the most disruptive kind of schema change. Tool additions are backward-compatible: agents that don't know about a new tool simply never call it. Tool removals are breaking: agents that cached a tools/list response containing the deleted tool will fail when they try to call it.
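
A line-based diff works, but the same classification can be done structurally. The sketch below compares two tool lists by name and reports additions, removals, and tools whose inputSchema changed. Flagging changed schemas is an extension beyond the removal check described above, and the `ToolDef` shape is a simplified assumption about what tools/list returns.

```typescript
interface ToolDef {
    name: string;
    inputSchema?: unknown;
}

interface ToolDiff {
    added: string[];
    removed: string[];
    changed: string[];
}

// Serialize with sorted object keys so key order alone never counts as a change.
function canonical(value: unknown): string {
    if (value === undefined) return "null";
    if (Array.isArray(value)) return "[" + value.map(canonical).join(",") + "]";
    if (value !== null && typeof value === "object") {
        const obj = value as Record<string, unknown>;
        const keys = Object.keys(obj).sort();
        return "{" + keys.map((k) => JSON.stringify(k) + ":" + canonical(obj[k])).join(",") + "}";
    }
    return JSON.stringify(value);
}

function diffToolLists(current: ToolDef[], next: ToolDef[]): ToolDiff {
    const cur = new Map<string, string>();
    for (const t of current) cur.set(t.name, canonical(t.inputSchema));
    const nxt = new Map<string, string>();
    for (const t of next) nxt.set(t.name, canonical(t.inputSchema));

    const curNames = Array.from(cur.keys());
    const nxtNames = Array.from(nxt.keys());
    return {
        added: nxtNames.filter((n) => !cur.has(n)).sort(),
        removed: curNames.filter((n) => !nxt.has(n)).sort(),
        changed: curNames.filter((n) => nxt.has(n) && nxt.get(n) !== cur.get(n)).sort(),
    };
}

// Mirrors the Makefile gate: only removals force extended review.
function requiresExtendedReview(diff: ToolDiff): boolean {
    return diff.removed.length > 0;
}
```

Comparing by tool name avoids false positives from a property that happens to be called "name" inside an inputSchema, which a grep over diff output cannot distinguish.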

Rollback procedures for schema regressions

Schema rollback is more complex than binary rollback because agents may have cached the new (broken) tool list. The rollback sequence must account for connection state:

  1. Deploy the previous image version: roll the container back to the last known-good image
  2. Verify the protocol handshake: confirm the previous schema is being served (use curl to hit tools/list and check for the expected tool names)
  3. Force-close persistent connections: agents holding persistent connections to the broken version need to reconnect to pick up the restored schema. If you're using Caddy or Nginx, note that a graceful reload lets established connections drain rather than closing them at once, so set a drain limit (worker_shutdown_timeout in Nginx, grace_period in Caddy) or restart the proxy to force reconnection
  4. Verify AliveMCP shows green: the external probe confirms the rollback succeeded from outside your network
  5. Notify affected teams: if the schema was broken for more than 5 minutes, agent runs during that window may have failed, and teams running automated pipelines need to know to re-run them
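
Steps 2 and 5 above are mechanical enough to script. The following sketch assumes the tools/list reply arrives as a plain JSON body; the function names are illustrative, and the 5-minute default comes from the list above.

```typescript
interface ToolsListReply {
    result?: { tools?: { name: string }[] };
}

// Step 2: return the expected tool names that the restored server is NOT serving.
// An empty array means the previous schema is back.
function missingTools(toolsListBody: string, expected: string[]): string[] {
    const reply = JSON.parse(toolsListBody) as ToolsListReply;
    const served = new Set((reply.result?.tools ?? []).map((t) => t.name));
    return expected.filter((name) => !served.has(name));
}

// Step 5: notify teams when the broken window exceeded the threshold.
function shouldNotifyTeams(brokenAt: Date, restoredAt: Date, thresholdMinutes = 5): boolean {
    const minutes = (restoredAt.getTime() - brokenAt.getTime()) / 60000;
    return minutes > thresholdMinutes;
}
```

Keeping the expected tool names in version control next to the image tag gives the rollback check something concrete to compare against.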

Out-of-band monitoring for enterprise deployments

Enterprise MCP servers are typically deployed inside private networks, behind VPNs or corporate firewalls. This creates a monitoring blind spot: your internal health checks confirm the server is up from inside the network, but they can't detect failures that affect external clients (VPN connectivity issues, DNS resolution failures, TLS certificate expiry on the edge terminator).

AliveMCP probes MCP endpoints from outside your network, along the same path an external agent client takes to reach your server. This catches the failures that internal probes miss, such as the VPN, DNS, and TLS problems described above.

For private MCP servers (Team or Enterprise tier), AliveMCP can probe from a dedicated IP range you whitelist in your firewall — giving you external verification without opening your MCP endpoints to the public internet. The probe frequency (every 60 seconds) means network-layer failures are detected within a minute rather than waiting for a user to report that their agent run failed.

Frequently asked questions

How many MCP server replicas does an enterprise deployment need?

Minimum two replicas in separate availability zones for production resilience. For high-traffic deployments (100+ concurrent agent sessions), three or more replicas allow rolling deployments without reducing capacity during an update (take one replica out of rotation, update it, verify, put back — net capacity stays at two replicas throughout). More than five replicas is rarely necessary for MCP servers unless you're running thousands of concurrent connections, because MCP connections are long-lived SSE streams rather than per-request HTTP, which means connection count scales differently than for stateless APIs.
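
The capacity arithmetic behind that recommendation is simple enough to write down. This sketch assumes replicas are updated one at a time by default; the function name is illustrative.

```typescript
// Replicas still serving traffic while `updatedAtOnce` replicas are out of rotation.
function capacityDuringRollingUpdate(replicas: number, updatedAtOnce = 1): number {
    if (!Number.isInteger(replicas) || replicas < 1) {
        throw new RangeError("replicas must be a positive integer");
    }
    if (!Number.isInteger(updatedAtOnce) || updatedAtOnce < 1 || updatedAtOnce > replicas) {
        throw new RangeError("updatedAtOnce must be between 1 and replicas");
    }
    return replicas - updatedAtOnce;
}

for (const n of [2, 3, 5]) {
    console.log(`${n} replicas: ${capacityDuringRollingUpdate(n)} serving during a rolling update`);
}
```

With two replicas a rolling update leaves a single instance serving, which is why three is the practical floor when updates must not reduce redundancy.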

Should MCP servers be deployed with Kubernetes or simpler orchestration?

Kubernetes is appropriate if your organisation already uses it for other services — standardised deployment tooling is valuable. If you're starting fresh and not already Kubernetes-native, platforms like Fly.io or Railway handle multi-region deployment, health checks, rolling updates, and automatic failover with far less operational overhead. The MCP protocol doesn't require Kubernetes — it requires the deployment guarantees Kubernetes provides. Match your tooling to your team's expertise, not to what sounds most enterprise.

How do we handle MCP tool schema versioning across environment promotions?

Tag each schema version with a semantic version (e.g., v2.3.0) and require the schema version to be pinned in the MCP server's manifest. Promote schema versions through environments (dev → staging → production) with a minimum soak time at each stage (24 hours in staging before production is a reasonable baseline). Block promotion if any tool was removed relative to the currently-deployed version in that environment. Track the schema version in your deployment system alongside the image tag so rollbacks can restore both simultaneously.
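
A promotion gate along these lines might look like the sketch below. The `PromotionRequest` shape is hypothetical, and the 24-hour default is the staging baseline suggested above; a real pipeline would pull these values from its deployment system.

```typescript
interface PromotionRequest {
    schemaVersion: string;   // e.g. "v2.3.0"
    soakStartedAt: Date;     // when this version reached the previous stage
    now: Date;
    removedTools: string[];  // tools in the target environment that this version drops
    minSoakHours?: number;   // defaults to 24
}

interface PromotionDecision {
    allowed: boolean;
    reasons: string[];
}

function evaluatePromotion(req: PromotionRequest): PromotionDecision {
    const reasons: string[] = [];

    if (!/^v\d+\.\d+\.\d+$/.test(req.schemaVersion)) {
        reasons.push(`schema version "${req.schemaVersion}" is not of the form vMAJOR.MINOR.PATCH`);
    }

    const minSoakHours = req.minSoakHours ?? 24;
    const soakedHours = (req.now.getTime() - req.soakStartedAt.getTime()) / 3600000;
    if (soakedHours < minSoakHours) {
        reasons.push(`soaked ${soakedHours.toFixed(1)}h, need ${minSoakHours}h`);
    }

    if (req.removedTools.length > 0) {
        reasons.push(`removes tools: ${req.removedTools.join(", ")}`);
    }

    return { allowed: reasons.length === 0, reasons };
}
```

Collecting every failed condition, rather than stopping at the first, gives the person reading the pipeline log the full list of what has to change before promotion.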

What's the right change freeze policy for MCP servers?

Align MCP server change freeze with your general application change freeze — typically 2 weeks before and 1 week after major corporate events (earnings calls, annual product releases, holiday periods). The additional MCP-specific rule: freeze schema changes (tool additions, removals, description updates) separately from infrastructure changes. A replica count increase or a TLS certificate renewal is low-risk and should be permitted during freeze; adding a new tool or changing a tool's inputSchema carries agent compatibility risk and should require CAB approval even during freeze-light periods.
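
The schema-versus-infrastructure split can be encoded as a lookup so that a deploy pipeline applies it consistently. The change kinds below are a hypothetical taxonomy built from the examples in the paragraph above; a real policy would likely have more categories.

```typescript
type ChangeKind =
    | "tool_added"
    | "tool_removed"
    | "description_updated"
    | "input_schema_changed"
    | "replica_count_changed"
    | "tls_certificate_renewed";

type FreezeRuling = "permitted" | "requires_cab_approval";

// Schema changes carry agent compatibility risk; everything else is treated as infrastructure.
const SCHEMA_CHANGES: ReadonlySet<ChangeKind> = new Set<ChangeKind>([
    "tool_added",
    "tool_removed",
    "description_updated",
    "input_schema_changed",
]);

function rulingDuringFreeze(kind: ChangeKind): FreezeRuling {
    return SCHEMA_CHANGES.has(kind) ? "requires_cab_approval" : "permitted";
}
```

Because `ChangeKind` is a closed union, adding a new kind of change forces a decision about which side of the policy it falls on only if it is also added to `SCHEMA_CHANGES`; anything left out defaults to permitted, so a stricter team may prefer to invert the default.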

How do we manage dev/staging/production MCP server isolation for the same tool set?

Use environment-specific subdomains (mcp-search.dev.internal, mcp-search.staging.internal, mcp-search.internal for production) rather than path prefixes. Configure each environment's MCP server with environment-specific credentials, data sources, and rate limits. Maintain separate AliveMCP monitor configurations per environment — you want independent uptime tracking per environment, not a shared monitor that passes when dev is up but prod is down. The most common isolation failure: dev and staging sharing a database, so data corruption from a dev test run affects staging validation.
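
The shared-database failure is easy to detect from configuration alone. The sketch below groups environments by database URL and reports any URL used by more than one; the `EnvConfig` shape and the database URLs are hypothetical, while the host names follow the subdomain scheme above.

```typescript
interface EnvConfig {
    name: string;
    host: string;
    databaseUrl: string;
}

// Return groups of environment names that point at the same database.
// An empty result means every environment is isolated.
function findSharedDatabases(envs: EnvConfig[]): string[][] {
    const byDatabase = new Map<string, string[]>();
    for (const env of envs) {
        const users = byDatabase.get(env.databaseUrl) ?? [];
        users.push(env.name);
        byDatabase.set(env.databaseUrl, users);
    }
    return Array.from(byDatabase.values()).filter((names) => names.length > 1);
}

const environments: EnvConfig[] = [
    { name: "dev", host: "mcp-search.dev.internal", databaseUrl: "postgres://db-nonprod.internal/search" },
    { name: "staging", host: "mcp-search.staging.internal", databaseUrl: "postgres://db-nonprod.internal/search" },
    { name: "production", host: "mcp-search.internal", databaseUrl: "postgres://db-prod.internal/search" },
];

for (const group of findSharedDatabases(environments)) {
    console.log(`isolation violation: ${group.join(" and ")} share a database`);
}
```

Running a check like this in CI against the environment configuration catches the overlap before a dev test run can corrupt staging data.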
