MCP server on Kubernetes

Running an MCP server on Kubernetes requires four adaptations that stateless HTTP services don't need: HTTP/SSE transport (stdio doesn't work across pod boundaries), readiness probes that exercise the MCP handshake (a TCP check passes even when the handshake fails), a PodDisruptionBudget (without one, a node drain can evict every replica at once and end every open session), and session affinity if your server keeps per-session state beyond what the MCP protocol itself tracks.

TL;DR

Use HTTP/SSE transport; stdio is incompatible with Kubernetes networking. Write readiness probes that complete the initialize handshake, not just a TCP connect. Set a PodDisruptionBudget with minAvailable: 1 so a node drain during an upgrade can't evict every pod at once. If your server is stateless (no in-memory session state beyond MCP), HPA works without affinity. If it's stateful, configure session affinity, either sessionAffinity: ClientIP on the Service or cookie affinity on the Ingress. Load secrets from Kubernetes Secret objects or an external secrets manager; never bake them into the image. Monitor from outside the cluster with AliveMCP for a view the cluster's own health checks can't provide.

Why stdio doesn't work on Kubernetes

Stdio transport works by having the client spawn the server as a child process and exchange messages over its stdin/stdout. A Pod runs on a cluster node in its own network namespace, so a client outside the cluster has no way to spawn a process inside it. You can tunnel stdio through kubectl exec -i, but every session then rides an API-server connection and bypasses Services, load balancing and probes, which makes it a debugging trick rather than a deployment. For any real Kubernetes deployment, HTTP/SSE is the only viable transport.

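HTTP/SSE also gives Kubernetes something meaningful to health-check: readiness, liveness and startup probes can all be httpGet checks against the same server that handles MCP traffic. (Kubernetes also supports tcpSocket, exec and grpc probes, but a TCP check can't tell whether the MCP layer works.) See MCP server deployment for the transport selection decision in the broader deployment context.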

Deployment manifest

A minimal but correct Deployment for an MCP server:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-server
spec:
  replicas: 2
  selector:
    matchLabels:
      app: mcp-server
  template:
    metadata:
      labels:
        app: mcp-server
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: mcp-server
          image: registry.example.com/mcp-server:1.0.0   # pin a version tag or digest rather than :latest
          ports:
            - containerPort: 3000
          envFrom:
            - secretRef:
                name: mcp-server-secrets
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "1000m"
          startupProbe:
            httpGet:
              path: /healthz
              port: 3000
            timeoutSeconds: 6        # default is 1s; the /healthz handler below waits up to 5s
            failureThreshold: 12     # 12 × 5s = 60s to finish starting
            periodSeconds: 5
          readinessProbe:
            httpGet:
              path: /healthz
              port: 3000
            timeoutSeconds: 6
            initialDelaySeconds: 5
            periodSeconds: 10
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /healthz
              port: 3000
            timeoutSeconds: 6
            initialDelaySeconds: 30
            periodSeconds: 30
            failureThreshold: 3

The terminationGracePeriodSeconds: 60 gives the pod 60 seconds to drain active sessions after SIGTERM before SIGKILL is sent. Your server's SIGTERM handler (see MCP server Docker, signal handling) should finish inside that window, so set its drain timeout a few seconds shorter, for example 55 seconds, to exit cleanly before the kill arrives.
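The drain side of that contract can be sketched with Node's built-in http module. This is a minimal sketch, not the handler from the Docker guide: it assumes Node 18.2 or later (for closeIdleConnections and closeAllConnections), a plain http.Server, and a 55-second drain budget; installGracefulShutdown is a hypothetical helper name.

```javascript
const http = require('node:http');

// Hypothetical helper: drain on SIGTERM, then exit before the kubelet's SIGKILL.
// drainTimeoutMs (assumed 55 s) should stay below terminationGracePeriodSeconds (60 s).
function installGracefulShutdown(server, { drainTimeoutMs = 55_000, exit = (code) => process.exit(code) } = {}) {
  let draining = null;

  function shutdown() {
    if (draining) return draining; // a repeated SIGTERM reuses the same drain

    draining = new Promise((resolve) => {
      // Deadline: force-close whatever is still open (e.g. long-lived SSE streams).
      const deadline = setTimeout(() => server.closeAllConnections(), drainTimeoutMs);

      // Stop accepting new connections; the callback runs once every open
      // connection has ended, either on its own or via the deadline above.
      server.close(() => {
        clearTimeout(deadline);
        resolve();
      });

      // Idle keep-alive sockets carry no session and would only delay close().
      server.closeIdleConnections();
    }).then(() => exit(0));

    return draining;
  }

  process.once('SIGTERM', shutdown);
  return shutdown;
}
```

Call installGracefulShutdown(server) once, after server.listen(...). Open SSE streams keep server.close() from completing by itself, which is why the deadline force-closes them instead of leaving the job to SIGKILL.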

Writing a correct readiness probe

Kubernetes' httpGet probe sends an HTTP GET to the specified path and treats any status from 200 to 399 as success. If your /healthz endpoint simply returns 200 without running the MCP handshake, the probe passes even when the MCP layer is broken, and the pod receives traffic it can't serve.

The /healthz endpoint should run a real MCP probe sequence:

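// Express route; the same logic ports to Fastify, Hono, etc.
// Assumes `app` is the Express app that also serves POST /mcp on this port.
const PORT = Number(process.env.PORT ?? 3000);
const MCP_URL = `http://localhost:${PORT}/mcp`;

app.get('/healthz', async (req, res) => {
  try {
    // Probe the server's own MCP endpoint with a real initialize request
    const response = await fetch(MCP_URL, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        // Streamable HTTP clients must accept both JSON and SSE replies
        Accept: 'application/json, text/event-stream'
      },
      body: JSON.stringify({
        jsonrpc: '2.0', id: 1, method: 'initialize',
        params: {
          protocolVersion: '2024-11-05',
          capabilities: {},
          clientInfo: { name: 'healthz', version: '1' }
        }
      }),
      signal: AbortSignal.timeout(5000)
    });
    if (!response.ok) throw new Error(`HTTP ${response.status}`);

    // The reply is either a JSON body or an SSE stream carrying a "data:" line
    const text = await response.text();
    const isSse = response.headers.get('content-type')?.includes('text/event-stream');
    const json = isSse
      ? text.split('\n').find((line) => line.startsWith('data:'))?.slice(5)
      : text;
    const data = JSON.parse(json ?? '');
    if (!data.result?.protocolVersion) throw new Error('bad response');

    // Stateful servers open a session per initialize; end it so probes don't pile up
    const sessionId = response.headers.get('mcp-session-id');
    if (sessionId) {
      await fetch(MCP_URL, { method: 'DELETE', headers: { 'Mcp-Session-Id': sessionId } }).catch(() => {});
    }
    res.status(200).json({ status: 'ok' });
  } catch (err) {
    res.status(503).json({ status: 'unhealthy', error: err.message });
  }
});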

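When the readiness probe returns 503, Kubernetes removes the pod from the Service endpoints, so new sessions aren't routed to it. The pod isn't killed, and connections that are already open (such as a long-lived SSE stream) stay on it, but every new HTTP request goes only to ready pods; on a stateful server that includes the next request of an existing session. The pod re-enters rotation once the probe returns 200 again. For a temporarily overloaded pod this is the behavior you want: it stops taking new sessions while it recovers. Note that the manifest above also points livenessProbe at /healthz, so a failure that persists for about 90 seconds (failureThreshold 3 × periodSeconds 30) restarts the container and drops every session on it; give liveness a cheaper check if overloaded pods should recover rather than restart.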
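One way to let an overloaded pod drop out of rotation without also failing liveness is to compute the two answers separately. The sketch below is illustrative only: the /readyz and /livez paths, the maxSessions cap, and the handshakeOk flag (for example, the cached result of the initialize check in the /healthz handler above) are assumptions, not something Kubernetes or any MCP SDK defines.

```javascript
// Hypothetical probe policy: readiness sheds new sessions under load, while
// liveness only reports whether the process can answer at all.
function probeStatus(path, { activeSessions, maxSessions, handshakeOk }) {
  if (path === '/livez') return 200; // load alone never triggers a restart
  if (path === '/readyz') {
    if (!handshakeOk) return 503; // MCP layer broken
    if (activeSessions >= maxSessions) return 503; // at capacity: route new sessions elsewhere
    return 200;
  }
  return 404;
}

// node:http-style request handler; getState() supplies live values, e.g. a
// session counter and the cached result of the initialize check.
function createProbeHandler(getState) {
  return (req, res) => {
    const { pathname } = new URL(req.url, 'http://probe.local'); // drop any query string
    const status = probeStatus(pathname, getState());
    res.writeHead(status, { 'Content-Type': 'text/plain' });
    res.end(status === 200 ? 'ok' : 'unavailable');
  };
}
```

To use it, point readinessProbe at /readyz and livenessProbe at /livez. The cap is a tuning knob best set from load testing rather than guessed.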

Pod Disruption Budget

Without a PodDisruptionBudget, a node upgrade or cluster autoscaling event can evict all your MCP server pods simultaneously, causing a complete outage. A PDB sets the minimum number of pods that must remain available during voluntary disruptions:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: mcp-server-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: mcp-server

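With replicas: 2 and minAvailable: 1, the eviction API (which kubectl drain and the cluster autoscaler use) will evict at most one pod at a time, so one pod keeps serving sessions while the other is rescheduled. A PDB only covers these voluntary, eviction-based disruptions: it doesn't pace Deployment rolling updates, which follow the Deployment's own strategy.rollingUpdate.maxUnavailable and maxSurge settings, and it can't prevent involuntary disruptions such as a node failure.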

Choose between minAvailable and maxUnavailable based on your availability requirement: minAvailable: 1 guarantees at least one pod runs regardless of replica count; maxUnavailable: 1 allows one pod to be down at a time regardless of replica count. For small deployments (2–3 replicas), minAvailable: 1 is more predictable.
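The arithmetic behind that choice is easy to check. The function below is a simplified model of how the disruption controller computes allowed evictions (it ignores percentage values and assumes every pod counts toward the budget); allowedDisruptions is a hypothetical name.

```javascript
// Simplified PDB model: how many pods may be evicted right now.
// Percentages and pods outside the budget's selector are ignored.
function allowedDisruptions({ replicas, healthy = replicas, minAvailable, maxUnavailable }) {
  // minAvailable fixes the floor directly; maxUnavailable derives it from the replica count.
  const desiredHealthy = minAvailable !== undefined
    ? minAvailable
    : replicas - maxUnavailable;
  return Math.max(0, healthy - desiredHealthy);
}

for (const replicas of [1, 2, 3, 5]) {
  console.log(
    `replicas=${replicas}`,
    `minAvailable:1 -> ${allowedDisruptions({ replicas, minAvailable: 1 })}`,
    `maxUnavailable:1 -> ${allowedDisruptions({ replicas, maxUnavailable: 1 })}`
  );
}
```

Two consequences follow. With one replica, minAvailable: 1 allows zero evictions, so kubectl drain keeps retrying and the drain stalls. With five replicas it allows four evictions at once, while maxUnavailable: 1 stays at one.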

Horizontal Pod Autoscaling and session affinity

HPA scales the replica count based on CPU, memory, or custom metrics. For MCP servers, the right scaling metric is usually active session count: a custom metric emitted by your server and fed to the HPA either through KEDA (which creates and manages the HPA for you) or through a custom-metrics adapter such as prometheus-adapter.
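A minimal sketch of the server side of that metric, assuming Prometheus scrapes it and KEDA or an adapter takes it from there. The metric name mcp_active_sessions, the separate port 9090, and the SessionGauge class are hypothetical; call opened and closed wherever your transport creates and tears down sessions, keyed by whatever session ID it assigns.

```javascript
const http = require('node:http');

// Hypothetical in-process gauge for the "active sessions" scaling metric.
class SessionGauge {
  #open = new Set();

  opened(sessionId) { this.#open.add(sessionId); }
  closed(sessionId) { this.#open.delete(sessionId); }
  get value() { return this.#open.size; }

  // Prometheus text exposition format for a single gauge.
  render() {
    return [
      '# HELP mcp_active_sessions Open MCP sessions on this pod.',
      '# TYPE mcp_active_sessions gauge',
      `mcp_active_sessions ${this.value}`,
      ''
    ].join('\n');
  }
}

const sessions = new SessionGauge();

// Serves /metrics on its own port (assumed 9090), off the public MCP port.
function startMetricsServer(port = 9090) {
  return http.createServer((req, res) => {
    if (req.url !== '/metrics') return res.writeHead(404).end();
    res.writeHead(200, { 'Content-Type': 'text/plain; version=0.0.4' });
    res.end(sessions.render());
  }).listen(port);
}
```

Using a Set rather than a bare counter makes duplicate open or close notifications harmless, which matters if several code paths can end the same session.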

Stateless MCP servers (no in-memory state per session beyond what the MCP protocol itself tracks) work with standard HPA and round-robin load balancing. Scale horizontally without affinity.

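Stateful MCP servers (in-memory context, tool-call chain state, session-scoped caches) need session affinity: all requests from the same session must route to the same pod. Configure it at the Service level with spec.sessionAffinity: ClientIP, or at the ingress level with a cookie-based sticky session. The ingress-nginx cookie-affinity annotations are read from the Ingress object, not the Service:

apiVersion: v1
kind: Service
metadata:
  name: mcp-server
spec:
  selector:
    app: mcp-server
  ports:
    - port: 80
      targetPort: 3000
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: mcp-server
  annotations:
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "mcp-session"
    nginx.ingress.kubernetes.io/session-cookie-expires: "3600"
spec:
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: mcp-server
                port:
                  number: 80

Cookie affinity only works if your MCP client stores the cookie and sends it back; Node's built-in fetch, for example, keeps no cookie jar, so check your clients before relying on it. ClientIP affinity has its own caveat: by default ingress-nginx sends traffic straight to pod endpoints, so Service-level affinity doesn't apply to requests that arrive through it.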

Session affinity limits how evenly load can be distributed — a heavy session sticks to one pod while others are underutilized. Design stateless MCP servers where possible. If session state is necessary, consider externalizing it to Redis so any pod can serve any session.
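Externalizing state can be as simple as keying it by session ID in a shared store. The sketch assumes a client with async get, set (accepting an { EX: seconds } option) and del methods, which matches node-redis v4; ExternalSessionStore and the in-memory MemoryClient stand-in are hypothetical names, and the one-hour TTL is an arbitrary default.

```javascript
// Hypothetical session store: state lives outside the pod, keyed by session ID,
// so any replica can pick up any session.
class ExternalSessionStore {
  constructor(client, { ttlSeconds = 3600, prefix = 'mcp:session:' } = {}) {
    this.client = client;
    this.ttlSeconds = ttlSeconds;
    this.prefix = prefix;
  }

  async load(sessionId) {
    const raw = await this.client.get(this.prefix + sessionId);
    return raw === null ? null : JSON.parse(raw);
  }

  async save(sessionId, state) {
    // Refresh the TTL on every write so abandoned sessions expire on their own.
    await this.client.set(this.prefix + sessionId, JSON.stringify(state), { EX: this.ttlSeconds });
  }

  async drop(sessionId) {
    await this.client.del(this.prefix + sessionId);
  }
}

// In-memory stand-in for tests and local runs (ignores TTLs).
class MemoryClient {
  #data = new Map();
  async get(key) { return this.#data.has(key) ? this.#data.get(key) : null; }
  async set(key, value) { this.#data.set(key, value); return 'OK'; }
  async del(key) { return this.#data.delete(key) ? 1 : 0; }
}
```

Every pod constructs its own ExternalSessionStore against the same Redis instance, so a request that lands on a different pod still finds its session state.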

Secrets management

Reference credentials from Kubernetes Secret objects or an external secrets manager, not hardcoded in the Deployment manifest or Docker image:

apiVersion: v1
kind: Secret
metadata:
  name: mcp-server-secrets
type: Opaque
stringData:
  MY_API_KEY: "your-api-key-here"
  DATABASE_URL: "postgres://user:pass@host/db"

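The Deployment references this with envFrom.secretRef. For secrets that rotate or are shared across clusters, use an external secrets manager: AWS Secrets Manager with the External Secrets Operator, which syncs the source into a Kubernetes Secret, or HashiCorp Vault with the Vault Agent Injector, which renders secrets into files inside the pod through a sidecar. Either way, values injected through envFrom are read only when the container starts, so a rotated value reaches the process only after a restart (for example, kubectl rollout restart deployment/mcp-server); a Secret mounted as a volume, or a file written by the Vault agent, is updated in place.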
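Environment variables set through envFrom are fixed when the container starts, while a Secret mounted as a volume (a secret volume referencing mcp-server-secrets plus a volumeMount) is refreshed in place by the kubelet after the Secret changes, with some propagation delay. A minimal sketch that reads each key from the mounted file on use; the mount path /etc/mcp-secrets, the 30-second cache, the whitespace trimming, and createSecretReader are assumptions for illustration.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical reader for a Secret mounted at an assumed path; each Secret key
// appears as a file. Re-reading after the cache window picks up rotated values.
function createSecretReader(dir = '/etc/mcp-secrets', { cacheMs = 30_000, now = Date.now } = {}) {
  const cache = new Map();
  return function readSecret(name) {
    const hit = cache.get(name);
    if (hit && now() - hit.at < cacheMs) return hit.value;
    // trim() drops a trailing newline left by editors or echo (an assumption).
    const value = fs.readFileSync(path.join(dir, name), 'utf8').trim();
    cache.set(name, { value, at: now() });
    return value;
  };
}
```

Avoid subPath mounts for this pattern: files mounted with subPath are not updated when the Secret changes.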

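Don't commit a manifest with credentials inlined, like the Secret above, to version control. Create the Secret from a file kept out of the repository instead (for example, kubectl create secret generic mcp-server-secrets --from-env-file=.env) or let a secrets management tool create it. Avoid --from-literal for real credentials, because the value then lands in your shell history.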

External monitoring beyond the cluster

Kubernetes health checks tell you whether pods are healthy from inside the cluster. They don't tell you whether your ingress, DNS, or TLS certificate works for clients outside it. An ingress controller misconfiguration, a failed Let's Encrypt certificate renewal, or a DNS change that resolvers haven't picked up yet (the old record stays cached until its TTL expires) all cause external clients to see a broken MCP server while internal health checks stay green.

AliveMCP probes your public MCP endpoint from outside the cluster, running the full initialize → tools/list sequence and verifying TLS. This is the view your actual users have. See MCP server observability for how to combine internal K8s metrics, distributed tracing, and external probing into a complete picture.
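A certificate check is no substitute for a full initialize → tools/list probe, but the TLS failure mode above is easy to spot-check yourself. A minimal sketch using Node's built-in tls module; checkCertificate and daysUntilExpiry are hypothetical helper names, and the hostname you pass is your own public MCP host.

```javascript
const tls = require('node:tls');

// Days until a certificate's notAfter date; negative once it has expired.
function daysUntilExpiry(validTo, now = new Date()) {
  return (new Date(validTo).getTime() - now.getTime()) / 86_400_000;
}

// Hypothetical spot check from outside the cluster: complete a TLS handshake
// with SNI, then report whether the chain verified and how long it has left.
// rejectUnauthorized: false lets us report an expired or mismatched cert
// instead of only seeing a connection error.
function checkCertificate(host, { port = 443, timeoutMs = 5000 } = {}) {
  return new Promise((resolve, reject) => {
    const socket = tls.connect(
      { host, port, servername: host, rejectUnauthorized: false, timeout: timeoutMs },
      () => {
        const cert = socket.getPeerCertificate();
        resolve({
          authorized: socket.authorized,
          reason: socket.authorizationError ?? null, // e.g. 'CERT_HAS_EXPIRED'
          daysLeft: daysUntilExpiry(cert.valid_to),
        });
        socket.end();
      }
    );
    socket.on('timeout', () => socket.destroy(new Error(`TLS handshake timed out after ${timeoutMs} ms`)));
    socket.on('error', reject);
  });
}
```

For example, checkCertificate('mcp.example.com').then(console.log) against a placeholder hostname; alert well before daysLeft reaches zero.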