MCP server on Kubernetes
Running an MCP server on Kubernetes requires four adaptations that stateless HTTP services don't need: HTTP/SSE transport (stdio doesn't work across pod boundaries), session-aware readiness probes (TCP health checks pass even when the protocol handshake fails), a PodDisruptionBudget (without one, a node drain can evict every replica at once and drop all sessions), and session affinity if your server maintains per-session state beyond the MCP protocol itself.
TL;DR
Use HTTP/SSE transport; stdio is incompatible with K8s networking. Write readiness probes that complete the initialize handshake, not just a TCP connect. Set a PodDisruptionBudget with minAvailable: 1 to prevent simultaneous pod evictions during node upgrades. If your server is stateless (no in-memory session state beyond MCP), HPA works without affinity. If it's stateful, enable session affinity (ClientIP on the Service, or a sticky cookie at the ingress). Load secrets from Kubernetes Secret objects or an external secrets manager; never bake them into the image. Monitor from outside the cluster with AliveMCP for a view the cluster's own health checks can't provide.
Why stdio doesn't work on Kubernetes
Stdio transport works by having the client launch the server as a child process and exchange messages over its stdin/stdout, which assumes client and server share a machine. A client outside the cluster can't spawn a process inside a Pod. kubectl exec is not a substitute: it needs cluster credentials and bypasses Services, load balancing, and probes. For any Kubernetes deployment, HTTP/SSE is the only viable transport.
HTTP/SSE also enables the health check setup used in this guide: the httpGet readiness, liveness, and startup probes below all need an HTTP endpoint to call. See MCP server deployment for the transport selection decision in the broader deployment context.
Deployment manifest
A minimal but correct Deployment for an MCP server:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-server
spec:
  replicas: 2
  selector:
    matchLabels:
      app: mcp-server
  template:
    metadata:
      labels:
        app: mcp-server
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: mcp-server
          image: registry.example.com/mcp-server:latest
          ports:
            - containerPort: 3000
          envFrom:
            - secretRef:
                name: mcp-server-secrets
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "1000m"
          startupProbe:
            httpGet:
              path: /healthz
              port: 3000
            failureThreshold: 30
            periodSeconds: 2
          readinessProbe:
            httpGet:
              path: /healthz
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 10
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /healthz
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 30
            failureThreshold: 3
The terminationGracePeriodSeconds: 60 gives the pod 60 seconds to drain active sessions after SIGTERM before SIGKILL is sent. Your server's SIGTERM handler (see MCP server Docker: signal handling) should use a drain timeout no longer than this, ideally a few seconds shorter so the process can exit on its own before SIGKILL arrives.
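To make the drain behaviour concrete, here is a minimal sketch of a SIGTERM handler that waits for open sessions to finish, bounded by a deadline. The SessionDrainer class, its method names, and the 55-second value are assumptions for illustration, not part of any MCP SDK; your server would call open() and close() wherever it creates and tears down sessions.

```javascript
// Hypothetical helper: tracks open sessions and resolves once they have all
// finished or the drain deadline passes, whichever comes first.
class SessionDrainer {
  constructor() {
    this.active = new Set();
    this.draining = false;
    this.waiters = [];
  }

  // Call when a session starts. Returns false once draining has begun,
  // so the caller can reject the new session.
  open(sessionId) {
    if (this.draining) return false;
    this.active.add(sessionId);
    return true;
  }

  // Call when a session ends.
  close(sessionId) {
    this.active.delete(sessionId);
    if (this.active.size === 0) {
      this.waiters.forEach((notify) => notify('drained'));
      this.waiters = [];
    }
  }

  // Resolves with 'drained' or 'timeout'.
  drain(timeoutMs) {
    this.draining = true;
    if (this.active.size === 0) return Promise.resolve('drained');
    return new Promise((resolve) => {
      const timer = setTimeout(() => resolve('timeout'), timeoutMs);
      this.waiters.push((outcome) => {
        clearTimeout(timer);
        resolve(outcome);
      });
    });
  }
}

const drainer = new SessionDrainer();

// Assumed value: a little under terminationGracePeriodSeconds: 60.
const DRAIN_TIMEOUT_MS = 55_000;

process.on('SIGTERM', async () => {
  const outcome = await drainer.drain(DRAIN_TIMEOUT_MS);
  console.log(`drain finished: ${outcome}`);
  process.exit(0);
});
```

Refusing new sessions as soon as draining starts keeps the set of sessions to wait for from growing while the pod is on its way out.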
Writing a correct readiness probe
Kubernetes's httpGet readiness probe sends an HTTP GET to the specified path. If your /healthz endpoint simply returns 200 without running the MCP handshake, the probe passes even when the MCP layer is broken — the pod receives traffic it can't serve.
The /healthz endpoint should run a real MCP probe sequence:
// Express route; also works with Fastify, Hono, etc.
app.get('/healthz', async (req, res) => {
  try {
    // Probe the server's own MCP endpoint
    const response = await fetch('http://localhost:3000/mcp', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        jsonrpc: '2.0', id: 1, method: 'initialize',
        params: {
          protocolVersion: '2024-11-05',
          capabilities: {},
          clientInfo: { name: 'healthz', version: '1' }
        }
      }),
      signal: AbortSignal.timeout(5000)
    });
    const data = await response.json();
    if (!data.result?.protocolVersion) throw new Error('bad response');
    res.status(200).json({ status: 'ok' });
  } catch (err) {
    res.status(503).json({ status: 'unhealthy', error: err.message });
  }
});
When the readiness probe returns 503, Kubernetes removes the pod from the Service endpoints — new sessions won't be routed to it. Existing sessions are unaffected (the pod isn't killed). The pod re-enters rotation once the probe returns 200 again. This is correct behavior for a temporarily overloaded pod: it stops accepting new sessions while it recovers, without losing active ones.
Pod Disruption Budget
Without a PodDisruptionBudget, a node upgrade or cluster autoscaling event can evict all your MCP server pods simultaneously, causing a complete outage. A PDB sets the minimum number of pods that must remain available during voluntary disruptions:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: mcp-server-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: mcp-server
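With replicas: 2 and minAvailable: 1, Kubernetes will evict at most one pod at a time during node drains and other voluntary evictions. One pod always remains available to serve sessions while the other is rescheduled. A PDB does not govern Deployment rolling updates; those are bounded by the Deployment's own strategy.rollingUpdate.maxUnavailable and maxSurge settings.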
Choose between minAvailable and maxUnavailable based on your availability requirement: minAvailable: 1 guarantees at least one pod runs regardless of replica count; maxUnavailable: 1 allows one pod to be down at a time regardless of replica count. For small deployments (2–3 replicas), minAvailable: 1 is more predictable.
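The arithmetic behind the two settings can be sketched as a small function. This is an illustration of how the allowed number of evictions falls out of the budget, assuming integer values only (percentages are not handled); allowedDisruptions is a hypothetical name, not a Kubernetes API.

```javascript
// Number of voluntary evictions a budget would currently permit.
// budget is { minAvailable: n } or { maxUnavailable: n } (integers only in this sketch).
function allowedDisruptions(desiredReplicas, healthyPods, budget) {
  const requiredHealthy =
    budget.minAvailable !== undefined
      ? budget.minAvailable
      : desiredReplicas - budget.maxUnavailable;
  return Math.max(0, healthyPods - requiredHealthy);
}

console.log(allowedDisruptions(2, 2, { minAvailable: 1 }));   // 1
console.log(allowedDisruptions(2, 1, { minAvailable: 1 }));   // 0
console.log(allowedDisruptions(3, 3, { minAvailable: 1 }));   // 2
console.log(allowedDisruptions(3, 3, { maxUnavailable: 1 })); // 1
```

The third line shows the difference that matters as you scale: with minAvailable: 1 the number of pods that may be evicted together grows with the replica count, while maxUnavailable: 1 keeps it fixed at one.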
Horizontal Pod Autoscaling and session affinity
HPA scales the replica count based on CPU, memory, or custom metrics. For MCP servers, the right scaling metric is usually active session count — a custom metric emitted from your server and consumed by HPA via KEDA or the custom metrics adapter.
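As a rough sketch of what "a custom metric emitted from your server" could look like, the code below renders an active-session gauge in the Prometheus text format and shows the replica arithmetic for a per-pod average target. The metric name mcp_active_sessions, the helper names, and the target of 50 sessions per pod are assumptions; real HPA behaviour also applies a tolerance band and stabilization windows that this ignores.

```javascript
// Renders a single gauge in Prometheus text exposition format.
// The metric name is an assumption chosen for this example.
function renderMetrics(sessionCount) {
  return [
    '# HELP mcp_active_sessions Number of currently open MCP sessions.',
    '# TYPE mcp_active_sessions gauge',
    `mcp_active_sessions ${sessionCount}`,
    ''
  ].join('\n');
}

// Replica count needed to keep the per-pod average at or below the target.
function desiredReplicas(totalSessions, targetPerPod, minReplicas) {
  return Math.max(minReplicas, Math.ceil(totalSessions / targetPerPod));
}

// In an Express server the gauge could be exposed like this (activeSessions
// being whatever structure your server uses to track sessions):
//   app.get('/metrics', (req, res) =>
//     res.type('text/plain').send(renderMetrics(activeSessions.size)));

console.log(renderMetrics(12));
console.log(desiredReplicas(130, 50, 2)); // 3
```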
Stateless MCP servers (no in-memory state per session beyond what the MCP protocol itself tracks) work with standard HPA and round-robin load balancing. Scale horizontally without affinity.
Stateful MCP servers (in-memory context, tool-call chain state, session-scoped caches) need session affinity: all requests from the same session must route to the same pod. Configure this at the Service level with the client's IP as the affinity key, or at the ingress level with a cookie-based sticky session:
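apiVersion: v1
kind: Service
metadata:
  name: mcp-server
spec:
  selector:
    app: mcp-server
  # Service-level affinity keyed on the client's IP
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 3600
  ports:
    - port: 80
      targetPort: 3000
# Alternative: nginx-ingress cookie-based affinity. These annotations are read
# from the Ingress object that fronts this Service, not from the Service itself:
#   nginx.ingress.kubernetes.io/affinity: "cookie"
#   nginx.ingress.kubernetes.io/session-cookie-name: "mcp-session"
#   nginx.ingress.kubernetes.io/session-cookie-expires: "3600"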
Session affinity limits how evenly load can be distributed — a heavy session sticks to one pod while others are underutilized. Design stateless MCP servers where possible. If session state is necessary, consider externalizing it to Redis so any pod can serve any session.
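One way to picture externalized state is a handler that keeps nothing in process memory between requests and goes through a small store interface instead. The sketch below uses an in-memory stand-in so it runs on its own; MemoryStore, handleToolCall, and the shape of the stored state are all hypothetical, and in a cluster the same get/set interface would be backed by Redis or a similar shared store.

```javascript
// Stand-in for a shared store. Values are serialized on write, as they would
// be when sent to an external store, so no object references leak between calls.
class MemoryStore {
  constructor() {
    this.data = new Map();
  }
  async get(id) {
    return this.data.has(id) ? JSON.parse(this.data.get(id)) : null;
  }
  async set(id, state) {
    this.data.set(id, JSON.stringify(state));
  }
}

// Loads session state, records the tool call, and writes the state back.
// Returns how many calls the session has made so far.
async function handleToolCall(store, sessionId, toolName) {
  const state = (await store.get(sessionId)) ?? { calls: [] };
  state.calls.push(toolName);
  await store.set(sessionId, state);
  return state.calls.length;
}
```

Because every request reloads its state by session ID, it no longer matters which pod receives it. Note that this read-modify-write is not atomic; concurrent requests for the same session would need a transaction or a lock in the real store.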
Secrets management
Reference credentials from Kubernetes Secret objects or an external secrets manager, not hardcoded in the Deployment manifest or Docker image:
apiVersion: v1
kind: Secret
metadata:
  name: mcp-server-secrets
type: Opaque
stringData:
  MY_API_KEY: "your-api-key-here"
  DATABASE_URL: "postgres://user:pass@host/db"
The Deployment references this with envFrom.secretRef. For more sensitive secrets (credentials that rotate, secrets shared across clusters), use an external secrets manager: AWS Secrets Manager + External Secrets Operator, or HashiCorp Vault + the Vault Agent Injector. External Secrets Operator syncs the source into a Kubernetes Secret and refreshes it when the source changes; the Vault Agent Injector instead renders secrets into files inside the pod through a sidecar. Values loaded through envFrom are read once at container start, so a rotated secret reaches a running pod only after a restart unless the server reads it from a mounted file.
Avoid keeping secret values inlined in a manifest that you kubectl apply -f: the file tends to end up in version control. Create the Secret with kubectl create secret generic --from-env-file (passing values with --from-literal puts them in shell history) or with a secrets management tool.
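On the server side, the variables injected by envFrom.secretRef can be validated once at startup so a missing Secret key fails the pod immediately instead of surfacing later as a broken tool call. A minimal sketch, assuming the two variable names from the Secret above; loadSecrets is a hypothetical helper.

```javascript
// Reads the named variables from env and fails fast if any are missing or empty.
function loadSecrets(env, names) {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    // Report names only; never log secret values.
    throw new Error(`missing required secrets: ${missing.join(', ')}`);
  }
  return Object.fromEntries(names.map((name) => [name, env[name]]));
}

// At startup:
//   const secrets = loadSecrets(process.env, ['MY_API_KEY', 'DATABASE_URL']);
```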
External monitoring beyond the cluster
Kubernetes health checks tell you whether pods are healthy from inside the cluster. They don't tell you whether your ingress, DNS, or TLS certificate is functioning for clients outside the cluster. An ingress controller misconfiguration, a Let's Encrypt certificate renewal failure, or a DNS change that hasn't propagated can each leave external clients with a broken MCP server while internal health checks show green.
AliveMCP probes your public MCP endpoint from outside the cluster, running the full initialize → tools/list sequence and verifying TLS. This is the view your actual users have. See MCP server observability for how to combine internal K8s metrics, distributed tracing, and external probing into a complete picture.
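For a sense of what an initialize → tools/list probe involves, here is a minimal sketch of the sequence run against a public URL. It is not AliveMCP's implementation. It assumes the server answers POSTs with plain JSON bodies (as the /healthz example does) rather than an SSE stream, and that it may hand back a session ID in an Mcp-Session-Id header; probeMcp and its fetchImpl parameter are hypothetical names.

```javascript
// Runs initialize, the initialized notification, then tools/list.
// fetchImpl is injectable so the sequence can be exercised without a network.
async function probeMcp(url, fetchImpl = fetch) {
  const post = async (message, sessionId) => {
    const headers = {
      'Content-Type': 'application/json',
      Accept: 'application/json, text/event-stream'
    };
    if (sessionId) headers['Mcp-Session-Id'] = sessionId;
    const response = await fetchImpl(url, {
      method: 'POST',
      headers,
      body: JSON.stringify(message),
      signal: AbortSignal.timeout(5000)
    });
    if (!response.ok) throw new Error(`HTTP ${response.status} for ${message.method}`);
    return response;
  };

  const initResponse = await post({
    jsonrpc: '2.0', id: 1, method: 'initialize',
    params: {
      protocolVersion: '2024-11-05',
      capabilities: {},
      clientInfo: { name: 'external-probe', version: '1' }
    }
  });
  const init = await initResponse.json();
  if (!init.result?.protocolVersion) throw new Error('initialize: bad response');
  const sessionId = initResponse.headers.get('mcp-session-id');

  // Notifications carry no id and expect no JSON-RPC result.
  await post({ jsonrpc: '2.0', method: 'notifications/initialized' }, sessionId);

  const listResponse = await post({ jsonrpc: '2.0', id: 2, method: 'tools/list' }, sessionId);
  const list = await listResponse.json();
  if (!Array.isArray(list.result?.tools)) throw new Error('tools/list: bad response');

  return { protocolVersion: init.result.protocolVersion, toolCount: list.result.tools.length };
}
```

Because the request goes to the public hostname over HTTPS, a DNS, ingress, or certificate failure makes the fetch call reject, which is exactly the class of problem in-cluster probes cannot see.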
Related questions
How many replicas should I run?
Minimum 2 for any production MCP server — 1 replica means a single pod crash or rolling update causes a complete outage. With a PDB of minAvailable: 1 and 2 replicas, you have no headroom during disruptions. 3 replicas with minAvailable: 2 is a better starting point for a production workload: one pod can be disrupted while two continue serving sessions.
Can I run an MCP server as a StatefulSet?
Usually not necessary. StatefulSets are for workloads that need stable network identifiers (pod-0.service, pod-1.service) or persistent storage per pod. MCP servers don't need either — sessions are established by the client on each connection, not by a stable pod identity. Use a Deployment with session affinity if needed, not a StatefulSet.
How do I handle schema drift between pod versions during rolling updates?
During a rolling update, old pods (version N) and new pods (version N+1) coexist. If N+1 removes a tool that N had, a client that connected to an N pod and then has a subsequent request routed to an N+1 pod may fail. The safest approach: ensure backward compatibility across adjacent versions (additive-only tool changes), or use a blue-green deploy to avoid mixed versions. See schema drift in MCP tool definitions.
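The additive-only rule can be checked mechanically before a rollout by comparing the tools/list output of version N with that of N+1. The sketch below flags removed tools and newly required arguments, which are the two changes most likely to break a client that connected to an older pod. findBreakingChanges is a hypothetical helper, and it only inspects each tool's name and the required list of its inputSchema.

```javascript
// Compares two tools/list results and reports changes that could break
// a client that learned the tool set from the older version.
function findBreakingChanges(oldTools, newTools) {
  const next = new Map(newTools.map((tool) => [tool.name, tool]));
  const breaking = [];
  for (const tool of oldTools) {
    const updated = next.get(tool.name);
    if (!updated) {
      breaking.push(`removed tool: ${tool.name}`);
      continue;
    }
    const oldRequired = new Set(tool.inputSchema?.required ?? []);
    for (const field of updated.inputSchema?.required ?? []) {
      if (!oldRequired.has(field)) {
        breaking.push(`new required argument: ${tool.name}.${field}`);
      }
    }
  }
  return breaking;
}
```

An empty result means the new version only added tools or optional arguments, so old and new pods can safely coexist during the rollout.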
What namespace should I put the MCP server in?
Use a dedicated namespace per environment (e.g., mcp-prod, mcp-staging). This scopes RBAC, network policies, and resource quotas. Within the namespace, the MCP server, its Service, PDB, and HPA all live together. Keep secrets in the same namespace as the workload that consumes them — cross-namespace secret references require additional RBAC configuration and are error-prone.
Further reading
- MCP server deployment — transport selection and rolling-restart safety
- MCP server Docker — Dockerfile and signal handling
- MCP server health checks — the full initialize probe sequence
- MCP server multi-region deployment
- MCP server observability — metrics, tracing, and external probing
- Schema drift in MCP tool definitions — detection and rollback
- AliveMCP — external monitoring for your K8s-hosted MCP server