
MCP server Fly.io deployment

Fly.io is a popular deployment platform for MCP servers: it terminates TLS automatically, runs Docker containers in 30+ regions, and provides persistent volumes for SQLite storage. Two configuration points are specific to MCP servers on Fly.io. The first is the proxy idle timeout: by default Fly's proxy closes connections that stay quiet for more than 60 seconds, which drops SSE streams and disconnects active MCP sessions. The second is session affinity: when you run multiple machines, each client's requests must consistently reach the machine that holds its session. This guide covers both, along with the full fly.toml, secrets management, volume mounts, and graceful shutdown.

TL;DR

Set http_options.idle_timeout = 3600 in fly.toml to keep SSE connections alive. Enable http_options.h2_backend = true only if your server accepts cleartext HTTP/2 (h2c) on its internal port. Mount a persistent volume at /data for SQLite. Use fly secrets set for all credentials. Implement /health returning 200 when ready. For session affinity, run a single machine or externalise session state: the fly-prefer-region header pins a client to a region, not to a machine, and the [http_service.concurrency] limits only influence how load is spread.

fly.toml configuration

# fly.toml — MCP server on Fly.io
app = 'your-mcp-server'
primary_region = 'iad'   # US East; change to your primary user region
kill_timeout = 30        # seconds; Fly's default is 5, which is shorter than DRAIN_TIMEOUT_MS below

[build]
  # Fly builds from your Dockerfile automatically
  # Make sure your Dockerfile has a multi-stage build (see mcp-server-docker guide)

[env]
  NODE_ENV = 'production'
  PORT = '3000'
  LOG_LEVEL = 'info'
  # MCP session drain timeout in ms; must be less than kill_timeout above
  DRAIN_TIMEOUT_MS = '20000'

[[mounts]]
  source = 'mcp_data'    # Fly persistent volume — run: fly volumes create mcp_data
  destination = '/data'
  # Your app reads DATABASE_URL = sqlite:///data/mcp.db

[http_service]
  internal_port = 3000
  force_https = true
  auto_stop_machines = 'stop'   # or 'off' if sessions should never be interrupted
  auto_start_machines = true
  min_machines_running = 1       # keep at least one machine warm at all times

  # Critical for SSE: increase the idle connection timeout
  # Default is 60s — SSE connections idle between tool calls will be terminated
  [http_service.http_options]
    idle_timeout = 3600    # 1 hour — match your longest expected session
    h2_backend = true      # only if your server speaks cleartext HTTP/2 (h2c); remove for HTTP/1.1-only servers

  # Concurrency limits for load balancing — helps route SSE sessions stably
  [http_service.concurrency]
    type = 'connections'
    hard_limit = 200
    soft_limit = 150    # new sessions prefer machines below soft_limit

  [[http_service.checks]]
    grace_period = '15s'
    interval = '30s'
    method = 'GET'
    path = '/health'
    port = 3000
    timeout = '5s'
    tls_skip_verify = false

[[vm]]
  size = 'shared-cpu-1x'   # this preset defaults to 256 MB RAM
  memory = '512mb'          # raised to 512 MB here; increase further if your tool handlers are memory-intensive

The idle_timeout = 3600 is the most important setting. Without it, Fly's proxy terminates any HTTP connection that has been idle for more than 60 seconds. An MCP session that is waiting for user input (the agent is thinking, not calling tools) will be disconnected at the 60-second mark. The client sees a broken SSE stream and must reconnect. With the timeout extended to 3600 seconds (1 hour), sessions survive periods of inactivity up to one hour. Adjust down to a value appropriate for your expected session duration.
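A longer idle timeout can be complemented by periodic SSE comment frames, so the proxy never sees a fully idle stream. The following is a minimal sketch, not part of any MCP SDK: the names heartbeatFrame, heartbeatIntervalMs, and startHeartbeat are hypothetical, and the 30-second cap is an assumed value.

```typescript
// Sketch: periodic SSE comment frames that keep a quiet stream from looking idle.
// All names here are illustrative assumptions, not an MCP SDK API.

/** Minimal shape of a writable SSE response (Node's http.ServerResponse fits). */
interface SseSink {
  write(chunk: string): unknown;
}

/** An SSE comment line: clients ignore it, but it still counts as traffic. */
function heartbeatFrame(): string {
  return ": keep-alive\n\n";
}

/** Pick an interval safely below the proxy idle timeout (assumed cap: 30 s). */
function heartbeatIntervalMs(idleTimeoutSeconds: number): number {
  // Half the timeout leaves room for a delayed timer tick.
  return Math.min((idleTimeoutSeconds * 1000) / 2, 30_000);
}

/** Start writing heartbeats; the returned function stops them. */
function startHeartbeat(sink: SseSink, intervalMs: number): () => void {
  const timer = setInterval(() => sink.write(heartbeatFrame()), intervalMs);
  return () => clearInterval(timer);
}
```

Call the returned stop function when the SSE response closes, so the timer does not outlive the connection.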

Create the app and persistent volume

# Create the Fly app (one-time setup)
fly apps create your-mcp-server

# Create a persistent volume for SQLite data
# Volumes are region-specific: create it in your primary region
# --size is in GB (10 GB here); adjust to your data needs
fly volumes create mcp_data \
  --region iad \
  --size 10 \
  --app your-mcp-server

# Verify volume was created
fly volumes list --app your-mcp-server

Fly volumes are attached to a specific machine and are not shared or replicated between machines. If you scale to multiple machines, each machine needs its own volume and ends up with independent SQLite data. SQLite therefore works for single-machine deployments (the common case for indie MCP servers) but not for multi-machine deployments where data must be consistent. For multi-machine MCP servers with shared state, use an external PostgreSQL database (fly postgres create) instead of SQLite. See MCP server multi-region deployment for multi-instance state management.
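Because data written outside the mount is lost when the machine is replaced, it can help to check at startup that the SQLite file really lives on the volume. This is a minimal sketch under assumptions: the helper names are hypothetical, and the URL convention (sqlite:// followed by an absolute path) is inferred from the DATABASE_URL comment in fly.toml.

```typescript
// Sketch: resolve the SQLite path from DATABASE_URL and confirm it is on the volume.
// Helper names and the URL convention are assumptions for illustration.

const VOLUME_MOUNT = "/data"; // must match [[mounts]].destination in fly.toml

/** Turn "sqlite:///data/mcp.db" into "/data/mcp.db". */
function sqlitePathFromUrl(databaseUrl: string): string {
  const prefix = "sqlite://";
  if (!databaseUrl.startsWith(prefix)) {
    throw new Error(`Expected a sqlite:// URL, got: ${databaseUrl}`);
  }
  const path = databaseUrl.slice(prefix.length);
  if (!path.startsWith("/")) {
    throw new Error(`SQLite path must be absolute, got: ${path}`);
  }
  return path;
}

/** True when the file sits under the mounted volume directory. */
function isOnVolume(filePath: string, mount: string = VOLUME_MOUNT): boolean {
  // Require a separator after the mount so "/database" does not match "/data".
  const root = mount.endsWith("/") ? mount : mount + "/";
  return filePath.startsWith(root);
}
```

A server could refuse to start when isOnVolume returns false, which surfaces a misconfigured path before any data is written to ephemeral storage.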

Secrets management

# Set secrets via the Fly CLI — these are injected as environment variables
# at runtime and are never stored in fly.toml or your source repository

fly secrets set \
  JWT_SECRET="your-256-bit-jwt-secret" \
  WEBHOOK_SIGNING_KEY="your-webhook-key" \
  ALIVEMCP_PROBE_KEY="mcp_live_..." \
  --app your-mcp-server

# List secret names (values are never displayed)
fly secrets list --app your-mcp-server

# Rotate a secret (triggers a rolling restart automatically)
fly secrets set JWT_SECRET="new-secret-value" --app your-mcp-server

# Remove a secret
fly secrets unset WEBHOOK_SIGNING_KEY --app your-mcp-server

Fly secrets are encrypted at rest and injected as environment variables when the machine starts. They are never written to disk on your local machine (unlike .env files) and are not visible in fly.toml (which is committed to your repository). Running fly secrets set triggers a rolling restart by default — the new version of each secret is available to the restarted machine immediately. See MCP server secrets management for the full pattern including JWKS key rotation on Fly.
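Since secrets arrive as plain environment variables, the server can validate them once at startup and fail fast if any are missing. A minimal sketch, assuming the secret names from the commands above; loadSecrets and McpSecrets are hypothetical names.

```typescript
// Sketch: read required secrets from the environment and fail fast when absent.
// loadSecrets and McpSecrets are illustrative names, not an existing API.

type EnvVars = Record<string, string | undefined>;

interface McpSecrets {
  jwtSecret: string;
  webhookSigningKey: string;
}

function loadSecrets(env: EnvVars): McpSecrets {
  const required = ["JWT_SECRET", "WEBHOOK_SIGNING_KEY"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    // Report names only; never log secret values.
    throw new Error(`Missing required secrets: ${missing.join(", ")}`);
  }
  return {
    jwtSecret: env["JWT_SECRET"] as string,
    webhookSigningKey: env["WEBHOOK_SIGNING_KEY"] as string,
  };
}
```

In a Node server this would typically be called once as loadSecrets(process.env) before the HTTP listener starts, so a missing secret fails the deploy's health check instead of failing the first request.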

Session affinity for multi-machine deployments

Fly.io's load balancer distributes requests across machines based on load (connection count here, as set by the concurrency block in fly.toml). For SSE-based MCP sessions, all requests from a single session must reach the same machine: the client opens the SSE stream once, and subsequent tool calls arrive as separate HTTP POST requests whose responses are delivered over that stream. A single-machine deployment (1 VM) has no routing problem. Multi-machine deployments need either sticky routing or session state that every machine can reach.

# Option 1: fly-prefer-region header
# Your MCP client can send a fly-prefer-region header to pin to a specific region.
# This does not pin to a specific machine within a region — only useful for
# geo-affinity, not for same-machine affinity.

# Option 2: Externalise session state (recommended for multi-machine)
# Store session state in an external store (Redis, Fly-hosted Postgres) rather than
# in process memory. Each machine can then handle any request for any session.
# This is the scalable approach — see mcp-server-multi-region for details.

# Option 3: Limit to one machine (simplest — works for most indie MCP servers)
fly scale count 1 --app your-mcp-server
# One machine = no routing ambiguity. Use volumes for SQLite persistence.

For most indie MCP servers, a single Fly machine is enough: a shared-cpu-1x machine with 512 MB RAM is typically sufficient for 50–200 concurrent SSE sessions. Scale to multiple machines only when you need geographic distribution (put a machine near your users) or when a single machine is memory-saturated. See MCP server load balancing for the session-state externalisation pattern.
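Option 2 above can be sketched as a small store interface that request handlers depend on, instead of a module-level Map. Everything here is an assumption for illustration: the interface and class names are hypothetical, and the in-memory class is only a stand-in that does not itself share state between machines. A Redis- or Postgres-backed implementation would expose the same operations asynchronously.

```typescript
// Sketch: session state behind an interface so the backing store can be swapped.
// Names are hypothetical; the in-memory version is a single-machine stand-in.

interface McpSessionRecord {
  id: string;
  createdAt: number;  // epoch ms
  lastSeenAt: number; // epoch ms
}

interface McpSessionStore {
  get(id: string): McpSessionRecord | undefined;
  touch(id: string, now: number): McpSessionRecord;
  remove(id: string): void;
}

class InMemorySessionStore implements McpSessionStore {
  private readonly sessions = new Map<string, McpSessionRecord>();

  get(id: string): McpSessionRecord | undefined {
    return this.sessions.get(id);
  }

  /** Create the session on first sight, otherwise refresh lastSeenAt. */
  touch(id: string, now: number): McpSessionRecord {
    const existing = this.sessions.get(id);
    const record: McpSessionRecord = existing
      ? { ...existing, lastSeenAt: now }
      : { id, createdAt: now, lastSeenAt: now };
    this.sessions.set(id, record);
    return record;
  }

  remove(id: string): void {
    this.sessions.delete(id);
  }
}
```

This covers session metadata only. Delivering a response over an SSE stream held by a different machine would additionally need some form of cross-machine messaging, which is outside this sketch.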

Deploy and verify

# Deploy (builds from Dockerfile and pushes to Fly's registry)
fly deploy --app your-mcp-server

# Watch the deployment
fly status --app your-mcp-server

# Tail logs in real time
fly logs --app your-mcp-server

# SSH into the running machine for debugging
fly ssh console --app your-mcp-server

# Check the health endpoint from outside
curl https://your-mcp-server.fly.dev/health

# Run the MCP smoke test against the production endpoint
node scripts/mcp-smoke-test.js https://your-mcp-server.fly.dev

Fly builds and deploys in one command. The build runs on Fly's infrastructure (not your local machine), so you do not need Docker installed locally. The first deploy may take 2–5 minutes as Fly pulls the base image and builds. Subsequent deploys are faster if the Docker layer cache is warm. See MCP server CI/CD for a GitHub Actions workflow that runs fly deploy automatically on push to main.
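The /health check and the DRAIN_TIMEOUT_MS setting from fly.toml both depend on server-side logic that the configuration alone does not provide. The sketch below shows one possible shape for it. The state fields and function names are hypothetical, and returning 503 while draining is an assumed policy for steering new sessions away from a machine that is shutting down.

```typescript
// Sketch: readiness and drain decisions as pure functions a server can wire
// into its /health route and its shutdown signal handler. Names are assumptions.

interface ServerState {
  ready: boolean;         // true once initialisation has finished
  draining: boolean;      // true after a shutdown signal was received
  activeSessions: number; // currently open SSE sessions
}

interface HealthResult {
  status: number;
  body: string;
}

/** Response for GET /health. */
function healthCheck(state: ServerState): HealthResult {
  if (state.draining) return { status: 503, body: "draining" };
  if (!state.ready) return { status: 503, body: "starting" };
  return { status: 200, body: "ok" };
}

/** Parse DRAIN_TIMEOUT_MS, falling back when it is unset or invalid. */
function drainTimeoutMs(raw: string | undefined, fallbackMs: number = 20000): number {
  const parsed = Number(raw);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallbackMs;
}

/** During a drain: exit once sessions are gone or the timeout has elapsed. */
function shouldExit(
  state: ServerState,
  drainStartedAt: number,
  now: number,
  timeoutMs: number
): boolean {
  return state.activeSessions === 0 || now - drainStartedAt >= timeoutMs;
}
```

A shutdown handler would set draining to true, stop accepting new sessions, and poll shouldExit until it returns true. The drain timeout has to stay below the platform's kill timeout, otherwise the process is killed mid-drain.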

Auto-stop and cold start

Fly's auto_stop_machines = 'stop' shuts down machines that have no active connections, reducing costs when the server is idle. When a new connection arrives, Fly starts a stopped machine in 1–3 seconds. This is acceptable for many MCP use cases but introduces cold start latency.

Cold starts affect MCP servers differently than REST APIs. A REST client that receives a 503 during cold start can immediately retry. An MCP client waiting for the server to start must wait 1–3 seconds before the SSE connection is established, which may trigger a timeout in the client. Consider setting auto_stop_machines = 'off' for production MCP servers where cold start latency is unacceptable, or setting min_machines_running = 1 to always keep at least one machine warm.

# Keep one machine always running (no cold starts)
# In fly.toml:
[http_service]
  auto_stop_machines = 'off'
  min_machines_running = 1

# The cost: a shared-cpu-1x machine with 256 MB RAM costs roughly $1.94/month
# at the time of writing; more memory costs more. Check Fly's current pricing,
# since free allowances vary by account and plan.
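If auto-stop stays enabled, a client can absorb the 1–3 second start by retrying the initial connection with a short backoff instead of failing on the first timeout. A minimal sketch of the delay schedule. The function name and the default values are assumptions, not settings of any MCP client.

```typescript
// Sketch: capped exponential backoff delays for reconnect attempts.
// retryDelaysMs and its defaults are illustrative assumptions.

function retryDelaysMs(
  attempts: number,
  baseMs: number = 500,
  maxMs: number = 4000
): number[] {
  const delays: number[] = [];
  for (let i = 0; i < attempts; i++) {
    // Double the wait on each attempt, but never exceed maxMs.
    delays.push(Math.min(baseMs * Math.pow(2, i), maxMs));
  }
  return delays;
}
```

With the defaults, four attempts span 7.5 seconds of waiting in total, which comfortably covers a cold start in the range quoted above.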

AliveMCP detects cold starts as elevated connection time (above the server's normal baseline). A probe that normally completes the MCP initialize handshake in 50ms showing 2000ms is a cold start signal. AliveMCP distinguishes cold starts (one slow probe followed by normal probes) from genuine slowness (all probes above baseline). See MCP server cold start for optimisation techniques if your server has a slow initialisation path.
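The distinction described above (one slow probe followed by normal probes, versus all probes above baseline) can be illustrated with a small classifier. This is only a sketch of the idea and not AliveMCP's actual algorithm. The function name, the verdict labels, and the factor of 5 are assumptions.

```typescript
// Sketch: classify a series of probe latencies relative to a baseline.
// Not AliveMCP's implementation; names and threshold factor are assumptions.

type ProbeVerdict = "normal" | "cold-start" | "slow";

function classifyProbes(
  latenciesMs: number[],
  baselineMs: number,
  factor: number = 5
): ProbeVerdict {
  const threshold = baselineMs * factor;
  const slowCount = latenciesMs.filter((ms) => ms > threshold).length;
  if (slowCount === 0) return "normal";

  // Cold start: only the first probe is slow and later probes recover.
  const first = latenciesMs[0] ?? 0;
  if (slowCount === 1 && first > threshold && latenciesMs.length > 1) {
    return "cold-start";
  }
  // Anything else, including a single slow probe with no follow-up, is
  // treated as slowness because recovery has not been observed.
  return "slow";
}
```

For example, with a 50 ms baseline, the series 2000, 50, 55 is classified as a cold start, while 2000, 1900, 2100 is classified as slow.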