
MCP server Fly.io deployment

Fly.io is a popular deployment platform for MCP servers: it terminates TLS automatically, runs Docker containers in 30+ regions, and provides persistent volumes for SQLite storage. Two configuration points are specific to MCP servers on Fly.io: (1) the proxy idle timeout, which defaults to 60 seconds and closes any SSE connection that stays quiet for longer, disconnecting active MCP sessions; and (2) session affinity, because every request in a session must reach the machine that holds its state when you run multiple instances. This guide covers both, along with the full fly.toml, secrets management, volume mounts, and graceful shutdown.

TL;DR

Set http_options.idle_timeout = 3600 in fly.toml to keep SSE connections alive. Enable http_options.h2_backend = true only if your server speaks HTTP/2 cleartext (h2c). Mount a persistent volume at /data for SQLite. Use fly secrets set for all credentials. Implement /health returning 200 when ready. For session affinity, run a single machine or externalise session state: the fly-prefer-region header pins a client to a region, not to a machine, and [http_service.concurrency] only balances connection counts.

fly.toml configuration

# fly.toml — MCP server on Fly.io
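app = 'your-mcp-server'
primary_region = 'iad'   # US East; change to your primary user region
# Seconds Fly waits after the stop signal before force-killing the machine.
# The default is 5, which is shorter than DRAIN_TIMEOUT_MS below.
kill_timeout = 30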

[build]
  # Fly builds from your Dockerfile automatically
  # Make sure your Dockerfile has a multi-stage build (see mcp-server-docker guide)

[env]
  NODE_ENV = 'production'
  PORT = '3000'
  LOG_LEVEL = 'info'
  # MCP session drain timeout — must be less than Fly's kill timeout
  DRAIN_TIMEOUT_MS = '20000'

[[mounts]]
  source = 'mcp_data'    # Fly persistent volume — run: fly volumes create mcp_data
  destination = '/data'
  # Your app reads DATABASE_URL = sqlite:///data/mcp.db

[http_service]
  internal_port = 3000
  force_https = true
  auto_stop_machines = 'stop'   # or 'off' if sessions should never be interrupted
  auto_start_machines = true
  min_machines_running = 1       # keep at least one machine warm at all times

  # Critical for SSE: increase the idle connection timeout
  # Default is 60s; SSE connections that sit idle between tool calls are closed
  [http_service.http_options]
    idle_timeout = 3600    # 1 hour; match your longest expected session
    h2_backend = true      # only valid if the app speaks HTTP/2 cleartext (h2c); remove for HTTP/1.1 servers

  # Concurrency limits for load balancing — helps route SSE sessions stably
  [http_service.concurrency]
    type = 'connections'
    hard_limit = 200
    soft_limit = 150    # new sessions prefer machines below soft_limit

  [[http_service.checks]]
    grace_period = '15s'
    interval = '30s'
    method = 'GET'
    path = '/health'
    port = 3000
    timeout = '5s'
    tls_skip_verify = false

[[vm]]
  size = 'shared-cpu-1x'   # smallest preset; comes with 256 MB RAM by default
  memory = '512mb'          # raises RAM to 512 MB; increase if your tool handlers are memory-intensive

The idle_timeout = 3600 is the most important setting. Without it, Fly's proxy terminates any HTTP connection that has been idle for more than 60 seconds. An MCP session that is waiting for user input (the agent is thinking, not calling tools) will be disconnected at the 60-second mark. The client sees a broken SSE stream and must reconnect. With the timeout extended to 3600 seconds (1 hour), sessions survive periods of inactivity up to one hour. Adjust down to a value appropriate for your expected session duration.
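A complementary technique is to keep the stream from going idle at all by sending periodic SSE comment frames. The sketch below is a minimal illustration, not part of any MCP SDK: `sseComment` and `startHeartbeat` are hypothetical names, `res` is assumed to be a Node response object for an open SSE stream, and the 25-second interval is an arbitrary value chosen to sit below the 60-second default. It also assumes the proxy treats any bytes on the connection as activity.

```javascript
// Format an SSE comment frame. Lines that start with ':' are ignored by SSE
// clients, so this produces traffic without delivering an event.
function sseComment(text) {
  return `: ${text}\n\n`;
}

// Write a comment frame to `res` every `intervalMs` milliseconds.
// Returns a function that stops the heartbeat.
function startHeartbeat(res, intervalMs = 25000) {
  const timer = setInterval(() => {
    res.write(sseComment('keepalive'));
  }, intervalMs);
  timer.unref(); // the heartbeat alone should not keep the process alive
  const stop = () => clearInterval(timer);
  res.on('close', stop); // stop automatically when the client disconnects
  return stop;
}
```

With a heartbeat in place the idle timeout matters less, but keeping both gives a fallback if the heartbeat stalls.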

Create the app and persistent volume

# Create the Fly app (one-time setup)
fly apps create your-mcp-server

# Create a persistent volume for SQLite data
# Volumes are region-specific — create in your primary region
# --size is in GB; adjust to your data needs
fly volumes create mcp_data \
  --region iad \
  --size 10 \
  --app your-mcp-server

# Verify volume was created
fly volumes list --app your-mcp-server

Fly volumes are attached to a specific machine. If you scale to multiple machines, each machine gets its own volume with independent SQLite data. This means read/write SQLite works for single-machine deployments (the common case for indie MCP servers) but not for multi-machine deployments where data must be consistent. For multi-machine MCP servers with shared state, use an external PostgreSQL database (fly postgres create) instead of SQLite. See MCP server multi-region deployment for multi-instance state management.
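Since the database lives on the mounted volume, it can help to resolve the file path from DATABASE_URL at startup and fail fast if the directory is not writable, for example because the volume did not mount. This is a minimal sketch: the function names are hypothetical, and it assumes the `sqlite://` URL form shown in the fly.toml comment.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Convert "sqlite:///data/mcp.db" into the file path "/data/mcp.db".
function sqlitePathFromUrl(databaseUrl) {
  const prefix = 'sqlite://';
  if (!databaseUrl.startsWith(prefix)) {
    throw new Error(`Expected a sqlite:// URL, got: ${databaseUrl}`);
  }
  const filePath = databaseUrl.slice(prefix.length);
  if (!path.isAbsolute(filePath)) {
    throw new Error(`SQLite path must be absolute: ${filePath}`);
  }
  return filePath;
}

// Throws if the directory that should hold the database is missing or
// not writable by the current process.
function assertWritableDir(filePath) {
  fs.accessSync(path.dirname(filePath), fs.constants.W_OK);
}
```

Calling both before opening the database turns a missing mount into an immediate startup error rather than a failure on the first write.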

Secrets management

# Set secrets via the Fly CLI — these are injected as environment variables
# at runtime and are never stored in fly.toml or your source repository

fly secrets set \
  JWT_SECRET="your-256-bit-jwt-secret" \
  WEBHOOK_SIGNING_KEY="your-webhook-key" \
  ALIVEMCP_PROBE_KEY="mcp_live_..." \
  --app your-mcp-server

# List secret names (values are never displayed)
fly secrets list --app your-mcp-server

# Rotate a secret (triggers a rolling restart automatically)
fly secrets set JWT_SECRET="new-secret-value" --app your-mcp-server

# Remove a secret
fly secrets unset WEBHOOK_SIGNING_KEY --app your-mcp-server

Fly secrets are encrypted at rest and injected as environment variables when the machine starts. They are never written to disk on your local machine (unlike .env files) and are not visible in fly.toml (which is committed to your repository). Running fly secrets set triggers a rolling restart by default — the new version of each secret is available to the restarted machine immediately. See MCP server secrets management for the full pattern including JWKS key rotation on Fly.
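Because secrets arrive as plain environment variables, a missing one usually surfaces only when the code path that needs it runs. A small startup check, sketched here with a hypothetical `loadSecrets` helper, reports every missing name at once instead.

```javascript
// Read the named secrets from an environment object.
// Throws a single error listing every name that is unset or empty.
function loadSecrets(env, names) {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required secrets: ${missing.join(', ')}`);
  }
  return Object.fromEntries(names.map((name) => [name, env[name]]));
}
```

At startup this would be called as loadSecrets(process.env, ['JWT_SECRET', 'WEBHOOK_SIGNING_KEY']), so a machine restarted after fly secrets unset fails its health check rather than serving with a missing key.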

Session affinity for multi-machine deployments

Fly.io's load balancer distributes requests across machines based on connection count (controlled by the concurrency block in fly.toml). For SSE-based MCP sessions, all requests from a single session must reach the same machine: the SSE stream is opened once and stays pinned to one machine, but each tool call arrives as a separate HTTP POST that the load balancer may send to a different machine, which has no in-memory record of the session. A single-machine deployment (1 VM) has no routing problem. Multi-machine deployments need either sticky routing or session state that every machine can read.

# Option 1: fly-prefer-region header
# Your MCP client can send a fly-prefer-region header to pin to a specific region.
# This does not pin to a specific machine within a region — only useful for
# geo-affinity, not for same-machine affinity.

# Option 2: Externalise session state (recommended for multi-machine)
# Store session state in an external store (Redis, Fly-hosted Postgres) rather than
# in process memory. Each machine can then handle any request for any session.
# This is the scalable approach — see mcp-server-multi-region for details.

# Option 3: Limit to one machine (simplest — works for most indie MCP servers)
fly scale count 1 --app your-mcp-server
# One machine = no routing ambiguity. Use volumes for SQLite persistence.

For most indie MCP servers, a single Fly machine is enough: a shared-cpu-1x machine with 512 MB RAM comfortably serves 50–200 concurrent SSE sessions. Scale to multiple machines only when you need geographic distribution (put a machine near your users) or when a single machine is memory-saturated. See MCP server load balancing for the session-state externalisation pattern.
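One further approach, which is not among the three options above, is to let the application redirect misrouted requests itself. The sketch below rests on assumptions that should be checked against Fly's documentation: that each machine sees its own ID in the FLY_MACHINE_ID environment variable, and that a response carrying a fly-replay header with the value instance=<id> makes Fly's proxy replay the request on that machine. The session ID format and all function names are hypothetical.

```javascript
const crypto = require('node:crypto');

// Hypothetical session ID format: "<machineId>.<random>", so that any
// machine can tell which machine created the session.
function newSessionId(machineId) {
  return `${machineId}.${crypto.randomUUID()}`;
}

// Extract the owning machine ID, or null if the ID is not in that format.
function ownerOf(sessionId) {
  const dot = sessionId.indexOf('.');
  return dot > 0 ? sessionId.slice(0, dot) : null;
}

// Returns the fly-replay header value when the session belongs to another
// machine, or null when this machine should handle the request itself.
function replayTarget(sessionId, localMachineId) {
  const owner = ownerOf(sessionId);
  if (!owner || owner === localMachineId) return null;
  return `instance=${owner}`;
}
```

A request handler would call replayTarget(sessionId, process.env.FLY_MACHINE_ID) and, when it returns a value, respond with that value in a fly-replay header instead of processing the request. Sessions are still lost if the owning machine stops, so this does not replace externalised state for deployments that need durability.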

Deploy and verify

# Deploy (builds from Dockerfile and pushes to Fly's registry)
fly deploy --app your-mcp-server

# Watch the deployment
fly status --app your-mcp-server

# Tail logs in real time
fly logs --app your-mcp-server

# SSH into the running machine for debugging
fly ssh console --app your-mcp-server

# Check the health endpoint from outside
curl https://your-mcp-server.fly.dev/health

# Run the MCP smoke test against the production endpoint
node scripts/mcp-smoke-test.js https://your-mcp-server.fly.dev

Fly builds and deploys in one command. The build runs on Fly's infrastructure (not your local machine), so you do not need Docker installed locally. The first deploy may take 2–5 minutes as Fly pulls the base image and builds. Subsequent deploys are faster if the Docker layer cache is warm. See MCP server CI/CD for a GitHub Actions workflow that runs fly deploy automatically on push to main.
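Every deploy replaces the running machine, so the /health check and the DRAIN_TIMEOUT_MS value from fly.toml only help if the server implements them. The following is a minimal sketch using Node's built-in HTTP/1.1 server, so it assumes h2_backend is not enabled. `createApp`, `healthStatus`, `installShutdown`, and the /sse route are hypothetical names; a real server would attach its MCP transport where the placeholder route is. It assumes Fly sends a stop signal (SIGINT unless configured otherwise) and force-kills the machine once the kill timeout elapses.

```javascript
const http = require('node:http');

// Pure helper: what /health should answer for a given state.
function healthStatus(state) {
  return state.draining ? { code: 503, body: 'draining' } : { code: 200, body: 'ok' };
}

// Hypothetical factory: returns the server, its state, and a drain() function.
function createApp({ drainTimeoutMs }) {
  const state = { draining: false };
  const openStreams = new Set(); // long-lived SSE responses still connected

  const server = http.createServer((req, res) => {
    if (req.url === '/health') {
      const { code, body } = healthStatus(state);
      res.writeHead(code, { 'content-type': 'text/plain' });
      res.end(body);
      return;
    }
    if (req.url === '/sse') {
      // Placeholder SSE route; it only tracks the open stream.
      res.writeHead(200, { 'content-type': 'text/event-stream', 'cache-control': 'no-cache' });
      openStreams.add(res);
      res.on('close', () => openStreams.delete(res));
      return;
    }
    res.writeHead(404);
    res.end();
  });

  // Stop accepting connections, wait for open ones to finish, and
  // force-close any remaining SSE streams after drainTimeoutMs.
  function drain() {
    state.draining = true;
    return new Promise((resolve) => {
      const force = setTimeout(() => {
        for (const res of openStreams) res.destroy();
        resolve('forced');
      }, drainTimeoutMs);
      server.close(() => {
        clearTimeout(force);
        resolve('clean');
      });
    });
  }

  return { server, state, drain };
}

// Drain and exit when a stop signal arrives (handles SIGINT and SIGTERM).
function installShutdown(app, exit = process.exit) {
  for (const signal of ['SIGINT', 'SIGTERM']) {
    process.once(signal, () => app.drain().then(() => exit(0)));
  }
}
```

To wire it up, create the app with createApp({ drainTimeoutMs: Number(process.env.DRAIN_TIMEOUT_MS) }), pass it to installShutdown, and call app.server.listen(Number(process.env.PORT)). The drain timeout has to stay below Fly's kill timeout, otherwise the machine is killed before the forced close runs.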

Auto-stop and cold start

Fly's auto_stop_machines = 'stop' shuts down machines that have no active connections, reducing costs when the server is idle. When a new connection arrives, Fly starts a stopped machine in 1–3 seconds. This is acceptable for many MCP use cases but introduces cold start latency.

Cold starts affect MCP servers differently than REST APIs. A REST client that receives a 503 during cold start can immediately retry. An MCP client waiting for the server to start must wait 1–3 seconds before the SSE connection is established, which may trigger a timeout in the client. Consider setting auto_stop_machines = 'off' for production MCP servers where cold start latency is unacceptable, or setting min_machines_running = 1 to always keep at least one machine warm.

# Keep one machine always running (no cold starts)
# In fly.toml:
[http_service]
  auto_stop_machines = 'off'
  min_machines_running = 1

# The cost: a shared-cpu-1x machine with 256 MB RAM costs roughly $1.94/month.
# The fly.toml above requests 512 MB, which costs more. Prices and any free
# allowance depend on your plan, so check Fly's current pricing.
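If you keep auto-stop enabled and control the client, the client can absorb a 1–3 second cold start by retrying the initial connection with exponential backoff. This is a minimal sketch: `connect` stands for whatever function opens the MCP session in your client, and the delay values are illustrative assumptions, not recommendations from Fly or the MCP specification.

```javascript
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Delay before the retry that follows attempt n (0-based):
// baseMs * 2^n, capped at maxMs.
function backoffDelay(attempt, baseMs = 500, maxMs = 8000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Call connect() up to `attempts` times, sleeping between failures.
// Resolves with the first successful result or rejects with the last error.
async function connectWithRetry(connect, { attempts = 4, sleep = defaultSleep } = {}) {
  let lastError;
  for (let attempt = 0; attempt < attempts; attempt += 1) {
    try {
      return await connect();
    } catch (err) {
      lastError = err;
      if (attempt < attempts - 1) await sleep(backoffDelay(attempt));
    }
  }
  throw lastError;
}
```

With the defaults, the waits between the four attempts are 500 ms, 1000 ms, and 2000 ms, which covers the cold start window described above.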

AliveMCP detects cold starts as elevated connection time (above the server's normal baseline). A probe that normally completes the MCP initialize handshake in 50ms showing 2000ms is a cold start signal. AliveMCP distinguishes cold starts (one slow probe followed by normal probes) from genuine slowness (all probes above baseline). See MCP server cold start for optimisation techniques if your server has a slow initialisation path.
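The distinction between a cold start and genuine slowness can be illustrated with a small classifier. This is not AliveMCP's actual algorithm, which the guide does not describe; the function name, the labels, and the threshold factor of 5 are all assumptions made for the example.

```javascript
// Classify a series of probe latencies against a known baseline.
// A probe counts as slow when it exceeds baselineMs * factor.
function classifyProbes(latenciesMs, baselineMs, factor = 5) {
  const slow = latenciesMs.map((ms) => ms > baselineMs * factor);
  if (!slow.some(Boolean)) return 'normal';
  if (slow.every(Boolean)) return 'slow';
  // One slow probe at the start, followed only by normal probes.
  if (slow[0] && !slow.slice(1).some(Boolean)) return 'cold-start';
  return 'intermittent';
}
```

For a 50 ms baseline, the series 2000, 50, 55 is classified as a cold start, while 2000, 1800, 2100 is classified as slow.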

Related questions

How do I set up a custom domain on Fly.io for my MCP server?

Run fly certs add your-domain.example.com --app your-mcp-server to request a Let's Encrypt certificate; Fly handles renewal. Then create a CNAME DNS record pointing your-domain.example.com to your-mcp-server.fly.dev (for an apex domain, which cannot carry a CNAME, add A and AAAA records for the app's IP addresses from fly ips list instead). TLS terminates at Fly's edge, so your app sees plain HTTP internally on port 3000. The force_https = true setting in fly.toml ensures HTTP requests are redirected to HTTPS.

Can I use SQLite on Fly.io with multiple machines?

No. SQLite is file-based and files on Fly volumes are not shared between machines. If you scale to two machines, each has its own independent SQLite file, and writes to one are invisible to the other. For multi-machine Fly deployments, use fly postgres create to provision a Postgres cluster (Fly Postgres created this way runs as an ordinary Fly app that you operate yourself; it is not a fully managed service), or use LiteFS, Fly's distributed SQLite layer, which replicates writes from a single primary to read replicas on other machines. For a single-machine deployment (the common case for indie MCP servers), SQLite on a Fly volume is the simplest and most cost-effective option.

How do I roll back a bad deploy on Fly.io?

Run fly releases --image --app your-mcp-server to list releases with their image references, then fly deploy --image registry.fly.io/your-mcp-server:<previous-version> to redeploy a specific earlier image. For automated rollback on health check failure, configure your CI/CD pipeline to check the health endpoint after deploy and redeploy the previous image with fly deploy --image if the smoke test fails. See MCP server CI/CD for the rollback-on-failure pipeline.

How does AliveMCP monitor a Fly.io MCP server?

AliveMCP probes your Fly.io URL (e.g. https://your-mcp-server.fly.dev) the same way it probes any MCP endpoint: an HTTP GET to /health every 60 seconds, and a full MCP initialize + tools/list handshake every 5 minutes. Fly's automatic TLS means AliveMCP also validates the certificate as part of its probe. Certificate expiry is detected before it affects users. See MCP server SSL certificate monitoring for the TLS check details.

Further reading