
MCP server JWKS key rotation

JWKS key rotation is how authorization servers replace their JWT signing keys without permanently breaking clients that hold tokens signed by the old key. The critical point for MCP servers is that a rotation which removes the old key immediately will invalidate every in-flight MCP session. Once the MCP server's cached JWKS expires, the next HTTP request to /mcp triggers a re-fetch, the old key is gone, and the session's existing token can no longer be verified. The solution is a grace period: publish the new key alongside the old key for at least as long as your longest token TTL or session lifetime, whichever is longer, and remove the old key only after every token signed with it has expired. This guide covers the rotation mechanics, the grace period strategy, and how to monitor rotation events with AliveMCP.

TL;DR

When rotating, add the new key to JWKS first and do not remove the old key. Start signing new tokens with the new key. From that switch onward, keep the old key in JWKS for at least max(token_ttl, max_session_duration): typically 24 hours for short-lived tokens, 7 days for long-lived sessions. Only remove the old key after that window. The kid field in the JWT header tells your MCP server which key to use for verification, and jose's createRemoteJWKSet handles kid-based key selection automatically.

Why rotation breaks MCP sessions

MCP sessions are long-lived. A user might authenticate, receive a JWT, and then use that session for an hour or more. If the authorization server rotates signing keys mid-session and removes the old key from the JWKS endpoint, the next time the MCP server refreshes its cached JWKS it receives a key set that no longer contains the key that signed the user's token. Verification fails because no published key matches the token's kid (jose reports this as JWKSNoMatchingKey, code ERR_JWKS_NO_MATCHING_KEY), and the request receives a 401.

This is worse than a normal token expiry because:

The token is still within its TTL — it has not expired
The client cannot refresh its way out — it needs to re-authenticate from scratch
The failure is silent until the next JWKS re-fetch, creating an unpredictable delay between the rotation event and the 401 spike
All active sessions are affected simultaneously, not gradually as tokens expire

The MCP session model makes this worse than an equivalent REST API failure: a REST client can immediately retry with a fresh token, whereas an MCP client must tear down the session, start a new initialize handshake, and rebuild all session state.

The grace period strategy

The correct rotation procedure publishes both old and new keys simultaneously for a transition window:

// Phase 1: JWKS contains both old and new keys (grace period)
{
  "keys": [
    { "kid": "key-2024-01", "alg": "RS256", "use": "sig", "kty": "RSA", ... }, // OLD
    { "kid": "key-2025-01", "alg": "RS256", "use": "sig", "kty": "RSA", ... }  // NEW
  ]
}

// Phase 2: After grace period — JWKS contains only the new key
{
  "keys": [
    { "kid": "key-2025-01", "alg": "RS256", "use": "sig", "kty": "RSA", ... }  // NEW only
  ]
}

During Phase 1, the authorization server begins signing all new tokens with key-2025-01. Existing tokens signed with key-2024-01 are still verifiable because both keys are in JWKS. The JWKS response includes the new key, so any MCP server instance that re-fetches JWKS during the grace period will cache both keys and can verify both old-key and new-key tokens.
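To make Phase 1 concrete, here is a minimal sketch assuming a Node.js (15.9 or later) authorization server that builds its JWKS document in code. The helpers addSigningKey, removeKey, and newRsaPublicKeyPem are hypothetical names for illustration; the only real API relied on is node:crypto's createPublicKey(...).export({ format: 'jwk' }). The property that matters is that adding a key appends to the set and never touches the old entry.

```typescript
import { createPublicKey, generateKeyPairSync } from 'node:crypto';

// Minimal JWK / JWKS shapes for this sketch (not a complete RFC 7517 model).
interface Jwk {
  kty?: string;
  kid?: string;
  alg?: string;
  use?: string;
  [param: string]: unknown;
}

interface Jwks {
  keys: Jwk[];
}

// Phase 1: append the new public key; existing keys (including the old
// signing key) are carried over unchanged.
function addSigningKey(jwks: Jwks, publicKeyPem: string, kid: string, alg = 'RS256'): Jwks {
  if (jwks.keys.some((k) => k.kid === kid)) {
    throw new Error(`kid "${kid}" is already published; use a new kid`);
  }
  // For an RSA public key, export({ format: 'jwk' }) yields { kty: 'RSA', n, e }.
  const jwk = createPublicKey(publicKeyPem).export({ format: 'jwk' }) as Jwk;
  return { keys: [...jwks.keys, { ...jwk, kid, alg, use: 'sig' }] };
}

// Phase 2: drop the old key once the grace period is over.
function removeKey(jwks: Jwks, kid: string): Jwks {
  return { keys: jwks.keys.filter((k) => k.kid !== kid) };
}

// Keys are generated in-process only so the sketch runs without files;
// in practice publicKeyPem would be the contents of new-public.pem.
function newRsaPublicKeyPem(): string {
  const { publicKey } = generateKeyPairSync('rsa', {
    modulusLength: 2048,
    publicKeyEncoding: { type: 'spki', format: 'pem' },
    privateKeyEncoding: { type: 'pkcs8', format: 'pem' },
  });
  return publicKey;
}

const phase0 = addSigningKey({ keys: [] }, newRsaPublicKeyPem(), 'key-2024-01');
const phase1 = addSigningKey(phase0, newRsaPublicKeyPem(), 'key-2025-01'); // both keys
const phase2 = removeKey(phase1, 'key-2024-01');                          // new key only
```

Rejecting a reused kid is a deliberate choice: verifiers select keys by kid, so publishing two different keys under one kid would make selection ambiguous.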

The grace period must cover the longer of the token TTL and the session duration, counted from the moment the authorization server stops signing with the old key:

Scenario	Minimum grace period
Short-lived tokens (15min), short sessions (<1h)	1 hour
Short-lived tokens (15min), long sessions (up to 8h)	8 hours (session duration is the constraint)
Long-lived tokens (24h), any session	24 hours (token TTL is the constraint)
Refresh tokens (30d)	30 days — or revoke refresh tokens separately before key removal
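The table reduces to a single formula. A minimal sketch, assuming both limits are known in milliseconds; RotationPolicy, minimumGracePeriodMs, and earliestOldKeyRemoval are illustrative names, not part of any library.

```typescript
const MINUTE_MS = 60 * 1000;
const HOUR_MS = 60 * MINUTE_MS;

interface RotationPolicy {
  tokenTtlMs: number;   // longest access-token lifetime the auth server issues
  maxSessionMs: number; // longest MCP session you intend to keep alive
}

// The longer of token TTL and session duration.
function minimumGracePeriodMs(policy: RotationPolicy): number {
  return Math.max(policy.tokenTtlMs, policy.maxSessionMs);
}

// Earliest moment the old key may leave JWKS, measured from when the auth
// server stopped signing with it (Step 4 of the rotation procedure).
function earliestOldKeyRemoval(signingSwitchedAt: Date, policy: RotationPolicy): Date {
  return new Date(signingSwitchedAt.getTime() + minimumGracePeriodMs(policy));
}

// Second row of the table: 15-minute tokens, sessions up to 8 hours.
const shortTokensLongSessions: RotationPolicy = {
  tokenTtlMs: 15 * MINUTE_MS,
  maxSessionMs: 8 * HOUR_MS,
};
```

If your verifiers allow clock skew (for example via jose's clockTolerance option), add that tolerance on top of the computed window.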

How jose handles kid-based key selection

createRemoteJWKSet from jose reads the kid field from the JWT header and selects the matching key from the JWKS. It refreshes the cached key set once it is older than cacheMaxAge, and if the kid is not in the cached JWKS, it re-fetches the JWKS endpoint once (unless a fetch already happened within the last cooldownDuration) and retries. This means your MCP server handles rotation transparently, with no restart and no code change:

import { createRemoteJWKSet, jwtVerify } from 'jose';

// This code handles rotation automatically via kid-based selection
const JWKS = createRemoteJWKSet(
  new URL(`${process.env.AUTH_ISSUER}/.well-known/jwks.json`),
  {
    cacheMaxAge: 10 * 60 * 1000,  // 10 minutes: balance freshness vs. JWKS traffic
    cooldownDuration: 30 * 1000,  // 30 seconds: minimum gap between re-fetches on an unknown kid
  }
);

// jwtVerify selects the key matching the JWT's kid header automatically
export async function verifyAccessToken(token: string) {
  const { payload } = await jwtVerify(token, JWKS, {
    algorithms: ['RS256', 'ES256'],
    issuer: process.env.AUTH_ISSUER,
    audience: process.env.AUTH_AUDIENCE,
  });
  return payload;
}

The cooldownDuration is your defence against re-fetch flooding: an attacker sending tokens with arbitrary kid values would otherwise trigger a JWKS re-fetch on every request, exhausting the authorization server's rate limits. The cooldown is tracked for the key set as a whole, not per kid, so there is at most one re-fetch per cooldownDuration no matter how many distinct unknown kid values arrive.
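The lookup and cooldown behaviour described above can be modelled in a few lines, which helps when reasoning about timing. This is a simplified model, not jose's source; JwksCacheModel and its parameters are hypothetical, with defaults mirroring the cacheMaxAge and cooldownDuration values in the snippet above. The usage shows two effects from this guide: a prematurely removed key keeps verifying until the cache expires, and a burst of unknown kid values costs at most one re-fetch per cooldown window.

```typescript
// Simplified model of a remote JWKS cache with kid-based lookup,
// a cache max-age, and a single cooldown shared by all kid values.
class JwksCacheModel {
  private kids = new Set<string>();
  private fetchedAt = -Infinity; // no fetch yet, so the cache starts stale
  fetchCount = 0;
  private readonly fetchKids: () => Set<string>;
  private readonly now: () => number;
  private readonly cacheMaxAgeMs: number;
  private readonly cooldownMs: number;

  constructor(
    fetchKids: () => Set<string>,
    now: () => number,
    cacheMaxAgeMs = 10 * 60 * 1000,
    cooldownMs = 30 * 1000,
  ) {
    this.fetchKids = fetchKids;
    this.now = now;
    this.cacheMaxAgeMs = cacheMaxAgeMs;
    this.cooldownMs = cooldownMs;
  }

  private reload(): void {
    this.kids = this.fetchKids();
    this.fetchedAt = this.now();
    this.fetchCount += 1;
  }

  // True if a key with this kid is available for verification.
  hasKey(kid: string): boolean {
    // A cache older than cacheMaxAge is refreshed before the lookup.
    if (this.now() - this.fetchedAt >= this.cacheMaxAgeMs) this.reload();
    if (this.kids.has(kid)) return true;
    // Unknown kid: re-fetch once, but only outside the cooldown window.
    if (this.now() - this.fetchedAt >= this.cooldownMs) {
      this.reload();
      return this.kids.has(kid);
    }
    return false;
  }
}

// Usage: the old key is removed right after the first fetch (a bad rotation).
let nowMs = 0;
let published = new Set(['key-2024-01']);
const cache = new JwksCacheModel(() => new Set(published), () => nowMs);

const atStart = cache.hasKey('key-2024-01');          // fetch 1: key present
published = new Set(['key-2025-01']);                 // old key removed too early
nowMs = 5 * 60 * 1000;
const afterFiveMinutes = cache.hasKey('key-2024-01'); // served from cache
nowMs = 10 * 60 * 1000;
const afterTenMinutes = cache.hasKey('key-2024-01');  // fetch 2: key gone, 401
```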

Zero-downtime rotation procedure

Follow this sequence to rotate keys without any session disruption:

## Step 1: Generate the new key pair (on the auth server)
openssl genrsa -out new-private.pem 2048
openssl rsa -in new-private.pem -pubout -out new-public.pem

## Step 2: Add the new public key to JWKS with a new kid
## DO NOT remove the old key yet
## Auth server JWKS endpoint now returns both keys

## Step 3: Verify JWKS contains both keys
curl https://auth.example.com/.well-known/jwks.json | jq '.keys | length'
# Should return 2

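## Step 4: Switch the auth server to sign new tokens with the new key
## Old tokens (signed by old key) remain verifiable for the grace period
## If any verifier or CDN caches JWKS without re-fetching on an unknown kid,
## wait at least one cache lifetime after Step 3 before this step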

## Step 5: Wait for grace period
## Duration = max(token_ttl, max_session_lifetime)
## For 1h tokens and 8h sessions: wait 8 hours

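## Step 6: Verify no active sessions hold old-key tokens
## Query the auth server's token/session store for unexpired tokens issued under the old kid
## If the store cannot be queried, extend the wait instead of assuming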

## Step 7: Remove the old key from JWKS
## JWKS endpoint now returns only the new key

## Step 8: Verify JWKS contains only the new key
curl https://auth.example.com/.well-known/jwks.json | jq '.keys | length'
# Should return 1

## Step 9: Archive the old private key securely (do not delete immediately)
## Required for forensic investigation if a token signed by the old key appears after rotation
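Step 6 is the easiest step to skip under time pressure. If the authorization server keeps a record of issued tokens, the removal decision in Steps 5 to 7 can be made mechanical. A minimal sketch, assuming a hypothetical token store that records each token's kid and exp claim; IssuedToken and safeToRemoveOldKey are illustrative names.

```typescript
// Hypothetical shape of one row in the auth server's token store.
interface IssuedToken {
  kid: string;          // kid from the JWT header the token was signed under
  expiresAtSec: number; // exp claim, seconds since the Unix epoch
}

// The old key may be removed only when the grace period has elapsed AND
// no token signed with it would still be accepted by a verifier.
function safeToRemoveOldKey(
  tokens: IssuedToken[],
  oldKid: string,
  nowSec: number,
  graceEndsAtSec: number,
  clockToleranceSec = 0, // match any clock tolerance your verifiers allow
): boolean {
  if (nowSec < graceEndsAtSec) return false;
  // A verifier with clock tolerance accepts a token while exp + tolerance > now.
  const stillAccepted = tokens.filter(
    (t) => t.kid === oldKid && t.expiresAtSec + clockToleranceSec > nowSec,
  );
  return stillAccepted.length === 0;
}

const store: IssuedToken[] = [
  { kid: 'key-2024-01', expiresAtSec: 1000 },
  { kid: 'key-2025-01', expiresAtSec: 5000 },
];
```

Both conditions are required: the grace period guards against tokens the store failed to record, and the store check guards against a grace period that was sized too short.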

Detecting bad rotations with AliveMCP

A misconfigured rotation, one that removes the old key before the grace period ends, produces a sudden 401 spike across all active MCP sessions. AliveMCP's continuous probes detect this as an authentication failure event: the probe token (signed by the old key, with a TTL longer than the rotation window) begins failing verification as soon as the MCP server refreshes its cached JWKS and finds the old key gone.

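Because AliveMCP probes run every 60 seconds, an alert follows at most 60 seconds after the MCP server starts rejecting old-key tokens. The rejections themselves begin only when the server's cached JWKS expires, so the worst case from a bad rotation to the alert is cacheMaxAge plus the probe interval (about 11 minutes with the 10-minute cacheMaxAge used above). Without external probing, you would only discover the failure when users begin reporting errors, typically minutes to hours later depending on how many users are active.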

The AliveMCP probe alert names the likely cause: "HTTP 401 from a server that was healthy 60 seconds ago — likely key rotation without grace period." This context is included in the AliveMCP incident payload alongside the raw HTTP status code, so you immediately know what to check (your JWKS endpoint) rather than starting a blind investigation. See MCP server security monitoring for distinguishing rotation-induced 401 spikes from credential enumeration attacks.
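On the MCP server side, recording why each 401 happened makes a rotation failure recognisable in your own logs as well. jose's error classes carry a code property; the sketch below maps the relevant codes to labels. The code strings are jose's; the AuthFailureKind labels and classifyAuthFailure are hypothetical, and the sketch inspects err.code only, so it does not need to import jose.

```typescript
type AuthFailureKind =
  | 'unknown-kid'     // no JWKS key matches the kid: typical of premature key removal
  | 'bad-signature'
  | 'expired'
  | 'alg-not-allowed' // e.g. RS256 dropped from `algorithms` too early
  | 'claims'
  | 'other';

function classifyAuthFailure(err: unknown): AuthFailureKind {
  const code =
    typeof err === 'object' && err !== null && 'code' in err
      ? String((err as { code: unknown }).code)
      : '';
  switch (code) {
    case 'ERR_JWKS_NO_MATCHING_KEY':
      return 'unknown-kid';
    case 'ERR_JWS_SIGNATURE_VERIFICATION_FAILED':
      return 'bad-signature';
    case 'ERR_JWT_EXPIRED':
      return 'expired';
    case 'ERR_JOSE_ALG_NOT_ALLOWED':
      return 'alg-not-allowed';
    case 'ERR_JWT_CLAIM_VALIDATION_FAILED':
      return 'claims';
    default:
      return 'other';
  }
}
```

In the catch around jwtVerify, log classifyAuthFailure(err) together with the token's kid (readable without verification via jose's decodeProtectedHeader) before returning 401. A sudden run of unknown-kid failures that all name the same kid points at the JWKS endpoint rather than at clients.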

Algorithm migration (RS256 to ES256)

Algorithm migration is a rotation where you change both the key and the algorithm. The procedure is the same as key rotation, with one addition: your MCP server must accept both algorithms during the grace period.

// During migration: accept both RS256 (old) and ES256 (new)
const { payload } = await jwtVerify(token, JWKS, {
  algorithms: ['RS256', 'ES256'], // both accepted during grace period
  issuer: process.env.AUTH_ISSUER,
  audience: process.env.AUTH_AUDIENCE,
});

// After migration: restrict to ES256 only
// algorithms: ['ES256']

Do not remove RS256 from the algorithms list until after the grace period ends. Removing it early causes the same sudden 401 spike as removing the old key from JWKS prematurely.
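Rather than relying on a follow-up deploy to drop RS256, the algorithms list can be derived from the migration schedule. A minimal sketch, assuming you record the instant the RS256 grace period ends (signing switch plus max(token_ttl, max_session_duration)); allowedAlgorithms and RS256_GRACE_ENDS_AT are hypothetical names and the date is a placeholder.

```typescript
type VerifyAlg = 'RS256' | 'ES256';

// Placeholder: when the last RS256-signed token can still be in use.
const RS256_GRACE_ENDS_AT = new Date('2025-02-01T00:00:00.000Z');

// Accept both algorithms until the grace period ends, then ES256 only.
function allowedAlgorithms(now: Date, rs256GraceEndsAt: Date): VerifyAlg[] {
  return now.getTime() < rs256GraceEndsAt.getTime() ? ['RS256', 'ES256'] : ['ES256'];
}
```

Pass allowedAlgorithms(new Date(), RS256_GRACE_ENDS_AT) as the algorithms option on each jwtVerify call. Use the same end-of-grace timestamp that gates removal of the RS256 key from JWKS, so the two changes cannot drift apart.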