Authentication guide · 2026-06-04 · Production MCP servers

MCP Server Authentication and Authorization Guide: JWT Validation, JWKS Rotation, RBAC, OAuth Device Flow, and API Key Management

Most MCP server tutorials add a single Authorization: Bearer check and call it done. That is authentication — the verification that a caller holds a credential issued by an authority you trust. Authorization — the determination of what that verified caller is permitted to call — is a separate concern that most tutorials skip entirely. And neither concern covers the operational question: what happens to in-flight sessions when you rotate the signing keys that underpin your token verification? A complete authentication and authorization system for a production MCP server has five concerns: JWT validation as the verification layer, JWKS key rotation as the operational layer that keeps verification working over time, RBAC as the authorization layer that maps verified identity to permitted tool calls, OAuth 2.0 device flow as the token acquisition mechanism for LLM clients, and API key management as the simpler alternative for deployments you control end-to-end. This guide covers them as a system — how each concern addresses a distinct part of the auth problem, how they compose, and what remains invisible to internal auth checks even when all five are correctly implemented.

TL;DR

OAuth 2.0 device flow is how tokens get issued to LLM clients. Device flow works for any client that can display a URL and poll — the LLM client shows the user a verification URI, the user authenticates in a browser, the client polls until the authorization server confirms and returns a token. This is the correct OAuth flow for MCP clients because they cannot perform browser redirects.
JWT validation verifies every token at the transport boundary — once per session, not per tool call. JWT validation uses jose's createRemoteJWKSet + jwtVerify with explicit algorithms, issuer, and audience options. Omitting any of these degrades verification from "this token is for my service, from my auth server, and not expired" to "this token has a valid signature from someone" — a meaningfully weaker check. The verified sub, scopes, and custom claims are stored in res.locals.identity at session start and reused for all tool calls in the session.
JWKS rotation keeps key verification working without breaking in-flight sessions. The most dangerous rotation mistake is removing an old key from JWKS immediately after publishing a new one. JWKS rotation requires a grace period equal to max(token_ttl, max_session_lifetime) — during that window, the old key stays in the JWKS endpoint so that tokens signed with it (and sessions holding those tokens) remain valid.
RBAC maps verified identity to permitted tool calls without per-handler role checks. RBAC centralises the permission model in a TOOL_PERMISSIONS map and a requireScopes wrapper that returns an MCP isError: true response (not an HTTP 403) on denial. Scope inheritance is handled at identity extraction time: a ROLE_SCOPE_EXPANSION map expands roles to their full scope set once, so individual tool handlers receive a fully resolved scope list and never check roles directly.
API keys are the simpler alternative when you control both client and server. API key management eliminates OAuth complexity at the cost of losing federated identity. Keys use crypto.randomBytes(32).toString('hex') for 256 bits of entropy, a mcp_{env}_{prefix}_{secret} format for scanner detectability, prefix-first database lookup, and timingSafeEqual for constant-time comparison. Per-key scoping in the database means each key has its own permission set — the same RBAC model applies, with the key's scopes standing in for the JWT's scope claim.

Why Auth for MCP Servers Is Different

A conventional HTTP API authenticates each request independently. A client sends credentials with every call; the server verifies them and returns a response. State is request-scoped. If a token expires mid-session, the client retries with a fresh token and the next request works.

MCP servers have two properties that change this model significantly.

First, MCP sessions are long-lived. An MCP session opens with initialize and may persist for minutes or hours while an LLM iterates over a complex task, issuing dozens of tool calls. Authentication happens at session open; subsequent tool calls reuse the session context. If you re-validate the bearer token on each tool call, you will eventually validate a token that has expired mid-session — not because the caller is unauthorised, but because the session outlasted the token TTL. The correct model is: authenticate at initialize, bind the verified identity to the session, reuse it for all subsequent calls.
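
The authenticate-once model can be sketched as a small session registry. This is a minimal illustration, not the MCP SDK's own session API: the McpIdentity shape, the in-memory Map, and the function names bindIdentity and identityFor are all assumptions made for the example.

```typescript
// Hypothetical identity shape; field names mirror the claims used later in this guide.
interface McpIdentity {
  sub: string;
  scopes: string[];
  tenant_id?: string;
}

// In-memory registry (an assumption; a real server would likely use its SDK's session object).
const sessions = new Map<string, McpIdentity>();

// Called once, after the bearer token has been verified at initialize.
function bindIdentity(sessionId: string, identity: McpIdentity): void {
  sessions.set(sessionId, identity);
}

// Called on every tool call: no token parsing, only a lookup.
function identityFor(sessionId: string): McpIdentity {
  const identity = sessions.get(sessionId);
  if (!identity) throw new Error(`unknown session: ${sessionId}`);
  return identity;
}

bindIdentity('session-1', { sub: 'user-42', scopes: ['data:read'] });
console.log(identityFor('session-1').scopes);
```

Because tool calls only read from the registry, a token that expires mid-session does not interrupt the task; the trade-off is that revocation needs a separate path, such as deleting the session entry.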

Second, LLM clients cannot perform browser redirects. The standard OAuth 2.0 authorization code flow assumes the client can open a browser, handle a redirect to a callback URL, and exchange a code for a token. An LLM agent running a task cannot do this. OAuth 2.0 device flow was designed for exactly this constraint: the client requests a device code, displays a short URL and user code, and polls for the token while the user authenticates in a separate browser session. Device flow is the correct acquisition mechanism for MCP clients.

These two properties — session-scoped auth and non-browser clients — mean that a complete auth system for MCP servers must cover: token acquisition (device flow), verification at session open (JWT validation), key rotation that does not break in-flight sessions (JWKS rotation), per-tool-call permission enforcement that does not re-check the token (RBAC), and optionally a simpler credential type for controlled deployments (API keys).

The Five Concerns and Their Roles

Concern	Phase	What it answers	What it cannot do alone
OAuth 2.0 device flow	Token acquisition	How does an LLM client obtain a valid token?	Does not verify the token on the resource server side; does not control what the token can call
JWT validation	Authentication	Is this token valid, unexpired, and issued for my service?	Does not enforce what the verified caller can call; does not handle key rotation grace periods
JWKS key rotation	Key operations	How do we rotate signing keys without breaking in-flight sessions?	Does not generate tokens; does not enforce permissions; only manages the key lifecycle
RBAC	Authorization	Given verified identity, which tools can this caller invoke?	Does not verify the token; depends on identity already being extracted and bound to the session
API key management	Alternative credential	How do controlled clients authenticate without OAuth?	Does not federate identity; the issuing system must be the same system that validates — no third-party auth server

The table shows the composition logic: OAuth produces the credential; JWT validation verifies it; JWKS rotation keeps verification infrastructure current; RBAC turns verified identity into an access decision; API keys are a parallel path that skips OAuth and JWT entirely but produces the same result (a verified identity with a scope list) that RBAC consumes.

OAuth 2.0 Device Flow: How LLM Clients Get Tokens

Device flow is the token acquisition layer — the mechanism by which an LLM client gets a JWT it can present to your MCP server. It exists because the standard authorization code flow assumes browser redirect capability that LLM agents do not have.

The flow has four phases. First, the client posts to the device authorization endpoint to get a device code and a verification URI. Second, it displays the URI and user code to the user (or passes them to the human-in-the-loop approval step). Third, it polls the token endpoint with the device code. Fourth, it receives an access token (and optionally a refresh token) when the user completes authorization.

// Phase 1: request device and user codes
const deviceResponse = await fetch(metadata.device_authorization_endpoint, {
  method: 'POST',
  headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
  body: new URLSearchParams({
    client_id: process.env.OAUTH_CLIENT_ID!,
    scope: 'openid profile mcp:tools',
  }),
});
// verification_uri is required by RFC 8628; verification_uri_complete is optional
const { device_code, user_code, verification_uri, verification_uri_complete, interval } =
  await deviceResponse.json();

// Phase 2: show the user where to go
console.log(`Visit ${verification_uri} and enter: ${user_code}`);
if (verification_uri_complete) console.log(`Or open directly: ${verification_uri_complete}`);

// Phase 3: poll (wrapped in a function so the loop can return the token response)
async function pollForToken() {
  let pollInterval = interval ?? 5; // seconds; RFC 8628 default is 5
  while (true) {
    await new Promise(r => setTimeout(r, pollInterval * 1000));
    const tokenResponse = await fetch(metadata.token_endpoint, {
      method: 'POST',
      headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
      body: new URLSearchParams({
        client_id: process.env.OAUTH_CLIENT_ID!,
        device_code,
        grant_type: 'urn:ietf:params:oauth:grant-type:device_code',
      }),
    });
    const result = await tokenResponse.json();
    if (result.access_token) return result; // Phase 4: done
    if (result.error === 'slow_down') { pollInterval += 5; continue; }
    // expired_token, access_denied, and any other error are terminal
    if (result.error !== 'authorization_pending') throw new Error(result.error);
  }
}
const tokens = await pollForToken();

The slow_down error is mandatory to handle: if the client polls too frequently, the authorization server returns slow_down and requires the interval to increase by 5 seconds. Ignoring this causes the client to be rate-limited and the device code to expire before authorization completes.
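
The polling rules can be isolated into a pure function, which makes the slow_down behaviour easy to unit-test separately from the network call. The sketch below is one possible factoring; the PollDecision type and the decideNextPoll name are hypothetical, while the error codes and the 5-second increment come from the device flow described above.

```typescript
// Outcome of one polling attempt that did not return an access token.
type PollDecision =
  | { action: 'wait'; intervalSeconds: number }
  | { action: 'fail'; reason: string };

// Maps a device-flow error code to the next step for the polling loop.
function decideNextPoll(error: string, intervalSeconds: number): PollDecision {
  switch (error) {
    case 'authorization_pending':
      // User has not finished yet: keep the current interval.
      return { action: 'wait', intervalSeconds };
    case 'slow_down':
      // Server asked us to back off: add 5 seconds for all later polls.
      return { action: 'wait', intervalSeconds: intervalSeconds + 5 };
    default:
      // expired_token, access_denied, and anything unrecognised are terminal.
      return { action: 'fail', reason: error };
  }
}

console.log(decideNextPoll('slow_down', 5)); // { action: 'wait', intervalSeconds: 10 }
```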

For machine-to-machine MCP integrations where there is no user — for example, an AliveMCP probe authenticating against a monitored MCP server — the client credentials grant (grant_type=client_credentials) is the correct choice. Client credentials require no user interaction and produce a token immediately. The access token is then presented to your MCP server as a Bearer token in the Authorization header on the initialize request, where JWT validation takes over.
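
A client credentials request is a single form-encoded POST. The sketch below only builds the request body; it assumes the authorization server accepts the client secret in the body (some require HTTP Basic authentication instead), and the function name and example values are hypothetical.

```typescript
// Builds the form body for a client credentials token request.
// Assumption: the server accepts client_secret as a form parameter.
function buildClientCredentialsBody(
  clientId: string,
  clientSecret: string,
  scope: string,
): URLSearchParams {
  return new URLSearchParams({
    grant_type: 'client_credentials',
    client_id: clientId,
    client_secret: clientSecret,
    scope,
  });
}

const probeBody = buildClientCredentialsBody('probe-client', 'example-secret', 'health:ping');
console.log(probeBody.get('grant_type'));
// The body would then be sent with:
//   fetch(tokenEndpoint, { method: 'POST', body: probeBody })
// where tokenEndpoint comes from the authorization server's metadata.
```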

JWT Validation: Verification at the Transport Boundary

JWT validation is the authentication layer — it answers whether the bearer token presented at session open is valid, unexpired, and issued by the correct authority for your specific service. It runs exactly once per session, in HTTP middleware before the MCP initialize message is processed.

The three options that must always be set on jwtVerify are algorithms, issuer, and audience. Omitting any degrades the check:

Omitted option	What an attacker can now do
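`algorithms`	Present a token signed with an algorithm you never intended to accept, for example an HS256 token keyed with your RSA public key (algorithm confusion), or `"alg": "none"` in libraries that still permit unsecured tokens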
`issuer`	Present a valid token issued by a different authorization server — one the attacker controls
`audience`	Present a valid token issued for a different resource server — one that shares the same authorization server

import { createRemoteJWKSet, jwtVerify, errors as JoseErrors } from 'jose';

// Module-level singleton — do NOT create per-request
const JWKS = createRemoteJWKSet(
  new URL(`${process.env.AUTH_ISSUER}/.well-known/jwks.json`),
  { cacheMaxAge: 10 * 60 * 1000, cooldownDuration: 30_000 }
);

async function jwtAuthMiddleware(req, res, next) {
  const authHeader = req.headers['authorization'];
  if (!authHeader?.startsWith('Bearer ')) {
    return res.status(401).json({ error: 'missing_token' });
  }
  const token = authHeader.slice(7);
  try {
    const { payload } = await jwtVerify(token, JWKS, {
      algorithms: ['RS256', 'ES256'],   // never allow HS256 or none
      issuer:    process.env.AUTH_ISSUER,
      audience:  process.env.AUTH_AUDIENCE,
    });
    res.locals.identity = {
      sub:       payload.sub,
      scopes:    (payload['scope'] as string)?.split(' ') ?? [],
      plan:      payload['plan']      as string | undefined,
      tenant_id: payload['tenant_id'] as string | undefined,
    };
    next();
  } catch (err) {
    if (err instanceof JoseErrors.JWTExpired)
      return res.status(401).json({ error: 'token_expired' });
    if (err instanceof JoseErrors.JWTClaimValidationFailed)
      return res.status(401).json({ error: 'invalid_claims' });
    return res.status(401).json({ error: 'invalid_token' });
  }
}

Two error distinctions matter for client behaviour. token_expired tells the client its token has expired: it should use its refresh token to get a new one and retry. invalid_claims and invalid_token tell the client the token was issued by the wrong issuer, for the wrong audience, or is corrupt: there is no recovery without re-authentication. Clients that cannot distinguish these errors will either hammer the server with unrecoverable retry loops or fail to refresh when they could have.
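
On the client side, that distinction reduces to a small decision function. This is an illustrative sketch: the AuthRecovery type and recoveryFor name are hypothetical, and the error codes are the ones returned by the middleware above.

```typescript
type AuthRecovery = 'refresh_and_retry' | 'reauthenticate';

// Decides what a client should do after a 401, based on the server's error code.
function recoveryFor(errorCode: string): AuthRecovery {
  // Only an expired token is recoverable with a refresh token.
  // invalid_claims, invalid_token, and missing_token all need a new authorization.
  return errorCode === 'token_expired' ? 'refresh_and_retry' : 'reauthenticate';
}

console.log(recoveryFor('token_expired')); // refresh_and_retry
console.log(recoveryFor('invalid_token')); // reauthenticate
```

Treating every unknown code as reauthenticate is the conservative default: it never retries a request that cannot succeed.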

The cacheMaxAge and cooldownDuration options on createRemoteJWKSet should be set deliberately rather than left implicit. Without a cooldown, an attacker can exploit JWKS cache misses by sending tokens with arbitrary kid values: each unknown kid triggers a fetch to the JWKS endpoint, which can overload the authorization server or trip its rate limits. A 30-second cooldown limits this to two fetches per minute regardless of how many unknown kid values are presented.
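
The effect of a cooldown can be shown with a toy gate that allows at most one fetch per window. This illustrates the idea only and is not jose's internal implementation; createCooldownGate is a hypothetical name.

```typescript
// Returns a function that answers "may I fetch now?" and records allowed fetches.
function createCooldownGate(cooldownMs: number): (nowMs: number) => boolean {
  let lastFetchAt = Number.NEGATIVE_INFINITY;
  return (nowMs: number): boolean => {
    if (nowMs - lastFetchAt < cooldownMs) return false; // still cooling down
    lastFetchAt = nowMs;
    return true;
  };
}

// Simulate one minute of tokens with unknown kid values arriving every 100 ms.
const mayFetch = createCooldownGate(30_000);
let fetchCount = 0;
for (let t = 0; t < 60_000; t += 100) {
  if (mayFetch(t)) fetchCount++;
}
console.log(fetchCount); // 2
```

Six hundred unknown-kid tokens produce two upstream fetches, which matches the two-per-minute bound stated above.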

JWKS Key Rotation: Zero-Downtime Key Operations

JWKS key rotation is the operational layer — the procedure for retiring old signing keys and introducing new ones without terminating in-flight MCP sessions. It is the concern that most auth documentation omits entirely, despite being the most operationally disruptive if done incorrectly.

The failure mode is specific to MCP's long-lived session model. In a stateless REST API, token expiry is a normal event: the client gets a 401, uses its refresh token, and retries with a fresh access token signed by the new key. In an MCP session, the session is open and the LLM is mid-task. If the old key disappears from JWKS while a session is in progress, the next JWKS cache refresh will cause all tokens signed by the old key to fail validation — not because they expired, but because their signing key no longer exists.

The solution is a grace period:

Scenario	Grace period
Short-lived tokens (1h TTL), short sessions (<1h)	1 hour
Short-lived tokens (1h TTL), long sessions (up to 8h)	8 hours
Long-lived tokens (24h TTL)	24 hours
Emergency rotation (key compromised)	0 — accept session disruption
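
The table rows follow from the max(token_ttl, max_session_lifetime) rule. A small helper makes the calculation explicit; the function name and the emergency flag are assumptions made for this sketch.

```typescript
const HOUR = 3600;

// Grace period in seconds: how long the old key must stay in JWKS after rotation.
function gracePeriodSeconds(
  tokenTtlSeconds: number,
  maxSessionSeconds: number,
  emergency = false,
): number {
  if (emergency) return 0; // compromised key: remove immediately, accept disruption
  return Math.max(tokenTtlSeconds, maxSessionSeconds);
}

console.log(gracePeriodSeconds(1 * HOUR, 1 * HOUR) / HOUR); // 1
console.log(gracePeriodSeconds(1 * HOUR, 8 * HOUR) / HOUR); // 8
console.log(gracePeriodSeconds(24 * HOUR, 8 * HOUR, true)); // 0
```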

The zero-downtime rotation sequence is: generate the new key pair → publish the new public key to the JWKS endpoint alongside the old one → begin signing new tokens with the new private key → wait for the grace period → verify no tokens signed by the old key are still in active sessions (check last_used_at in the authorization server's key table) → remove the old key from JWKS → archive the old private key.

# Generate new key (example: RS256)
openssl genrsa -out keys/new-private.pem 2048
openssl rsa -in keys/new-private.pem -pubout -out keys/new-public.pem

# At this point: JWKS has old key + new key
# Authorization server signs new tokens with new private key
# Old tokens (signed by old key) remain valid while old key is in JWKS

# After grace period: check last_used_at for old key
# If recent use: extend grace period
# If no recent use: proceed

# Remove old key from JWKS (update key-set configuration)
# Archive old private key (keep in cold storage for audit)
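
The "check last_used_at, then proceed or extend" step can be expressed as a guard that runs before the old key is removed. This is a sketch under assumptions: the RetiredKey record, the quietSeconds threshold for what counts as recent use, and the function name are all hypothetical.

```typescript
// Hypothetical record for a key that no longer signs new tokens.
interface RetiredKey {
  kid: string;
  retiredAt: Date;          // when signing switched to the new key
  lastUsedAt: Date | null;  // last successful verification with this key, if any
}

// True only when the grace period has elapsed AND the key has been quiet for quietSeconds.
function canRemoveFromJwks(
  key: RetiredKey,
  now: Date,
  graceSeconds: number,
  quietSeconds: number,
): boolean {
  const sinceRetired = (now.getTime() - key.retiredAt.getTime()) / 1000;
  if (sinceRetired < graceSeconds) return false; // grace period still running
  if (key.lastUsedAt === null) return true;      // never used since retirement
  const sinceLastUse = (now.getTime() - key.lastUsedAt.getTime()) / 1000;
  return sinceLastUse >= quietSeconds;           // recent use: extend instead
}

const oldKey: RetiredKey = {
  kid: 'key-2025',
  retiredAt: new Date('2026-06-01T00:00:00Z'),
  lastUsedAt: new Date('2026-06-01T07:50:00Z'),
};
// Nine hours after retirement, 70 minutes after last use, 8h grace, 1h quiet window.
console.log(canRemoveFromJwks(oldKey, new Date('2026-06-01T09:00:00Z'), 8 * 3600, 3600)); // true
```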

AliveMCP detects failed rotations from outside. The probe token (signed by the old key) begins returning HTTP 401 within 60 seconds of a key disappearing from JWKS. This is reported as a sustained 401 spike on a server that was healthy 60 seconds ago — distinct from an expired-token 401 (which the probe handles with credential rotation) and distinct from an invalid-token 401 (which signals a configuration error). The external probe catches a failed JWKS rotation immediately, before users begin reporting session failures.

RBAC: From Verified Identity to Permitted Tool Calls

RBAC is the authorization layer — it answers "given that we have verified who this caller is, which of our tools are they allowed to invoke?" It operates on the verified identity that JWT validation (or API key validation) has bound to the session, not on the token directly.

The central structural decision is where the permission model lives. The wrong answer is to put role checks in individual tool handlers: if (identity.plan !== 'team') return { isError: true }. Scattered role checks become inconsistent as the tool set grows, and they make it impossible to audit the full permission model without reading every handler.

The correct answer is a central TOOL_PERMISSIONS map and a requireScopes wrapper that every tool handler uses:

// Single source of truth for the entire permission model
const TOOL_PERMISSIONS: Record<string, string[]> = {
  'server_status':   ['health:ping'],              // free public tier
  'endpoint_list':   ['data:read'],                // author tier
  'alert_configure': ['data:write'],               // author tier
  'team_dashboard':  ['data:read', 'team:access'], // team tier
  'sla_export':      ['admin:reports'],            // enterprise tier
};

// Scope expansion: roles map to scope sets at identity extraction time
const ROLE_SCOPE_EXPANSION: Record<string, string[]> = {
  'author': ['health:ping', 'data:read', 'data:write'],
  'team':   ['health:ping', 'data:read', 'data:write', 'team:access'],
  'admin':  ['health:ping', 'data:read', 'data:write', 'team:access', 'admin:reports'],
};

function requireScopes(requiredScopes: string[]) {
  return (identity: McpIdentity, toolName: string) => {
    const missing = requiredScopes.filter(s => !identity.scopes.includes(s));
    if (missing.length === 0) return null; // allow
    logger.warn({ event: 'rbac_deny', tool: toolName, sub: identity.sub,
                  tenant_id: identity.tenant_id, required: requiredScopes,
                  caller_scopes: identity.scopes, missing });
    return { isError: true, content: [{ type: 'text',
      text: `Insufficient permissions. Required: ${requiredScopes.join(', ')}`
    }]};
  };
}
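
One way to connect the permission map to a single enforcement point is a lookup that runs before any handler. The sketch below is self-contained and therefore redeclares a trimmed map; authorizeToolCall and the deny-by-default treatment of tools missing from the map are assumptions added for illustration, not part of the code above.

```typescript
interface McpIdentity { sub: string; scopes: string[] }
interface ToolError { isError: true; content: { type: 'text'; text: string }[] }

// Trimmed copy of the permission map, for a self-contained example.
const TOOL_PERMISSIONS: Record<string, string[]> = {
  server_status: ['health:ping'],
  team_dashboard: ['data:read', 'team:access'],
};

function deny(text: string): ToolError {
  return { isError: true, content: [{ type: 'text', text }] };
}

// Returns null to allow the call, or an MCP error result to deny it.
function authorizeToolCall(toolName: string, identity: McpIdentity): ToolError | null {
  const required: string[] | undefined =
    Object.prototype.hasOwnProperty.call(TOOL_PERMISSIONS, toolName)
      ? TOOL_PERMISSIONS[toolName]
      : undefined;
  // Assumption: a tool with no entry in the map is denied rather than allowed.
  if (!required) return deny(`Tool not available: ${toolName}`);
  const missing = required.filter(scope => !identity.scopes.includes(scope));
  return missing.length === 0
    ? null
    : deny(`Insufficient permissions. Required: ${required.join(', ')}`);
}

const reader: McpIdentity = { sub: 'user-1', scopes: ['health:ping', 'data:read'] };
console.log(authorizeToolCall('server_status', reader));  // null
console.log(authorizeToolCall('team_dashboard', reader)); // error result, team:access missing
```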

Two details matter for correctness. First, scope expansion happens at identity extraction time — when res.locals.identity is populated in JWT middleware, a role claim like "role": "team" is immediately expanded to its full scope list via ROLE_SCOPE_EXPANSION. Individual tool handlers receive a scope list, never a role string. This prevents role-checking logic from being scattered across handlers and ensures the expansion logic has exactly one canonical location.
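
Expansion at identity extraction time might look like the following. The map is copied from the listing above so the example is self-contained; resolveScopes is a hypothetical helper, and merging explicit token scopes with role-derived scopes is an assumption about how the two should combine.

```typescript
const ROLE_SCOPE_EXPANSION: Record<string, string[]> = {
  author: ['health:ping', 'data:read', 'data:write'],
  team:   ['health:ping', 'data:read', 'data:write', 'team:access'],
  admin:  ['health:ping', 'data:read', 'data:write', 'team:access', 'admin:reports'],
};

// Combines the space-delimited scope claim with the scopes implied by the role claim.
function resolveScopes(scopeClaim: string | undefined, role: string | undefined): string[] {
  const explicit = scopeClaim ? scopeClaim.split(' ').filter(Boolean) : [];
  const known = role !== undefined
    && Object.prototype.hasOwnProperty.call(ROLE_SCOPE_EXPANSION, role);
  const fromRole = known && role !== undefined ? ROLE_SCOPE_EXPANSION[role] ?? [] : [];
  // De-duplicate while preserving order: explicit scopes first, then role scopes.
  return Array.from(new Set(explicit.concat(fromRole)));
}

console.log(resolveScopes('openid mcp:tools', 'team'));
```

Unknown roles expand to nothing, so a typo in a role claim results in fewer permissions rather than more.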

Second, per-tenant data isolation requires a structural constraint beyond RBAC. RBAC controls which tools a session can call; it does not control which tenant's data those tools return. Every database query for tenant data must include WHERE tenant_id = $1 with the tenant_id from the verified identity — not from a request parameter, which could be tampered with. Cross-tenant requests should return a generic "not found" response, not an "access denied" — revealing that a resource exists for another tenant is itself an information leak.
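
The tenant filter and the uniform "not found" response can be sketched with an in-memory table standing in for the database. The Endpoint type, the sample rows, and the function names are hypothetical; the filter mirrors a WHERE tenant_id = $1 AND id = $2 query.

```typescript
interface Endpoint { id: string; tenant_id: string; url: string }

// Stand-in for a database table (illustrative data only).
const endpoints: Endpoint[] = [
  { id: 'ep-1', tenant_id: 'tenant-a', url: 'https://a.example/mcp' },
  { id: 'ep-2', tenant_id: 'tenant-b', url: 'https://b.example/mcp' },
];

// Equivalent to: SELECT * FROM endpoints WHERE tenant_id = $1 AND id = $2
// The tenant id comes from the verified identity, never from tool arguments.
function findEndpoint(identityTenantId: string, endpointId: string): Endpoint | null {
  return endpoints.find(
    e => e.tenant_id === identityTenantId && e.id === endpointId,
  ) ?? null;
}

// Same message whether the row is missing or belongs to another tenant.
function describeEndpoint(identityTenantId: string, endpointId: string): string {
  const row = findEndpoint(identityTenantId, endpointId);
  return row ? `Endpoint ${row.id}: ${row.url}` : 'Endpoint not found';
}

console.log(describeEndpoint('tenant-a', 'ep-1'));
console.log(describeEndpoint('tenant-a', 'ep-2')); // belongs to tenant-b
console.log(describeEndpoint('tenant-a', 'ep-9')); // does not exist
```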

API Key Management: The Simpler Alternative

API key management is the parallel path for MCP servers where you control both the client and the server — internal tool integrations, operator dashboards, CI/CD pipelines, or the probe credentials that AliveMCP uses to monitor servers. It eliminates OAuth and JWT entirely while preserving the same verified-identity-with-scopes output that RBAC consumes.

The most common mistake in API key design is using UUIDs. A version 4 UUID carries 122 bits of entropy, which is sufficient for database primary keys but weaker than a purpose-built credential, and UUID libraries are not all guaranteed to draw from a cryptographically secure generator. A 256-bit random key requires about 2²⁵⁵ expected guesses to brute-force; a 122-bit UUID requires about 2¹²¹. Against a leaked hash database, this difference is significant. Use crypto.randomBytes(32).toString('hex') for 256 bits:

import crypto from 'node:crypto';

// Key format: mcp_{env}_{8-char-prefix}_{64-char-secret}
// Prefix: identifies the key in logs. Here it is the first 8 hex chars of the
// secret, so it discloses 32 of the 256 secret bits; generate the prefix from
// separate random bytes if that matters for your threat model.
// Full key: shown once at creation, never stored in plaintext
function generateApiKey(env: 'live' | 'test'): { key: string; prefix: string; hash: string } {
  const secret  = crypto.randomBytes(32).toString('hex');   // 64 hex chars
  const prefix  = secret.slice(0, 8);
  const key     = `mcp_${env}_${prefix}_${secret}`;
  // Only this SHA-256 digest is persisted
  const hash    = crypto.createHash('sha256').update(key).digest('hex');
  return { key, prefix, hash };
}

The mcp_{env}_{prefix}_ format is not just cosmetic. Git secret scanners can be configured to detect tokens matching this pattern, the same way GitHub detects its own ghp_-prefixed tokens. If a key is accidentally committed, the scanner fires. In log lines, record only the prefix: it identifies which key was used without exposing the rest of the secret.
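
A detection rule for this format can be a single regular expression. The sketch below assumes the lengths from the format comment above (8 hex characters of prefix, 64 of secret) and lowercase hex; the function name is hypothetical, and real scanners have their own rule syntax.

```typescript
// Matches mcp_{live|test}_{8 hex}_{64 hex}
const MCP_KEY_PATTERN = /\bmcp_(live|test)_[0-9a-f]{8}_[0-9a-f]{64}\b/g;

// Returns the loggable part (mcp_{env}_{prefix}) of every key found in the text.
function findLeakedKeys(text: string): string[] {
  const matches = text.match(MCP_KEY_PATTERN) ?? [];
  // Drop everything after the last underscore so the secret is never echoed.
  return matches.map(m => m.slice(0, m.lastIndexOf('_')));
}

const sampleKey = 'mcp_live_ab12cd34_' + 'a'.repeat(64);
console.log(findLeakedKeys(`API_KEY=${sampleKey} # committed by mistake`)); // [ 'mcp_live_ab12cd34' ]
```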

The database schema stores the prefix and hash, never the plaintext key. Lookup uses the prefix as an index (fast B-tree scan on eight characters) and then verifies the hash only if the prefix matches:

-- Never store plaintext. key_prefix is for lookup + log correlation.
CREATE TABLE api_keys (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  key_prefix  TEXT NOT NULL UNIQUE,     -- first 8 chars of secret
  key_hash    TEXT NOT NULL,            -- SHA-256 of the full key
  scopes      TEXT[] NOT NULL DEFAULT '{}',
  tenant_id   UUID REFERENCES tenants(id),
  created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
  last_used_at TIMESTAMPTZ,
  expires_at  TIMESTAMPTZ,
  revoked_at  TIMESTAMPTZ              -- never DELETE rows; preserve audit trail
);

Validation must use constant-time comparison to prevent timing attacks. Using === or bcrypt is wrong for different reasons: === short-circuits on the first mismatched byte, leaking timing information; bcrypt adds 100ms+ overhead per request at its recommended cost factor, which is prohibitive for an API key check that happens on every request. The correct approach: SHA-256 the presented key, then compare that digest against the stored hash with crypto.timingSafeEqual:

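function verifyApiKey(presentedKey: string, storedHash: string): boolean {
  const presentedHash = crypto.createHash('sha256').update(presentedKey).digest();
  const storedHashBuf  = Buffer.from(storedHash, 'hex');
  // timingSafeEqual throws on length mismatch, so reject malformed stored hashes first
  if (storedHashBuf.length !== presentedHash.length) return false;
  return crypto.timingSafeEqual(presentedHash, storedHashBuf);
}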

Once validated, the key's scopes array from the database row populates the same McpIdentity shape that JWT validation produces — the same RBAC layer, the same requireScopes wrapper, the same per-tenant query pattern. API key management is an alternative credential, not an alternative auth system.
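
Putting the lookup, hash check, and lifecycle checks together, the path from a presented key to an identity might look like this. It is a sketch under assumptions: a Map stands in for the api_keys table, the ApiKeyRow and McpIdentity shapes are simplified, and the apikey: sub format is invented for the example.

```typescript
import { createHash, timingSafeEqual } from 'node:crypto';

interface ApiKeyRow {
  key_prefix: string;
  key_hash: string;          // hex SHA-256 of the full key
  scopes: string[];
  tenant_id: string | null;
  expires_at: Date | null;
  revoked_at: Date | null;
}
interface McpIdentity { sub: string; scopes: string[]; tenant_id: string | null }

const KEY_FORMAT = /^mcp_(live|test)_[0-9a-f]{8}_[0-9a-f]{64}$/;

// rowsByPrefix stands in for: SELECT * FROM api_keys WHERE key_prefix = $1
function identityFromApiKey(
  presentedKey: string,
  rowsByPrefix: Map<string, ApiKeyRow>,
  now: Date,
): McpIdentity | null {
  if (!KEY_FORMAT.test(presentedKey)) return null;
  const prefix = presentedKey.slice(9, 17); // after 'mcp_live_' or 'mcp_test_'
  const row = rowsByPrefix.get(prefix);
  if (!row) return null;

  const presented = createHash('sha256').update(presentedKey).digest();
  const stored = Buffer.from(row.key_hash, 'hex');
  if (stored.length !== presented.length || !timingSafeEqual(presented, stored)) return null;

  if (row.revoked_at !== null) return null;
  if (row.expires_at !== null && row.expires_at.getTime() <= now.getTime()) return null;

  // Hypothetical sub format; the scopes feed the same RBAC layer as JWT scopes.
  return { sub: `apikey:${row.key_prefix}`, scopes: row.scopes, tenant_id: row.tenant_id };
}
```

Returning null for every failure keeps the caller from distinguishing "unknown prefix" from "wrong secret", which avoids confirming that a prefix exists.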

How the Five Concerns Compose

The five concerns are not alternatives to each other — they address different phases of a single request lifecycle and compose in a specific order:

Token acquisition (OAuth device flow) happens before the MCP session opens. The LLM client polls the authorization server and receives an access token. This is client-side; the MCP server is not involved. For machine-to-machine clients (probes, CI/CD), client credentials flow replaces device flow. For API key clients, this phase is skipped entirely — the key is the credential.
Authentication (JWT validation or API key validation) happens at HTTP middleware before initialize. The verified identity — sub, expanded scopes, tenant_id, plan — is stored in res.locals.identity. This identity is the only auth state that subsequent tool calls consult.
Key rotation (JWKS) runs asynchronously in the background, independent of individual sessions. The JWKS endpoint serves both old and new keys during the grace period. Individual sessions see no change — their jose JWKS client fetches the current key set and finds the key whose kid matches the token header.
Authorization (RBAC) happens inside each tool handler, via the requireScopes wrapper. It reads identity.scopes from the session context (already expanded during the authentication phase) and returns an MCP error response if any required scope is missing. No token re-verification occurs here.
Per-tenant data isolation is enforced structurally in every database query, using identity.tenant_id from the session context. It is not a fifth separate concern but a requirement that follows from RBAC: RBAC controls which tools can be called; tenant isolation controls which data those tools see.

The composition rule between API keys and OAuth+JWT: they produce the same output (an McpIdentity with scopes) and feed into the same RBAC layer. Choose based on whether you need federated identity. If the client is also yours — an AliveMCP probe, an internal integration, a CI/CD pipeline — API keys are simpler and equally correct. If the client is a third-party LLM platform or a user's own agent, OAuth+JWT is necessary because the issuing authority is external.

The ordering that enforces the security model in the HTTP middleware stack is:

// Express middleware order matters — each step can reject before the next runs
app.use(correlationId);    // attach request ID for log correlation
app.use(structuredLogger); // log every request with correlation ID
app.use(rateLimiter);      // reject before auth to save auth overhead on floods
app.use(jwtOrApiKeyAuth);  // populate res.locals.identity or reject
app.use(mcpTransport);     // MCP SDK reads res.locals.identity for session binding

Rate limiting before auth is a deliberate choice: it prevents credential-stuffing attacks from reaching the JWT validation or API key hash-comparison step, where even constant-time operations consume CPU.
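
The combined jwtOrApiKeyAuth step needs a way to decide which validator to run. One possible approach, assuming API keys are also sent as Bearer credentials, is to dispatch on the mcp_ prefix; classifyCredential is a hypothetical helper and this is not necessarily how the middleware named above is implemented.

```typescript
type CredentialKind = 'api_key' | 'jwt' | 'none';

// Inspects the Authorization header and picks the validation path.
function classifyCredential(authHeader: string | undefined): CredentialKind {
  if (!authHeader || !authHeader.startsWith('Bearer ')) return 'none';
  const credential = authHeader.slice('Bearer '.length);
  if (credential.startsWith('mcp_')) return 'api_key';
  // A compact JWS has exactly three dot-separated segments.
  return credential.split('.').length === 3 ? 'jwt' : 'none';
}

console.log(classifyCredential('Bearer mcp_live_ab12cd34_' + 'a'.repeat(64))); // api_key
console.log(classifyCredential('Bearer aaa.bbb.ccc'));                         // jwt
console.log(classifyCredential(undefined));                                    // none
```

Classification only selects a validator; it grants nothing by itself, so a forged mcp_ string still has to pass the hash comparison.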

The Gap External Probes Fill

A correctly implemented five-concern auth system still has a class of failure that internal auth checks cannot detect: failures that prevent the auth system itself from operating.

The JWKS endpoint is unreachable: createRemoteJWKSet fetches keys lazily, so the first jwtVerify call after a cold start fails and the server rejects every new session with a 401 even though the MCP server process is running
The authorization server is returning 500s — new tokens cannot be issued; existing sessions are unaffected, but no new sessions can open
A JWKS rotation removed an old key too early — all sessions holding tokens signed by the old key begin failing silently, at the rate of JWKS cache expiry
A misconfigured audience value — tokens that were valid yesterday start failing today because an environment variable was changed without updating the auth server configuration
TLS certificate expiry on the JWKS endpoint — the JWKS fetch fails with a certificate error; the JWKS cache serves stale keys until they expire, then all new sessions fail

These failure modes produce 401 errors at the session open boundary, not inside tool handlers. An internal health check endpoint that calls a tool will appear healthy — the tool handler is running correctly. An external probe that completes a full session (open with a real credential, call a tool, verify the response, close) catches these cases because it exercises the entire auth pipeline from the outside.

AliveMCP probes complete the full MCP session lifecycle every 60 seconds using a dedicated probe credential with a minimal health:ping scope. A sustained 401 at session open — while the MCP server was returning 200s one minute ago — is the signature of an auth infrastructure failure rather than a server failure. The uptime dashboard reports these as distinct event types so operators can route them to the auth infrastructure team, not the application on-call rotation.
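
The "sustained 401 after recent success" signature can be expressed as a rule over recent probe results. This is an illustrative sketch, not AliveMCP's implementation: the event names, the sustainedCount threshold, and the function name are all assumptions.

```typescript
type ProbeEvent = 'no_data' | 'healthy' | 'auth_infrastructure_failure' | 'other_failure';

const is2xx = (status: number): boolean => status >= 200 && status < 300;

// statuses: session-open HTTP status codes, oldest first, one per probe run.
function classifyProbeHistory(statuses: number[], sustainedCount = 2): ProbeEvent {
  if (statuses.length === 0) return 'no_data';
  const latest = statuses[statuses.length - 1];
  if (latest !== undefined && is2xx(latest)) return 'healthy';

  const tail = statuses.slice(-sustainedCount);
  const earlier = statuses.slice(0, -sustainedCount);
  const sustained401 = tail.length === sustainedCount && tail.every(s => s === 401);
  // Auth infrastructure failure: repeated 401s on a server that was recently healthy.
  if (sustained401 && earlier.some(is2xx)) return 'auth_infrastructure_failure';
  return 'other_failure';
}

console.log(classifyProbeHistory([200, 200, 401, 401])); // auth_infrastructure_failure
console.log(classifyProbeHistory([200, 401]));           // other_failure (not yet sustained)
console.log(classifyProbeHistory([200, 500, 500]));      // other_failure
```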