Authentication guide · 2026-06-04 · Production MCP servers

MCP Server Authentication and Authorization Guide: JWT Validation, JWKS Rotation, RBAC, OAuth Device Flow, and API Key Management

Most MCP server tutorials add a single Authorization: Bearer check and call it done. That is authentication — the verification that a caller holds a credential issued by an authority you trust. Authorization — the determination of what that verified caller is permitted to call — is a separate concern that most tutorials skip entirely. And neither concern covers the operational question: what happens to in-flight sessions when you rotate the signing keys that underpin your token verification? A complete authentication and authorization system for a production MCP server has five concerns: JWT validation as the verification layer, JWKS key rotation as the operational layer that keeps verification working over time, RBAC as the authorization layer that maps verified identity to permitted tool calls, OAuth 2.0 device flow as the token acquisition mechanism for LLM clients, and API key management as the simpler alternative for deployments you control end-to-end. This guide covers them as a system — how each concern addresses a distinct part of the auth problem, how they compose, and what remains invisible to internal auth checks even when all five are correctly implemented.

Why Auth for MCP Servers Is Different

A conventional HTTP API authenticates each request independently. A client sends credentials with every call; the server verifies them and returns a response. State is request-scoped. If a token expires mid-session, the client retries with a fresh token and the next request works.

MCP servers have two properties that change this model significantly.

First, MCP sessions are long-lived. An MCP session opens with initialize and may persist for minutes or hours while an LLM iterates over a complex task, issuing dozens of tool calls. Authentication happens at session open; subsequent tool calls reuse the session context. If you re-validate the bearer token on each tool call, you will eventually validate a token that has expired mid-session — not because the caller is unauthorised, but because the session outlasted the token TTL. The correct model is: authenticate at initialize, bind the verified identity to the session, reuse it for all subsequent calls.

Second, LLM clients cannot perform browser redirects. The standard OAuth 2.0 authorization code flow assumes the client can open a browser, handle a redirect to a callback URL, and exchange a code for a token. An LLM agent running a task cannot do this. OAuth 2.0 device flow was designed for exactly this constraint: the client requests a device code, displays a short URL and user code, and polls for the token while the user authenticates in a separate browser session. Device flow is the correct acquisition mechanism for MCP clients.

These two properties — session-scoped auth and non-browser clients — mean that a complete auth system for MCP servers must cover: token acquisition (device flow), verification at session open (JWT validation), key rotation that does not break in-flight sessions (JWKS rotation), per-tool-call permission enforcement that does not re-check the token (RBAC), and optionally a simpler credential type for controlled deployments (API keys).
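The session-scoped model described above (authenticate at initialize, bind the identity, reuse it) can be sketched as a small store. This is a minimal illustration, not an MCP SDK API: the SessionStore class, its method names, and the McpIdentity field list are assumptions made for the example.

```typescript
// Hypothetical identity shape; field names mirror the claims used later in this guide.
interface McpIdentity {
  sub: string;
  scopes: readonly string[];
  tenant_id?: string;
}

// Illustrative in-memory store: the identity is written once at initialize
// and only read on later tool calls.
class SessionStore {
  private readonly sessions = new Map<string, McpIdentity>();

  // Called once, after the bearer token has been verified at session open.
  bindIdentity(sessionId: string, identity: McpIdentity): void {
    if (this.sessions.has(sessionId)) {
      throw new Error(`session ${sessionId} already has an identity bound`);
    }
    // Store a frozen copy so tool handlers cannot widen their own scopes.
    this.sessions.set(
      sessionId,
      Object.freeze({ ...identity, scopes: Object.freeze([...identity.scopes]) }),
    );
  }

  // Called on every tool call; no token parsing or expiry check happens here.
  identityFor(sessionId: string): McpIdentity | undefined {
    return this.sessions.get(sessionId);
  }

  // Called when the session closes so stale identities do not accumulate.
  end(sessionId: string): void {
    this.sessions.delete(sessionId);
  }
}
```

A tool call that arrives with an unknown session ID gets undefined back and should be rejected, which keeps the failure mode closed rather than open.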

The Five Concerns and Their Roles

Concern | Phase | What it answers | What it cannot do alone
OAuth 2.0 device flow | Token acquisition | How does an LLM client obtain a valid token? | Does not verify the token on the resource server side; does not control what the token can call
JWT validation | Authentication | Is this token valid, unexpired, and issued for my service? | Does not enforce what the verified caller can call; does not handle key rotation grace periods
JWKS key rotation | Key operations | How do we rotate signing keys without breaking in-flight sessions? | Does not generate tokens; does not enforce permissions; only manages the key lifecycle
RBAC | Authorization | Given verified identity, which tools can this caller invoke? | Does not verify the token; depends on identity already being extracted and bound to the session
API key management | Alternative credential | How do controlled clients authenticate without OAuth? | Does not federate identity; the issuing system must be the same system that validates (no third-party auth server)

The table shows the composition logic: OAuth produces the credential; JWT validation verifies it; JWKS rotation keeps verification infrastructure current; RBAC turns verified identity into an access decision; API keys are a parallel path that skips OAuth and JWT entirely but produces the same result (a verified identity with a scope list) that RBAC consumes.

OAuth 2.0 Device Flow: How LLM Clients Get Tokens

Device flow is the token acquisition layer — the mechanism by which an LLM client gets a JWT it can present to your MCP server. It exists because the standard authorization code flow assumes browser redirect capability that LLM agents do not have.

The flow has four phases. First, the client posts to the device authorization endpoint to get a device code and a verification URI. Second, it displays the URI and user code to the user (or passes them to the human-in-the-loop approval step). Third, it polls the token endpoint with the device code. Fourth, it receives an access token (and optionally a refresh token) when the user completes authorization.

// `metadata` is the authorization server's discovery document, fetched beforehand;
// it supplies device_authorization_endpoint and token_endpoint.
async function acquireTokenViaDeviceFlow(metadata: {
  device_authorization_endpoint: string;
  token_endpoint: string;
}) {
  // Phase 1: request device and user codes
  const deviceResponse = await fetch(metadata.device_authorization_endpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: new URLSearchParams({
      client_id: process.env.OAUTH_CLIENT_ID!,
      scope: 'openid profile mcp:tools',
    }),
  });
  const { device_code, user_code, verification_uri, verification_uri_complete, interval } =
    await deviceResponse.json();

  // Phase 2: show the user where to go.
  // verification_uri is always returned; verification_uri_complete is optional.
  if (verification_uri_complete) console.log(`Authorize at: ${verification_uri_complete}`);
  console.log(`Visit ${verification_uri} and enter: ${user_code}`);

  // Phase 3: poll at the server-provided interval (5 seconds if none is given)
  let pollInterval = interval ?? 5;
  while (true) {
    await new Promise(r => setTimeout(r, pollInterval * 1000));
    const tokenResponse = await fetch(metadata.token_endpoint, {
      method: 'POST',
      headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
      body: new URLSearchParams({
        client_id: process.env.OAUTH_CLIENT_ID!,
        device_code,
        grant_type: 'urn:ietf:params:oauth:grant-type:device_code',
      }),
    });
    const result = await tokenResponse.json();
    if (result.access_token) return result; // Phase 4: done
    if (result.error === 'slow_down') { pollInterval += 5; continue; }
    // Anything other than authorization_pending (access_denied, expired_token, ...) is terminal
    if (result.error !== 'authorization_pending') throw new Error(result.error);
  }
}

The slow_down error is mandatory to handle: if the client polls too frequently, the authorization server returns slow_down and requires the interval to increase by 5 seconds. Ignoring this causes the client to be rate-limited and the device code to expire before authorization completes.
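The polling rules can also be isolated into a pure function so they are easy to unit-test without a network. The sketch below is one possible factoring, not part of any OAuth library: the decidePoll name and the PollDecision shape are assumptions for illustration.

```typescript
// Shape of a token endpoint response while polling during device flow.
interface TokenPollResult {
  access_token?: string;
  error?: string;
}

type PollDecision =
  | { action: 'done'; accessToken: string }
  | { action: 'wait'; nextIntervalSeconds: number }
  | { action: 'fail'; reason: string };

// Decide what to do after one poll response; the caller owns the sleep and the fetch.
function decidePoll(result: TokenPollResult, intervalSeconds: number): PollDecision {
  if (result.access_token) {
    return { action: 'done', accessToken: result.access_token };
  }
  switch (result.error) {
    case 'authorization_pending':
      // User has not finished yet: keep the current interval.
      return { action: 'wait', nextIntervalSeconds: intervalSeconds };
    case 'slow_down':
      // Server asked for less frequent polling: add 5 seconds for all later polls.
      return { action: 'wait', nextIntervalSeconds: intervalSeconds + 5 };
    default:
      // access_denied, expired_token, or anything unexpected ends the flow.
      return { action: 'fail', reason: result.error ?? 'unknown_error' };
  }
}
```

Keeping the decision separate from the loop means the slow_down increment cannot be skipped by accident when the loop is later rewritten.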

For machine-to-machine MCP integrations where there is no user — for example, an AliveMCP probe authenticating against a monitored MCP server — the client credentials grant (grant_type=client_credentials) is the correct choice. Client credentials require no user interaction and produce a token immediately. The access token is then presented to your MCP server as a Bearer token in the Authorization header on the initialize request, where JWT validation takes over.
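A client credentials request might look like the sketch below. It assumes the authorization server accepts the client secret in the form body; many servers prefer or require HTTP Basic authentication instead, so check yours. The function names are illustrative, and any scope value passed in (the usage note reuses health:ping from later in this guide) is an assumption about how your scopes are named.

```typescript
// Build the form body for a client credentials token request.
function buildClientCredentialsBody(
  clientId: string,
  clientSecret: string,
  scope: string,
): URLSearchParams {
  return new URLSearchParams({
    grant_type: 'client_credentials',
    client_id: clientId,
    client_secret: clientSecret,
    scope,
  });
}

// Exchange the credentials for an access token; no user interaction is involved.
async function fetchClientCredentialsToken(
  tokenEndpoint: string,
  body: URLSearchParams,
): Promise<string> {
  const response = await fetch(tokenEndpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body,
  });
  if (!response.ok) {
    throw new Error(`token request failed with HTTP ${response.status}`);
  }
  const json = (await response.json()) as { access_token?: string };
  if (!json.access_token) {
    throw new Error('token response did not contain an access_token');
  }
  return json.access_token;
}
```

A probe would call buildClientCredentialsBody with its own client ID, secret, and a minimal scope such as health:ping, then pass the returned token as the Bearer credential on initialize.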

JWT Validation: Verification at the Transport Boundary

JWT validation is the authentication layer — it answers whether the bearer token presented at session open is valid, unexpired, and issued by the correct authority for your specific service. It runs exactly once per session, in HTTP middleware before the MCP initialize message is processed.

The three options that must always be set on jwtVerify are algorithms, issuer, and audience. Omitting any degrades the check:

Omitted option | What an attacker can now do
algorithms | Present a token with "alg": "none" or with HS256 using a brute-forced or extracted symmetric key
issuer | Present a valid token issued by a different authorization server, one the attacker controls
audience | Present a valid token issued for a different resource server, one that shares the same authorization server

import { createRemoteJWKSet, jwtVerify, errors as JoseErrors } from 'jose';

// Module-level singleton — do NOT create per-request
const JWKS = createRemoteJWKSet(
  new URL(`${process.env.AUTH_ISSUER}/.well-known/jwks.json`),
  { cacheMaxAge: 10 * 60 * 1000, cooldownDuration: 30_000 }
);

async function jwtAuthMiddleware(req, res, next) {
  const authHeader = req.headers['authorization'];
  if (!authHeader?.startsWith('Bearer ')) {
    return res.status(401).json({ error: 'missing_token' });
  }
  const token = authHeader.slice(7);
  try {
    const { payload } = await jwtVerify(token, JWKS, {
      algorithms: ['RS256', 'ES256'],   // never allow HS256 or none
      issuer:    process.env.AUTH_ISSUER,
      audience:  process.env.AUTH_AUDIENCE,
    });
    res.locals.identity = {
      sub:       payload.sub,
      scopes:    (payload['scope'] as string)?.split(' ') ?? [],
      plan:      payload['plan']      as string | undefined,
      tenant_id: payload['tenant_id'] as string | undefined,
    };
    next();
  } catch (err) {
    if (err instanceof JoseErrors.JWTExpired)
      return res.status(401).json({ error: 'token_expired' });
    if (err instanceof JoseErrors.JWTClaimValidationFailed)
      return res.status(401).json({ error: 'invalid_claims' });
    return res.status(401).json({ error: 'invalid_token' });
  }
}

Two error distinctions matter for client behaviour. token_expired tells the client its token has expired — it should use its refresh token to get a new one and retry. invalid_token tells the client the token is corrupt or was issued for the wrong service — there is no recovery without re-authentication. Clients that cannot distinguish these errors will either hammer the server with unrecoverable retry loops or fail to refresh when they could have.
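On the client side, the distinction can be reduced to a small decision function. This is a sketch under the assumption that the client reads the error field from the 401 body shown in the middleware above; the actionFor401 name and the ClientAction values are hypothetical.

```typescript
type ClientAction = 'refresh_and_retry' | 'reauthenticate';

// Map the server's 401 error code to the only two sensible client reactions.
function actionFor401(error: string, hasRefreshToken: boolean): ClientAction {
  // An expired token is recoverable, but only if there is a refresh token to use.
  if (error === 'token_expired' && hasRefreshToken) {
    return 'refresh_and_retry';
  }
  // missing_token, invalid_claims, invalid_token, and unknown codes cannot be
  // fixed by retrying with the same credential.
  return 'reauthenticate';
}
```

Treating unknown error codes as non-recoverable keeps a misbehaving client from retrying indefinitely against the server.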

The cacheMaxAge and cooldownDuration options on createRemoteJWKSet should always be set deliberately. Without a cooldown, an attacker can exploit JWKS cache misses by sending tokens with arbitrary kid values: each unknown kid triggers a fetch to the JWKS endpoint, which can overload the authorization server or get your server rate-limited by it. A 30-second cooldown caps this at two fetches per minute regardless of how many unknown kid values are presented.

JWKS Key Rotation: Zero-Downtime Key Operations

JWKS key rotation is the operational layer — the procedure for retiring old signing keys and introducing new ones without terminating in-flight MCP sessions. It is the concern that most auth documentation omits entirely, despite being the most operationally disruptive if done incorrectly.

The failure mode is specific to MCP's long-lived session model. In a stateless REST API, token expiry is a normal event: the client gets a 401, uses its refresh token, and retries with a fresh access token signed by the new key. In an MCP session, the session is open and the LLM is mid-task. If the old key disappears from JWKS while a session is in progress, the next JWKS cache refresh will cause all tokens signed by the old key to fail validation — not because they expired, but because their signing key no longer exists.

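The solution is a grace period: keep the old public key in JWKS after you stop signing with it, for at least as long as any token or session that depends on it can still be live. Typical values: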

Scenario | Grace period
Short-lived tokens (1h TTL), short sessions (<1h) | 1 hour
Short-lived tokens (1h TTL), long sessions (up to 8h) | 8 hours
Long-lived tokens (24h TTL) | 24 hours
Emergency rotation (key compromised) | 0 (accept session disruption)
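One way to read the table is that the grace period is the longer of the token TTL and the maximum session length, with emergency rotation overriding both. The helper below encodes that reading; it is an interpretation of the table rather than a rule stated by any specification, and the function name is an assumption.

```typescript
// Grace period in hours before the old key may be removed from JWKS.
function gracePeriodHours(
  tokenTtlHours: number,
  maxSessionHours: number,
  emergency = false,
): number {
  // A compromised key is removed immediately; in-flight sessions are sacrificed.
  if (emergency) return 0;
  // Otherwise the old key must outlive both the longest token and the longest session.
  return Math.max(tokenTtlHours, maxSessionHours);
}
```

For the third row of the table, where only the 24-hour token TTL is given, any session length up to 24 hours yields the same 24-hour result.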

The zero-downtime rotation sequence is: generate the new key pair → publish the new public key to the JWKS endpoint alongside the old one → begin signing new tokens with the new private key → wait for the grace period → verify no tokens signed by the old key are still in active sessions (check last_used_at in the authorization server's key table) → remove the old key from JWKS → archive the old private key.

# Generate new key (example: RS256)
openssl genrsa -out keys/new-private.pem 2048
openssl rsa -in keys/new-private.pem -pubout -out keys/new-public.pem

# At this point: JWKS has old key + new key
# Authorization server signs new tokens with new private key
# Old tokens (signed by old key) remain valid while old key is in JWKS

# After grace period: check last_used_at for old key
# If recent use: extend grace period
# If no recent use: proceed

# Remove old key from JWKS (update key-set configuration)
# Archive old private key (keep in cold storage for audit)
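The "old key + new key" state in the sequence above can be sketched with Node's built-in crypto module. This is an illustration only: the kid values key-old and key-new and the helper names are assumptions, and the demo generates throwaway keys in-process rather than loading the PEM files produced by the openssl commands.

```typescript
import { generateKeyPairSync, KeyObject } from 'node:crypto';

// Convert a public key into a JWKS entry, adding the metadata verifiers select on.
function toJwksEntry(publicKey: KeyObject, kid: string): Record<string, unknown> {
  return { ...publicKey.export({ format: 'jwk' }), kid, use: 'sig', alg: 'RS256' };
}

// Build the document served at the JWKS endpoint from the keys currently published.
function buildJwks(
  published: Array<{ kid: string; publicKey: KeyObject }>,
): { keys: Record<string, unknown>[] } {
  return { keys: published.map(k => toJwksEntry(k.publicKey, k.kid)) };
}

// Demo keys; a real deployment would load existing keys instead of generating them here.
const oldPair = generateKeyPairSync('rsa', { modulusLength: 2048 });
const newPair = generateKeyPairSync('rsa', { modulusLength: 2048 });

// During the grace period both public keys are published.
const jwksDuringGrace = buildJwks([
  { kid: 'key-old', publicKey: oldPair.publicKey },
  { kid: 'key-new', publicKey: newPair.publicKey },
]);

// After the grace period only the new key remains.
const jwksAfterGrace = buildJwks([{ kid: 'key-new', publicKey: newPair.publicKey }]);
```

Only public key material is exported here; the private halves stay with the authorization server and never appear in the published set.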

AliveMCP detects failed rotations from outside. The probe token (signed by the old key) begins returning HTTP 401 within 60 seconds of a key disappearing from JWKS. This is reported as a sustained 401 spike on a server that was healthy 60 seconds ago — distinct from an expired-token 401 (which the probe handles with credential rotation) and distinct from an invalid-token 401 (which signals a configuration error). The external probe catches a failed JWKS rotation immediately, before users begin reporting session failures.

RBAC: From Verified Identity to Permitted Tool Calls

RBAC is the authorization layer — it answers "given that we have verified who this caller is, which of our tools are they allowed to invoke?" It operates on the verified identity that JWT validation (or API key validation) has bound to the session, not on the token directly.

The central structural decision is where the permission model lives. The wrong answer is to put role checks in individual tool handlers: if (identity.plan !== 'team') return { isError: true }. Scattered role checks become inconsistent as the tool set grows, and they make it impossible to audit the full permission model without reading every handler.

The correct answer is a central TOOL_PERMISSIONS map and a requireScopes wrapper that every tool handler uses:

// Single source of truth for the entire permission model
const TOOL_PERMISSIONS: Record<string, string[]> = {
  'server_status':   ['health:ping'],              // free public tier
  'endpoint_list':   ['data:read'],                // author tier
  'alert_configure': ['data:write'],               // author tier
  'team_dashboard':  ['data:read', 'team:access'], // team tier
  'sla_export':      ['admin:reports'],            // enterprise tier
};

// Scope expansion: roles map to scope sets at identity extraction time
const ROLE_SCOPE_EXPANSION: Record<string, string[]> = {
  'author': ['health:ping', 'data:read', 'data:write'],
  'team':   ['health:ping', 'data:read', 'data:write', 'team:access'],
  'admin':  ['health:ping', 'data:read', 'data:write', 'team:access', 'admin:reports'],
};

function requireScopes(requiredScopes: string[]) {
  return (identity: McpIdentity, toolName: string) => {
    const missing = requiredScopes.filter(s => !identity.scopes.includes(s));
    if (missing.length === 0) return null; // allow
    logger.warn({ event: 'rbac_deny', tool: toolName, sub: identity.sub,
                  tenant_id: identity.tenant_id, required: requiredScopes,
                  caller_scopes: identity.scopes, missing });
    return { isError: true, content: [{ type: 'text',
      text: `Insufficient permissions. Required: ${requiredScopes.join(', ')}`
    }]};
  };
}

Two details matter for correctness. First, scope expansion happens at identity extraction time — when res.locals.identity is populated in JWT middleware, a role claim like "role": "team" is immediately expanded to its full scope list via ROLE_SCOPE_EXPANSION. Individual tool handlers receive a scope list, never a role string. This prevents role-checking logic from being scattered across handlers and ensures the expansion logic has exactly one canonical location.
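Scope expansion at extraction time could look like the sketch below. It assumes the token carries an optional space-separated scope claim and an optional role claim, and it repeats the role map from the listing above so the example is self-contained; the extractScopes name is hypothetical.

```typescript
// Same role-to-scope mapping as ROLE_SCOPE_EXPANSION above, repeated for a standalone example.
const ROLE_SCOPES: Record<string, string[]> = {
  author: ['health:ping', 'data:read', 'data:write'],
  team: ['health:ping', 'data:read', 'data:write', 'team:access'],
  admin: ['health:ping', 'data:read', 'data:write', 'team:access', 'admin:reports'],
};

// Turn raw token claims into the flat scope list that tool handlers see.
function extractScopes(payload: { scope?: unknown; role?: unknown }): string[] {
  // Explicit scopes arrive as one space-separated string.
  const explicit =
    typeof payload.scope === 'string' ? payload.scope.split(' ').filter(Boolean) : [];
  // An unknown role expands to nothing, so the default is to grant no access.
  const fromRole =
    typeof payload.role === 'string' ? ROLE_SCOPES[payload.role] ?? [] : [];
  // Deduplicate while preserving first-seen order.
  return Array.from(new Set([...explicit, ...fromRole]));
}
```

In the JWT middleware, the scopes field of res.locals.identity would then be populated from extractScopes(payload) rather than from the scope claim alone.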

Second, per-tenant data isolation requires a structural constraint beyond RBAC. RBAC controls which tools a session can call; it does not control which tenant's data those tools return. Every database query for tenant data must include WHERE tenant_id = $1 with the tenant_id from the verified identity — not from a request parameter, which could be tampered with. Cross-tenant requests should return a generic "not found" response, not an "access denied" — revealing that a resource exists for another tenant is itself an information leak.
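The tenant constraint can be made hard to forget by routing reads through a helper that always takes the tenant from the verified identity. The sketch below assumes a hypothetical endpoints table and a minimal Queryable interface modelled loosely on a parameterised-query client; neither comes from the original guide.

```typescript
// Minimal query interface: SQL text plus positional parameters.
interface Queryable {
  query(text: string, values: unknown[]): Promise<{ rows: Record<string, unknown>[] }>;
}

interface TenantIdentity {
  sub: string;
  tenant_id?: string;
}

type LookupResult =
  | { found: true; endpoint: Record<string, unknown> }
  | { found: false };

// Fetch one endpoint, scoped to the caller's tenant.
async function getEndpointForTenant(
  db: Queryable,
  identity: TenantIdentity,
  endpointId: string,
): Promise<LookupResult> {
  // No tenant on the identity means no tenant data, never "all tenants".
  if (!identity.tenant_id) return { found: false };
  const { rows } = await db.query(
    'SELECT id, url FROM endpoints WHERE tenant_id = $1 AND id = $2',
    [identity.tenant_id, endpointId],
  );
  // A row owned by another tenant is indistinguishable from a missing row.
  return rows.length === 1 ? { found: true, endpoint: rows[0] } : { found: false };
}
```

Because the helper never accepts a tenant ID from the request, a tampered parameter can change which row is asked for but not whose rows are searched.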

API Key Management: The Simpler Alternative

API key management is the parallel path for MCP servers where you control both the client and the server — internal tool integrations, operator dashboards, CI/CD pipelines, or the probe credentials that AliveMCP uses to monitor servers. It eliminates OAuth and JWT entirely while preserving the same verified-identity-with-scopes output that RBAC consumes.

The most common mistake in API key design is using UUIDs. A version 4 UUID carries 122 bits of entropy, which is sufficient for database primary keys but not for credentials. Brute-forcing a 256-bit random key takes about 2^255 guesses on average; a 122-bit UUID takes about 2^121. Against a leaked hash database, this difference is significant. Use crypto.randomBytes(32).toString('hex') for 256 bits:

import * as crypto from 'node:crypto';

// Key format: mcp_{env}_{8-char-prefix}_{64-char-secret}
// Prefix: identifies the key in logs without revealing the secret
// Full key: shown once at creation, never stored in plaintext
function generateApiKey(env: 'live' | 'test'): { key: string; prefix: string; hash: string } {
  const secret  = crypto.randomBytes(32).toString('hex');   // 64 hex chars
  const prefix  = secret.slice(0, 8);
  const key     = `mcp_${env}_${prefix}_${secret}`;
  const hash    = crypto.createHash('sha256').update(key).digest('hex');
  return { key, prefix, hash };
}

The mcp_{env}_{prefix}_ format is not just cosmetic. Git secret scanners can be configured to detect tokens matching this pattern, the same way GitHub detects its own ghp_-prefixed tokens. If a key is accidentally committed, the scanner fires. And when a log line needs to reference a key, logging only the prefix identifies which key it is without exposing the rest of the secret.
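A detection pattern for this format, usable in a scanner rule or a log redaction step, might look like the following sketch. The regular expression is derived from the key format described in this guide; the helper names and the [REDACTED] marker are assumptions.

```typescript
// Matches mcp_{live|test}_{8 hex chars}_{64 hex chars} anywhere in a string.
const MCP_KEY_SOURCE = 'mcp_(?:live|test)_[0-9a-f]{8}_[0-9a-f]{64}';

// True if the text contains something shaped like a full API key.
function containsApiKey(text: string): boolean {
  // A fresh non-global RegExp avoids lastIndex state leaking between calls.
  return new RegExp(MCP_KEY_SOURCE).test(text);
}

// Replace the secret portion of every key in a log line, keeping the prefix for correlation.
function redactApiKeys(line: string): string {
  return line.replace(new RegExp(MCP_KEY_SOURCE, 'g'), match => {
    const secretStart = match.lastIndexOf('_') + 1;
    return match.slice(0, secretStart) + '[REDACTED]';
  });
}
```

Running redactApiKeys over log lines before they are written keeps the env and prefix visible for correlation while removing the 64-character secret.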

The database schema stores the prefix and hash, never the plaintext key. Lookup uses the prefix as an index (fast B-tree scan on eight characters) and then verifies the hash only if the prefix matches:

-- Never store plaintext. key_prefix is for lookup + log correlation.
CREATE TABLE api_keys (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  key_prefix  TEXT NOT NULL UNIQUE,     -- first 8 chars of secret
  key_hash    TEXT NOT NULL,            -- SHA-256 of the full key
  scopes      TEXT[] NOT NULL DEFAULT '{}',
  tenant_id   UUID REFERENCES tenants(id),
  created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
  last_used_at TIMESTAMPTZ,
  expires_at  TIMESTAMPTZ,
  revoked_at  TIMESTAMPTZ              -- never DELETE rows; preserve audit trail
);

Validation must use constant-time comparison to prevent timing attacks. Using === or bcrypt is wrong for different reasons: === short-circuits on the first mismatched character, leaking timing information; bcrypt adds 100ms+ overhead per request at its recommended cost factor, which is prohibitive for an API key check that happens on every request. The correct approach: SHA-256 the presented key, then compare that digest with the stored hash using crypto.timingSafeEqual:

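// Requires: import * as crypto from 'node:crypto'
function verifyApiKey(presentedKey: string, storedHash: string): boolean {
  const presentedHash = crypto.createHash('sha256').update(presentedKey).digest();
  const storedHashBuf  = Buffer.from(storedHash, 'hex');
  // timingSafeEqual throws if the buffers differ in length, so reject a malformed stored hash first
  if (storedHashBuf.length !== presentedHash.length) return false;
  return crypto.timingSafeEqual(presentedHash, storedHashBuf);
}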

Once validated, the key's scopes array from the database row populates the same McpIdentity shape that JWT validation produces — the same RBAC layer, the same requireScopes wrapper, the same per-tenant query pattern. API key management is an alternative credential, not an alternative auth system.
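Put together, the API key path from presented key to identity might look like the sketch below. Several things here are assumptions for illustration: the lookup callback is synchronous (a real one would query the api_keys table asynchronously), the sub value is synthesised as apikey:{prefix}, and the row type lists only the columns the check needs.

```typescript
import { createHash, timingSafeEqual } from 'node:crypto';

interface ApiKeyRow {
  key_prefix: string;
  key_hash: string; // hex SHA-256 of the full key
  scopes: string[];
  tenant_id: string | null;
  expires_at: Date | null;
  revoked_at: Date | null;
}

interface McpIdentity {
  sub: string;
  scopes: string[];
  tenant_id?: string;
}

// Capture group 1 is the 8-character lookup prefix.
const KEY_FORMAT = /^mcp_(?:live|test)_([0-9a-f]{8})_[0-9a-f]{64}$/;

function authenticateApiKey(
  presentedKey: string,
  findByPrefix: (prefix: string) => ApiKeyRow | undefined,
  now: Date = new Date(),
): McpIdentity | null {
  const match = KEY_FORMAT.exec(presentedKey);
  if (!match) return null; // malformed keys never reach the database
  const row = findByPrefix(match[1]);
  if (!row) return null;

  // Constant-time comparison of the presented key's digest with the stored hash.
  const presented = createHash('sha256').update(presentedKey).digest();
  const stored = Buffer.from(row.key_hash, 'hex');
  if (stored.length !== presented.length || !timingSafeEqual(presented, stored)) return null;

  // A correct key that is revoked or expired is still rejected.
  if (row.revoked_at !== null) return null;
  if (row.expires_at !== null && row.expires_at <= now) return null;

  return {
    sub: `apikey:${row.key_prefix}`,
    scopes: [...row.scopes],
    tenant_id: row.tenant_id ?? undefined,
  };
}
```

The returned object has the same shape the JWT path produces, so requireScopes and the tenant-scoped queries do not need to know which credential type opened the session.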

How the Five Concerns Compose

The five concerns are not alternatives to each other — they address different phases of a single request lifecycle and compose in a specific order:

  1. Token acquisition (OAuth device flow) happens before the MCP session opens. The LLM client polls the authorization server and receives an access token. This is client-side; the MCP server is not involved. For machine-to-machine clients (probes, CI/CD), client credentials flow replaces device flow. For API key clients, this phase is skipped entirely — the key is the credential.
  2. Authentication (JWT validation or API key validation) happens at HTTP middleware before initialize. The verified identity — sub, expanded scopes, tenant_id, plan — is stored in res.locals.identity. This identity is the only auth state that subsequent tool calls consult.
  3. Key rotation (JWKS) runs asynchronously in the background, independent of individual sessions. The JWKS endpoint serves both old and new keys during the grace period. Individual sessions see no change: the server's jose JWKS client fetches the current key set and finds the key whose kid matches the token header.
  4. Authorization (RBAC) happens inside each tool handler, via the requireScopes wrapper. It reads identity.scopes from the session context — already expanded at step 2 — and returns an MCP error response if any required scope is missing. No token re-verification occurs here.
  5. Per-tenant data isolation is enforced structurally in every database query, using identity.tenant_id from the session context. It is not a sixth concern but a requirement that accompanies RBAC: RBAC controls which tools can be called; tenant isolation controls which data those tools see.

The composition rule between API keys and OAuth+JWT: they produce the same output (an McpIdentity with scopes) and feed into the same RBAC layer. Choose based on whether you need federated identity. If the client is also yours — an AliveMCP probe, an internal integration, a CI/CD pipeline — API keys are simpler and equally correct. If the client is a third-party LLM platform or a user's own agent, OAuth+JWT is necessary because the issuing authority is external.

The ordering that enforces the security model in the HTTP middleware stack is:

// Express middleware order matters — each step can reject before the next runs
app.use(correlationId);    // attach request ID for log correlation
app.use(structuredLogger); // log every request with correlation ID
app.use(rateLimiter);      // reject before auth to save auth overhead on floods
app.use(jwtOrApiKeyAuth);  // populate res.locals.identity or reject
app.use(mcpTransport);     // MCP SDK reads res.locals.identity for session binding

Rate limiting before auth is a deliberate choice: it prevents credential-stuffing attacks from reaching the JWT validation or API key hash-comparison step, where even constant-time operations consume CPU.
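The jwtOrApiKeyAuth step in the stack above has to decide which validation path a credential takes. One plausible approach, sketched below, is to dispatch on the key prefix; this assumes API keys are also sent as Bearer credentials, which the guide does not specify, and the classifyCredential name is hypothetical.

```typescript
type Credential =
  | { kind: 'none' }
  | { kind: 'api_key'; token: string }
  | { kind: 'jwt'; token: string };

// Decide which validation path an Authorization header should take.
function classifyCredential(authorizationHeader: string | undefined): Credential {
  if (!authorizationHeader?.startsWith('Bearer ')) return { kind: 'none' };
  const token = authorizationHeader.slice('Bearer '.length).trim();
  if (token === '') return { kind: 'none' };
  // API keys are recognisable by their fixed prefix; everything else is treated as a JWT.
  if (token.startsWith('mcp_live_') || token.startsWith('mcp_test_')) {
    return { kind: 'api_key', token };
  }
  return { kind: 'jwt', token };
}
```

Classification only selects a path; the token is still untrusted until the JWT or API key validation for that path succeeds.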

The Gap External Probes Fill

A correctly implemented five-concern auth system still has a class of failure that internal auth checks cannot detect: failures that prevent the auth system itself from operating.

These failure modes produce 401 errors at the session open boundary, not inside tool handlers. An internal health check endpoint that calls a tool will appear healthy — the tool handler is running correctly. An external probe that completes a full session (open with a real credential, call a tool, verify the response, close) catches these cases because it exercises the entire auth pipeline from the outside.

AliveMCP probes complete the full MCP session lifecycle every 60 seconds using a dedicated probe credential with a minimal health:ping scope. A sustained 401 at session open — while the MCP server was returning 200s one minute ago — is the signature of an auth infrastructure failure rather than a server failure. The uptime dashboard reports these as distinct event types so operators can route them to the auth infrastructure team, not the application on-call rotation.

Further Reading

This guide synthesises the five batch-18 authentication and authorization deep-dive pages, each of which covers one concern in implementation detail.

For the observability stack that instruments your auth layer — capturing denied tool calls, auth failures, and JWKS fetch errors as structured events — see the observability stack guide. For the infrastructure hardening layer that sits in front of auth — API gateway JWT pre-validation, TLS termination, rate limiting, and secrets management for auth credentials — see the infrastructure hardening guide.