
Monitoring GCP-hosted MCP servers

Cloud Run is GCP's natural home for serverless MCP servers: fast deploys, automatic HTTPS, and scale-to-zero within a free tier. But GCP-hosted MCPs carry their own failure taxonomy: Cloud Run cold starts, Identity-Aware Proxy (IAP) authentication failures, Workload Identity Federation misconfiguration, and VPC Service Controls perimeter violations. An HTTP ping diagnoses none of these, because it never reaches the MCP protocol layer. Here's what each failure mode looks like in a probe log and how to wire your monitoring to catch it.

TL;DR

Cloud Run MCPs running with the default minimum of zero instances scale to zero after roughly 15 minutes of inactivity, and the next request pays a cold start: 1–5 seconds for Node, up to 15 seconds for the JVM. IAP-protected Cloud Run MCPs return a 403 to any probe that doesn't carry a valid Google-signed OIDC token. Workload Identity Federation misconfiguration surfaces as 403s on calls to GCP APIs, so the MCP appears up while every tool that touches GCP fails. VPC-SC violations look like the same 403s, with a PERMISSION_DENIED cause that only appears in Cloud Logging. An MCP-aware monitor probes the full handshake and a real tool call; a plain HTTP monitor either stays green while tool calls fail (the last two cases) or reports a generic outage without the cause (the first two).
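
To make "probes the full handshake" concrete, here is a minimal sketch of the first step: building a JSON-RPC initialize request and checking that the reply is a real MCP initialize result rather than any 200 response. The protocol version string, the client name, and the function names are illustrative assumptions, and the sketch assumes the JSON message has already been extracted from the HTTP or SSE response.

```python
import json

# Assumption: an illustrative MCP protocol revision; use the one your server speaks.
PROTOCOL_VERSION = "2025-03-26"


def build_initialize_request(request_id: int = 1) -> dict:
    """Build the JSON-RPC 2.0 `initialize` request that opens an MCP session."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {},
            # Hypothetical client identity for the probe.
            "clientInfo": {"name": "example-probe", "version": "0.1.0"},
        },
    }


def handshake_ok(http_status: int, body: str, request_id: int = 1) -> bool:
    """Return True only if the reply is a well-formed initialize result."""
    if http_status != 200:
        return False
    try:
        message = json.loads(body)
    except ValueError:
        # A 200 with an HTML or empty body is "up" for HTTP, but not for MCP.
        return False
    if not isinstance(message, dict) or message.get("id") != request_id:
        return False
    result = message.get("result")
    return (
        isinstance(result, dict)
        and "protocolVersion" in result
        and "serverInfo" in result
    )
```

A plain HTTP monitor stops at the status code; the second function is the part that tells a healthy MCP endpoint from a load balancer page that happens to return 200.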

GCP hosting patterns for MCP servers

Most GCP-hosted MCP servers fall into one of three patterns:

Cloud Run (public endpoint): The most common pattern. Cloud Run provides a managed HTTPS endpoint, automatic TLS, and scale-to-zero. The free tier includes 2 million requests/month and 360,000 GB-seconds of memory. Failure modes: cold start, request timeout (configurable up to 3600 seconds, far more generous than AWS API Gateway's 29 seconds), concurrency limit (default 80 per instance), and CPU throttling between requests while an instance sits idle.
Cloud Run (IAP-protected, internal): Enterprise MCPs that shouldn't be publicly accessible. IAP sits in front of Cloud Run and requires a Google-signed OIDC token on every request. Any monitoring probe — including AliveMCP — needs a service account token with IAP access to probe these endpoints. Failure modes: token expiry, service account key rotation, IAP policy changes removing the monitoring service account's access.
GKE (Autopilot or Standard) + GKE Gateway: The step-up pattern for MCP servers that need GPU access, persistent storage, or fine-grained network policies. Failure modes: pod eviction during scale events, node pool upgrade rolling restarts, GKE Gateway backend health check failures.

Failure mode 1: Cloud Run cold start

Cloud Run scales a service down to its configured minimum number of instances (default: 0). With the default, an idle instance is deallocated after at most about 15 minutes without traffic. The first request after that idle window triggers a cold start: GCP allocates a new container instance, pulls the image from Artifact Registry, initializes the runtime, and starts your MCP server process.

Cold start latency on Cloud Run:

Node.js 20 (minimal MCP server): 800ms–2 seconds. Node starts fast; if your server imports a large dependency bundle (e.g., an SDK with bundled models), add 500ms–3 seconds for initialization.
Python 3.12 (with dependencies cached): 1–4 seconds, depending on the size of the installed packages layer.
JVM (Kotlin/Java MCP server, no JVM flags): 5–15 seconds. JVM cold starts on Cloud Run are often worse than on Lambda because there's no SnapStart equivalent on GCP yet. Quarkus native or GraalVM native image reduces this to sub-second.

The monitoring implication: a standard 10-second probe timeout will expire before a JVM cold start finishes (>10 seconds), and even Node cold starts can trip a tight 2-second timeout on an overloaded free-tier Cloud Run instance. Recommended: set a 30-second probe timeout for Cloud Run MCPs, and use N=3 consecutive-failure hysteresis (so a single cold-start timeout never fires an alert). See MCP server cold start for the full detection and suppression guide.
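
The consecutive-failure rule can be sketched as a small state machine. This is an illustration of the idea, not AliveMCP's implementation; the class and constant names are assumptions.

```python
from dataclasses import dataclass

PROBE_TIMEOUT_SECONDS = 30  # recommended timeout for Cloud Run MCPs
FAILURE_THRESHOLD = 3       # N=3 consecutive failures before alerting


@dataclass
class FailureHysteresis:
    """Fire an alert only after `threshold` failed probes in a row."""

    threshold: int = FAILURE_THRESHOLD
    consecutive_failures: int = 0
    alerting: bool = False

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result; return True when an alert should fire now."""
        if probe_ok:
            # Any success resets the streak and clears the alert state.
            self.consecutive_failures = 0
            self.alerting = False
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.threshold and not self.alerting:
            self.alerting = True
            return True
        return False
```

With this rule, one probe that times out during a cold start is absorbed silently, because the next probe hits a warm instance and resets the counter. An alert fires once on the third consecutive failure and does not repeat while the outage continues.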

Fix to eliminate cold starts: Set --min-instances=1 on your Cloud Run service. This keeps one warm instance running at all times. Cost: approximately $5.40/month for a 256MB, 1-vCPU instance running continuously, comparable to the cheapest always-on VPS tier. Because AliveMCP probes every 60 seconds, well inside the roughly 15-minute idle window, its probes also act as an incidental keep-alive (each probe resets the idle timer), though this is a side effect, not a guarantee.

Failure mode 2: Identity-Aware Proxy (IAP) authentication failure

IAP-protected Cloud Run MCPs require every request to carry an Authorization header with a Google-signed OIDC token scoped to the IAP client ID. If the token is absent, expired, or signed for the wrong audience, IAP returns a 302 redirect to Google's OAuth flow (for browser clients) or a 403 with a JSON body (for API clients). Your MCP server never sees the request.

Probe signature: The probe receives an HTTP 403 with a response body like {"error": "PERMISSION_DENIED", "message": "IAP rejected the request: Missing bearer token"}. The MCP protocol layer is never reached — initialize never runs. This looks like a server-down event at the HTTP layer but is actually an auth configuration issue.
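
A probe can label this case before it attempts any MCP parsing. The sketch below assumes the rejection body has the shape shown in the example above; real IAP responses may differ, so the category names and the body check are illustrative only.

```python
import json


def classify_http_layer(status: int, body: str) -> str:
    """Label a probe response before any MCP parsing is attempted."""
    if status == 200:
        return "reached_mcp"
    if status in (302, 403):
        # Assumption: the JSON body resembles the example rejection above.
        try:
            payload = json.loads(body)
        except ValueError:
            payload = {}
        if isinstance(payload, dict) and payload.get("error") == "PERMISSION_DENIED":
            return "auth_rejected"
        return "auth_suspected"
    return "server_error" if status >= 500 else "unexpected"
```

Routing "auth_rejected" to a credential alert instead of a server-down alert keeps an expired monitoring token from paging whoever owns the MCP server.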

Common IAP failure triggers:

Service account key used for monitoring probes is rotated without updating the monitoring config.
The IAP policy is updated (e.g., a security review removes the monitoring service account from the allowed members list).
The OIDC token audience changes when the IAP OAuth client is recreated (e.g., after project-level IAP reconfiguration).
The monitoring service account is in a different GCP project and the cross-project IAP trust configuration is modified.
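
Several of these triggers come down to a token that is expired or minted for the wrong audience. As a diagnostic sketch, the claims of an OIDC token can be read without any GCP access, since a JWT payload is base64url-encoded JSON. The function names are assumptions, and the code deliberately does not verify the signature, so it is for troubleshooting only and never for authorization.

```python
import base64
import json
import time
from typing import List, Optional


def decode_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload_b64 = token.split(".")[1]
    # base64url segments in JWTs omit padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def token_problems(
    token: str, expected_audience: str, now: Optional[float] = None
) -> List[str]:
    """List the reasons IAP would likely reject this token, if any."""
    now = time.time() if now is None else now
    claims = decode_claims(token)
    problems = []
    if claims.get("aud") != expected_audience:
        problems.append("wrong_audience")
    if claims.get("exp", 0) <= now:
        problems.append("expired")
    return problems
```

Running this against the token the probe actually sent, with the current IAP client ID as `expected_audience`, shows whether the audience changed when the OAuth client was recreated.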

Fix: AliveMCP Author tier ($9/mo) supports OIDC token-based probing for IAP-protected endpoints: provide the service account JSON key and the IAP client ID, and AliveMCP generates a fresh token on each probe. The token refresh cycle (1-hour token lifetime, refreshed every 55 minutes) is handled automatically. Alerts fire if the token refresh fails or if IAP returns a non-200 response, separately from MCP protocol failures — so you can distinguish "our monitoring credential expired" from "the MCP server itself is down."

Failure mode 3: Workload Identity Federation misconfiguration

MCP servers on Cloud Run that call GCP APIs (Vertex AI, BigQuery, Cloud Storage, Firestore) use the attached service account's Workload Identity to authenticate. When this is correctly configured, the Cloud Run service account automatically gets short-lived credentials for GCP API calls — no key management required. It fails when:

The service account is removed from the IAM binding on a GCP resource (e.g., a BigQuery dataset's reader role is revoked during an IAM cleanup), causing 403s on all queries to that resource.
The Cloud Run service is deployed in one GCP project but the GCP resources it accesses (Firestore, Vertex AI endpoint) are in another project — cross-project Workload Identity requires explicit binding and is often missed during migrations.
A Vertex AI model endpoint is deleted or deprecated, and the MCP tool that calls it receives 404 or 410 responses from the Vertex API.

Probe signature: initialize succeeds (authentication to the MCP endpoint is independent of the service account's GCP API permissions). tools/list returns the tool list correctly (tool registration is static). But any tool call that makes a GCP API request returns a JSON-RPC error with an error message containing the GCP API's 403/404 response. This is invisible to HTTP uptime monitors and to monitors that only check initialize and tools/list.

Fix: Configure AliveMCP's credentialed probe to run a lightweight read-only test tool call (e.g., a list operation or a status query) on every probe cycle. If the tool call fails with a permission error while tools/list succeeds, fire a P2 alert ("tools registered but GCP API calls failing") rather than a P1 ("server down"). This distinguishes infrastructure failure from the more common configuration drift.
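
The P1/P2 split can be sketched as a function over the three probe stages. This is an illustration under assumptions: it treats both a JSON-RPC error member and a result flagged with isError as a failed tool call, and it maps a failed tools/list to P1, which the text above does not specify.

```python
from typing import Optional


def tool_call_failure(response: dict) -> Optional[str]:
    """Return an error message if a tools/call response signals failure, else None."""
    if "error" in response:
        return str(response["error"].get("message", "unknown error"))
    result = response.get("result", {})
    if result.get("isError"):
        # Assumption: tool-level errors arrive as text content items.
        texts = [
            item.get("text", "")
            for item in result.get("content", [])
            if item.get("type") == "text"
        ]
        return " ".join(texts) or "tool reported isError"
    return None


def severity(
    initialize_ok: bool, tools_list_ok: bool, tool_error: Optional[str]
) -> Optional[str]:
    """Map the three probe stages to an alert severity, or None if healthy."""
    if not initialize_ok or not tools_list_ok:
        return "P1"  # server down or protocol broken
    if tool_error is not None:
        return "P2"  # tools registered but GCP API calls failing
    return None
```

The message returned by `tool_call_failure` is worth including in the alert text, since it usually names the GCP API and status code that failed.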

Failure mode 4: VPC Service Controls (VPC-SC) perimeter violations

Enterprise GCP deployments use VPC Service Controls to define perimeters around GCP services — access to BigQuery, Cloud Storage, or Vertex AI from outside the perimeter is blocked at the GCP API layer, not at the network layer. A Cloud Run service that makes calls to BigQuery within the same perimeter is fine; if the Cloud Run service's project is moved outside the perimeter (or the perimeter is updated to exclude it), all GCP API calls from the MCP server start failing with PERMISSION_DENIED responses that look like IAM failures but are actually VPC-SC violations.

VPC-SC violations are recorded in Cloud Audit Logs under the cloudaudit.googleapis.com/policy log; they don't appear in the MCP server's own application logs. From the probe's perspective, the failure looks identical to a Workload Identity misconfiguration (tool calls fail, tools/list succeeds). Distinguishing the two requires checking Cloud Audit Logs for violation entries, which requires GCP access and is not something an external probe can do.

Monitoring recommendation: Set up a Cloud Logging metric alert on resource.type="audited_resource" AND protoPayload.status.code=7 AND protoPayload.serviceName=<your-mcp-service-domain> to catch VPC-SC violations early. Combine this internal alert with AliveMCP's external probe so you have both sides: AliveMCP catches the user-facing failure (tool calls return JSON-RPC errors), Cloud Logging catches the root cause (VPC-SC violation).
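
If the filter is generated per service rather than typed by hand, a small helper keeps the three clauses consistent. This sketch only assembles the filter string given above; the function name is an assumption, and the service name passed in is whatever value you would substitute for the placeholder.

```python
def vpc_sc_filter(service_name: str) -> str:
    """Assemble the Cloud Logging filter for PERMISSION_DENIED audit entries."""
    if '"' in service_name:
        raise ValueError("service name must not contain double quotes")
    clauses = [
        'resource.type="audited_resource"',
        "protoPayload.status.code=7",  # 7 is the PERMISSION_DENIED status code
        f'protoPayload.serviceName="{service_name}"',
    ]
    return " AND ".join(clauses)
```

The resulting string can be pasted into the Logs Explorer to check that it matches real entries before it is attached to an alert.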

Monitoring GKE-hosted MCP servers

MCP servers on GKE (Autopilot or Standard) introduce cluster-level failure modes absent from serverless patterns:

Node pool upgrades: GKE Autopilot performs automatic node pool upgrades. During a node upgrade, pods are evicted and rescheduled on the new nodes. If your MCP server doesn't have a PodDisruptionBudget (PDB) set, the upgrade may evict all instances simultaneously, causing a brief outage. Set a PDB with minAvailable: 1.
GKE Gateway backend health check failures: GKE Gateway (successor to GKE Ingress) runs its own health checks against pods. If the health check path doesn't match your MCP server's health route, the backend is marked unhealthy and traffic is dropped. Recommended: expose a /healthz HTTP GET endpoint that returns 200, served alongside your MCP SSE endpoint, and configure the GKE Gateway health check to use that path.
GPU node scale-down: MCPs that use Vertex AI or locally-embedded models on GPU nodes may lose GPU instances during cluster scale-down, forcing cold starts on the next request. GPU node cold starts are significantly longer (2–5 minutes for CUDA runtime initialization) than CPU node cold starts.
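
For the /healthz recommendation in the GKE Gateway item above, a minimal standard-library sketch looks like the following. It is a standalone illustration: in a real deployment the route would be added to whatever HTTP framework already serves the MCP endpoint, and the port and names here are assumptions.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Tuple


def health_response(path: str) -> Tuple[int, bytes]:
    """Return (status, body) for a GET on `path`; only /healthz is healthy."""
    if path == "/healthz":
        return 200, b"ok"
    return 404, b"not found"


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = health_response(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep frequent health-check traffic out of the server's stderr.
        pass


def make_health_server(port: int = 8080) -> ThreadingHTTPServer:
    """Create (but do not start) a server exposing GET /healthz."""
    return ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)
```

Calling `make_health_server().serve_forever()` starts it. The handler does no work beyond returning 200, so it confirms only that the process is alive and accepting connections; it says nothing about whether tool calls succeed, which remains the external probe's job.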