MCP server log aggregation
MCP servers run in containers that are ephemeral: when a container is replaced after a deployment or an OOM kill, its local logs are gone. Log aggregation ships every log line to a durable central store — Grafana Loki, Elasticsearch, or AWS CloudWatch Logs — before the container can be destroyed. Once logs are centralised, you can query across multiple container instances, correlate errors with deployment timestamps, and set up alerts on log patterns (e.g., error rate, circuit-breaker state change). This guide covers the standard pipeline: Pino JSON to stdout → Docker captures it → Promtail or Filebeat ships it to the aggregator → Grafana or Kibana queries it.
TL;DR
Write JSON logs to stdout with Pino. Use Docker's default json-file log driver (logs go to /var/lib/docker/containers/<id>/<id>-json.log). Run Promtail as a sidecar or DaemonSet to tail those files and push to Loki. In Grafana, query with LogQL: {service="my-mcp-server"} | json | level="error". Set up a Loki alert rule for error rate and a recording rule for P99 latency derived from log duration fields. Use the trace_id field for trace-to-log correlation with Grafana Tempo. Pair with AliveMCP external probes for failures that produce no logs at all.
The log shipping pipeline
The application never needs to know where logs are stored. The pipeline is infrastructure, not code:
Application (Pino)
    │  writes NDJSON to stdout
    ▼
Docker runtime
    │  captures stdout via json-file log driver
    │  stores to /var/lib/docker/containers/<id>/<id>-json.log
    ▼
Promtail / Filebeat / Fluentd
    │  tails the Docker log files
    │  parses JSON, adds labels (service, environment, container_id)
    ▼
Log aggregator (Loki / Elasticsearch / CloudWatch)
    │  indexes and stores log records durably
    ▼
Grafana / Kibana / CloudWatch Logs Insights
       queries, alerts, dashboards
The application's contract is: write structured JSON to stdout. Everything downstream is the infrastructure's responsibility. This separation means you can switch from Loki to Elasticsearch (or add both) without changing any application code.
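As a rough illustration of that contract, the sketch below builds one NDJSON record per event without any dependencies. In a real server Pino produces these lines; `formatLogLine` and the `LogFields` shape are hypothetical names, and the field set is an assumption based on the fields this guide queries later.

```typescript
// Dependency-free sketch of the stdout contract. Pino does this for you in
// practice; the names and field set here are illustrative assumptions.
type Level = "debug" | "info" | "warn" | "error";

interface LogFields {
  session_id?: string;
  trace_id?: string;
  tool?: string;
  duration_ms?: number;
}

// Build one NDJSON record: a single JSON object terminated by a newline.
// JSON.stringify escapes embedded newlines, so one record is always one line.
function formatLogLine(
  level: Level,
  msg: string,
  fields: LogFields = {},
  now: Date = new Date()
): string {
  const record = { level, time: now.toISOString(), msg, ...fields };
  return JSON.stringify(record) + "\n";
}

// Hypothetical usage: one line per tool call, written to stdout.
const exampleLine = formatLogLine("info", "tool call completed", {
  session_id: "sess_abc123",
  tool: "search",
  duration_ms: 42,
});
console.log(exampleLine.replace(/\n$/, ""));
```

Nothing in this code refers to Loki, Elasticsearch, or CloudWatch, which is the point of the separation.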
Grafana Loki with Promtail
Loki is a common choice for teams already using Grafana for metrics. It stores logs as compressed chunks in object storage (S3, GCS) or on local disk and indexes only labels, not log content, which typically makes it much cheaper to run than Elasticsearch at scale. Queries use LogQL, which is modelled on PromQL.
Docker Compose setup for a local Loki + Promtail stack:
# docker-compose.yml — Loki + Promtail for MCP server log aggregation
services:
  loki:
    image: grafana/loki:2.9.7
    ports:
      - "3100:3100"
    command: -config.file=/etc/loki/local-config.yaml
  promtail:
    image: grafana/promtail:2.9.7
    volumes:
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/run/docker.sock:/var/run/docker.sock
      - ./promtail-config.yaml:/etc/promtail/config.yaml
    command: -config.file=/etc/promtail/config.yaml
  grafana:
    image: grafana/grafana:10.4.2
    ports:
      - "3000:3000"
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
# promtail-config.yaml — scrape Docker container logs
server:
  http_listen_port: 9080
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
        filters:
          - name: label
            values: ["logging=promtail"]  # only collect from containers with this label
    relabel_configs:
      # Docker reports container names with a leading slash; strip it
      - source_labels: [__meta_docker_container_name]
        regex: '/(.*)'
        target_label: container
      - source_labels: [__meta_docker_container_label_com_docker_compose_service]
        target_label: service
      - source_labels: [__meta_docker_container_label_environment]
        target_label: environment
    pipeline_stages:
      # Docker wraps the container's stdout in a JSON envelope
      - docker: {}
      # Parse the application's Pino JSON log line
      - json:
          expressions:
            level: level
            time: time
            session_id: session_id
            trace_id: trace_id
            msg: msg
            duration_ms: duration_ms
      # Promote level as a Loki label for fast filtering
      - labels:
          level:
      # Use the application's timestamp for the Loki entry. This assumes Pino is
      # configured to emit ISO 8601 timestamps; with Pino's default epoch
      # milliseconds, use format: UnixMs instead.
      - timestamp:
          source: time
          format: RFC3339Nano
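The pipeline stages in the Promtail config above can be pictured as a small transformation applied to each line of Docker's json-file log. The sketch below is a conceptual model only, not Promtail's implementation; `processDockerLine` and `ShippedEntry` are hypothetical names, and it assumes the application emits a string `level` and an ISO `time` field.

```typescript
// Conceptual model of the docker -> json -> labels -> timestamp stages.
// Not Promtail's code; names and shapes are illustrative assumptions.
interface ShippedEntry {
  labels: { [name: string]: string };
  timestamp: string; // timestamp attached to the Loki entry
  line: string;      // the application's original JSON log line
}

function processDockerLine(
  raw: string,
  staticLabels: { [name: string]: string }
): ShippedEntry {
  // docker stage: unwrap the json-file envelope {"log", "stream", "time"}
  const envelope = JSON.parse(raw) as { log: string; stream: string; time: string };
  const line = envelope.log.replace(/\n$/, "");

  // json stage: parse the application's own JSON record
  const app = JSON.parse(line) as { level?: string; time?: string };

  // labels stage: promote only the low-cardinality level field.
  // session_id and trace_id stay in the line and are filtered at query time.
  const labels: { [name: string]: string } = {};
  for (const key of Object.keys(staticLabels)) labels[key] = staticLabels[key];
  if (typeof app.level === "string") labels.level = app.level;

  // timestamp stage: prefer the application's time, else Docker's capture time
  const timestamp = typeof app.time === "string" ? app.time : envelope.time;
  return { labels, timestamp, line };
}

// Hypothetical json-file line as Docker would store it on disk.
const sampleRaw = JSON.stringify({
  log: JSON.stringify({
    level: "error",
    time: "2024-05-01T12:00:00.000Z",
    msg: "tool failed",
    session_id: "sess_abc123",
  }) + "\n",
  stream: "stdout",
  time: "2024-05-01T12:00:00.123456789Z",
});
console.log(processDockerLine(sampleRaw, { service: "mcp-server" }).labels);
```

Keeping high-cardinality fields such as `session_id` out of the label set is what keeps Loki's index small.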
Add the logging=promtail label to your MCP server container:
# docker-compose.yml — tag MCP server for Promtail collection
services:
  mcp-server:
    build: .
    labels:
      - "logging=promtail"
      - "environment=production"
      # com.docker.compose.service is set by Compose automatically from the
      # service name; the com.docker.compose prefix is reserved for Compose.
Querying logs with LogQL
LogQL uses a selector + filter pipeline syntax. Start with label selectors, then pipe through JSON parsers and filters:
# All error-level logs from the MCP server
{service="mcp-server", level="error"}
# All logs for a specific session
{service="mcp-server"} | json | session_id = "sess_abc123"
# Tool calls that took longer than 1 second
{service="mcp-server"} | json | duration_ms > 1000 | line_format "{{.msg}} [{{.duration_ms}}ms]"
# Error rate over 5-minute windows (for alerting)
sum(rate({service="mcp-server", level="error"}[5m]))
/
sum(rate({service="mcp-server"}[5m]))
# Logs for a specific trace_id (trace-to-log correlation)
{service="mcp-server"} | json | trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"
The last query is the trace-to-log correlation query. In Grafana, configure a Loki derived field for trace_id that creates an automatic link to Grafana Tempo. Clicking any log line's trace ID jumps directly to the trace.
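A derived field is essentially a regular expression applied to each log line. The sketch below shows the idea outside Grafana: a hypothetical pattern that pulls a 32-character hex `trace_id` out of a compact JSON line, plus a helper that builds the correlation query shown earlier. The pattern and both function names are assumptions for illustration, not Grafana configuration.

```typescript
// Hypothetical matcher in the spirit of a derived-field regex. Assumes
// compact JSON (no spaces after colons) and 32 lowercase hex characters.
const TRACE_ID_PATTERN = /"trace_id":"([0-9a-f]{32})"/;

// Return the trace ID found in a log line, or null if there is none.
function extractTraceId(logLine: string): string | null {
  const match = TRACE_ID_PATTERN.exec(logLine);
  return match ? match[1] : null;
}

// Build the LogQL trace-to-log query for a given service and trace ID.
function logQueryForTrace(service: string, traceId: string): string {
  return `{service="${service}"} | json | trace_id = "${traceId}"`;
}

const sampleLine = JSON.stringify({
  level: "error",
  msg: "tool failed",
  trace_id: "4bf92f3577b34da6a3ce929d0e0e4736",
});
const foundTraceId = extractTraceId(sampleLine);
if (foundTraceId !== null) {
  console.log(logQueryForTrace("mcp-server", foundTraceId));
}
```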
Alerting from logs with Loki rules
Loki's ruler evaluates Prometheus-compatible alerting rules whose expressions are LogQL metric queries. Write the rules as YAML and place them in the ruler's rule storage (for a local setup, a directory mounted into the Loki container):
# loki-rules.yaml — alert on MCP server error rate
groups:
  - name: mcp-server-logs
    rules:
      - alert: MCPHighErrorRateFromLogs
        expr: |
          sum(rate({service="mcp-server", level="error"}[5m]))
            /
          sum(rate({service="mcp-server"}[5m]))
            > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "MCP server error log rate above 5%"
      - alert: MCPCircuitBreakerOpenLog
        expr: |
          count_over_time(
            {service="mcp-server"} | json | msg =~ "circuit.*open"[1m]
          ) > 0
        # Fire on the first match: a single event stays inside the 1m window
        # for only one minute, so a longer "for" could miss it.
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "MCP circuit breaker open event detected in logs"
Log-based alerts complement metric-based alerts: Prometheus metrics track aggregate rates; Loki rules can alert on specific log message patterns that metrics never capture (e.g., "JWT signing key rotated", "dependency pool exhausted", "graceful shutdown started").
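To make the interaction between the ratio threshold and the `for` clause concrete, here is a minimal model of how an error-rate rule moves from inactive to pending to firing. It is a simplification of ruler behaviour under stated assumptions: one evaluation per sample, and windows with no logs treated as not breaching. `evaluateAlert` and `WindowCounts` are hypothetical names.

```typescript
// Simplified model of alert-rule evaluation, not the ruler's actual code.
interface WindowCounts {
  atMs: number;   // evaluation time in milliseconds
  errors: number; // error-level lines in the lookback window
  total: number;  // all lines in the lookback window
}

type AlertState = "inactive" | "pending" | "firing";

function evaluateAlert(
  evaluations: WindowCounts[],
  threshold: number,
  forMs: number
): AlertState {
  let breachingSince: number | null = null;
  let state: AlertState = "inactive";
  for (const e of evaluations) {
    // Assumption: an empty window yields no ratio, so it cannot breach.
    const ratio = e.total > 0 ? e.errors / e.total : 0;
    if (ratio > threshold) {
      if (breachingSince === null) breachingSince = e.atMs;
      // Pending until the condition has held continuously for forMs.
      state = e.atMs - breachingSince >= forMs ? "firing" : "pending";
    } else {
      breachingSince = null;
      state = "inactive";
    }
  }
  return state;
}

// Hypothetical run: 10% errors at every one-minute evaluation for 5 minutes.
const minute = 60_000;
const sustained: WindowCounts[] = [0, 1, 2, 3, 4, 5].map((i) => ({
  atMs: i * minute,
  errors: 10,
  total: 100,
}));
console.log(evaluateAlert(sustained, 0.05, 5 * minute));
```

A single evaluation below the threshold resets the pending timer, which is why short error bursts do not page anyone under a `for: 5m` rule.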
Filebeat + Elasticsearch (ELK stack)
If your organisation already runs Elasticsearch (Elastic Cloud, self-hosted, or OpenSearch), ship MCP server logs via Filebeat. Filebeat is lighter than Logstash for pure log-shipping scenarios:
# filebeat.yml — ship Docker container logs to Elasticsearch
filebeat.autodiscover:
  providers:
    - type: docker
      hints.enabled: true
      templates:
        - condition:
            contains:
              docker.container.labels.logging: "filebeat"
          config:
            - type: container
              paths:
                - /var/lib/docker/containers/${data.docker.container.id}/*.log
              json.keys_under_root: true
              json.add_error_key: true
              json.message_key: msg
processors:
  - add_docker_metadata: ~
  - drop_fields:
      fields: ["agent", "ecs", "input", "log.offset"]
      ignore_missing: true
output.elasticsearch:
  hosts: ["https://elasticsearch:9200"]
  username: "${ELASTIC_USERNAME}"
  password: "${ELASTIC_PASSWORD}"
  index: "mcp-server-logs-%{+yyyy.MM.dd}"
# A custom index name needs a matching template name and pattern
setup.template.name: "mcp-server-logs"
setup.template.pattern: "mcp-server-logs-*"
In Kibana, create an index pattern (called a data view in Kibana 8 and later) for mcp-server-logs-*. The Pino JSON fields (level, session_id, trace_id, tool, duration_ms) become searchable Kibana fields automatically. Use Kibana Lens to build a dashboard with error rate over time, P99 duration derived from the duration_ms field, and session count by tool.
AWS CloudWatch Logs
If your MCP server runs on AWS, CloudWatch Logs is the lowest-friction option. On ECS, the awslogs log driver ships container logs with no additional agents (Lambda writes to CloudWatch Logs by default; EKS needs a log-shipping agent such as Fluent Bit):
# ECS task definition — CloudWatch log driver
{
  "logConfiguration": {
    "logDriver": "awslogs",
    "options": {
      "awslogs-group": "/ecs/my-mcp-server",
      "awslogs-region": "us-east-1",
      "awslogs-stream-prefix": "ecs",
      "awslogs-create-group": "true"
    }
  }
}
# CloudWatch Logs Insights — query errors in the last 1 hour
fields @timestamp, level, msg, session_id, tool, err.message
| filter level = "error"
| sort @timestamp desc
| limit 100
# CloudWatch Logs Insights — P99 tool call duration
filter ispresent(duration_ms) and ispresent(tool)
| stats pct(duration_ms, 99) as p99_ms, count() as calls by tool
| sort p99_ms desc
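For intuition about what the P99 query aggregates, the sketch below computes a per-tool 99th percentile from parsed log records. It uses the nearest-rank method, which is an assumption: CloudWatch's `pct` may interpolate differently, so results on small samples can differ. `p99ByTool` and `CallLog` are hypothetical names.

```typescript
// Per-tool P99 from parsed log records, using nearest-rank percentiles.
// This approximates the Insights query; it is not CloudWatch's algorithm.
interface CallLog {
  tool?: string;
  duration_ms?: number;
}

// Nearest-rank percentile of a non-empty list of numbers.
function percentile(values: number[], p: number): number {
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

function p99ByTool(
  records: CallLog[]
): { tool: string; p99_ms: number; calls: number }[] {
  const byTool: { [tool: string]: number[] } = {};
  for (const r of records) {
    // Mirrors "filter ispresent(duration_ms) and ispresent(tool)"
    if (typeof r.tool !== "string" || typeof r.duration_ms !== "number") continue;
    (byTool[r.tool] = byTool[r.tool] || []).push(r.duration_ms);
  }
  return Object.keys(byTool)
    .map((tool) => ({
      tool,
      p99_ms: percentile(byTool[tool], 99),
      calls: byTool[tool].length,
    }))
    .sort((a, b) => b.p99_ms - a.p99_ms); // slowest tools first
}

// Hypothetical records; the third lacks a duration and is skipped.
console.log(
  p99ByTool([
    { tool: "search", duration_ms: 120 },
    { tool: "search", duration_ms: 950 },
    { tool: "fetch" },
  ])
);
```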
CloudWatch Logs Metric Filters create CloudWatch Metrics from log patterns, which you can use in CloudWatch Alarms:
# AWS CLI — create metric filter for error log count
aws logs put-metric-filter \
  --log-group-name /ecs/my-mcp-server \
  --filter-name MCPErrorCount \
  --filter-pattern '{ $.level = "error" }' \
  --metric-transformations \
    metricName=MCPErrorCount,metricNamespace=MCPServer,metricValue=1
Log retention and cost
Log storage cost is driven by volume and retention period. A production MCP server at info level generates roughly 1–5 KB per tool call (one log line per call plus session lifecycle events). At 10,000 tool calls per day, that's 10–50 MB/day before compression. Loki's chunk compression typically achieves about 10:1 on JSON logs, which brings that down to roughly 1–5 MB/day stored, so info-level logs stay cheap to keep even with long retention.
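The arithmetic above is easy to rerun for your own traffic. A small sketch, assuming decimal units (1 MB = 1000 KB) and a flat compression ratio; `estimateLogStorage` is a hypothetical helper and the inputs are the figures quoted in this section, not measurements.

```typescript
// Back-of-the-envelope log storage estimate. Assumes decimal units and a
// constant compression ratio; real ratios vary with log content.
interface StorageEstimate {
  rawMbPerDay: number;
  storedMbPerDay: number;
  storedMbOverRetention: number;
}

function estimateLogStorage(
  callsPerDay: number,
  kbPerCall: number,
  compressionRatio: number,
  retentionDays: number
): StorageEstimate {
  const rawMbPerDay = (callsPerDay * kbPerCall) / 1000;
  const storedMbPerDay = rawMbPerDay / compressionRatio;
  return {
    rawMbPerDay,
    storedMbPerDay,
    storedMbOverRetention: storedMbPerDay * retentionDays,
  };
}

// 10,000 calls/day at 1 KB and 5 KB per call, 10:1 compression, 30 days.
console.log(estimateLogStorage(10_000, 1, 10, 30));
console.log(estimateLogStorage(10_000, 5, 10, 30));
```

At the upper bound this works out to 150 MB stored over a 30-day retention window.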
Never ship debug-level logs to your aggregator in production. Debug logs include per-call parameter dumps and intermediate state: they can be 10–100x the volume of info logs, and they may contain sensitive data that should not leave the process. Set LOG_LEVEL=info in production and enable debug logging only for specific sessions during active investigation.
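One way to picture per-session debug logging is a gate that checks the global level first and then an allow-list of sessions under investigation. The sketch below is an assumption about how such a gate could look, not a Pino API; `shouldEmit` and the `debugSessions` list are hypothetical, and the numeric values follow Pino's default level numbers.

```typescript
// Sketch of a per-session debug gate. Not a Pino API; names are illustrative.
type LogLevel = "debug" | "info" | "warn" | "error";

// Numeric ordering matching Pino's default level values.
const LEVEL_ORDER: Record<LogLevel, number> = {
  debug: 20,
  info: 30,
  warn: 40,
  error: 50,
};

function shouldEmit(
  level: LogLevel,
  sessionId: string | undefined,
  baseLevel: LogLevel,
  debugSessions: string[]
): boolean {
  // Anything at or above the base level is always emitted.
  if (LEVEL_ORDER[level] >= LEVEL_ORDER[baseLevel]) return true;
  // Below the base level, only debug lines for allow-listed sessions pass.
  return (
    level === "debug" &&
    sessionId !== undefined &&
    debugSessions.indexOf(sessionId) !== -1
  );
}

// Hypothetical usage: production at info, one session under investigation.
const underInvestigation = ["sess_abc123"];
console.log(shouldEmit("debug", "sess_abc123", "info", underInvestigation));
console.log(shouldEmit("debug", "sess_other", "info", underInvestigation));
```

The allow-list keeps debug volume proportional to the number of sessions being investigated rather than to total traffic.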
Set retention policies matched to your debugging needs:
| Log type | Suggested retention | Reason |
|---|---|---|
| Error logs (level=error) | 90 days | Debugging incidents days after they happen |
| Info logs (tool calls, sessions) | 30 days | Pattern analysis, billing queries |
| Warn logs (level=warn) | 30 days | Tracking recovery from degraded states |
| Debug logs (if shipped at all) | 7 days | Short-term investigation only |
What log aggregation cannot catch
Log aggregation covers failures that produce logs. It cannot catch:
- Process crashes before the logger is initialised (a syntax error in startup code, for example)
- OOM kills where the OS terminates the process without giving it a chance to flush the log buffer
- Network-level failures between the client and the server (DNS failures, TLS errors) that happen before any application code runs
- Failures in the log shipping pipeline itself (if Promtail crashes, logs still exist in Docker's json-file but are not aggregated)
AliveMCP addresses all four: it connects from outside the process, makes a full MCP session attempt, and reports the outcome regardless of whether the server's internal logging pipeline is functional. External probes are the safety net below the log aggregation layer. See the observability overview for how external probes, structured logs, metrics, and distributed traces form a complete observability system for MCP servers.