Guide · Observability

MCP server log aggregation

MCP servers run in containers that are ephemeral: when a container is replaced after a deployment or an OOM kill, its local logs are gone. Log aggregation ships every log line to a durable central store — Grafana Loki, Elasticsearch, or AWS CloudWatch Logs — before the container can be destroyed. Once logs are centralised, you can query across multiple container instances, correlate errors with deployment timestamps, and set up alerts on log patterns (e.g., error rate, circuit-breaker state change). This guide covers the standard pipeline: Pino JSON to stdout → Docker captures it → Promtail or Filebeat ships it to the aggregator → Grafana or Kibana queries it.

TL;DR

Write JSON logs to stdout with Pino, using string level labels and ISO timestamps. Use Docker's default json-file log driver (logs go to /var/lib/docker/containers/<id>/<id>-json.log). Run Promtail on each Docker host (as a DaemonSet on Kubernetes) to collect those logs and push them to Loki. In Grafana, query with LogQL: {service="mcp-server"} | json | level="error". Set up a Loki alert rule for error rate and a recording rule for P99 latency derived from log duration fields. Use the trace_id field for trace-to-log correlation with Grafana Tempo. Pair with AliveMCP external probes for failures that produce no logs at all.

The log shipping pipeline

The application never needs to know where logs are stored. The pipeline is infrastructure, not code:

Application (Pino)
  │  writes NDJSON to stdout
  ▼
Docker runtime
  │  captures stdout via json-file log driver
  │  stores to /var/lib/docker/containers/<id>/<id>-json.log
  ▼
Promtail / Filebeat / Fluentd
  │  tails the Docker log files
  │  parses JSON, adds labels (service, environment, container_id)
  ▼
Log aggregator (Loki / Elasticsearch / CloudWatch)
  │  indexes and stores log records durably
  ▼
Grafana / Kibana / CloudWatch Logs Insights
     queries, alerts, dashboards

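The application's contract is: write structured JSON to stdout. Everything downstream is the infrastructure's responsibility. This separation means you can switch from Loki to Elasticsearch (or add both) without changing any application code. The configs and queries below assume Pino is configured to emit string level labels (`formatters: { level: (label) => ({ level: label }) }`) and ISO 8601 timestamps (`timestamp: pino.stdTimeFunctions.isoTime`). By default Pino writes numeric levels (30 for info, 50 for error) and an epoch-millisecond `time` field.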
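To make the hand-off between the first two stages concrete, here is a minimal Python sketch of what Docker's json-file driver stores for one line of application output, and how a shipper recovers the original record. It models the driver's `log`, `stream`, and `time` envelope fields; the sample record and its values are illustrative only.

```python
import json


def wrap_json_file(app_line: str, stream: str, ts: str) -> str:
    """Wrap one stdout line the way Docker's json-file driver stores it on disk."""
    # Docker keeps the trailing newline inside "log" and records the source stream
    return json.dumps({"log": app_line + "\n", "stream": stream, "time": ts})


def unwrap_json_file(stored: str) -> dict:
    """A shipper's first step: strip the Docker envelope, then parse the app's own JSON."""
    envelope = json.loads(stored)
    return json.loads(envelope["log"])


# Illustrative Pino-style record (string level, ISO time) written as one NDJSON line
record = {"level": "info", "time": "2024-05-01T12:00:00.000Z", "msg": "tool call finished",
          "session_id": "sess_abc123", "duration_ms": 42}
stored = wrap_json_file(json.dumps(record), "stdout", "2024-05-01T12:00:00.000000001Z")
print(stored)
```

The application only ever produces the inner line; the envelope is added by Docker and removed again by the shipper.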

Grafana Loki with Promtail

Loki is the most common choice for teams already using Grafana for metrics. It stores logs as compressed chunks in object storage (S3, GCS, or local disk) and indexes only labels, not log content, which typically makes it much cheaper to run than Elasticsearch at scale. Queries use LogQL, which is modelled on PromQL.
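Because Loki indexes only labels, every distinct combination of label values becomes its own stream, and a high stream count is what makes a Loki index expensive. The Python sketch below counts streams for a batch of illustrative records under two labelling choices: the low-cardinality set used in this guide (service and level) and a variant that also indexes session_id. The record counts are made up for illustration.

```python
from typing import Dict, List, Tuple


def count_streams(records: List[Dict[str, str]], label_keys: Tuple[str, ...]) -> int:
    """Number of Loki streams: one per distinct combination of label values."""
    return len({tuple(r.get(k) for k in label_keys) for r in records})


# 1,000 illustrative records spread over 100 sessions and two levels
records = [
    {"service": "mcp-server",
     "level": "error" if i % 10 == 0 else "info",
     "session_id": f"sess_{i % 100}"}
    for i in range(1000)
]

print(count_streams(records, ("service", "level")))                # stays small
print(count_streams(records, ("service", "level", "session_id")))  # grows with sessions
```

Keeping session_id and trace_id out of the label set, and filtering on them with `| json` at query time, keeps the stream count independent of traffic.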

Docker Compose setup for a local Loki + Promtail stack:

# docker-compose.yml — Loki + Promtail for MCP server log aggregation
services:
  loki:
    image: grafana/loki:2.9.7
    ports:
      - "3100:3100"
    command: -config.file=/etc/loki/local-config.yaml

  promtail:
    image: grafana/promtail:2.9.7
    volumes:
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/run/docker.sock:/var/run/docker.sock
      - ./promtail-config.yaml:/etc/promtail/config.yaml
    command: -config.file=/etc/promtail/config.yaml

  grafana:
    image: grafana/grafana:10.4.2
    ports:
      - "3000:3000"
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true

# promtail-config.yaml — scrape Docker container logs
server:
  http_listen_port: 9080

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
        filters:
          - name: label
            values: ["logging=promtail"]  # only collect from containers with this label

    relabel_configs:
      # Docker reports container names with a leading "/"; strip it
      - source_labels: [__meta_docker_container_name]
        regex: '/(.*)'
        target_label: container
      - source_labels: [__meta_docker_container_label_com_docker_compose_service]
        target_label: service
      - source_labels: [__meta_docker_container_label_environment]
        target_label: environment

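    pipeline_stages:
      # Docker wraps the container's stdout in a JSON envelope
      - docker: {}
      # Parse the application's Pino JSON log line into extracted fields
      - json:
          expressions:
            level:       level
            session_id:  session_id
            trace_id:    trace_id
            msg:         msg
            duration_ms: duration_ms
            time:        time  # needed by the timestamp stage below
      # Promote only level to a Loki label; session_id and trace_id are
      # high-cardinality, so filter on them at query time with | json
      - labels:
          level:
      # Use Pino's ISO 8601 time field as the Loki timestamp
      # (if Pino keeps its default epoch-millisecond time, use format: UnixMs)
      - timestamp:
          source: time
          format: RFC3339Nano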
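The stages above run in order on every line. As a rough mental model (not Promtail's implementation), this Python sketch applies the same steps to one json-file record: unwrap the Docker envelope, extract the Pino fields, promote only `level` to a label, and convert the RFC 3339 `time` value to the nanosecond timestamp Loki stores. It assumes Pino emits string levels and ISO 8601 times; `run_pipeline` and `rfc3339_to_ns` are illustrative names.

```python
import json
from datetime import datetime, timezone
from typing import Dict, Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
FIELDS = ("level", "session_id", "trace_id", "msg", "duration_ms", "time")


def rfc3339_to_ns(ts: str) -> int:
    """RFC 3339 string (e.g. Pino isoTime, millisecond precision) to Unix nanoseconds."""
    # fromisoformat before Python 3.11 does not accept a trailing "Z"
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    delta = dt - EPOCH
    # Integer arithmetic avoids float rounding from dt.timestamp()
    return (delta.days * 86_400 + delta.seconds) * 1_000_000_000 + delta.microseconds * 1_000


def run_pipeline(stored: str, target_labels: Dict[str, str]) -> Tuple[Dict[str, str], int, str]:
    # docker stage: the log line is the "log" field of the json-file envelope
    line = json.loads(stored)["log"].rstrip("\n")
    # json stage: copy selected Pino fields into an "extracted" map
    record = json.loads(line)
    extracted = {key: record.get(key) for key in FIELDS}
    # labels stage: only level becomes a label; session_id/trace_id stay in the line
    labels = dict(target_labels, level=str(extracted["level"]))
    # timestamp stage: store the entry at the application's own time
    ts_ns = rfc3339_to_ns(extracted["time"])
    return labels, ts_ns, line


pino_line = json.dumps({"level": "error", "time": "1970-01-01T00:00:01.500Z",
                        "msg": "tool failed", "session_id": "sess_abc123"})
stored = json.dumps({"log": pino_line + "\n", "stream": "stdout",
                     "time": "1970-01-01T00:00:01.500000000Z"})
labels, ts_ns, line = run_pipeline(stored, {"service": "mcp-server"})
print(labels, ts_ns)
```

Note that the timestamp stage can only read fields that an earlier stage has extracted, which is why `time` appears in the json stage's expressions.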

Add the logging=promtail label to your MCP server container:

# docker-compose.yml — tag MCP server for Promtail collection
services:
  mcp-server:
    build: .
    labels:
      - "logging=promtail"
      - "environment=production"
      # Compose already sets com.docker.compose.service=mcp-server,
      # which Promtail's relabel_configs map to the service label
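Docker service discovery exposes each container label as a `__meta_docker_container_label_<name>` meta label, with characters that are not valid in label names (such as the dots in `com.docker.compose.service`) replaced by underscores; the relabel rules then copy selected meta labels onto the stream. A simplified Python model of that flow, assuming Prometheus-style anchored regexes with `$1` as the replacement, looks like this. The container name and label values are illustrative.

```python
import re
from typing import Dict, List, Tuple

# (source meta label, anchored regex, target label); "/(.*)" drops Docker's leading slash
RULES: List[Tuple[str, str, str]] = [
    ("__meta_docker_container_name", r"/(.*)", "container"),
    ("__meta_docker_container_label_com_docker_compose_service", r"(.*)", "service"),
    ("__meta_docker_container_label_environment", r"(.*)", "environment"),
]


def meta_labels(name: str, docker_labels: Dict[str, str]) -> Dict[str, str]:
    """Model of discovery: container name plus one sanitized meta label per Docker label."""
    meta = {"__meta_docker_container_name": name}
    for key, value in docker_labels.items():
        meta["__meta_docker_container_label_" + re.sub(r"[^a-zA-Z0-9_]", "_", key)] = value
    return meta


def relabel(meta: Dict[str, str]) -> Dict[str, str]:
    """Apply replace-style rules; an empty result means the target label is not set."""
    targets = {}
    for source, regex, target in RULES:
        match = re.fullmatch(regex, meta.get(source, ""))
        if match and match.group(1):
            targets[target] = match.group(1)
    return targets


def selected(docker_labels: Dict[str, str]) -> bool:
    """The docker_sd filter: only containers labelled logging=promtail are scraped."""
    return docker_labels.get("logging") == "promtail"


labels = {"logging": "promtail", "environment": "production",
          "com.docker.compose.service": "mcp-server"}  # the last one is set by Compose
if selected(labels):
    print(relabel(meta_labels("/myproject-mcp-server-1", labels)))
```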

Querying logs with LogQL

LogQL uses a selector + filter pipeline syntax. Start with label selectors, then pipe through JSON parsers and filters:

# All error-level logs from the MCP server
{service="mcp-server", level="error"}

# All logs for a specific session
{service="mcp-server"} | json | session_id = "sess_abc123"

# Tool calls that took longer than 1 second
{service="mcp-server"} | json | duration_ms > 1000 | line_format "{{.msg}} [{{.duration_ms}}ms]"

# Error rate over 5-minute windows (for alerting)
sum(rate({service="mcp-server", level="error"}[5m]))
/
sum(rate({service="mcp-server"}[5m]))

# Logs for a specific trace_id (trace-to-log correlation)
{service="mcp-server"} | json | trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"

The last query is the trace-to-log correlation query. In Grafana, configure a Loki derived field for trace_id that creates an automatic link to Grafana Tempo. Clicking any log line's trace ID jumps directly to the trace.
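A Grafana derived field is defined by a regular expression with a single capture group; the captured text becomes the field value that the Tempo link uses. Before saving the data source, it can help to check the pattern against a real log line. The Python sketch below does that for Pino's compact JSON output. The pattern is a candidate, not a required value: it assumes 32-hex-digit W3C trace IDs, as in the example query above, and tolerates optional whitespace around the colon.

```python
import re
from typing import Optional

# Candidate derived-field pattern; group 1 is the value Grafana would link on
TRACE_ID_PATTERN = re.compile(r'"trace_id"\s*:\s*"([0-9a-f]{32})"')


def extract_trace_id(log_line: str) -> Optional[str]:
    match = TRACE_ID_PATTERN.search(log_line)
    return match.group(1) if match else None


line = ('{"level":"error","time":"2024-05-01T12:00:00.000Z",'
        '"trace_id":"4bf92f3577b34da6a3ce929d0e0e4736","msg":"tool failed"}')
print(extract_trace_id(line))
```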

Alerting from logs with Loki rules

Loki supports Prometheus-compatible alert rules, evaluated against LogQL metric queries. Write them as YAML and mount the file into the directory the Loki ruler loads rules from (`ruler.storage.local.directory` when rules are stored on local disk):

# loki-rules.yaml — alert on MCP server error rate
groups:
  - name: mcp-server-logs
    rules:
      - alert: MCPHighErrorRateFromLogs
        expr: |
          sum(rate({service="mcp-server", level="error"}[5m]))
          /
          sum(rate({service="mcp-server"}[5m]))
          > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "MCP server error log rate above 5%"

      - alert: MCPCircuitBreakerOpenLog
        # No "for:" clause: one matching line should alert on the next evaluation
        expr: |
          sum(count_over_time(
            {service="mcp-server"} | json | msg =~ ".*circuit.*open.*" [5m]
          )) > 0
        labels:
          severity: critical
        annotations:
          summary: "MCP circuit breaker open event detected in logs"
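Both rules depend on how the ruler evaluates expressions over time. A `for:` clause keeps an alert pending until the condition has held for that long, which is why the 5-minute error-rate rule ignores a single bad minute. The Python sketch below models the first rule's error ratio and Prometheus-style pending/firing states, assuming one evaluation per minute; the function names are illustrative.

```python
from typing import List, Optional


def error_ratio(levels: List[str]) -> Optional[float]:
    """errors / all lines in the window; None models LogQL returning no sample for 0/0."""
    if not levels:
        return None
    return sum(1 for level in levels if level == "error") / len(levels)


def alert_states(ratios: List[Optional[float]], threshold: float = 0.05,
                 for_evals: int = 5) -> List[str]:
    """State after each evaluation. The alert fires once the condition has held for
    `for_evals` evaluation intervals since it first became true (0 fires immediately)."""
    states, streak = [], 0
    for ratio in ratios:
        streak = streak + 1 if ratio is not None and ratio > threshold else 0
        if streak == 0:
            states.append("inactive")
        elif streak - 1 >= for_evals:
            states.append("firing")
        else:
            states.append("pending")
    return states


print(error_ratio(["info"] * 18 + ["error"] * 2))  # 2 errors in 20 lines
print(alert_states([0.1] * 6))
```

The same model shows why pairing a `for:` duration with an equally short `count_over_time` window can miss one-off events: a single matching line keeps the count above zero for only about one window length, so the condition may stop holding before the `for:` period elapses.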

Log-based alerts complement metric-based alerts: Prometheus metrics track aggregate rates; Loki rules can alert on specific log message patterns that metrics never capture (e.g., "JWT signing key rotated", "dependency pool exhausted", "graceful shutdown started").

Filebeat + Elasticsearch (ELK stack)

If your organisation already runs Elasticsearch (Elastic Cloud, self-hosted, or OpenSearch), ship MCP server logs via Filebeat. Filebeat is lighter than Logstash for pure log-shipping scenarios:

# filebeat.yml — ship Docker container logs to Elasticsearch
filebeat.autodiscover:
  providers:
    - type: docker
      hints.enabled: false  # collect only from containers matched by the template below
      templates:
        - condition:
            contains:
              docker.container.labels.logging: "filebeat"
          config:
            - type: container
              paths:
                - /var/lib/docker/containers/${data.docker.container.id}/*.log
              json.keys_under_root: true
              json.add_error_key: true
              json.message_key: msg

processors:
  - add_docker_metadata: ~
  - drop_fields:
      fields: ["agent", "ecs", "input", "log.offset"]
      ignore_missing: true

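output.elasticsearch:
  hosts: ["https://elasticsearch:9200"]
  username: "${ELASTIC_USERNAME}"
  password: "${ELASTIC_PASSWORD}"
  index: "mcp-server-logs-%{+yyyy.MM.dd}"

# A custom index name also needs a matching template (and ILM disabled)
setup.ilm.enabled: false
setup.template.name: "mcp-server-logs"
setup.template.pattern: "mcp-server-logs-*"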

In Kibana, create a data view (called an index pattern before Kibana 8.0) for mcp-server-logs-*. The Pino JSON fields (level, session_id, trace_id, tool, duration_ms) become searchable Kibana fields automatically. Use Kibana Lens to build a dashboard with error rate over time, P99 duration derived from the duration_ms field, and a unique session count per tool.
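Those fields appear at the top level of each document because of `json.keys_under_root: true`; without it, Filebeat places the decoded keys under a `json` object. The Python sketch below models that difference and the `json.add_error_key` behaviour for a line that is not a JSON object. It is a simplified model rather than Filebeat's code: the `error.message` and `error.type` keys follow Filebeat's documented behaviour, while the function name is illustrative.

```python
import json
from typing import Any, Dict


def decode_event(raw: str, keys_under_root: bool = True,
                 add_error_key: bool = True) -> Dict[str, Any]:
    """Simplified model of Filebeat's json.* options applied to one container log line."""
    try:
        decoded = json.loads(raw)
        if not isinstance(decoded, dict):
            raise ValueError("log line is JSON but not an object")
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        event: Dict[str, Any] = {"message": raw}  # keep the raw line
        if add_error_key:
            event["error"] = {"message": str(exc), "type": "json"}
        return event
    if keys_under_root:
        return dict(decoded)      # level, session_id, ... become top-level fields
    return {"json": decoded}      # default: fields nested under "json"


print(decode_event('{"level":"error","duration_ms":12}'))
print(decode_event('{"level":"error"}', keys_under_root=False))
print(decode_event("not json"))
```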

AWS CloudWatch Logs

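If your MCP server runs on AWS, CloudWatch Logs is the lowest-friction option. On ECS (including Fargate) and plain Docker on EC2, the awslogs log driver ships container stdout with no additional agent, and Lambda writes its output to CloudWatch Logs automatically; on EKS you need a node-level agent such as Fluent Bit. Because the example below sets awslogs-create-group to "true", the task execution role also needs the logs:CreateLogGroup permission: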

# ECS task definition — CloudWatch log driver (container definition excerpt)
{
  "containerDefinitions": [
    {
      "name": "mcp-server",
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/my-mcp-server",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs",
          "awslogs-create-group": "true"
        }
      }
    }
  ]
}
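With `awslogs-stream-prefix` set, ECS names each log stream `prefix-name/container-name/ecs-task-id`, so the output of a single task can be found directly from its task ID. A small Python helper makes the format explicit; the function names, the container name mcp-server, and the sample task ARN are illustrative.

```python
def awslogs_stream_name(prefix: str, container_name: str, task_id: str) -> str:
    """ECS awslogs stream naming when awslogs-stream-prefix is set."""
    return f"{prefix}/{container_name}/{task_id}"


def task_id_from_arn(task_arn: str) -> str:
    """The task ID is the last path segment of an ECS task ARN (old or new format)."""
    return task_arn.rsplit("/", 1)[-1]


arn = "arn:aws:ecs:us-east-1:123456789012:task/prod/0123456789abcdef0"
print(awslogs_stream_name("ecs", "mcp-server", task_id_from_arn(arn)))
```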

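# CloudWatch Logs Insights — query errors in the last 1 hour
# (the time range is chosen in the console or passed to the StartQuery API, not written in the query)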
fields @timestamp, level, msg, session_id, tool, err.message
| filter level = "error"
| sort @timestamp desc
| limit 100

# CloudWatch Logs Insights — P99 tool call duration
filter ispresent(duration_ms) and ispresent(tool)
| stats pct(duration_ms, 99) as p99_ms, count() as calls by tool
| sort p99_ms desc
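The same P99-by-tool breakdown can be computed offline from exported log records, which is handy for sanity-checking a dashboard. This Python sketch uses the nearest-rank percentile; CloudWatch's `pct()` and Kibana's percentile aggregation may use different (approximate) methods, so small differences are expected. The records are illustrative.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List


def nearest_rank(values: List[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of values <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def p99_by_tool(records: Iterable[dict]) -> Dict[str, dict]:
    durations = defaultdict(list)
    for r in records:
        # Mirrors: filter ispresent(duration_ms) and ispresent(tool)
        if "duration_ms" in r and "tool" in r:
            durations[r["tool"]].append(r["duration_ms"])
    stats = {tool: {"p99_ms": nearest_rank(d, 99), "calls": len(d)}
             for tool, d in durations.items()}
    # Mirrors: sort p99_ms desc
    return dict(sorted(stats.items(), key=lambda kv: kv[1]["p99_ms"], reverse=True))


records = [{"tool": "search", "duration_ms": ms} for ms in range(1, 101)]
records += [{"tool": "fetch", "duration_ms": 5000}, {"msg": "session started"}]
print(p99_by_tool(records))
```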

CloudWatch Logs Metric Filters create CloudWatch Metrics from log patterns, which you can use in CloudWatch Alarms:

# AWS CLI — create metric filter for error log count
aws logs put-metric-filter \
  --log-group-name /ecs/my-mcp-server \
  --filter-name MCPErrorCount \
  --filter-pattern '{ $.level = "error" }' \
  --metric-transformations \
    metricName=MCPErrorCount,metricNamespace=MCPServer,metricValue=1
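Conceptually, the metric filter emits `metricValue` (here 1) for each log event whose JSON `level` field equals `"error"`, and an alarm using the Sum statistic adds those values up per period; events that are not valid JSON never match a JSON filter pattern. The Python sketch below models one period. It is an approximation for reasoning about alarm thresholds, not CloudWatch's matcher, and the sample events are illustrative. If no `defaultValue` is set in `--metric-transformations`, no data point is published when nothing matches, so the alarm's missing-data treatment matters.

```python
import json
from typing import Iterable


def metric_sum(events: Iterable[str], metric_value: int = 1) -> int:
    """Sum of emitted values for one period under the pattern { $.level = "error" }."""
    total = 0
    for message in events:
        try:
            event = json.loads(message)
        except json.JSONDecodeError:
            continue  # non-JSON events never match a JSON filter pattern
        if isinstance(event, dict) and event.get("level") == "error":
            total += metric_value
    return total


period_events = [
    '{"level":"info","msg":"session started"}',
    '{"level":"error","msg":"tool failed","tool":"search"}',
    "plain text line from a dependency",
    '{"level":"error","msg":"upstream timeout"}',
]
print(metric_sum(period_events))
```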

Log retention and cost

Log storage cost is driven by volume and retention period. A production MCP server at info level generates roughly 1–5 KB of logs per tool call (one log line per call plus session lifecycle events). At 10,000 tool calls per day, that's 10–50 MB/day before compression. Loki's chunk compression typically achieves around 10:1 on JSON logs, bringing that down to roughly 1–5 MB/day stored, so keeping info-level logs for the retention periods suggested below stays inexpensive.
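The arithmetic behind those figures, extended to a 30-day retention window, fits in a few lines of Python. The inputs are the estimates from the paragraph above (decimal units, 1 KB = 1,000 bytes, 10:1 compression), not measurements; substitute your own per-call size and call volume.

```python
def daily_mb(calls_per_day: int, kb_per_call: float) -> float:
    """Raw log volume per day in MB (decimal units)."""
    return calls_per_day * kb_per_call / 1000


def stored_mb(raw_mb_per_day: float, days: int, compression_ratio: float = 10.0) -> float:
    """Compressed storage needed to keep `days` of logs."""
    return raw_mb_per_day / compression_ratio * days


for kb in (1, 5):
    raw = daily_mb(10_000, kb)
    print(f"{kb} KB/call: {raw:.0f} MB/day raw, {stored_mb(raw, 30):.0f} MB stored for 30 days")
```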

Never ship debug-level logs to your aggregator in production. Debug logs include per-call parameter dumps and intermediate state: they can be 10–100x the volume of info logs, and they may contain sensitive data that should not leave the process. Set LOG_LEVEL=info in production and enable debug logging only for specific sessions during active investigation.
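One way to implement "debug only for specific sessions" is to compare each record's numeric severity against a per-session threshold. The Python sketch below uses Pino's default level numbers (debug 20, info 30, warn 40, error 50); `should_emit` and the `debug_sessions` set are hypothetical, standing in for however you choose to flag sessions under investigation.

```python
from typing import Set

# Pino's default numeric levels
LEVELS = {"trace": 10, "debug": 20, "info": 30, "warn": 40, "error": 50, "fatal": 60}


def should_emit(level: str, session_id: str, base: str = "info",
                debug_sessions: Set[str] = frozenset()) -> bool:
    """Emit if the record meets the base level (LOG_LEVEL), or its session is flagged for debug."""
    threshold = LEVELS["debug"] if session_id in debug_sessions else LEVELS[base]
    return LEVELS[level] >= threshold


flagged = {"sess_abc123"}  # hypothetical: sessions under active investigation
print(should_emit("debug", "sess_abc123", debug_sessions=flagged))
print(should_emit("debug", "sess_other", debug_sessions=flagged))
```

Keeping the default threshold at info means the aggregator only sees extra volume for the handful of sessions someone is actively debugging.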

Set retention policies matched to your debugging needs:

Log type	Suggested retention	Reason
Error logs (`level=error`)	90 days	Debugging incidents days after they happen
Info logs (tool calls, sessions)	30 days	Pattern analysis, billing queries
Warn logs (`level=warn`)	30 days	Tracking recovery from degraded states
Debug logs (if shipped at all)	7 days	Short-term investigation only
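Because `level` is a Loki label in the Promtail setup above, a per-level policy like this can be enforced per stream (Loki's compactor supports per-stream retention keyed on label selectors). The Python sketch below simply encodes the table and answers whether a record of a given level and age is still retained; the mapping and function are illustrative, not a Loki API.

```python
from datetime import datetime, timedelta, timezone

# Suggested retention from the table above, in days
RETENTION_DAYS = {"error": 90, "warn": 30, "info": 30, "debug": 7}


def is_retained(level: str, logged_at: datetime, now: datetime) -> bool:
    """True if a record of this level and timestamp is still inside its retention window."""
    return now - logged_at < timedelta(days=RETENTION_DAYS[level])


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
forty_days_ago = now - timedelta(days=40)
print(is_retained("error", forty_days_ago, now), is_retained("info", forty_days_ago, now))
```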

What log aggregation cannot catch

Log aggregation covers failures that produce logs. It cannot catch:

Process crashes before the logger is initialised (a syntax error in startup code, for example)
OOM kills where the OS terminates the process without giving it a chance to flush the log buffer
Network-level failures between the client and the server (DNS failures, TLS errors) that happen before any application code runs
Failures in the log shipping pipeline itself (if Promtail crashes, logs still exist in Docker's json-file but are not aggregated)

AliveMCP addresses all four: it connects from outside the process, makes a full MCP session attempt, and reports the outcome regardless of whether the server's internal logging pipeline is functional. External probes are the safety net below the log aggregation layer. See the observability overview for how external probes, structured logs, metrics, and distributed traces form a complete observability system for MCP servers.