
MCP Server Monitoring & Observability: Prometheus, Grafana & Health Checks

Complete guide to monitoring MCP servers in production — health checks, metrics collection with Prometheus, Grafana dashboards, logging strategies, and alerting.

20 min read
Updated February 26, 2026
By MCP Server Spot

Monitoring MCP servers in production requires a combination of health checks, metrics collection, structured logging, and alerting. Without observability, you are flying blind -- unable to detect failures, diagnose latency spikes, or understand usage patterns across your MCP tools. This guide gives you a complete, step-by-step framework for making your MCP servers observable using industry-standard tools: Prometheus for metrics, Grafana for dashboards, structured JSON logging, and Alertmanager for notifications.

Whether you are running a single MCP server for your team or operating dozens of servers at enterprise scale, the patterns in this guide apply. We cover implementations in both Python and TypeScript, with architecture decisions explained along the way.

For the foundational deployment guide, see Deploying Remote MCP Servers. For debugging during development, see Testing and Debugging MCP Servers.

Why Monitoring Matters for MCP Servers

MCP servers sit at a critical integration point between AI models and external systems. When an MCP server fails or slows down, the impact cascades:

  • Tool calls fail silently. The AI model receives an error and may retry, hallucinate, or give the user an unhelpful response.
  • Latency compounds. An AI agent making multiple sequential tool calls amplifies any per-call latency. A 2-second tool response becomes 10 seconds across five calls.
  • Usage patterns are invisible. Without metrics, you cannot answer basic questions: Which tools are used most? What is the error rate? Are any tools never called?
  • Incidents take longer to resolve. Without logs and traces, debugging a production failure means guessing.

MCP servers also have unique monitoring challenges compared to traditional APIs:

| Challenge | Why It Matters |
| --- | --- |
| Stdio transport restrictions | Local MCP servers use stdout for protocol messages, so logging must go to stderr |
| Long-lived SSE connections | Remote servers maintain persistent connections that need health monitoring |
| AI-driven call patterns | Traffic is bursty and unpredictable -- the AI model decides when to call tools |
| Multi-tool workflows | A single user request may trigger a chain of tool calls across multiple servers |
| Stateful sessions | SSE transport maintains session state that must be tracked and cleaned up |

The good news: MCP servers are standard HTTP services (when deployed remotely), so the entire ecosystem of monitoring tools works out of the box.

Key Metrics to Track

Before writing any instrumentation code, decide which metrics matter. Here is the complete set of metrics every production MCP server should expose:

Protocol-Level Metrics

| Metric | Type | Description |
| --- | --- | --- |
| mcp_tool_calls_total | Counter | Total tool invocations, labeled by tool name and status (success/error) |
| mcp_tool_call_duration_seconds | Histogram | Latency distribution for tool calls, labeled by tool name |
| mcp_resource_reads_total | Counter | Total resource read operations, labeled by resource URI |
| mcp_active_sessions | Gauge | Current number of active client sessions |
| mcp_session_duration_seconds | Histogram | How long client sessions last before disconnecting |
| mcp_errors_total | Counter | Total protocol-level errors, labeled by error type |

Infrastructure-Level Metrics

| Metric | Type | Description |
| --- | --- | --- |
| process_cpu_seconds_total | Counter | CPU time consumed by the server process |
| process_resident_memory_bytes | Gauge | RSS memory usage |
| process_open_fds | Gauge | Open file descriptors (important for connection-heavy servers) |
| mcp_http_requests_total | Counter | Total HTTP requests to the server (health checks, SSE, messages) |
| mcp_http_request_duration_seconds | Histogram | HTTP request latency by endpoint |

Business-Level Metrics

Depending on your server's purpose, you may also want:

  • API quota usage -- If your tools call rate-limited external APIs, track remaining quota
  • Cache hit rate -- If you cache tool results, measure effectiveness
  • Data freshness -- For servers that sync data, track how stale the data is
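
As a minimal sketch (all names here are illustrative, not part of any MCP SDK), these values can be tracked with plain counters and timestamps, then exposed as gauges through the same Prometheus registry used in the instrumentation section below:

```python
import time

class BusinessMetrics:
    """Illustrative tracker for business-level MCP server metrics.

    In production, expose these values as Prometheus gauges
    (e.g. mcp_cache_hit_rate, mcp_data_staleness_seconds).
    """

    def __init__(self) -> None:
        self.cache_hits = 0
        self.cache_misses = 0
        self.last_sync_time = time.time()

    def record_cache(self, hit: bool) -> None:
        # Call on every cache lookup in a tool handler
        if hit:
            self.cache_hits += 1
        else:
            self.cache_misses += 1

    def mark_synced(self) -> None:
        # Call whenever the server finishes a data sync
        self.last_sync_time = time.time()

    @property
    def cache_hit_rate(self) -> float:
        total = self.cache_hits + self.cache_misses
        return self.cache_hits / total if total else 0.0

    def data_staleness_seconds(self) -> float:
        return time.time() - self.last_sync_time
```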

Health Check Endpoints

Every production MCP server needs at least two health endpoints: a liveness probe and a readiness probe.

  • Liveness (/health): Is the process running and can it accept connections?
  • Readiness (/ready): Is the server fully initialized and are all dependencies (database, external APIs) reachable?

Python Health Check Implementation

from starlette.applications import Starlette
from starlette.routing import Route, Mount
from starlette.responses import JSONResponse
from mcp.server.fastmcp import FastMCP
import httpx
import time

mcp = FastMCP("My Production Server")

# Track server start time
SERVER_START_TIME = time.time()

# ... define tools, resources, prompts ...
# (`db` below stands for your application's async database client)

async def health_check(request):
    """Liveness probe -- is the server process running?"""
    return JSONResponse(
        {
            "status": "healthy",
            "uptime_seconds": round(time.time() - SERVER_START_TIME, 1),
            "version": "1.2.0",
        }
    )

async def readiness_check(request):
    """Readiness probe -- are all dependencies available?"""
    checks = {}
    overall_healthy = True

    # Check database connectivity
    try:
        await db.execute("SELECT 1")
        checks["database"] = "healthy"
    except Exception as e:
        checks["database"] = f"unhealthy: {str(e)}"
        overall_healthy = False

    # Check external API reachability
    try:
        async with httpx.AsyncClient() as client:
            resp = await client.get(
                "https://api.example.com/health",
                timeout=5.0,
            )
            resp.raise_for_status()
            checks["external_api"] = "healthy"
    except Exception as e:
        checks["external_api"] = f"unhealthy: {str(e)}"
        overall_healthy = False

    status_code = 200 if overall_healthy else 503
    return JSONResponse(
        {
            "status": "ready" if overall_healthy else "not_ready",
            "checks": checks,
            "uptime_seconds": round(time.time() - SERVER_START_TIME, 1),
        },
        status_code=status_code,
    )

app = Starlette(
    routes=[
        Route("/health", endpoint=health_check),
        Route("/ready", endpoint=readiness_check),
        Mount("/", app=mcp.sse_app()),
    ]
)

TypeScript Health Check Implementation

import express from "express";
import { SSEServerTransport } from "@modelcontextprotocol/sdk/server/sse.js";
import { createServer } from "./server.js";

const app = express();
app.use(express.json());

const startTime = Date.now();
const transports = new Map<string, SSEServerTransport>();

// Liveness probe
app.get("/health", (req, res) => {
  res.json({
    status: "healthy",
    uptimeSeconds: Math.round((Date.now() - startTime) / 1000),
    activeSessions: transports.size,
    memoryMB: Math.round(process.memoryUsage().rss / 1024 / 1024),
  });
});

// Readiness probe
app.get("/ready", async (req, res) => {
  const checks: Record<string, string> = {};
  let healthy = true;

  // Check database (`db` is your application's database client)
  try {
    await db.query("SELECT 1");
    checks.database = "healthy";
  } catch (err) {
    checks.database = `unhealthy: ${(err as Error).message}`;
    healthy = false;
  }

  res.status(healthy ? 200 : 503).json({
    status: healthy ? "ready" : "not_ready",
    checks,
    uptimeSeconds: Math.round((Date.now() - startTime) / 1000),
  });
});

// SSE and message endpoints...
app.get("/sse", async (req, res) => {
  const server = createServer();
  const transport = new SSEServerTransport("/messages", res);
  transports.set(transport.sessionId, transport);

  res.on("close", () => {
    transports.delete(transport.sessionId);
  });

  await server.connect(transport);
});

Kubernetes Probe Configuration

If you deploy to Kubernetes, wire these endpoints into your pod spec:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-server
spec:
  template:
    spec:
      containers:
        - name: mcp-server
          image: my-mcp-server:latest
          ports:
            - containerPort: 3001
          livenessProbe:
            httpGet:
              path: /health
              port: 3001
            initialDelaySeconds: 5
            periodSeconds: 10
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /ready
              port: 3001
            initialDelaySeconds: 10
            periodSeconds: 15
            failureThreshold: 2

Prometheus Integration

Prometheus is the industry standard for collecting and storing time-series metrics. Integrating Prometheus with your MCP server involves three steps: instrument your code, expose a /metrics endpoint, and configure Prometheus to scrape it.

Python: Prometheus Instrumentation

Install the Prometheus client library:

uv add prometheus-client

Define your metrics and instrument your tool handlers:

from prometheus_client import (
    Counter,
    Histogram,
    Gauge,
    CollectorRegistry,
    generate_latest,
)
from starlette.responses import Response
import time

# Create a dedicated registry (avoids default process metrics clutter)
registry = CollectorRegistry()

# Define metrics
tool_calls_total = Counter(
    "mcp_tool_calls_total",
    "Total number of MCP tool calls",
    ["tool_name", "status"],
    registry=registry,
)

tool_call_duration = Histogram(
    "mcp_tool_call_duration_seconds",
    "Duration of MCP tool calls in seconds",
    ["tool_name"],
    buckets=[0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0],
    registry=registry,
)

active_sessions = Gauge(
    "mcp_active_sessions",
    "Number of active MCP client sessions",
    registry=registry,
)

resource_reads_total = Counter(
    "mcp_resource_reads_total",
    "Total number of resource read operations",
    ["resource_uri"],
    registry=registry,
)

# Metrics endpoint
async def metrics_endpoint(request):
    """Prometheus metrics endpoint."""
    return Response(
        content=generate_latest(registry),
        media_type="text/plain; version=0.0.4; charset=utf-8",
    )

Wrap your tool handlers with instrumentation:

from mcp.server.fastmcp import FastMCP
import functools

mcp = FastMCP("Monitored Server")

def instrumented_tool(func):
    """Decorator that adds Prometheus metrics to MCP tool handlers."""
    tool_name = func.__name__

    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        start_time = time.time()
        try:
            result = await func(*args, **kwargs)
            tool_calls_total.labels(
                tool_name=tool_name, status="success"
            ).inc()
            return result
        except Exception:
            tool_calls_total.labels(
                tool_name=tool_name, status="error"
            ).inc()
            raise
        finally:
            duration = time.time() - start_time
            tool_call_duration.labels(tool_name=tool_name).observe(duration)

    return wrapper

@mcp.tool()
@instrumented_tool
async def search_documents(query: str) -> str:
    """Search the document database for relevant content."""
    results = await db.search(query)
    return format_results(results)

@mcp.tool()
@instrumented_tool
async def get_user_profile(user_id: str) -> str:
    """Retrieve a user's profile information."""
    profile = await db.get_user(user_id)
    return format_profile(profile)

TypeScript: Prometheus Instrumentation

Install the prom-client package:

npm install prom-client

Set up metrics and instrument your server:

import { Registry, Counter, Histogram, Gauge } from "prom-client";

const register = new Registry();

// Define metrics
const toolCallsTotal = new Counter({
  name: "mcp_tool_calls_total",
  help: "Total number of MCP tool calls",
  labelNames: ["tool_name", "status"] as const,
  registers: [register],
});

const toolCallDuration = new Histogram({
  name: "mcp_tool_call_duration_seconds",
  help: "Duration of MCP tool calls in seconds",
  labelNames: ["tool_name"] as const,
  buckets: [0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0],
  registers: [register],
});

const activeSessions = new Gauge({
  name: "mcp_active_sessions",
  help: "Number of active MCP client sessions",
  registers: [register],
});

// Expose metrics endpoint
app.get("/metrics", async (req, res) => {
  res.set("Content-Type", register.contentType);
  res.send(await register.metrics());
});

// Instrument tool handler
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const toolName = request.params.name;
  const timer = toolCallDuration.startTimer({ tool_name: toolName });

  try {
    const result = await handleToolCall(request);
    toolCallsTotal.inc({ tool_name: toolName, status: "success" });
    return result;
  } catch (error) {
    toolCallsTotal.inc({ tool_name: toolName, status: "error" });
    throw error;
  } finally {
    timer();
  }
});

// Track sessions
app.get("/sse", async (req, res) => {
  activeSessions.inc();
  res.on("close", () => {
    activeSessions.dec();
  });
  // ... transport setup
});

Prometheus Scrape Configuration

Add your MCP server as a scrape target in prometheus.yml:

# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: "mcp-server"
    static_configs:
      - targets: ["mcp-server:3001"]
    metrics_path: /metrics
    scrape_interval: 15s

  # If you have multiple MCP servers
  - job_name: "mcp-servers"
    static_configs:
      - targets:
          - "mcp-weather:3001"
          - "mcp-database:3002"
          - "mcp-github:3003"
        labels:
          environment: "production"

For Kubernetes environments, use service discovery instead of static targets:

scrape_configs:
  - job_name: "mcp-servers"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_scrape
        action: keep
        regex: "true"
      - source_labels:
          - __address__
          - __meta_kubernetes_pod_annotation_prometheus_io_port
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: "$1:$2"
        target_label: __address__

Grafana Dashboard Setup

Once Prometheus is collecting metrics, use Grafana to visualize them. This section walks through building an MCP-specific dashboard.

Adding Prometheus as a Data Source

  1. Open Grafana (default: http://localhost:3000)
  2. Navigate to Configuration then Data Sources
  3. Click Add data source and select Prometheus
  4. Set the URL to your Prometheus instance (e.g., http://prometheus:9090)
  5. Click Save & Test
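
If you manage Grafana as code, the same data source can be provisioned from a file instead of clicking through the UI -- a minimal sketch (the URL assumes Prometheus runs as a sibling container named prometheus):

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
```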

Essential Dashboard Panels

Build your MCP dashboard with these panels:

Panel 1: Tool Call Rate (Requests per Second)

PromQL query:

sum(rate(mcp_tool_calls_total[5m])) by (tool_name)

Visualization: Time series with legend showing each tool. This tells you which tools are being called and how often.

Panel 2: Tool Call Latency (P50 / P95 / P99)

PromQL queries:

# P50
histogram_quantile(0.50, sum(rate(mcp_tool_call_duration_seconds_bucket[5m])) by (le, tool_name))

# P95
histogram_quantile(0.95, sum(rate(mcp_tool_call_duration_seconds_bucket[5m])) by (le, tool_name))

# P99
histogram_quantile(0.99, sum(rate(mcp_tool_call_duration_seconds_bucket[5m])) by (le, tool_name))

Visualization: Time series with three lines per tool. This reveals latency degradation before it impacts users.

Panel 3: Error Rate (%)

PromQL query:

sum(rate(mcp_tool_calls_total{status="error"}[5m])) by (tool_name)
/
sum(rate(mcp_tool_calls_total[5m])) by (tool_name)
* 100

Visualization: Time series with threshold lines at 1% (warning) and 5% (critical).

Panel 4: Active Sessions

PromQL query:

mcp_active_sessions

Visualization: Gauge or stat panel showing current value.

Panel 5: Memory Usage

PromQL query:

process_resident_memory_bytes / 1024 / 1024

Visualization: Time series in MB with the container memory limit shown as a threshold line.

Panel 6: Tool Call Breakdown (Table)

PromQL query:

sum(increase(mcp_tool_calls_total[24h])) by (tool_name, status)

Visualization: Table showing total calls and error count per tool over the last 24 hours.

Dashboard Layout Recommendation

| Row | Left Panel | Right Panel |
| --- | --- | --- |
| 1 | Tool Call Rate (time series) | Error Rate % (time series) |
| 2 | Latency P50/P95/P99 (time series) | Active Sessions (gauge) |
| 3 | Memory Usage (time series) | CPU Usage (time series) |
| 4 | Tool Call Breakdown (table, full width) | |

Structured Logging

Metrics tell you what is happening. Logs tell you why. A good logging strategy is essential for diagnosing issues that metrics alone cannot explain.

JSON Logging Format

Use structured JSON logs so they can be parsed by log aggregation tools (Elasticsearch, Loki, CloudWatch, Datadog):

import json
import sys
import time
import uuid
from datetime import datetime, timezone

class MCPLogger:
    """Structured JSON logger for MCP servers.

    All output goes to stderr to avoid corrupting
    the JSON-RPC protocol on stdout.
    """

    def __init__(self, server_name: str, version: str = "1.0.0"):
        self.server_name = server_name
        self.version = version

    def _emit(self, level: str, message: str, **fields):
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "level": level,
            "server": self.server_name,
            "version": self.version,
            "message": message,
            **fields,
        }
        print(json.dumps(entry), file=sys.stderr)

    def info(self, message: str, **fields):
        self._emit("INFO", message, **fields)

    def warn(self, message: str, **fields):
        self._emit("WARN", message, **fields)

    def error(self, message: str, **fields):
        self._emit("ERROR", message, **fields)

    def debug(self, message: str, **fields):
        self._emit("DEBUG", message, **fields)

logger = MCPLogger("weather-server", version="2.1.0")

Correlation IDs for Request Tracing

Assign a unique ID to each tool call so you can trace related log entries:

import uuid

@mcp.tool()
async def search_documents(query: str) -> str:
    """Search documents with full observability."""
    request_id = str(uuid.uuid4())[:8]

    logger.info(
        "Tool call started",
        tool="search_documents",
        request_id=request_id,
        query_length=len(query),
    )

    try:
        start = time.time()
        results = await db.search(query)
        duration = time.time() - start

        logger.info(
            "Tool call completed",
            tool="search_documents",
            request_id=request_id,
            result_count=len(results),
            duration_ms=round(duration * 1000),
        )

        return format_results(results)
    except Exception as e:
        logger.error(
            "Tool call failed",
            tool="search_documents",
            request_id=request_id,
            error=str(e),
            error_type=type(e).__name__,
        )
        raise

This produces log lines like:

{"timestamp":"2026-02-26T14:30:01Z","level":"INFO","server":"weather-server","version":"2.1.0","message":"Tool call started","tool":"search_documents","request_id":"a1b2c3d4","query_length":42}
{"timestamp":"2026-02-26T14:30:01Z","level":"INFO","server":"weather-server","version":"2.1.0","message":"Tool call completed","tool":"search_documents","request_id":"a1b2c3d4","result_count":7,"duration_ms":234}

Log Levels and When to Use Them

| Level | When to Use | Example |
| --- | --- | --- |
| DEBUG | Detailed diagnostic info, high volume | Parameter values, intermediate results |
| INFO | Normal operations worth recording | Tool call start/complete, session connect/disconnect |
| WARN | Unexpected but recoverable situations | Retry attempt, deprecated tool usage, slow query |
| ERROR | Failures that need attention | Tool call exception, dependency unreachable, data corruption |

In production, set the log level to INFO. Enable DEBUG only when actively investigating an issue to avoid excessive log volume.

TypeScript Structured Logging

type LogLevel = "DEBUG" | "INFO" | "WARN" | "ERROR";

interface LogEntry {
  timestamp: string;
  level: LogLevel;
  server: string;
  message: string;
  [key: string]: unknown;
}

function createLogger(serverName: string) {
  const minLevel: LogLevel =
    (process.env.LOG_LEVEL as LogLevel) || "INFO";
  const levels: Record<LogLevel, number> = {
    DEBUG: 0,
    INFO: 1,
    WARN: 2,
    ERROR: 3,
  };

  function emit(level: LogLevel, message: string, fields?: Record<string, unknown>) {
    if (levels[level] < levels[minLevel]) return;

    const entry: LogEntry = {
      timestamp: new Date().toISOString(),
      level,
      server: serverName,
      message,
      ...fields,
    };

    // stderr is safe for MCP servers (stdout is for JSON-RPC)
    console.error(JSON.stringify(entry));
  }

  return {
    debug: (msg: string, fields?: Record<string, unknown>) =>
      emit("DEBUG", msg, fields),
    info: (msg: string, fields?: Record<string, unknown>) =>
      emit("INFO", msg, fields),
    warn: (msg: string, fields?: Record<string, unknown>) =>
      emit("WARN", msg, fields),
    error: (msg: string, fields?: Record<string, unknown>) =>
      emit("ERROR", msg, fields),
  };
}

const logger = createLogger("github-mcp-server");

Shipping Logs to Aggregation Services

For production, send logs to a centralized system:

Grafana Loki with Promtail:

# promtail-config.yml
server:
  http_listen_port: 9080

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: mcp-server
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
    relabel_configs:
      - source_labels: ["__meta_docker_container_name"]
        target_label: "container"
    pipeline_stages:
      - json:
          expressions:
            level: level
            server: server
            tool: tool
      - labels:
          level:
          server:
          tool:

AWS CloudWatch (for ECS deployments): Configure the awslogs log driver in your ECS task definition. Logs from stderr are automatically shipped to CloudWatch.
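
A minimal sketch of that container-definition fragment (the log group, region, and prefix names are illustrative):

```json
{
  "logConfiguration": {
    "logDriver": "awslogs",
    "options": {
      "awslogs-group": "/ecs/mcp-server",
      "awslogs-region": "us-east-1",
      "awslogs-stream-prefix": "mcp"
    }
  }
}
```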

Alerting Rules

Metrics and logs are only useful if someone acts on them. Set up alerting rules that notify your team when something goes wrong.

Prometheus Alerting Rules

Create an alerting rules file:

# mcp-alerts.yml
groups:
  - name: mcp-server-alerts
    rules:
      # Server is completely down
      - alert: MCPServerDown
        expr: up{job="mcp-server"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "MCP server is unreachable"
          description: "The MCP server has been down for more than 1 minute."
          runbook: "https://wiki.example.com/runbooks/mcp-server-down"

      # High error rate on any tool
      - alert: MCPToolHighErrorRate
        expr: >
          (
            sum(rate(mcp_tool_calls_total{status="error"}[5m])) by (tool_name)
            /
            sum(rate(mcp_tool_calls_total[5m])) by (tool_name)
          ) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate on tool {{ $labels.tool_name }}"
          description: "Tool {{ $labels.tool_name }} has an error rate above 5% for the last 5 minutes."

      # P95 latency exceeding threshold
      - alert: MCPToolHighLatency
        expr: >
          histogram_quantile(0.95,
            sum(rate(mcp_tool_call_duration_seconds_bucket[5m])) by (le, tool_name)
          ) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High P95 latency on tool {{ $labels.tool_name }}"
          description: "Tool {{ $labels.tool_name }} P95 latency exceeds 10 seconds."

      # Memory approaching container limit
      - alert: MCPServerHighMemory
        expr: >
          process_resident_memory_bytes / 1024 / 1024 > 400
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "MCP server memory usage above 400 MB"
          description: "Memory usage has been above 400 MB for 10 minutes. Check for leaks."

      # No tool calls received (possible connectivity issue)
      - alert: MCPServerNoTraffic
        expr: >
          sum(rate(mcp_tool_calls_total[15m])) == 0
        for: 30m
        labels:
          severity: info
        annotations:
          summary: "No tool calls received for 30 minutes"
          description: "The MCP server has not received any tool calls. This may be normal during off-hours or could indicate a connectivity issue."

Alertmanager Configuration

Route alerts to the right channels based on severity:

# alertmanager.yml
global:
  resolve_timeout: 5m

route:
  receiver: "default"
  group_by: ["alertname", "severity"]
  group_wait: 10s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - match:
        severity: critical
      receiver: "pagerduty-critical"
      repeat_interval: 1h
    - match:
        severity: warning
      receiver: "slack-warnings"
      repeat_interval: 4h

receivers:
  - name: "default"
    slack_configs:
      - api_url: "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK"
        channel: "#mcp-alerts"
        title: "{{ .GroupLabels.alertname }}"
        text: "{{ .CommonAnnotations.description }}"

  - name: "pagerduty-critical"
    pagerduty_configs:
      - service_key: "YOUR_PAGERDUTY_KEY"
        description: "{{ .CommonAnnotations.summary }}"

  - name: "slack-warnings"
    slack_configs:
      - api_url: "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK"
        channel: "#mcp-alerts"
        title: "WARNING: {{ .GroupLabels.alertname }}"
        text: "{{ .CommonAnnotations.description }}"

Distributed Tracing for Multi-Server Setups

When your architecture includes multiple MCP servers -- or when MCP servers call downstream APIs -- distributed tracing shows you the full request path.

OpenTelemetry Integration (Python)

uv add opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp

Then configure a tracer provider that exports spans over OTLP:

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import (
    OTLPSpanExporter,
)
from opentelemetry.sdk.resources import Resource

# Configure tracing
resource = Resource.create(
    {
        "service.name": "mcp-weather-server",
        "service.version": "2.1.0",
    }
)

provider = TracerProvider(resource=resource)
processor = BatchSpanProcessor(
    OTLPSpanExporter(endpoint="http://jaeger:4317")
)
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("mcp-weather-server")

@mcp.tool()
async def get_forecast(latitude: float, longitude: float) -> str:
    """Get weather forecast with distributed tracing."""
    with tracer.start_as_current_span("tool.get_forecast") as span:
        span.set_attribute("tool.name", "get_forecast")
        span.set_attribute("tool.args.latitude", latitude)
        span.set_attribute("tool.args.longitude", longitude)

        # Child span for the API call
        with tracer.start_as_current_span("http.get_points"):
            points = await fetch_weather_points(latitude, longitude)

        # Another child span
        with tracer.start_as_current_span("http.get_forecast"):
            forecast = await fetch_forecast(points["forecast_url"])

        span.set_attribute("tool.result.periods", len(forecast))
        return format_forecast(forecast)

OpenTelemetry Integration (TypeScript)

npm install @opentelemetry/api @opentelemetry/sdk-node @opentelemetry/exporter-trace-otlp-grpc

Then initialize the SDK and create a tracer:

import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-grpc";
import { trace } from "@opentelemetry/api";

const sdk = new NodeSDK({
  serviceName: "mcp-github-server",
  traceExporter: new OTLPTraceExporter({
    url: "http://jaeger:4317",
  }),
});

sdk.start();

const tracer = trace.getTracer("mcp-github-server");

// Use in tool handlers
async function handleToolCall(name: string, args: unknown) {
  return tracer.startActiveSpan(`tool.${name}`, async (span) => {
    span.setAttribute("tool.name", name);
    try {
      const result = await executeToolLogic(name, args);
      span.setAttribute("tool.status", "success");
      return result;
    } catch (error) {
      span.setAttribute("tool.status", "error");
      span.recordException(error as Error);
      throw error;
    } finally {
      span.end();
    }
  });
}

Viewing Traces in Jaeger

Deploy Jaeger alongside your MCP server to visualize traces:

# docker-compose.yml (partial)
services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"  # Jaeger UI
      - "4317:4317"    # OTLP gRPC
    environment:
      - COLLECTOR_OTLP_ENABLED=true

Open http://localhost:16686 to search for traces by service name, operation, or duration. Each trace shows the full hierarchy of spans, making it easy to identify which downstream call caused latency.

Monitoring at Scale

As you scale from one MCP server to many, your monitoring strategy needs to evolve.

Centralized Monitoring Architecture

┌────────────────────────────────────────────────────────────┐
│                     Grafana Dashboard                      │
│  (Tool call rates, latency, errors across all servers)     │
└─────────────┬────────────────────────────┬─────────────────┘
              │                            │
       ┌──────▼──────┐            ┌────────▼────────┐
       │  Prometheus  │            │   Grafana Loki  │
       │  (metrics)   │            │   (logs)        │
       └──────┬───────┘            └────────┬────────┘
              │                             │
    ┌─────────┼─────────────┐     ┌─────────┼──────────┐
    ▼         ▼             ▼     ▼         ▼          ▼
┌────────┐┌────────┐┌────────┐┌────────┐┌────────┐┌────────┐
│Weather ││GitHub  ││Database││Weather ││GitHub  ││Database│
│MCP     ││MCP     ││MCP     ││Logs    ││Logs    ││Logs    │
│/metrics││/metrics││/metrics││stderr  ││stderr  ││stderr  │
└────────┘└────────┘└────────┘└────────┘└────────┘└────────┘

Multi-Server Grafana Dashboard

When monitoring multiple MCP servers, add a server selector variable to your Grafana dashboard:

  1. Create a variable named server with query: label_values(mcp_tool_calls_total, job)
  2. Update all panel queries to filter by job=~"$server" (a regex match, so multi-select works)
  3. Enable the "All" option on the variable to see aggregate metrics across all servers

Example query with server filtering:

sum(rate(mcp_tool_calls_total{job=~"$server"}[5m])) by (tool_name)

Capacity Planning Queries

Use these queries to plan scaling decisions:

# Peak tool call rate over the last 7 days
max_over_time(sum(rate(mcp_tool_calls_total[5m]))[7d:5m])

# Median session duration
histogram_quantile(0.5, sum(rate(mcp_session_duration_seconds_bucket[24h])) by (le))

# Memory growth trend (predict when you will hit limits)
predict_linear(process_resident_memory_bytes[6h], 3600 * 24)

Production Readiness Checklist

Before declaring your MCP server production-ready, verify every item on this checklist:

Health and Availability

  • /health endpoint returns 200 when the server is running
  • /ready endpoint checks all critical dependencies (database, APIs)
  • Kubernetes probes (or equivalent) configured for liveness and readiness
  • Graceful shutdown handler drains active sessions on SIGTERM
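
The graceful-shutdown item deserves a sketch, since it is the only checklist entry not covered earlier. One minimal approach on Unix, assuming your server tracks live sessions in a collection (the active_transports name here is hypothetical):

```python
import asyncio
import signal

# `active_transports` stands in for however your server tracks live
# SSE sessions; add on connect, remove on disconnect.
active_transports: set = set()
shutdown_event = asyncio.Event()

def install_sigterm_handler(loop: asyncio.AbstractEventLoop) -> None:
    # Kubernetes sends SIGTERM before killing the pod; flip the event
    loop.add_signal_handler(signal.SIGTERM, shutdown_event.set)

async def drain_and_exit(grace_seconds: float = 10.0) -> None:
    """Wait for SIGTERM, then give active sessions time to finish."""
    await shutdown_event.wait()
    # Stop accepting new sessions here (e.g. start failing the readiness
    # probe), then poll until sessions close or the grace period ends.
    deadline = asyncio.get_running_loop().time() + grace_seconds
    while active_transports and asyncio.get_running_loop().time() < deadline:
        await asyncio.sleep(0.25)
```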

Metrics and Dashboards

  • Prometheus metrics exposed at /metrics
  • Tool call counter with tool name and status labels
  • Tool call latency histogram with appropriate buckets
  • Active session gauge
  • Grafana dashboard with rate, latency, error, and session panels
  • Dashboard accessible to all team members

Logging

  • Structured JSON logging to stderr
  • Correlation IDs on every tool call
  • Log level configurable via environment variable
  • Logs shipped to a centralized aggregation system
  • No sensitive data (API keys, passwords) in log output

Alerting

  • Alert for server down (critical, 1-minute threshold)
  • Alert for high error rate (warning, 5% for 5 minutes)
  • Alert for high latency (warning, P95 above threshold)
  • Alert for high memory usage (warning, 80% of limit)
  • Alert routing configured (PagerDuty for critical, Slack for warning)
  • Runbook links included in alert annotations
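A Prometheus rules file covering the first four alerts on this list might look like the sketch below. Metric names and thresholds follow this guide's examples; the status="error" label value, the container metric names (which assume cAdvisor/kubelet metrics are available), and the runbook URLs are placeholder assumptions to adapt.

```yaml
groups:
  - name: mcp-server-alerts
    rules:
      - alert: MCPServerDown
        expr: up{job="mcp-server"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "MCP server {{ $labels.instance }} is down"
          runbook_url: "https://wiki.example.com/runbooks/mcp-server-down"

      - alert: MCPHighErrorRate
        expr: |
          sum(rate(mcp_tool_calls_total{status="error"}[5m]))
            / sum(rate(mcp_tool_calls_total[5m])) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "MCP tool call error rate above 5%"
          runbook_url: "https://wiki.example.com/runbooks/mcp-error-rate"

      - alert: MCPHighLatency
        expr: |
          histogram_quantile(0.95,
            rate(mcp_tool_call_duration_seconds_bucket[5m])) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "P95 tool call latency above 10s"

      - alert: MCPHighMemory
        expr: |
          container_memory_working_set_bytes{container="mcp-server"}
            / container_spec_memory_limit_bytes{container="mcp-server"} > 0.8
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "MCP server memory above 80% of container limit"
```

Route on the severity label in Alertmanager: critical to PagerDuty, warning to Slack.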

Infrastructure

  • Container resource limits set (CPU and memory)
  • Horizontal auto-scaling configured
  • TLS/HTTPS enabled on all endpoints
  • Secrets injected via environment variables, not hardcoded

Summary

Production MCP server monitoring rests on four pillars: health checks that confirm your server and its dependencies are operational, Prometheus metrics that quantify tool call rates, latency, and errors, structured JSON logging that explains what happened and why, and alerting rules that notify your team before users are impacted.

Start with health checks and basic metrics -- these take less than an hour to implement and immediately give you visibility. Add Grafana dashboards next, then alerting rules. Distributed tracing is the final layer, most valuable when you operate multiple interconnected MCP servers. The production readiness checklist at the end of this guide ensures nothing is missed before your server goes live.

Frequently Asked Questions

What metrics should I monitor on an MCP server?

The most critical metrics are: tool call count (total invocations per tool), tool call latency (P50/P95/P99 response times), error rate (percentage of failed tool calls), active session count (concurrent SSE or Streamable HTTP connections), transport health (connection drops, reconnects), and resource utilization (CPU, memory, open file descriptors). These give you full visibility into both protocol-level and infrastructure-level health.

How do I add a health check endpoint to my MCP server?

For Python MCP servers using FastMCP with Starlette, add a Route('/health', ...) that returns a JSON response with status, uptime, and dependency checks (database, external APIs). For TypeScript servers using Express, add an app.get('/health', ...) handler. The health check should verify that the server can accept connections and that critical dependencies are reachable.
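Framework details vary, so here is a minimal, dependency-free sketch of the handler logic using only the Python standard library. The check_dependencies probes are placeholders for your real database and API checks; in a FastMCP/Starlette or Express server you would mount the same logic as a Route or app.get handler.

```python
import json
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

START_TIME = time.time()

def check_dependencies() -> dict:
    # Placeholder probes — replace with real connectivity checks
    # (e.g. a SELECT 1 against your database, a HEAD request to your API).
    return {"database": "ok", "external_api": "ok"}

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/health":
            self.send_response(404)
            self.end_headers()
            return
        body = json.dumps({
            "status": "ok",
            "uptime_seconds": round(time.time() - START_TIME, 1),
            "checks": check_dependencies(),
        }).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args) -> None:
        pass  # keep request logs out of the default stderr format
```

A stricter /ready variant would return 503 when any dependency check fails, so orchestrators stop routing traffic to the instance.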

Can I use Prometheus to monitor MCP servers?

Yes. Expose a /metrics endpoint using the prometheus_client library (Python) or prom-client package (TypeScript). Define custom counters for tool calls, histograms for latency, and gauges for active sessions. Prometheus scrapes this endpoint at a configured interval and stores the time-series data for querying and alerting.
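In Python, the three instrument types look like this with prometheus_client; the metric and label names follow this guide's examples, and the bucket boundaries are a reasonable starting assumption for tool call latency:

```python
from prometheus_client import Counter, Gauge, Histogram, generate_latest

TOOL_CALLS = Counter(
    "mcp_tool_calls_total",
    "Total MCP tool invocations",
    ["tool_name", "status"],
)
TOOL_LATENCY = Histogram(
    "mcp_tool_call_duration_seconds",
    "Tool call latency in seconds",
    ["tool_name"],
    buckets=[0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10],
)
ACTIVE_SESSIONS = Gauge(
    "mcp_active_sessions",
    "Concurrent MCP sessions",
)

def record_call(tool_name: str, duration: float, ok: bool) -> None:
    """Call this from your tool dispatch wrapper after each invocation."""
    status = "success" if ok else "error"
    TOOL_CALLS.labels(tool_name=tool_name, status=status).inc()
    TOOL_LATENCY.labels(tool_name=tool_name).observe(duration)
```

Serve the output of generate_latest() at /metrics (with the library's CONTENT_TYPE_LATEST content type) and point a Prometheus scrape job at it.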

How do I set up a Grafana dashboard for MCP servers?

Add Prometheus as a data source in Grafana, then create a dashboard with panels for tool call rate, latency percentiles, error rate, and active sessions. Use PromQL queries like rate(mcp_tool_calls_total[5m]) for throughput and histogram_quantile(0.95, rate(mcp_tool_call_duration_seconds_bucket[5m])) for P95 latency. Import or build a dashboard JSON and share it across your team.

What is the best logging strategy for MCP servers?

Use structured JSON logging to stderr (since stdout is reserved for JSON-RPC in stdio transport). Include fields like timestamp, level, server_name, tool_name, request_id, and duration. Use a correlation ID to trace a single request across log entries. In production, ship logs to a centralized system like Elasticsearch, Loki, or CloudWatch for searching and alerting.
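A minimal stdlib implementation of this pattern, with the field names listed above (the server_name value is an assumption for your server):

```python
import json
import logging
import sys
import time
import uuid

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    EXTRA_FIELDS = ("tool_name", "request_id", "duration_ms")

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": time.strftime(
                "%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)
            ),
            "level": record.levelname,
            "server_name": "weather-mcp",  # assumption: your server's name
            "message": record.getMessage(),
        }
        # Merge structured context passed via logging's `extra` mechanism.
        for key in self.EXTRA_FIELDS:
            if hasattr(record, key):
                entry[key] = getattr(record, key)
        return json.dumps(entry)

def make_logger(level: str = "INFO") -> logging.Logger:
    handler = logging.StreamHandler(sys.stderr)  # stdout is reserved for JSON-RPC
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("mcp")
    logger.handlers = [handler]
    logger.setLevel(level)
    return logger

logger = make_logger()
request_id = str(uuid.uuid4())  # correlation ID shared by all entries for one request
logger.info("tool call finished", extra={
    "tool_name": "get_weather",
    "request_id": request_id,
    "duration_ms": 118,
})
```

Read the level from an environment variable (e.g. make_logger(os.environ.get("LOG_LEVEL", "INFO"))) so operators can flip to DEBUG without a redeploy.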

How do I set up alerts for MCP server failures?

Use Prometheus Alertmanager to define alerting rules based on your metrics. Critical alerts include: server down (up == 0), high error rate (error rate above 5% for 5 minutes), high latency (P95 above 10 seconds), and session count anomalies. Route alerts to PagerDuty, Slack, or email based on severity. Always include a runbook link in alert annotations.

How do I implement distributed tracing for multi-server MCP setups?

Use OpenTelemetry to add tracing to your MCP server. Create a span for each tool call and propagate trace context through any downstream HTTP requests. If one MCP server calls another, pass the trace parent header so that the full request path is visible in your tracing backend (Jaeger, Zipkin, or a managed service like Datadog APM).

Should I monitor MCP servers differently in local vs remote deployments?

Yes. Local stdio-based MCP servers have limited monitoring options -- use stderr logging and the MCP Inspector for debugging. Remote MCP servers (SSE or Streamable HTTP) support full observability: health check endpoints, Prometheus metrics, structured logging, and distributed tracing. Focus your monitoring investment on remote production deployments.

How do I monitor MCP server memory usage to prevent OOM crashes?

Export memory metrics using process-level gauges (process.memoryUsage() in Node.js, psutil or /proc/self/status in Python). Track heap usage, RSS, and external memory. Set Prometheus alerts when memory exceeds 80% of your container limit. Common memory leaks in MCP servers include unbounded session maps, growing caches without eviction, and accumulated log buffers.
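A dependency-free way to read RSS on Unix systems, using /proc/self/status on Linux with a getrusage fallback. This is a sketch of the raw reading; in practice prometheus_client's default process collector (Python) or prom-client's collectDefaultMetrics (Node.js) exports the same data automatically.

```python
import os
import re
import resource  # Unix-only module
import sys

def rss_bytes() -> int:
    """Resident set size of the current process, in bytes."""
    if os.path.exists("/proc/self/status"):  # Linux
        with open("/proc/self/status") as f:
            match = re.search(r"VmRSS:\s+(\d+)\s+kB", f.read())
            if match:
                return int(match.group(1)) * 1024
    # Fallback: peak RSS from getrusage. Units differ by platform:
    # kilobytes on Linux, bytes on macOS.
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return peak if sys.platform == "darwin" else peak * 1024

def memory_alert_needed(limit_bytes: int, threshold: float = 0.8) -> bool:
    """True when RSS crosses `threshold` of the container memory limit."""
    return rss_bytes() > limit_bytes * threshold
```

Export rss_bytes() as a gauge and you can drive the 80%-of-limit alert from this guide's alerting section even without cAdvisor metrics.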

What is a production readiness checklist for MCP server monitoring?

Before going to production, verify: (1) health check endpoint returns 200 and checks dependencies, (2) Prometheus metrics are exposed for tool calls, latency, and errors, (3) structured JSON logging is configured to stderr, (4) alerting rules cover server down, high error rate, and high latency, (5) Grafana dashboard exists with key panels, (6) log aggregation is configured, (7) graceful shutdown drains active sessions, (8) resource limits (CPU/memory) are set on your container.

Related Guides