MCP Server Monitoring & Observability: Prometheus, Grafana & Health Checks
Complete guide to monitoring MCP servers in production — health checks, metrics collection with Prometheus, Grafana dashboards, logging strategies, and alerting.
Monitoring MCP servers in production requires a combination of health checks, metrics collection, structured logging, and alerting. Without observability, you are flying blind -- unable to detect failures, diagnose latency spikes, or understand usage patterns across your MCP tools. This guide gives you a complete, step-by-step framework for making your MCP servers observable using industry-standard tools: Prometheus for metrics, Grafana for dashboards, structured JSON logging, and Alertmanager for notifications.
Whether you are running a single MCP server for your team or operating dozens of servers at enterprise scale, the patterns in this guide apply. We cover implementations in both Python and TypeScript, with architecture decisions explained along the way.
For the foundational deployment guide, see Deploying Remote MCP Servers. For debugging during development, see Testing and Debugging MCP Servers.
Why Monitoring Matters for MCP Servers
MCP servers sit at a critical integration point between AI models and external systems. When an MCP server fails or slows down, the impact cascades:
- Tool calls fail silently. The AI model receives an error and may retry, hallucinate, or give the user an unhelpful response.
- Latency compounds. An AI agent making multiple sequential tool calls amplifies any per-call latency. A 2-second tool response becomes 10 seconds across five calls.
- Usage patterns are invisible. Without metrics, you cannot answer basic questions: Which tools are used most? What is the error rate? Are any tools never called?
- Incidents take longer to resolve. Without logs and traces, debugging a production failure means guessing.
MCP servers also have unique monitoring challenges compared to traditional APIs:
| Challenge | Why It Matters |
|---|---|
| Stdio transport restrictions | Local MCP servers use stdout for protocol messages, so logging must go to stderr |
| Long-lived SSE connections | Remote servers maintain persistent connections that need health monitoring |
| AI-driven call patterns | Traffic is bursty and unpredictable -- the AI model decides when to call tools |
| Multi-tool workflows | A single user request may trigger a chain of tool calls across multiple servers |
| Stateful sessions | SSE transport maintains session state that must be tracked and cleaned up |
The good news: MCP servers are standard HTTP services (when deployed remotely), so the entire ecosystem of monitoring tools works out of the box.
Key Metrics to Track
Before writing any instrumentation code, decide which metrics matter. Here is the core set of metrics every production MCP server should expose:
Protocol-Level Metrics
| Metric | Type | Description |
|---|---|---|
| mcp_tool_calls_total | Counter | Total tool invocations, labeled by tool name and status (success/error) |
| mcp_tool_call_duration_seconds | Histogram | Latency distribution for tool calls, labeled by tool name |
| mcp_resource_reads_total | Counter | Total resource read operations, labeled by resource URI |
| mcp_active_sessions | Gauge | Current number of active client sessions |
| mcp_session_duration_seconds | Histogram | How long client sessions last before disconnecting |
| mcp_errors_total | Counter | Total protocol-level errors, labeled by error type |
Infrastructure-Level Metrics
| Metric | Type | Description |
|---|---|---|
| process_cpu_seconds_total | Counter | CPU time consumed by the server process |
| process_resident_memory_bytes | Gauge | RSS memory usage |
| process_open_fds | Gauge | Open file descriptors (important for connection-heavy servers) |
| mcp_http_requests_total | Counter | Total HTTP requests to the server (health checks, SSE, messages) |
| mcp_http_request_duration_seconds | Histogram | HTTP request latency by endpoint |
Business-Level Metrics
Depending on your server's purpose, you may also want:
- API quota usage -- If your tools call rate-limited external APIs, track remaining quota
- Cache hit rate -- If you cache tool results, measure effectiveness
- Data freshness -- For servers that sync data, track how stale the data is
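These are ordinary Prometheus metrics. As one illustration, a cache hit counter might look like the following sketch (the metric name and helper are illustrative; the Counter class comes from the prometheus_client library covered in the Prometheus section below):

```python
from prometheus_client import Counter

# Illustrative business-level metric: cache effectiveness.
# Pass registry=... if you use a dedicated registry (see below).
cache_lookups_total = Counter(
    "mcp_cache_lookups_total",
    "Tool result cache lookups, labeled by outcome",
    ["result"],  # "hit" or "miss"
)

def record_cache_lookup(hit: bool) -> None:
    cache_lookups_total.labels(result="hit" if hit else "miss").inc()
```

The hit rate is then a simple PromQL ratio of the hit-labeled rate over the total rate.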
Health Check Endpoints
Every production MCP server needs at least two health endpoints: a liveness probe and a readiness probe.
- Liveness (/health): Is the process running and can it accept connections?
- Readiness (/ready): Is the server fully initialized and are all dependencies (database, external APIs) reachable?
Python Health Check Implementation
from starlette.applications import Starlette
from starlette.routing import Route, Mount
from starlette.responses import JSONResponse
from mcp.server.fastmcp import FastMCP
import httpx
import time
mcp = FastMCP("My Production Server")
# Track server start time
SERVER_START_TIME = time.time()
# ... define tools, resources, prompts ...
async def health_check(request):
"""Liveness probe -- is the server process running?"""
return JSONResponse(
{
"status": "healthy",
"uptime_seconds": round(time.time() - SERVER_START_TIME, 1),
"version": "1.2.0",
}
)
async def readiness_check(request):
"""Readiness probe -- are all dependencies available?"""
checks = {}
overall_healthy = True
# Check database connectivity
try:
await db.execute("SELECT 1")
checks["database"] = "healthy"
except Exception as e:
checks["database"] = f"unhealthy: {str(e)}"
overall_healthy = False
# Check external API reachability
try:
async with httpx.AsyncClient() as client:
resp = await client.get(
"https://api.example.com/health",
timeout=5.0,
)
resp.raise_for_status()
checks["external_api"] = "healthy"
except Exception as e:
checks["external_api"] = f"unhealthy: {str(e)}"
overall_healthy = False
status_code = 200 if overall_healthy else 503
return JSONResponse(
{
"status": "ready" if overall_healthy else "not_ready",
"checks": checks,
"uptime_seconds": round(time.time() - SERVER_START_TIME, 1),
},
status_code=status_code,
)
app = Starlette(
routes=[
Route("/health", endpoint=health_check),
Route("/ready", endpoint=readiness_check),
Mount("/", app=mcp.sse_app()),
]
)
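A quick way to sanity-check these endpoints is Starlette's test client (a sketch; TestClient requires httpx to be installed, and app is the application defined above):

```python
from starlette.testclient import TestClient

client = TestClient(app)

resp = client.get("/health")
assert resp.status_code == 200
assert resp.json()["status"] == "healthy"
```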
TypeScript Health Check Implementation
import express from "express";
import { SSEServerTransport } from "@modelcontextprotocol/sdk/server/sse.js";
import { createServer } from "./server.js";
const app = express();
app.use(express.json());
const startTime = Date.now();
const transports = new Map<string, SSEServerTransport>();
// Liveness probe
app.get("/health", (req, res) => {
res.json({
status: "healthy",
uptimeSeconds: Math.round((Date.now() - startTime) / 1000),
activeSessions: transports.size,
memoryMB: Math.round(process.memoryUsage().rss / 1024 / 1024),
});
});
// Readiness probe
app.get("/ready", async (req, res) => {
const checks: Record<string, string> = {};
let healthy = true;
// Check database
try {
await db.query("SELECT 1");
checks.database = "healthy";
} catch (err) {
checks.database = `unhealthy: ${(err as Error).message}`;
healthy = false;
}
res.status(healthy ? 200 : 503).json({
status: healthy ? "ready" : "not_ready",
checks,
uptimeSeconds: Math.round((Date.now() - startTime) / 1000),
});
});
// SSE and message endpoints...
app.get("/sse", async (req, res) => {
const server = createServer();
const transport = new SSEServerTransport("/messages", res);
transports.set(transport.sessionId, transport);
res.on("close", () => {
transports.delete(transport.sessionId);
});
await server.connect(transport);
});
Kubernetes Probe Configuration
If you deploy to Kubernetes, wire these endpoints into your pod spec:
apiVersion: apps/v1
kind: Deployment
metadata:
name: mcp-server
spec:
template:
spec:
containers:
- name: mcp-server
image: my-mcp-server:latest
ports:
- containerPort: 3001
livenessProbe:
httpGet:
path: /health
port: 3001
initialDelaySeconds: 5
periodSeconds: 10
failureThreshold: 3
readinessProbe:
httpGet:
path: /ready
port: 3001
initialDelaySeconds: 10
periodSeconds: 15
failureThreshold: 2
Prometheus Integration
Prometheus is the industry standard for collecting and storing time-series metrics. Integrating Prometheus with your MCP server involves three steps: instrument your code, expose a /metrics endpoint, and configure Prometheus to scrape it.
Python: Prometheus Instrumentation
Install the Prometheus client library:
uv add prometheus-client
Define your metrics and instrument your tool handlers:
from prometheus_client import (
Counter,
Histogram,
Gauge,
CollectorRegistry,
generate_latest,
)
from starlette.responses import Response
import time
# Create a dedicated registry (avoids default process metrics clutter)
registry = CollectorRegistry()
# Define metrics
tool_calls_total = Counter(
"mcp_tool_calls_total",
"Total number of MCP tool calls",
["tool_name", "status"],
registry=registry,
)
tool_call_duration = Histogram(
"mcp_tool_call_duration_seconds",
"Duration of MCP tool calls in seconds",
["tool_name"],
buckets=[0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0],
registry=registry,
)
active_sessions = Gauge(
"mcp_active_sessions",
"Number of active MCP client sessions",
registry=registry,
)
resource_reads_total = Counter(
"mcp_resource_reads_total",
"Total number of resource read operations",
["resource_uri"],
registry=registry,
)
# Metrics endpoint
async def metrics_endpoint(request):
"""Prometheus metrics endpoint."""
return Response(
content=generate_latest(registry),
media_type="text/plain; version=0.0.4; charset=utf-8",
)
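Register the endpoint alongside the health routes from earlier, so everything is served by the same Starlette app:

```python
app = Starlette(
    routes=[
        Route("/health", endpoint=health_check),
        Route("/ready", endpoint=readiness_check),
        Route("/metrics", endpoint=metrics_endpoint),
        Mount("/", app=mcp.sse_app()),
    ]
)
```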
Wrap your tool handlers with instrumentation:
from mcp.server.fastmcp import FastMCP
import functools
mcp = FastMCP("Monitored Server")
def instrumented_tool(func):
"""Decorator that adds Prometheus metrics to MCP tool handlers."""
tool_name = func.__name__
@functools.wraps(func)
async def wrapper(*args, **kwargs):
start_time = time.time()
try:
result = await func(*args, **kwargs)
tool_calls_total.labels(
tool_name=tool_name, status="success"
).inc()
return result
except Exception as e:
tool_calls_total.labels(
tool_name=tool_name, status="error"
).inc()
raise
finally:
duration = time.time() - start_time
tool_call_duration.labels(tool_name=tool_name).observe(duration)
return wrapper
@mcp.tool()
@instrumented_tool
async def search_documents(query: str) -> str:
"""Search the document database for relevant content."""
results = await db.search(query)
return format_results(results)
@mcp.tool()
@instrumented_tool
async def get_user_profile(user_id: str) -> str:
"""Retrieve a user's profile information."""
profile = await db.get_user(user_id)
return format_profile(profile)
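The resource_reads_total counter can be wired into resource handlers the same way. A sketch (the docs:// scheme and db.get_document are illustrative); note that labeling with the URI template rather than each expanded URI keeps label cardinality bounded:

```python
@mcp.resource("docs://{doc_id}")
async def read_document(doc_id: str) -> str:
    """Read a document by ID, counting the read."""
    # Label with the template, not the expanded URI, so you do not
    # create a new time series per document.
    resource_reads_total.labels(resource_uri="docs://{doc_id}").inc()
    return await db.get_document(doc_id)
```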
TypeScript: Prometheus Instrumentation
Install the prom-client package:
npm install prom-client
Set up metrics and instrument your server:
import { Registry, Counter, Histogram, Gauge } from "prom-client";
const register = new Registry();
// Define metrics
const toolCallsTotal = new Counter({
name: "mcp_tool_calls_total",
help: "Total number of MCP tool calls",
labelNames: ["tool_name", "status"] as const,
registers: [register],
});
const toolCallDuration = new Histogram({
name: "mcp_tool_call_duration_seconds",
help: "Duration of MCP tool calls in seconds",
labelNames: ["tool_name"] as const,
buckets: [0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0],
registers: [register],
});
const activeSessions = new Gauge({
name: "mcp_active_sessions",
help: "Number of active MCP client sessions",
registers: [register],
});
// Expose metrics endpoint
app.get("/metrics", async (req, res) => {
res.set("Content-Type", register.contentType);
res.send(await register.metrics());
});
// Instrument the tool-call handler. CallToolRequestSchema comes from
// @modelcontextprotocol/sdk/types.js; handleToolCall is your own dispatch logic.
server.setRequestHandler(CallToolRequestSchema, async (request) => {
const toolName = request.params.name;
const timer = toolCallDuration.startTimer({ tool_name: toolName });
try {
const result = await handleToolCall(request);
toolCallsTotal.inc({ tool_name: toolName, status: "success" });
return result;
} catch (error) {
toolCallsTotal.inc({ tool_name: toolName, status: "error" });
throw error;
} finally {
timer();
}
});
// Track sessions
app.get("/sse", async (req, res) => {
activeSessions.inc();
res.on("close", () => {
activeSessions.dec();
});
// ... transport setup
});
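On the Python side, the active_sessions gauge and the mcp_session_duration_seconds histogram from the metrics table can be driven by a small piece of ASGI middleware wrapped around the app. A sketch, assuming the registry and gauge defined earlier (the bucket boundaries are illustrative):

```python
import time

from prometheus_client import Histogram

session_duration = Histogram(
    "mcp_session_duration_seconds",
    "How long client sessions last before disconnecting",
    buckets=[1, 10, 60, 300, 1800, 3600, 14400],
    registry=registry,
)

class SessionMetricsMiddleware:
    """ASGI middleware that tracks active SSE sessions and their durations."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        # Only the long-lived SSE connection counts as a session.
        if scope["type"] != "http" or scope["path"] != "/sse":
            return await self.app(scope, receive, send)
        active_sessions.inc()
        start = time.time()
        try:
            return await self.app(scope, receive, send)
        finally:
            active_sessions.dec()
            session_duration.observe(time.time() - start)

# Wrap the Starlette app defined earlier:
# app = SessionMetricsMiddleware(app)
```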
Prometheus Scrape Configuration
Add your MCP server as a scrape target in prometheus.yml:
# prometheus.yml
global:
scrape_interval: 15s
evaluation_interval: 15s
scrape_configs:
- job_name: "mcp-server"
static_configs:
- targets: ["mcp-server:3001"]
metrics_path: /metrics
scrape_interval: 15s
# If you have multiple MCP servers
- job_name: "mcp-servers"
static_configs:
- targets:
- "mcp-weather:3001"
- "mcp-database:3002"
- "mcp-github:3003"
labels:
environment: "production"
For Kubernetes environments, use service discovery instead of static targets:
scrape_configs:
- job_name: "mcp-servers"
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels:
- __meta_kubernetes_pod_annotation_prometheus_io_scrape
action: keep
regex: "true"
      - source_labels:
          - __address__
          - __meta_kubernetes_pod_annotation_prometheus_io_port
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: "$1:$2"
Grafana Dashboard Setup
Once Prometheus is collecting metrics, use Grafana to visualize them. This section walks through building an MCP-specific dashboard.
Adding Prometheus as a Data Source
- Open Grafana (default: http://localhost:3000)
- Navigate to Configuration, then Data Sources
- Click Add data source and select Prometheus
- Set the URL to your Prometheus instance (e.g., http://prometheus:9090)
- Click Save & Test
Essential Dashboard Panels
Build your MCP dashboard with these panels:
Panel 1: Tool Call Rate (Requests per Second)
PromQL query:
sum(rate(mcp_tool_calls_total[5m])) by (tool_name)
Visualization: Time series with legend showing each tool. This tells you which tools are being called and how often.
Panel 2: Tool Call Latency (P50 / P95 / P99)
PromQL queries:
# P50
histogram_quantile(0.50, sum(rate(mcp_tool_call_duration_seconds_bucket[5m])) by (le, tool_name))
# P95
histogram_quantile(0.95, sum(rate(mcp_tool_call_duration_seconds_bucket[5m])) by (le, tool_name))
# P99
histogram_quantile(0.99, sum(rate(mcp_tool_call_duration_seconds_bucket[5m])) by (le, tool_name))
Visualization: Time series with three lines per tool. This reveals latency degradation before it impacts users.
Panel 3: Error Rate (%)
PromQL query:
sum(rate(mcp_tool_calls_total{status="error"}[5m])) by (tool_name)
/
sum(rate(mcp_tool_calls_total[5m])) by (tool_name)
* 100
Visualization: Time series with threshold lines at 1% (warning) and 5% (critical).
Panel 4: Active Sessions
PromQL query:
mcp_active_sessions
Visualization: Gauge or stat panel showing current value.
Panel 5: Memory Usage
PromQL query:
process_resident_memory_bytes / 1024 / 1024
Visualization: Time series in MB with the container memory limit shown as a threshold line.
Panel 6: Tool Call Breakdown (Table)
PromQL query:
sum(increase(mcp_tool_calls_total[24h])) by (tool_name, status)
Visualization: Table showing total calls and error count per tool over the last 24 hours.
Dashboard Layout Recommendation
| Row | Left Panel | Right Panel |
|---|---|---|
| 1 | Tool Call Rate (time series) | Error Rate % (time series) |
| 2 | Latency P50/P95/P99 (time series) | Active Sessions (gauge) |
| 3 | Memory Usage (time series) | CPU Usage (time series) |
| 4 | Tool Call Breakdown (table, full width) | |
Structured Logging
Metrics tell you what is happening. Logs tell you why. A good logging strategy is essential for diagnosing issues that metrics alone cannot explain.
JSON Logging Format
Use structured JSON logs so they can be parsed by log aggregation tools (Elasticsearch, Loki, CloudWatch, Datadog):
import json
import sys
import time
import uuid
from datetime import datetime, timezone
class MCPLogger:
"""Structured JSON logger for MCP servers.
All output goes to stderr to avoid corrupting
the JSON-RPC protocol on stdout.
"""
def __init__(self, server_name: str, version: str = "1.0.0"):
self.server_name = server_name
self.version = version
def _emit(self, level: str, message: str, **fields):
entry = {
"timestamp": datetime.now(timezone.utc).isoformat(),
"level": level,
"server": self.server_name,
"version": self.version,
"message": message,
**fields,
}
print(json.dumps(entry), file=sys.stderr)
def info(self, message: str, **fields):
self._emit("INFO", message, **fields)
def warn(self, message: str, **fields):
self._emit("WARN", message, **fields)
def error(self, message: str, **fields):
self._emit("ERROR", message, **fields)
def debug(self, message: str, **fields):
self._emit("DEBUG", message, **fields)
logger = MCPLogger("weather-server", version="2.1.0")
Correlation IDs for Request Tracing
Assign a unique ID to each tool call so you can trace related log entries:
import uuid
@mcp.tool()
async def search_documents(query: str) -> str:
"""Search documents with full observability."""
request_id = str(uuid.uuid4())[:8]
logger.info(
"Tool call started",
tool="search_documents",
request_id=request_id,
query_length=len(query),
)
try:
start = time.time()
results = await db.search(query)
duration = time.time() - start
logger.info(
"Tool call completed",
tool="search_documents",
request_id=request_id,
result_count=len(results),
duration_ms=round(duration * 1000),
)
return format_results(results)
except Exception as e:
logger.error(
"Tool call failed",
tool="search_documents",
request_id=request_id,
error=str(e),
error_type=type(e).__name__,
)
raise
This produces log lines like:
{"timestamp":"2026-02-26T14:30:01Z","level":"INFO","server":"weather-server","version":"2.1.0","message":"Tool call started","tool":"search_documents","request_id":"a1b2c3d4","query_length":42}
{"timestamp":"2026-02-26T14:30:01Z","level":"INFO","server":"weather-server","version":"2.1.0","message":"Tool call completed","tool":"search_documents","request_id":"a1b2c3d4","result_count":7,"duration_ms":234}
Log Levels and When to Use Them
| Level | When to Use | Example |
|---|---|---|
| DEBUG | Detailed diagnostic info, high volume | Parameter values, intermediate results |
| INFO | Normal operations worth recording | Tool call start/complete, session connect/disconnect |
| WARN | Unexpected but recoverable situations | Retry attempt, deprecated tool usage, slow query |
| ERROR | Failures that need attention | Tool call exception, dependency unreachable, data corruption |
In production, set the log level to INFO. Enable DEBUG only when actively investigating an issue to avoid excessive log volume.
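The MCPLogger class above emits every level unconditionally. A small subclass, sketched here, makes the threshold configurable through a LOG_LEVEL environment variable, mirroring the TypeScript logger below:

```python
import os

class LeveledMCPLogger(MCPLogger):
    """MCPLogger that drops entries below the LOG_LEVEL threshold."""

    _LEVELS = {"DEBUG": 0, "INFO": 1, "WARN": 2, "ERROR": 3}

    def __init__(self, server_name: str, version: str = "1.0.0"):
        super().__init__(server_name, version)
        self.min_level = os.environ.get("LOG_LEVEL", "INFO")

    def _emit(self, level: str, message: str, **fields):
        # Silently skip entries below the configured minimum level.
        if self._LEVELS[level] < self._LEVELS.get(self.min_level, 1):
            return
        super()._emit(level, message, **fields)
```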
TypeScript Structured Logging
type LogLevel = "DEBUG" | "INFO" | "WARN" | "ERROR";
interface LogEntry {
timestamp: string;
level: LogLevel;
server: string;
message: string;
[key: string]: unknown;
}
function createLogger(serverName: string) {
const minLevel: LogLevel =
(process.env.LOG_LEVEL as LogLevel) || "INFO";
const levels: Record<LogLevel, number> = {
DEBUG: 0,
INFO: 1,
WARN: 2,
ERROR: 3,
};
function emit(level: LogLevel, message: string, fields?: Record<string, unknown>) {
if (levels[level] < levels[minLevel]) return;
const entry: LogEntry = {
timestamp: new Date().toISOString(),
level,
server: serverName,
message,
...fields,
};
// stderr is safe for MCP servers (stdout is for JSON-RPC)
console.error(JSON.stringify(entry));
}
return {
debug: (msg: string, fields?: Record<string, unknown>) =>
emit("DEBUG", msg, fields),
info: (msg: string, fields?: Record<string, unknown>) =>
emit("INFO", msg, fields),
warn: (msg: string, fields?: Record<string, unknown>) =>
emit("WARN", msg, fields),
error: (msg: string, fields?: Record<string, unknown>) =>
emit("ERROR", msg, fields),
};
}
const logger = createLogger("github-mcp-server");
Shipping Logs to Aggregation Services
For production, send logs to a centralized system:
Grafana Loki with Promtail:
# promtail-config.yml
server:
http_listen_port: 9080
positions:
filename: /tmp/positions.yaml
clients:
- url: http://loki:3100/loki/api/v1/push
scrape_configs:
- job_name: mcp-server
docker_sd_configs:
- host: unix:///var/run/docker.sock
relabel_configs:
- source_labels: ["__meta_docker_container_name"]
target_label: "container"
pipeline_stages:
- json:
expressions:
level: level
server: server
tool: tool
- labels:
level:
server:
tool:
AWS CloudWatch (for ECS deployments): Configure the awslogs log driver in your ECS task definition. Logs from stderr are automatically shipped to CloudWatch.
Alerting Rules
Metrics and logs are only useful if someone acts on them. Set up alerting rules that notify your team when something goes wrong.
Prometheus Alerting Rules
Create an alerting rules file:
# mcp-alerts.yml
groups:
- name: mcp-server-alerts
rules:
# Server is completely down
- alert: MCPServerDown
expr: up{job="mcp-server"} == 0
for: 1m
labels:
severity: critical
annotations:
summary: "MCP server is unreachable"
description: "The MCP server has been down for more than 1 minute."
runbook: "https://wiki.example.com/runbooks/mcp-server-down"
# High error rate on any tool
- alert: MCPToolHighErrorRate
expr: >
(
sum(rate(mcp_tool_calls_total{status="error"}[5m])) by (tool_name)
/
sum(rate(mcp_tool_calls_total[5m])) by (tool_name)
) > 0.05
for: 5m
labels:
severity: warning
annotations:
summary: "High error rate on tool {{ $labels.tool_name }}"
description: "Tool {{ $labels.tool_name }} has an error rate above 5% for the last 5 minutes."
# P95 latency exceeding threshold
- alert: MCPToolHighLatency
expr: >
histogram_quantile(0.95,
sum(rate(mcp_tool_call_duration_seconds_bucket[5m])) by (le, tool_name)
) > 10
for: 5m
labels:
severity: warning
annotations:
summary: "High P95 latency on tool {{ $labels.tool_name }}"
description: "Tool {{ $labels.tool_name }} P95 latency exceeds 10 seconds."
# Memory approaching container limit
- alert: MCPServerHighMemory
expr: >
process_resident_memory_bytes / 1024 / 1024 > 400
for: 10m
labels:
severity: warning
annotations:
summary: "MCP server memory usage above 400 MB"
description: "Memory usage has been above 400 MB for 10 minutes. Check for leaks."
# No tool calls received (possible connectivity issue)
- alert: MCPServerNoTraffic
expr: >
sum(rate(mcp_tool_calls_total[15m])) == 0
for: 30m
labels:
severity: info
annotations:
summary: "No tool calls received for 30 minutes"
description: "The MCP server has not received any tool calls. This may be normal during off-hours or could indicate a connectivity issue."
Alertmanager Configuration
Route alerts to the right channels based on severity:
# alertmanager.yml
global:
resolve_timeout: 5m
route:
receiver: "default"
group_by: ["alertname", "severity"]
group_wait: 10s
group_interval: 5m
repeat_interval: 4h
routes:
- match:
severity: critical
receiver: "pagerduty-critical"
repeat_interval: 1h
- match:
severity: warning
receiver: "slack-warnings"
repeat_interval: 4h
receivers:
- name: "default"
slack_configs:
- api_url: "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK"
channel: "#mcp-alerts"
title: "{{ .GroupLabels.alertname }}"
text: "{{ .CommonAnnotations.description }}"
- name: "pagerduty-critical"
pagerduty_configs:
- service_key: "YOUR_PAGERDUTY_KEY"
description: "{{ .CommonAnnotations.summary }}"
- name: "slack-warnings"
slack_configs:
- api_url: "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK"
channel: "#mcp-alerts"
title: "WARNING: {{ .GroupLabels.alertname }}"
text: "{{ .CommonAnnotations.description }}"
Distributed Tracing for Multi-Server Setups
When your architecture includes multiple MCP servers -- or when MCP servers call downstream APIs -- distributed tracing shows you the full request path.
OpenTelemetry Integration (Python)
uv add opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import (
OTLPSpanExporter,
)
from opentelemetry.sdk.resources import Resource
# Configure tracing
resource = Resource.create(
{
"service.name": "mcp-weather-server",
"service.version": "2.1.0",
}
)
provider = TracerProvider(resource=resource)
processor = BatchSpanProcessor(
OTLPSpanExporter(endpoint="http://jaeger:4317")
)
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("mcp-weather-server")
@mcp.tool()
async def get_forecast(latitude: float, longitude: float) -> str:
"""Get weather forecast with distributed tracing."""
with tracer.start_as_current_span("tool.get_forecast") as span:
span.set_attribute("tool.name", "get_forecast")
span.set_attribute("tool.args.latitude", latitude)
span.set_attribute("tool.args.longitude", longitude)
# Child span for the API call
with tracer.start_as_current_span("http.get_points"):
points = await fetch_weather_points(latitude, longitude)
# Another child span
with tracer.start_as_current_span("http.get_forecast"):
forecast = await fetch_forecast(points["forecast_url"])
span.set_attribute("tool.result.periods", len(forecast))
return format_forecast(forecast)
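To tie traces back to the structured logs from earlier, include the active trace ID as a log field. A sketch using the OpenTelemetry API imported above:

```python
def current_trace_id() -> str | None:
    """Return the active trace ID as a hex string, if a span is recording."""
    ctx = trace.get_current_span().get_span_context()
    return format(ctx.trace_id, "032x") if ctx.is_valid else None

# Inside a tool handler:
#   logger.info("Tool call started", tool="get_forecast",
#               trace_id=current_trace_id())
```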
OpenTelemetry Integration (TypeScript)
npm install @opentelemetry/api @opentelemetry/sdk-node @opentelemetry/exporter-trace-otlp-grpc
import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-grpc";
import { trace } from "@opentelemetry/api";
const sdk = new NodeSDK({
serviceName: "mcp-github-server",
traceExporter: new OTLPTraceExporter({
url: "http://jaeger:4317",
}),
});
sdk.start();
const tracer = trace.getTracer("mcp-github-server");
// Use in tool handlers
async function handleToolCall(name: string, args: unknown) {
return tracer.startActiveSpan(`tool.${name}`, async (span) => {
span.setAttribute("tool.name", name);
try {
const result = await executeToolLogic(name, args);
span.setAttribute("tool.status", "success");
return result;
} catch (error) {
span.setAttribute("tool.status", "error");
span.recordException(error as Error);
throw error;
} finally {
span.end();
}
});
}
Viewing Traces in Jaeger
Deploy Jaeger alongside your MCP server to visualize traces:
# docker-compose.yml (partial)
services:
jaeger:
image: jaegertracing/all-in-one:latest
ports:
- "16686:16686" # Jaeger UI
- "4317:4317" # OTLP gRPC
environment:
- COLLECTOR_OTLP_ENABLED=true
Open http://localhost:16686 to search for traces by service name, operation, or duration. Each trace shows the full hierarchy of spans, making it easy to identify which downstream call caused latency.
Monitoring at Scale
As you scale from one MCP server to many, your monitoring strategy needs to evolve.
Centralized Monitoring Architecture
┌────────────────────────────────────────────────────────────┐
│                     Grafana Dashboard                      │
│   (Tool call rates, latency, errors across all servers)    │
└─────────────┬────────────────────────────┬─────────────────┘
              │                            │
       ┌──────▼──────┐            ┌────────▼────────┐
       │ Prometheus  │            │  Grafana Loki   │
       │  (metrics)  │            │     (logs)      │
       └──────┬──────┘            └────────┬────────┘
              │                            │
    ┌─────────┼─────────┐        ┌─────────┼──────────┐
    ▼         ▼         ▼        ▼         ▼          ▼
┌────────┐┌────────┐┌────────┐┌────────┐┌────────┐┌────────┐
│Weather ││GitHub  ││Database││Weather ││GitHub  ││Database│
│MCP     ││MCP     ││MCP     ││Logs    ││Logs    ││Logs    │
│/metrics││/metrics││/metrics││stderr  ││stderr  ││stderr  │
└────────┘└────────┘└────────┘└────────┘└────────┘└────────┘
Multi-Server Grafana Dashboard
When monitoring multiple MCP servers, add a server selector variable to your Grafana dashboard:
- Create a variable named server with the query label_values(mcp_tool_calls_total, job)
- Update all panel queries to filter by job="$server"
- Add an "All" option to see aggregate metrics across all servers
Example query with server filtering:
sum(rate(mcp_tool_calls_total{job="$server"}[5m])) by (tool_name)
Capacity Planning Queries
Use these queries to plan scaling decisions:
# Peak tool call rate over the last 7 days
max_over_time(sum(rate(mcp_tool_calls_total[5m]))[7d:5m])
# Median session duration over the last 24 hours
histogram_quantile(0.5, sum(rate(mcp_session_duration_seconds_bucket[24h])) by (le))
# Memory growth trend (predict when you will hit limits)
predict_linear(process_resident_memory_bytes[6h], 3600 * 24)
Production Readiness Checklist
Before declaring your MCP server production-ready, verify every item on this checklist:
Health and Availability
- /health endpoint returns 200 when the server is running
- /ready endpoint checks all critical dependencies (database, APIs)
- Kubernetes probes (or equivalent) configured for liveness and readiness
- Graceful shutdown handler drains active sessions on SIGTERM (sketched below)
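A minimal sketch of the graceful-shutdown item, assuming an active_session_count() accessor over whatever structure tracks your transports (hypothetical here); note that servers run under uvicorn receive SIGTERM through the framework's own lifespan hooks, so adapt accordingly:

```python
import asyncio
import signal

SHUTDOWN_GRACE_SECONDS = 30.0

async def drain_and_stop() -> None:
    """Wait for active sessions to finish, up to a grace deadline."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + SHUTDOWN_GRACE_SECONDS
    # active_session_count() is a hypothetical accessor over your session map.
    while active_session_count() > 0 and loop.time() < deadline:
        await asyncio.sleep(0.5)
    loop.stop()

def install_shutdown_handler() -> None:
    loop = asyncio.get_running_loop()
    loop.add_signal_handler(
        signal.SIGTERM,
        lambda: asyncio.ensure_future(drain_and_stop()),
    )
```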
Metrics and Dashboards
- Prometheus metrics exposed at /metrics
- Tool call counter with tool name and status labels
- Tool call latency histogram with appropriate buckets
- Active session gauge
- Grafana dashboard with rate, latency, error, and session panels
- Dashboard accessible to all team members
Logging
- Structured JSON logging to stderr
- Correlation IDs on every tool call
- Log level configurable via environment variable
- Logs shipped to a centralized aggregation system
- No sensitive data (API keys, passwords) in log output
Alerting
- Alert for server down (critical, 1-minute threshold)
- Alert for high error rate (warning, 5% for 5 minutes)
- Alert for high latency (warning, P95 above threshold)
- Alert for high memory usage (warning, 80% of limit)
- Alert routing configured (PagerDuty for critical, Slack for warning)
- Runbook links included in alert annotations
Infrastructure
- Container resource limits set (CPU and memory)
- Horizontal auto-scaling configured
- TLS/HTTPS enabled on all endpoints
- Secrets injected via environment variables, not hardcoded
What to Read Next
- Deploy your server to production: Deploying Remote MCP Servers
- Debug issues during development: Testing and Debugging MCP Servers
- Secure your deployment: MCP Security Model
- Understand MCP architecture: MCP Architecture Explained
- Browse production-ready servers: MCP Server Directory
Summary
Production MCP server monitoring rests on four pillars: health checks that confirm your server and its dependencies are operational, Prometheus metrics that quantify tool call rates, latency, and errors, structured JSON logging that explains what happened and why, and alerting rules that notify your team before users are impacted.
Start with health checks and basic metrics -- these take less than an hour to implement and immediately give you visibility. Add Grafana dashboards next, then alerting rules. Distributed tracing is the final layer, most valuable when you operate multiple interconnected MCP servers. The production readiness checklist at the end of this guide ensures nothing is missed before your server goes live.
Frequently Asked Questions
What metrics should I monitor on an MCP server?
The most critical metrics are: tool call count (total invocations per tool), tool call latency (P50/P95/P99 response times), error rate (percentage of failed tool calls), active session count (concurrent SSE or Streamable HTTP connections), transport health (connection drops, reconnects), and resource utilization (CPU, memory, open file descriptors). These give you full visibility into both protocol-level and infrastructure-level health.
How do I add a health check endpoint to my MCP server?
For Python MCP servers using FastMCP with Starlette, add a Route('/health', ...) that returns a JSON response with status, uptime, and dependency checks (database, external APIs). For TypeScript servers using Express, add an app.get('/health', ...) handler. The health check should verify that the server can accept connections and that critical dependencies are reachable.
Can I use Prometheus to monitor MCP servers?
Yes. Expose a /metrics endpoint using the prometheus_client library (Python) or prom-client package (TypeScript). Define custom counters for tool calls, histograms for latency, and gauges for active sessions. Prometheus scrapes this endpoint at a configured interval and stores the time-series data for querying and alerting.
How do I set up a Grafana dashboard for MCP servers?
Add Prometheus as a data source in Grafana, then create a dashboard with panels for tool call rate, latency percentiles, error rate, and active sessions. Use PromQL queries like rate(mcp_tool_calls_total[5m]) for throughput and histogram_quantile(0.95, rate(mcp_tool_call_duration_seconds_bucket[5m])) for P95 latency. Import or build a dashboard JSON and share it across your team.
What is the best logging strategy for MCP servers?
Use structured JSON logging to stderr (since stdout is reserved for JSON-RPC in stdio transport). Include fields like timestamp, level, server_name, tool_name, request_id, and duration. Use a correlation ID to trace a single request across log entries. In production, ship logs to a centralized system like Elasticsearch, Loki, or CloudWatch for searching and alerting.
How do I set up alerts for MCP server failures?
Use Prometheus Alertmanager to define alerting rules based on your metrics. Critical alerts include: server down (up == 0), high error rate (error rate above 5% for 5 minutes), high latency (P95 above 10 seconds), and session count anomalies. Route alerts to PagerDuty, Slack, or email based on severity. Always include a runbook link in alert annotations.
How do I implement distributed tracing for multi-server MCP setups?
Use OpenTelemetry to add tracing to your MCP server. Create a span for each tool call and propagate trace context through any downstream HTTP requests. If one MCP server calls another, pass the trace parent header so that the full request path is visible in your tracing backend (Jaeger, Zipkin, or a managed service like Datadog APM).
Should I monitor MCP servers differently in local vs remote deployments?
Yes. Local stdio-based MCP servers have limited monitoring options -- use stderr logging and the MCP Inspector for debugging. Remote MCP servers (SSE or Streamable HTTP) support full observability: health check endpoints, Prometheus metrics, structured logging, and distributed tracing. Focus your monitoring investment on remote production deployments.
How do I monitor MCP server memory usage to prevent OOM crashes?
Export memory metrics using process-level gauges (process.memoryUsage() in Node.js, psutil or /proc/self/status in Python). Track heap usage, RSS, and external memory. Set Prometheus alerts when memory exceeds 80% of your container limit. Common memory leaks in MCP servers include unbounded session maps, growing caches without eviction, and accumulated log buffers.
What is a production readiness checklist for MCP server monitoring?
Before going to production, verify: (1) health check endpoint returns 200 and checks dependencies, (2) Prometheus metrics are exposed for tool calls, latency, and errors, (3) structured JSON logging is configured to stderr, (4) alerting rules cover server down, high error rate, and high latency, (5) Grafana dashboard exists with key panels, (6) log aggregation is configured, (7) graceful shutdown drains active sessions, (8) resource limits (CPU/memory) are set on your container.
Related Guides
The complete guide to MCP security — OAuth 2.1 authentication, permission models, transport security, and securing your MCP deployments.
Production deployment guide for remote MCP servers — Docker containerization, cloud hosting (AWS, GCP, Azure), scaling, and monitoring.
Master MCP server testing with the MCP Inspector, debugging techniques, logging best practices, and automated testing strategies.