Deploying Remote MCP Servers: Docker, Cloud Hosting & Scaling
Production deployment guide for remote MCP servers — Docker containerization, cloud hosting (AWS, GCP, Azure), scaling, and monitoring.
Moving an MCP server from local development to production deployment involves three key transitions: switching from stdio to HTTP transport, containerizing for consistent environments, and adding the operational infrastructure (monitoring, scaling, secrets management) that production workloads require. This guide walks you through each step.
Local MCP servers use stdio transport and run as child processes of the client application. Remote MCP servers use HTTP-based transports (SSE or Streamable HTTP) and run as standalone services accessible over the network. For a detailed comparison, see Local vs Remote MCP Servers.
Choosing a Remote Transport
Remote MCP servers use HTTP-based transports instead of stdio. There are two main options:
| Transport | Protocol | Connection Model | Best For |
|---|---|---|---|
| SSE (Server-Sent Events) | GET for server-to-client, POST for client-to-server | Persistent connection | Real-time updates, long-lived sessions |
| Streamable HTTP | Standard HTTP POST with optional SSE streaming | Request-response | Stateless deployments, serverless, simpler infrastructure |
SSE Transport Implementation (TypeScript)
```typescript
import express from "express";
import { SSEServerTransport } from "@modelcontextprotocol/sdk/server/sse.js";
import { createServer } from "./server.js";

const app = express();
app.use(express.json());

// Store active transports
const transports = new Map<string, SSEServerTransport>();

// SSE endpoint — client connects here for server-to-client events
app.get("/sse", async (req, res) => {
  const server = createServer();
  const transport = new SSEServerTransport("/messages", res);
  transports.set(transport.sessionId, transport);

  res.on("close", () => {
    transports.delete(transport.sessionId);
  });

  await server.connect(transport);
});

// Messages endpoint — client sends requests here
app.post("/messages", async (req, res) => {
  const sessionId = req.query.sessionId as string;
  const transport = transports.get(sessionId);

  if (!transport) {
    res.status(404).json({ error: "Session not found" });
    return;
  }

  await transport.handlePostMessage(req, res);
});

// Health check
app.get("/health", (req, res) => {
  res.json({ status: "healthy", sessions: transports.size });
});

const PORT = process.env.PORT || 3001;
app.listen(PORT, () => {
  console.error(`MCP SSE server listening on port ${PORT}`);
});
```
SSE Transport Implementation (Python)
```python
from mcp.server.fastmcp import FastMCP
from starlette.applications import Starlette
from starlette.routing import Route, Mount
from starlette.responses import JSONResponse
import uvicorn

mcp = FastMCP("Remote Weather Server")

# ... define tools, resources, prompts ...

# Create the Starlette app with SSE transport
app = Starlette(
    routes=[
        Route("/health", endpoint=lambda r: JSONResponse({"status": "healthy"})),
        Mount("/", app=mcp.sse_app()),
    ]
)

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=3001)
```
Docker Containerization
Docker is the recommended deployment method for MCP servers. It ensures consistent environments across development, staging, and production.
Dockerfile for Python MCP Server
```dockerfile
# Multi-stage build for smaller final image
FROM python:3.12-slim AS builder

# Install uv
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv

WORKDIR /app

# Copy dependency files first (for better caching)
COPY pyproject.toml uv.lock ./

# Install dependencies
RUN uv sync --frozen --no-dev

# Copy application code
COPY src/ ./src/
COPY server.py ./

# --- Production image ---
FROM python:3.12-slim

WORKDIR /app

# Copy virtual environment and app from builder
COPY --from=builder /app/.venv /app/.venv
COPY --from=builder /app/src /app/src
COPY --from=builder /app/server.py /app/

# Set the virtual environment path
ENV PATH="/app/.venv/bin:$PATH"

# Expose the SSE port
EXPOSE 3001

# Health check
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD python -c "import httpx; httpx.get('http://localhost:3001/health').raise_for_status()"

# Run the server
CMD ["python", "server.py"]
```
Dockerfile for TypeScript MCP Server
```dockerfile
FROM node:20-slim AS builder

WORKDIR /app

# Copy package files first for caching
COPY package.json package-lock.json ./
RUN npm ci

# Copy source and build
COPY tsconfig.json ./
COPY src/ ./src/
RUN npm run build

# --- Production image ---
FROM node:20-slim

WORKDIR /app

COPY package.json package-lock.json ./
RUN npm ci --omit=dev

COPY --from=builder /app/dist ./dist

EXPOSE 3001

HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD node -e "fetch('http://localhost:3001/health').then(r => r.ok ? process.exit(0) : process.exit(1)).catch(() => process.exit(1))"

CMD ["node", "dist/index.js"]
```
Dockerfile for Go MCP Server
```dockerfile
FROM golang:1.22-alpine AS builder

WORKDIR /app

COPY go.mod go.sum ./
RUN go mod download

COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o mcp-server .

# --- Minimal production image ---
FROM alpine:3.19

RUN apk --no-cache add ca-certificates

COPY --from=builder /app/mcp-server /usr/local/bin/

EXPOSE 3001

HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD wget -qO- http://localhost:3001/health || exit 1

CMD ["mcp-server"]
```
Docker Compose for Local Testing
```yaml
# docker-compose.yml
services:
  mcp-server:
    build: .
    ports:
      - "3001:3001"
    environment:
      - NODE_ENV=production
      - DATABASE_URL=postgres://user:pass@db:5432/myapp
      - API_KEY=${API_KEY}
    env_file:
      - .env.production
    restart: unless-stopped
    depends_on:
      db:
        condition: service_healthy

  db:
    image: postgres:16-alpine
    environment:
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
      - POSTGRES_DB=myapp
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user"]
      interval: 10s
      timeout: 5s

volumes:
  pgdata:
```
Cloud Platform Deployments
AWS (ECS Fargate)
AWS ECS Fargate runs your Docker container without managing servers.
Task definition (simplified):
```json
{
  "family": "mcp-server",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "containerDefinitions": [
    {
      "name": "mcp-server",
      "image": "123456789.dkr.ecr.us-east-1.amazonaws.com/mcp-server:latest",
      "portMappings": [
        {
          "containerPort": 3001,
          "protocol": "tcp"
        }
      ],
      "environment": [
        { "name": "NODE_ENV", "value": "production" }
      ],
      "secrets": [
        {
          "name": "DATABASE_URL",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789:secret:mcp/database-url"
        }
      ],
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:3001/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3
      },
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/mcp-server",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "mcp"
        }
      }
    }
  ]
}
```
Put an Application Load Balancer (ALB) in front of ECS with:
- HTTPS listener on port 443
- Target group pointing to port 3001
- Sticky sessions enabled (for SSE transport)
- Health check path: /health
Google Cloud Run
Cloud Run is an excellent fit for MCP servers: it handles HTTPS, scaling, and container management automatically.
```bash
# Build and push the image
gcloud builds submit --tag gcr.io/PROJECT_ID/mcp-server

# Deploy to Cloud Run
gcloud run deploy mcp-server \
  --image gcr.io/PROJECT_ID/mcp-server:latest \
  --platform managed \
  --region us-central1 \
  --port 3001 \
  --memory 512Mi \
  --cpu 1 \
  --min-instances 1 \
  --max-instances 10 \
  --set-env-vars "NODE_ENV=production" \
  --set-secrets "DATABASE_URL=mcp-db-url:latest" \
  --allow-unauthenticated
```
Cloud Run automatically provides:
- HTTPS with managed certificates
- Automatic scaling from 1 to N instances
- Built-in monitoring and logging
- Request-based billing
Note on SSE with Cloud Run: Cloud Run supports HTTP streaming (SSE) with a maximum request timeout of 60 minutes. For long-lived SSE connections, set --timeout 3600 and implement client-side reconnection logic.
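Because the platform can cut a stream at that ceiling, reconnection logic should use capped exponential backoff rather than hammering the server. A minimal sketch of the delay schedule; the `connect_sse` call shown in the comments is a hypothetical client function, not a real API:

```python
def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff delay (seconds) for the given reconnect attempt, 0-indexed."""
    return min(cap, base * (2 ** attempt))


def reconnect_delays(attempts: int, base: float = 1.0, cap: float = 60.0) -> list[float]:
    """The delays a client would sleep between successive reconnect attempts."""
    return [backoff_delay(i, base, cap) for i in range(attempts)]


# A client reconnect loop might look like this (connect_sse is hypothetical;
# it blocks while the SSE stream is open and returns or raises on disconnect):
#
# attempt = 0
# while True:
#     try:
#         connect_sse("https://mcp.example.com/sse")
#         attempt = 0  # reset the schedule after a successful session
#     except ConnectionError:
#         time.sleep(backoff_delay(attempt))  # consider adding random jitter
#         attempt += 1
```

Adding a small random jitter to each delay helps avoid thundering-herd reconnects when many clients are dropped at once.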
Azure Container Apps
```bash
# Create the container app
az containerapp create \
  --name mcp-server \
  --resource-group my-rg \
  --environment my-env \
  --image myregistry.azurecr.io/mcp-server:latest \
  --target-port 3001 \
  --ingress external \
  --min-replicas 1 \
  --max-replicas 10 \
  --cpu 0.5 \
  --memory 1Gi \
  --secrets dburl=mcp-database-url \
  --env-vars "DATABASE_URL=secretref:dburl"
```
Railway
Railway offers the simplest deployment path for small to medium MCP servers:
```bash
# Install Railway CLI
npm install -g @railway/cli

# Login and initialize
railway login
railway init

# Deploy (auto-detects Dockerfile)
railway up

# Set environment variables
railway variables set DATABASE_URL=postgres://...
railway variables set API_KEY=your-key
```
Railway automatically:
- Builds from your Dockerfile
- Assigns a public HTTPS URL
- Manages environment variables
- Provides logging and monitoring
Fly.io
```bash
# Initialize — this generates a fly.toml configuration
fly launch
```

```toml
# fly.toml
app = "mcp-weather-server"
primary_region = "ord"

[build]

[http_service]
  internal_port = 3001
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 1

[checks]
  [checks.health]
    port = 3001
    type = "http"
    interval = "30s"
    timeout = "5s"
    path = "/health"
```

```bash
# Set secrets
fly secrets set DATABASE_URL=postgres://...
fly secrets set API_KEY=your-key

# Deploy
fly deploy
```
Environment Variables and Secrets Management
The Configuration Hierarchy
```
1. Cloud secrets manager (highest priority, most secure)
   └── AWS Secrets Manager, GCP Secret Manager, Azure Key Vault
2. Platform environment variables
   └── ECS task definition, Cloud Run --set-env-vars, Railway variables
3. .env files (development only, NEVER in production images)
   └── .env.local, .env.development
4. Hardcoded defaults (lowest priority, non-sensitive values only)
   └── Default ports, feature flags, pagination limits
```
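A resolution helper following this hierarchy might look like the sketch below. The `secret_lookup` callable is a stand-in for a real secrets-manager client (for example, a wrapper around AWS Secrets Manager); it is injected as a parameter so the function stays testable and platform-agnostic:

```python
import os
from typing import Callable, Optional


def resolve_setting(
    name: str,
    secret_lookup: Optional[Callable[[str], Optional[str]]] = None,
    default: Optional[str] = None,
) -> Optional[str]:
    """Resolve a setting using the hierarchy above:
    secrets manager first, then environment variable, then hardcoded default."""
    # 1. Cloud secrets manager (if a lookup was provided)
    if secret_lookup is not None:
        value = secret_lookup(name)
        if value is not None:
            return value
    # 2. Platform environment variable
    value = os.environ.get(name)
    if value is not None:
        return value
    # 3./4. Fall back to the non-sensitive default
    return default
```

In production you would pass a `secret_lookup` backed by your cloud provider's SDK; in tests, a plain `dict.get` works because it has the same call shape.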
Secrets Management Best Practices
```python
import os

class Config:
    """Server configuration with environment variable loading."""

    # Required secrets — fail fast if missing
    DATABASE_URL: str = os.environ["DATABASE_URL"]
    API_KEY: str = os.environ["API_KEY"]

    # Optional with defaults
    PORT: int = int(os.environ.get("PORT", "3001"))
    LOG_LEVEL: str = os.environ.get("LOG_LEVEL", "INFO")
    MAX_CONNECTIONS: int = int(os.environ.get("MAX_CONNECTIONS", "10"))

    @classmethod
    def validate(cls):
        """Validate configuration at startup."""
        required = ["DATABASE_URL", "API_KEY"]
        missing = [var for var in required if not os.environ.get(var)]
        if missing:
            raise EnvironmentError(
                f"Missing required environment variables: {', '.join(missing)}"
            )
```
Never Commit Secrets
Add these to your .gitignore:
```
.env
.env.local
.env.production
*.pem
*.key
secrets/
```
And use .env.example to document required variables:
```bash
# .env.example — Copy to .env and fill in values
DATABASE_URL=postgres://user:password@host:5432/dbname
API_KEY=your-api-key-here
PORT=3001
LOG_LEVEL=INFO
```
Scaling Strategies
Horizontal Scaling
MCP servers are stateless at the protocol level, making horizontal scaling straightforward:
```
              ┌─────────────────┐
              │  Load Balancer  │
              │  (HTTPS + SSL)  │
              └────────┬────────┘
                       │
         ┌─────────────┼─────────────┐
         ▼             ▼             ▼
    ┌──────────┐  ┌──────────┐  ┌──────────┐
    │   MCP    │  │   MCP    │  │   MCP    │
    │ Server 1 │  │ Server 2 │  │ Server 3 │
    └────┬─────┘  └────┬─────┘  └────┬─────┘
         │             │             │
         └─────────────┼─────────────┘
                       ▼
               ┌──────────────┐
               │   Database   │
               │   / Redis    │
               └──────────────┘
```
Key considerations for SSE transport:
- Enable sticky sessions (session affinity) on the load balancer so SSE connections stay on the same instance
- Use Redis for shared state if your server maintains in-memory data
- Implement graceful shutdown to drain connections before scaling down
```typescript
// Graceful shutdown
process.on("SIGTERM", async () => {
  console.error("SIGTERM received, shutting down gracefully...");

  // Stop accepting new connections
  httpServer.close();

  // Close all active MCP sessions
  for (const [id, transport] of transports) {
    await transport.close();
    transports.delete(id);
  }

  // Wait for in-flight requests to complete
  await new Promise((resolve) => setTimeout(resolve, 5000));
  process.exit(0);
});
```
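A Python equivalent under asyncio might look like the following sketch. Mirroring the TypeScript example, it assumes a `transports` dict mapping session IDs to transport objects that expose an async `close()` method; those names are assumptions, not a fixed SDK API:

```python
import asyncio
import signal


async def shutdown(transports: dict, grace_period: float = 5.0) -> None:
    """Close all active MCP sessions, then wait for in-flight work to finish."""
    for session_id in list(transports):
        transport = transports.pop(session_id)
        await transport.close()
    # Give in-flight requests time to complete before the process exits
    await asyncio.sleep(grace_period)


def install_sigterm_handler(transports: dict) -> None:
    """Register the shutdown coroutine to run when the platform sends SIGTERM."""
    loop = asyncio.get_running_loop()
    loop.add_signal_handler(
        signal.SIGTERM, lambda: asyncio.ensure_future(shutdown(transports))
    )
```

Call `install_sigterm_handler(transports)` once at startup, inside the running event loop; `loop.add_signal_handler` is Unix-only, which matches typical container deployments.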
Auto-Scaling Configuration
AWS ECS:
```json
{
  "targetTrackingScalingPolicies": [
    {
      "targetValue": 70,
      "predefinedMetricSpecification": {
        "predefinedMetricType": "ECSServiceAverageCPUUtilization"
      },
      "scaleInCooldown": 60,
      "scaleOutCooldown": 30
    }
  ]
}
```
Kubernetes HPA:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mcp-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mcp-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```
Monitoring and Health Checks
Health Check Endpoint
Every production MCP server needs a health check:
```typescript
app.get("/health", async (req, res) => {
  const checks: Record<string, unknown> = {
    server: "healthy",
    uptime: process.uptime(),
    activeSessions: transports.size,
    memoryUsage: process.memoryUsage(),
    timestamp: new Date().toISOString(),
  };

  // Check database connectivity
  try {
    await db.query("SELECT 1");
    checks.database = "healthy";
  } catch (error) {
    checks.database = "unhealthy";
    res.status(503);
  }

  res.json(checks);
});
```
Metrics Collection
Export metrics for your monitoring platform:
```python
from prometheus_client import Counter, Histogram, Gauge, start_http_server

# Metrics
tool_calls_total = Counter(
    "mcp_tool_calls_total",
    "Total number of MCP tool calls",
    ["tool_name", "status"],
)

tool_call_duration = Histogram(
    "mcp_tool_call_duration_seconds",
    "Duration of MCP tool calls",
    ["tool_name"],
)

active_sessions = Gauge(
    "mcp_active_sessions",
    "Number of active MCP sessions",
)

# Start Prometheus metrics server on a separate port
start_http_server(9090)

# Use in tool handlers
@mcp.tool()
async def my_tool(query: str) -> str:
    with tool_call_duration.labels(tool_name="my_tool").time():
        try:
            result = await do_work(query)
            tool_calls_total.labels(tool_name="my_tool", status="success").inc()
            return result
        except Exception:
            tool_calls_total.labels(tool_name="my_tool", status="error").inc()
            raise
```
Alerting Rules
Set up alerts for critical conditions:
```yaml
# Prometheus alerting rules
groups:
  - name: mcp-server
    rules:
      - alert: MCPServerHighErrorRate
        expr: rate(mcp_tool_calls_total{status="error"}[5m]) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate on MCP server"

      - alert: MCPServerDown
        expr: up{job="mcp-server"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "MCP server is down"

      - alert: MCPServerHighLatency
        expr: histogram_quantile(0.95, sum(rate(mcp_tool_call_duration_seconds_bucket[5m])) by (le)) > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "MCP server P95 latency exceeds 5 seconds"
```
Reverse Proxy Configuration
Nginx Configuration for MCP SSE
```nginx
upstream mcp_backend {
    ip_hash;  # Sticky sessions for SSE
    server mcp-server-1:3001;
    server mcp-server-2:3001;
    server mcp-server-3:3001;
}

server {
    listen 443 ssl http2;
    server_name mcp.example.com;

    ssl_certificate /etc/ssl/certs/mcp.example.com.pem;
    ssl_certificate_key /etc/ssl/private/mcp.example.com.key;

    # SSE endpoint — disable buffering
    location /sse {
        proxy_pass http://mcp_backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_buffering off;
        proxy_cache off;
        proxy_read_timeout 86400s;  # 24 hours for SSE
    }

    # Messages endpoint
    location /messages {
        proxy_pass http://mcp_backend;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }

    # Health check
    location /health {
        proxy_pass http://mcp_backend;
    }
}
```

Note that X-Accel-Buffering is a response header the backend sends to control nginx buffering; with `proxy_buffering off` set here, the backend does not need to send it.
Caddy (Automatic HTTPS)
```caddyfile
mcp.example.com {
    reverse_proxy /sse mcp-server:3001 {
        flush_interval -1
        transport http {
            read_timeout 0
        }
    }

    reverse_proxy /messages mcp-server:3001
    reverse_proxy /health mcp-server:3001
}
```
Caddy automatically obtains and renews SSL certificates from Let's Encrypt, making it the simplest option for HTTPS.
What to Read Next
- Compare local and remote deployments: Local vs Remote MCP Servers
- Secure your deployment: MCP Security Model
- Cloud provider MCP servers: Cloud Provider MCP Servers
- Test before deploying: Testing & Debugging MCP Servers
- Browse production servers: MCP Server Directory
Summary
Deploying MCP servers to production follows a well-established pattern: switch to HTTP transport (SSE or Streamable HTTP), containerize with Docker, deploy to a managed platform, and add operational infrastructure (health checks, monitoring, auto-scaling). The MCP protocol's stateless request-response model makes horizontal scaling straightforward, and the rich ecosystem of cloud platforms means you can go from a working Docker container to a production deployment in minutes.
Start with the simplest deployment that meets your needs: Railway or Fly.io for small projects, Cloud Run or ECS Fargate for production workloads. Add complexity (Kubernetes, custom metrics, multi-region) only when your scale demands it.
Frequently Asked Questions
What transport should I use for remote MCP servers?
Use the Streamable HTTP transport (the modern standard) or SSE (Server-Sent Events) for remote deployments. Streamable HTTP uses standard HTTP requests with optional SSE streaming for server-to-client notifications. SSE uses a persistent HTTP connection for server-to-client events and a separate POST endpoint for client-to-server messages. Both work well behind load balancers and proxies.
Can I deploy an MCP server as a Docker container?
Yes, Docker is one of the best deployment strategies for MCP servers. Create a Dockerfile that installs your dependencies, copies your server code, and sets the appropriate entry point. The container exposes an HTTP port for SSE/Streamable HTTP transport. This works on any container platform: Docker Compose, Kubernetes, AWS ECS, Google Cloud Run, etc.
How do I handle secrets and API keys in production MCP servers?
Never hardcode secrets. Use environment variables injected at deployment time. On cloud platforms, use their native secrets management: AWS Secrets Manager, Google Secret Manager, Azure Key Vault. For Kubernetes, use Secrets resources. For Docker Compose, use the secrets configuration or .env files that are not committed to version control.
Can I deploy an MCP server to a serverless platform like AWS Lambda?
Serverless platforms work with the Streamable HTTP transport pattern since each request is independent. However, SSE transport requires persistent connections, which conflict with serverless invocation models. For serverless, implement a stateless request-response pattern or use a platform like AWS Fargate or Cloud Run that supports long-running connections.
How do I scale MCP servers horizontally?
MCP servers are stateless at the protocol level (each request is independent), making them naturally scalable. Put multiple server instances behind a load balancer. If your server maintains in-memory state (like caches), use an external data store (Redis, database) so all instances share state. Use sticky sessions or session affinity for SSE connections.
Do I need SSL/TLS for remote MCP servers?
Yes, always use HTTPS for remote MCP servers. This protects the JSON-RPC messages (which may contain sensitive data) in transit. Use a reverse proxy like Nginx or Caddy for TLS termination, or leverage cloud platform load balancers that handle SSL automatically.
How do I monitor a production MCP server?
Implement health check endpoints (/health or /ready), export metrics (request count, latency, error rate) to Prometheus or your monitoring platform, use structured logging to stderr, and set up alerts for high error rates or latency. Most cloud platforms provide built-in monitoring dashboards.
What is the recommended architecture for enterprise MCP deployments?
Use a containerized server behind a load balancer with TLS termination. Implement OAuth 2.1 for authentication. Deploy to a managed container service (ECS, Cloud Run, AKS). Use a centralized logging system. Implement rate limiting at the reverse proxy level. Use separate environments (staging, production) with identical configurations.
How do I deploy an MCP server to Railway or Fly.io?
Both platforms support Docker-based deployments. For Railway, connect your Git repository and it auto-deploys on push. For Fly.io, use 'fly launch' with a Dockerfile. Both handle HTTPS, scaling, and environment variables natively. They are excellent choices for small to medium MCP server deployments.
How do I handle database connections in a containerized MCP server?
Use connection pooling to manage database connections efficiently. Pass the database connection string as an environment variable. For cloud databases, use IAM-based authentication when possible. Implement connection retry logic with exponential backoff. In Kubernetes, use init containers to wait for database readiness before starting your MCP server.
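The retry-with-backoff part of that advice can be sketched as a small helper. Here `connect` stands in for whatever coroutine your driver uses to open a pool; the name and call shape are assumptions for illustration:

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def connect_with_retry(
    connect: Callable[[], Awaitable[T]],
    attempts: int = 5,
    base_delay: float = 0.5,
) -> T:
    """Call an async connect function, retrying with exponential backoff.

    The last error is re-raised once all attempts are exhausted.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return await connect()
        except Exception as exc:
            last_error = exc
            if attempt < attempts - 1:
                # 0.5s, 1s, 2s, 4s, ... between attempts
                await asyncio.sleep(base_delay * (2 ** attempt))
    assert last_error is not None
    raise last_error
```

At startup you would wrap your pool creation, for example `pool = await connect_with_retry(open_pool)` where `open_pool` is your own coroutine; the same helper doubles as the "wait for database readiness" step when you are not using init containers.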
Related Guides
Complete guide to monitoring MCP servers in production — health checks, metrics collection with Prometheus, Grafana dashboards, logging strategies, and alerting.
Compare local (stdio) and remote (SSE/HTTP) MCP server deployments. Learn when to use each approach with practical examples and trade-offs.
The complete guide to MCP security — OAuth 2.1 authentication, permission models, transport security, and securing your MCP deployments.
Official and community MCP servers for major cloud providers — AWS, Azure, Google Cloud, Cloudflare, and how they enable AI-powered cloud management.