
Composability in MCP: Building Hierarchical AI Systems

How MCP's composable architecture enables agents that are both clients and servers, creating powerful hierarchical AI systems.

Updated February 25, 2026
By MCP Server Spot

Composability in MCP

Composability is MCP's architectural property that allows any component to simultaneously act as both a client and a server, enabling hierarchical AI systems where specialized agents orchestrate other agents through the same standardized protocol. This is one of the most powerful and underutilized features of the Model Context Protocol.

While most MCP deployments use a simple flat architecture -- one host connecting to several independent servers -- composability enables advanced patterns for multi-agent systems, complex enterprise workflows, and orchestration platforms where AI agents collaborate to accomplish tasks no single agent could handle alone.


Understanding Composability

The Flat Model (Most Common)

The standard MCP deployment is flat: one host, multiple independent servers:

┌─────────────────────────────────────┐
│             HOST                     │
│  ┌───────┐ ┌───────┐ ┌───────┐    │
│  │Client │ │Client │ │Client │    │
│  │  A    │ │  B    │ │  C    │    │
│  └───┬───┘ └───┬───┘ └───┬───┘    │
└──────┼─────────┼─────────┼─────────┘
       │         │         │
  ┌────▼───┐ ┌──▼────┐ ┌──▼────┐
  │GitHub  │ │File   │ │DB     │
  │Server  │ │Server │ │Server │
  └────────┘ └───────┘ └───────┘

Each server is independent. The host's AI model decides which tools to use and in what order. This works well for most scenarios.

The Composable Model

In a composable architecture, servers can also be clients to other servers, creating a hierarchy:

┌─────────────────────────────────┐
│              HOST                │
│         ┌───────┐               │
│         │Client │               │
│         └───┬───┘               │
└─────────────┼───────────────────┘
              │
     ┌────────▼─────────┐
     │  Orchestrator     │ ← Acts as SERVER to host
     │  Server           │ ← Acts as CLIENT to sub-servers
     │                   │
     │  ┌─────┐ ┌─────┐ │
     │  │Cl. 1│ │Cl. 2│ │
     │  └──┬──┘ └──┬──┘ │
     └─────┼───────┼────┘
           │       │
     ┌─────▼──┐ ┌──▼─────┐
     │GitHub  │ │Docker  │
     │Server  │ │Server  │
     └────────┘ └────────┘

The orchestrator is simultaneously:

  • An MCP server that exposes high-level tools (like deploy_application) to the host
  • An MCP client to GitHub and Docker servers, using their tools to execute the deployment

Why This Matters

Composability enables patterns that flat architectures cannot achieve:

  1. Abstraction: Complex multi-tool workflows are abstracted into single high-level tools
  2. Specialization: Each server handles one domain, orchestrated by higher-level agents
  3. Reuse: The same GitHub server works in any composition -- unchanged
  4. Encapsulation: The orchestrator hides internal complexity from the host
  5. Independent evolution: Each layer can be updated independently

Composability Patterns

Pattern 1: Orchestrator Server

The most common composable pattern. An orchestrator server exposes high-level workflow tools while internally delegating to specialized servers.

Host → Orchestrator → [GitHub, Docker, Monitoring, Slack]

User says: "Deploy the latest changes to staging"

Orchestrator's "deploy" tool:
  1. Calls GitHub server: check CI status on main branch
  2. Calls GitHub server: merge PR to staging branch
  3. Calls Docker server: build and push new image
  4. Calls Docker server: update staging deployment
  5. Calls Monitoring server: check health metrics
  6. Calls Slack server: notify #engineering channel

Implementation:

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// Create the orchestrator server
const server = new McpServer({
  name: "deployment-orchestrator",
  version: "1.0.0",
});

// Connect to sub-servers as a client
const githubClient = new Client({ name: "orchestrator", version: "1.0.0" });
const dockerClient = new Client({ name: "orchestrator", version: "1.0.0" });
const slackClient = new Client({ name: "orchestrator", version: "1.0.0" });

async function initSubClients() {
  await githubClient.connect(new StdioClientTransport({
    command: "npx",
    args: ["-y", "@modelcontextprotocol/server-github"],
    env: { GITHUB_PERSONAL_ACCESS_TOKEN: process.env.GITHUB_TOKEN! },
  }));

  await dockerClient.connect(new StdioClientTransport({
    command: "node",
    args: ["./docker-server.js"],
  }));

  await slackClient.connect(new StdioClientTransport({
    command: "npx",
    args: ["-y", "@modelcontextprotocol/server-slack"],
    env: { SLACK_BOT_TOKEN: process.env.SLACK_TOKEN! },
  }));
}

// Expose a high-level deployment tool
server.tool(
  "deploy_to_staging",
  "Deploy the latest changes to the staging environment. This will check CI, merge to staging, build a Docker image, deploy it, and notify the team.",
  {
    repo: z.string().describe("Repository in owner/repo format"),
    branch: z.string().default("main").describe("Branch to deploy from"),
    notifyChannel: z.string().default("#engineering").describe("Slack channel to notify"),
  },
  async ({ repo, branch, notifyChannel }) => {
    const steps: string[] = [];

    const [owner, repoName] = repo.split("/");

    // Step 1: Check CI status
    const ciResult = await githubClient.callTool({
      name: "get_branch_status",
      arguments: { owner, repo: repoName, branch },
    });
    steps.push(`CI Check: ${ciResult.content[0].text}`);

    // Step 2: Merge to staging
    const mergeResult = await githubClient.callTool({
      name: "merge_branches",
      arguments: { owner, repo: repoName, base: "staging", head: branch },
    });
    steps.push(`Merge: ${mergeResult.content[0].text}`);

    // Step 3: Build Docker image
    const buildResult = await dockerClient.callTool({
      name: "build_image",
      arguments: { tag: `${repo}:staging-latest`, context: "." },
    });
    steps.push(`Build: ${buildResult.content[0].text}`);

    // Step 4: Deploy
    const deployResult = await dockerClient.callTool({
      name: "update_service",
      arguments: { service: `${repoName}-staging`, image: `${repo}:staging-latest` },
    });
    steps.push(`Deploy: ${deployResult.content[0].text}`);

    // Step 5: Notify
    await slackClient.callTool({
      name: "send_message",
      arguments: {
        channel: notifyChannel,
        text: `Deployed ${repo} (${branch}) to staging. Steps:\n${steps.join("\n")}`,
      },
    });
    steps.push(`Notification sent to ${notifyChannel}`);

    return {
      content: [{
        type: "text",
        text: `Deployment complete!\n\n${steps.map((s, i) => `${i + 1}. ${s}`).join("\n")}`,
      }],
    };
  }
);

// Start everything
await initSubClients();
const transport = new StdioServerTransport();
await server.connect(transport);

Pattern 2: Delegation Chain

A chain of specialized agents, each adding a layer of capability:

Host → Research Agent → [Web Search, Document Reader, Knowledge Base]
                    └──► Analysis Agent → [Database, Calculator, Visualizer]

import os

from mcp.server.fastmcp import FastMCP
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

mcp = FastMCP("research-agent")

# This agent can search the web and then delegate analysis
@mcp.tool()
async def research_and_analyze(topic: str, depth: str = "standard") -> str:
    """Research a topic and provide analysis with data.

    Searches the web for information, then delegates to the
    analysis agent for data-driven insights.

    Args:
        topic: The topic to research
        depth: Research depth (quick, standard, deep)
    """
    # Use web search sub-server
    search_params = StdioServerParameters(
        command="npx",
        args=["-y", "@modelcontextprotocol/server-brave-search"],
        env={"BRAVE_API_KEY": os.environ["BRAVE_API_KEY"]},
    )

    async with stdio_client(search_params) as (read, write):
        async with ClientSession(read, write) as search_session:
            await search_session.initialize()

            # Search for the topic
            search_result = await search_session.call_tool(
                "brave_web_search",
                {"query": topic, "count": 10}
            )

    # Delegate to analysis agent for data processing
    analysis_params = StdioServerParameters(
        command="python",
        args=["analysis_agent.py"],
    )

    async with stdio_client(analysis_params) as (read, write):
        async with ClientSession(read, write) as analysis_session:
            await analysis_session.initialize()

            analysis_result = await analysis_session.call_tool(
                "analyze_data",
                {
                    "raw_data": search_result.content[0].text,
                    "analysis_type": "trend_analysis",
                }
            )

    return f"""Research Results for: {topic}

## Web Research
{search_result.content[0].text}

## Analysis
{analysis_result.content[0].text}"""

Pattern 3: Fan-Out / Fan-In

An orchestrator that dispatches tasks to multiple agents in parallel and aggregates results:

                     ┌──────────────────┐
                     │   Orchestrator   │
                     └──────┬───────────┘
                            │
              ┌─────────────┼─────────────┐
              │             │             │
         ┌────▼───┐   ┌────▼───┐   ┌────▼───┐
         │ Code   │   │Security│   │ Perf   │
         │ Review │   │ Audit  │   │ Check  │
         │ Agent  │   │ Agent  │   │ Agent  │
         └────────┘   └────────┘   └────────┘
              │             │             │
              └─────────────┼─────────────┘
                            │
                     ┌──────▼───────────┐
                     │  Aggregated      │
                     │  Report          │
                     └──────────────────┘

server.tool(
  "comprehensive_code_review",
  "Run a comprehensive code review with parallel specialized checks",
  {
    prNumber: z.number().describe("Pull request number"),
    repo: z.string().describe("Repository in owner/repo format"),
  },
  async ({ prNumber, repo }) => {
    // Get the PR diff first
    const diff = await githubClient.callTool({
      name: "get_pull_request_diff",
      arguments: {
        owner: repo.split("/")[0],
        repo: repo.split("/")[1],
        pull_number: prNumber,
      },
    });

    // Fan out to specialized review agents in parallel
    const [codeReview, securityAudit, perfCheck] = await Promise.all([
      codeReviewClient.callTool({
        name: "review_code",
        arguments: { diff: diff.content[0].text, focus: "correctness" },
      }),
      securityClient.callTool({
        name: "audit_code",
        arguments: { diff: diff.content[0].text, severity: "all" },
      }),
      perfClient.callTool({
        name: "check_performance",
        arguments: { diff: diff.content[0].text, benchmarks: true },
      }),
    ]);

    // Fan in: aggregate results
    return {
      content: [{
        type: "text",
        text: `## Comprehensive Review of PR #${prNumber}

### Code Quality
${codeReview.content[0].text}

### Security Audit
${securityAudit.content[0].text}

### Performance Check
${perfCheck.content[0].text}

### Summary
Review complete. See individual sections above for details.`,
      }],
    };
  }
);

Pattern 4: Pipeline

Sequential processing where each agent transforms and passes data to the next:

Host → Extraction Agent → Transformation Agent → Loading Agent → Validation Agent

Data Pipeline:
  1. Extract: Pull data from source systems
  2. Transform: Clean, normalize, enrich
  3. Load: Write to target systems
  4. Validate: Verify data integrity
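
The sequencing logic above can be sketched in TypeScript. The stage functions here are hypothetical stand-ins for callTool() calls to downstream MCP servers; the point is only how each stage's output feeds the next and how a mid-pipeline failure surfaces with context:

```typescript
// Hypothetical sketch: each stage stands in for a callTool() to a
// downstream MCP server; data flows Extract -> Transform -> Load -> Validate.
type Stage = { name: string; run: (input: string) => Promise<string> };

async function runPipeline(stages: Stage[], input: string): Promise<string> {
  let data = input;
  for (const stage of stages) {
    try {
      data = await stage.run(data); // each stage's output feeds the next
    } catch (err) {
      // Fail fast with enough context for the orchestrator to decide what to do
      throw new Error(`Pipeline failed at stage "${stage.name}": ${err}`);
    }
  }
  return data;
}

// Illustrative stages; a real pipeline would call extraction/transform/load servers
const pipeline: Stage[] = [
  { name: "extract",   run: async (s) => `extracted(${s})` },
  { name: "transform", run: async (s) => `transformed(${s})` },
  { name: "load",      run: async (s) => `loaded(${s})` },
  { name: "validate",  run: async (s) => `valid:${s}` },
];
```

Because each stage only sees its input and returns its output, stages can be swapped or reordered without touching their neighbors.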

Pattern 5: Supervisor

A supervisor agent that monitors other agents and intervenes when needed:

                  ┌────────────────┐
                  │   Supervisor   │
                  │   Agent        │
                  └───┬──────┬────┘
                      │      │
          monitors    │      │   monitors
                      │      │
               ┌──────▼┐  ┌──▼─────┐
               │Worker │  │Worker  │
               │Agent 1│  │Agent 2 │
               └───────┘  └────────┘

Supervisor can:
- Monitor worker progress
- Redistribute tasks if a worker fails
- Aggregate results when workers complete
- Escalate issues that workers cannot handle
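
The redistribution behavior can be sketched as follows. The worker functions are hypothetical stand-ins for callTool() on worker agents; the supervisor tries each in turn and escalates only when all fail:

```typescript
// Hypothetical sketch: a supervisor dispatches a task to a worker and
// redistributes it to the next worker if the current one fails.
type Worker = { name: string; run: (task: string) => Promise<string> };

async function supervise(workers: Worker[], task: string): Promise<string> {
  const failures: string[] = [];
  for (const worker of workers) {
    try {
      return await worker.run(task); // first successful worker wins
    } catch (err) {
      failures.push(`${worker.name}: ${err}`); // record failure, redistribute
    }
  }
  // Escalate: no worker could handle the task
  throw new Error(`All workers failed for "${task}":\n${failures.join("\n")}`);
}

// Illustrative workers: one that always crashes, one that succeeds
const flaky: Worker = { name: "worker-1", run: async () => { throw new Error("crashed"); } };
const steady: Worker = { name: "worker-2", run: async (t) => `done:${t}` };
```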

Sampling: AI Reasoning in the Hierarchy

What Sampling Enables

The sampling capability allows an MCP server to request LLM completions through its client connection. This is powerful for composability because middle-tier servers can leverage AI reasoning without hosting their own model:

┌──────────┐     ┌─────────────┐     ┌──────────┐
│   Host   │     │Orchestrator │     │Tool      │
│  + LLM   │◄────│Server       │◄────│Server    │
│          │     │(no LLM)     │     │          │
└──────────┘     └─────────────┘     └──────────┘

Tool Server calls tool → result
Orchestrator needs to interpret result using AI
Orchestrator sends sampling request → travels up to Host
Host generates LLM completion → sends back down
Orchestrator uses the completion in its workflow

Sampling Flow

// Server sends sampling request to client
{
  "jsonrpc": "2.0",
  "id": 10,
  "method": "sampling/createMessage",
  "params": {
    "messages": [
      {
        "role": "user",
        "content": {
          "type": "text",
          "text": "Analyze this error log and determine if it indicates a critical issue:\n\n[error log contents]"
        }
      }
    ],
    "maxTokens": 500,
    "systemPrompt": "You are a log analysis expert. Classify the severity of errors."
  }
}

// Client (via host) returns the LLM completion
{
  "jsonrpc": "2.0",
  "id": 10,
  "result": {
    "role": "assistant",
    "content": {
      "type": "text",
      "text": "CRITICAL: This error indicates a database connection pool exhaustion..."
    },
    "model": "claude-4-sonnet-20260101"
  }
}

Human-in-the-Loop

The host can require user consent before completing sampling requests. This ensures that the user remains in control of what the AI generates, even when the request originates from a server deep in the hierarchy:

Server → Client: sampling/createMessage(...)
Client → Host:   "Server 'monitoring' wants AI to analyze error logs. Allow?"
Host → User:     [Consent dialog]
User → Host:     "Yes, allow"
Host → Model:    Generate completion
Model → Host:    Completion
Host → Client → Server: Result
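
This consent flow can be sketched as a small gate the host might wrap around incoming sampling requests. Everything here is illustrative, not SDK API: askUser stands in for the host's consent dialog, and generate stands in for its model call:

```typescript
// Hypothetical sketch: a consent gate for sampling requests that originate
// from servers deep in the hierarchy. askUser and generate are stand-ins
// for the host's consent dialog and its model call.
type SamplingRequest = { serverName: string; prompt: string };

async function gatedSampling(
  req: SamplingRequest,
  askUser: (description: string) => Promise<boolean>,
  generate: (prompt: string) => Promise<string>,
): Promise<string> {
  // Surface which server is asking, so the user can make an informed choice
  const allowed = await askUser(
    `Server '${req.serverName}' wants AI to process: ${req.prompt}. Allow?`,
  );
  if (!allowed) {
    throw new Error(`Sampling request from '${req.serverName}' denied by user`);
  }
  return generate(req.prompt); // runs only after explicit consent
}
```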

Real-World Composability Examples

Example 1: Enterprise Data Pipeline

Host (Claude Desktop)
  └── Data Pipeline Orchestrator
        ├── Salesforce MCP Server (extract CRM data)
        ├── PostgreSQL MCP Server (extract operational data)
        ├── Data Transform Server (clean, normalize, join)
        ├── BigQuery MCP Server (load into warehouse)
        └── Slack MCP Server (send pipeline completion report)

Example 2: Automated Code Review Platform

Host (Custom CI/CD Integration)
  └── Code Review Orchestrator
        ├── GitHub Server (fetch PR details, post comments)
        ├── Static Analysis Server (run linters, type checkers)
        ├── Security Scanner Server (vulnerability detection)
        ├── Test Runner Server (execute test suite)
        └── Documentation Server (check doc coverage)

Example 3: Customer Support Agent

Host (Support Platform)
  └── Support Orchestrator
        ├── CRM Server (customer history, account details)
        ├── Knowledge Base Server (search help articles)
        ├── Ticket System Server (create, update, close tickets)
        ├── Product Server (check feature flags, known issues)
        └── Escalation Server (route to human agents when needed)

Design Principles for Composable Systems

1. Acyclic Dependencies

Never create circular dependencies between servers:

// BAD: Circular dependency
Server A → Server B → Server A  (infinite loop!)

// GOOD: Acyclic hierarchy
Host → Orchestrator → Server A
                   → Server B
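
One way to enforce this is a startup check over the planned dependency graph before any connections are wired up. This is an illustrative sketch, not part of the MCP SDK; the server names are hypothetical:

```typescript
// Hypothetical sketch: depth-first cycle detection over a server
// dependency graph, run once at startup before connecting sub-clients.
type DependencyGraph = Record<string, string[]>;

function findCycle(graph: DependencyGraph): string[] | null {
  const visiting = new Set<string>(); // nodes on the current DFS path
  const done = new Set<string>();     // nodes fully explored

  function visit(node: string, path: string[]): string[] | null {
    if (visiting.has(node)) return [...path, node]; // back-edge: cycle found
    if (done.has(node)) return null;
    visiting.add(node);
    for (const dep of graph[node] ?? []) {
      const cycle = visit(dep, [...path, node]);
      if (cycle) return cycle;
    }
    visiting.delete(node);
    done.add(node);
    return null;
  }

  for (const node of Object.keys(graph)) {
    const cycle = visit(node, []);
    if (cycle) return cycle; // e.g. ["a", "b", "a"]
  }
  return null;
}

// An acyclic hierarchy passes; a circular one is rejected at startup
const good: DependencyGraph = { host: ["orchestrator"], orchestrator: ["github", "docker"] };
const bad: DependencyGraph = { a: ["b"], b: ["a"] };
```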

2. Clear Responsibility Boundaries

Each server should have a single, well-defined domain:

// BAD: Server does everything
"super-server" → GitHub + Docker + Slack + Database + Files

// GOOD: Single responsibility
"github-server" → GitHub only
"docker-server" → Docker only
"slack-server"  → Slack only

3. Idempotent Tool Design

Tools in composable systems should be idempotent when possible. If an orchestrator retries a failed step, the tool should produce the same result:

@mcp.tool()
async def create_or_update_issue(repo: str, title: str, body: str) -> str:
    """Create a new issue or update if one with the same title exists.

    This is idempotent — calling it multiple times with the same title
    will update the existing issue rather than creating duplicates.
    """
    existing = await find_issue_by_title(repo, title)
    if existing:
        return await update_issue(repo, existing["number"], body)
    else:
        return await create_issue(repo, title, body)

4. Error Propagation

Errors should propagate up the hierarchy with enough context for the orchestrator to make recovery decisions:

server.tool("step_in_workflow", "...", schema, async (params) => {
  try {
    const result = await subClient.callTool({ name: "do_something", arguments: params });
    return result;
  } catch (error) {
    return {
      content: [{
        type: "text",
        text: `Step failed: ${error.message}\n\nRecovery options:\n1. Retry (transient error)\n2. Skip this step\n3. Abort the workflow`,
      }],
      isError: true,
    };
  }
});

5. Depth Limiting

Prevent deeply nested compositions that become hard to debug:

const MAX_DEPTH = 5;

async function callWithDepthCheck(client, tool, args, currentDepth) {
  if (currentDepth >= MAX_DEPTH) {
    throw new Error(`Maximum composition depth (${MAX_DEPTH}) reached`);
  }
  return await client.callTool({ name: tool, arguments: { ...args, _depth: currentDepth + 1 } });
}

When to Use Composability

Good Candidates for Composable Architecture

  • Multi-step deployment pipelines -- abstracts complex workflows into single tools
  • Cross-domain data processing -- each domain has its own specialized server
  • Enterprise workflow automation -- encapsulates business logic in orchestration layers
  • Multi-agent AI systems -- agents collaborate through a standard protocol
  • CI/CD and DevOps automation -- tools chain naturally (test, build, deploy, monitor)

When to Stay Flat

  • Individual developer tools -- no need for orchestration
  • Single-domain tools -- one server covers everything needed
  • Simple tool collections -- the host's AI model handles orchestration fine
  • Prototype or MVP -- add composability when complexity demands it

Summary

MCP's composability is a powerful architectural property that enables hierarchical AI systems, multi-agent collaboration, and complex workflow orchestration. By allowing any component to act as both client and server, MCP creates a protocol that scales from simple single-server setups to sophisticated enterprise automation platforms.

Most users should start with flat architectures and adopt composability when their use cases demand it. When that time comes, MCP's composable design means they can add hierarchy without redesigning their existing servers.

Frequently Asked Questions

What is composability in MCP?

Composability in MCP means that any MCP component can simultaneously act as both a client and a server. An MCP server can also be an MCP client to other servers, creating hierarchical chains where one agent orchestrates others. This enables multi-agent systems where specialized agents collaborate through the same standardized protocol.

How can an MCP server also be a client?

An MCP server exposes tools to upstream clients while maintaining its own MCP client connections to downstream servers. For example, an orchestrator server might expose a 'deploy_application' tool to the user's AI assistant, and internally use MCP client connections to a GitHub server, a Docker server, and a monitoring server to execute the deployment.

What is a hierarchical MCP architecture?

A hierarchical MCP architecture is one where MCP servers are arranged in layers. The top-level host connects to orchestrator servers, which in turn connect to specialized tool servers. This creates a tree structure where each level delegates to the level below it, enabling complex multi-step workflows while maintaining clean separation of concerns.

How does MCP support multi-agent systems?

MCP supports multi-agent systems by allowing each agent to act as an MCP server (exposing its capabilities to other agents) and an MCP client (using tools from other agents). Agents can discover each other's capabilities through standard MCP tool discovery, delegate tasks to specialized agents, and aggregate results — all through the same protocol.

What are the benefits of composable MCP architectures?

Benefits include separation of concerns (each server handles one domain), reusability (a GitHub server works in any composition), independent scaling (scale only the servers that need it), easier testing (test each server in isolation), and flexible orchestration (rearrange the hierarchy without rewriting servers).

Can MCP composability create infinite loops?

Yes, if not designed carefully. If Server A calls Server B which calls Server A, an infinite loop occurs. This is prevented through good architectural design: hierarchies should be acyclic (no circular dependencies), and servers should have clear upstream/downstream relationships. Some implementations add depth limits or cycle detection.

What is the sampling capability and how does it relate to composability?

Sampling is an MCP capability that allows a server to request LLM completions through the client. This is powerful for composability because a server in the middle of a hierarchy can ask for AI reasoning without hosting its own model. The request travels up the chain to the host, which generates the completion and sends it back down.

Is composability required for using MCP?

No. Most MCP deployments use a simple flat architecture: one host connecting to several independent servers. Composability is an advanced pattern for complex use cases like multi-agent systems, enterprise workflows, and orchestration platforms. Simple setups work perfectly well without any composability.
