Executive Summary
As AI agent organizations mature, agents accumulate tools at a rate that outpaces any individual agent’s ability to route among them effectively. Without structure, agents select suboptimal tools, load irrelevant context into constrained windows, or fail to discover the correct server entirely — defaulting to reimplementation of capabilities that already exist. This paper examines a structured MCP (Model Context Protocol) routing architecture comprising nine specialized servers and ninety-one total tools, organized into discrete capability domains. The central finding is that domain separation, combined with a mandatory capability-check gate enforced before any implementation begins, eliminates an entire class of failure mode — capability reinvention — while simultaneously improving routing precision across all agent sessions. The architecture functions as an agent operating system: a structured dispatch layer that transforms an unordered tool list into a deterministic, navigable capability map.
Key Findings
- Unstructured tool sets degrade agent performance as they scale. Beyond approximately twenty tools, agents in flat tool lists exhibit measurable routing errors: wrong server selection, missed capabilities, and context window bloat from loading irrelevant tool definitions.
- Domain separation forces precision about the nature of the problem being solved. Distributing tools across servers named for their domain — authentication, tenancy, governance, requirements — requires the agent to classify its intent before querying, which surfaces ambiguity that a flat list would silently absorb.
- The find_capability() gate is the single highest-value constraint in the architecture. A mandatory pre-implementation check against the platform’s capability registry prevents agents from rebuilding shared services that already exist, a failure mode documented across more than twenty capability categories in the observed system.
- Governance automation via a dedicated MCP server decouples review-trigger logic from agent judgment. When the decision of whether an architecture review or security review is required is delegated to a detect_required_reviews call, agents cannot skip review stages through omission or misclassification.
- Session initialization via a context-retrieval call provides the orientation that cold-start agents lack. Beginning every session with get_session_context provides active task state, prior work, and in-progress decisions — eliminating the silent failure mode in which an agent begins new work that partially duplicates a task already underway.
- MCP routing is the agent analogue of a module system. Just as software engineers import packages rather than reimplementing standard libraries, agents should query specialized capability servers rather than synthesizing behavior from first principles.
1. The Tool Discovery Problem: Why Unstructured Tool Sets Degrade Agent Performance
Flat tool lists do not scale. This finding emerges from the observable behavior of AI agents operating across large tool sets without routing structure: agents select plausible-but-incorrect tools, fail to discover specialized capabilities buried deep in alphabetical or registration-order lists, and — most consequentially — proceed to implement capabilities from scratch when the correct tool is present but not found. The failure modes are distinct from each other and compound in practice.

Wrong tool selection occurs when multiple tools share surface-level semantic similarity. An agent navigating a flat list of ninety tools will select the first plausible match rather than the most precise one. In a multi-domain system, “get authentication patterns” and “get permission patterns” may resolve to different servers with fundamentally different scopes. Without domain structure, the agent has no mechanism for distinguishing them.

Context window bloat occurs when tool definitions — descriptions, parameter schemas, examples — must be loaded to determine relevance. A ninety-tool flat list loaded in full consumes substantial context window capacity. Domain routing allows agents to load only the tool definitions relevant to the current intent, preserving context for the work itself.

Reinvention failure is the most expensive mode. When an agent cannot locate the correct capability through tool discovery, it defaults to implementation. In a mature platform, this means rebuilding notification delivery, approval workflows, permission checks, or SLA tracking — services that already exist, that are already tested, and that other parts of the system depend on. The cost is not merely the time spent building the duplicate; it is the ongoing maintenance burden, the behavioral divergence from the canonical implementation, and the architectural fragmentation that accumulates over time.

Structured routing addresses all three failure modes simultaneously.
Domain-bounded servers reduce the search space for tool selection. Targeted server queries reduce context window consumption. And the capability-check gate — described in detail in Section 4 — converts capability discovery into an explicit, mandatory step rather than an optional best practice.
2. Architecture: Nine Specialized Servers as Distinct Capability Domains
The routing architecture partitions ninety-one tools across nine servers. Each server corresponds to a coherent capability domain with clear ownership boundaries. An agent consulting the routing table can determine, from a single declarative mapping, which server to query for any given intent. The nine domains are:

Task management and session context handles the lifecycle of work items — creation, state transitions, closure — and provides the session-orientation capability that agents require at startup. This server also exposes task search, enabling agents to retrieve prior work before initiating potentially duplicative efforts.

Platform capability registry is the capability-check gateway. Its primary purpose is to answer the question: does this already exist? Before any implementation begins, agents query this server to determine whether the platform already provides what they are about to build. It also exposes crate and library guides for implementation-pattern consistency.

Authentication and authorization covers the permission model: obtaining permission patterns, validating permission strings, and exposing known anti-patterns. This domain is intentionally narrow — it concerns how agents interact with the access control layer, not how they implement access control themselves.

Multi-tenancy and isolation addresses the structural requirements of a multi-tenant platform: partition key scoping, tenant lifecycle, data isolation patterns, and the conceptual model governing how tenant boundaries are expressed in the data layer. This server’s separation from authentication is architecturally significant. Tenant scoping is a data concern; authentication is an identity concern. Conflating them produces design errors.

User experience and interface components covers the design system, UI component library, and SDK integration patterns.
Agents building frontend functionality query this server before writing component code, ensuring alignment with the established component vocabulary rather than introducing new, one-off implementations.

Backend implementation patterns provides the scaffolding and validation tools for backend development: entity generation, aggregate definition, repository scaffolding, layer-boundary rules, and domain-driven design patterns. This server encodes the structural conventions of the codebase as queryable tools rather than documentation.

Governance and review orchestration automates the determination of when human or senior-agent review is required. Rather than leaving this judgment to individual agent sessions — which will vary in their conservatism — a detect_required_reviews call returns a deterministic answer based on the nature of the change: whether it affects security boundaries, data models, public APIs, or multi-tenant isolation.
Requirements management tracks the relationship between implemented work and the formal requirement documents that authorized it. Agents creating or closing work query this server to ensure traceability from implementation back to source requirements.
Post-closure quality auditing operates after work is complete. It identifies audit candidates in closed tasks, generates structured audit reports, and surfaces improvement proposals. This server exists because quality assurance cannot be limited to pre-implementation gates; patterns that are individually acceptable may produce systemic issues at scale, and post-closure audit is the mechanism for detecting them.
3. The Routing Table: Deterministic Server Selection From Intent
The routing table is the operational core of the architecture. It is not a conceptual model — it is a concrete lookup that agents consult to determine which server to call for any given intent. The table below presents the canonical mapping.

| Intent | Server | Representative Tools |
|---|---|---|
| Create, update, start, or end tasks; search prior work; orient at session start | Task management server | create_task, start_task, end_task, search_past_tasks, get_session_context |
| Find existing platform capabilities; check whether code reimplements something that exists | Platform capability server | find_capability, check_reinvention, get_crate_guide |
| Authentication patterns, permission models, access control anti-patterns | Authentication server | get_permission_patterns, validate_permission_string, get_auth_anti_patterns |
| Tenant scoping, partition key design, isolation patterns, tenant lifecycle | Multi-tenancy server | check_tenant_scoping, get_key_design_guide, get_isolation_patterns |
| UI components, design system, frontend SDK integration | UX and interface server | find_ui_component, get_coding_patterns, get_sdk_integration_patterns |
| Entity scaffolding, aggregate definition, repository generation, layer rules | Backend patterns server | tool_scaffold_*, tool_validate_*, tool_get_pattern, tool_get_layer_rules |
| Determine review requirements; trigger architecture, security, or audit reviews | Governance server | detect_required_reviews, request_code_reviews, post_architect_review |
| Track requirements; create or merge requirement pull requests | Requirements server | list_requirements, track_requirement, create_requirement_pr |
| Post-closure audits; surface quality improvement candidates | Quality audit server | find_audit_candidates, post_audit_report, create_proposal_task |
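The routing table above is amenable to a direct, declarative encoding. The sketch below is illustrative — the intent keys and server names are simplified stand-ins for the table rows, not identifiers from any real MCP SDK — but it shows the essential property: an unknown intent fails loudly instead of being silently matched to the first plausible tool.

```python
# Illustrative sketch: the routing table as a declarative intent -> server map.
# Intent keys and server names are simplified stand-ins for the table above.
ROUTING_TABLE = {
    "task_lifecycle": "task-management-server",
    "capability_check": "platform-capability-server",
    "auth_patterns": "authentication-server",
    "tenant_scoping": "multi-tenancy-server",
    "ui_components": "ux-interface-server",
    "backend_scaffolding": "backend-patterns-server",
    "review_requirements": "governance-server",
    "requirements_tracking": "requirements-server",
    "post_closure_audit": "quality-audit-server",
}

def route(intent: str) -> str:
    """Resolve an intent to its owning server, failing loudly on unknown intents."""
    try:
        return ROUTING_TABLE[intent]
    except KeyError:
        known = ", ".join(sorted(ROUTING_TABLE))
        raise LookupError(f"No server owns intent '{intent}'. Known intents: {known}")

print(route("capability_check"))  # -> platform-capability-server
```

The lookup is deliberately total-or-error: a flat tool list degrades to "pick the closest match," whereas a routing table can refuse to guess.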
3.1 Structured Routing vs. Unstructured Tool Lists
The following comparison characterizes the practical differences between operating with a structured routing table versus a flat, unordered tool list.

| Dimension | Unstructured Tool List | Structured Routing Table |
|---|---|---|
| Tool discovery method | Sequential scan or semantic similarity matching | Declarative lookup by intent |
| Domain ambiguity resolution | Absent — agent selects first plausible match | Explicit — domain separation forces classification |
| Context window consumption | High — all tool definitions loaded or scanned | Low — only target server’s tools loaded |
| Capability reinvention risk | High — absent discovery mechanism, agent builds | Low — find_capability gate enforced pre-implementation |
| Governance compliance | Agent-dependent — varies by session | Deterministic — detect_required_reviews returns required gates |
| Onboarding of new agents | Manual — agent must learn tool landscape | Immediate — routing table provides complete map |
| Maintenance as tools are added | Degrades — new tools buried in flat list | Stable — new tools added to existing domain server |
4. Capability-First Development: find_capability() as a Pre-Implementation Gate
The most consequential constraint in the architecture is not a server or a tool — it is a protocol. Before implementing any new feature, agents must execute a capability check against the platform capability server. This check is not advisory. It is a mandatory gate, and its enforcement is what separates the architecture described here from a tool collection with good intentions.
The protocol requires three steps:
Step 1: Call find_capability("<description of what you are about to build>") against the platform capability server. The call must precede any code generation.
Step 2: If code has already been drafted — whether through speculative generation or prior session context — call check_reinvention("<code snippet>") to determine whether the drafted code reimplements an existing service.
Step 3: Post the results of both calls as a structured comment on the active task before proceeding.
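The paper leaves the format of that structured comment unspecified; one plausible shape, with placeholder fields, is:

```
## Capability Check Results
- find_capability("<description>"): <no match | matched: service-name>
- check_reinvention("<snippet>"): <clean | overlaps: service-name>
- Decision: <use existing service-name | proceed with new implementation>
```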
4.1 What the Gate Protects
The platform capability registry documents more than twenty shared services that agents must not reimplement. The following table presents the categories most frequently encountered in the observed system.

| Capability Need | Existing Service | Reinvention Cost |
|---|---|---|
| Send notifications to users | Notification service | High — delivery reliability, retry logic, template management |
| Add comments or attachments to work items | Collaboration service | Medium — storage, threading, permission scoping |
| Multi-step approval workflows | Approval service | High — state machine, notifications, audit trail |
| SLA tracking and alerting | SLA service | High — timer management, escalation logic, reporting |
| Webhook delivery to external systems | Webhook service | High — retry logic, signature verification, delivery receipts |
| Billing and subscription management | Billing client | Critical — financial logic must not be duplicated |
| Permission checks on resources | Access control service | Critical — security boundary; duplicates create bypass vectors |
These are the services agents most frequently rebuild when the find_capability gate is absent or bypassed. The cost column captures not the initial implementation effort but the ongoing maintenance and correctness risk of running a duplicate implementation alongside the canonical one.
The capability registry is not a documentation artifact — it is a queryable service. The distinction matters. Documentation is read when engineers remember to read it. A queryable gate enforced as a pre-implementation protocol is consulted on every implementation, by every agent, in every session. Compliance is structural, not behavioral.
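One way to make that compliance structural is to wire the gate into agent scaffolding, so implementation is refused when the registry reports a match. The sketch below is a hypothetical stand-in — the find_capability call, the keyword-overlap matching, and the registry contents are all illustrative, not a real MCP SDK API (a production registry would use a semantic index, not word overlap):

```python
# Sketch of enforcing the capability gate in agent scaffolding. The
# find_capability call and registry contents are illustrative stand-ins,
# not a real MCP SDK API.
class CapabilityRegistry:
    def __init__(self, services):
        # service name -> short description used for matching
        self.services = services

    def find_capability(self, description):
        """Return services whose description shares words with the query.

        A production registry would use a semantic index; keyword overlap
        is the simplest illustrative stand-in.
        """
        query = set(description.lower().split())
        return [name for name, desc in self.services.items()
                if query & set(desc.lower().split())]

def gated_implementation(registry, description, implement):
    """Refuse to run `implement` when the registry already covers the need."""
    matches = registry.find_capability(description)
    if matches:
        raise RuntimeError(
            f"Capability gate: existing service(s) {matches} already cover "
            f"'{description}'; use them instead of reimplementing.")
    return implement()

registry = CapabilityRegistry({
    "notification-service": "send notifications email push delivery",
    "approval-service": "multi step approval workflows",
})

# Blocked: notification delivery already exists in the registry.
try:
    gated_implementation(registry, "send notifications to users", lambda: "new code")
except RuntimeError as err:
    print(err)
```

The design choice worth noting is that the gate raises rather than warns: an advisory message can be ignored, while a refused call forces the agent to route to the existing service.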
5. Session Initialization: Orientation Before Action
Cold-start agents — those beginning a new session without explicit task context — are statistically the most likely to produce work that duplicates in-progress efforts, contradicts recent architectural decisions, or ignores active dependencies between tasks. The session initialization protocol addresses this through a two-call sequence that must precede any substantive work.

Call 1: get_session_context (task management server). This call returns the current active task state, tasks opened but not yet closed, recent completions, and any in-progress decisions or blockers recorded by prior sessions. An agent that begins work without this call may open a task that is already in progress, implement a feature that another agent is simultaneously implementing, or make an architectural choice that a prior session explicitly deferred pending review.
Call 2: find_capability or find_ui_component (platform capability server or UX server, depending on domain). This call performs the pre-work capability check at the session level rather than the task level — confirming that the session’s general intent does not duplicate existing platform functionality before any task-level planning begins.
The sequence imposes a small, fixed overhead at session start. The alternative — cold-start agents proceeding directly to implementation — produces coordination failures that are expensive to detect and expensive to reverse.
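The two-call sequence can be sketched as launch-script logic. The method names mirror the tools named above (get_session_context, find_capability), but the client objects and the returned field names are hypothetical stand-ins, not a real MCP client API:

```python
# Sketch of the two-call session initialization protocol. Method names mirror
# the tools in the paper; the client objects and field names are hypothetical
# stand-ins, not a real MCP SDK API.
def initialize_session(task_client, capability_client, session_intent):
    """Run the mandatory orientation sequence before any substantive work."""
    # Call 1: orient against active tasks, recent work, and open decisions.
    context = task_client.get_session_context()
    # Call 2: confirm the session's intent does not duplicate an existing capability.
    matches = capability_client.find_capability(session_intent)
    return {
        "active_tasks": context.get("active_tasks", []),
        "blockers": context.get("blockers", []),
        "capability_matches": matches,
        # Proceed only when no existing capability covers the intent.
        "clear_to_proceed": not matches,
    }

# Minimal stub clients for demonstration.
class StubTaskClient:
    def get_session_context(self):
        return {"active_tasks": ["TASK-101"], "blockers": []}

class StubCapabilityClient:
    def find_capability(self, intent):
        return []  # nothing existing covers this intent

state = initialize_session(StubTaskClient(), StubCapabilityClient(), "new reporting view")
print(state["clear_to_proceed"])  # True when no duplication is detected
```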
6. The Reinvention Problem: What Happens Without the Gate
The reinvention failure mode is well-characterized in the observed system. When the find_capability gate is absent, bypassed, or inconsistently enforced, agents independently reconstruct shared services with predictable regularity.
The failure pattern follows a consistent structure:
- An agent is assigned a task requiring, incidentally, notification delivery to a user.
- The agent searches its immediate tool context for a “send notification” tool and finds none — because notification delivery is exposed through a service client, not a dedicated MCP tool.
- The agent, unable to locate an existing capability, proceeds to implement notification delivery directly: an email sender, a queue producer, or a direct API call.
- The implementation is functionally correct in isolation but behaviorally divergent from the canonical notification service: different retry logic, different logging, different template handling.
- The duplicate implementation enters production alongside the canonical service. Both are now maintenance surfaces.
The check_reinvention tool addresses the case where drafting has preceded the gate — where an agent has already written code before consulting the capability registry. Rather than discarding the draft, the agent passes it to check_reinvention, which compares the draft against the capability registry’s semantic index and returns a match if the code reproduces existing functionality. This provides a recovery path when the gate is applied retroactively rather than proactively.
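The shape of that comparison can be illustrated with a deliberately naive stand-in. The real tool queries a semantic index; the sketch below substitutes token-set overlap (Jaccard similarity), and the registry fingerprint format is an assumption made for the example:

```python
# Illustrative sketch of a check_reinvention-style comparison. The real tool
# compares a draft against the registry's semantic index; token-set overlap
# (Jaccard similarity) is a deliberately naive stand-in for that index.
def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def check_reinvention(draft: str, registry: dict, threshold: float = 0.2):
    """Return the best-matching existing service if the draft resembles one."""
    tokens = set(draft.lower().replace("(", " ").replace(")", " ").split())
    best, best_score = None, 0.0
    for service, fingerprint in registry.items():
        score = jaccard(tokens, set(fingerprint.lower().split()))
        if score > best_score:
            best, best_score = service, score
    return best if best_score >= threshold else None

# A draft that rebuilds notification delivery matches the canonical service.
registry = {"notification-service": "send email notification retry queue template"}
draft = "def send_notification(user): queue the email template and retry"
print(check_reinvention(draft, registry))  # -> notification-service
```

The recovery path is the point: a match does not mean the draft is discarded silently, it means the agent is redirected to the canonical service before the duplicate reaches production.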
7. Implementation Constraints
The architecture described in this paper carries implementation dependencies and operational constraints that practitioners should account for before adoption.

Constraint 1: Server proliferation requires naming discipline. Nine servers are navigable with a routing table. Fifteen or twenty servers, without strict domain governance, produce a routing problem of their own. Each new server should be evaluated against the question: does this represent a genuinely distinct capability domain, or does it belong to an existing server? The architecture should expand at domain granularity, not at feature granularity.

Constraint 2: The capability registry requires maintenance to remain authoritative. The find_capability gate is only as effective as the registry it queries. A registry that does not reflect the current state of the platform — missing recently added services, retaining retired capabilities — will produce false negatives that result in reinvention. Registry maintenance must be treated as a first-class engineering responsibility, not housekeeping.
Constraint 3: Governance automation reduces but does not eliminate human judgment. The detect_required_reviews tool automates the determination of whether a review is required; it does not automate the review itself. The architecture shifts the question from “should I request a review?” — which agents will answer inconsistently — to “given what I have built, what reviews are required?” — which the governance server answers deterministically. Human reviewers remain responsible for the substance of those reviews.
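The determinism described here is what makes the governance server auditable: the same change characteristics always yield the same required reviews. A sketch of that rule mapping, assuming the paper's four triggering surfaces (the flag names and rule table are illustrative, not the server's actual schema):

```python
# Sketch of deterministic review detection, assuming boolean change-metadata
# flags. The rule set is illustrative; the paper names security boundaries,
# data models, public APIs, and multi-tenant isolation as triggering surfaces.
REVIEW_RULES = {
    "touches_security_boundary": "security_review",
    "changes_data_model": "architecture_review",
    "changes_public_api": "architecture_review",
    "touches_tenant_isolation": "security_review",
}

def detect_required_reviews(change_flags: dict) -> list:
    """Map change characteristics to required review gates, deterministically."""
    required = []
    for flag, review in REVIEW_RULES.items():
        if change_flags.get(flag) and review not in required:
            required.append(review)
    return required

print(detect_required_reviews({"changes_data_model": True}))  # -> ['architecture_review']
```

Because the mapping is a pure function of the change metadata, two agents describing the same change can never disagree about which reviews it requires.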
Constraint 4: Session initialization is a protocol, not a feature. get_session_context exists and is available, but its value is contingent on agents calling it at the start of every session. An agent that skips this call receives no error; it simply begins work without orientation. Enforcement requires that session initialization be embedded in agent system prompts, onboarding materials, and any automated agent-launch scaffolding.
Constraint 5: Tool counts within servers must remain manageable. The routing table solves the cross-server discovery problem. It does not solve the within-server tool discovery problem. If a single server accumulates forty tools, agents querying that server face the same flat-list challenges at a smaller scale. Individual server tool counts should be monitored, and servers should be decomposed if they exceed approximately fifteen to twenty tools.
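Monitoring that threshold is simple to automate. A minimal sketch, assuming only a mapping of server names to tool counts (the names, counts, and threshold default are illustrative):

```python
# Sketch of monitoring within-server tool counts against the approximate
# fifteen-to-twenty-tool decomposition threshold discussed above. Server
# names, counts, and the threshold default are illustrative.
def servers_needing_decomposition(tool_counts: dict, threshold: int = 20) -> list:
    """Return servers whose tool count exceeds the decomposition threshold."""
    return sorted(name for name, count in tool_counts.items() if count > threshold)

print(servers_needing_decomposition(
    {"backend-patterns": 25, "authentication": 8, "governance": 12}
))  # -> ['backend-patterns']
```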
8. Recommendations
- Adopt domain-bounded MCP servers as the primary organizational unit for agent tooling. Do not add new tools to a flat list. Identify the capability domain the tool belongs to and add it to the corresponding server. If no server exists for the domain, create one — but only if the domain is genuinely distinct from all existing servers.
- Enforce the find_capability gate as a non-negotiable pre-implementation step. Encode this requirement in your agent system prompts, in your task templates, and in any automated workflow scaffolding. An agent that bypasses this gate should have its output flagged for reinvention review before merge. Treat compliance as a code review criterion.
- Begin every agent session with get_session_context before issuing any task instruction. Build this call into the session initialization logic of your agent launch scripts. Cold-start agents operating without session orientation are a coordination failure waiting to materialize.
- Maintain the capability registry as a living document, not a snapshot. Assign ownership for registry maintenance to the team or role responsible for platform capability development. When a new shared service is added to the platform, the registry must be updated in the same pull request. Registry staleness is a security and quality risk, not a documentation gap.
- Use the governance server’s detect_required_reviews as the authoritative trigger for human review. Remove review-trigger logic from agent judgment. An agent that decides for itself whether its change requires an architecture review will make that decision inconsistently. The governance server provides a deterministic, auditable answer — use it.
- Monitor within-server tool counts and decompose servers that approach twenty tools. A server with twenty-five tools is exhibiting domain sprawl. Review the tool list, identify whether a coherent sub-domain has emerged, and extract it into a new server with its own routing table entry.
- Document the routing table in a location accessible to all agents at session initialization. The routing table is the agent’s map of the capability landscape. It should be available as a queryable resource — not buried in documentation — so that agents encountering an unfamiliar intent can consult it without context window expansion from loading irrelevant tool definitions.
9. Conclusion: The Routing Table as Agent Infrastructure
The nine-server architecture described in this paper is, at its core, a solution to a library-management problem. Software engineers do not reinvent standard libraries because the module system makes the correct import obvious. The routing table provides an equivalent mechanism for agents: a declarative, complete map from intent to capability that makes correct server selection the path of least resistance.

The find_capability gate extends this logic one level further. It is not sufficient to route agents to the correct server — agents must also be prevented from implementing capabilities that existing servers already provide. The gate converts this requirement from a best practice into an architectural constraint.
What emerges from the combination of structured routing, capability-check enforcement, and session initialization is not merely a well-organized tool collection. It is an operating environment: a set of structural constraints that shape agent behavior toward precision, coordination, and non-duplication. The agent operating system framing is not metaphorical. The routing table, the capability registry, the governance trigger, and the session context service perform functions analogous to process scheduling, library linking, permission enforcement, and state persistence in conventional operating systems — with agents as the processes and MCP servers as the kernel services.
As autonomous agent organizations scale to larger tool inventories, more concurrent agents, and deeper integration with platform services, the investment in structured routing infrastructure will compound. Organizations that establish domain-bounded server architecture and capability-check discipline now will find that adding new capabilities, onboarding new agents, and auditing agent behavior are all substantially simpler than they would be in a flat-tool environment. The architecture described here represents a baseline from which that scale can be achieved without the coordination failures that unstructured tool proliferation reliably produces.
All content represents personal learning from personal projects. Code examples are sanitized and generalized. No proprietary information is shared. Opinions are my own and do not reflect my employer’s views.