Executive Summary
As AI agent organizations mature, agents accumulate tools at a rate that outpaces any individual agent’s ability to route among them effectively. Without structure, agents select suboptimal tools, load irrelevant context into constrained windows, or fail to discover the correct server entirely — defaulting to reimplementation of capabilities that already exist. This paper examines a structured MCP (Model Context Protocol) routing architecture comprising nine specialized servers and ninety-one total tools, organized into discrete capability domains. The central finding is that domain separation, combined with a mandatory capability-check gate enforced before any implementation begins, eliminates an entire class of failure mode — capability reinvention — while simultaneously improving routing precision across all agent sessions. The architecture functions as an agent operating system: a structured dispatch layer that transforms an unordered tool list into a deterministic, navigable capability map.
Key Findings
- Unstructured tool sets degrade agent performance as they scale. Beyond approximately twenty tools, agents in flat tool lists exhibit measurable routing errors: wrong server selection, missed capabilities, and context window bloat from loading irrelevant tool definitions.
- Domain separation forces precision about the nature of the problem being solved. Distributing tools across servers named for their domain — authentication, tenancy, governance, requirements — requires the agent to classify its intent before querying, which surfaces ambiguity that a flat list would silently absorb.
- The find_capability() gate is the single highest-value constraint in the architecture. A mandatory pre-implementation check against the platform’s capability registry prevents agents from rebuilding shared services that already exist, a failure mode documented across more than twenty capability categories in the observed system.
- Governance automation via a dedicated MCP server decouples review-trigger logic from agent judgment. When the decision of whether an architecture review or security review is required is delegated to a detect_required_reviews call, agents cannot skip review stages through omission or misclassification.
- Session initialization via a context-retrieval call provides the orientation that cold-start agents lack. Beginning every session with get_session_context provides active task state, prior work, and in-progress decisions — eliminating the silent failure mode in which an agent begins new work that partially duplicates a task already underway.
- MCP routing is the agent analogue of a module system. Just as software engineers import packages rather than reimplementing standard libraries, agents should query specialized capability servers rather than synthesizing behavior from first principles.
1. The Tool Discovery Problem: Why Unstructured Tool Sets Degrade Agent Performance
Flat tool lists do not scale. This finding emerges from the observable behavior of AI agents operating across large tool sets without routing structure: agents select plausible-but-incorrect tools, fail to discover specialized capabilities buried deep in alphabetical or registration-order lists, and — most consequentially — proceed to implement capabilities from scratch when the correct tool is present but not found. The failure modes are distinct from each other and compound in practice.

Wrong tool selection occurs when multiple tools share surface-level semantic similarity. An agent navigating a flat list of ninety tools will select the first plausible match rather than the most precise one. In a multi-domain system, “get authentication patterns” and “get permission patterns” may resolve to different servers with fundamentally different scopes. Without domain structure, the agent has no mechanism for distinguishing them.

Context window bloat occurs when tool definitions — descriptions, parameter schemas, examples — must be loaded to determine relevance. A ninety-tool flat list loaded in full consumes substantial context window capacity. Domain routing allows agents to load only the tool definitions relevant to the current intent, preserving context for the work itself.

Reinvention failure is the most expensive mode. When an agent cannot locate the correct capability through tool discovery, it defaults to implementation. In a mature platform, this means rebuilding notification delivery, approval workflows, permission checks, or SLA tracking — services that already exist, that are already tested, and that other parts of the system depend on. The cost is not merely the time spent building the duplicate; it is the ongoing maintenance burden, the behavioral divergence from the canonical implementation, and the architectural fragmentation that accumulates over time.

Structured routing addresses all three failure modes simultaneously.
Domain-bounded servers reduce the search space for tool selection. Targeted server queries reduce context window consumption. And the capability-check gate — described in detail in Section 4 — converts capability discovery into an explicit, mandatory step rather than an optional best practice.
2. Architecture: Nine Specialized Servers as Distinct Capability Domains
The routing architecture partitions ninety-one tools across nine servers. Each server corresponds to a coherent capability domain with clear ownership boundaries. An agent consulting the routing table can determine, from a single declarative mapping, which server to query for any given intent. The nine domains are:

Task management and session context handles the lifecycle of work items — creation, state transitions, closure — and provides the session-orientation capability that agents require at startup. This server also exposes task search, enabling agents to retrieve prior work before initiating potentially duplicative efforts.

Platform capability registry is the capability-check gateway. Its primary purpose is to answer the question: does this already exist? Before any implementation begins, agents query this server to determine whether the platform already provides what they are about to build. It also exposes crate and library guides for implementation-pattern consistency.

Authentication and authorization covers the permission model: obtaining permission patterns, validating permission strings, and exposing known anti-patterns. This domain is intentionally narrow — it concerns how agents interact with the access control layer, not how they implement access control themselves.

Multi-tenancy and isolation addresses the structural requirements of a multi-tenant platform: partition key scoping, tenant lifecycle, data isolation patterns, and the conceptual model governing how tenant boundaries are expressed in the data layer. This server’s separation from authentication is architecturally significant. Tenant scoping is a data concern; authentication is an identity concern. Conflating them produces design errors.

User experience and interface components covers the design system, UI component library, and SDK integration patterns.
Agents building frontend functionality query this server before writing component code, ensuring alignment with the established component vocabulary rather than introducing new, one-off implementations.

Backend implementation patterns provides the scaffolding and validation tools for backend development: entity generation, aggregate definition, repository scaffolding, layer-boundary rules, and domain-driven design patterns. This server encodes the structural conventions of the codebase as queryable tools rather than documentation.

Governance and review orchestration automates the determination of when human or senior-agent review is required. Rather than leaving this judgment to individual agent sessions — which will vary in their conservatism — a detect_required_reviews call returns a deterministic answer based on the nature of the change: whether it affects security boundaries, data models, public APIs, or multi-tenant isolation.
Requirements management tracks the relationship between implemented work and the formal requirement documents that authorized it. Agents creating or closing work query this server to ensure traceability from implementation back to source requirements.
Post-closure quality auditing operates after work is complete. It identifies audit candidates in closed tasks, generates structured audit reports, and surfaces improvement proposals. This server exists because quality assurance cannot be limited to pre-implementation gates; patterns that are individually acceptable may produce systemic issues at scale, and post-closure audit is the mechanism for detecting them.
3. The Routing Table: Deterministic Server Selection From Intent
The routing table is the operational core of the architecture. It is not a conceptual model — it is a concrete lookup that agents consult to determine which server to call for any given intent. The table below presents the canonical mapping.

| Intent | Server | Representative Tools |
|---|---|---|
| Create, update, start, or end tasks; search prior work; orient at session start | Task management server | create_task, start_task, end_task, search_past_tasks, get_session_context |
| Find existing platform capabilities; check whether code reimplements something that exists | Platform capability server | find_capability, check_reinvention, get_crate_guide |
| Authentication patterns, permission models, access control anti-patterns | Authentication server | get_permission_patterns, validate_permission_string, get_auth_anti_patterns |
| Tenant scoping, partition key design, isolation patterns, tenant lifecycle | Multi-tenancy server | check_tenant_scoping, get_key_design_guide, get_isolation_patterns |
| UI components, design system, frontend SDK integration | UX and interface server | find_ui_component, get_coding_patterns, get_sdk_integration_patterns |
| Entity scaffolding, aggregate definition, repository generation, layer rules | Backend patterns server | tool_scaffold_*, tool_validate_*, tool_get_pattern, tool_get_layer_rules |
| Determine review requirements; trigger architecture, security, or audit reviews | Governance server | detect_required_reviews, request_code_reviews, post_architect_review |
| Track requirements; create or merge requirement pull requests | Requirements server | list_requirements, track_requirement, create_requirement_pr |
| Post-closure audits; surface quality improvement candidates | Quality audit server | find_audit_candidates, post_audit_report, create_proposal_task |
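The routing table above is amenable to a direct, declarative encoding. The sketch below is illustrative — the intent keys and server names are simplified stand-ins for the table rows, not identifiers from any real MCP SDK — but it shows the essential property: an unknown intent fails loudly instead of being silently matched to the first plausible tool.

```python
# Illustrative sketch: the routing table as a declarative intent -> server map.
# Intent keys and server names are simplified stand-ins for the table above.
ROUTING_TABLE = {
    "task_lifecycle": "task-management-server",
    "capability_check": "platform-capability-server",
    "auth_patterns": "authentication-server",
    "tenant_scoping": "multi-tenancy-server",
    "ui_components": "ux-interface-server",
    "backend_scaffolding": "backend-patterns-server",
    "review_requirements": "governance-server",
    "requirements_tracking": "requirements-server",
    "post_closure_audit": "quality-audit-server",
}

def route(intent: str) -> str:
    """Resolve an intent to its owning server, failing loudly on unknown intents."""
    try:
        return ROUTING_TABLE[intent]
    except KeyError:
        known = ", ".join(sorted(ROUTING_TABLE))
        raise LookupError(f"No server owns intent '{intent}'. Known intents: {known}")

print(route("capability_check"))  # -> platform-capability-server
```

The lookup is deliberately total-or-error: a flat tool list degrades to "pick the closest match," whereas a routing table can refuse to guess.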
3.1 Structured Routing vs. Unstructured Tool Lists
The following comparison characterizes the practical differences between operating with a structured routing table versus a flat, unordered tool list.

| Dimension | Unstructured Tool List | Structured Routing Table |
|---|---|---|
| Tool discovery method | Sequential scan or semantic similarity matching | Declarative lookup by intent |
| Domain ambiguity resolution | Absent — agent selects first plausible match | Explicit — domain separation forces classification |
| Context window consumption | High — all tool definitions loaded or scanned | Low — only target server’s tools loaded |
| Capability reinvention risk | High — absent discovery mechanism, agent builds | Low — find_capability gate enforced pre-implementation |
| Governance compliance | Agent-dependent — varies by session | Deterministic — detect_required_reviews returns required gates |
| Onboarding of new agents | Manual — agent must learn tool landscape | Immediate — routing table provides complete map |
| Maintenance as tools are added | Degrades — new tools buried in flat list | Stable — new tools added to existing domain server |
4. Capability-First Development: find_capability() as a Pre-Implementation Gate
The most consequential constraint in the architecture is not a server or a tool — it is a protocol. Before implementing any new feature, agents must execute a capability check against the platform capability server. This check is not advisory. It is a mandatory gate, and its enforcement is what separates the architecture described here from a tool collection with good intentions.
The protocol requires three steps:
Step 1: Call find_capability("<description of what you are about to build>") against the platform capability server. The call must precede any code generation.
Step 2: If code has already been drafted — whether through speculative generation or prior session context — call check_reinvention("<code snippet>") to determine whether the drafted code reimplements an existing service.
Step 3: Post the results of both calls as a structured comment on the active task before proceeding.
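The paper leaves the format of that structured comment unspecified; one plausible shape, with placeholder fields, is:

```
## Capability Check Results
- find_capability("<description>"): <no match | matched: service-name>
- check_reinvention("<snippet>"): <clean | overlaps: service-name>
- Decision: <use existing service-name | proceed with new implementation>
```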
4.1 What the Gate Protects
The platform capability registry documents more than twenty shared services that agents must not reimplement. The following table presents the categories most frequently encountered in the observed system.

| Capability Need | Existing Service | Reinvention Cost |
|---|---|---|
| Send notifications to users | Notification service | High — delivery reliability, retry logic, template management |
| Add comments or attachments to work items | Collaboration service | Medium — storage, threading, permission scoping |
| Multi-step approval workflows | Approval service | High — state machine, notifications, audit trail |
| SLA tracking and alerting | SLA service | High — timer management, escalation logic, reporting |
| Webhook delivery to external systems | Webhook service | High — retry logic, signature verification, delivery receipts |
| Billing and subscription management | Billing client | Critical — financial logic must not be duplicated |
| Permission checks on resources | Access control service | Critical — security boundary; duplicates create bypass vectors |
These are the services agents most frequently rebuild when the find_capability gate is absent or bypassed. The cost column captures not the initial implementation effort but the ongoing maintenance and correctness risk of running a duplicate implementation alongside the canonical one.
The capability registry is not a documentation artifact — it is a queryable service. The distinction matters. Documentation is read when engineers remember to read it. A queryable gate enforced as a pre-implementation protocol is consulted on every implementation, by every agent, in every session. Compliance is structural, not behavioral.
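One way to make that compliance structural is to wire the gate into agent scaffolding, so implementation is refused when the registry reports a match. The sketch below is a hypothetical stand-in — the find_capability call, the keyword-overlap matching, and the registry contents are all illustrative, not a real MCP SDK API (a production registry would use a semantic index, not word overlap):

```python
# Sketch of enforcing the capability gate in agent scaffolding. The
# find_capability call and registry contents are illustrative stand-ins,
# not a real MCP SDK API.
class CapabilityRegistry:
    def __init__(self, services):
        # service name -> short description used for matching
        self.services = services

    def find_capability(self, description):
        """Return services whose description shares words with the query.

        A production registry would use a semantic index; keyword overlap
        is the simplest illustrative stand-in.
        """
        query = set(description.lower().split())
        return [name for name, desc in self.services.items()
                if query & set(desc.lower().split())]

def gated_implementation(registry, description, implement):
    """Refuse to run `implement` when the registry already covers the need."""
    matches = registry.find_capability(description)
    if matches:
        raise RuntimeError(
            f"Capability gate: existing service(s) {matches} already cover "
            f"'{description}'; use them instead of reimplementing.")
    return implement()

registry = CapabilityRegistry({
    "notification-service": "send notifications email push delivery",
    "approval-service": "multi step approval workflows",
})

# Blocked: notification delivery already exists in the registry.
try:
    gated_implementation(registry, "send notifications to users", lambda: "new code")
except RuntimeError as err:
    print(err)
```

The design choice worth noting is that the gate raises rather than warns: an advisory message can be ignored, while a refused call forces the agent to route to the existing service.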
5. Session Initialization: Orientation Before Action
Cold-start agents — those beginning a new session without explicit task context — are statistically the most likely to produce work that duplicates in-progress efforts, contradicts recent architectural decisions, or ignores active dependencies between tasks. The session initialization protocol addresses this through a two-call sequence that must precede any substantive work.

Call 1: get_session_context (task management server). This call returns the current active task state, tasks opened but not yet closed, recent completions, and any in-progress decisions or blockers recorded by prior sessions. An agent that begins work without this call may open a task that is already in progress, implement a feature that another agent is simultaneously implementing, or make an architectural choice that a prior session explicitly deferred pending review.
Call 2: find_capability or find_ui_component (platform capability server or UX server, depending on domain). This call performs the pre-work capability check at the session level rather than the task level — confirming that the session’s general intent does not duplicate existing platform functionality before any task-level planning begins.
The sequence imposes a small, fixed overhead at session start. The alternative — cold-start agents proceeding directly to implementation — produces coordination failures that are expensive to detect and expensive to reverse.
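The two-call sequence can be sketched as launch-script logic. The method names mirror the tools named above (get_session_context, find_capability), but the client objects and the returned field names are hypothetical stand-ins, not a real MCP client API:

```python
# Sketch of the two-call session initialization protocol. Method names mirror
# the tools in the paper; the client objects and field names are hypothetical
# stand-ins, not a real MCP SDK API.
def initialize_session(task_client, capability_client, session_intent):
    """Run the mandatory orientation sequence before any substantive work."""
    # Call 1: orient against active tasks, recent work, and open decisions.
    context = task_client.get_session_context()
    # Call 2: confirm the session's intent does not duplicate an existing capability.
    matches = capability_client.find_capability(session_intent)
    return {
        "active_tasks": context.get("active_tasks", []),
        "blockers": context.get("blockers", []),
        "capability_matches": matches,
        # Proceed only when no existing capability covers the intent.
        "clear_to_proceed": not matches,
    }

# Minimal stub clients for demonstration.
class StubTaskClient:
    def get_session_context(self):
        return {"active_tasks": ["TASK-101"], "blockers": []}

class StubCapabilityClient:
    def find_capability(self, intent):
        return []  # nothing existing covers this intent

state = initialize_session(StubTaskClient(), StubCapabilityClient(), "new reporting view")
print(state["clear_to_proceed"])  # True when no duplication is detected
```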
6. The Reinvention Problem: What Happens Without the Gate
The reinvention failure mode is well-characterized in the observed system. When the find_capability gate is absent, bypassed, or inconsistently enforced, agents independently reconstruct shared services with predictable regularity.
The failure pattern follows a consistent structure:
- An agent is assigned a task requiring, incidentally, notification delivery to a user.
- The agent searches its immediate tool context for a “send notification” tool and finds none — because notification delivery is exposed through a service client, not a dedicated MCP tool.
- The agent, unable to locate an existing capability, proceeds to implement notification delivery directly: an email sender, a queue producer, or a direct API call.
- The implementation is functionally correct in isolation but behaviorally divergent from the canonical notification service: different retry logic, different logging, different template handling.
- The duplicate implementation enters production alongside the canonical service. Both are now maintenance surfaces.
The check_reinvention tool addresses the case where drafting has preceded the gate — where an agent has already written code before consulting the capability registry. Rather than discarding the draft, the agent passes it to check_reinvention, which compares the draft against the capability registry’s semantic index and returns a match if the code reproduces existing functionality. This provides a recovery path when the gate is applied retroactively rather than proactively.
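The shape of that comparison can be illustrated with a deliberately naive stand-in. The real tool queries a semantic index; the sketch below substitutes token-set overlap (Jaccard similarity), and the registry fingerprint format is an assumption made for the example:

```python
# Illustrative sketch of a check_reinvention-style comparison. The real tool
# compares a draft against the registry's semantic index; token-set overlap
# (Jaccard similarity) is a deliberately naive stand-in for that index.
def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def check_reinvention(draft: str, registry: dict, threshold: float = 0.2):
    """Return the best-matching existing service if the draft resembles one."""
    tokens = set(draft.lower().replace("(", " ").replace(")", " ").split())
    best, best_score = None, 0.0
    for service, fingerprint in registry.items():
        score = jaccard(tokens, set(fingerprint.lower().split()))
        if score > best_score:
            best, best_score = service, score
    return best if best_score >= threshold else None

# A draft that rebuilds notification delivery matches the canonical service.
registry = {"notification-service": "send email notification retry queue template"}
draft = "def send_notification(user): queue the email template and retry"
print(check_reinvention(draft, registry))  # -> notification-service
```

The recovery path is the point: a match does not mean the draft is discarded silently, it means the agent is redirected to the canonical service before the duplicate reaches production.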
7. Implementation Constraints
The architecture described in this paper carries implementation dependencies and operational constraints that practitioners should account for before adoption.

Constraint 1: Server proliferation requires naming discipline. Nine servers are navigable with a routing table. Fifteen or twenty servers, without strict domain governance, produce a routing problem of their own. Each new server should be evaluated against the question: does this represent a genuinely distinct capability domain, or does it belong to an existing server? The architecture should expand at domain granularity, not at feature granularity.

Constraint 2: The capability registry requires maintenance to remain authoritative. The find_capability gate is only as effective as the registry it queries. A registry that does not reflect the current state of the platform — missing recently added services, retaining retired capabilities — will produce false negatives that result in reinvention. Registry maintenance must be treated as a first-class engineering responsibility, not housekeeping.
Constraint 3: Governance automation reduces but does not eliminate human judgment. The detect_required_reviews tool automates the determination of whether a review is required; it does not automate the review itself. The architecture shifts the question from “should I request a review?” — which agents will answer inconsistently — to “given what I have built, what reviews are required?” — which the governance server answers deterministically. Human reviewers remain responsible for the substance of those reviews.
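The determinism described here is what makes the governance server auditable: the same change characteristics always yield the same required reviews. A sketch of that rule mapping, assuming the paper's four triggering surfaces (the flag names and rule table are illustrative, not the server's actual schema):

```python
# Sketch of deterministic review detection, assuming boolean change-metadata
# flags. The rule set is illustrative; the paper names security boundaries,
# data models, public APIs, and multi-tenant isolation as triggering surfaces.
REVIEW_RULES = {
    "touches_security_boundary": "security_review",
    "changes_data_model": "architecture_review",
    "changes_public_api": "architecture_review",
    "touches_tenant_isolation": "security_review",
}

def detect_required_reviews(change_flags: dict) -> list:
    """Map change characteristics to required review gates, deterministically."""
    required = []
    for flag, review in REVIEW_RULES.items():
        if change_flags.get(flag) and review not in required:
            required.append(review)
    return required

print(detect_required_reviews({"changes_data_model": True}))  # -> ['architecture_review']
```

Because the mapping is a pure function of the change metadata, two agents describing the same change can never disagree about which reviews it requires.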
Constraint 4: Session initialization is a protocol, not a feature. get_session_context exists and is available, but its value is contingent on agents calling it at the start of every session. An agent that skips this call receives no error; it simply begins work without orientation. Enforcement requires that session initialization be embedded in agent system prompts, onboarding materials, and any automated agent-launch scaffolding.
Constraint 5: Tool counts within servers must remain manageable. The routing table solves the cross-server discovery problem. It does not solve the within-server tool discovery problem. If a single server accumulates forty tools, agents querying that server face the same flat-list challenges at a smaller scale. Individual server tool counts should be monitored, and servers should be decomposed if they exceed approximately fifteen to twenty tools.
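Monitoring that threshold is simple to automate. A minimal sketch, assuming only a mapping of server names to tool counts (the names, counts, and threshold default are illustrative):

```python
# Sketch of monitoring within-server tool counts against the approximate
# fifteen-to-twenty-tool decomposition threshold discussed above. Server
# names, counts, and the threshold default are illustrative.
def servers_needing_decomposition(tool_counts: dict, threshold: int = 20) -> list:
    """Return servers whose tool count exceeds the decomposition threshold."""
    return sorted(name for name, count in tool_counts.items() if count > threshold)

print(servers_needing_decomposition(
    {"backend-patterns": 25, "authentication": 8, "governance": 12}
))  # -> ['backend-patterns']
```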
8. Recommendations
- Adopt domain-bounded MCP servers as the primary organizational unit for agent tooling. Do not add new tools to a flat list. Identify the capability domain the tool belongs to and add it to the corresponding server. If no server exists for the domain, create one — but only if the domain is genuinely distinct from all existing servers.
- Enforce the find_capability gate as a non-negotiable pre-implementation step. Encode this requirement in your agent system prompts, in your task templates, and in any automated workflow scaffolding. An agent that bypasses this gate should have its output flagged for reinvention review before merge. Treat compliance as a code review criterion.
- Begin every agent session with get_session_context before issuing any task instruction. Build this call into the session initialization logic of your agent launch scripts. Cold-start agents operating without session orientation are a coordination failure waiting to materialize.
- Maintain the capability registry as a living document, not a snapshot. Assign ownership for registry maintenance to the team or role responsible for platform capability development. When a new shared service is added to the platform, the registry must be updated in the same pull request. Registry staleness is a security and quality risk, not a documentation gap.
- Use the governance server’s detect_required_reviews as the authoritative trigger for human review. Remove review-trigger logic from agent judgment. An agent that decides for itself whether its change requires an architecture review will make that decision inconsistently. The governance server provides a deterministic, auditable answer — use it.
- Monitor within-server tool counts and decompose servers that approach twenty tools. A server with twenty-five tools is exhibiting domain sprawl. Review the tool list, identify whether a coherent sub-domain has emerged, and extract it into a new server with its own routing table entry.
- Document the routing table in a location accessible to all agents at session initialization. The routing table is the agent’s map of the capability landscape. It should be available as a queryable resource — not buried in documentation — so that agents encountering an unfamiliar intent can consult it without context window expansion from loading irrelevant tool definitions.
9. Conclusion: The Routing Table as Agent Infrastructure
The nine-server architecture described in this paper is, at its core, a solution to a library-management problem. Software engineers do not reinvent standard libraries because the module system makes the correct import obvious. The routing table provides an equivalent mechanism for agents: a declarative, complete map from intent to capability that makes correct server selection the path of least resistance.

The find_capability gate extends this logic one level further. It is not sufficient to route agents to the correct server — agents must also be prevented from implementing capabilities that existing servers already provide. The gate converts this requirement from a best practice into an architectural constraint.
What emerges from the combination of structured routing, capability-check enforcement, and session initialization is not merely a well-organized tool collection. It is an operating environment: a set of structural constraints that shape agent behavior toward precision, coordination, and non-duplication. The agent operating system framing is not metaphorical. The routing table, the capability registry, the governance trigger, and the session context service perform functions analogous to process scheduling, library linking, permission enforcement, and state persistence in conventional operating systems — with agents as the processes and MCP servers as the kernel services.
As autonomous agent organizations scale to larger tool inventories, more concurrent agents, and deeper integration with platform services, the investment in structured routing infrastructure will compound. Organizations that establish domain-bounded server architecture and capability-check discipline now will find that adding new capabilities, onboarding new agents, and auditing agent behavior are all substantially simpler than they would be in a flat-tool environment. The architecture described here represents a baseline from which that scale can be achieved without the coordination failures that unstructured tool proliferation reliably produces.
All content represents personal learning from personal projects. Code examples are sanitized and generalized. No proprietary information is shared. Opinions are my own and do not reflect my employer’s views.