This is Episode 9 of the Autonomous Dev Org series — an honest account of building a development organization where AI handles implementation and humans handle direction.
The Coordination Tax
Episode 8 gave us the Global Architect — a system-wide view that catches boundary blindness before implementation begins. Episode 7 gave us Hook Shields — sub-2-second enforcement at the point of write. Both worked. Neither was sustainable at scale.

The problem was coordination. Every task required a judgment call: is this local enough for Hook Shields only, or complex enough to warrant a full Global Architect pass? We were making that call manually, task by task. In a loop processing dozens of tasks per day, the coordination overhead had become its own full-time job — exactly the kind of work we'd built the loop to eliminate.

We had two powerful verification systems and were manually routing between them. That's not an autonomous organization. That's a human-in-the-loop wrapper around automation. We needed the loop itself to understand what kind of task it was handling and route accordingly. The Intelligence Router was the missing piece.

What We Built
We built an Intelligence Router — a lightweight classification layer that sits in front of both verification systems and decides, for each task, which tier of verification is appropriate. Local tasks stay in the inner loop. Cross-repo tasks escalate to the Global Architect. Persistent violations trigger a new process: the Correction Cycle.

The Correction Cycle is what makes the loop genuinely self-healing. When a violation recurs despite Hook Shields — which shouldn't happen but occasionally does — the cycle doesn't just fix the code. It analyzes the failure pattern, codifies a new or updated shield, and deploys it across all repositories simultaneously. The organization learns from every bug, at the speed of a git push.

The War Room
Architect: Okay, we've got local Hook Shields for the inner loop and a Global Architect for the boundaries. But the coordination is starting to feel like a full-time job. How do we scale this without manually routing every task?

Director: That's the Master Agent Fallacy. We tried building one agent that knew everything about every repo. It got slow and overconfident. I'm moving to a Hierarchy of Verification, managed by an Intelligence Router.

Architect: Small fixes stay local, big changes go global?

Director: Precisely. But the router decides — not us. Trust becomes a routing decision based on context weight and risk, made automatically from the task specification.

Builder: What does that look like in practice?

Director: The router reads the incoming task. Single repository, no shared types touched? Stays local. Hook Shields run. Touches the Core SDK, or modifies a type used across repo boundaries? The Global Architect runs first and produces a Sync Plan. Domain agents work from the plan.

Architect: And when something still fails after all that?

Director: That's the interesting case. A recurring violation — something that slips through existing shields — isn't just a bug. It's a gap in the governance model. The Correction Cycle kicks in: analyze the pattern, write a new shield, deploy it everywhere.

Builder: So the organization gets smarter every time I mess up?

Director: That's exactly right. In a traditional team, a lesson learned by one person stays with that person. In this system, a violation caught anywhere becomes enforcement applied everywhere. The IQ of the loop increases with every error it encounters.

The Routing Decision
The Intelligence Router classifies tasks along two dimensions: scope and risk.

Scope answers: how many repositories does this task touch?

- Single-repo: the task is bounded. Local verification is sufficient.
- Cross-repo: the task has boundary effects. Global verification is required.

Risk answers: how much damage does the change do if it's wrong?
- Low: isolated behavior, easily reversed, no downstream consumers affected.
- High: shared types, public contracts, or infrastructure changes with wide impact.
| Scope | Risk | Verification Tier |
|---|---|---|
| Single-repo | Low | Inner loop only (Hook Shields + compile + tests) |
| Single-repo | High | Inner loop + targeted boundary check |
| Cross-repo | Low | Global Architect scan + local execution |
| Cross-repo | High | Full Global Architect pass + SDK-First gate + coordinated execution |
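The matrix above can be expressed as a small routing function. This is a sketch under stated assumptions: the `Task` fields (`repos_touched`, `touches_shared_types`) and the tier names are illustrative stand-ins, since the actual task-specification format isn't shown in the series.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    INNER_LOOP = "inner loop only (Hook Shields + compile + tests)"
    INNER_PLUS_BOUNDARY = "inner loop + targeted boundary check"
    GLOBAL_SCAN = "Global Architect scan + local execution"
    FULL_GLOBAL = "full Global Architect pass + SDK-First gate"


@dataclass
class Task:
    # Hypothetical fields: a real task spec would carry more context.
    repos_touched: list[str]
    touches_shared_types: bool  # shared types, public contracts, infra


def route(task: Task) -> Tier:
    """Apply the scope x risk routing matrix to one task."""
    cross_repo = len(task.repos_touched) > 1
    high_risk = task.touches_shared_types
    if cross_repo and high_risk:
        return Tier.FULL_GLOBAL
    if cross_repo:
        return Tier.GLOBAL_SCAN
    if high_risk:
        return Tier.INNER_PLUS_BOUNDARY
    return Tier.INNER_LOOP
```

The point of the sketch is that the routing decision is a cheap, deterministic lookup once scope and risk are extracted; the judgment lives in the classifier that fills in the `Task` fields, not in the matrix itself.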
The Correction Cycle
Hook Shields have a coverage problem: they catch what you've already thought of. A new violation class — one that wasn't anticipated when the shields were written — can slip through the inner loop undetected. The Correction Cycle is how the loop responds when this happens.

Violation Detected
The Verifier agent catches a violation in a merged PR that existing shields
didn’t prevent. This is the signal that governance has a gap.
Pattern Analysis
The Global Architect analyzes the failure: what rule was violated, why the
existing shield didn’t catch it, and whether this represents a new pattern
or a gap in existing coverage. The analysis produces a structured description
of the violation class.
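For illustration, the structured description might be a small record like the one below. The field names are assumptions for the sketch, not the actual schema the Global Architect emits.

```python
from dataclasses import dataclass, field


@dataclass
class ViolationClass:
    """Hypothetical shape of a Pattern Analysis result."""
    rule: str                  # which rule was violated
    why_missed: str            # why the existing shield didn't catch it
    novel: bool                # new pattern vs. gap in existing coverage
    example_diff: str          # the code that slipped through
    repos_affected: list[str] = field(default_factory=list)
```

A structured record like this is what makes the next step mechanical: a shield can be generated from the fields rather than from a prose bug report.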
Shield Codification
A new Hook Shield is written — or an existing one updated — to cover the
gap. The shield is specific: it targets the exact pattern that slipped through,
with an error message that names the rule and explains the fix.
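As a concrete sketch, a codified shield might look like the check below. The rule name, the detection regex, and the file-scanning approach are all illustrative assumptions, not actual shields from this series; it borrows the floating-point currency example discussed later in the post.

```python
import re
import sys

RULE = "NO-FLOAT-CURRENCY"
# Flags float literals assigned to currency-style names, e.g. `price = 19.99`.
PATTERN = re.compile(r"\b(price|amount|total|balance)\w*\s*=\s*\d+\.\d+")


def check(path: str) -> int:
    """Scan one changed file; return 1 (block) on a match, else 0."""
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            if PATTERN.search(line):
                # The message names the rule and explains the fix.
                print(
                    f"{path}:{lineno}: [{RULE}] float literal used for a "
                    "currency value; store an integer minor-unit (cents) "
                    "or use Decimal instead.",
                    file=sys.stderr,
                )
                return 1
    return 0
```

Wired into the hook runner with the changed file paths as arguments, a non-zero return blocks the write; the stderr message carries the rule name and the fix, matching the shield contract described above.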
The Self-Healing Insight
Traditional teams fix bugs without systematically converting them into prevention. A developer catches a floating-point currency bug, fixes it, leaves a comment, maybe adds a test. Six months later, a new developer makes the same mistake — unfamiliar with the history, unseen by any enforcement. The knowledge stays with the individual who encountered it. The team's aggregate experience doesn't automatically become the team's aggregate enforcement.

The Correction Cycle changes this. Every violation is:

- Fixed (immediate)
- Analyzed (understanding the pattern)
- Encoded (shield written)
- Deployed (enforcement applied everywhere)
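The four steps above can be wired together as a pipeline. This is a control-flow sketch only: the functions are trivial stubs standing in for the real agents, whose implementations don't appear in the post.

```python
from dataclasses import dataclass


@dataclass
class Shield:
    rule: str
    pattern: str  # detection logic, reduced to a regex for the sketch


def analyze(violation: dict) -> dict:
    # Stand-in for the Global Architect's pattern analysis: produce a
    # structured description of the violation class.
    return {"rule": violation["rule"], "pattern": violation["pattern"]}


def codify(pattern: dict) -> Shield:
    # Stand-in for shield codification: target the exact pattern
    # that slipped through.
    return Shield(rule=pattern["rule"], pattern=pattern["pattern"])


def deploy(shield: Shield, repos: list[str]) -> dict[str, Shield]:
    # Stand-in for fleet deployment: the same shield lands in every
    # repository at once.
    return {repo: shield for repo in repos}


def correction_cycle(violation: dict, repos: list[str]) -> dict[str, Shield]:
    shield = codify(analyze(violation))
    # The mandatory human review of the shield happens here before
    # deployment; it is omitted from this sketch.
    return deploy(shield, repos)
```

The structural point is that the output of a failure is not a patch but a deployed artifact: one violation in one repo produces enforcement in all of them.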
What Governance Looks Like in Practice
Six months into running the full system — Router + inner loop + Global Architect + Correction Cycle — here's what the daily operation looks like:
What Didn’t Work
A monolithic router. Our first router design tried to classify tasks in a single pass — read the task, output a tier. For ambiguous tasks (the 5% that genuinely require judgment), the router was wrong too often. We added a confidence score: if confidence falls below threshold, the task is flagged for human routing before proceeding.

Correction Cycle without human shield review. Early on, we let the Correction Cycle deploy new shields automatically after analysis. Two of the first five shields had false-positive patterns — they blocked valid code. We added a mandatory human review step before shield deployment. The review takes two minutes and prevents the kind of feedback loop failure where the cure is worse than the disease.

Treating all violations equally. Some violations are Class A — critical, block-immediately — and some are Class B — advisory, worth flagging but not blocking. We initially ran everything as Class A. It caused friction on edge cases where the violation class had legitimate exceptions. Tiering the shields resolved this: Class A exits with code 1, Class B exits with 0 but writes to stderr. The agent is informed but not halted.

AI Collaboration in This Episode
The Intelligence Router is itself an agent — a lightweight classifier that reads task specifications and outputs routing decisions. A smaller, faster model works well for this role. The classification task doesn't require deep reasoning; it requires consistent application of a routing matrix against a structured input.

The irony of the self-healing loop is that agents participate in their own governance improvement. When the Correction Cycle produces a new shield, we describe the failure pattern to Claude and ask it to write the detection logic. It produces more comprehensive coverage than our manual sketches — catching edge cases in the pattern we hadn't anticipated. The agent writes the enforcement that governs itself. We review what it wrote.

This division of labor is stable and efficient: the model is better at exhaustive pattern coverage, humans are better at judging where to draw the boundary.

The Principle Behind the System
We set out to build a development organization where AI handles implementation and humans handle direction. What we built is closer to: AI handles execution, AI handles enforcement, and humans handle judgment at boundary conditions.

The boundary conditions are the interesting part. They're where the system's rules are ambiguous, where context is genuinely hard to encode mechanically, where a wrong call cascades into something expensive. Those are the moments that require a human who understands the intent behind the rules, not just the rules themselves.

Everything else — the execution, the verification, the enforcement, the correction — runs autonomously at a pace and consistency no human team could match. Governance isn't the bottleneck anymore. It's the engine.

What's Next
The loop executes, verifies, enforces, and corrects. It doesn't get tired, and it doesn't forget. But it still only does what it's told.

The remaining question — and it's a harder one than any of the engineering problems we've solved — is whether the direction itself can be shaped by what the loop learns. Whether the human-authored strategy at the top can start to feel pressure from the autonomous execution at the bottom. Whether leadership changes when the execution layer stops being a constraint.

That's Episode 10: how the loop changed what it means to lead a development organization.
Earlier in the Series
- Episode 1: The Orchestration Problem — building the autonomous loop from scratch
- Episode 3: Blast Radius — Tree-sitter + KuzuDB impact graph for safe refactoring
- Episode 4: Baseline Drift — when the agent reports current state but humans carry expected state
- Episode 6: Dog-Fooding — what building customer zero revealed about the loop’s real quality
All content represents personal learning from personal and side projects. Code examples are sanitized and generalized. No proprietary information is shared. Opinions are my own and do not reflect my employer’s views.