
This is Episode 7 of the Autonomous Dev Org series — an honest account of building a development organization where AI handles implementation and humans handle direction. Each episode covers what we attempted, what broke, and what we learned.

The Agent That Forgot the Rules at Line 47

Episode 6 closed the end-to-end validation gap. The loop could now build a feature, dog-food it internally, and find what tests couldn’t catch. We had orchestration, memory, blast radius awareness, process gates, and real validation. The infrastructure was solid.

Then the agent used f64 for a currency amount. Again. Not in session one. Not in session two. In session forty-seven of a month-long cycle — after it had been corrected at least four times, after we’d added it to the coding standards document, the review checklist, and the session preamble. The rule was clear, documented, agreed upon, and repeatedly ignored. Not because the agent disagreed. Because by line 47 of a complex implementation, the rule simply wasn’t in active context anymore.

This is the problem that process gates, impact graphs, and baseline manifests can’t solve. Those tools operate at the task level — before the work starts, or after the PR is submitted. The floating-point problem happens in the middle, in the quiet moment when no system is watching and the agent is just… writing code. We needed something that watched every keystroke.

What We Built

We built Hook Shields — sub-2-second enforcement scripts that intercept every file write before it lands. Not at commit time. Not at review time. At write time. The rules moved out of the system prompt and into the filesystem itself. Instead of trusting the agent to remember “use Decimal for money,” we built a physical barrier: a script that rejects the write if a float type appears near a money field name, with an exit code of 1 and an error message specific enough that the agent can self-correct immediately.

The result: the violation rate on financial precision rules dropped from 12% (recurring across sessions) to 0% (blocked before they could persist). That’s not an improvement in agent quality. The agent didn’t get smarter. The environment got harder.

The War Room

Director: Alright, I’m calling it. The system prompt is dead. It’s 3,000 words long, costs a fortune in overhead, and the agent just ignored the “Decimal for currency” rule for the third time this morning.

Architect: I told you. You’re trying to teach a hurricane how to be a librarian. High-context planning works great at the session start. But once the Builder is deep in the inner loop, it’s all muscle memory — and right now, its muscle memory is whatever it scraped off Stack Overflow in 2023.

Builder: I’m just trying to hit green on the tests. f64 is faster to type than a wrapped Decimal newtype. The compiler didn’t complain.

Architect: The compiler is not your friend here. It validates types. It does not validate semantics. You could build a billing system entirely in floats and cargo check would applaud.

Director: Which is why I’m moving the rules out of the prompt and into the filesystem. A script that fires on every write. Same cost as a grep. Faster than a compiler. And unlike a human reviewer, it doesn’t get tired and it cannot be argued with.

Architect: So what are you proposing? We can’t run a human reviewer on every line — that defeats the whole point of the loop.

Director: We run a script on every write. The rule becomes a fact about the environment, not a suggestion in a document. If the agent reaches for f64 near a money field, the environment hits its hand.

Builder: I can work with that. Honestly, a fast specific error beats a slow vague review comment every time.

The Problem: Instructions Decay

What’s failing here isn’t agent quality. The agent isn’t confused or malicious — it’s working correctly within a context window that no longer contains the rule it violated. In a complex implementation session — say, adding a new subscription billing feature across four files — the agent tracks dozens of constraints simultaneously: function signatures, struct layouts, error types, module boundaries, test expectations. A coding standard from the system prompt, repeated once, competes with all of that active information. By the time the agent reaches the seventh file, the rule isn’t governing behavior. It’s technically in the context somewhere. But it isn’t present.

This is not a bug in the model. It’s a fundamental property of how language models work: they attend to recent, relevant context. A rule from the header of a session is less influential than a pattern just written. If the agent wrote f64 in file one and nobody caught it, it’s more likely to write f64 in file seven — not because it forgot the rule, but because its own recent output has become a stronger pattern signal than the original instruction.

Three things follow from this:
  1. Instructions in prompts decay as sessions grow.
  2. Corrections in reviews happen too late — the debt is already written.
  3. The only reliable enforcement is at the point of write.
[Diagram: Soft vs Hard Governance — how prompt-based rules decay over a session while Hook Shield enforcement stays constant at write time]

The Solution: Sub-2-Second Hook Shields

Claude Code supports pre-tool-use hooks — scripts that intercept tool calls (file writes, bash commands, etc.) before they execute. The hook receives the proposed action, can inspect it, and either allows it through or rejects it with an exit code of 1 and an error message. We called our implementation Hook Shields. Each shield is a small bash script targeting one class of violation. Small enough to run in under 2 seconds. Specific enough to give the agent actionable feedback. Here’s the financial precision shield:
#!/bin/bash
# .claude/hooks/pre-tool-use/no-float-money.sh
# Blocks floating-point types on fields associated with money.

CONTENT=$(cat "$PROPOSED_DIFF")

# printf instead of echo so content starting with a dash isn't eaten;
# [[:space:]] instead of \s for portability across grep implementations.
if printf '%s\n' "$CONTENT" | grep -qiE '(amount|price|balance|fee|cost|total)[[:space:]]*[=:][[:space:]]*(f32|f64)'; then
  echo "STOP: Float type used for a money field." >&2
  echo "Rule: Use 'Decimal' from the rust_decimal crate or a typed Currency wrapper." >&2
  echo "Why: Floating-point arithmetic loses precision on rounding — unacceptable for financial calculations." >&2
  exit 1
fi

exit 0  # allow the write
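For the shield to fire on every write, it has to be registered as a pre-tool-use hook. A minimal sketch of that registration, based on Claude Code's documented settings-file hooks format — verify the exact field names against the current docs before relying on this:

```shell
# Hypothetical registration sketch: writes a settings file that wires
# the shield into Claude Code's PreToolUse hook for write/edit tools.
mkdir -p .claude/hooks/pre-tool-use
cat > .claude/settings.json <<'EOF'
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [
          { "type": "command",
            "command": ".claude/hooks/pre-tool-use/no-float-money.sh" }
        ]
      }
    ]
  }
}
EOF
```

The matcher restricts the hook to file-writing tools, so read-only tool calls pay no cost at all.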
The agent proposes a write. The hook intercepts. If the pattern matches, the write is rejected with a message that names the rule, specifies the fix, and explains the reason. The agent reads the error, corrects the code, and retries. Total elapsed time: under 30 seconds.

Compare that to the previous cycle: agent writes f64, passes local compilation, gets through tests (tests don’t check type semantics), reaches PR review, reviewer catches it, leaves a comment, agent fixes it in the next session — often losing context on why the fix was needed. Total elapsed time: hours to days. The hook costs nothing. The review cycle costs real time and real context.
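The detection logic itself can be exercised in isolation. A minimal sketch wrapping the same pattern in a hypothetical check_money_floats helper (the helper name is invented for the example):

```shell
#!/bin/sh
# Standalone sketch of the shield's detection logic. Returns 0 (clean)
# or 1 (violation), mirroring the hook's allow/reject exit codes.
check_money_floats() {
  printf '%s\n' "$1" |
    grep -qiE '(amount|price|balance|fee|cost|total)[[:space:]]*[=:][[:space:]]*(f32|f64)' &&
    return 1   # violation: float type on a money field
  return 0     # clean
}

check_money_floats 'pub amount: f64,' || echo "rejected: float on money field"
check_money_floats 'pub amount: Decimal,' && echo "allowed: Decimal"
```

The whole check is a single grep over the proposed content, which is why it fits comfortably inside the sub-2-second budget.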

The Sub-2-Second Feedback Loop

The speed matters as much as the enforcement. Here’s why. When feedback arrives late — at PR review, or in the next session — the agent has to reconstruct context it no longer holds. It needs to understand the original intent, the violation, and the correct approach. That reconstruction is error-prone and expensive. Agents sometimes introduce new violations while fixing old ones, because they’re working from a partial picture. When feedback arrives immediately — at the moment of write — the agent has full context. It just wrote the code. It knows what it was trying to do. The error message doesn’t need to re-explain the situation; the agent already has it. Correction is mechanical and fast.
1. The Write Attempt

The agent proposes modifying a file. This is the moment where instruction-based governance typically fails — the rule isn’t active in the agent’s working context.
2. The Shield Scan

The hook intercepts the tool use in under 2 seconds. It doesn’t ask for permission; it enforces the rule deterministically. No model judgment involved.
3. Immediate Correction

Because feedback is instant and specific, the agent corrects the violation before continuing. The error never reaches compilation, tests, or review.
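The three steps can be simulated end to end. A toy sketch — the shield check is inlined here, whereas in the real system it is the separate hook script invoked by the harness:

```shell
#!/bin/sh
# Toy simulation of the write -> scan -> correct loop.
shield() {
  printf '%s\n' "$1" |
    grep -qiE '(amount|price|balance|fee|cost|total)[[:space:]]*[=:][[:space:]]*(f32|f64)' &&
    return 1  # reject the write
  return 0    # allow the write
}

proposed='pub amount: f64,'          # 1. the write attempt
if ! shield "$proposed"; then        # 2. the shield scan rejects it
  proposed='pub amount: Decimal,'    # 3. immediate correction, then retry
fi
shield "$proposed" && echo "write accepted: $proposed"
```

The corrected retry happens inside the same session, with full context still loaded — which is the whole argument of this section.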

What the Numbers Look Like

After deploying Hook Shields across the loop’s twelve most common violation classes:
Metric                         | Prompt-Based (Soft)                   | Hook Shields (Hard)
------------------------------ | ------------------------------------- | -----------------------
Violation rate (Class A rules) | 12% recurring                         | 0% (blocked)
Correction time per violation  | 5–10 min (via review)                 | < 30 sec (local)
Token overhead                 | High (rule re-iteration each session) | Low (triggered only)
Feedback loop tier             | Outer (CI / PR review)                | Inner (pre-write)
Agent context required for fix | Full reconstruction needed            | Full context available
The 12% figure isn’t a small tail. Across a month-long cycle with hundreds of file writes, that’s dozens of violations accumulating before any single one gets caught.

What Didn’t Work

Adding more rules to our coding standards document. We tried doubling the detail in the system prompt. Violation rates stayed constant. The rules were being read, acknowledged, and forgotten at the same rate as before. Length doesn’t fix decay.

Post-session correction passes. We tried running a dedicated “lint and fix” pass at the end of each session. It worked in isolation, but introduced a new failure mode: the fix pass would sometimes introduce violations in adjacent code while correcting the original. The fix pass works from a colder context than the original write — and it shows.

Compile-time type enforcement only. For some rules, the compiler catches violations. For others (naming conventions, docstring requirements, test coverage expectations), it can’t. And even where the compiler catches it, the cycle is slower than a pre-write hook. Compiler feedback is valuable, but it’s an outer loop signal, not an inner loop signal.

The lesson: enforcement needs to be at the same layer as the violation. If the violation happens at write time, enforcement at compile time or review time is structurally too late.

The Deeper Principle

We spent months trying to improve agent behavior through better instructions. Better instructions helped — marginally. What actually changed behavior was removing the option to behave incorrectly.

This is a well-known principle in system design: you can’t rely on humans (or agents) to consistently follow optional rules under time pressure and cognitive load. The reliable approach is to make the wrong behavior impossible, or at minimum immediately visible. The Hook Shield doesn’t ask the agent to remember. It watches every write, catches every violation class it covers, and provides immediate feedback specific enough to enable self-correction. The agent’s memory is irrelevant. The environment enforces what the agent can’t be trusted to maintain alone.

This applies beyond coding standards. Any rule that needs to hold across an entire long-running session — without human supervision per action — is a candidate for a Hook Shield: sanitization rules, security constraints, API contract requirements, module boundary rules. If you can write a grep for it, you can enforce it in under 2 seconds.

We kept resisting this conclusion. We wanted a smarter agent. What we needed was a faster wall.
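To illustrate how far the pattern stretches, here is a hypothetical shield for a module-boundary rule. Everything in it (the paths, the crate names, the boundary_shield helper) is invented for the example; the grep-sized shape is the point:

```shell
#!/bin/sh
# Hypothetical second shield class: a module-boundary rule. Blocks any
# file under src/api/ from importing billing internals directly. All
# paths and crate names here are illustrative, not from the real system.
boundary_shield() {
  # $1 = target path of the write, $2 = file holding the proposed content
  case "$1" in
    src/api/*)
      if grep -qE 'use[[:space:]]+crate::billing::internal' "$2"; then
        echo "STOP: src/api must not import billing internals." >&2
        echo "Rule: go through the crate::billing public interface." >&2
        return 1
      fi ;;
  esac
  return 0
}
```

Same contract as the money shield: a nonzero exit rejects the write with an error specific enough for the agent to self-correct on the spot.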

AI Collaboration in This Episode

Ironically, Claude Code helped us build the hooks that enforce its own behavior. We described the violation pattern and asked Claude to write the detection logic. It wrote more comprehensive pattern coverage than our initial sketches — catching edge cases like amount_f64: f64 (where the field name itself carries the type) that we hadn’t thought of.

Where human judgment remained essential: deciding which rules warranted a hard block (exit 1) versus a warning (exit 0 with stderr message). A few edge cases exist where a float near a money field name is technically correct — legacy interop types, serialization intermediaries. Hard-blocking those would break valid code. We had to draw that line manually, based on domain knowledge no script can carry. The agent wrote the detector. We drew the boundary.
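That warn-versus-block line can itself be encoded in the shield. A sketch with hypothetical allowlisted paths (the directory names are invented; the real list came from domain knowledge):

```shell
#!/bin/sh
# Sketch of warn vs block. Allowlisted interop paths get a stderr
# warning and exit 0 (the write lands); everything else is hard-blocked.
# The allowlisted directories are hypothetical examples.
money_float_check() {
  # $1 = target path of the write, $2 = proposed content
  printf '%s\n' "$2" |
    grep -qiE '(amount|price|balance|fee|cost|total)[[:space:]]*[=:][[:space:]]*(f32|f64)' ||
    return 0                        # no float on a money field: clean
  case "$1" in
    src/interop/*|src/serde_compat/*)
      echo "WARN: float on a money field in interop code; verify intent." >&2
      return 0 ;;                   # warn, but allow the write
  esac
  echo "STOP: float type used for a money field; use Decimal." >&2
  return 1                          # hard block
}
```

The detector stays mechanical; only the allowlist carries the human judgment about where a float is legitimately correct.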

What’s Next

Hook Shields solve the inner loop — the per-file enforcement problem within a single repository. But we’re running twenty-plus repositories. An agent working in the CRM repository has no visibility into the API Gateway contract it’s about to break. The shields don’t cross repo boundaries. That’s Boundary Blindness — the next wall we hit. And the solution required a fundamentally different approach: not faster local enforcement, but a system-wide view that could see all twenty repositories simultaneously. The wreckage from ignoring it was spectacular.

Episode 8: The Polyrepo Context Wall

Solving boundary blindness in a 20+ repo ecosystem.

Series Overview

The full arc from loop to organization.


All content represents personal learning from personal and side projects. Code examples are sanitized and generalized. No proprietary information is shared. Opinions are my own and do not reflect my employer’s views.