
Executive Summary

AI coding agents do not retain corrections across session boundaries. Architectural standards encoded in system prompts degrade in influence as context window depth increases and task complexity rises. This analysis documents a structural failure mode — institutional knowledge placed in documentation rather than enforcement — and presents a remediation architecture based on Claude Code’s hook system. The evidence suggests that the distinction between advisory rules and enforced constraints is the central variable determining whether AI-assisted development produces consistent, standards-compliant output. Organizations deploying autonomous coding agents should treat this distinction as a first-order architectural concern.

Key Findings

  • System prompt rules are advisory; hooks are constraints. The behavioral difference between a rule an agent may ignore and a hook that blocks execution is not a matter of degree — it is categorical.
  • AI agents exhibit no cross-session memory of corrections. A rule added to a system prompt after a violation does not prevent recurrence in subsequent sessions operating under different task contexts.
  • Long system prompts degrade in enforcement effectiveness. As context depth increases, rules established at prompt initialization receive diminishing attention weight during deep implementation tasks.
  • Every manual correction is a candidate for automated enforcement. The correction-to-hook pattern — log the correction, build the hook, retire the manual review — converts institutional knowledge into durable machine-executable constraints.
  • Hook coverage is bounded by prior violation history. Enforcement layers built from observed failures provide no protection against novel architectural errors in previously unencountered domains.

1. AI Agents Lack Cross-Session Memory, Making System Prompt Rules Structurally Unreliable as the Sole Mechanism for Enforcing Architectural Standards

Senior engineers carry accumulated architectural judgment — not syntax, but constraint knowledge acquired from production incidents and peer review. Never use floats for money. Every database query must scope by tenant. The domain layer never imports from infrastructure. This knowledge becomes reflexive precisely because the engineer retains it: each correction refines a persistent mental model that generalizes across contexts.
AI agents operate under a fundamentally different memory architecture. Each session begins without access to prior corrections. There is no accumulated model, no “this was corrected last week.” The system prompt is the complete context — and its influence on agent behavior is not uniform across a session.
The evidence for prompt degradation is direct: an agent instructed, via system prompt, never to use f32 or f64 for financial fields complied in the session immediately following the correction. One week later, operating on a different file under a different task, the same agent introduced an f64 on a field named amount. The rule was present in the prompt. The violation recurred regardless.
This is not a failure of instruction quality. It is a failure of instruction placement. The rule lived where it could be deprioritized. The following architecture places rules where they cannot be.
The failure mode described here — an agent spiraling through self-generated corrections without a persistent enforcement layer — is analyzed in depth in When AI Fails in Cascading Errors. The hook architecture presented in this paper is one structural mitigation for that failure class.

2. Hook-Based Enforcement Removes Memory From the Equation by Blocking Non-Compliant Writes Before They Reach the Filesystem

The Rust newtype pattern illustrates the underlying principle: encode domain constraints into the type system rather than documentation. A UserId and a TenantId are not interchangeable not because documentation says so, but because the compiler enforces the distinction. Memory is removed from the equation entirely.
Claude Code’s hook system applies the same principle at the file write boundary. Hooks are small scripts configured to run before or after tool use. When configured as PreToolUse hooks, they execute before the agent writes a file. If the hook exits with a non-zero status, the write is blocked and the agent receives the error output as feedback. The agent then corrects course within the same session, without human intervention. The following table characterizes the behavioral difference between the two approaches:
| Dimension | System Prompt Rules | Hook Enforcement |
| --- | --- | --- |
| Persistence across sessions | None — no cross-session memory | Full — hooks are filesystem artifacts |
| Degradation under complexity | High — rules receive less attention weight as task depth increases | None — hooks execute unconditionally on write |
| Agent circumvention | Possible — agent may proceed if rule is not pattern-matched | Not possible — hook blocks execution before write completes |
| Enforcement mechanism | Attention weight on rule text | Process exit code |
| Failure mode | Silent non-compliance | Explicit block with error message |
The hook system does not modify the agent’s internal model. It modifies the environment the agent operates in. The distinction is significant: the agent is not taught the rule; it encounters a constraint that makes the violation impossible. This is analogous to removing the hot element from a stove rather than posting a warning sign.
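The exit-code contract can be sketched in a few lines of shell. The skeleton below is illustrative rather than the actual Claude Code interface: it assumes the pending write arrives as CONTENT and FILE_PATH variables, matching the conventions of the hook examples later in this paper, and the pre_write_check function and FORBIDDEN pattern are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Sketch of the PreToolUse contract: a hook is any script whose exit
# status decides whether the pending write proceeds.
# Assumes the pending content and path arrive as CONTENT and FILE_PATH
# (hypothetical names, matching this paper's later examples).

pre_write_check() {
  local content="$1" path="$2"
  # Return 0: the write proceeds unchanged.
  # Return non-zero: the write is blocked, and anything printed to
  # stderr is surfaced to the agent as corrective feedback.
  if echo "$content" | grep -q 'FORBIDDEN'; then
    echo "HOOK BLOCKED: $path matched a forbidden pattern" >&2
    return 1
  fi
  return 0
}

# Demonstration: a compliant write passes, a non-compliant one is blocked.
pre_write_check "let x = 1;" "src/ok.rs" && echo "ok.rs: allowed"
pre_write_check "FORBIDDEN call" "src/bad.rs" 2>/dev/null || echo "bad.rs: blocked"
```

The essential property is that the check runs on every write, unconditionally; there is no attention weight involved, only an exit status.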

3. Three Enforcement Patterns — Financial Type Safety, Multi-Tenant Query Scoping, and Architectural Boundary Isolation — Address the Highest-Consequence Violation Classes

3.1 Float Types on Financial Fields Are a Correctness Risk With Billing and Audit Implications — a PreToolUse Hook Blocks the Violation Before the File Is Written

Financial precision loss from floating-point representation is a well-documented failure mode. In multi-tenant SaaS systems, an f64 on a field named amount is not a style violation — it is a correctness risk with downstream billing and audit implications. The following hook enforces the constraint at write time, before the file enters the codebase. The hook examines the content the agent is about to write and blocks any file where a float type is assigned to a money-related field:
# PreToolUse hook — runs before every file write
# If the file uses f32/f64 on a money-related field, block it.

if echo "$CONTENT" | grep -qiE '(amount|price|balance|fee|revenue)\s*:\s*(f32|f64)'; then
  echo "HOOK BLOCKED: f32/f64 on financial field — use Decimal or a typed newtype" >&2
  exit 1
fi
The error message is surfaced directly to the agent, which receives it as environmental feedback. The agent understands the constraint, selects a compliant type, and continues. No human intervention is required. The violation class does not recur within the session.
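The hook’s behavior can be exercised outside the agent loop. The sketch below is a test harness, not part of the hook itself: it writes the same grep check to a temporary script and runs it against a non-compliant and a compliant snippet, with the CONTENT variable standing in for the pending file body.

```shell
#!/usr/bin/env bash
# Harness: exercise the financial-field check against sample content.
# CONTENT stands in for the file body the agent is about to write.

HOOK=$(mktemp)
cat > "$HOOK" <<'EOF'
if echo "$CONTENT" | grep -qiE '(amount|price|balance|fee|revenue)\s*:\s*(f32|f64)'; then
  echo "HOOK BLOCKED: f32/f64 on financial field; use Decimal or a typed newtype" >&2
  exit 1
fi
EOF

# A float on a money field is blocked (non-zero exit status).
CONTENT='pub amount: f64,' bash "$HOOK" 2>/dev/null || echo "float amount: blocked"

# A Decimal on the same field passes (exit status 0).
CONTENT='pub amount: Decimal,' bash "$HOOK" && echo "decimal amount: allowed"

rm -f "$HOOK"
```

Running the check this way before wiring it into the agent configuration confirms both directions: the pattern catches the violation class it targets and does not block compliant writes.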

3.2 A Database Query Without Tenant Scoping Is a Data Isolation Failure — Hook Enforcement Blocks Any Repository File That Does Not Reference a Tenant Identifier

In multi-tenant architectures, a database query that omits tenant scoping is a data isolation failure with legal exposure. The constraint — every query must reference tenant_id — is non-negotiable and not amenable to probabilistic enforcement through review. The following hook checks for the presence of the scoping concept in any file containing database operations:
# If the file is a repository/query file and runs database operations
# without referencing tenant_id anywhere — block it.

# Gate on repository/query files by path; adjust the pattern to your
# project's naming convention.
if echo "$FILE_PATH" | grep -qiE '(repository|repositories|queries|_repo)'; then
  if ! echo "$CONTENT" | grep -qi 'tenant_id\|tenant\.id\|TenantId'; then
    echo "HOOK BLOCKED: database queries in $(basename "$FILE_PATH") without tenant_id" >&2
    echo "All queries in multi-tenant repos must scope by tenant" >&2
    exit 1
  fi
fi
This hook does not verify query logic. It verifies the presence of the constraint concept. A query file with no reference to tenant_id anywhere in its content has not addressed the isolation requirement. The hook blocks the write and requires the agent to produce content that demonstrates the constraint has been considered.

3.3 Domain Layer Imports From Infrastructure Are Blocked at Write Time, Enforcing Clean Architecture Boundaries Independent of Agent Internalization

Domain-Driven Design requires that the domain layer remain free of infrastructure dependencies. This separation is the condition under which the domain is independently testable and the infrastructure is independently replaceable. An AI agent encountering a domain struct that requires a database call will, absent enforcement, resolve the dependency by importing from infrastructure — the path of least resistance to compilation. The following hook enforces the dependency boundary by blocking any domain layer file that imports from infrastructure or API packages:
# If the file is in the domain layer and imports from infrastructure or API — block it.

if echo "$FILE_PATH" | grep -qiE '/domain/'; then
  ILLEGAL=$(echo "$CONTENT" | grep -nE '^use (crate::|super::)*(infrastructure|infra|api)::')
  if [[ -n "$ILLEGAL" ]]; then
    echo "HOOK BLOCKED: Domain layer imports infrastructure — domain must be dependency-free" >&2
    exit 1
  fi
fi
The agent cannot shortcut past the architectural boundary. The rule is enforced before the file is written, independent of whether the agent has internalized the design rationale.
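As the number of checks grows, they can share a single entry point. The sketch below is one possible composition, assuming the same CONTENT and FILE_PATH conventions as the hooks above; it includes the financial and boundary checks, and the tenant-scoping check would slot in the same way.

```shell
#!/usr/bin/env bash
# Sketch: one hook entry point that dispatches to each check in turn.
# The check functions mirror the three hooks in this section;
# CONTENT and FILE_PATH arrive as in the earlier examples.

check_financial() {
  echo "$1" | grep -qiE '(amount|price|balance|fee|revenue)\s*:\s*(f32|f64)' &&
    { echo "HOOK BLOCKED: float type on financial field" >&2; return 1; }
  return 0
}

check_domain_boundary() {
  case "$2" in
    */domain/*)
      echo "$1" | grep -qE '^use (crate::|super::)*(infrastructure|infra|api)::' &&
        { echo "HOOK BLOCKED: domain layer imports infrastructure" >&2; return 1; }
      ;;
  esac
  return 0
}

# The first failing check blocks the write; a clean file passes them all.
run_all_checks() {
  check_financial "$1" "$2" && check_domain_boundary "$1" "$2"
}

run_all_checks 'pub balance: f64,' 'src/domain/account.rs' 2>/dev/null || echo "blocked"
run_all_checks 'pub balance: Decimal,' 'src/domain/account.rs' && echo "allowed"
```

Keeping each check as a separate function preserves the one-violation-class-per-check structure while the agent sees a single, uniform block-or-proceed decision.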
The relationship between this enforcement layer and multi-agent orchestration is examined in Multi-Agent Workflow. Hooks operate as a cross-cutting enforcement layer across Evaluator, Builder, and Verifier agents — the constraint applies regardless of which agent role is writing the file.

4. Every Manual Correction Is a Candidate for Automated Enforcement — Logging Corrections at Occurrence Converts Institutional Knowledge Into Durable Machine-Executable Constraints

The three patterns above address known violation classes. A fourth pattern addresses the mechanism by which new violation classes are identified and converted into durable constraints. The correction feedback loop operates on a simple observation: every manual correction represents a failure of existing enforcement. If a correction is logged at the time it occurs, the correction log becomes a structured record of gaps in the enforcement layer. That record drives hook development. The logging mechanism is minimal — a timestamped append to a session log file:
# Called explicitly when a correction occurs
# Usage: log-correction.sh "description of what was corrected"

LOGFILE="/tmp/corrections-$(date +%Y%m%d).log"
TIMESTAMP=$(date -u '+%Y-%m-%dT%H:%M:%SZ')
echo "[$TIMESTAMP] $*" >> "$LOGFILE"
The value of this mechanism is not in the log itself but in the discipline it enforces on correction handling. A correction that is logged is a correction that is available for analysis. A session end hook that surfaces the correction log as a notification ensures that each session’s violations are reviewed before the next session begins.
Implementation experience demonstrates a consistent pattern: a violation is caught manually, logged, and converted into an automated hook. The violation class does not recur. The float rule, the tenant scoping rule, and the architectural boundary rule all originated as manual corrections. The correction-to-hook cycle is the mechanism by which institutional knowledge is translated from human memory into machine-executable constraints.
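A session end hook that surfaces the log can be equally minimal. The sketch below assumes the same /tmp/corrections-YYYYMMDD.log path as the logging script above; how the output is delivered as a notification depends on the hook configuration.

```shell
#!/usr/bin/env bash
# Sketch of a session-end hook: surface the day's correction log so
# violations are reviewed before the next session begins.
# Assumes the same log path as the log-correction script above.

LOGFILE="/tmp/corrections-$(date +%Y%m%d).log"

if [[ -s "$LOGFILE" ]]; then
  echo "=== Corrections this session (review before next session) ==="
  cat "$LOGFILE"
else
  echo "No corrections logged today."
fi
```

Each surfaced entry is a candidate for the correction-to-hook cycle: the log line describes the gap, and the next hook closes it.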

5. Hook Coverage Is Bounded by Prior Violation History — Novel Failure Modes in Unfamiliar Domains Remain Outside the Enforcement Envelope Until Experience Surfaces Them

An enforcement layer built from prior violation history provides no coverage for violation classes that have not yet been observed. This is a structural limitation, not an implementation deficiency. Hooks encode the constraints that experience has identified as necessary. They do not anticipate the architectural errors that experience has not yet surfaced. This limitation has a corollary: the quality of the enforcement layer is a function of the depth of experience behind it. Organizations with extensive prior incident history in a given domain will produce more comprehensive hooks. Organizations entering a new technical domain will have minimal hook coverage for that domain’s failure modes.
Novel failure modes — architectural errors in domains not previously encountered, security vulnerabilities in unfamiliar dependency trees, correctness issues in new business logic domains — remain outside the enforcement envelope. Human judgment from engineers with domain-specific experience is not substitutable by automated hooks for these cases. The AI Limitations Boundary analysis catalogs seven categories where AI agents fail consistently; hook-based enforcement mitigates several but does not close all identified gaps.
The hooks-plus-human-judgment model produces a constructive asymmetry: as the enforcement layer matures, the volume of violations requiring human review decreases. This reduction allows engineering attention to concentrate on the decisions that require genuine architectural judgment — the decisions that are, by definition, not yet encodable as hooks.

6. Recommendations

  1. Classify all engineering standards as either advisory or non-negotiable. Advisory standards belong in documentation and code review checklists. Non-negotiable standards — those whose violation would cause a data leak, financial error, or architectural failure — belong in automated enforcement. Mixing the two categories degrades both.
  2. Implement PreToolUse hooks for every identified non-negotiable constraint. Begin with the highest-consequence violation classes: financial type safety, data isolation boundaries, and architectural layer constraints. Each hook should produce an actionable error message that the agent can act on without human clarification.
  3. Establish a correction logging protocol for all AI-assisted development sessions. Every manual correction should be logged at the time it occurs, with sufficient description to drive hook development. Session logs should be reviewed before subsequent sessions begin.
  4. Convert every logged correction into a hook within 48 hours. A correction that has not been converted into an enforcement rule remains a recurring risk. The correction-to-hook cycle should be treated as a mandatory step in the development workflow, not an optional improvement.
  5. Audit the enforcement layer against new technical domains before beginning work in them. When the development scope expands into an unfamiliar domain — a new data store, a new infrastructure layer, a new compliance requirement — assess the hook coverage proactively. Gaps identified before work begins are less costly than violations identified in review.

Conclusion

The distinction between advisory rules and enforced constraints is the central architectural variable in AI-assisted development workflows. System prompts that encode institutional knowledge are subject to the same degradation mechanisms as any documentation artifact: they are read but not retained, understood in isolation but not applied under complexity. Hook-based enforcement removes the memory requirement from the equation entirely.
As autonomous AI development expands into higher-consequence domains — financial systems, multi-tenant data architectures, regulated industries — the organizations that have encoded their non-negotiable constraints as machine-executable enforcement will operate at a structural advantage.
The enforcement layer is not a substitute for engineering judgment. It is the mechanism by which engineering judgment is made durable, session-independent, and scalable across autonomous agents operating in parallel. The patterns documented here represent current practice; as the tooling matures and violation taxonomies deepen, hook-based enforcement will become a baseline expectation of AI-assisted development rather than an advanced practice.
All content represents personal learning from personal projects. Code examples are sanitized and generalized. No proprietary information is shared. Opinions are my own and do not reflect my employer’s views.