
Executive Summary

The Plan → Implement → Verify pattern is a structured three-phase AI-assisted development methodology that enforces cognitive separation between architectural planning, code generation, and quality assurance. Applied to the construction of a complete CRM domain layer—comprising seven domain models, a custom DynamoDB derive macro, a full event-sourcing integration, and a 35-route API layer—the pattern intercepted 18 defects before production deployment, including five that would have caused critical production failures. Fresh Verifier sessions operating without knowledge of Builder implementation decisions consistently identified defect classes that Builder-authored tests cannot detect: requirement gaps, cross-entity consistency violations, and architectural flaws such as non-atomic event and state writes. At an AI cost of $12.68, the pattern delivered an estimated 4.4x velocity improvement against manual development benchmarks.

Key Findings

  • Requirement-level gaps are a distinct defect class from code-level bugs. Builder sessions optimize for passing tests; Verifier sessions operating from a fresh context optimize for requirement adherence. These are structurally different evaluations requiring separate execution contexts.
  • AI models hallucinate features, not only code. Builder sessions will infer and implement functionality that seems logically consistent with specified requirements but is not actually required. Independent verification is the primary mechanism for detecting and preventing scope creep of this kind.
  • Fresh Verifier sessions detect defects that reused sessions miss entirely. In a controlled comparison, reusing a Builder session for subsequent verification produced zero defect detections on a codebase containing four confirmed defects; a fresh Verifier session identified all four.
  • Cross-entity consistency verification requires explicit prompting. Individual entity verification does not automatically detect foreign key type mismatches, naming convention inconsistencies, or protocol divergence across related entities. A dedicated cross-entity verification pass is required.
  • Atomic event-and-state write patterns must be enforced architecturally, not assumed. Builder sessions without explicit guidance on event-sourcing atomicity will produce non-atomic implementations that appear correct under test but create state divergence under partial failure conditions.

1. A Seven-Model CRM Domain Layer Provided Sufficient Complexity to Surface Structural Limitations of Single-Session AI Development

The scope of the implementation under analysis was the complete initialization of a CRM module for a multi-tenant platform:
  • 7 core domain models: Account, Contact, Lead, Opportunity, Activity, Product, Address
  • Event-sourcing integration via EventStore trait
  • DynamoDB single-table design with custom derive macro
  • Financial configuration system with tenant-scoped settings
  • ISO reference data (ISO 3166 countries, ISO 4217 currencies)
  • Full API layer with 35 routes and permission-based access control
  • Integration tests against a local DynamoDB emulator
The implementation was subdivided into seven sequential sub-tasks, each following the complete Plan → Implement → Verify cycle before proceeding to the next.

2. Cognitive Separation Across Three Distinct Sessions — Evaluator, Builder, and Verifier — Is the Structural Mechanism That Makes Independent Review Possible

2.1 The Evaluator Session Produces a Human-Approved Architecture Document That Governs All Subsequent Implementation Decisions

The planning phase is conducted in an Evaluator session using a reasoning-optimized model. The Evaluator explores the existing codebase to understand established patterns, then produces a written plan document that is approved by a human engineer before any implementation begins. Reference prompt:
Planning session for CRM crate initialization.

Context:
- Multi-tenant SaaS platform
- Need CRM domain models (Account, Contact, Lead, Opportunity, etc.)
- Must integrate with existing event sourcing infrastructure
- DynamoDB single-table design

Explore the codebase to understand:
1. Existing domain model patterns
2. Event sourcing conventions
3. DynamoDB entity patterns
4. Repository trait patterns

Then design the CRM crate architecture with:
- Domain model structure
- Event integration strategy
- DynamoDB schema design
- Testing approach
For this implementation, the Evaluator produced a 12-page plan document containing the following architecture decisions:
Decision 1: Single-Table DynamoDB Design
  • One table for all CRM entities
  • Partition key pattern: TENANT#{tenant_id}#ACCOUNT#{account_id}
  • GSI patterns for cross-entity queries
Decision 2: Event-Sourcing Integration
  • Each domain model emits events via the EventStore trait
  • Event naming convention: {Entity}{Action} (e.g., AccountCreated)
  • Repository pattern wraps both state storage and event publishing
Decision 3: Macro-Driven DynamoDB Entities
  • Custom #[derive(DynamoDbEntity)] macro
  • Auto-generates partition key, sort key, and GSI attributes
  • Eliminates boilerplate across all seven domain models
Decision 4: Four-Level Testing
  • L1: Domain model validation (unit tests)
  • L2: Repository CRUD operations (integration tests against local emulator)
  • L3: Event publishing flow (EventBridge to SQS verification)
  • L4: End-to-end CRM workflows
Human approval: Proceed with the plan; begin with the DynamoDB macro.
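Decision 1's key pattern can be captured in a small helper so that key construction is defined in exactly one place. The sketch below is illustrative; the struct and method names are assumptions, not the project's actual API:

```rust
/// Illustrative helper for the plan's single-table key pattern; the
/// struct and method names are assumptions, not the project's real API.
pub struct CrmKey;

impl CrmKey {
    /// Partition key pattern from Decision 1: TENANT#{tenant_id}#ACCOUNT#{account_id}
    pub fn account_pk(tenant_id: &str, account_id: &str) -> String {
        format!("TENANT#{tenant_id}#ACCOUNT#{account_id}")
    }

    /// Sort key for an entity stored under the account partition,
    /// e.g. CONTACT#{contact_id}.
    pub fn entity_sk(entity: &str, id: &str) -> String {
        format!("{entity}#{id}")
    }
}
```

Centralizing key construction like this is one way to keep the single-table layout consistent across all seven models before the derive macro exists.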

2.2 The Builder Session Executes Against the Approved Plan in Isolation, With No Knowledge of Prior or Subsequent Verification Activity

The Builder session uses a throughput-optimized model to implement against the approved plan. Each of the seven sub-tasks was issued as a separate Builder session with a specific plan reference. Reference prompt for DynamoDB macro sub-task:
Implement DynamoDbEntity derive macro per plan:
.plans/275-crm-initialization.md - Section 3.2

Requirements:
1. Parse struct fields to identify PK, SK, GSI keys
2. Generate impl blocks for DynamoDB attribute conversion
3. Support #[pii] attribute for sensitive fields
4. Follow existing macro patterns from the api-macros crate

Create comprehensive tests showing all attribute combinations.
Builder deliverables for this sub-task:
  • Created the dynamodb-derive crate
  • Implemented the proc macro using syn and quote
  • Generated 15 unit tests covering attribute pattern combinations
  • All tests passing on local execution
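To make the macro's job concrete, the following is a hand-written sketch of the kind of code a #[derive(DynamoDbEntity)] macro might generate for one entity. The trait shape, method names, and PII handling are assumptions for illustration, not the actual macro output:

```rust
// Hand-written approximation of macro-generated code; names are illustrative.
pub trait DynamoDbEntity {
    fn pk(&self) -> String;
    fn sk(&self) -> String;
    /// Names of fields annotated #[pii], so the repository layer
    /// knows which attributes to encrypt before writing.
    fn pii_fields() -> &'static [&'static str];
}

pub struct Account {
    pub tenant_id: String,
    pub id: String,
    pub name: String, // imagine #[pii] on this field
}

impl DynamoDbEntity for Account {
    fn pk(&self) -> String {
        format!("TENANT#{}#ACCOUNT#{}", self.tenant_id, self.id)
    }
    fn sk(&self) -> String {
        "METADATA".to_string()
    }
    fn pii_fields() -> &'static [&'static str] {
        &["name"]
    }
}
```

The real macro would derive all of this from the struct definition and its attributes; the value of the sketch is showing what the Verifier must inspect via macro expansion.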

2.3 The Verifier Session, Initialized Without Builder Context, Evaluates Implementation Against Requirements Rather Than Against the Builder’s Own Tests

The Verifier session is initialized without any prior context from the Builder. This session independently reads the original requirements and the plan, then assesses the implementation against both. Reference prompt:
Verify DynamoDB macro implementation for issue #284.

Read:
- Plan: .plans/275-crm-initialization.md - Section 3.2
- Implementation: git diff main...HEAD
- Tests: dynamodb-derive/tests/

Check:
- Does macro handle all PK/SK/GSI patterns from plan?
- Are #[pii] attributes properly propagated?
- Edge cases: optional fields, nested structs, enums?
- Test coverage adequate?
Defects identified by the Verifier in this sub-task:
Issue 1: Missing GSI Projection Type
The Builder implemented GSI key generation but did not specify projection type (KEYS_ONLY vs. ALL). The plan referenced projection types in a footnote that the Builder did not incorporate.
Production impact if undetected: DynamoDB queries would fail at runtime with “projection type not specified” errors, blocking all GSI-dependent queries.
Issue 2: PII Attribute Not Enforced on Type Constraints
The macro permitted application of the #[pii] attribute to non-String fields (for example, #[pii] amount: i64). Tests only covered valid usage patterns; no negative tests existed.
Production impact if undetected: silent failures in PII encryption, which operates on String values. Numeric fields would be written to the database without encryption, resulting in unencrypted sensitive data.
Both defects were remediated. Re-verification passed. The sub-task was merged.
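The fix for Issue 2 amounts to a type check inside the macro. The sketch below models that check as a plain function over field metadata; the signature is a deliberate simplification of what a proc macro would do with syn types, and the function name is hypothetical:

```rust
// Simplified stand-in for the macro's #[pii] type check: accept only
// String fields, reject everything else with a compile-style error message.
fn validate_pii_field(field_name: &str, field_type: &str) -> Result<(), String> {
    if field_type != "String" {
        return Err(format!(
            "#[pii] on `{field_name}` requires String, found {field_type}"
        ));
    }
    Ok(())
}
```

The corresponding negative tests — the ones the Builder omitted — assert that the invalid combination is rejected, not only that the valid one is accepted.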

3. The Most Consequential Defect Class — Event-Sourcing Atomicity Violation — Is Undetectable by Builder-Authored Tests and Requires Independent Architectural Review

The most consequential defect detected during this engagement occurred during the event-sourcing integration sub-task and illustrates the value of independent verification most clearly.

3.1 A State-First Write Order Creates Irrecoverable Divergence Between the Database and the Event Log Under Partial Failure

The Builder produced the following pattern for account creation:
// ❌ WRONG - not atomic
pub async fn create_account(&self, account: Account) -> Result<()> {
    // Save to DynamoDB
    self.dynamodb_repo.save(&account).await?;

    // Emit event (might fail after save succeeds)
    self.event_store.append(AccountCreated { ... }).await?;

    Ok(())
}
This implementation is consistent with naive event-sourcing integration: save state first, then emit an event. The implementation compiled, and all Builder-authored tests passed.

3.2 The Verifier, Reading Requirements Independently, Identified the Partial Failure Scenario That Builder Tests Cannot Reach

The Verifier, reading requirements independently, raised the following finding:
“What happens if DynamoDB save succeeds but EventStore append fails? You now have state divergence between the database and the event log.”
This is a correct architectural observation. DynamoDB and the EventStore are separate systems. A failure between the two write operations leaves the account in the database without an audit trail in the event log, violating the event-sourcing guarantee that the event log is the source of truth.

3.3 The Two-Phase Commit Pattern — Event Append First, State Persist Second With Rollback — Became the Standard Across All Seven Domain Models

A planning session with the Evaluator produced the two-phase commit pattern that became the standard across all seven domain models:
// ✅ CORRECT - atomic with rollback
pub async fn create_account(&self, account: Account) -> Result<()> {
    // Phase 1: Append event first (source of truth)
    let event_id = self.event_store.append(AccountCreated {
        account_id: account.id,
        ...
    }).await?;

    // Phase 2: Save to DynamoDB with event reference
    let mut entity = AccountEntity::from(account);
    entity.last_event_id = event_id;

    match self.dynamodb_repo.save(&entity).await {
        Ok(_) => Ok(()),
        Err(e) => {
            // Rollback: mark event as failed
            self.event_store.mark_failed(event_id).await?;
            Err(e)
        }
    }
}
Builder-authored tests cannot detect this class of defect. Unit and integration tests exercise the happy path; they do not test the behavioral contract between two separate storage systems under partial failure. Independent architectural review through a fresh Verifier session is the only mechanism that consistently identifies this pattern.
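The partial-failure behavior the Verifier flagged can nevertheless be exercised once it is known about, using in-memory stand-ins for the two storage systems. The sketch below is a synchronous simplification with hypothetical mock types, not the production code:

```rust
use std::cell::RefCell;

// In-memory stand-in for the event store: records appended and failed event IDs.
#[derive(Default)]
struct MockEventStore {
    appended: RefCell<Vec<u64>>,
    failed: RefCell<Vec<u64>>,
}

impl MockEventStore {
    fn append(&self) -> u64 {
        let id = self.appended.borrow().len() as u64 + 1;
        self.appended.borrow_mut().push(id);
        id
    }
    fn mark_failed(&self, event_id: u64) {
        self.failed.borrow_mut().push(event_id);
    }
}

// In-memory stand-in for the DynamoDB repository, with a failure switch.
struct MockRepo {
    fail_writes: bool,
    saved: RefCell<Vec<u64>>,
}

impl MockRepo {
    fn save(&self, event_id: u64) -> Result<(), String> {
        if self.fail_writes {
            return Err("simulated DynamoDB outage".into());
        }
        self.saved.borrow_mut().push(event_id);
        Ok(())
    }
}

// Two-phase pattern: append the event first, then persist state,
// rolling the event back (marking it failed) if persistence fails.
fn create_account(events: &MockEventStore, repo: &MockRepo) -> Result<(), String> {
    let event_id = events.append();
    match repo.save(event_id) {
        Ok(()) => Ok(()),
        Err(e) => {
            events.mark_failed(event_id);
            Err(e)
        }
    }
}
```

Forcing the state write to fail demonstrates the rollback path: the appended event is marked failed and no state is persisted, which is exactly the contract Builder-authored happy-path tests never exercise.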

4. Builder Tests and Verifier Findings Address Non-Overlapping Defect Classes — They Are Structurally Different Evaluations, Not Redundant Ones

A consistent pattern emerged across all seven sub-tasks in this implementation.
Defect Class                               | Detected By Builder Tests | Detected By Fresh Verifier
Compilation errors and type mismatches     | Yes                       | Not applicable
Happy-path functional correctness          | Yes                       | Redundant
Missing negative test cases                | No                        | Yes — 8 instances
Requirement gaps (plan said X, code did Y) | No                        | Yes — 6 instances
Hallucinated features not in requirements  | No                        | Yes — noted below
Cross-entity consistency violations        | No                        | Yes — 4 instances
Architectural protocol violations          | No                        | Yes — 2 instances (including atomicity)
Builder sessions optimize for making tests pass. Verifier sessions operating from a fresh context optimize for requirement adherence. These are structurally different evaluations; they are not redundant.

5. Three Defect Case Studies Illustrate the Requirement-Gap, Hallucination, and Consistency-Violation Classes That Fresh Verifier Sessions Consistently Detect

5.1 AI Requirement Hallucination: The preferred_name Field

During Contact domain model implementation, the Builder added a preferred_name: Option<String> field not present in the requirements. The Builder’s commit message cited a specific section of the product requirements document; that section contained no such reference. The Builder inferred that a system with first_name and last_name fields would benefit from a preferred name. The inference is reasonable; the implementation is out of scope. The Verifier identified this as a hallucinated feature and it was removed. A GitHub issue was created to evaluate it formally in a future iteration.
Implication: AI models hallucinate features, not only code. Independent verification controls scope creep, not only technical defects.

5.2 Cross-Entity Consistency: Foreign Key Type Mismatch

Following individual verification of each entity, integration testing revealed a failure in the relationship between Opportunity and Account:
// Account
pub struct Account {
    pub id: AccountId,  // newtype over Uuid; format: UUID v4
    ...
}

// Opportunity (WRONG)
pub struct Opportunity {
    pub account_id: String,  // Format: String "ACC-{ulid}"
    ...
}
The Opportunity entity’s foreign key reference to Account used a different ID format than the Account entity’s primary key. Each entity had been verified in isolation and passed. The cross-entity relationship was not verified as part of individual entity review.
Resolution: A dedicated cross-entity verification pass was added to the standard workflow, checking foreign key type consistency, event naming patterns, repository interface consistency, and API route pattern consistency across all related entities. Cross-entity verification prompt:
After verifying individual entities, perform cross-entity validation:

1. Foreign key consistency:
   - Do foreign key types match primary key types?
   - Are ID formats consistent across relationships?

2. Event naming consistency:
   - Do all {Entity}Created events have the same structure?
   - Are event versioning patterns consistent?

3. Repository pattern consistency:
   - Do all repositories implement the same trait?
   - Are CRUD method signatures consistent?

4. API route consistency:
   - Are path patterns consistent? (/accounts/:id vs /account/:account_id)
   - Are HTTP methods consistent across entities?
   - Are permission naming patterns consistent?

Report any cross-entity inconsistencies as BLOCKING issues.
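Beyond the verification pass, the specific mismatch in Section 5.2 can also be made unrepresentable by sharing a single ID newtype across entities. The sketch below uses a u128 stand-in for Uuid to stay dependency-free; the type names mirror the example above but are otherwise illustrative:

```rust
// One shared newtype for the Account primary key. Any entity that
// declares its foreign key as AccountId cannot diverge in format.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct AccountId(pub u128); // stand-in for a Uuid

pub struct Account {
    pub id: AccountId,
}

pub struct Opportunity {
    // A String-formatted "ACC-{ulid}" foreign key simply cannot
    // compile here; the field must carry an AccountId.
    pub account_id: AccountId,
}
```

Newtypes complement rather than replace the verification pass: they catch type-level divergence at compile time, while the pass still covers naming, event, and route consistency.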

5.3 Permission Naming Inconsistency Across API Routes

During API integration, the Builder applied inconsistent permission naming patterns across entity routes:
  • Account routes: "crm.account.read" (period delimiter)
  • Contact routes: "crm:contact:read" (colon delimiter, different from Account)
  • Lead routes: "crm-lead-read" (hyphen delimiter, entirely different pattern)
The Verifier identified this as a systematic issue arising from an absence of a single authoritative source for permission naming conventions. Resolution: Type-safe permission constants were created in a shared common module:
pub const CRM_ACCOUNT_READ: &str = "crm.account.read";
pub const CRM_ACCOUNT_WRITE: &str = "crm.account.write";
// ... 43 additional constants
All 35 routes were refactored to reference these constants. The original string-based pattern produced runtime authorization failures; the constant-based pattern produces compile-time errors for any undeclared permission reference.

6. Four Principles Codified From This Engagement Govern Event Ordering, Macro Review, Type Safety, and Requirement Traceability in All Subsequent Implementations

The following principles were codified from the defects identified and resolved during this implementation:
Principle 1: Events Are the Source of Truth
When integrating event sourcing with a persistent store, the event must be appended before state is updated. Event append failure causes the operation to fail. State persistence failure causes event rollback. The inverse order—state first, event second—creates irrecoverable divergence under partial failure.
Principle 2: Macro-Generated Code Requires Independent Review
Code generated by derive macros compiles and executes but may violate conventions not encoded in the macro itself, such as DynamoDB projection type requirements. All macro-generated code must be inspected via macro expansion tooling as part of the verification process.
Principle 3: Type Safety Supersedes Runtime Validation for Cross-Cutting Concerns
String-based permissions, event names, and table names are defect vectors. Compile-time enforcement through typed constants eliminates the class of runtime errors caused by typos and naming inconsistencies:
// ❌ BEFORE: Runtime error if typo
#[eva_api(permission = "crm.account.raed")]  // typo!

// ✅ AFTER: Compile error if typo
#[eva_api(permission = CRM_ACCOUNT_READ)]
Principle 4: Test Coverage Metrics Require Requirement Traceability
High test coverage does not establish requirement coverage. Each test must reference a specific requirement to provide evidence that the requirement is tested. Coverage metrics without traceability are necessary but insufficient quality indicators.
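As a sketch of Principle 4, a check's name can embed the requirement it evidences. The requirement ID and predicate below are hypothetical, standing in for a real test over macro-expanded output:

```rust
// Hypothetical requirement REQ-3.2.4: every GSI declaration must carry
// an explicit projection type. Embedding the ID in the name makes the
// test traceable back to the requirement it verifies.
fn req_3_2_4_gsi_projection_type_is_explicit(projection: Option<&str>) -> bool {
    // A real test would inspect macro-expanded entity code; this
    // placeholder only demonstrates the traceability convention.
    matches!(projection, Some("ALL") | Some("KEYS_ONLY") | Some("INCLUDE"))
}
```

A reviewer can then audit requirement coverage by grepping test names for requirement IDs, rather than trusting a line-coverage percentage.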

7. Single-Session Approaches Produce No Planning Artifacts, No Cross-Entity Verification, and No Architectural Flaw Detection — Deficiencies Absent From the Three-Phase Pattern

Dimension                    | Single-Session Approach                | Plan → Implement → Verify
Defects reaching production  | Present                                | Zero in observed deployment
Requirement coverage         | Variable                               | Systematically verified
Feature scope control        | None — AI may add unrequested features | Controlled — Verifier flags hallucinated requirements
Cross-entity consistency     | Not verified                           | Explicit verification pass
Architectural flaw detection | Absent                                 | Present via independent review
Planning artifacts           | None                                   | Plan document serves as governance record
Verification cost            | Negligible                             | 180k tokens per engagement

8. Observed Metrics: 18 Defects Intercepted, Zero Reaching Production, 92 Percent Test Coverage, and 4.4x Velocity Improvement at $12.68 AI Cost

Metric                             | Value
Sub-tasks completed                | 7
Defects identified in verification | 18 (5 critical, 8 moderate, 5 minor)
Defects reaching production        | 0
Test coverage                      | 92% — 85 unit, 42 integration, 12 event flow, 3 E2E
Production code                    | 6,800 lines; 2,400 lines of test code
Velocity vs. manual estimate       | 4.4x
Total AI cost                      | ~$12.68 (145k Opus + 700k Sonnet tokens)
Average rework cycles per sub-task | 2.3

9. The Complete CRM Domain Layer — 6,800 Lines of Production Code, 142 Tests Across Four Levels — Was Delivered Through This Methodology With Zero Production Defects

The complete CRM domain layer delivered through this methodology comprised seven domain models (Account, Contact, Lead, Opportunity, Activity, Product, Address), a custom #[derive(DynamoDbEntity)] macro with PK/SK/GSI auto-generation and PII field encryption, trait-based repositories with in-memory and DynamoDB implementations, 21 domain events integrated with DynamoDB Streams and EventBridge, a 35-route API layer with typed permission constants, tenant-scoped financial configuration, and ISO 3166/4217 reference data. Total scope: 6,800 lines of production code, 2,400 lines of tests, 142 tests across four levels.

10. Recommendations

  1. Treat Verifier session isolation as a non-negotiable engineering standard. The evidence is unambiguous: reused sessions detect zero defects where fresh sessions detect multiple. This is not a cost optimization opportunity—it is the mechanism that makes independent verification meaningful.
  2. Require cross-entity verification as a mandatory step in multi-model domain implementations. Individual entity verification is necessary but insufficient. The prompt template in Section 5.2 provides a repeatable starting point.
  3. Establish typed constants for all cross-cutting concerns before beginning API implementation. Permission names, event type identifiers, and table names must be compile-time constants before any routes are written. Retrofitting from string-based patterns is expensive.
  4. Encode event-sourcing atomicity requirements explicitly in Builder prompts. Do not assume Builder sessions will infer the correct write ordering. The two-phase commit pattern in Section 3.3 should be provided as a reference.
  5. Require requirement traceability in test naming conventions. Tests not traceable to a requirement provide coverage without verification. Enforce this through code review policy.
  6. Document AI cost and return on investment per major feature. Measurable ROI data provides the organizational evidence base for workflow adoption and continuous refinement.
When introducing the Plan → Implement → Verify pattern to a team, the cross-entity verification pass (Section 5.2) is the highest-value addition to an existing workflow. Teams already using AI for implementation typically lack systematic cross-entity review; adding this single step prevents the class of consistency defects that most commonly survive individual code review.

11. Conclusion and Forward Outlook

The Plan → Implement → Verify pattern is a production-validated methodology for AI-assisted software development that addresses structural limitations of single-session approaches. Its core insight—that independent Verifier sessions detect defect classes that Builder sessions cannot by design—is not a limitation of current AI capabilities but a property of any cognitive system that evaluates its own output.

As AI model capabilities advance, the return on structured workflows will increase. More capable models will produce higher-quality implementations and more thorough verifications, but they will not eliminate the value of cognitive separation between planning, implementation, and review. Organizations that establish workflow discipline now will be positioned to systematically capture the value of future model improvements rather than experiencing them as unpredictable quality variations.

The five critical defects prevented by this methodology in a single engagement—including an event-sourcing atomicity violation that would have produced irrecoverable state divergence in production—represent the category of failure that structured AI-assisted development is specifically designed to address.
All content represents personal learning from personal projects. Code examples are sanitized and generalized. No proprietary information is shared. Opinions are my own and do not reflect my employer’s views.