Executive Summary
Large-scale breaking changes in distributed data systems represent a category of engineering risk that ad-hoc approaches consistently underestimate. This paper documents a structured migration methodology applied to a six-entity capsule isolation enforcement initiative affecting 1,003 tests, 47 API handlers, and 127 call sites across a production multi-tenant platform. The systematic approach had four parts: comprehensive planning via a long-context reasoning model, template-entity pattern validation, parallel AI-assisted implementation, and staged verification. It completed the migration in 32 hours against a manual estimate of 160 to 240 hours, yielding a 5 to 7x time reduction. A critical finding is that the 8-hour planning investment produced a 464-page migration specification that prevented six production-class defects. Organizations executing breaking changes without comprehensive upfront planning incur compounding rework costs that frequently exceed the original migration effort by an order of magnitude. This paper presents the decision framework, execution methodology, empirical metrics, and replicable patterns for engineering teams facing analogous migration challenges.

Key Findings
- Comprehensive upfront planning eliminates downstream rework. An 8-hour planning investment produced a migration specification that identified the critical GSI pattern defect before implementation began, preventing the defect from propagating across all six entities.
- Template-entity validation is the highest-leverage quality gate in multi-entity migrations. Migrating the simplest entity first and verifying it completely before parallelizing catches pattern defects at unit cost rather than multiplied cost.
- AI-assisted parallel implementation achieved a 5 to 7x time reduction relative to manual sequential migration, primarily through systematic call-site updates and bulk test data modifications.
- AI agents require explicit dependency ordering for hierarchical entity migrations. Without human-specified migration sequences, agents default to alphabetical or arbitrary ordering that violates parent-child data relationships.
- AI-generated data migration plans are insufficient for concurrent-access scenarios. Human engineers must supply atomic transaction strategies; AI agents do not independently reason about race conditions in distributed data stores.
- Verification must be applied to every entity without exception. Skipping verification for entities presumed to follow an established template resulted in a staging-environment defect that required 2 additional hours to diagnose and resolve.
1. Problem Statement: Data Contamination Across Isolation Boundaries
1.1 The Production Anomaly
During the third week of integration testing, monitoring logs surfaced a critical isolation violation.

1.2 Scope Analysis
Systematic analysis revealed that the defect was not isolated to a single entity: six entities shared the same structural deficiency.

| Entity | Scope | Issue | Risk |
|---|---|---|---|
| FinancialConfig | Tenant | Test configs in prod queries | High |
| Contract | Tenant | Test contracts in revenue reports | Critical |
| ContractLineItem | Tenant | Test line items in billing | Critical |
| ContractAmendment | Tenant | Test amendments in audit trail | High |
| RevenueSchedule | Tenant | Test revenue in financial reporting | Critical |
| AccessEntity | Tenant | Test access grants in security queries | Medium |
1.3 Root Cause
The vulnerable entity definition illustrates the structural problem. Because capsule_id was absent from the partition key, query operations could not enforce isolation between environments that shared the same tenant identifier.
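The following minimal Python sketch makes the root cause concrete. The document's own code was not preserved, so the language and all function names here are assumptions. The sketch uses the old and new partition-key shapes quoted in the migration plan. It models a "list contracts" query as a prefix scan over an in-memory dict, which is a deliberate simplification of a real key-value store. With the tenant-only key, a production capsule and a test capsule under the same tenant cannot be told apart. Adding the capsule segment separates them.

```python
# Illustrative key builders. The PK shapes come from the migration plan;
# the function names and the prefix-scan "listing" are simplifications.

def old_pk(tenant_id: str, contract_id: str) -> str:
    # Tenant-scoped key: nothing distinguishes one capsule from another.
    return f"TENANT#{tenant_id}#CONTRACT#{contract_id}"


def new_pk(tenant_id: str, capsule_id: str, contract_id: str) -> str:
    # Capsule-scoped key: the capsule is part of the partition key itself.
    return f"TENANT#{tenant_id}#CAPSULE#{capsule_id}#CONTRACT#{contract_id}"


def list_values(store: dict, prefix: str) -> list:
    # Stand-in for a "list contracts" query: every value whose key has the prefix.
    return [store[k] for k in sorted(store) if k.startswith(prefix)]


# A production capsule and a test capsule share tenant "t1".
old_store = {old_pk("t1", "c-prod"): "prod", old_pk("t1", "c-test"): "test"}
new_store = {
    new_pk("t1", "prod", "c-prod"): "prod",
    new_pk("t1", "test", "c-test"): "test",
}

old_hits = list_values(old_store, "TENANT#t1#CONTRACT#")
new_hits = list_values(new_store, "TENANT#t1#CAPSULE#prod#CONTRACT#")
print(old_hits)  # ['prod', 'test']  (test data leaks into the production view)
print(new_hits)  # ['prod']
```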
2. Migration Strategy Selection
2.1 Option Evaluation
Two migration strategies were evaluated against four constraints: zero downtime, no data loss, backward compatibility during the transition, and atomic completion across all six entities.

Option A: Incremental Migration

This approach would migrate one entity at a time, deploying after each entity was completed, over a six-week period. It was rejected because a security boundary in a partially enforced state offers no isolation guarantee. In addition, the conditional query logic needed to decide whether each entity is capsule-scoped adds complexity proportional to the number of migration stages, and six deployment cycles create six independent risk windows.

Option B: Coordinated Comprehensive Migration (Selected)

This approach plans all six entities before implementation begins, executes in parallel, and deploys in a single coordinated release. The breaking change is absorbed once rather than distributed across six increments.

Selection Rationale: Capsule isolation is a security boundary, and the property is binary: it is either enforced or it is not. Incremental enforcement provides a false sense of security and introduces query complexity that must later be removed. A coordinated migration accepts a higher upfront planning cost in exchange for a single, verifiable transition.

3. Planning Methodology
3.1 Specification Development
A comprehensive migration specification was produced through a long-context reasoning session. The key architecture decisions it recorded are summarized below.
Key Architecture Decisions from Migration Plan
Decision 1: Dual-Write Migration Strategy
Phase 1: Add capsule_id, maintain old PK pattern
- Add capsule_id field to entities
- Still use old PK: TENANT#{tenant_id}#CONTRACT#{id}
- Dual-write: write to both old and new patterns
- Queries use old pattern (no behavior change)

Phase 2: Query new PK pattern, continue dual-write
- Start querying new PK: TENANT#{tenant_id}#CAPSULE#{capsule_id}#CONTRACT#{id}
- Still dual-write to both patterns
- Monitor for issues

Phase 3: Retire old PK pattern
- Stop writing to old pattern
- Clean up old data
- Remove dual-write code
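The three phases can be sketched as a repository whose write and read paths depend on the current phase. This is a hypothetical in-memory illustration, not the author's implementation. Phase, ContractRepository, and drop_old_pattern are invented names, and a plain dict stands in for the table.

```python
from enum import Enum


class Phase(Enum):
    DUAL_WRITE_READ_OLD = 1  # Phase 1: write both patterns, read old
    DUAL_WRITE_READ_NEW = 2  # Phase 2: write both patterns, read new
    NEW_ONLY = 3             # Phase 3: write and read new pattern only


class ContractRepository:
    """Hypothetical repository over an in-memory dict standing in for the table."""

    def __init__(self, phase: Phase):
        self.phase = phase
        self.table: dict = {}

    @staticmethod
    def old_pk(tenant_id: str, contract_id: str) -> str:
        return f"TENANT#{tenant_id}#CONTRACT#{contract_id}"

    @staticmethod
    def new_pk(tenant_id: str, capsule_id: str, contract_id: str) -> str:
        return f"TENANT#{tenant_id}#CAPSULE#{capsule_id}#CONTRACT#{contract_id}"

    def put(self, tenant_id: str, capsule_id: str, contract_id: str, item: dict) -> None:
        record = {**item, "capsule_id": capsule_id}
        if self.phase is not Phase.NEW_ONLY:
            # Phases 1 and 2: keep the old pattern current for readers still using it.
            self.table[self.old_pk(tenant_id, contract_id)] = dict(record)
        self.table[self.new_pk(tenant_id, capsule_id, contract_id)] = record

    def get(self, tenant_id: str, capsule_id: str, contract_id: str):
        if self.phase is Phase.DUAL_WRITE_READ_OLD:
            # Phase 1: unchanged read path (capsule_id is ignored).
            return self.table.get(self.old_pk(tenant_id, contract_id))
        return self.table.get(self.new_pk(tenant_id, capsule_id, contract_id))

    def drop_old_pattern(self) -> None:
        # Phase 3 cleanup: delete rows stored under the tenant-only key.
        self.table = {k: v for k, v in self.table.items() if "#CAPSULE#" in k}


repo = ContractRepository(Phase.DUAL_WRITE_READ_OLD)
repo.put("t1", "prod", "c1", {"amount": 100})
print(len(repo.table))  # 2, one row per key pattern

repo.phase = Phase.DUAL_WRITE_READ_NEW
print(repo.get("t1", "prod", "c1"))  # {'amount': 100, 'capsule_id': 'prod'}
print(repo.get("t1", "test", "c1"))  # None

repo.phase = Phase.NEW_ONLY
repo.drop_old_pattern()
print(len(repo.table))  # 1
```

Holding the phase as an explicit value means each transition is a one-line change that can be rolled back on its own. That property is the usual reason dual-write schemes are staged this way.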
Decision 2: GSI Pattern Update
Old pattern:

Decision 3: Test Data Migration
Challenge: 1,003 tests use a hard-coded tenant_id and no capsule_id.

Options:
- Update all tests to include capsule_id (manual)
- Create a default capsule for tests (automated)
- Generate a migration script for test data

Selected approach:
- Test helper: test_capsule() returns a default CapsuleId for all tests
- Critical tests (cross-capsule scenarios): explicit capsule_id values
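The hybrid test strategy can be sketched in Python. The test_capsule() helper and the CapsuleId type come from the plan. DEFAULT_TEST_CAPSULE, its value, and the make_contract fixture builder are hypothetical names added for illustration. Because the helper's name starts with `test_`, the sketch sets `__test__ = False` so that pytest does not collect it as a test.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CapsuleId:
    value: str


# Hypothetical default value; the source names the helper but not the value.
DEFAULT_TEST_CAPSULE = "test-default"


def test_capsule() -> CapsuleId:
    """Default capsule shared by tests that do not exercise isolation."""
    return CapsuleId(DEFAULT_TEST_CAPSULE)


# Keep pytest from collecting the helper as a test because of its name.
test_capsule.__test__ = False


def make_contract(contract_id: str, tenant_id: str = "tenant-1",
                  capsule: Optional[CapsuleId] = None) -> dict:
    # Hypothetical fixture builder: bulk tests omit the capsule argument.
    capsule = capsule if capsule is not None else test_capsule()
    return {"tenant_id": tenant_id, "capsule_id": capsule.value, "id": contract_id}


# Bulk test: picks up the default capsule without any change to its body.
bulk = make_contract("c1")

# Cross-capsule test: explicit, distinct capsule IDs under the same tenant.
in_a = make_contract("c2", capsule=CapsuleId("capsule-a"))
in_b = make_contract("c3", capsule=CapsuleId("capsule-b"))
```

Routing the default through a single helper means the bulk of the suite needs only a mechanical change. It also keeps the few isolation-sensitive tests explicit about which capsule they use.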
4. Execution Methodology
4.1 Template Entity Migration
The migration began with FinancialConfig, selected as the template entity for two reasons: it has the fewest call sites (21, versus 40+ for Contract), and it has no foreign key dependencies on other entities in the migration scope. These properties make it the lowest-risk candidate for pattern validation.
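The selection rule (fewest call sites among entities with no in-scope dependencies) can be written as a small function. Only the FinancialConfig count of 21 comes from the text. The Contract value uses the stated lower bound of 40. The ContractLineItem entry, including its dependency edge and call-site count, is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class EntityInfo:
    name: str
    call_sites: int
    depends_on: set = field(default_factory=set)


def pick_template(entities: list) -> str:
    """Fewest call sites among entities with no in-scope dependencies.

    Raises ValueError if every entity depends on another in-scope entity.
    """
    in_scope = {e.name for e in entities}
    independent = [e for e in entities if not (e.depends_on & in_scope)]
    return min(independent, key=lambda e: e.call_sites).name


entities = [
    EntityInfo("FinancialConfig", 21),                 # count from the text
    EntityInfo("Contract", 40),                        # text says "40+"; 40 is a floor
    EntityInfo("ContractLineItem", 30, {"Contract"}),  # hypothetical count and edge
]
print(pick_template(entities))  # FinancialConfig
```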
The template migration produced one critical finding: a defect in the GSI key pattern.
This single finding justified the template-entity approach. Had all six entities been migrated in parallel without prior validation, the same GSI defect would have appeared in all six, requiring a second remediation pass across the entire scope.
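The source does not preserve the GSI defect itself. Purely as an assumption for illustration, suppose the index partition key was built from tenant_id and an indexed attribute but omitted capsule_id. In that case, capsule-scoping the base-table key alone would still let index queries cross capsules. A defect of this kind would repeat in every entity that copied the pattern. The status attribute and both key shapes below are hypothetical.

```python
from collections import defaultdict


# Assumed defect shape for illustration: the index key omits capsule_id.
def gsi_pk_unscoped(tenant_id: str, status: str) -> str:
    return f"TENANT#{tenant_id}#STATUS#{status}"


def gsi_pk_scoped(tenant_id: str, capsule_id: str, status: str) -> str:
    return f"TENANT#{tenant_id}#CAPSULE#{capsule_id}#STATUS#{status}"


# (tenant, capsule, contract id, status); "status" is a hypothetical indexed attribute.
rows = [("t1", "prod", "c1", "ACTIVE"), ("t1", "test", "c2", "ACTIVE")]

unscoped_index = defaultdict(list)
scoped_index = defaultdict(list)
for tenant, capsule, contract_id, status in rows:
    unscoped_index[gsi_pk_unscoped(tenant, status)].append(contract_id)
    scoped_index[gsi_pk_scoped(tenant, capsule, status)].append(contract_id)

# An index query is an exact match on the index partition key.
print(unscoped_index[gsi_pk_unscoped("t1", "ACTIVE")])      # ['c1', 'c2']
print(scoped_index[gsi_pk_scoped("t1", "prod", "ACTIVE")])  # ['c1']
```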
4.2 Parallel Implementation
With the corrected pattern validated in the template entity, five concurrent implementation sessions were launched.

Session 1 (Contract Entity Group):
- ContractEntity
- ContractLineItemEntity
- ContractAmendmentEntity
- RevenueScheduleEntryEntity
- AccessEntity
4.3 AI Performance Assessment
The following table documents AI agent performance by task category during the migration.

| Task Category | Scope | AI Time | Manual Estimate | AI Efficacy |
|---|---|---|---|---|
| Call-site updates | 127 sites, 7 function signatures | 2 hours | 8 hours | High |
| Test data updates | 1,003 tests | 3 hours | 2–3 weeks | High |
| Dependency ordering | Entity hierarchy analysis | N/A | N/A | Insufficient; human required |
| Atomic data migration | Concurrent-access strategy | N/A | N/A | Insufficient; human required |
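Once the hierarchy is written down, the human-supplied ordering can be derived mechanically. This sketch uses Python's standard-library graphlib.TopologicalSorter. The parent-child edges are hypothetical, since the source does not list the actual hierarchy. FinancialConfig is omitted because it was migrated first as the template.

```python
from graphlib import TopologicalSorter

# Hypothetical hierarchy (child -> parents); the source does not list the real edges.
DEPENDS_ON = {
    "Contract": set(),
    "ContractLineItem": {"Contract"},
    "ContractAmendment": {"Contract"},
    "RevenueSchedule": {"Contract", "ContractLineItem"},
    "AccessEntity": set(),
}


def migration_order(depends_on: dict) -> list:
    # static_order() emits every parent before its children and raises
    # graphlib.CycleError if the hierarchy contains a cycle.
    return list(TopologicalSorter(depends_on).static_order())


def violations(order: list, depends_on: dict) -> list:
    """(child, parent) pairs where the child is scheduled before its parent."""
    pos = {name: i for i, name in enumerate(order)}
    return sorted(
        (child, parent)
        for child, parents in depends_on.items()
        for parent in parents
        if pos[parent] > pos[child]
    )


order = migration_order(DEPENDS_ON)
print(violations(order, DEPENDS_ON))  # []

# An arbitrary order that ignores the hierarchy:
arbitrary = ["RevenueSchedule", "ContractLineItem", "Contract",
             "ContractAmendment", "AccessEntity"]
print(violations(arbitrary, DEPENDS_ON))
```

The violations check is cheap enough to run against any order an agent proposes, before implementation sessions start.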
5. The Migration Pattern
The following code sequence represents the validated migration pattern established through the template entity process and subsequently applied to all remaining entities.

The Consolidated Migration Pattern
The following code represents the complete validated pattern applied to all six entities. It combines entity schema update, repository signature update, API handler extraction, and cross-capsule isolation verification in a single reference implementation.

6. Empirical Results
Scope:
- Entities migrated: 6
- Files modified: 47
- Lines changed:
  - Added: 1,247 lines
  - Removed: 721 lines
  - Net: +526 lines (additional isolation code)
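The headline figures can be checked directly from the reported inputs. The upper bound of the speedup works out to 7.5x, which the executive summary rounds to 7x. The net line count matches the reported +526.

```python
# Figures reported in this paper.
manual_low_h, manual_high_h, actual_h = 160, 240, 32
lines_added, lines_removed = 1247, 721

speedup_low = manual_low_h / actual_h    # 5.0
speedup_high = manual_high_h / actual_h  # 7.5
net_lines = lines_added - lines_removed  # 526

print(f"speedup: {speedup_low:.1f}x to {speedup_high:.1f}x")  # speedup: 5.0x to 7.5x
print(f"net lines: {net_lines:+d}")                          # net lines: +526
```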
7. Established Principles
The following principles emerged from the migration and apply to analogous multi-entity breaking change scenarios.

| Principle | Rule | Rationale |
|---|---|---|
| Comprehensive planning | Complete the full migration specification before writing any implementation code | Ad-hoc migration of six entities would have required 12 migration passes to correct propagated defects |
| Template entity first | Migrate the simplest entity completely and verify before parallelizing | Defects found in the template entity are corrected once; defects found after parallelization are corrected N times |
| Merge order discipline | Identify shared files upfront; establish patterns in the first-merged entity; subsequent entities adopt those patterns | Parallel sessions generating conflicting changes to shared files require manual conflict resolution that negates velocity gains |
| Tests as part of migration | Migrate entity tests concurrently with entity schemas, not afterward | Tests are the primary evidence that the migration succeeded; deferring them defers verification |
| Durable migration artifacts | Preserve migration specifications alongside the codebase in version control | Future engineers, auditors, and onboarding personnel require documented rationale for breaking change decisions |
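Two of these rules, treating tests as part of the migration and verifying every entity, can be expressed together as a single isolation probe run over all six repositories. This is a hypothetical in-memory sketch: InMemoryRepo, list_items, and isolation_failures are invented names. Marking RevenueSchedule as unscoped only simulates an entity whose verification was skipped. The source does not say which entity caused the staging defect.

```python
class InMemoryRepo:
    """Hypothetical per-entity repository; scoped=False models a missed migration."""

    def __init__(self, entity: str, scoped: bool = True):
        self.entity = entity.upper()
        self.scoped = scoped
        self.rows = {}

    def _prefix(self, tenant_id: str, capsule_id: str) -> str:
        if self.scoped:
            return f"TENANT#{tenant_id}#CAPSULE#{capsule_id}#{self.entity}#"
        return f"TENANT#{tenant_id}#{self.entity}#"

    def put(self, tenant_id: str, capsule_id: str, item_id: str, item: dict) -> None:
        self.rows[self._prefix(tenant_id, capsule_id) + item_id] = item

    def list_items(self, tenant_id: str, capsule_id: str) -> list:
        prefix = self._prefix(tenant_id, capsule_id)
        return [v for k, v in self.rows.items() if k.startswith(prefix)]


def isolation_failures(repos: dict) -> list:
    """Names of entities whose capsule-B query returns data written to capsule A."""
    failures = []
    for name, repo in repos.items():
        repo.put("t1", "capsule-a", "probe-1", {"capsule": "a"})
        if repo.list_items("t1", "capsule-b"):
            failures.append(name)
    return failures


names = ["FinancialConfig", "Contract", "ContractLineItem",
         "ContractAmendment", "RevenueSchedule", "AccessEntity"]
repos = {n: InMemoryRepo(n) for n in names}
repos["RevenueSchedule"] = InMemoryRepo("RevenueSchedule", scoped=False)  # simulated miss
print(isolation_failures(repos))  # ['RevenueSchedule']
```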
8. Recommendations
- Require comprehensive migration specifications before implementation begins. For breaking changes affecting more than two entities or more than 50 call sites, a detailed migration plan is not optional. The planning investment consistently returns 5 to 10 times its cost in prevented rework.
- Designate a template entity for every multi-entity migration. Select the entity with the fewest dependencies and call sites. Complete its migration and verification fully before parallelizing. Treat any defects found in the template entity as specification defects requiring plan correction before proceeding.
- Supply explicit dependency ordering to AI implementation agents. AI agents do not independently infer entity hierarchies. Engineering leads must analyze the dependency graph and provide migration sequencing as an explicit input to implementation sessions.
- Require human review of all data migration plans involving concurrent access. AI agents produce logically correct migration sequences for single-writer scenarios but do not account for distributed concurrency. All data migration plans involving live systems must include human review of atomicity and race condition handling.
- Apply verification to every migrated entity without exception. In this migration, skipping verification for an entity presumed to follow the established template allowed a defect to reach the staging environment. A consistent verification checklist applied to every entity is the only reliable quality gate.
- Preserve migration specifications as durable artifacts. Migration plans stored alongside the codebase provide institutional knowledge for future engineers, serve as the basis for post-migration audits, and accelerate onboarding for engineers joining after the migration is complete.
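The concurrency gap flagged in these recommendations can be illustrated with a backfill that copies rows from the old key pattern to the new one while live traffic is dual-writing. The sketch below is a hypothetical in-memory stand-in. VersionedStore, put_if_version, and backfill_row are invented names, and a lock emulates an atomic conditional write. The PK and GSI vocabulary suggests a DynamoDB-like store, though the source does not confirm this. In DynamoDB, the analogous primitive would be a conditional write (for example a ConditionExpression using attribute_not_exists) or TransactWriteItems.

```python
import threading


class VersionedStore:
    """In-memory table with an atomic conditional write (hypothetical stand-in)."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            row = self._rows.get(key)
            return None if row is None else dict(row)

    def put_if_version(self, key, row, expected_version):
        """Write only if the stored version matches; None means 'key must be absent'."""
        with self._lock:
            current = self._rows.get(key)
            current_version = None if current is None else current["version"]
            if current_version != expected_version:
                return False
            self._rows[key] = dict(row)
            return True


def backfill_row(store, old_key, new_key):
    """Copy one row to the new key pattern without overwriting a dual-write."""
    row = store.get(old_key)
    if row is None:
        return "missing"
    # Create-if-absent. Assuming every live write updates both patterns (dual-write),
    # an existing new-pattern row is at least as fresh as this snapshot, so skip it.
    return "copied" if store.put_if_version(new_key, row, None) else "skipped"


OLD, NEW = "TENANT#t1#CONTRACT#c1", "TENANT#t1#CAPSULE#prod#CONTRACT#c1"
store = VersionedStore()
store.put_if_version(OLD, {"version": 1, "amount": 100}, None)

# Interleaving: the backfill reads v1, then a live writer dual-writes v2.
snapshot = store.get(OLD)
store.put_if_version(OLD, {"version": 2, "amount": 250}, 1)
store.put_if_version(NEW, {"version": 2, "amount": 250}, None)

# A blind put of the snapshot would regress NEW to v1; the conditional put refuses.
print(store.put_if_version(NEW, snapshot, None))  # False
print(store.get(NEW)["amount"])                   # 250
```

The design choice is to let the data store arbitrate the race rather than the migration script. This is the kind of atomicity decision the recommendations reserve for human review.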
9. Forward-Looking Considerations
The migration framework described in this paper addresses the present state of AI-assisted development, in which agents excel at systematic pattern application but require human guidance for dependency analysis, concurrency strategy, and cross-entity coordination. As AI reasoning capabilities mature, the boundary between human and AI responsibility in migration planning will shift. However, the structural requirement for comprehensive upfront specification before implementation will persist, regardless of which party authors it. Organizations that institutionalize rigorous migration planning practices now will be able to leverage more capable AI agents as they become available, without accumulating the architectural debt that results from ad-hoc migration approaches. The compounding cost of unplanned breaking changes (measured in this study as 18 times the duration of a planned migration) provides a durable financial argument for sustained investment in migration methodology.

Disclaimer: All content represents personal learning from personal projects. Code examples are sanitized and generalized. No proprietary information is shared. Opinions are my own and do not reflect my employer’s views.