This isn’t a series about building a SaaS platform. It’s a series about using AI agents to build a SaaS platform - and documenting what actually works, what fails spectacularly, and how human-AI collaboration really plays out at scale.
Not a Tutorial - This is experimental work with real failures, pivots, and “AI got it completely wrong” moments. The SaaS platform is the vehicle for testing AI workflows. The AI workflows are the point.
AI Focus: Evaluator, Builder, Verifier setup and coordination
System Example: Planning event sourcing architecture
Key Learning: How to structure prompts for each agent role
Week 2: Plan → Implement → Verify
AI Focus: Three-phase workflow with quality gates
System Example: Implementing DynamoDB event store
Key Learning: Independent verification prevents AI hallucination
Week 3: When AI Excels - Boilerplate
AI Focus: AI-generated DynamoDB entities and repositories
System Example: Creating 20+ entities with macros
Key Learning: AI saves 500+ lines of code on boilerplate
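The boilerplate win comes from defining entities once as data and generating the classes mechanically. A minimal sketch of that idea in Python (the entity names and fields here are hypothetical stand-ins for the platform's 20+ entities; the actual series uses macros against DynamoDB):

```python
from dataclasses import make_dataclass

# Hypothetical entity specs - one definition site instead of a
# hand-written class per entity.
ENTITY_SPECS = {
    "Contact": [("contact_id", str), ("name", str), ("email", str)],
    "Account": [("account_id", str), ("owner", str)],
    "Invoice": [("invoice_id", str), ("amount_cents", int)],
}

def build_entities(specs):
    """Generate a dataclass per spec; adding an entity is one dict entry."""
    return {name: make_dataclass(name, fields) for name, fields in specs.items()}

entities = build_entities(ENTITY_SPECS)
Contact = entities["Contact"]
c = Contact("c-1", "Ada", "ada@example.com")
```

The point isn't this particular mechanism - it's that repetitive struct/DTO definitions are exactly the shape of work where generated code (by macro or by AI) beats typing each one out.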
Week 4: When AI Excels - Pattern Recognition
AI Focus: AI learning from codebase patterns
System Example: Multi-tenant isolation implementation
Key Learning: AI consistency > human copy-paste
Week 5: When AI Fails - Architecture
AI Focus: AI suggesting wrong patterns for novel problems
System Example: Capsule isolation design (AI got it wrong)
Key Learning: Humans must own architecture decisions
Week 6: When AI Fails - Security
AI Focus: AI missing subtle security vulnerabilities
System Example: Cross-tenant query bug AI didn’t catch
Key Learning: Dedicated CISO agent + human review required
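The bug class here is easy to state but easy to miss in review: a query that filters by record ID but omits the tenant condition. A minimal sketch (table contents and function names are hypothetical, standing in for the DynamoDB queries in the series):

```python
# Two tenants happen to share a record id - realistic with
# per-tenant id sequences or user-supplied keys.
TABLE = [
    {"tenant_id": "acme",   "contact_id": "c-1", "name": "Ada"},
    {"tenant_id": "globex", "contact_id": "c-1", "name": "Bob"},
]

def get_contact_unsafe(contact_id):
    # BUG: no tenant condition - every tenant's "c-1" comes back
    return [r for r in TABLE if r["contact_id"] == contact_id]

def get_contact(tenant_id, contact_id):
    # Fix: the tenant condition is part of every query, always
    return [r for r in TABLE
            if r["tenant_id"] == tenant_id and r["contact_id"] == contact_id]
```

Both versions compile, both return data, and tests that only use one tenant pass either way - which is why this slipped past the AI and needed a dedicated security review.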
Week 7: Testing with AI
AI Focus: Can AI generate good tests? (Spoiler: mostly yes)
System Example: Four-level test suite generation
Key Learning: AI writes 80% of tests, humans review edge cases
Week 8: Token Optimization
AI Focus: Optimizing how we work WITH AI (meta-layer)
System Example: Reducing context usage from 150k to 20k tokens per session
Key Learning: Working with AI requires its own collaboration patterns - batch builds, session boundaries, just-in-time loading
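The just-in-time idea can be sketched as a small budget tracker: documents enter the context only when the current task needs them, and a running token estimate decides when to stop and start a fresh session. This is a simplified illustration, not the series' actual tooling, and the words-to-tokens ratio is a rough approximation (real setups use a tokenizer):

```python
class ContextBudget:
    """Load documents on demand and track an approximate token budget."""

    def __init__(self, limit_tokens: int):
        self.limit = limit_tokens
        self.used = 0
        self.loaded = {}

    def estimate(self, text: str) -> int:
        # Crude heuristic: ~0.75 words per token in English prose
        return int(len(text.split()) / 0.75)

    def load(self, name: str, text: str) -> bool:
        if name in self.loaded:
            return True  # already in context, no extra cost
        cost = self.estimate(text)
        if self.used + cost > self.limit:
            return False  # over budget: defer, or start a fresh session
        self.loaded[name] = text
        self.used += cost
        return True

budget = ContextBudget(limit_tokens=20_000)
budget.load("ARCHITECTURE.md", "event sourcing overview " * 100)
```

Loading everything up front is the 150k-token failure mode; gating each load against a budget is what makes the 20k sessions possible.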
Week 9: When AI Says 'Done' But Isn't
AI Focus: AI gaming completion criteria - claiming done while leaving TODOs, broken links, compilation errors
System Example: Three features requiring 2-5 fix commits each despite “complete” claims
Key Learning: Verification is non-negotiable - prompt engineering reduces but doesn’t eliminate gaming behavior
Week 10: The Evolution - How AI Agents Found Their Voice
AI Focus: Fifteen specialized agents with distinct personalities and coordination patterns
System Example: Agent charters, CFO sprint planning, inter-agent dynamics (Architect vs CISO negotiations)
Key Learning: Software development is decision-making at scale. Each agent crystallizes a decision framework. Conflicts between agents produce better outcomes than either alone. ROI: 6-16x faster, 87% cost savings
✅ Boilerplate code - Structs, DTOs, CRUD operations
✅ Pattern application - Following existing code patterns
✅ Test generation - Happy path and common edge cases
✅ Code review - Finding bugs, style issues, unused code
✅ Refactoring - Consistent renames, extract method
❌ Architecture decisions - Choosing patterns for novel problems
❌ Security edge cases - Subtle permission boundaries
❌ Trade-off analysis - Cost vs complexity vs time decisions
❌ Novel problems - No existing pattern to follow
❌ False confidence - Hallucinating requirements that don’t exist
👤 Strategic decisions - What to build, how to architect
👤 Security review - Cross-tenant isolation, auth boundaries
👤 Quality gates - Approving plans, verifying AI output
👤 Domain knowledge - Business rules, compliance requirements
👤 Final approval - Every merge needs human eyes
What Happened: AI suggested the Saga pattern for capsule provisioning
Why Wrong: Capsule creation is synchronous, not a distributed transaction
Who Failed: Human (me) - blindly accepted the AI suggestion
Fix: Reverted to a simple transaction, wrote an ADR explaining why
Lesson: AI doesn’t understand your specific context. Validate architecture suggestions.
Mistake #2: Reusing Builder for Verification (Week 2)
What Happened: Used the same Claude session for build + verify
Why Wrong: The Builder was biased toward its own implementation
Impact: Missed 5 bugs that a fresh Verifier would have caught
Fix: Always use a fresh session for verification
Lesson: Independent verification requires independent context
Mistake #3: AI Generated Wrong Tests (Week 6)
What Happened: AI generated 50 tests, all passed, and a bug still shipped to production
Why Wrong: Tests verified the wrong behavior (false positives)
Root Cause: AI hallucinated a requirement that didn’t exist
Fix: Human review of test assertions, not just coverage
Lesson: Green tests ≠ correct tests. Review what’s being tested.
Mistake #4: Over-Optimizing Prompts (Week 8)
What Happened: Spent 3 days tweaking verification prompts for a 2% improvement
Why Wrong: Diminishing returns; time better spent elsewhere
Impact: Delayed feature work for marginal gains
Fix: Accept 90% accuracy, focus on high-value work
Lesson: Perfect is the enemy of done (applies to AI prompts too)
Why single-prompt coding doesn’t scale and how Evaluator, Builder, and Verifier agents transform development
Week 2: Plan-Implement-Verify
Building a full CRM domain layer with AI: How the three-phase workflow prevented hallucination and caught real bugs
Week 3: AI for Event Sourcing
72 commits of AI-driven refinement: How event sourcing revealed macro compliance gaps
Week 4: When AI Excels
107 commits in one week: How AI agents transformed tedious work like E2E testing and documentation into systematic excellence
Week 5: When AI Fails
30 commits to fix one breaking change - the week AI’s fix-in-session approach turned a 2-hour task into a 24-hour marathon
Week 6: AWS Runtime Adoption
How multi-agent AI discovered a scope-based factory pattern by analyzing our architecture docs - redemption after Week 5
Week 7: Config Governance
How hierarchical configuration with automatic injection made 340 lines of manual lookups vanish - ADR-driven design part 2
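The "hierarchical configuration" idea - the most specific layer wins, falling back through broader layers - can be sketched with Python's standard ChainMap. The layer names and keys here are hypothetical illustrations, not the platform's actual config schema:

```python
from collections import ChainMap

# Broadest to narrowest fallback layers (hypothetical values)
DEFAULTS = {"timeout_ms": 5000, "region": "us-east-1", "retries": 3}
ENVIRONMENT = {"region": "eu-west-1"}   # environment overrides a default
TENANT = {"timeout_ms": 2000}           # tenant overrides environment/default

def resolve(*overrides: dict) -> ChainMap:
    # ChainMap checks maps left to right, so earlier layers win.
    return ChainMap(*overrides, TENANT, ENVIRONMENT, DEFAULTS)

cfg = resolve()
# timeout from tenant, region from environment, retries from defaults
```

A resolver like this is what replaces scattered manual lookups: call sites ask for a key once, and the hierarchy answers, instead of each call site re-implementing the fallback chain.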
Week 8: Token Optimization
From 150k to 20k tokens per session: Learning the meta-patterns of AI collaboration through smart context management
Week 9: AI Gaming Completion
When AI claims “done” three times but requires 2-5 fix commits each: Discovering how AI games completion criteria and why verification beats prompts
Week 10: The Evolution (FINALE)
1000+ commits. 15 specialized agents. The retrospective: how AI agents developed distinct personalities, coordination protocols, and learned to build together
New articles published regularly documenting real AI workflow learnings
Disclaimer: This is experimental work from my personal projects. Results are real but may not generalize to all contexts. Your mileage may vary. This content does not represent my employer’s views or technologies.