Executive Summary

Multi-tenant SaaS platforms face a systemic risk when AWS SDK clients are instantiated ad hoc across service boundaries: the absence of structural isolation guarantees creates conditions for cross-tenant data exposure that unit tests cannot reliably detect. This analysis documents the design and implementation of a scope-aware AWS client factory pattern for a Rust-based SaaS platform, wherein four distinct operational scopes — platform, tenant, capsule, and operator — are encoded directly into the SDK client type hierarchy. The factory pattern, derived through systematic analysis of existing architectural decision records (ADRs), reduced per-crate boilerplate by approximately 600 lines and enforced 100 percent ADR compliance through the type system rather than documentation. A critical operational gap — the absence of credential caching for cross-account STS operations — required human intervention informed by production experience with AWS assume-role latency characteristics. Organizations building multi-tenant infrastructure should treat architectural decision records as primary inputs to AI-assisted design, encode isolation boundaries in API types rather than naming conventions, and layer performance optimizations only after structural correctness is established.

Key Findings

  1. Ad hoc AWS client instantiation across 9 Rust crates and 16 SDKs produced no compile-time barrier to cross-tenant data access, representing a structural security deficit rather than a procedural one.
  2. A four-scope client hierarchy (platform, tenant, capsule, operator) derived directly from ADR-0010 enforced isolation automatically at the SDK level, requiring zero developer discipline for compliance.
  3. AI-assisted pattern extraction from structured ADR documentation reduced the design-through-migration cycle from an estimated two weeks to three days — a 78 percent reduction in elapsed time.
  4. AI-generated implementations consistently omit operationally derived optimizations: the initial design lacked STS credential caching, introducing 200–500 ms latency per cross-account operation until corrected by a human reviewer.
  5. Each crate migration eliminated approximately 600 lines of client-creation boilerplate and replaced inconsistent error handling with compiler-enforced patterns.
  6. 39 integration tests cover 9 AWS services and 4 scope types, and the Phase 1 migration of 3 crates produced zero production defects.

1. Problem Definition: Structural Isolation Gaps in Distributed AWS Client Management

By January 2026, the subject platform had grown to encompass 9 Rust crates consuming 16 distinct AWS SDKs. Client instantiation followed no consistent pattern: DynamoDB clients appeared inside authentication handlers, S3 clients appeared inside upload endpoints, and SQS clients were scattered across application layers without coordination. This arrangement produced three compounding deficiencies:
  • No credential governance. Each instantiation site called aws_config::load_from_env() independently, precluding centralized credential rotation or assumption policies.
  • No scope enforcement. Clients carried no concept of the operational boundary — platform, tenant, or environment — within which they were permitted to operate.
  • No structural isolation. Table names were hardcoded strings, making cross-tenant access a matter of developer oversight rather than type-system enforcement.
The risk profile was not hypothetical. In event-driven multi-tenant architectures, a single misaddressed table name can write tenant A’s data under tenant B’s partition key. Conventional code review catches such errors inconsistently.
Critical Risk: In multi-tenant systems, isolation enforced by naming convention alone creates a class of defect that passes all functional tests yet produces data leakage in production. Structural enforcement through type hierarchies eliminates this defect class entirely.

2. Methodology: ADR-Driven AI Pattern Extraction

The design process deliberately avoided prescriptive prompting. Rather than instructing an AI evaluator agent to “build an AWS client factory,” the engagement presented ADR-0010 — the platform’s documented isolation boundary specification — and posed the architectural question: given these constraints, how should AWS clients be managed? This distinction is consequential. Prescriptive prompting produces implementations of the requester’s mental model. Constraint-driven prompting allows the model to surface structural patterns the human may not have articulated. The evaluator agent identified four distinct operational scopes encoded implicitly in ADR-0010:
| Scope | Purpose | Isolation Boundary | Credential Model |
|---|---|---|---|
| Platform | Cross-tenant infrastructure (metrics, logs, platform config) | None (shared by all tenants) | Environment credentials |
| Tenant | Customer-specific resources | Tenant ID prefix | Environment credentials (IAM role: planned) |
| Capsule | SDLC environment resources (dev/staging/prod) | Capsule code prefix + partition key validation | Environment credentials |
| Operator | Cross-account AWS Organizations operations | Cross-account IAM assumption | STS AssumeRole |
Each scope maps to a dedicated client type that enforces its isolation rules through the API surface rather than documentation.
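One way to picture scope-typed clients is with marker types. The sketch below is a minimal illustration under stated assumptions, not the platform's implementation: `Scoped`, `RawClient`, `Factory`, and `leads_table` are hypothetical names, `RawClient` stands in for a real SDK client, and the operator scope is left out because it concerns credentials rather than table naming.

```rust
use std::marker::PhantomData;

/// Marker types, one per naming-related scope.
pub struct PlatformScope;
pub struct TenantScope;
pub struct CapsuleScope;

/// Hypothetical stand-in for an SDK client handle.
#[derive(Clone, Default)]
pub struct RawClient;

/// A client tagged with its scope at the type level.
pub struct Scoped<S> {
    prefix: Option<String>,
    _raw: RawClient,
    _scope: PhantomData<S>,
}

impl<S> Scoped<S> {
    fn new(raw: &RawClient, prefix: Option<&str>) -> Self {
        Scoped {
            prefix: prefix.map(str::to_string),
            _raw: raw.clone(),
            _scope: PhantomData,
        }
    }

    /// Table name with the scope prefix applied, if the scope has one.
    pub fn table_name(&self, base: &str) -> String {
        match &self.prefix {
            Some(p) => format!("{}_{}", p, base),
            None => base.to_string(),
        }
    }
}

/// Single place where scoped clients are created.
#[derive(Default)]
pub struct Factory {
    raw: RawClient,
}

impl Factory {
    pub fn platform(&self) -> Scoped<PlatformScope> {
        Scoped::new(&self.raw, None)
    }

    pub fn tenant(&self, tenant_id: &str) -> Scoped<TenantScope> {
        Scoped::new(&self.raw, Some(tenant_id))
    }

    pub fn capsule(&self, capsule_code: &str) -> Scoped<CapsuleScope> {
        Scoped::new(&self.raw, Some(capsule_code))
    }
}

/// Accepts only capsule-scoped clients.
pub fn leads_table(client: &Scoped<CapsuleScope>) -> String {
    client.table_name("crm")
}
```

Because `leads_table` takes `&Scoped<CapsuleScope>`, passing the result of `factory.platform()` or `factory.tenant(..)` is rejected by the compiler rather than caught in review.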

3. The Four-Scope Client Architecture

3.1 Platform Clients

Platform clients operate on shared infrastructure with no tenant-specific prefixing.
Platform (No Isolation)
// Platform-wide resources shared by all tenants
let factory = AwsClientFactory::from_env().await?;
let platform_db = factory.platform_dynamodb();

// Table name: "events" (no prefix)
// Use for: cross-tenant metrics, system logs, platform config

3.2 Tenant Clients

Tenant clients scope all resource access to a specific customer boundary, with an architectural path for future cross-account IAM role assumption.
Tenant (Tenant Boundary)
// Tenant-scoped resources (future: assume IAM role)
let tenant = Tenant {
    tenant_id: "tenant-123",
    tenant_name: "Acme Corp",
    role_arn: None,  // Future: cross-account access
};
let tenant_db = factory.tenant_dynamodb(&tenant);

// Table name: "tenant-123_notifications"
// Use for: tenant-specific queues, topics, webhooks

3.3 Capsule Clients

The capsule client represents the most architecturally significant finding. ADR-0010 required SDLC environment isolation — development data must not appear in production query results. The evaluator agent proposed automatic table prefixing derived from the capsule code, enforced at compile time.
Capsule (SDLC Isolation)
// Capsule-scoped client enforces SDLC environment boundaries
let capsule = Capsule::new(
    "tenant-123",
    "caps-456",
    "PRODUS"  // Production US
);
let capsule_db = factory.capsule_dynamodb(&capsule);

// Table name: "PRODUS_crm"
// Partition key: "TENANT#tenant-123#CAPSULE#caps-456#LEAD#lead-789"
// Use for: domain entities (accounts, leads, opportunities)
The implementation encodes both table prefixing and partition key validation within the client type:
impl CapsuleClient<DynamoDbClient> {
    /// Returns capsule-prefixed table name per ADR-0010
    pub fn table_name(&self, base: &str) -> String {
        format!("{}_{}", self.capsule.capsule_code, base)
    }

    /// Validates partition key includes capsule boundary
    pub fn validate_pk(&self, pk: &str) -> Result<(), IsolationViolation> {
        if !pk.contains(&format!("CAPSULE#{}", self.capsule.capsule_id)) {
            return Err(IsolationViolation::CrossCapsule);
        }
        Ok(())
    }
}
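Prior to this implementation, SDLC isolation depended on developer discipline. After implementation, every table name is derived from the capsule-scoped client, so a developer cannot construct a capsule client and accidentally address a cross-capsule table. Partition keys that name a different capsule are rejected by validate_pk, which is a runtime check rather than a compile-time one.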
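The `validate_pk` shown above uses a substring test, so a key for capsule `caps-4567` would also pass for capsule `caps-456`. A stricter variant can compare whole key segments. The sketch below is an assumption-laden illustration: it takes the `TENANT#...#CAPSULE#...` layout from the example key, assumes identifiers never contain `#`, and introduces hypothetical `MalformedKey` and `CrossTenant` variants that the article does not mention.

```rust
#[derive(Debug, PartialEq)]
pub enum IsolationViolation {
    MalformedKey, // hypothetical variant
    CrossTenant,  // hypothetical variant
    CrossCapsule,
}

pub struct Capsule {
    pub tenant_id: String,
    pub capsule_id: String,
    pub capsule_code: String,
}

/// Checks the TENANT and CAPSULE segments by exact match
/// instead of searching for a substring.
pub fn validate_pk(capsule: &Capsule, pk: &str) -> Result<(), IsolationViolation> {
    let mut parts = pk.split('#');
    match (parts.next(), parts.next(), parts.next(), parts.next()) {
        (Some("TENANT"), Some(tenant), Some("CAPSULE"), Some(cap)) => {
            if tenant != capsule.tenant_id {
                Err(IsolationViolation::CrossTenant)
            } else if cap != capsule.capsule_id {
                Err(IsolationViolation::CrossCapsule)
            } else {
                Ok(())
            }
        }
        // Anything that does not start with the expected four segments.
        _ => Err(IsolationViolation::MalformedKey),
    }
}
```

Checking the tenant segment as well is a design choice of this sketch; the original method only inspects the capsule boundary.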

3.4 Operator Clients

Operator clients support cross-account AWS Organizations operations and are the only client type requiring STS assume-role credential chains.
Operator (Cross-Account)
// Cross-account AWS Organizations operations
let operator_sts = factory.operator_sts();

// Use for: account provisioning, Control Tower, organization management

4. Operational Deficiency: Credential Caching Gap

The initial AI-generated implementation exposed a material operational deficiency.
Deficiency Identified: The operator client design assumed that AWS STS AssumeRole calls were instantaneous. In production, AssumeRole introduces 200–500 ms of latency per call. Without credential caching, every cross-account operation incurred this penalty.
Resolution: A Moka cache with a 55-minute TTL was added at the service layer. The TTL sits just under the 60-minute default lifetime of AssumeRole credentials, so cached entries expire before the credentials they hold. This change required human knowledge of AWS operational characteristics that are absent from SDK documentation but well-known to practitioners with production STS experience.
Implication: AI-generated infrastructure reflects documented behavior accurately. It does not reflect undocumented operational characteristics, latency profiles, or production failure modes. Human review by practitioners with relevant operational experience is a non-negotiable gate for production deployments.
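The caching idea can be sketched with the standard library alone. This is not the Moka-based code the platform uses. It is a minimal, single-threaded illustration in which `TempCredentials`, `CredentialCache`, and `get_or_fetch` are hypothetical names, and the closure stands in for the slow assume-role round trip.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for temporary credentials.
#[derive(Clone, Debug, PartialEq)]
pub struct TempCredentials {
    pub access_key_id: String,
}

/// Time-based cache keyed by role ARN.
pub struct CredentialCache {
    ttl: Duration,
    entries: HashMap<String, (Instant, TempCredentials)>,
}

impl CredentialCache {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Returns cached credentials while they are younger than the TTL;
    /// otherwise calls `fetch` and stores the result.
    /// `now` is a parameter so expiry can be exercised without sleeping.
    pub fn get_or_fetch<F>(&mut self, role_arn: &str, now: Instant, fetch: F) -> TempCredentials
    where
        F: FnOnce() -> TempCredentials,
    {
        if let Some((stored_at, creds)) = self.entries.get(role_arn) {
            if now.saturating_duration_since(*stored_at) < self.ttl {
                return creds.clone();
            }
        }
        let fresh = fetch();
        self.entries.insert(role_arn.to_string(), (now, fresh.clone()));
        fresh
    }
}
```

With a 55-minute TTL, a lookup 54 minutes after the first fetch reuses the stored credentials, while a lookup at 56 minutes triggers a new fetch. A production version would also need thread safety and eviction, which a library such as Moka provides.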

5. Migration Approach and Measured Impact

Phase 1 migrated 3 crates (auth, crm, and catalog) to validate the pattern before broader rollout.
Before (auth crate):
// Scattered throughout the crate
let config = aws_config::load_from_env().await;
let dynamodb = aws_sdk_dynamodb::Client::new(&config);

// Hardcoded table name
let table_name = "platform_auth_sessions";  // WRONG: no capsule isolation
After (auth crate):
// Single factory, injected via dependency
let factory = AwsClientFactory::from_env().await?;
let capsule = extract_capsule(&req)?;
let client = factory.capsule_dynamodb(&capsule);

// Automatic prefixing
let table_name = client.table_name("sessions");  // "PRODUS_sessions"
| Crate | Files Changed | Boilerplate Removed |
|---|---|---|
| auth | 14 | ~600 lines |
| crm | 22 | ~600 lines |
| catalog | 8 | ~600 lines |
The migration reduced total client-creation boilerplate by approximately 1,800 lines across three crates while simultaneously elevating every migrated crate to full ADR compliance.

6. Comparative Analysis: Pre- and Post-Implementation State

| Dimension | Before Factory Pattern | After Factory Pattern |
|---|---|---|
| Isolation enforcement | Developer discipline (naming conventions) | Compile-time type system |
| Credential management | 9+ independent instantiation sites | Single factory, centralized |
| ADR compliance | Unverifiable at build time | 100% enforced by API surface |
| Cross-tenant risk | Structural; cannot be reviewed away | Eliminated by design |
| Boilerplate per crate | ~600 lines | ~0 lines (factory call) |
| Time to migrate additional crate | ~3 days (estimated) | ~2–4 hours (with migration guide) |
| Operational caching | None | Moka, 55-min TTL, AWS-aligned |

7. AI Collaboration Profile

The engagement surface for AI assistance was well-defined and productive within its scope. Specific observations merit documentation for practitioners evaluating similar workflows.
AI Capability Boundary: AI evaluator agents extract structural patterns from structured documents (ADRs, specifications) with high fidelity. They do not infer operational characteristics — latency profiles, credential TTLs, cache sizing — from documentation alone. Organizations should budget for human review of all performance-sensitive infrastructure generated by AI agents.
Areas where AI contribution was high-fidelity:
  • Scope identification from ADR-0010 (zero human prompting required)
  • Client type hierarchy design
  • Table naming convention implementation
  • Test generation (39 integration tests, 2,364 lines in 3 days)
Areas requiring human expertise:
  • STS credential caching (operational knowledge)
  • LocalStack endpoint configuration (environment-specific knowledge)
  • Credential TTL alignment with AWS session limits (operational knowledge)

8. Recommendations

  1. Encode isolation boundaries in type systems, not naming conventions. Any multi-tenant architecture that relies on developer discipline to maintain naming conventions is structurally insecure. Compile-time enforcement through scope-typed client hierarchies eliminates the defect class entirely.
  2. Treat architectural decision records as primary AI design inputs. Before engaging AI assistance for infrastructure design, produce structured ADRs that specify isolation boundaries, scope rules, and naming conventions. AI agents derive more architecturally coherent solutions from constraint documents than from feature requests.
  3. Stage AI-generated implementations through human operational review before production. Mandate review by practitioners with relevant AWS operational experience for any infrastructure involving credential management, cross-account access, or performance-sensitive paths. Document findings as ADR amendments.
  4. Migrate incrementally, validating test parity at each step. Phase 1 migration of 3 crates took 2 days with the multi-agent workflow (Evaluator plans, Builder executes per crate, Verifier confirms test passage). Attempting broader rollout without per-crate verification increases defect introduction risk.
  5. Centralize factory instantiation as an application-layer dependency. A single AwsClientFactory instance injected via dependency provides a natural seam for testing (LocalStack substitution), credential rotation, and observability instrumentation.
For organizations migrating existing codebases, a structured migration guide reduces per-crate migration time substantially. The pattern: remove ad hoc client instantiation, inject the factory via dependency, replace hardcoded table names with typed client methods, and verify existing tests pass before proceeding to the next crate.
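The testing seam mentioned in recommendation 5 might look like the sketch below. It is illustrative only: `FactoryConfig`, `ClientFactory`, `SessionService`, and the `endpoint_override` field are hypothetical names, and the code resolves endpoint strings rather than building real SDK clients.

```rust
use std::sync::Arc;

/// Hypothetical factory settings; `endpoint_override` stands in for
/// pointing every client at a local emulator during tests.
#[derive(Clone, Debug, Default)]
pub struct FactoryConfig {
    pub endpoint_override: Option<String>,
}

pub struct ClientFactory {
    config: FactoryConfig,
}

impl ClientFactory {
    pub fn new(config: FactoryConfig) -> Arc<Self> {
        Arc::new(Self { config })
    }

    /// Endpoint that clients built by this factory would target.
    pub fn endpoint(&self, service: &str, region: &str) -> String {
        match &self.config.endpoint_override {
            Some(url) => url.clone(),
            None => format!("https://{}.{}.amazonaws.com", service, region),
        }
    }
}

/// A service receives the shared factory instead of creating clients itself.
pub struct SessionService {
    factory: Arc<ClientFactory>,
}

impl SessionService {
    pub fn new(factory: Arc<ClientFactory>) -> Self {
        Self { factory }
    }

    pub fn dynamodb_endpoint(&self) -> String {
        self.factory.endpoint("dynamodb", "us-east-1")
    }
}
```

A test can then construct the factory with `endpoint_override: Some("http://localhost:4566".to_string())`, the address LocalStack listens on by default, without touching any service code.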

9. Implementation Metrics Summary

| Metric | Value |
|---|---|
| Crates in scope | 9 |
| AWS SDKs managed | 9 services |
| Infrastructure added | 2,364 lines |
| Integration tests generated | 39 |
| Production defects post-migration | 0 |
| Phase 1 crates migrated | 3 |
| Estimated time without AI assistance | ~2 weeks |
| Actual elapsed time | 3 days |
| Elapsed time reduction | 78% |
| Boilerplate removed per crate | ~600 lines |
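The 78% reduction appears consistent with counting two weeks as 14 calendar days. That basis is an assumption here, since the article does not say whether calendar or working days were meant:

```latex
\[
\frac{14 - 3}{14} = \frac{11}{14} \approx 0.786
\]
```

Counting ten working days instead would give (10 - 3)/10 = 70%, so the headline figure depends on that choice.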

10. Conclusion and Forward-Looking Assessment

The scope-based client factory pattern resolves a structural security and maintainability deficit that is endemic to ad hoc multi-tenant AWS client management. By encoding isolation boundaries in the type system rather than naming conventions, organizations eliminate an entire defect class that conventional code review cannot reliably prevent. The combination of ADR-driven design and AI-assisted implementation accelerates delivery while preserving architectural coherence — provided that human operational expertise reviews AI-generated output before production deployment. As AI-assisted infrastructure design matures, the practitioner’s role will increasingly shift toward constraint specification (ADRs, architecture principles) and operational review, rather than initial implementation. Organizations that invest in structured ADRs today position themselves to extract disproportionate productivity from AI design assistance as these workflows become standard practice.

Next in This Series

Week 6: How configuration governance middleware eliminated 100% of manual config lookups using the same ADR-driven approach.

Disclaimer: This content represents my personal learning journey using AI for a personal project. It does not represent my employer’s views, technologies, or approaches. All code examples are generic patterns or pseudocode for educational purposes.