Performance Optimization Patterns at Scale

Executive Summary

Production performance degradation in multi-tenant SaaS platforms typically originates from architectural decisions that perform acceptably at low data volumes but degrade predictably as tenant data grows. A systematic analysis of seven bottlenecks in a DynamoDB-backed platform identified five high-leverage optimization patterns: query scoping to database level, Global Secondary Index utilization, in-memory caching with TTL, HTTP connection pooling, and derive-macro code generation. Implementing these patterns required fewer than 500 lines of total code changes and produced latency improvements ranging from 33 percent to 1000x, with monthly infrastructure cost savings of $420. This article documents each pattern, its applicability conditions, its implementation, and the empirical results observed in production.

Key Findings

In-memory filtering of database results is the highest-leverage performance anti-pattern to eliminate. Moving filter logic to the database layer reduces latency and read costs by one or more orders of magnitude, with minimal code changes.
Global Secondary Indices convert O(n) table scans into O(1) partition lookups. The latency differential between a full table scan of 100,000 records and a direct GSI lookup is approximately 50x in observed production conditions.
Appropriate caching of low-churn reference data reduces database read volume by over 90 percent. An in-memory cache with a five-minute TTL achieved a 96 percent hit rate for configuration data that changes at most weekly.
Connection setup overhead accounts for one-third of HTTP request latency when pooling is absent. Enabling connection pooling is typically a configuration change, not a code change, and should be the first optimization applied to any HTTP-heavy workload.
Derive macros eliminate an entire class of performance bugs by preventing missing GSI annotations that cause accidental full table scans undetectable in development environments.
Profiling consistently reveals that the highest-impact bottleneck differs from initial intuition. Measurement is a prerequisite for effective optimization, not a best practice.

1. Development Data Volumes Mask Performance Failures That Become Critical as Tenant Data Grows — Production Degradation in Multi-Tenant Platforms Is Predictable, Not Accidental

Multi-tenant platforms are particularly susceptible to performance degradation at scale because development and testing typically occur with data volumes that are orders of magnitude smaller than production. A query that retrieves 50 records from a 10-record development dataset is indistinguishable from the same query retrieving 50 records from a 10,000-record production dataset—until the production query must load and filter 10,000 records to find the 50. A production platform built on DynamoDB with event sourcing exhibited the following symptoms following growth in tenant data volume:

Session queries: 500ms average against a target of 50ms
Contact lookups: Full table scans on a 100,000-record table at 800ms to 2,000ms
Webhook execution: 200ms per call, of which 50ms was connection establishment overhead
Memory usage: 50MB per simple query, caused by full dataset loading

The following sections document each optimization pattern applied, the conditions under which it applies, and the measured outcomes.

2. Pattern 1: Moving Filter Logic to the Database Layer Eliminates the Primary Source of Unnecessary Memory Allocation and Read Amplification

2.1 In-Memory Filtering of Full Partition Loads Produced 10x Latency Overhead and 50MB Memory Allocation Per Request at Production Data Volumes

Session retrieval for a specific tenant context was implemented as a full tenant load followed by in-memory filtering:

// Load ALL sessions for the tenant, then filter in memory
pub async fn get_sessions_for_capsule(
    &self,
    tenant_id: &str,
    capsule_id: &str,
) -> Result<Vec<Session>> {
    // Query DynamoDB for ALL tenant sessions
    let all_sessions = self.repository
        .query_by_tenant(tenant_id)
        .await?;

    // Filter in memory to find capsule sessions
    let filtered: Vec<Session> = all_sessions
        .into_iter()
        .filter(|s| s.capsule_id == capsule_id)
        .collect();

    Ok(filtered)
}

At development data volumes (10 sessions per tenant), this pattern performed adequately. At production volumes (1,000 sessions per tenant), the query required loading 1,000 records and allocating 50MB of memory to return 50 records.

2.2 Pushing the Filter Predicate Into the DynamoDB Query Expression Reduces Data Transfer Without Requiring Index Changes

Push the filtering predicate to the DynamoDB query layer:

// Filter at the database level using DynamoDB expressions
pub async fn get_sessions_for_capsule(
    &self,
    tenant_id: &str,
    capsule_id: &str,
) -> Result<Vec<Session>> {
    // Query with filter expression - DynamoDB does the filtering
    let sessions = self.repository
        .query_by_tenant(tenant_id)
        .filter_expression("capsule_id = :capsule_id")
        .expression_values(hashmap! {
            ":capsule_id" => AttributeValue::S(capsule_id.to_string())
        })
        .await?;

    Ok(sessions)
}

2.3 Query Scoping Delivered 10x Latency Reduction, 90 Percent Memory Savings, and 20x Fewer DynamoDB Read Units

Metric	Before	After	Improvement
Query latency	500ms	50ms	10x
Memory per request	50MB	5MB	90% reduction
DynamoDB read units	1,000 items	50 items	20x fewer

DynamoDB filter expressions reduce data transfer between database and application but still consume read capacity units for all scanned items. For workloads requiring true O(1) lookups regardless of partition size, a Global Secondary Index is required (see Pattern 2). Filter expressions are appropriate when scan-to-result ratios are manageable and the overhead of an additional GSI is not justified.

2.4 Filter Expressions Are Appropriate When Scan-to-Result Ratios Are Manageable — True O(1) Lookups Require a Global Secondary Index

This pattern yields significant returns when:

The filter attribute has high cardinality relative to the partition being scanned
The query pattern is executed with sufficient frequency to amortize optimization effort
The unfiltered result set contains more than 100 records

3. Pattern 2: Global Secondary Indices Convert O(n) Full Table Scans Into O(1) Partition Lookups, Producing 53x Latency Improvements on High-Frequency Query Paths

3.1 Contact Retrieval Without an Index Required Scanning 100,000 Records, Producing 800ms to 2,000ms Latency That Scaled Linearly With Table Growth

Contact retrieval by account identifier required a full table scan:

// Find contact by account_id - requires full table scan
pub async fn get_contact_by_account(
    &self,
    account_id: &str,
) -> Result<Option<Contact>> {
    let all_contacts = self.repository
        .scan()  // ⚠️ FULL TABLE SCAN
        .await?;

    all_contacts
        .into_iter()
        .find(|c| c.account_id == account_id)
        .ok_or(Error::NotFound)
}

With 100,000 contacts in the table, every lookup required scanning the entire table. Best-case latency was 800ms; worst-case exceeded 2,000ms. Query complexity was O(n), scaling linearly with table growth.

3.2 Adding GSIs for the Three Highest-Frequency Query Patterns Replaced Full Table Scans With Direct Partition Reads

Add Global Secondary Indices for high-frequency query patterns:

#[derive(DynamoDbEntity)]
#[dynamodb(table = "contacts")]
pub struct ContactEntity {
    #[dynamodb(partition_key)]
    pub id: String,

    #[dynamodb(sort_key)]
    pub tenant_id: String,

    // GSI5: PrimaryContactIndex for account lookups
    #[dynamodb(gsi5_partition_key)]
    pub account_id: String,

    // GSI6: ExecutiveIndex for role-based queries
    #[dynamodb(gsi6_partition_key)]
    pub role: String,

    #[dynamodb(gsi6_sort_key)]
    pub department: String,
}

// Generated query method (from derive macro)
pub async fn query_by_account(
    &self,
    account_id: &str,
) -> Result<Vec<Contact>> {
    self.repository
        .query_gsi5(account_id)  // Direct GSI lookup
        .await
}

3.3 GSI Adoption Reduced Contact Lookup Latency From 800ms to 15ms and Converted Query Complexity From O(n) to O(1)

Metric	Before	After	Improvement
Query latency	800ms	15ms	53x
Query complexity	O(n)	O(1)	Constant time
Read cost	Full table scan	Single partition read	Proportional to result set

3.4 Three Indices Cover Account Lookups, Role-Based Queries, and Temporal Status Filtering — the Three Access Patterns With the Highest Observed Query Frequency

GSI indices were created for the three highest-frequency query patterns:

Query Pattern	Index	Primary Use Case
Account lookups	GSI5 (account_id)	Contact retrieval by account
Role queries	GSI6 (role + department)	Executive and role-based dashboards
Status filters	GSI7 (status + created_at)	Active contact lists with temporal ordering

3.5 GSI Provisioning Is Justified When the Query Attribute Is Outside the Primary Key, Frequency Exceeds 100 Executions Per Day, and the Table Exceeds 10,000 Items

Create a GSI when:

The query attribute is not part of the primary key
The query pattern executes more than 100 times per day
The table contains more than 10,000 items

Each GSI consumes additional write capacity and storage. GSI count per table should be bounded to 3–4 in most implementations to balance query performance against cost and write amplification. Evaluate each proposed GSI against actual query frequency data before provisioning.

4. Pattern 3: In-Memory Caching With TTL Achieves 96 Percent Hit Rates and 1000x Latency Improvement for Low-Churn Configuration Data

4.1 Webhook Configuration Was Fetched From DynamoDB on Every Invocation Despite Changing at Most Weekly, Producing 10,000 Unnecessary Reads Per Day

Webhook execution required loading hook configuration from DynamoDB on every invocation:

// Every webhook call hits DynamoDB
pub async fn execute_webhook(
    &self,
    hook_id: &str,
    payload: &str,
) -> Result<()> {
    // Load hook config from DynamoDB - 100ms
    let hook = self.repository
        .get_hook(hook_id)
        .await?;

    // Execute webhook - 150ms
    self.http_client
        .post(&hook.url)
        .body(payload)
        .send()
        .await?;

    Ok(())
}

Hook configuration changes at most weekly, yet was being retrieved on every execution. At 10,000 webhook calls per day, this produced 10,000 unnecessary DynamoDB reads representing 100ms of cumulative latency per call.

4.2 A Five-Minute TTL Cache Reduces Database Reads to 288 Per Day While Bounding Maximum Configuration Staleness to an Operationally Acceptable Window

In-memory cache with a five-minute TTL:

use std::sync::Arc;
use tokio::sync::RwLock;
use std::time::{Duration, Instant};

pub struct CachedHookRepository {
    repository: Arc<DynamoDbRepository>,
    cache: Arc<RwLock<HashMap<String, CachedHook>>>,
    ttl: Duration,
}

struct CachedHook {
    hook: Hook,
    expires_at: Instant,
}

impl CachedHookRepository {
    pub fn new(repository: DynamoDbRepository) -> Self {
        Self {
            repository: Arc::new(repository),
            cache: Arc::new(RwLock::new(HashMap::new())),
            ttl: Duration::from_secs(300), // 5 minutes
        }
    }

    pub async fn get_hook(&self, hook_id: &str) -> Result<Hook> {
        // Check cache first
        {
            let cache = self.cache.read().await;
            if let Some(cached) = cache.get(hook_id) {
                if cached.expires_at > Instant::now() {
                    return Ok(cached.hook.clone()); // Cache hit: 0.1ms
                }
            }
        }

        // Cache miss or expired - fetch from DynamoDB
        let hook = self.repository.get_hook(hook_id).await?;

        // Update cache
        {
            let mut cache = self.cache.write().await;
            cache.insert(hook_id.to_string(), CachedHook {
                hook: hook.clone(),
                expires_at: Instant::now() + self.ttl,
            });
        }

        Ok(hook)
    }
}

4.3 Caching Reduced Cache-Hit Latency From 100ms to 0.1ms and Daily DynamoDB Read Volume From 10,000 to 288 at a 96 Percent Hit Rate

Metric	Before	After	Improvement
Cache hit latency	100ms	0.1ms	1000x
Daily DynamoDB reads	10,000	288	97% reduction
Cache hit rate	N/A	96%	—

4.4 A Five-Minute TTL Optimizes the Hit-Rate-to-Staleness Trade-off — In-Memory Cache Is Appropriate When a Single Instance Handles the Workload and Cross-Instance Consistency Is Not Required

A five-minute TTL balances the 83 percent hit rate of a one-minute window against the staleness risk of a 15-minute window. An in-memory cache is appropriate here because the application runs as a single instance, the dataset is smaller than 10MB, and cross-instance consistency is not required. Distributed cache infrastructure would add operational overhead without benefit for this workload.

4.5 In-Memory Caching Is Appropriate When Read-to-Write Ratio Exceeds 10:1, Staleness of Several Minutes Is Acceptable, and the Dataset Fits Within 100MB

In-memory caching is appropriate when:

The read-to-write ratio exceeds 10:1
Staleness of up to several minutes is acceptable
The cached dataset fits within 100MB of application memory
The application runs as a single instance or consistency across instances is not required

Do not apply in-memory caching to user session data, financial transaction records, or real-time analytics. These data classes require either real-time consistency guarantees or cross-instance synchronization that in-memory caching cannot provide.

5. Pattern 4: Connection Pooling Eliminates TLS Handshake and DNS Lookup Overhead That Would Otherwise Account for One-Third of Per-Request Latency

5.1 Per-Request HTTP Client Instantiation Incurred 50ms of Connection Overhead on Every Webhook Call — One-Third of Total Request Latency

Webhook execution established a new HTTP client on every call, incurring 30–50ms TLS handshake and 10–20ms DNS lookup overhead on each request. Combined with a 100ms HTTP request, connection setup represented one-third of total per-call latency.

5.2 A Shared HTTP Client With Connection Pool Reuses Established Connections, Reducing Per-Call Overhead to Under 1ms on Pool Hits

Shared HTTP client with connection pool:

use reqwest::Client;
use std::time::Duration;

pub struct WebhookExecutor {
    // Shared HTTP client with connection pool
    client: Client,
}

impl WebhookExecutor {
    pub fn new() -> Self {
        let client = Client::builder()
            .pool_max_idle_per_host(32)  // Keep 32 idle connections per host
            .pool_idle_timeout(Duration::from_secs(90))
            .timeout(Duration::from_secs(30))
            .build()
            .expect("Failed to build HTTP client");

        Self { client }
    }

    pub async fn execute_webhook(
        &self,
        url: &str,
        payload: &str,
    ) -> Result<()> {
        // Reuses connection from pool - no TLS handshake
        let response = self.client
            .post(url)
            .body(payload)
            .send()  // Only ~100ms (50ms saved)
            .await?;

        Ok(())
    }
}

5.3 Connection Pooling Reduced Connection Overhead by 50x at a 92 Percent Pool Hit Rate, Delivering 33 Percent Total Per-Call Latency Reduction

Metric	Before	After	Improvement
Per-call latency	150ms	100ms	33%
Connection overhead	50ms	Less than 1ms (on pool hit)	50x
Pool hit rate	N/A	92%	—

5.4 Connection Pooling Applies to All Services Making Frequent HTTP Requests and Is Typically a Configuration Change, Not a Code Change

Connection pooling applies to all services that make frequent HTTP requests, all database connections, and any external API calls with non-trivial TLS overhead. Most HTTP client libraries, including reqwest in Rust, include built-in connection pooling. Enabling it requires configuration, not code.

Connection pooling is one of the highest-return, lowest-effort optimizations available for HTTP-heavy workloads. If the application creates HTTP clients per request, this optimization should be the first change applied before any other latency reduction work.

6. Pattern 5: Derive Macro Code Generation Eliminates Missing GSI Annotations — an Accidental Full Table Scan Defect Class Undetectable in Development

6.1 Manual Boilerplate Across Seven Entities Produced 1,050 Lines of Error-Prone Code, Copy-Paste Defects, and Missing GSI Annotations That Caused Accidental Table Scans

Every DynamoDB entity required 100–200 lines of manual boilerplate for attribute mapping and query methods. Manual implementation across seven entities produced 1,050 lines of boilerplate with copy-paste errors, missing GSI annotations (causing accidental full table scans), and no compile-time validation of field names.

6.2 A DynamoDB Derive Macro Generates Attribute Mapping, Query Methods, and Compile-Time Validation From 15-Line Annotated Struct Definitions

Derive macro generating entity boilerplate at compile time:

// Macro-driven implementation - 15 lines total
#[derive(DynamoDbEntity)]
#[dynamodb(table = "contacts")]
pub struct ContactEntity {
    #[dynamodb(partition_key)]
    pub id: String,

    #[dynamodb(sort_key)]
    pub tenant_id: String,

    #[dynamodb(gsi5_partition_key)]
    pub account_id: String,

    #[dynamodb(gsi6_partition_key)]
    pub role: String,
}

// Macro generates:
// - from_item() / to_item() methods
// - query_gsi5() / query_gsi6() methods
// - Type-safe field accessors
// - Compile-time validation

6.3 Code Generation Reduced Boilerplate by 90 Percent per Entity and Moved Missing GSI Detection From Runtime Failures to Compile-Time Errors

Metric	Before	After	Improvement
Lines per entity	150	15	90% reduction
Total boilerplate (7 entities)	1,050 lines	105 lines	945 lines eliminated
Time to add new entity	30 minutes	3 minutes	10x
Copy-paste errors	Present	Eliminated	—
Missing GSI annotations detected	At runtime	At compile time	—

The runtime performance of generated code is identical to equivalent hand-written code. The primary performance benefit is the elimination of a class of bugs—missing GSI annotations—that would otherwise produce accidental full table scans undetectable in development environments.

6.4 Code Generation Through Macros Is Appropriate When Structural Patterns Repeat Across Multiple Entities and Compile-Time Validation Can Prevent Runtime Errors

Code generation through macros or equivalent tooling is appropriate when:

The same structural pattern repeats across multiple entities
Manual implementation introduces error-prone boilerplate
Compile-time validation can prevent runtime errors

7. Profiling Revealed That the Highest-Impact Bottleneck Differed From Initial Intuition — Measurement Is a Prerequisite for Effective Optimization, Not a Recommended Practice

7.1 Session Queries — Not Contact Queries — Were the True Primary Bottleneck, Identified Only Through Frequency-Weighted Profiling

The initial assumption in this engagement was that contact queries were the primary bottleneck, as they were the most visible in application logs. Profiling revealed that session queries were 10 times more frequent and constituted a larger share of total latency. Measurement is a prerequisite for effective optimization, not a recommended practice. Recommended instrumentation: application-level latency at P50, P95, and P99 by operation; database slow query logs; memory allocation profiling per request; distributed tracing for cross-service request flows.

7.2 Optimization Impact Is the Product of Frequency, Latency, and Business Criticality — Not Latency Alone

Not all slow queries merit equal optimization effort. Impact = Frequency × Latency × Business Criticality

Query	Frequency	Latency	Business Criticality	Priority
Session by context	10,000/day	500ms	High (API path)	1
Contact by account	5,000/day	800ms	High (dashboard)	2
Hook configuration load	10,000/day	100ms	Medium (background)	3
Administrative reports	10/day	2,000ms	Low (internal)	4

7.3 Architecture Decision Records Prevent Performance Regression by Preserving the Rationale Behind Cache TTLs, GSI Choices, and Filter Strategies

Each significant optimization decision should be captured in an Architecture Decision Record documenting the problem context, the decision made, positive and negative consequences, measured before-and-after metrics, and references to related decisions. ADRs preserve institutional knowledge and prevent regression when engineers unfamiliar with the original decision later modify the system. For a detailed ADR template, see ADRs as Architecture Documentation.

8. Five Patterns Delivered Latency Improvements From 33 Percent to 1000x, $420 Monthly Cost Savings, and 27 Developer Hours Saved per Month From Fewer Than 500 Lines of Code Changes

Optimization	Metric	Before	After	Improvement
Query Scoping	Latency	500ms	50ms	10x
	Memory	50MB	5MB	10x
	DynamoDB reads	1,000 items	50 items	20x
Contact GSI	Latency	800ms	15ms	53x
	Complexity	O(n)	O(1)	—
Hook Caching	Cache hit latency	100ms	0.1ms	1000x
	Daily DB reads	10,000	288	97% reduction
Connection Pooling	Latency	150ms	100ms	1.5x
	Connection overhead	50ms	Less than 1ms	50x
Code Generation	Lines of code	1,050	105	95% reduction
	Dev time per entity	30 min	3 min	10x

Total savings: $420/month in reduced DynamoDB read capacity; 27 developer hours/month saved; fewer than 500 lines of code changed across all five patterns.

9. Recommendations

Establish performance budgets per operation type before production launch. Define P95 latency targets for API endpoints, background jobs, and database queries; alert before user impact.
Instrument every DynamoDB query with read unit consumption metrics. Full table scans introduced by code changes are undetectable in development. Production instrumentation is the only reliable mechanism.
Require connection pool configuration in all HTTP client initialization. Treat single-use client instantiation as a blocking defect in code review.
Limit GSI proliferation through a formal review process. Each GSI represents write amplification cost. Require benchmark evidence of query frequency before approving additions.
Document significant performance decisions in Architecture Decision Records. Cache TTL values, GSI choices, and filter strategies are not self-evident from code. ADRs prevent regression and preserve institutional knowledge.
Measure P95 and P99 latency, not averages. Average metrics mask tail latency. Optimization targets must be defined against percentile thresholds.

The fastest query is the one that is never executed. Before applying caching or indexing to a query, evaluate whether the query can be eliminated entirely. In one case, 60 percent of hook configuration queries were eliminated by embedding configuration in the triggering event payload, removing the lookup requirement entirely.

10. Resources and Further Reading

DynamoDB Filter Expressions vs GSI
GSI Design Patterns
Cache Invalidation Best Practices
In-Memory vs Distributed Caching Trade-offs
Multi-Agent AI Workflow — Systematic feature development methodology
ADRs as Architecture Documentation — Documenting performance decisions

11. Conclusion and Forward Outlook

The five patterns documented here address distinct performance failure modes that are predictable, measurable, and correctable. Their value lies not in novelty but in systematic application: identifying the correct bottleneck through measurement, selecting the appropriate pattern, implementing it with verification, and documenting the rationale for future engineers. As multi-tenant data volumes continue to grow, the cost of deferred optimization increases. Organizations that instrument performance from initial deployment and establish optimization standards before performance incidents occur will maintain system reliability and cost efficiency at scale. The patterns described here—particularly database-level filtering and connection pooling—require minimal engineering investment relative to their impact and should be treated as baseline standards rather than optional enhancements.

Disclaimer: This content represents my personal learning journey using AI for a personal project. It does not represent my employer’s views, technologies, or approaches.All code examples are generic patterns or pseudocode for educational purposes. Performance numbers are from real implementations but have been sanitized and rounded for clarity.

Overview

Workflows

Process

Infrastructure

Documentation Index

​Executive Summary

​Key Findings

​1. Development Data Volumes Mask Performance Failures That Become Critical as Tenant Data Grows — Production Degradation in Multi-Tenant Platforms Is Predictable, Not Accidental

​2. Pattern 1: Moving Filter Logic to the Database Layer Eliminates the Primary Source of Unnecessary Memory Allocation and Read Amplification

​2.1 In-Memory Filtering of Full Partition Loads Produced 10x Latency Overhead and 50MB Memory Allocation Per Request at Production Data Volumes

​2.2 Pushing the Filter Predicate Into the DynamoDB Query Expression Reduces Data Transfer Without Requiring Index Changes

​2.3 Query Scoping Delivered 10x Latency Reduction, 90 Percent Memory Savings, and 20x Fewer DynamoDB Read Units

​2.4 Filter Expressions Are Appropriate When Scan-to-Result Ratios Are Manageable — True O(1) Lookups Require a Global Secondary Index

​3. Pattern 2: Global Secondary Indices Convert O(n) Full Table Scans Into O(1) Partition Lookups, Producing 53x Latency Improvements on High-Frequency Query Paths

​3.1 Contact Retrieval Without an Index Required Scanning 100,000 Records, Producing 800ms to 2,000ms Latency That Scaled Linearly With Table Growth

​3.2 Adding GSIs for the Three Highest-Frequency Query Patterns Replaced Full Table Scans With Direct Partition Reads

​3.3 GSI Adoption Reduced Contact Lookup Latency From 800ms to 15ms and Converted Query Complexity From O(n) to O(1)

​3.4 Three Indices Cover Account Lookups, Role-Based Queries, and Temporal Status Filtering — the Three Access Patterns With the Highest Observed Query Frequency

​3.5 GSI Provisioning Is Justified When the Query Attribute Is Outside the Primary Key, Frequency Exceeds 100 Executions Per Day, and the Table Exceeds 10,000 Items

​4. Pattern 3: In-Memory Caching With TTL Achieves 96 Percent Hit Rates and 1000x Latency Improvement for Low-Churn Configuration Data

​4.1 Webhook Configuration Was Fetched From DynamoDB on Every Invocation Despite Changing at Most Weekly, Producing 10,000 Unnecessary Reads Per Day

​4.2 A Five-Minute TTL Cache Reduces Database Reads to 288 Per Day While Bounding Maximum Configuration Staleness to an Operationally Acceptable Window

​4.3 Caching Reduced Cache-Hit Latency From 100ms to 0.1ms and Daily DynamoDB Read Volume From 10,000 to 288 at a 96 Percent Hit Rate

​4.4 A Five-Minute TTL Optimizes the Hit-Rate-to-Staleness Trade-off — In-Memory Cache Is Appropriate When a Single Instance Handles the Workload and Cross-Instance Consistency Is Not Required

​4.5 In-Memory Caching Is Appropriate When Read-to-Write Ratio Exceeds 10:1, Staleness of Several Minutes Is Acceptable, and the Dataset Fits Within 100MB

​5. Pattern 4: Connection Pooling Eliminates TLS Handshake and DNS Lookup Overhead That Would Otherwise Account for One-Third of Per-Request Latency

​5.1 Per-Request HTTP Client Instantiation Incurred 50ms of Connection Overhead on Every Webhook Call — One-Third of Total Request Latency

​5.2 A Shared HTTP Client With Connection Pool Reuses Established Connections, Reducing Per-Call Overhead to Under 1ms on Pool Hits

​5.3 Connection Pooling Reduced Connection Overhead by 50x at a 92 Percent Pool Hit Rate, Delivering 33 Percent Total Per-Call Latency Reduction

​5.4 Connection Pooling Applies to All Services Making Frequent HTTP Requests and Is Typically a Configuration Change, Not a Code Change

​6. Pattern 5: Derive Macro Code Generation Eliminates Missing GSI Annotations — an Accidental Full Table Scan Defect Class Undetectable in Development

​6.1 Manual Boilerplate Across Seven Entities Produced 1,050 Lines of Error-Prone Code, Copy-Paste Defects, and Missing GSI Annotations That Caused Accidental Table Scans

​6.2 A DynamoDB Derive Macro Generates Attribute Mapping, Query Methods, and Compile-Time Validation From 15-Line Annotated Struct Definitions

​6.3 Code Generation Reduced Boilerplate by 90 Percent per Entity and Moved Missing GSI Detection From Runtime Failures to Compile-Time Errors

​6.4 Code Generation Through Macros Is Appropriate When Structural Patterns Repeat Across Multiple Entities and Compile-Time Validation Can Prevent Runtime Errors

​7. Profiling Revealed That the Highest-Impact Bottleneck Differed From Initial Intuition — Measurement Is a Prerequisite for Effective Optimization, Not a Recommended Practice

​7.1 Session Queries — Not Contact Queries — Were the True Primary Bottleneck, Identified Only Through Frequency-Weighted Profiling

​7.2 Optimization Impact Is the Product of Frequency, Latency, and Business Criticality — Not Latency Alone

​7.3 Architecture Decision Records Prevent Performance Regression by Preserving the Rationale Behind Cache TTLs, GSI Choices, and Filter Strategies

​8. Five Patterns Delivered Latency Improvements From 33 Percent to 1000x, $420 Monthly Cost Savings, and 27 Developer Hours Saved per Month From Fewer Than 500 Lines of Code Changes

​9. Recommendations

​10. Resources and Further Reading

​11. Conclusion and Forward Outlook

Executive Summary

Key Findings

1. Development Data Volumes Mask Performance Failures That Become Critical as Tenant Data Grows — Production Degradation in Multi-Tenant Platforms Is Predictable, Not Accidental

2. Pattern 1: Moving Filter Logic to the Database Layer Eliminates the Primary Source of Unnecessary Memory Allocation and Read Amplification

2.1 In-Memory Filtering of Full Partition Loads Produced 10x Latency Overhead and 50MB Memory Allocation Per Request at Production Data Volumes

2.2 Pushing the Filter Predicate Into the DynamoDB Query Expression Reduces Data Transfer Without Requiring Index Changes

2.3 Query Scoping Delivered 10x Latency Reduction, 90 Percent Memory Savings, and 20x Fewer DynamoDB Read Units

2.4 Filter Expressions Are Appropriate When Scan-to-Result Ratios Are Manageable — True O(1) Lookups Require a Global Secondary Index

3. Pattern 2: Global Secondary Indices Convert O(n) Full Table Scans Into O(1) Partition Lookups, Producing 53x Latency Improvements on High-Frequency Query Paths

3.1 Contact Retrieval Without an Index Required Scanning 100,000 Records, Producing 800ms to 2,000ms Latency That Scaled Linearly With Table Growth

3.2 Adding GSIs for the Three Highest-Frequency Query Patterns Replaced Full Table Scans With Direct Partition Reads

3.3 GSI Adoption Reduced Contact Lookup Latency From 800ms to 15ms and Converted Query Complexity From O(n) to O(1)

3.4 Three Indices Cover Account Lookups, Role-Based Queries, and Temporal Status Filtering — the Three Access Patterns With the Highest Observed Query Frequency

3.5 GSI Provisioning Is Justified When the Query Attribute Is Outside the Primary Key, Frequency Exceeds 100 Executions Per Day, and the Table Exceeds 10,000 Items

4. Pattern 3: In-Memory Caching With TTL Achieves 96 Percent Hit Rates and 1000x Latency Improvement for Low-Churn Configuration Data

4.1 Webhook Configuration Was Fetched From DynamoDB on Every Invocation Despite Changing at Most Weekly, Producing 10,000 Unnecessary Reads Per Day

4.2 A Five-Minute TTL Cache Reduces Database Reads to 288 Per Day While Bounding Maximum Configuration Staleness to an Operationally Acceptable Window

4.3 Caching Reduced Cache-Hit Latency From 100ms to 0.1ms and Daily DynamoDB Read Volume From 10,000 to 288 at a 96 Percent Hit Rate

4.4 A Five-Minute TTL Optimizes the Hit-Rate-to-Staleness Trade-off — In-Memory Cache Is Appropriate When a Single Instance Handles the Workload and Cross-Instance Consistency Is Not Required

4.5 In-Memory Caching Is Appropriate When Read-to-Write Ratio Exceeds 10:1, Staleness of Several Minutes Is Acceptable, and the Dataset Fits Within 100MB

5. Pattern 4: Connection Pooling Eliminates TLS Handshake and DNS Lookup Overhead That Would Otherwise Account for One-Third of Per-Request Latency

5.1 Per-Request HTTP Client Instantiation Incurred 50ms of Connection Overhead on Every Webhook Call — One-Third of Total Request Latency

5.2 A Shared HTTP Client With Connection Pool Reuses Established Connections, Reducing Per-Call Overhead to Under 1ms on Pool Hits

5.3 Connection Pooling Reduced Connection Overhead by 50x at a 92 Percent Pool Hit Rate, Delivering 33 Percent Total Per-Call Latency Reduction

5.4 Connection Pooling Applies to All Services Making Frequent HTTP Requests and Is Typically a Configuration Change, Not a Code Change

6. Pattern 5: Derive Macro Code Generation Eliminates Missing GSI Annotations — an Accidental Full Table Scan Defect Class Undetectable in Development

6.1 Manual Boilerplate Across Seven Entities Produced 1,050 Lines of Error-Prone Code, Copy-Paste Defects, and Missing GSI Annotations That Caused Accidental Table Scans

6.2 A DynamoDB Derive Macro Generates Attribute Mapping, Query Methods, and Compile-Time Validation From 15-Line Annotated Struct Definitions

6.3 Code Generation Reduced Boilerplate by 90 Percent per Entity and Moved Missing GSI Detection From Runtime Failures to Compile-Time Errors

6.4 Code Generation Through Macros Is Appropriate When Structural Patterns Repeat Across Multiple Entities and Compile-Time Validation Can Prevent Runtime Errors

7. Profiling Revealed That the Highest-Impact Bottleneck Differed From Initial Intuition — Measurement Is a Prerequisite for Effective Optimization, Not a Recommended Practice

7.1 Session Queries — Not Contact Queries — Were the True Primary Bottleneck, Identified Only Through Frequency-Weighted Profiling

7.2 Optimization Impact Is the Product of Frequency, Latency, and Business Criticality — Not Latency Alone

7.3 Architecture Decision Records Prevent Performance Regression by Preserving the Rationale Behind Cache TTLs, GSI Choices, and Filter Strategies

8. Five Patterns Delivered Latency Improvements From 33 Percent to 1000x, $420 Monthly Cost Savings, and 27 Developer Hours Saved per Month From Fewer Than 500 Lines of Code Changes

9. Recommendations

10. Resources and Further Reading

11. Conclusion and Forward Outlook