Executive Summary
Production performance degradation in multi-tenant SaaS platforms typically originates from architectural decisions that perform acceptably at low data volumes but degrade predictably as tenant data grows. A systematic analysis of seven bottlenecks in a DynamoDB-backed platform identified five high-leverage optimization patterns: query scoping to database level, Global Secondary Index utilization, in-memory caching with TTL, HTTP connection pooling, and derive-macro code generation. Implementing these patterns required fewer than 500 lines of total code changes and produced latency improvements ranging from 33 percent to 1000x, with monthly infrastructure cost savings of $420. This article documents each pattern, its applicability conditions, its implementation, and the empirical results observed in production.
Key Findings
- In-memory filtering of database results is the highest-leverage performance anti-pattern to eliminate. Moving filter logic to the database layer reduces latency and read costs by one or more orders of magnitude, with minimal code changes.
- Global Secondary Indices convert O(n) table scans into O(1) partition lookups. The latency differential between a full table scan of 100,000 records and a direct GSI lookup is approximately 50x in observed production conditions.
- Appropriate caching of low-churn reference data reduces database read volume by over 90 percent. An in-memory cache with a five-minute TTL achieved a 96 percent hit rate for configuration data that changes at most weekly.
- Connection setup overhead accounts for one-third of HTTP request latency when pooling is absent. Enabling connection pooling is typically a configuration change, not a code change, and should be the first optimization applied to any HTTP-heavy workload.
- Derive macros eliminate an entire class of performance bugs by preventing missing GSI annotations that cause accidental full table scans undetectable in development environments.
- Profiling consistently reveals that the highest-impact bottleneck differs from initial intuition. Measurement is a prerequisite for effective optimization, not a best practice.
1. Problem Context: Development Data Volumes Mask Query Patterns That Degrade Predictably at Production Scale
Multi-tenant platforms are particularly susceptible to performance degradation at scale because development and testing typically occur with data volumes that are orders of magnitude smaller than production. A query that filters a 10-record development dataset is indistinguishable, in code, from the same query filtering a 10,000-record production dataset—until the production query must load all 10,000 records to return 50.
A production platform built on DynamoDB with event sourcing exhibited the following symptoms as tenant data volume grew:
- Session queries: 500ms average against a target of 50ms
- Contact lookups: Full table scans on a 100,000-record table at 800ms to 2,000ms
- Webhook execution: 200ms per call, of which 50ms was connection establishment overhead
- Memory usage: 50MB per simple query, caused by full dataset loading
The following sections document each optimization pattern applied, the conditions under which it applies, and the measured outcomes.
2. Pattern 1: Moving Filter Logic to the Database Layer Eliminates the Primary Source of Unnecessary Memory Allocation and Read Amplification
2.1 In-Memory Filtering of Full Partition Loads Produced 10x Latency Overhead and 50MB Memory Allocation Per Request at Production Data Volumes
Session retrieval for a specific tenant context was implemented as a full tenant load followed by in-memory filtering:
// Load ALL sessions for the tenant, then filter in memory
pub async fn get_sessions_for_capsule(
    &self,
    tenant_id: &str,
    capsule_id: &str,
) -> Result<Vec<Session>> {
    // Query DynamoDB for ALL tenant sessions
    let all_sessions = self.repository
        .query_by_tenant(tenant_id)
        .await?;

    // Filter in memory to find capsule sessions
    let filtered: Vec<Session> = all_sessions
        .into_iter()
        .filter(|s| s.capsule_id == capsule_id)
        .collect();

    Ok(filtered)
}
At development data volumes (10 sessions per tenant), this pattern performed adequately. At production volumes (1,000 sessions per tenant), the query required loading 1,000 records and allocating 50MB of memory to return 50 records.
2.2 Pushing the Filter Predicate Into the DynamoDB Query Expression Reduces Data Transfer Without Requiring Index Changes
Push the filtering predicate to the DynamoDB query layer:
// Filter at the database level using DynamoDB expressions
pub async fn get_sessions_for_capsule(
    &self,
    tenant_id: &str,
    capsule_id: &str,
) -> Result<Vec<Session>> {
    // Query with filter expression - DynamoDB does the filtering
    let sessions = self.repository
        .query_by_tenant(tenant_id)
        .filter_expression("capsule_id = :capsule_id")
        .expression_values(hashmap! {
            ":capsule_id" => AttributeValue::S(capsule_id.to_string())
        })
        .await?;

    Ok(sessions)
}
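For readers working against the AWS SDK for Rust directly rather than a repository wrapper, a minimal sketch of the same query is shown below. The table name, key attribute names, and use of a recent aws_sdk_dynamodb version are assumptions, not the platform's actual schema.

```rust
use std::collections::HashMap;

use aws_sdk_dynamodb::{types::AttributeValue, Client};

// Hypothetical schema: partition key "tenant_id", non-key attribute "capsule_id".
// The key condition restricts the read to one tenant partition; the filter
// expression is evaluated by DynamoDB before results cross the wire.
pub async fn get_sessions_for_capsule(
    client: &Client,
    tenant_id: &str,
    capsule_id: &str,
) -> Result<Vec<HashMap<String, AttributeValue>>, aws_sdk_dynamodb::Error> {
    let output = client
        .query()
        .table_name("sessions") // assumed table name
        .key_condition_expression("tenant_id = :tenant")
        .filter_expression("capsule_id = :capsule")
        .expression_attribute_values(":tenant", AttributeValue::S(tenant_id.to_string()))
        .expression_attribute_values(":capsule", AttributeValue::S(capsule_id.to_string()))
        .send()
        .await?;

    Ok(output.items.unwrap_or_default())
}
```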
2.3 Query Scoping Delivered 10x Latency Reduction, 90 Percent Memory Savings, and 20x Fewer DynamoDB Read Units
| Metric | Before | After | Improvement |
|---|---|---|---|
| Query latency | 500ms | 50ms | 10x |
| Memory per request | 50MB | 5MB | 90% reduction |
| DynamoDB read units | 1,000 items | 50 items | 20x fewer |
DynamoDB filter expressions reduce data transfer between database and application but still consume read capacity units for all scanned items. For workloads requiring true O(1) lookups regardless of partition size, a Global Secondary Index is required (see Pattern 2). Filter expressions are appropriate when scan-to-result ratios are manageable and the overhead of an additional GSI is not justified.
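As a rough illustration of the read-unit arithmetic, assuming roughly 1 KB items and eventually consistent reads billed at 0.5 RCU per 4 KB of data read:

Reading 1,000 items ≈ 1,000 KB ÷ 4 KB × 0.5 ≈ 125 RCUs per query
Reading only the 50 matching items ≈ 50 KB ÷ 4 KB × 0.5 ≈ 6.5 RCUs per query

Because capacity is billed on items read rather than items returned, the second figure is only reachable when the database itself can locate the matching items, which is what the GSI in Pattern 2 provides.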
2.4 Filter Expressions Are Appropriate When Scan-to-Result Ratios Are Manageable — True O(1) Lookups Require a Global Secondary Index
This pattern yields significant returns when:
- The filter attribute has high cardinality relative to the partition being scanned
- The query pattern is executed with sufficient frequency to amortize optimization effort
- The unfiltered result set contains more than 100 records
3. Pattern 2: Global Secondary Indices Convert O(n) Full Table Scans Into O(1) Partition Lookups, Producing 53x Latency Improvements on High-Frequency Query Paths
3.1 Contact Lookups by Account Identifier Required a Full Scan of a 100,000-Record Table, With Latencies Between 800ms and 2,000ms
Contact retrieval by account identifier required a full table scan:
// Find contact by account_id - requires full table scan
pub async fn get_contact_by_account(
    &self,
    account_id: &str,
) -> Result<Option<Contact>> {
    let all_contacts = self.repository
        .scan() // ⚠️ FULL TABLE SCAN
        .await?;

    // Linear search over every contact in the table
    Ok(all_contacts
        .into_iter()
        .find(|c| c.account_id == account_id))
}
With 100,000 contacts in the table, every lookup required scanning the entire table. Best-case latency was 800ms; worst-case exceeded 2,000ms. Query complexity was O(n), scaling linearly with table growth.
3.2 Adding GSIs for the Three Highest-Frequency Query Patterns Replaced Full Table Scans With Direct Partition Reads
Add Global Secondary Indices for high-frequency query patterns:
#[derive(DynamoDbEntity)]
#[dynamodb(table = "contacts")]
pub struct ContactEntity {
    #[dynamodb(partition_key)]
    pub id: String,

    #[dynamodb(sort_key)]
    pub tenant_id: String,

    // GSI5: PrimaryContactIndex for account lookups
    #[dynamodb(gsi5_partition_key)]
    pub account_id: String,

    // GSI6: ExecutiveIndex for role-based queries
    #[dynamodb(gsi6_partition_key)]
    pub role: String,

    #[dynamodb(gsi6_sort_key)]
    pub department: String,
}

// Generated query method (from derive macro)
pub async fn query_by_account(
    &self,
    account_id: &str,
) -> Result<Vec<Contact>> {
    self.repository
        .query_gsi5(account_id) // Direct GSI lookup
        .await
}
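Under the hood, a call like query_gsi5 corresponds to a DynamoDB Query issued against the index rather than the base table. A minimal sketch against aws_sdk_dynamodb is shown below; the index and attribute names are inferred from the annotations above and are assumptions:

```rust
use std::collections::HashMap;

use aws_sdk_dynamodb::{types::AttributeValue, Client};

// Roughly what a generated query_gsi5() does: query the GSI directly
// instead of scanning the base table.
pub async fn query_by_account_raw(
    client: &Client,
    account_id: &str,
) -> Result<Vec<HashMap<String, AttributeValue>>, aws_sdk_dynamodb::Error> {
    let output = client
        .query()
        .table_name("contacts")
        .index_name("PrimaryContactIndex") // GSI5, keyed on account_id
        .key_condition_expression("account_id = :account")
        .expression_attribute_values(":account", AttributeValue::S(account_id.to_string()))
        .send()
        .await?;

    Ok(output.items.unwrap_or_default())
}
```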
3.3 GSI Lookups Reduced Contact Query Latency From 800ms to 15ms and Converted O(n) Scans Into O(1) Partition Reads
| Metric | Before | After | Improvement |
|---|---|---|---|
| Query latency | 800ms | 15ms | 53x |
| Query complexity | O(n) | O(1) | Constant time |
| Read cost | Full table scan | Single partition read | Proportional to result set |
3.4 Three Indices Cover Account Lookups, Role-Based Queries, and Temporal Status Filtering — the Three Access Patterns With the Highest Observed Query Frequency
GSIs were created for the three highest-frequency query patterns:
| Query Pattern | Index | Primary Use Case |
|---|---|---|
| Account lookups | GSI5 (account_id) | Contact retrieval by account |
| Role queries | GSI6 (role + department) | Executive and role-based dashboards |
| Status filters | GSI7 (status + created_at) | Active contact lists with temporal ordering |
3.5 GSI Provisioning Is Justified When the Query Attribute Is Outside the Primary Key, Frequency Exceeds 100 Executions Per Day, and the Table Exceeds 10,000 Items
Create a GSI when:
- The query attribute is not part of the primary key
- The query pattern executes more than 100 times per day
- The table contains more than 10,000 items
Each GSI consumes additional write capacity and storage. GSI count per table should be bounded to 3–4 in most implementations to balance query performance against cost and write amplification. Evaluate each proposed GSI against actual query frequency data before provisioning.
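As a rough illustration of the write-amplification cost, assuming roughly 1 KB items and that the indexed attributes are present on every item: each base-table write is also applied to every GSI that projects the item, so a table with three GSIs pays roughly 1 + 3 = 4 write units per item written, plus the storage consumed by each index projection.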
4. Pattern 3: In-Memory Caching With TTL Achieves 96 Percent Hit Rates and 1000x Latency Improvement for Low-Churn Configuration Data
4.1 Webhook Configuration Was Fetched From DynamoDB on Every Invocation Despite Changing at Most Weekly, Producing 10,000 Unnecessary Reads Per Day
Webhook execution required loading hook configuration from DynamoDB on every invocation:
// Every webhook call hits DynamoDB
pub async fn execute_webhook(
    &self,
    hook_id: &str,
    payload: &str,
) -> Result<()> {
    // Load hook config from DynamoDB - 100ms
    let hook = self.repository
        .get_hook(hook_id)
        .await?;

    // Execute webhook - 150ms
    self.http_client
        .post(&hook.url)
        .body(payload.to_string())
        .send()
        .await?;

    Ok(())
}
Hook configuration changes at most weekly, yet it was retrieved on every execution. At 10,000 webhook calls per day, this produced 10,000 unnecessary DynamoDB reads and added 100ms of latency to every call.
4.2 A Five-Minute TTL Cache Reduces Database Reads to 288 Per Day While Bounding Maximum Configuration Staleness to an Operationally Acceptable Window
In-memory cache with a five-minute TTL:
use std::collections::HashMap;
use std::sync::Arc;
use std::time::{Duration, Instant};
use tokio::sync::RwLock;

pub struct CachedHookRepository {
    repository: Arc<DynamoDbRepository>,
    cache: Arc<RwLock<HashMap<String, CachedHook>>>,
    ttl: Duration,
}

struct CachedHook {
    hook: Hook,
    expires_at: Instant,
}

impl CachedHookRepository {
    pub fn new(repository: DynamoDbRepository) -> Self {
        Self {
            repository: Arc::new(repository),
            cache: Arc::new(RwLock::new(HashMap::new())),
            ttl: Duration::from_secs(300), // 5 minutes
        }
    }

    pub async fn get_hook(&self, hook_id: &str) -> Result<Hook> {
        // Check cache first
        {
            let cache = self.cache.read().await;
            if let Some(cached) = cache.get(hook_id) {
                if cached.expires_at > Instant::now() {
                    return Ok(cached.hook.clone()); // Cache hit: 0.1ms
                }
            }
        }

        // Cache miss or expired - fetch from DynamoDB
        let hook = self.repository.get_hook(hook_id).await?;

        // Update cache
        {
            let mut cache = self.cache.write().await;
            cache.insert(hook_id.to_string(), CachedHook {
                hook: hook.clone(),
                expires_at: Instant::now() + self.ttl,
            });
        }

        Ok(hook)
    }
}
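A minimal usage sketch, relying only on the types defined above; the cache sits behind the same get_hook() interface, so call sites do not change:

```rust
// Caching is transparent to the caller.
async fn handle_webhook_trigger(
    hooks: &CachedHookRepository,
    hook_id: &str,
) -> Result<()> {
    // First call for a given hook misses the cache and reads DynamoDB (~100ms);
    // later calls within the 5-minute TTL return from memory (~0.1ms).
    let _hook = hooks.get_hook(hook_id).await?;

    // ... execute the webhook using the loaded configuration ...
    Ok(())
}
```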
4.3 Caching Reduced Cache-Hit Latency From 100ms to 0.1ms and Daily DynamoDB Read Volume From 10,000 to 288 at a 96 Percent Hit Rate
| Metric | Before | After | Improvement |
|---|---|---|---|
| Cache hit latency | 100ms | 0.1ms | 1000x |
| Daily DynamoDB reads | 10,000 | 288 | 97% reduction |
| Cache hit rate | N/A | 96% | — |
4.4 A Five-Minute TTL Optimizes the Hit-Rate-to-Staleness Trade-off — In-Memory Cache Is Appropriate When a Single Instance Handles the Workload and Cross-Instance Consistency Is Not Required
A five-minute TTL balances the 83 percent hit rate of a one-minute window against the staleness risk of a 15-minute window. An in-memory cache is appropriate here because the application runs as a single instance, the dataset is smaller than 10MB, and cross-instance consistency is not required. Distributed cache infrastructure would add operational overhead without benefit for this workload.
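The 288 figure reported in the results table above follows directly from the TTL, assuming a single hot configuration key that is refreshed at most once per TTL window:

86,400 seconds per day ÷ 300-second TTL = 288 DynamoDB reads per day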
4.5 In-Memory Caching Is Appropriate When Read-to-Write Ratio Exceeds 10:1, Staleness of Several Minutes Is Acceptable, and the Dataset Fits Within 100MB
In-memory caching is appropriate when:
- The read-to-write ratio exceeds 10:1
- Staleness of up to several minutes is acceptable
- The cached dataset fits within 100MB of application memory
- The application runs as a single instance or consistency across instances is not required
Do not apply in-memory caching to user session data, financial transaction records, or real-time analytics. These data classes require either real-time consistency guarantees or cross-instance synchronization that in-memory caching cannot provide.
5. Pattern 4: Connection Pooling Eliminates TLS Handshake and DNS Lookup Overhead That Would Otherwise Account for One-Third of Per-Request Latency
5.1 Per-Request HTTP Client Instantiation Incurred 50ms of Connection Overhead on Every Webhook Call — One-Third of Total Request Latency
Webhook execution established a new HTTP client on every call, incurring 30–50ms TLS handshake and 10–20ms DNS lookup overhead on each request. Combined with a 100ms HTTP request, connection setup represented one-third of total per-call latency.
5.2 A Shared HTTP Client With Connection Pool Reuses Established Connections, Reducing Per-Call Overhead to Under 1ms on Pool Hits
Shared HTTP client with connection pool:
use reqwest::Client;
use std::time::Duration;

pub struct WebhookExecutor {
    // Shared HTTP client with connection pool
    client: Client,
}

impl WebhookExecutor {
    pub fn new() -> Self {
        let client = Client::builder()
            .pool_max_idle_per_host(32) // Keep 32 idle connections per host
            .pool_idle_timeout(Duration::from_secs(90))
            .timeout(Duration::from_secs(30))
            .build()
            .expect("Failed to build HTTP client");

        Self { client }
    }

    pub async fn execute_webhook(
        &self,
        url: &str,
        payload: &str,
    ) -> Result<()> {
        // Reuses connection from pool - no TLS handshake
        self.client
            .post(url)
            .body(payload.to_string())
            .send() // Only ~100ms (50ms saved)
            .await?;

        Ok(())
    }
}
5.3 Connection Pooling Reduced Connection Overhead by 50x at a 92 Percent Pool Hit Rate, Delivering 33 Percent Total Per-Call Latency Reduction
| Metric | Before | After | Improvement |
|---|---|---|---|
| Per-call latency | 150ms | 100ms | 33% |
| Connection overhead | 50ms | Less than 1ms (on pool hit) | 50x |
| Pool hit rate | N/A | 92% | — |
5.4 Connection Pooling Applies to All Services Making Frequent HTTP Requests and Is Typically a Configuration Change, Not a Code Change
Connection pooling applies to all services that make frequent HTTP requests, all database connections, and any external API calls with non-trivial TLS overhead. Most HTTP client libraries, including reqwest in Rust, include built-in connection pooling. Enabling it requires configuration, not code.
Connection pooling is one of the highest-return, lowest-effort optimizations available for HTTP-heavy workloads. If the application creates HTTP clients per request, this optimization should be the first change applied before any other latency reduction work.
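One minimal way to guarantee a single pooled client per process is a lazily initialized static. The sketch below uses std::sync::OnceLock and is an illustration of the approach, not the platform's actual wiring:

```rust
use std::sync::OnceLock;
use std::time::Duration;

use reqwest::Client;

// A single pooled client shared by every handler in the process.
static HTTP_CLIENT: OnceLock<Client> = OnceLock::new();

pub fn http_client() -> &'static Client {
    HTTP_CLIENT.get_or_init(|| {
        Client::builder()
            .pool_max_idle_per_host(32)
            .pool_idle_timeout(Duration::from_secs(90))
            .timeout(Duration::from_secs(30))
            .build()
            .expect("failed to build HTTP client")
    })
}
```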
6. Pattern 5: Derive Macro Code Generation Eliminates Missing GSI Annotations — an Accidental Full Table Scan Defect Class Undetectable in Development
6.1 Manual Boilerplate Across Seven Entities Produced 1,050 Lines of Error-Prone Code, Copy-Paste Defects, and Missing GSI Annotations That Caused Accidental Table Scans
Every DynamoDB entity required 100–200 lines of manual boilerplate for attribute mapping and query methods. Manual implementation across seven entities produced 1,050 lines of boilerplate with copy-paste errors, missing GSI annotations (causing accidental full table scans), and no compile-time validation of field names.
6.2 A DynamoDB Derive Macro Generates Attribute Mapping, Query Methods, and Compile-Time Validation From 15-Line Annotated Struct Definitions
Derive macro generating entity boilerplate at compile time:
// Macro-driven implementation - 15 lines total
#[derive(DynamoDbEntity)]
#[dynamodb(table = "contacts")]
pub struct ContactEntity {
    #[dynamodb(partition_key)]
    pub id: String,

    #[dynamodb(sort_key)]
    pub tenant_id: String,

    #[dynamodb(gsi5_partition_key)]
    pub account_id: String,

    #[dynamodb(gsi6_partition_key)]
    pub role: String,
}

// Macro generates:
// - from_item() / to_item() methods
// - query_gsi5() / query_gsi6() methods
// - Type-safe field accessors
// - Compile-time validation
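For context, the generated from_item/to_item pair is roughly equivalent to hand-written mapping like the sketch below; field handling is simplified here, and the real generated code also covers optional fields, numeric types, and error reporting:

```rust
use std::collections::HashMap;

use aws_sdk_dynamodb::types::AttributeValue;

impl ContactEntity {
    // Serialize the struct into a DynamoDB item map.
    pub fn to_item(&self) -> HashMap<String, AttributeValue> {
        HashMap::from([
            ("id".to_string(), AttributeValue::S(self.id.clone())),
            ("tenant_id".to_string(), AttributeValue::S(self.tenant_id.clone())),
            ("account_id".to_string(), AttributeValue::S(self.account_id.clone())),
            ("role".to_string(), AttributeValue::S(self.role.clone())),
        ])
    }

    // Deserialize a DynamoDB item map back into the struct.
    pub fn from_item(item: &HashMap<String, AttributeValue>) -> Option<Self> {
        Some(Self {
            id: item.get("id")?.as_s().ok()?.clone(),
            tenant_id: item.get("tenant_id")?.as_s().ok()?.clone(),
            account_id: item.get("account_id")?.as_s().ok()?.clone(),
            role: item.get("role")?.as_s().ok()?.clone(),
        })
    }
}
```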
6.3 Code Generation Reduced Boilerplate by 90 Percent per Entity and Moved Missing GSI Detection From Runtime Failures to Compile-Time Errors
| Metric | Before | After | Improvement |
|---|---|---|---|
| Lines per entity | 150 | 15 | 90% reduction |
| Total boilerplate (7 entities) | 1,050 lines | 105 lines | 945 lines eliminated |
| Time to add new entity | 30 minutes | 3 minutes | 10x |
| Copy-paste errors | Present | Eliminated | — |
| Missing GSI annotations detected | At runtime | At compile time | — |
The runtime performance of generated code is identical to equivalent hand-written code. The primary performance benefit is the elimination of a class of bugs—missing GSI annotations—that would otherwise produce accidental full table scans undetectable in development environments.
6.4 Code Generation Through Macros Is Appropriate When Structural Patterns Repeat Across Multiple Entities and Compile-Time Validation Can Prevent Runtime Errors
Code generation through macros or equivalent tooling is appropriate when:
- The same structural pattern repeats across multiple entities
- Manual implementation introduces error-prone boilerplate
- Compile-time validation can prevent runtime errors
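For readers unfamiliar with how such a macro is built, the skeleton below sketches the general shape using syn and quote. It is not the platform's actual macro and emits only a trivial impl; the comment marks where field attributes would be inspected and the mapping and query methods generated:

```rust
// In a separate proc-macro crate (Cargo.toml: [lib] proc-macro = true;
// dependencies: syn, quote).
use proc_macro::TokenStream;
use quote::quote;
use syn::{parse_macro_input, DeriveInput};

#[proc_macro_derive(DynamoDbEntity, attributes(dynamodb))]
pub fn derive_dynamodb_entity(input: TokenStream) -> TokenStream {
    let input = parse_macro_input!(input as DeriveInput);
    let name = &input.ident;

    // A real implementation walks input.data for struct fields, reads each
    // field's #[dynamodb(...)] attribute, and fails compilation if a key or
    // GSI annotation is missing - this is where the compile-time validation
    // described above lives. Here we only emit a trivial marker impl.
    let expanded = quote! {
        impl #name {
            pub fn entity_name() -> &'static str {
                stringify!(#name)
            }
        }
    };

    expanded.into()
}
```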
7. Profiling Revealed That the Highest-Impact Bottleneck Differed From Initial Intuition — Measurement Is a Prerequisite for Effective Optimization, Not a Recommended Practice
7.1 Profiling Identified Session Queries, Not Contact Queries, as the Dominant Contributor to Total Latency
The initial assumption in this engagement was that contact queries were the primary bottleneck, as they were the most visible in application logs. Profiling revealed that session queries were 10 times more frequent and constituted a larger share of total latency. Measurement is a prerequisite for effective optimization, not a recommended practice.
Recommended instrumentation: application-level latency at P50, P95, and P99 by operation; database slow query logs; memory allocation profiling per request; distributed tracing for cross-service request flows.
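A minimal sketch of per-operation latency instrumentation using std::time::Instant and the tracing crate; the operation names and target are illustrative, and in practice the emitted samples would feed a histogram from which P50/P95/P99 are derived:

```rust
use std::time::Instant;

// Wrap an async database call and emit its latency as a structured event.
pub async fn timed<T, F>(operation: &str, fut: F) -> T
where
    F: std::future::Future<Output = T>,
{
    let start = Instant::now();
    let result = fut.await;
    let elapsed_ms = start.elapsed().as_millis() as u64;
    tracing::info!(target: "perf", operation, elapsed_ms, "query completed");
    result
}

// Usage: let sessions = timed("get_sessions_for_capsule", repo.query(...)).await;
```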
7.2 Optimization Impact Is the Product of Frequency, Latency, and Business Criticality — Not Latency Alone
Not all slow queries merit equal optimization effort.
Impact = Frequency × Latency × Business Criticality
| Query | Frequency | Latency | Business Criticality | Priority |
|---|---|---|---|---|
| Session by context | 10,000/day | 500ms | High (API path) | 1 |
| Contact by account | 5,000/day | 800ms | High (dashboard) | 2 |
| Hook configuration load | 10,000/day | 100ms | Medium (background) | 3 |
| Administrative reports | 10/day | 2,000ms | Low (internal) | 4 |
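Applying the formula to the first and last rows illustrates the spread: session queries accumulate roughly 10,000 × 0.5 s = 5,000 seconds of user-facing latency per day, while administrative reports accumulate 10 × 2 s = 20 seconds per day, a 250x difference in aggregate impact before business criticality is even weighed.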
Each significant optimization decision should be captured in an Architecture Decision Record documenting the problem context, the decision made, positive and negative consequences, measured before-and-after metrics, and references to related decisions. ADRs preserve institutional knowledge and prevent regression when engineers unfamiliar with the original decision later modify the system. For a detailed ADR template, see ADRs as Architecture Documentation.
8. Five Patterns Delivered Latency Improvements From 33 Percent to 1000x, $420 Monthly Cost Savings, and 27 Developer Hours Saved per Month From Fewer Than 500 Lines of Code Changes
| Optimization | Metric | Before | After | Improvement |
|---|---|---|---|---|
| Query Scoping | Latency | 500ms | 50ms | 10x |
| | Memory | 50MB | 5MB | 10x |
| | DynamoDB reads | 1,000 items | 50 items | 20x |
| Contact GSI | Latency | 800ms | 15ms | 53x |
| | Complexity | O(n) | O(1) | — |
| Hook Caching | Cache hit latency | 100ms | 0.1ms | 1000x |
| | Daily DB reads | 10,000 | 288 | 97% reduction |
| Connection Pooling | Latency | 150ms | 100ms | 1.5x |
| | Connection overhead | 50ms | Less than 1ms | 50x |
| Code Generation | Lines of code | 1,050 | 105 | 90% reduction |
| | Dev time per entity | 30 min | 3 min | 10x |
Total savings: $420/month in reduced DynamoDB read capacity; 27 developer hours/month saved; fewer than 500 lines of code changed across all five patterns.
9. Recommendations
- Establish performance budgets per operation type before production launch. Define P95 latency targets for API endpoints, background jobs, and database queries; alert before user impact.
- Instrument every DynamoDB query with read unit consumption metrics. Full table scans introduced by code changes are undetectable in development. Production instrumentation is the only reliable mechanism.
- Require connection pool configuration in all HTTP client initialization. Treat single-use client instantiation as a blocking defect in code review.
- Limit GSI proliferation through a formal review process. Each GSI represents write amplification cost. Require benchmark evidence of query frequency before approving additions.
- Document significant performance decisions in Architecture Decision Records. Cache TTL values, GSI choices, and filter strategies are not self-evident from code. ADRs prevent regression and preserve institutional knowledge.
- Measure P95 and P99 latency, not averages. Average metrics mask tail latency. Optimization targets must be defined against percentile thresholds.
The fastest query is the one that is never executed. Before applying caching or indexing to a query, evaluate whether the query can be eliminated entirely. In one case, 60 percent of hook configuration queries were eliminated by embedding configuration in the triggering event payload, removing the lookup requirement entirely.
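A sketch of that idea, with illustrative field names rather than the platform's actual event schema: the producer, which already holds the hook configuration, embeds it in the event, so the consumer never performs the lookup.

```rust
use serde::{Deserialize, Serialize};

// The event carries the webhook configuration inline, so the consumer
// does not need to read the hooks table at all.
#[derive(Serialize, Deserialize)]
pub struct HookTriggeredEvent {
    pub tenant_id: String,
    pub payload: String,
    // Embedded at publish time by the producer, which already has the config.
    pub hook_url: String,
    pub hook_headers: Vec<(String, String)>,
}
```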
10. Conclusion and Forward Outlook
The five patterns documented here address distinct performance failure modes that are predictable, measurable, and correctable. Their value lies not in novelty but in systematic application: identifying the correct bottleneck through measurement, selecting the appropriate pattern, implementing it with verification, and documenting the rationale for future engineers.
As multi-tenant data volumes continue to grow, the cost of deferred optimization increases. Organizations that instrument performance from initial deployment and establish optimization standards before performance incidents occur will maintain system reliability and cost efficiency at scale. The patterns described here—particularly database-level filtering and connection pooling—require minimal engineering investment relative to their impact and should be treated as baseline standards rather than optional enhancements.
Disclaimer: This content represents my personal learning journey using AI for a personal project. It does not represent my employer's views, technologies, or approaches. All code examples are generic patterns or pseudocode for educational purposes. Performance numbers are from real implementations but have been sanitized and rounded for clarity.