Executive Summary
Event-driven architectures distribute causal responsibility across services in a way that no single point of observation can naturally recover. As the number of EventBridge rules, Lambda consumers, and DynamoDB Streams triggers grows beyond a handful, answering a routine operational question — “what happens when a booking is cancelled?” — requires reading source code across multiple repositories with no guarantee that the code reflects the deployed configuration. This is not a documentation deficit; documentation is a lagging indicator that decays within weeks of any unannounced event addition. It is an architectural deficit: the event contract is implicit in the producer code and the routing configuration, never declared as a first-class artifact. This paper presents a four-milestone architecture for an event catalog in which producers register event definitions at service startup, routing topology is captured alongside logical event definitions, and a query API makes the full graph available to both human engineers and automated development agents.

Key Findings
- Event topology is implicit in producer code by default; making it explicit requires a dedicated registration step at the architectural level, not documentation at the process level. Process-level documentation disciplines — wikis, OpenAPI files, architecture diagrams — are systematically bypassed under delivery pressure and go stale within the first unannounced event addition.
- Startup-time registration produces a catalog that reflects deployed reality, making the catalog a deployment health signal rather than a documentation artifact. If an event is registered, the service that registers it is running. If the service is not running, the event is absent from the catalog.
- Registering EventBridge rule mappings alongside logical event definitions gives the catalog physical routing visibility, enabling routing failure diagnosis without AWS console access. A catalog entry for a given event key includes the rule name, target ARN, and dead-letter queue ARN — the complete routing chain in a single queryable record.
- Schema versioning in the catalog surfaces consumer/producer drift as a discoverable artifact rather than a silent compatibility failure. A consumer registered against schema version 1 when the producer now emits version 2 is a machine-detectable condition; without the catalog, that drift is invisible until a deserialization failure occurs in production.
- A machine-queryable event catalog is qualitatively different infrastructure for AI development agents than source code or documentation. Source code requires interpretation; documentation requires trust. A catalog that reflects deployed reality requires neither — it is executable evidence of system topology.
1. Event Topology Visibility Is a Structural Problem: Documentation Cannot Reflect a Moving Target
The visibility problem in event-driven systems is architectural. A traditional synchronous service exposes its contract at its boundary: an HTTP endpoint has a path, a method, a request schema, and a response schema. That contract is directly inspectable at the boundary. An event producer has no such boundary. It emits to a bus, a stream, or a topic. The schema is in the payload. The consumer is somewhere else. The routing is in the infrastructure configuration. The result is that answering “what is the full downstream effect of event X?” requires three separate lookups: the producer codebase (what does the event payload contain?), the routing configuration (what rules forward this event type, and to what targets?), and each consumer codebase (what does each target do when it receives this event?). In a system with dozens of rules and consumers, this lookup is measured in hours, not minutes. In a system where the same event type is consumed by five services across four repositories, the lookup is unreliable — the developer has no guarantee they have found all consumers. This structural opacity has compounding effects. Developers making changes to event schemas do not know which consumers will break. Operators debugging routing failures must navigate the AWS console rather than a purpose-built interface. New engineers joining the team cannot form a correct mental model of system behavior from any single artifact. And automated development tools — agents writing or reviewing code — have no reliable source of truth for system topology beyond the source code they happen to have in context. The catalog architecture presented in this paper addresses the structural problem directly: rather than adding a documentation process on top of the existing implicit topology, it adds a registration step to the deployment lifecycle that makes topology explicit at the point of production.

2. Static Catalogs Go Stale Within Weeks Because They Depend on Continuous Developer Discipline to Stay Current
Static event catalogs are the standard first response to the visibility problem. An engineer notices that no one knows what events exist, writes a YAML file or a wiki page listing the known event types, and publishes it. The catalog is accurate on the day it is written. The problem is structural. A static catalog is accurate only if every subsequent change to the event topology — new event types, schema changes, new consumers, retired consumers — is reflected in the catalog update as a mandatory step in the development workflow. Under any delivery pressure, that step is skipped. Within weeks of the first unannounced event addition, the catalog is partially wrong. Within months, it is unreliable enough that developers stop consulting it. The failure mode is not negligence; it is incentive misalignment. The catalog update provides no immediate benefit to the engineer making the change — it is pure overhead. The benefit accrues to the next engineer who needs to understand the topology, who is not present in the room. No social contract or process discipline reliably overcomes this incentive structure at scale.

| Property | Static Catalog (YAML / Wiki) | Startup-Time Registration |
|---|---|---|
| Initial accuracy | High | High |
| Accuracy after 1 month | Degraded (depends on discipline) | Reflects deployed reality |
| Accuracy after 6 months | Low (systematically stale) | Reflects deployed reality |
| Requires developer discipline | Yes — every change must update catalog | No — registration is part of the service |
| Detects unregistered services | No | Yes — absence is signal |
| Machine-queryable | Only if tooling is built on top | Native (registration API) |
| Deployment health signal | No | Yes |
| Routing topology | Not typically included | Included (M2) |
The registration model does not eliminate the need for human-readable documentation entirely. Architecture decision records, sequence diagrams, and onboarding guides serve a different purpose than the catalog — they explain intent and rationale, not deployed state. The catalog replaces the class of documentation that tries to track deployed state, which is the class that goes stale.
3. Startup-Time Registration Ties Catalog State to Deployed Reality Rather Than Developer Documentation Effort
The fundamental design decision is that the catalog is populated by services at startup, not by developers at development time. Each service, on initialization, calls the catalog registration API with the event definitions it produces. If the service is running, its events are registered. If it is not running, they are not. This single decision changes the semantics of the catalog. Rather than “events that someone documented,” the catalog contains “events that a currently-running service has declared it produces.” The catalog is not a description of intent — it is an observation of deployed state. The registration payload for a single event definition carries the complete logical contract: the event key, the event type classification (Domain, Integration, or Notification), the owning aggregate, the service that owns it, the schema version, and the JSON schema itself.

4. Four Milestones Deliver Queryable Value at Each Stage Without Requiring Full Implementation Upfront
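As a concrete sketch, a registration payload carrying the logical contract described in Section 3 might look like the following. The field names and endpoint shape are illustrative assumptions, not a published API; the example fields mirror the booking.cancelled event discussed later in the paper.

```json
{
  "event_key": "booking.cancelled",
  "event_type": "Domain",
  "aggregate": "Booking",
  "owning_service": "scheduling-service",
  "schema_version": "2",
  "schema": {
    "type": "object",
    "required": ["booking_id", "cancelled_at", "reason"],
    "properties": {
      "booking_id": { "type": "string" },
      "cancelled_at": { "type": "string", "format": "date-time" },
      "reason": { "type": "string" },
      "cancelled_by_user_id": { "type": "string" },
      "refund_eligible": { "type": "boolean" }
    }
  }
}
```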
The catalog architecture is delivered in four milestones, each adding a distinct layer of visibility. Each milestone is independently deployable and provides immediate value without requiring the subsequent milestones to be in place.

Milestone 1 — Event Definition Registration
The first milestone establishes the registration API and the event definition model. Services register their event definitions at startup. The catalog stores these and exposes a query interface. At the end of M1, the catalog can answer: “what events exist in the system, what are their schemas, and which service produces each one?” This milestone alone eliminates the most common topology question: “does event X exist, and what does its payload look like?” Before the catalog, answering this question requires finding the producer service and reading the event type definition in source code. After M1, it requires one catalog query.

Milestone 2 — Consumer and Routing Catalog
The second milestone adds consumer registration and EventBridge rule mapping. Each consumer service registers, at startup, the event keys it handles along with the handler function name and the routing mechanism (EventBridge rule ARN, DynamoDB Streams trigger ARN, or SNS subscription ARN). The catalog now holds the complete physical routing topology alongside the logical event definitions. At the end of M2, the catalog can answer: “who consumes booking.cancelled, via what routing path, and which DLQ handles failures?” This query previously required navigating the AWS console across multiple EventBridge bus views and Lambda trigger configurations. After M2, it is a single API call.
The routing catalog entry for a single event key at M2 produces a response of the following form:
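One plausible sketch of that response, with field names assumed and ARNs replaced by placeholders using the standard AWS example account ID, is:

```json
{
  "event_key": "booking.cancelled",
  "producer": {
    "service": "scheduling-service",
    "schema_version": "2"
  },
  "consumers": [
    {
      "service": "notification-service",
      "handler": "handleBookingCancelled",
      "registered_against_version": "1",
      "routing": {
        "mechanism": "eventbridge_rule",
        "rule_name": "booking-cancelled-to-notification",
        "rule_arn": "arn:aws:events:us-east-1:123456789012:rule/platform-bus/booking-cancelled-to-notification",
        "target_arn": "arn:aws:lambda:us-east-1:123456789012:function:notification-handler",
        "dlq_arn": "arn:aws:sqs:us-east-1:123456789012:booking-cancelled-notification-dlq"
      }
    }
  ]
}
```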
Milestone 3 — Event Browser UI
The third milestone adds a browsable UI within the platform’s admin shell, accessible at /admin/events. The UI exposes three views:
Event List. A searchable, filterable table of all registered events. Filters include aggregate name, event type (Domain / Integration / Notification), and producing service. The search field matches against event key, aggregate, and service name. The list reflects the catalog’s current state — events that are not registered because their service is not running are absent, which is itself visible information.
Event Detail. For a selected event key, the detail view shows the full JSON schema, producer information, consumer list with handler names and registered schema versions, recent audit log entries for that event key, and the complete routing chain including the EventBridge rule and DLQ ARN.
Consumer Graph. A visual topology diagram for a selected event key, showing producer nodes connected to consumer nodes via routing edges. The graph is generated from the catalog data, not from a separately maintained diagram. It reflects the same deployed reality as the catalog.
Milestone 4 — External Repository Extension
The fourth milestone extends registration to services in separate repositories, once the cross-repository provisioning infrastructure is in place. The registration API and data model are unchanged. The only addition is the authentication and discovery mechanism that allows services outside the primary monorepo to locate and call the catalog registration endpoint. M4 is the milestone at which the catalog becomes a platform-wide artifact rather than a within-repository tool. The value of the catalog scales with the number of registered services; M4 is the step that maximizes coverage.

5. Schema Versioning in the Catalog Enables CI Gates That Catch Consumer Drift Before Production
Event schemas evolve. A booking.cancelled event that carried booking_id, cancelled_at, and reason in version 1 may acquire cancelled_by_user_id and refund_eligible in version 2. Producers emit the new schema. Consumers that have not been updated to handle the new fields continue to deserialize against the old schema. In most cases, this produces a silent partial deserialization — the new fields are ignored, and the consumer functions without error until a downstream workflow depends on a field that is not present.
The catalog addresses this by storing schema definitions per version key. A producer registering version 2 of booking.cancelled submits a new registration with "version": "2". Consumers register the version they handle. The catalog can then surface, as a query result, all consumers whose registered version does not match the current producer version.
| Scenario | Without Catalog | With Catalog (Schema Versioning) |
|---|---|---|
| Producer upgrades schema to v2 | Consumers fail silently or on next deploy | Catalog flags version drift immediately |
| New consumer added for v1 event | No validation; v2 compatibility unknown | Registration rejected or flagged if producer is on v2 |
| Consumer decommissioned | No record; topology is stale | Deregistration removes consumer entry |
| v1 consumer survives v2 migration | Silent partial deserialization | Visible as “version drift” in Event Detail view |
A CI deployment gate can query the catalog for any consumer whose registered_against_version does not match the producer’s current version, and fail the deployment or trigger an alert. This transforms schema compatibility from an implicit social contract into a machine-enforced invariant.
The following CI validation fragment illustrates a deployment gate that queries the catalog for version drift and fails if any consumer is lagging behind the current producer version:
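A minimal Python sketch of such a gate follows. The catalog base URL and endpoint path are hypothetical, and the response shape assumed here is the consumer list with per-consumer registered versions described at the end of Milestone 2.

```python
import json
import sys
from urllib.request import urlopen  # stdlib only; no deployment-specific client assumed


def find_version_drift(entry):
    """Return the services whose registered version lags the producer's.

    `entry` is a catalog record containing the producer's current
    schema_version and a list of consumers, each carrying the version
    it registered against.
    """
    current = entry["producer"]["schema_version"]
    return [
        consumer["service"]
        for consumer in entry["consumers"]
        if consumer["registered_against_version"] != current
    ]


def main(event_key, catalog_base="http://catalog.internal"):
    # Hypothetical query endpoint; the real path depends on the deployment.
    with urlopen(f"{catalog_base}/events/{event_key}") as resp:
        entry = json.load(resp)
    lagging = find_version_drift(entry)
    if lagging:
        print(f"version drift on {event_key}: lagging consumers: {', '.join(lagging)}")
        sys.exit(1)  # block the deployment
    print(f"all consumers of {event_key} match the current producer version")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

The gate is deliberately dumb: it does not attempt compatibility analysis, only version equality, which is exactly the invariant the catalog can enforce mechanically.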
6. A Machine-Queryable Topology Layer Changes What Autonomous Development Agents Can Do
The catalog’s value to human engineers is straightforward: it answers topology questions faster and more reliably than source code navigation. The catalog’s value to AI development agents is qualitatively different, not merely quantitative. An AI agent that must answer “what events does this system emit and who handles them?” without a catalog has two options: read source code across all relevant repositories, or rely on documentation provided in its context window. Source code reading is comprehensive but requires the agent to interpret implementation details to infer topology — a task that scales poorly with system size and requires the agent to hold a growing volume of source context. Documentation reading is faster but requires the agent to trust that the documentation is current, which it frequently is not. A catalog that reflects deployed reality offers a third option: query the topology directly. The agent does not need to interpret source code to learn that booking.cancelled is produced by the scheduling service, consumed by the notification and audit services, routed through a specific EventBridge rule, and currently on schema version 2 with one consumer still registered against version 1. That information is available in a single API call, in a structured format, reflecting the actual deployed state at query time.
This distinction — between topology that must be inferred and topology that can be queried — is the difference between an agent that can assist with changes in a known bounded context and an agent that can navigate changes across the full system. The catalog is infrastructure for the latter.
As AI development tooling matures, the catalog query interface — event definitions, consumer relationships, routing topology, schema versions — becomes an input to automated change impact analysis. An agent proposing a schema change can query the catalog to identify all consumers registered against the current version, assess the scope of the required migration, and generate the consumer update plan as part of the same change. Without the catalog, that analysis requires manual source code review for every consumer.
7. Recommendations for Building a Production Event Catalog
- Treat the event catalog as infrastructure in your system, not tooling. Deploy the catalog registration service before services begin emitting events to a new event bus. Retrofitting catalog registration onto a system with dozens of existing events is significantly more expensive than establishing registration at system inception.
- Make catalog registration failures non-fatal in your service initialization logic. A catalog service outage must not cascade to a platform outage. Log the failure, emit a metric, and allow the service to start without registration. Implement a retry loop in the background that re-attempts registration once the catalog becomes available.
- Include the physical routing topology in every consumer registration entry in your catalog — rule ARN, target ARN, DLQ ARN. Do not treat the catalog as a logical-only artifact. The routing details are the first thing needed in a routing failure diagnosis. Requiring an AWS console lookup defeats the operational value of the catalog.
- Implement the CI schema drift gate in your pipeline before upgrading any event schema to a new version. The gate has no cost when all consumers are current. Its cost when a consumer is lagging is a blocked deployment — which is the correct outcome. Establish the gate before it is needed, not after the first silent compatibility failure.
- Design your catalog query API as a first-class interface for automated tooling. Endpoint contracts, authentication, and response schemas should be versioned and documented. The catalog’s value as infrastructure for AI development agents depends on the query interface being stable and machine-readable, not just human-navigable through the UI.
- Extend registration to all services in your system during M4 before treating the catalog as authoritative. A catalog that covers 70% of the services is not 70% reliable — it is unreliable, because the absent 30% are precisely the services whose topology is unknown. Drive registration to full coverage before decommissioning other topology documentation artifacts.
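The second recommendation above, non-fatal registration with a background retry, might be sketched as follows. The registration endpoint, payload shape, and retry policy are all assumptions; the point is the control flow, in which a catalog outage degrades to a logged warning rather than a failed startup.

```python
import json
import logging
import threading
import time
from urllib.error import URLError
from urllib.request import Request, urlopen

log = logging.getLogger("event-catalog")


def register_events(catalog_url, definitions, retry_interval=30.0, max_attempts=None):
    """POST this service's event definitions to the catalog at startup.

    Registration failure is non-fatal: one attempt is made inline, and on
    failure a daemon thread keeps retrying in the background, so a catalog
    outage never cascades into a service startup failure.
    """
    body = json.dumps({"events": definitions}).encode()

    def attempt():
        req = Request(
            catalog_url, data=body, headers={"Content-Type": "application/json"}
        )
        with urlopen(req, timeout=5) as resp:
            return 200 <= resp.status < 300

    def retry_loop():
        attempts = 0
        while max_attempts is None or attempts < max_attempts:
            attempts += 1
            try:
                if attempt():
                    log.info("event catalog registration succeeded")
                    return
            except URLError as exc:
                log.warning("catalog registration retry failed: %s", exc)
            time.sleep(retry_interval)

    try:
        if attempt():
            return  # registered on the first try; nothing else to do
    except URLError as exc:
        log.warning("catalog unavailable at startup, retrying in background: %s", exc)
    # Daemon thread: dies with the process, never blocks shutdown.
    threading.Thread(target=retry_loop, daemon=True).start()
```

A service would call `register_events` once during initialization, before accepting traffic, and treat the return as fire-and-forget.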
Conclusion
The event catalog architecture described in this paper resolves the structural visibility deficit in event-driven systems by treating event topology as a first-class artifact that is populated at deployment time, not maintained at development time. Startup-time registration, physical routing capture, schema versioning, and a machine-queryable API together produce a catalog that reflects deployed reality rather than documented intent. The near-term effect is operational: developers answer topology questions from the catalog rather than from source code, routing failures are diagnosed from catalog entries rather than the AWS console, and schema drift is surfaced as a CI gate rather than a production failure. The longer-term effect is architectural: as autonomous development tooling becomes more capable, a machine-queryable catalog of system topology becomes a prerequisite for agents that operate across service boundaries. Systems that lack this infrastructure will encounter an upper bound on autonomous tooling capability that is set by context window size and source code interpretability. Systems that have built the catalog will not.

The four-milestone architecture is designed to deliver value at each stage without requiring full implementation before any value is realized. M1 alone — event definition registration — eliminates the most common topology question in distributed systems. M2 adds routing visibility. M3 makes the topology browsable. M4 extends coverage to the full platform. Teams operating event-driven systems at any stage of maturity will find an entry point in this progression that matches their current topology visibility needs.

All content represents personal learning from personal projects. Code examples are sanitized and generalized. No proprietary information is shared. Opinions are my own and do not reflect my employer’s views.