Executive Summary
Event-driven architectures distribute causal responsibility across services in a way that no single point of observation can naturally recover. As the number of EventBridge rules, Lambda consumers, and DynamoDB Streams triggers grows beyond a handful, answering a routine operational question — “what happens when a booking is cancelled?” — requires reading source code across multiple repositories with no guarantee that the code reflects the deployed configuration. This is not a documentation deficit; documentation is a lagging indicator that decays within weeks of any unannounced event addition. It is an architectural deficit: the event contract is implicit in the producer code and the routing configuration, never declared as a first-class artifact. This paper presents a four-milestone architecture for an event catalog in which producers register event definitions at service startup, routing topology is captured alongside logical event definitions, and a query API makes the full graph available to both human engineers and automated development agents.

Key Findings
- Event topology is implicit in producer code by default; making it explicit requires a dedicated registration step at the architectural level, not documentation at the process level. Process-level documentation disciplines — wikis, OpenAPI files, architecture diagrams — are systematically bypassed under delivery pressure and go stale within the first unannounced event addition.
- Startup-time registration produces a catalog that reflects deployed reality, making the catalog a deployment health signal rather than a documentation artifact. If an event is registered, the service that registers it is running. If the service is not running, the event is absent from the catalog.
- Registering EventBridge rule mappings alongside logical event definitions gives the catalog physical routing visibility, enabling routing failure diagnosis without AWS console access. A catalog entry for a given event key includes the rule name, target ARN, and dead-letter queue ARN — the complete routing chain in a single queryable record.
- Schema versioning in the catalog surfaces consumer/producer drift as a discoverable artifact rather than a silent compatibility failure. A consumer registered against schema version 1 when the producer now emits version 2 is a machine-detectable condition; without the catalog, that drift is invisible until a deserialization failure occurs in production.
- A machine-queryable event catalog is qualitatively different infrastructure for AI development agents than source code or documentation. Source code requires interpretation; documentation requires trust. A catalog that reflects deployed reality requires neither — it is executable evidence of system topology.
1. Event Topology Visibility Is a Structural Problem: Documentation Cannot Reflect a Moving Target
The visibility problem in event-driven systems is architectural. A traditional synchronous service exposes its contract at its boundary: an HTTP endpoint has a path, a method, a request schema, and a response schema. That contract is directly inspectable at the boundary. An event producer has no such boundary. It emits to a bus, a stream, or a topic. The schema is in the payload. The consumer is somewhere else. The routing is in the infrastructure configuration. The result is that answering “what is the full downstream effect of event X?” requires three separate lookups: the producer codebase (what does the event payload contain?), the routing configuration (what rules forward this event type, and to what targets?), and each consumer codebase (what does each target do when it receives this event?). In a system with dozens of rules and consumers, this lookup is measured in hours, not minutes. In a system where the same event type is consumed by five services across four repositories, the lookup is unreliable — the developer has no guarantee they have found all consumers. This structural opacity has compounding effects. Developers making changes to event schemas do not know which consumers will break. Operators debugging routing failures must navigate the AWS console rather than a purpose-built interface. New engineers joining the team cannot form a correct mental model of system behavior from any single artifact. And automated development tools — agents writing or reviewing code — have no reliable source of truth for system topology beyond the source code they happen to have in context. The catalog architecture presented in this paper addresses the structural problem directly: rather than adding a documentation process on top of the existing implicit topology, it adds a registration step to the deployment lifecycle that makes topology explicit at the point of production.

2. Static Catalogs Go Stale Within Weeks Because They Depend on Continuous Developer Discipline to Stay Current
Static event catalogs are the standard first response to the visibility problem. An engineer notices that no one knows what events exist, writes a YAML file or a wiki page listing the known event types, and publishes it. The catalog is accurate on the day it is written. The problem is structural. A static catalog is accurate only if every subsequent change to the event topology — new event types, schema changes, new consumers, retired consumers — is reflected in the catalog update as a mandatory step in the development workflow. Under any delivery pressure, that step is skipped. Within weeks of the first unannounced event addition, the catalog is partially wrong. Within months, it is unreliable enough that developers stop consulting it. The failure mode is not negligence; it is incentive misalignment. The catalog update provides no immediate benefit to the engineer making the change — it is pure overhead. The benefit accrues to the next engineer who needs to understand the topology, who is not present in the room. No social contract or process discipline reliably overcomes this incentive structure at scale.

| Property | Static Catalog (YAML / Wiki) | Startup-Time Registration |
|---|---|---|
| Initial accuracy | High | High |
| Accuracy after 1 month | Degraded (depends on discipline) | Reflects deployed reality |
| Accuracy after 6 months | Low (systematically stale) | Reflects deployed reality |
| Requires developer discipline | Yes — every change must update catalog | No — registration is part of the service |
| Detects unregistered services | No | Yes — absence is signal |
| Machine-queryable | Only if tooling is built on top | Native (registration API) |
| Deployment health signal | No | Yes |
| Routing topology | Not typically included | Included (M2) |
The registration model does not eliminate the need for human-readable documentation entirely. Architecture decision records, sequence diagrams, and onboarding guides serve a different purpose than the catalog — they explain intent and rationale, not deployed state. The catalog replaces the class of documentation that tries to track deployed state, which is the class that goes stale.
3. Startup-Time Registration Ties Catalog State to Deployed Reality Rather Than Developer Documentation Effort
The fundamental design decision is that the catalog is populated by services at startup, not by developers at development time. Each service, on initialization, calls the catalog registration API with the event definitions it produces. If the service is running, its events are registered. If it is not running, they are not. This single decision changes the semantics of the catalog. Rather than “events that someone documented,” the catalog contains “events that a currently-running service has declared it produces.” The catalog is not a description of intent — it is an observation of deployed state. The registration payload for a single event definition carries the complete logical contract: the event key, the event type classification (Domain, Integration, or Notification), the owning aggregate, the service that owns it, the schema version, and the JSON schema itself.

4. Four Milestones Deliver Queryable Value at Each Stage Without Requiring Full Implementation Upfront
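As a concrete sketch, a registration payload carrying the logical contract described in Section 3 might look like the following. The field names and endpoint shape are illustrative assumptions, not a published API; the example fields mirror the booking.cancelled event discussed later in the paper.

```json
{
  "event_key": "booking.cancelled",
  "event_type": "Domain",
  "aggregate": "Booking",
  "owning_service": "scheduling-service",
  "schema_version": "2",
  "schema": {
    "type": "object",
    "required": ["booking_id", "cancelled_at", "reason"],
    "properties": {
      "booking_id": { "type": "string" },
      "cancelled_at": { "type": "string", "format": "date-time" },
      "reason": { "type": "string" },
      "cancelled_by_user_id": { "type": "string" },
      "refund_eligible": { "type": "boolean" }
    }
  }
}
```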
The catalog architecture is delivered in four milestones, each adding a distinct layer of visibility. Each milestone is independently deployable and provides immediate value without requiring the subsequent milestones to be in place.

Milestone 1 — Event Definition Registration
The first milestone establishes the registration API and the event definition model. Services register their event definitions at startup. The catalog stores these and exposes a query interface. At the end of M1, the catalog can answer: “what events exist in the system, what are their schemas, and which service produces each one?” This milestone alone eliminates the most common topology question: “does event X exist, and what does its payload look like?” Before the catalog, answering this question requires finding the producer service and reading the event type definition in source code. After M1, it requires one catalog query.

Milestone 2 — Consumer and Routing Catalog
The second milestone adds consumer registration and EventBridge rule mapping. Each consumer service registers, at startup, the event keys it handles along with the handler function name and the routing mechanism (EventBridge rule ARN, DynamoDB Streams trigger ARN, or SNS subscription ARN). The catalog now holds the complete physical routing topology alongside the logical event definitions. At the end of M2, the catalog can answer: “who consumes booking.cancelled, via what routing path, and which DLQ handles failures?” This query previously required navigating the AWS console across multiple EventBridge bus views and Lambda trigger configurations. After M2, it is a single API call.
The routing catalog entry for a single event key at M2 produces a response of the following form:
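One plausible sketch of that response, with field names assumed and ARNs replaced by placeholders using the standard AWS example account ID, is:

```json
{
  "event_key": "booking.cancelled",
  "producer": {
    "service": "scheduling-service",
    "schema_version": "2"
  },
  "consumers": [
    {
      "service": "notification-service",
      "handler": "handleBookingCancelled",
      "registered_against_version": "1",
      "routing": {
        "mechanism": "eventbridge_rule",
        "rule_name": "booking-cancelled-to-notification",
        "rule_arn": "arn:aws:events:us-east-1:123456789012:rule/platform-bus/booking-cancelled-to-notification",
        "target_arn": "arn:aws:lambda:us-east-1:123456789012:function:notification-handler",
        "dlq_arn": "arn:aws:sqs:us-east-1:123456789012:booking-cancelled-notification-dlq"
      }
    }
  ]
}
```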
Milestone 3 — Event Browser UI
The third milestone adds a browsable UI within the platform’s admin shell, accessible at /admin/events. The UI exposes three views:
Event List. A searchable, filterable table of all registered events. Filters include aggregate name, event type (Domain / Integration / Notification), and producing service. The search field matches against event key, aggregate, and service name. The list reflects the catalog’s current state — events that are not registered because their service is not running are absent, which is itself visible information.
Event Detail. For a selected event key, the detail view shows the full JSON schema, producer information, consumer list with handler names and registered schema versions, recent audit log entries for that event key, and the complete routing chain including the EventBridge rule and DLQ ARN.
Consumer Graph. A visual topology diagram for a selected event key, showing producer nodes connected to consumer nodes via routing edges. The graph is generated from the catalog data, not from a separately maintained diagram. It reflects the same deployed reality as the catalog.
Milestone 4 — External Repository Extension
The fourth milestone extends registration to services in separate repositories, once the cross-repository provisioning infrastructure is in place. The registration API and data model are unchanged. The only addition is the authentication and discovery mechanism that allows services outside the primary monorepo to locate and call the catalog registration endpoint. M4 is the milestone at which the catalog becomes a platform-wide artifact rather than a within-repository tool. The value of the catalog scales with the number of registered services; M4 is the step that maximizes coverage.

5. Schema Versioning in the Catalog Enables CI Gates That Catch Consumer Drift Before Production
Event schemas evolve. A booking.cancelled event that carried booking_id, cancelled_at, and reason in version 1 may acquire cancelled_by_user_id and refund_eligible in version 2. Producers emit the new schema. Consumers that have not been updated to handle the new fields continue to deserialize against the old schema. In most cases, this produces a silent partial deserialization — the new fields are ignored, and the consumer functions without error until a downstream workflow depends on a field that is not present.
The catalog addresses this by storing schema definitions per version key. A producer registering version 2 of booking.cancelled submits a new registration with "version": "2". Consumers register the version they handle. The catalog can then surface, as a query result, all consumers whose registered version does not match the current producer version.
| Scenario | Without Catalog | With Catalog (Schema Versioning) |
|---|---|---|
| Producer upgrades schema to v2 | Consumers fail silently or on next deploy | Catalog flags version drift immediately |
| New consumer added for v1 event | No validation; v2 compatibility unknown | Registration rejected or flagged if producer is on v2 |
| Consumer decommissioned | No record; topology is stale | Deregistration removes consumer entry |
| v1 consumer survives v2 migration | Silent partial deserialization | Visible as “version drift” in Event Detail view |
A CI deployment gate can query the catalog for any consumer whose registered_against_version does not match the producer’s current version, and fail the deployment or trigger an alert. This transforms schema compatibility from an implicit social contract into a machine-enforced invariant.
The following CI validation fragment illustrates a deployment gate that queries the catalog for version drift and fails if any consumer is lagging behind the current producer version:
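A minimal Python sketch of such a gate follows. The catalog base URL and endpoint path are hypothetical, and the response shape assumed here is the consumer list with per-consumer registered versions described at the end of Milestone 2.

```python
import json
import sys
from urllib.request import urlopen  # stdlib only; no deployment-specific client assumed


def find_version_drift(entry):
    """Return the services whose registered version lags the producer's.

    `entry` is a catalog record containing the producer's current
    schema_version and a list of consumers, each carrying the version
    it registered against.
    """
    current = entry["producer"]["schema_version"]
    return [
        consumer["service"]
        for consumer in entry["consumers"]
        if consumer["registered_against_version"] != current
    ]


def main(event_key, catalog_base="http://catalog.internal"):
    # Hypothetical query endpoint; the real path depends on the deployment.
    with urlopen(f"{catalog_base}/events/{event_key}") as resp:
        entry = json.load(resp)
    lagging = find_version_drift(entry)
    if lagging:
        print(f"version drift on {event_key}: lagging consumers: {', '.join(lagging)}")
        sys.exit(1)  # block the deployment
    print(f"all consumers of {event_key} match the current producer version")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

The gate is deliberately dumb: it does not attempt compatibility analysis, only version equality, which is exactly the invariant the catalog can enforce mechanically.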
6. A Machine-Queryable Topology Layer Changes What Autonomous Development Agents Can Do
The catalog’s value to human engineers is straightforward: it answers topology questions faster and more reliably than source code navigation. The catalog’s value to AI development agents is qualitatively different, not merely quantitative. An AI agent that must answer “what events does this system emit and who handles them?” without a catalog has two options: read source code across all relevant repositories, or rely on documentation provided in its context window. Source code reading is comprehensive but requires the agent to interpret implementation details to infer topology — a task that scales poorly with system size and requires the agent to hold a growing volume of source context. Documentation reading is faster but requires the agent to trust that the documentation is current, which it frequently is not. A catalog that reflects deployed reality offers a third option: query the topology directly. The agent does not need to interpret source code to learn that booking.cancelled is produced by the scheduling service, consumed by the notification and audit services, routed through a specific EventBridge rule, and currently on schema version 2 with one consumer still registered against version 1. That information is available in a single API call, in a structured format, reflecting the actual deployed state at query time.
This distinction — between topology that must be inferred and topology that can be queried — is the difference between an agent that can assist with changes in a known bounded context and an agent that can navigate changes across the full system. The catalog is infrastructure for the latter.
As AI development tooling matures, the catalog query interface — event definitions, consumer relationships, routing topology, schema versions — becomes an input to automated change impact analysis. An agent proposing a schema change can query the catalog to identify all consumers registered against the current version, assess the scope of the required migration, and generate the consumer update plan as part of the same change. Without the catalog, that analysis requires manual source code review for every consumer.
7. Recommendations for Building a Production Event Catalog
- Treat the event catalog as infrastructure in your system, not tooling. Deploy the catalog registration service before services begin emitting events to a new event bus. Retrofitting catalog registration onto a system with dozens of existing events is significantly more expensive than establishing registration at system inception.
- Make catalog registration failures non-fatal in your service initialization logic. A catalog service outage must not cascade to a platform outage. Log the failure, emit a metric, and allow the service to start without registration. Implement a retry loop in the background that re-attempts registration once the catalog becomes available.
- Include the physical routing topology in every consumer registration entry in your catalog — rule ARN, target ARN, DLQ ARN. Do not treat the catalog as a logical-only artifact. The routing details are the first thing needed in a routing failure diagnosis. Requiring an AWS console lookup defeats the operational value of the catalog.
- Implement the CI schema drift gate in your pipeline before upgrading any event schema to a new version. The gate has no cost when all consumers are current. Its cost when a consumer is lagging is a blocked deployment — which is the correct outcome. Establish the gate before it is needed, not after the first silent compatibility failure.
- Design your catalog query API as a first-class interface for automated tooling. Endpoint contracts, authentication, and response schemas should be versioned and documented. The catalog’s value as infrastructure for AI development agents depends on the query interface being stable and machine-readable, not just human-navigable through the UI.
- Extend registration to all services in your system during M4 before treating the catalog as authoritative. A catalog that covers 70% of the services is not 70% reliable — it is unreliable, because the absent 30% are precisely the services whose topology is unknown. Drive registration to full coverage before decommissioning other topology documentation artifacts.
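The second recommendation above, non-fatal registration with a background retry, might be sketched as follows. The registration endpoint, payload shape, and retry policy are all assumptions; the point is the control flow, in which a catalog outage degrades to a logged warning rather than a failed startup.

```python
import json
import logging
import threading
import time
from urllib.error import URLError
from urllib.request import Request, urlopen

log = logging.getLogger("event-catalog")


def register_events(catalog_url, definitions, retry_interval=30.0, max_attempts=None):
    """POST this service's event definitions to the catalog at startup.

    Registration failure is non-fatal: one attempt is made inline, and on
    failure a daemon thread keeps retrying in the background, so a catalog
    outage never cascades into a service startup failure.
    """
    body = json.dumps({"events": definitions}).encode()

    def attempt():
        req = Request(
            catalog_url, data=body, headers={"Content-Type": "application/json"}
        )
        with urlopen(req, timeout=5) as resp:
            return 200 <= resp.status < 300

    def retry_loop():
        attempts = 0
        while max_attempts is None or attempts < max_attempts:
            attempts += 1
            try:
                if attempt():
                    log.info("event catalog registration succeeded")
                    return
            except URLError as exc:
                log.warning("catalog registration retry failed: %s", exc)
            time.sleep(retry_interval)

    try:
        if attempt():
            return  # registered on the first try; nothing else to do
    except URLError as exc:
        log.warning("catalog unavailable at startup, retrying in background: %s", exc)
    # Daemon thread: dies with the process, never blocks shutdown.
    threading.Thread(target=retry_loop, daemon=True).start()
```

A service would call `register_events` once during initialization, before accepting traffic, and treat the return as fire-and-forget.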
Conclusion
The event catalog architecture described in this paper resolves the structural visibility deficit in event-driven systems by treating event topology as a first-class artifact that is populated at deployment time, not maintained at development time. Startup-time registration, physical routing capture, schema versioning, and a machine-queryable API together produce a catalog that reflects deployed reality rather than documented intent. The near-term effect is operational: developers answer topology questions from the catalog rather than from source code, routing failures are diagnosed from catalog entries rather than the AWS console, and schema drift is surfaced as a CI gate rather than a production failure. The longer-term effect is architectural: as autonomous development tooling becomes more capable, a machine-queryable catalog of system topology becomes a prerequisite for agents that operate across service boundaries. Systems that lack this infrastructure will encounter an upper bound on autonomous tooling capability that is set by context window size and source code interpretability. Systems that have built the catalog will not.

The four-milestone architecture is designed to deliver value at each stage without requiring full implementation before any value is realized. M1 alone — event definition registration — eliminates the most common topology question in distributed systems. M2 adds routing visibility. M3 makes the topology browsable. M4 extends coverage to the full platform. Teams operating event-driven systems at any stage of maturity will find an entry point in this progression that matches their current topology visibility needs.

All content represents personal learning from personal projects. Code examples are sanitized and generalized. No proprietary information is shared. Opinions are my own and do not reflect my employer’s views.