Executive Summary
This paper documents the design and implementation of a Kubernetes operator that transforms a single `kubectl apply` into a fully provisioned, isolated SaaS tenant. Six milestones — CRD schema, Helm chart bundling, secret injection, NATS JetStream provisioning, air-gap distribution, and homelab E2E validation — were completed in a single sprint. The operator is written in Rust using the kube-rs framework and deploys on self-hosted k3s. The analysis surfaces three patterns with broad applicability: RAII-based secret cleanup via Rust’s Drop trait, annotation-driven blue/green upgrade coordination with no orchestration layer, and per-capsule JetStream stream provisioning as the messaging isolation primitive. A breaking API change in kube-core 0.88 (the `labels()` return type) produced a production-blocking compile error and is documented as an implementation constraint for teams adopting the same dependency version.
Key Findings
- A Kubernetes operator is the correct abstraction for multi-tenant SaaS provisioning: encoding tenant lifecycle as a cluster-scoped custom resource makes provisioning declarative, auditable, and resumable after operator restart — properties that ad-hoc provisioning scripts cannot provide.
- Rust’s `Drop` trait provides RAII-based secret cleanup at zero cost: writing Helm secret values to a tmpfs-backed file and wrapping the path in a struct whose `Drop` impl calls `fs::remove_file` guarantees secret deletion even on panic, without requiring `defer` semantics or explicit cleanup code at call sites.
- Blue/green operator upgrades can be coordinated entirely through CR annotations without a coordination service: a per-CR annotation stamped with the owning operator version allows two operator deployments to coexist safely, with each version processing only the CRs it owns.
- NATS JetStream subject filters provide per-tenant messaging isolation with no routing configuration: creating two streams per tenant with subject patterns `capsule.{CODE}.events.>` and `capsule.{CODE}.commands.>` eliminates cross-tenant message delivery at the JetStream level without application-layer filtering.
- Air-gap readiness requires a digest manifest and a mirror script at distribution time, not at deployment time: teams that defer air-gap packaging to deployment discover that digest pinning is impractical after the fact; the `images.txt` + `mirror-images.sh` pattern must be authored alongside the operator image build pipeline.
- kube-core 0.88 changed `ResourceExt::labels()` from returning `Option<&BTreeMap>` to returning `&BTreeMap` directly: code using `.and_then()` chaining on the return value fails to compile on upgrade; the fix requires replacing the `Option` chain with a direct `.get()` call on the map.
1. Introduction: The Provisioning Problem
A multi-tenant SaaS platform provisioning tenants through imperative scripts faces three structural problems. First, a script that fails mid-execution leaves infrastructure in a partially provisioned state with no automatic recovery path. Second, scripts are not auditable — there is no single object in the system that represents “this tenant exists and has these properties.” Third, provisioning is not idempotent; running a script twice risks duplicate resource creation. The operator pattern solves all three. A Kubernetes operator watches a custom resource (CR) and drives the cluster toward the declared state. If the operator pod is killed mid-reconciliation, it resumes on restart from the last persisted status. Every tenant has a CR that represents its existence and current state. The reconciler is written to be idempotent by construction — applying the same CR twice produces the same result. The operator described here — built in Rust using kube-rs — watches `EvaTenant` custom resources and executes a provisioning saga that includes namespace creation, NATS JetStream stream provisioning, seed data injection, Helm chart deployment, and ingress wiring. Six milestones completed a sprint’s worth of work from initial CRD design to homelab E2E validation.
2. The EvaTenant CRD Schema
The `EvaTenant` custom resource is the primary API for tenant provisioning. Its schema encodes the complete contract between the provisioning system and its operators.
2.1 Capsule Identity and the Six-Character Code
Each tenant is identified by a `capsule_code`: a six-character uppercase alphabetic string (pattern `^[A-Z]{6}$`). This code is the primary identifier for all per-tenant Kubernetes resources, NATS stream names, Helm release names, and DynamoDB key prefixes.
The constraint to exactly six uppercase ASCII letters was chosen for a specific reason: it is visually unambiguous (no digits that could be confused with letters, no lowercase), compact enough to appear in log lines and DNS labels without truncation, and admits a large enough namespace (26^6 = 308,915,776 possible codes) for any realistic tenant population.
The corresponding Rust field enforces this at deserialization time:
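The original snippet is not reproduced in this chunk; the following stand-in sketches the same `^[A-Z]{6}$` check in plain Rust. The function name is an assumption, and in the real spec struct the check would be wired in through a custom serde deserializer (for example via `#[serde(deserialize_with = ...)]`):

```rust
// Hypothetical stand-in for the deserialization-time validation: in the
// operator this logic would run inside a serde deserialize_with helper.
fn is_valid_capsule_code(code: &str) -> bool {
    // Exactly six ASCII uppercase letters, matching the CRD pattern ^[A-Z]{6}$.
    code.len() == 6 && code.bytes().all(|b| b.is_ascii_uppercase())
}
```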
The `#[serde(default)]` annotation provides backward compatibility with CRs created before the field was added — a design consideration whenever a field is added to an existing CRD in production.
2.2 Spec Fields
The full `EvaTenantSpec` structure contains the following fields relevant to provisioning:
| Field | Type | Description |
|---|---|---|
| `capsule_code` | `String` | Six-character uppercase identifier |
| `tier` | `TenantTier` | `Starter`, `Customer`, `Enterprise`, `Platform` |
| `lifecycle` | `CapsuleLifecycle` | `Active` or `Decommissioned` |
| `operator_version` | `String` | Semver string used for blue/green CR ownership |
| `seeds_version` | `String` | Git tag in the seed data repository |
| `versions` | `HashMap<String, String>` | Per-module version pins |
| `helm_values` | `Option<HashMap<String, JsonValue>>` | Per-capsule Helm overrides |
| `wave` | `u8` | Staged rollout wave (1 = internal, 2 = pilot, 3 = fleet; default: 1) |
2.3 The Five Status Conditions
The operator reports provisioning progress through five standard Kubernetes conditions, following KEP-1623 conventions:

| Condition | Meaning |
|---|---|
| `NamespaceReady` | Kubernetes namespace `eva-tenant-{code}` exists and is Active |
| `Provisioned` | Tenant record created in backing data store |
| `SeedLoaded` | Seed data applied (OR-resolved: provisioner flag OR init container exit 0) |
| `ModulesDeployed` | All Helm module releases are healthy |
| `IngressReady` | Ingress resources provisioned and routing traffic |
All five conditions must be `True` before the operator transitions the CR to `phase: Active`. The `additionalPrinterColumns` in the CRD make these fields visible in `kubectl get evatenants` output without requiring a `kubectl describe` call.
2.4 A Minimal Provisioning Request
A complete provisioning request for a Starter-tier tenant looks like:
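A sketch of such a manifest, with an assumed API group/version (`eva.aidonow.com/v1`) and illustrative field values:

```yaml
apiVersion: eva.aidonow.com/v1   # group/version is an assumption
kind: EvaTenant
metadata:
  name: devusr
spec:
  capsule_code: DEVUSR
  tier: Starter
  lifecycle: Active
  operator_version: "1.4.0"      # must match the running operator Deployment
  seeds_version: seeds-v12       # illustrative git tag
  versions:
    orbit-crm: "2.3.1"           # illustrative module pins
    orbit-itsm: "1.9.0"
  wave: 1
```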
The `operator_version` field serves a dual purpose: it pins which operator binary manages this capsule, and it participates in blue/green upgrade coordination. Setting this field incorrectly — for example, to a version not currently deployed — causes the CR to be permanently ignored by all running operators. Tooling that creates CRs should always source this value from the active operator Deployment’s annotation.

3. Operator Architecture
3.1 Reconciler Phase Dispatch
The reconciler is the heart of the operator. It is a Rust async function called by the kube-rs `Controller` on every watch event for an `EvaTenant` resource. The function dispatches on the current `status.phase` of the CR:
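A minimal sketch of the dispatch, assuming a `Phase` enum and the requeue intervals described in this section (names and the variant set are assumptions):

```rust
use std::time::Duration;

// Hypothetical Phase enum mirroring status.phase.
#[derive(Debug, PartialEq)]
enum Phase {
    Requested,
    Provisioning,
    Active,
}

// Requeue interval per phase: Requested transitions immediately after the
// provisioning API call, Provisioning polls every 30s, Active re-checks
// every 5 minutes for drift detection.
fn requeue_after(phase: &Phase) -> Option<Duration> {
    match phase {
        Phase::Requested => None,
        Phase::Provisioning => Some(Duration::from_secs(30)),
        Phase::Active => Some(Duration::from_secs(300)),
    }
}
```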
The `Requested` phase is the initial state. On the first reconcile cycle, the operator calls the provisioning API and transitions to `Provisioning`, where it polls status on a 30-second requeue interval until all five conditions become `True`.
The Active phase requeues every 5 minutes for health drift detection. This interval is the primary mechanism for detecting and correcting any state that has diverged from the declared spec since the last reconcile.
3.2 Finalizer and the Decommission Saga
Every `EvaTenant` CR has the finalizer `eva.aidonow.com/tenant-protection` attached on first reconcile. This prevents Kubernetes from deleting the CR until the operator explicitly removes the finalizer — ensuring cleanup runs before the resource disappears from the API server.
The decommission saga executes in a defined order when DeletionTimestamp is set on the CR:
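A minimal sketch of the non-aborting step loop (step names, closures, and the error type are assumptions; only the log-and-continue behavior is taken from the text):

```rust
// Each step logs its failure and the saga continues: a NATS timeout cannot
// block the Helm uninstall that follows. Returns the list of logged failures.
fn run_saga(steps: Vec<(&str, Box<dyn Fn() -> Result<(), String>>)>) -> Vec<String> {
    let mut failures = Vec::new();
    for (name, step) in steps {
        if let Err(err) = step() {
            // In the operator this would be a structured log line, not a Vec.
            failures.push(format!("{name}: {err}"));
        }
    }
    failures
}
```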
Each step in the saga resolves to `Ok(())`. Individual step failures are logged but do not abort subsequent steps. This non-blocking design ensures that a NATS timeout does not prevent the Helm uninstall from proceeding.
3.3 Kubernetes Events
The operator emits typed Kubernetes Events at each major phase transition: `ProvisioningStarted`, `NamespaceCreated`, `SeedLoaded`, `ModuleDeployStarted`, `ModuleDeployComplete`, `IngressProvisioned`, `ProvisioningComplete`, `DeprovisionStarted`, `DeprovisionComplete`. These events appear in `kubectl describe evatenant` output and in cluster event streams, providing an audit trail of the provisioning lifecycle without requiring access to operator logs.
3.4 Per-Tier Resource Quotas
Kubernetes `ResourceQuota` and `LimitRange` objects are applied to every capsule namespace. The quota is sized by tier:
| Tier | CPU Requests | Memory | Max Pods |
|---|---|---|---|
| Starter | 2 cores | 4Gi | 20 |
| Customer | 4 cores | 8Gi | 40 |
| Enterprise | unrestricted | unrestricted | unrestricted |
| Platform | unrestricted | unrestricted | unrestricted |
A `LimitRange` with container defaults (500m CPU, 512Mi memory) applies to all tiers. This prevents unbounded resource consumption from containers that do not specify their own resource requests — a class of misconfiguration that is common during early development.
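A sketch of such a `LimitRange` (the object name is an assumption; the defaults match the values in this section):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: eva-container-defaults   # name is an assumption
spec:
  limits:
    - type: Container
      default:            # default limits for containers that omit them
        cpu: 500m
        memory: 512Mi
      defaultRequest:     # default requests
        cpu: 500m
        memory: 512Mi
```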
4. Milestone 2: Bundling Helm Charts Into the Operator Image Enables Disconnected Deployment
The six module Helm charts (eva-forge, eva-gateway, eva-shell, eva-whisper, orbit-crm, orbit-itsm) are bundled directly into the operator Docker image. This is a deliberate distribution decision: the operator image is self-contained and does not require network access to a Helm registry at deployment time.
The chart bundling is implemented in the Dockerfile with a single COPY directive:
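A hypothetical fragment of that Dockerfile (source and destination paths are assumptions):

```dockerfile
# Bundle all module charts into the operator image with one COPY;
# CHART_BASE_PATH resolves to /charts inside the container.
COPY charts/ /charts/
```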
The `CHART_BASE_PATH` environment variable is read at operator startup and stored in the `Context` struct that is threaded through every reconcile call:
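A sketch of that wiring, with assumed struct and method names:

```rust
use std::env;

// Hypothetical Context: the real struct would also carry clients and config.
#[derive(Clone)]
struct Context {
    chart_base_path: String,
}

impl Context {
    // Read once at operator startup; defaults to the in-image bundle path.
    fn from_env() -> Self {
        let chart_base_path =
            env::var("CHART_BASE_PATH").unwrap_or_else(|_| "/charts".to_string());
        Context { chart_base_path }
    }

    // e.g. /charts/modules/orbit-crm/ inside the operator container.
    fn module_chart_path(&self, module_name: &str) -> String {
        format!("{}/modules/{}/", self.chart_base_path, module_name)
    }
}
```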
In local development, `CHART_BASE_PATH` can point at a working-tree `charts/` checkout without rebuilding the image. In the operator container, it resolves to `/charts`, making each module’s chart path `/charts/modules/{module_name}/`.
Bundling charts into the operator image couples the chart release cycle to the operator release cycle. For teams where chart releases are more frequent than operator releases, an alternative is to use a Helm repository URL stored in the CR spec — but this introduces a network dependency at deploy time, which is incompatible with air-gap installations. The bundled approach was chosen specifically to support air-gap deployment as a first-class requirement.
5. Milestone 3: RAII Secret Files on tmpfs Guarantee Secrets Never Persist to Disk
5.1 The Security Requirement
Helm chart installations require credentials — database passwords, API keys, connection strings — to be passed as values. The two mechanisms are `--set key=value` and `--values file.yaml`. The `--set` approach embeds the secret value as a command-line argument, which appears in:
- `helm get values` output (stored in a Kubernetes Secret by Helm itself)
- Process argument lists readable from `/proc/{pid}/cmdline` on Linux
- CI runner log output
The `--values` approach instead passes secrets through a file. If that file is on the container’s overlay filesystem, it persists until the container is terminated. The correct implementation writes the values file to a tmpfs-backed directory, ensuring the content lives only in memory and is never written to persistent storage.
5.2 The TmpfsSecretFile Pattern
The `TmpfsSecretFile` struct implements this guarantee using Rust’s `Drop` trait:
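A sketch of the struct under the behavior described here (field and method names are assumptions; `create_in` is a seam added for illustration so the tmpfs path is not hard-wired into every call):

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::{SystemTime, UNIX_EPOCH};

const TMPFS_SECRET_DIR: &str = "/run/helm-secrets";

struct TmpfsSecretFile {
    path: PathBuf,
}

impl TmpfsSecretFile {
    // Production entry point: write rendered values onto the tmpfs mount.
    fn write(capsule_code: &str, values_yaml: &str) -> io::Result<Self> {
        Self::create_in(Path::new(TMPFS_SECRET_DIR), capsule_code, values_yaml)
    }

    // Unique filename per reconcile loop: {capsule_code}-{timestamp_ns}.yaml.
    fn create_in(dir: &Path, capsule_code: &str, values_yaml: &str) -> io::Result<Self> {
        let ts = SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_nanos();
        let path = dir.join(format!("{capsule_code}-{ts}.yaml"));
        fs::write(&path, values_yaml)?;
        Ok(TmpfsSecretFile { path })
    }
}

impl Drop for TmpfsSecretFile {
    fn drop(&mut self) {
        // Runs on scope exit, early return, and panic unwind alike.
        let _ = fs::remove_file(&self.path);
    }
}
```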
The `TMPFS_SECRET_DIR` constant (`"/run/helm-secrets"`) must point at a directory backed by a tmpfs volume in the operator pod spec:
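A hypothetical pod-spec fragment providing that mount (the volume name is an assumption):

```yaml
volumes:
  - name: helm-secrets
    emptyDir:
      medium: Memory        # RAM-backed tmpfs, never flushed to disk
containers:
  - name: operator
    volumeMounts:
      - name: helm-secrets
        mountPath: /run/helm-secrets
```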
The `medium: Memory` directive instructs Kubernetes to back the `emptyDir` volume with a RAM-backed tmpfs mount. Content written to this directory is never flushed to disk, and the entire volume is discarded when the pod is evicted or terminated.
5.3 Why Drop Semantics Matter Here
The `Drop` impl runs when the `TmpfsSecretFile` value goes out of scope — including on panic and on early return from the function. In Rust, this is guaranteed by the compiler; there is no `defer` keyword or exception handler required. A function that creates a `TmpfsSecretFile` and then calls `helm_upgrade_install` cannot “forget” to clean up the file, because the cleanup is encoded in the type system.
The timestamp suffix ({capsule_code}-{timestamp_ns}.yaml) handles the case where two reconcile loops run concurrently for different capsules — a realistic scenario when multiple tenants are provisioning simultaneously. Each loop creates its own file, and each file is deleted independently when its scope ends.
5.4 The Queue Values Render Function
The operator renders per-capsule NATS connection URLs into the Helm values format before writing to the tmpfs file:

6. Annotation-Driven Blue/Green Upgrades Let Two Operator Versions Coexist Without a Coordination Service
6.1 The Upgrade Problem
Upgrading a Kubernetes operator in a multi-tenant environment presents a risk that single-tenant operators do not face: if the new operator version immediately takes ownership of all CRs the moment it starts, it may begin reconciling tenants that the old operator was actively managing. This can produce duplicate Helm releases, conflicting status writes, and mid-reconciliation state corruption. A naive approach — simply applying the new operator Deployment and waiting for the rolling update — provides no protection against this race. The new operator pod may receive watch events for all CRs immediately on startup, before the old pod has finished its final reconcile cycles.

6.2 Annotation-Based Ownership
The operator solves this through a CR-level ownership annotation. Each operator reads its own version from the `OPERATOR_VERSION` environment variable (injected via the Deployment annotation `eva.aidonow.com/operator-version`). Before processing any CR, the reconciler checks the corresponding annotation on the `EvaTenant` object:
- Deploy the new operator version (v2) alongside the old (v1)
- v2 claims CRs with absent annotations, v1 continues managing annotated CRs
- CRs naturally migrate to v2 as they complete their current reconcile cycle and v1 removes its annotation (or the CR is recreated)
- Once v2 owns all CRs, v1 can be terminated
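The per-CR ownership check described above can be sketched as follows (the annotation key follows the Deployment annotation named earlier; the enum and function names are assumptions):

```rust
use std::collections::BTreeMap;

const OWNER_ANNOTATION: &str = "eva.aidonow.com/operator-version";

#[derive(Debug, PartialEq)]
enum Ownership {
    Mine,      // already stamped with our version: reconcile
    Claimable, // no owner yet: stamp our version, then reconcile
    Foreign,   // owned by another running version: skip
}

fn check_ownership(annotations: &BTreeMap<String, String>, my_version: &str) -> Ownership {
    match annotations.get(OWNER_ANNOTATION).map(String::as_str) {
        Some(v) if v == my_version => Ownership::Mine,
        Some(_) => Ownership::Foreign,
        None => Ownership::Claimable,
    }
}
```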
6.3 The Wave Label for Staged Rollout
Separate from version ownership, the `wave` field on the `EvaTenantSpec` enables staged rollout. Wave 1 is internal or test capsules; wave 2 is a pilot cohort; wave 3 is the full fleet. The operator stamps this value as a label on the CR on every reconcile cycle:
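A sketch of the staleness check behind that stamp (the label key is an assumption, modeled on the annotation naming used elsewhere in this document):

```rust
use std::collections::BTreeMap;

const WAVE_LABEL: &str = "eva.aidonow.com/wave";

// Returns true when a patch is needed, i.e. the label is absent or stale;
// skipping the patch avoids an API write on every reconcile cycle.
fn wave_label_needs_update(labels: &BTreeMap<String, String>, wave: u8) -> bool {
    labels.get(WAVE_LABEL) != Some(&wave.to_string())
}
```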
7. Milestone 4: Per-Capsule JetStream Streams Isolate Tenant Messages at the Server Level
7.1 Architecture: Per-Capsule Stream Isolation
Each provisioned capsule receives two dedicated NATS JetStream streams:

| Stream name | Subject filter | Purpose |
|---|---|---|
| `eva-{CODE}-events` | `capsule.{CODE}.events.>` | Domain events fan-out |
| `eva-{CODE}-commands` | `capsule.{CODE}.commands.>` | Intra-capsule commands |

A consumer on `capsule.DEVUSR.events.>` cannot receive messages published to `capsule.OTHRT.events.>` because JetStream’s subject-based routing enforces the filter at the server level. There is no application-layer filter required.
The wildcard token > in NATS subject patterns matches all remaining tokens in the subject hierarchy, including tokens with dots. capsule.DEVUSR.events.> matches capsule.DEVUSR.events.order.created, capsule.DEVUSR.events.user.profile.updated, and all other subjects with that prefix.
7.2 The MessagingProvider Trait
The messaging implementation is abstracted behind a trait to support both the production NATS path and a test-time mock. `configure_routing` validates that subject filters are correctly formed for the given capsule code. On NATS JetStream, routing is implicit once the streams are created with the correct subject filters — the method verifies this assumption without making additional API calls. The method exists in the trait to provide a hook for future multi-cloud implementations where routing configuration is a separate API call.
`configure_alarms` writes a Prometheus alerting rule YAML file to `/etc/prometheus/alerts/eva-{CODE}-queue-depth.yaml`. The rule fires when a stream’s message count exceeds 80% of its `MaxMsgs` limit for more than 5 minutes — a leading indicator of consumer lag before the stream reaches capacity.
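Given these descriptions, the trait might be declared as follows (the signatures, synchronous form, and error type are assumptions; the real operator likely uses async methods):

```rust
// Hypothetical error type standing in for the operator's messaging errors.
#[derive(Debug)]
struct MessagingError(String);

trait MessagingProvider {
    // Creates the per-capsule events and commands streams (idempotent).
    fn provision_streams(&self, capsule_code: &str) -> Result<(), MessagingError>;
    // Verifies subject filters are well-formed; routing is implicit on NATS.
    fn configure_routing(&self, capsule_code: &str) -> Result<(), MessagingError>;
    // Writes the per-capsule queue-depth Prometheus alert rule.
    fn configure_alarms(&self, capsule_code: &str) -> Result<(), MessagingError>;
    // Deletes both streams during decommission.
    fn deprovision_streams(&self, capsule_code: &str) -> Result<(), MessagingError>;
}
```

A test-time mock then implements the same trait with no-op methods, which is what makes the production NATS path swappable in unit tests.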
7.3 Idempotency
The NATS JetStream API returns error code 10058 (“stream name already in use”) when a stream creation request names an existing stream. The `NatsProvider` implementation treats this as a success:
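A minimal sketch of that handling (the `ApiError` type is an assumption standing in for the NATS client’s API error):

```rust
struct ApiError {
    code: u32,
}

const ERR_STREAM_NAME_ALREADY_IN_USE: u32 = 10058;

// "Stream name already in use" means a previous reconcile already created
// the stream, so the create is treated as an idempotent success.
fn stream_create_result(result: Result<(), ApiError>) -> Result<(), ApiError> {
    match result {
        Err(ApiError { code: ERR_STREAM_NAME_ALREADY_IN_USE }) => Ok(()),
        other => other,
    }
}
```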
7.4 Prometheus Metrics
The operator exposes a `/metrics` endpoint serving Prometheus-format metrics. A `capsule_queue_depth` gauge tracks NATS JetStream stream message counts per capsule, updated on every reconcile cycle for `Active` tenants:
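A sketch of the gauge in the Prometheus text exposition format (label names are assumptions; a real operator would use a metrics crate rather than hand-rolling the rendering):

```rust
// samples: (capsule_code, stream_name, message_count) per JetStream stream.
fn render_queue_depth(samples: &[(&str, &str, u64)]) -> String {
    let mut out = String::from("# TYPE capsule_queue_depth gauge\n");
    for (capsule, stream, depth) in samples {
        out.push_str(&format!(
            "capsule_queue_depth{{capsule=\"{capsule}\",stream=\"{stream}\"}} {depth}\n"
        ));
    }
    out
}
```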
8. Milestone 5: Digest-Pinned Image Manifests Enable Deployment Without Internet Access
8.1 The Air-Gap Distribution Problem
Air-gap environments — networks with no outbound internet access — require all container images to be available in an internal registry before installation begins. The typical failure mode for teams that do not plan for this: images are pulled by image tag at installation time, the tag is not available in the internal registry, the installation fails with an obscure pull error, and the root cause is a distribution problem that should have been solved before the deployment attempt. The correct solution is to build the image digest manifest at release time, not at deployment time.

8.2 The images.txt Manifest
The `images.txt` file is the authoritative list of all container images required to run the operator and its managed workloads:
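A sketch of the manifest format (registry host and image names are illustrative):

```text
# images.txt — one fully qualified, digest-pinned reference per line
registry.example.com/eva/eva-tenant-operator@sha256:PLACEHOLDER
registry.example.com/eva/modules/orbit-crm@sha256:PLACEHOLDER
registry.example.com/eva/modules/orbit-itsm@sha256:PLACEHOLDER
```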
Pinning images by digest (`@sha256:{digest}`) rather than by tag-only references ensures that the operator and its modules are version-locked at a cryptographic level. Tags are mutable — a tag pushed by a later build silently replaces the previous image. A digest reference cannot be changed without modifying the `images.txt` file explicitly.
The CI pipeline populates PLACEHOLDER values with real digests at release time by capturing docker inspect --format='{{index .RepoDigests 0}}' after each image push. The release pipeline blocks on any remaining PLACEHOLDER strings before creating the release artifact.
8.3 The mirror-images.sh Script
The `mirror-images.sh` script consumes `images.txt` and mirrors each image to an internal registry:
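The retagging step at the heart of the script can be sketched as a small helper (the function name and argument layout are assumptions; the full script also pulls, pushes, and verifies digests):

```shell
# Hypothetical helper from mirror-images.sh: derive the internal-registry
# target for a digest-pinned source reference by swapping the registry host.
mirror_target() {
  local dest_registry="$1" image="$2"
  # ${image#*/} strips the source registry host, keeping repo path + digest.
  printf '%s/%s\n' "$dest_registry" "${image#*/}"
}
```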
The script supports both `docker` and `podman` (auto-detected or specified via `--podman`), performs digest verification after pull and after push, and reports a non-zero exit code if any image fails to mirror. This makes the script suitable for use in automated pre-installation validation pipelines.
8.4 The quickstart.sh Bootstrap Flow
The `quickstart.sh` script provides a single-command path from a bare Kubernetes cluster to a provisioned capsule. Its five steps match the provisioning sequence:
The `--air-gap` flag prevents the script from attempting any public registry pulls. The `--wave` flag stamps the wave label on the CR at creation time. The `--dry-run` flag runs through all steps with `kubectl apply --dry-run=client`, producing the full output without making any changes to the cluster.
9. The kube-core 0.88 API Break
9.1 The Breaking Change
During development, an upgrade to kube-core 0.88 (the kube-rs client crate) introduced a compile-time break in the `ensure_wave_label` function. The function reads the current wave label from the CR’s metadata to avoid issuing unnecessary patch requests. The code before the upgrade:
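The original snippet is not reproduced in this chunk; a reconstruction of the pattern, with a plain `Option<BTreeMap>` standing in for the typed CR metadata (the label key follows the wave label used by `ensure_wave_label`):

```rust
use std::collections::BTreeMap;

// Pre-0.88: ResourceExt::labels() returned Option<&BTreeMap<String, String>>.
// labels_pre_088 stands in for that older accessor.
fn labels_pre_088(
    meta_labels: &Option<BTreeMap<String, String>>,
) -> Option<&BTreeMap<String, String>> {
    meta_labels.as_ref()
}

// The shape of the code that broke: an Option chain starting at labels().
fn current_wave(meta_labels: &Option<BTreeMap<String, String>>) -> Option<String> {
    labels_pre_088(meta_labels)
        .and_then(|labels| labels.get("eva.aidonow.com/wave"))
        .cloned()
}
```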
kube-core 0.88 changed the return type of `ResourceExt::labels()` from `Option<&BTreeMap<String, String>>` to `&BTreeMap<String, String>` directly. Calling `.and_then()` on a `&BTreeMap` — a value, not an `Option` — produces a type error that the Rust compiler reports as E0599 (method not found) with E0282 (type inference failure) as a secondary diagnostic.
9.2 The Fix
The fix is a direct `.get()` call on the map:
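A sketch of the fixed chain, using a plain `BTreeMap` as a stand-in for the value `labels()` now returns:

```rust
use std::collections::BTreeMap;

// kube-core 0.88: labels() returns &BTreeMap<String, String> directly, so
// the chain starts at .get(), which itself returns Option<&String>.
fn current_wave(labels: &BTreeMap<String, String>) -> Option<String> {
    labels.get("eva.aidonow.com/wave").cloned()
}
```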
The `.get()` method on `BTreeMap` returns `Option<&V>`, so the `.and_then()` chain is still valid — it just starts from the `Option` returned by `.get()` rather than from the `Option` that `labels()` previously returned.
10. Milestone 6: SLA and Chaos Tests Verify Provisioning Latency and Reconciliation Idempotency Under Restart
10.1 Test Suite Design
Six E2E tests gate GA readiness. All are marked `#[ignore]` in the Rust test suite, which means they are excluded from normal `cargo test` runs and must be invoked explicitly with `cargo test --test homelab_e2e_test -- --ignored`. They require a live Kubernetes cluster with the operator deployed.
| Test | SLA / Constraint | What Is Verified |
|---|---|---|
| SLA — Starter tier | All 5 conditions True within 300s | Provisioning latency under load |
| SLA — Enterprise tier | All 5 conditions True within 480s | Provisioning latency for larger module set |
| Blue/green upgrade | Annotation updated, IngressReady stable | Upgrade coordination without traffic disruption |
| Decommission saga | CR and namespace deleted within 300s | Full cleanup, zero data residue |
| Chaos — kill after NamespaceReady | Full recovery after pod kill | Reconciler idempotency after restart |
| Chaos — kill after Provisioned | Full recovery after pod kill | Idempotency at a later saga checkpoint |
10.2 SLA Measurement
The SLA tests instrument per-condition timing by polling for each condition in order and recording elapsed time at each transition. Each run writes `/tmp/sla-results-{tier}.json` containing per-condition elapsed times, the total elapsed time, and a boolean `sla_pass`. This artifact is available for post-run analysis in CI even if the test assertion fails.
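The instrumentation loop can be sketched as follows (the function shape is an assumption; the real test would poll the CR status through the Kubernetes API):

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

// Poll each condition in order, recording elapsed seconds at the moment it
// becomes true. condition_is_true stands in for a CR status query.
fn measure_conditions<F>(conditions: &[&str], mut condition_is_true: F) -> Vec<(String, f64)>
where
    F: FnMut(&str) -> bool,
{
    let start = Instant::now();
    let mut timings = Vec::new();
    for cond in conditions {
        while !condition_is_true(cond) {
            sleep(Duration::from_millis(100)); // poll interval
        }
        timings.push((cond.to_string(), start.elapsed().as_secs_f64()));
    }
    timings
}
```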
The SLA values — 300 seconds for Starter, 480 seconds for Enterprise — were chosen based on the module count difference between tiers: a Starter capsule provisions two modules (CRM, ITSM), while an Enterprise capsule provisions six. The 60% longer SLA for Enterprise (480s vs 300s) reflects the additional Helm chart deployment time.
10.3 Chaos Test Design
The chaos tests verify that the operator correctly resumes an in-progress provisioning saga after its pod is forcibly terminated. The test kills the operator pod at a known checkpoint — either after `NamespaceReady` becomes `True` or after `Provisioned` becomes `True` — and then verifies two properties:
- The operator pod restarts and reaches `available_replicas >= 1` within 60 seconds.
- All five conditions reach `True` within the original 300-second budget.
In addition, exactly one namespace `eva-tenant-{tenant_name}` must exist after recovery. This prevents a scenario where the operator creates a second namespace because it does not correctly detect the existing one after restart.
The kill mechanism is a kubectl delete pod -l app=eva-tenant-operator -n eva-operator --wait=false call, which deletes the pod and allows Kubernetes to immediately create a replacement. The --wait=false flag returns immediately, simulating an abrupt termination without waiting for graceful shutdown.
The chaos tests demonstrate that the reconciler’s idempotency guarantees hold under restart conditions. Idempotency is not free — it requires every operation in the provisioning saga to be safe to repeat. The NATS stream creation (error 10058 = idempotent success), the namespace creation (server-side apply), and the Helm upgrade (`--install` flag) are each written to be safe on repeated invocation.

10.4 Air-Gap Verification
The air-gap E2E test — distinct from the homelab tests — runs in a kind cluster with outbound network access blocked via `iptables` DROP rules. It verifies two properties simultaneously:
- The operator starts and processes a CR in the absence of outbound connectivity.
- The iptables DROP counter does not increase during the operator’s lifecycle — no packets were blocked, meaning the operator made no outbound network calls.
A `tcpdump` capture on port 53 provides a secondary signal: zero external DNS queries during the test run. Both signals must pass for the air-gap test to succeed.
11. Recommendations
- Adopt the `TmpfsSecretFile` RAII pattern for any Helm value that constitutes a secret in your operator: the pattern generalizes beyond NATS URLs to database passwords, API keys, and any value that should not persist on disk between reconcile cycles. The operator pod spec must include the `medium: Memory` emptyDir volume; without it the pattern silently falls back to the overlay filesystem.
- Pin `operator_version` in your CR templates to the version from the active Deployment annotation: tooling that creates EvaTenant CRs should read this annotation from the live Deployment rather than hardcoding a version. A mismatch between the pinned version and any running operator version causes the CR to be permanently ignored.
- Instrument per-condition timing in every provisioning operator you build: the SLA tests demonstrate that per-condition elapsed times are more actionable than total provisioning time. When an SLA is missed, knowing whether `NamespaceReady` took 200s or `ModulesDeployed` took 200s drives fundamentally different remediation paths.
- Build the `images.txt` digest manifest as part of your operator image CI pipeline, not as a post-hoc step: the pipeline that pushes the image should immediately capture the pushed digest and write it to `images.txt`. Treating digest capture as a manual step consistently results in `PLACEHOLDER` values in production.
- Test operator restart recovery explicitly and at multiple saga checkpoints in your operator validation suite: a reconciler that is idempotent for happy-path flows may not be idempotent after interruption at a specific checkpoint. The chaos test approach — kill after each major condition transition — discovers checkpoint-specific idempotency failures that integration tests do not exercise.
- Before upgrading to kube-core 0.88, search your codebase for `labels()` followed by `Option` combinators: the `ResourceExt::labels()` API change is not backward compatible and produces compile errors rather than runtime failures. Addressing this before upgrading is preferable to discovering it during a time-sensitive release.
12. Conclusion
The six milestones documented here represent a complete lifecycle for a production Kubernetes operator: from CRD schema design through cryptographically verified air-gap distribution and homelab chaos testing. The patterns that emerge from this sprint — RAII-based secret cleanup, annotation-driven upgrade coordination, and per-tenant stream isolation as a first-class provisioning primitive — are not specific to the operator described here. They apply to any system that must provision isolated, stateful workloads declaratively in a Kubernetes cluster. As self-hosted Kubernetes adoption continues to expand into environments where air-gap operation and multi-tenant isolation are requirements from day one rather than afterthoughts, the engineering investment in these patterns at the operator level pays compounding dividends. An operator that handles upgrade coordination, secret hygiene, and air-gap distribution correctly is one that teams can operate with confidence — independent of the specific workloads it manages.

All content represents personal learning from personal projects. Code examples are sanitized and generalized. No proprietary information is shared. Opinions are my own and do not reflect my employer’s views.