Kubernetes Operator Design for Multi-Tenant SaaS Provisioning

Executive Summary

This paper documents the design and implementation of a Kubernetes operator that transforms a single kubectl apply into a fully provisioned, isolated SaaS tenant. Six milestones — CRD schema, Helm chart bundling, secret injection, NATS JetStream provisioning, air-gap distribution, and homelab E2E validation — were completed in a single sprint. The operator is written in Rust using the kube-rs framework and deploys on self-hosted k3s. The analysis surfaces three patterns with broad applicability: RAII-based secret cleanup via Rust’s Drop trait, annotation-driven blue/green upgrade coordination with no orchestration layer, and per-capsule JetStream stream provisioning as the messaging isolation primitive. A breaking API change in kube-core 0.88 (labels() return type) produced a production-blocking compile error and is documented as an implementation constraint for teams adopting the same dependency version.

Key Findings

A Kubernetes operator is the correct abstraction for multi-tenant SaaS provisioning: encoding tenant lifecycle as a cluster-scoped custom resource makes provisioning declarative, auditable, and resumable after operator restart — properties that ad-hoc provisioning scripts cannot provide.
Rust’s Drop trait provides RAII-based secret cleanup at zero cost: writing Helm secret values to a tmpfs-backed file and wrapping the path in a struct whose Drop impl calls fs::remove_file guarantees secret deletion even on panic, without requiring defer semantics or explicit cleanup code at call sites.
Blue/green operator upgrades can be coordinated entirely through CR annotations without a coordination service: a per-CR annotation stamped with the owning operator version allows two operator deployments to coexist safely, with each version processing only the CRs it owns.
NATS JetStream subject filters provide per-tenant messaging isolation with no routing configuration: creating two streams per tenant with subject patterns capsule.{CODE}.events.> and capsule.{CODE}.commands.> eliminates cross-tenant message delivery at the JetStream level without application-layer filtering.
Air-gap readiness requires a digest manifest and a mirror script at distribution time, not at deployment time: teams that defer air-gap packaging to deployment discover that digest pinning is impractical after the fact; the images.txt + mirror-images.sh pattern must be authored alongside the operator image build pipeline.
kube-core 0.88 changed ResourceExt::labels() from returning Option<&BTreeMap> to returning &BTreeMap directly: code using .and_then() chaining on the return value fails to compile on upgrade; the fix requires replacing the Option chain with a direct .get() call on the map.

1. Introduction: The Provisioning Problem

A multi-tenant SaaS platform provisioning tenants through imperative scripts faces three structural problems. First, a script that fails mid-execution leaves infrastructure in a partially provisioned state with no automatic recovery path. Second, scripts are not auditable — there is no single object in the system that represents “this tenant exists and has these properties.” Third, provisioning is not idempotent; running a script twice risks duplicate resource creation. The operator pattern solves all three. A Kubernetes operator watches a custom resource (CR) and drives the cluster toward the declared state. If the operator pod is killed mid-reconciliation, it resumes on restart from the last persisted status. Every tenant has a CR that represents its existence and current state. The reconciler is written to be idempotent by construction — applying the same CR twice produces the same result. The operator described here — built in Rust using kube-rs — watches EvaTenant custom resources and executes a provisioning saga that includes namespace creation, NATS JetStream stream provisioning, seed data injection, Helm chart deployment, and ingress wiring. Six milestones completed a sprint’s worth of work from initial CRD design to homelab E2E validation.

2. The EvaTenant CRD Schema

The EvaTenant custom resource is the primary API for tenant provisioning. Its schema encodes the complete contract between the provisioning system and its operators.

2.1 Capsule Identity and the Six-Character Code

Each tenant is identified by a capsule_code: a six-character uppercase alphabetic string (pattern ^[A-Z]{6}$). This code is the primary identifier for all per-tenant Kubernetes resources, NATS stream names, Helm release names, and DynamoDB key prefixes. The constraint to exactly six uppercase ASCII letters was chosen for a specific reason: it is visually unambiguous (no digits that could be confused with letters, no lowercase), compact enough to appear in log lines and DNS labels without truncation, and admits a large enough namespace (26^6 = 308,915,776 possible codes) for any realistic tenant population. The corresponding Rust field enforces this at deserialization time:

#[serde(default)]
#[schemars(regex(pattern = "^[A-Z]{6}$"))]
pub capsule_code: String,

The #[serde(default)] annotation provides backward compatibility with CRs created before the field was added — a design consideration whenever a field is added to an existing CRD in production.

2.2 Spec Fields

The full EvaTenantSpec structure contains the following fields relevant to provisioning:

Field	Type	Description
`capsule_code`	`String`	Six-character uppercase identifier
`tier`	`TenantTier`	`Starter`, `Customer`, `Enterprise`, `Platform`
`lifecycle`	`CapsuleLifecycle`	`Active` or `Decommissioned`
`operator_version`	`String`	Semver string used for blue/green CR ownership
`seeds_version`	`String`	Git tag in the seed data repository
`versions`	`HashMap<String, String>`	Per-module version pins
`helm_values`	`Option<HashMap<String, JsonValue>>`	Per-capsule Helm overrides
`wave`	`u8`	Staged rollout wave (1=internal, 2=pilot, 3=fleet; default: 1)

2.3 The Five Status Conditions

The operator reports provisioning progress through five standard Kubernetes conditions, following KEP-1623 conventions:

Condition	Meaning
`NamespaceReady`	Kubernetes namespace `eva-tenant-{code}` exists and is Active
`Provisioned`	Tenant record created in backing data store
`SeedLoaded`	Seed data applied (OR-resolved: provisioner flag OR init container exit 0)
`ModulesDeployed`	All Helm module releases are healthy
`IngressReady`	Ingress resources provisioned and routing traffic

All five conditions must be True before the operator transitions the CR to phase: Active. The additionalPrinterColumns in the CRD make these fields visible in kubectl get evatenants output without requiring a kubectl describe call.

2.4 A Minimal Provisioning Request

A complete provisioning request for a Starter-tier tenant looks like:

apiVersion: eva.aidonow.com/v1alpha1
kind: EvaTenant
metadata:
  name: devusr
  namespace: eva-tenants
spec:
  capsule_code: DEVUSR
  tier: Starter
  lifecycle: Active
  operator_version: "v0.1.0"
  seeds_version: "v1.0.0"
  versions:
    eva-crm: "v0.9.1"
    eva-itsm: "v0.7.3"

The operator_version field serves a dual purpose: it pins which operator binary manages this capsule, and it participates in blue/green upgrade coordination. Setting this field incorrectly — for example, to a version not currently deployed — causes the CR to be permanently ignored by all running operators. Tooling that creates CRs should always source this value from the active operator Deployment’s annotation.

3. Operator Architecture

3.1 Reconciler Phase Dispatch

The reconciler is the heart of the operator. It is a Rust async function called by the kube-rs Controller on every watch event for an EvaTenant resource. The function dispatches on the current status.phase of the CR:

Requested    → POST /internal/provision → phase: Provisioning, requeue 10s
Provisioning → GET /internal/provision/{id}/status → map conditions →
               when all five True → phase: Active, requeue 300s
Active       → update NATS queue depth metrics, requeue 300s
Decommissioned → await change (terminal — finalizer pending removal)
Failed       → await change (manual intervention required)

The Requested phase is the initial state. On the first reconcile cycle, the operator calls the provisioning API and transitions to Provisioning, where it polls status on a 30-second requeue interval until all five conditions become True. The Active phase requeues every 5 minutes for health drift detection. This interval is the primary mechanism for detecting and correcting any state that has diverged from the declared spec since the last reconcile.

3.2 Finalizer and the Decommission Saga

Every EvaTenant CR has the finalizer eva.aidonow.com/tenant-protection attached on first reconcile. This prevents Kubernetes from deleting the CR until the operator explicitly removes the finalizer — ensuring cleanup runs before the resource disappears from the API server. The decommission saga executes in a defined order when DeletionTimestamp is set on the CR:

Step 1: Delete NATS JetStream streams (delete_queue_pair)
Step 2: Delete Prometheus alerting rule files (delete_alarms)
Step 3: Helm releases and Kubernetes namespace (via provisioner deprovision call)
Step 4: Patch phase to Decommissioned, emit DeprovisionComplete event
Step 5: Remove finalizer (kube-rs does this automatically on Ok return)

The ordering is intentional. NATS streams are deleted first to immediately stop new message delivery, preventing consumer errors from running processes. Helm releases are removed after NATS, ensuring application pods that might reference the streams are terminated before the streams disappear. The namespace deletion is final; it cascades to all remaining Kubernetes objects in the namespace. Each step is idempotent — an already-deleted stream or file returns Ok(()). Individual step failures are logged but do not abort subsequent steps. This non-blocking design ensures that a NATS timeout does not prevent the Helm uninstall from proceeding.

Setting spec.lifecycle: Decommissioned on a live production capsule initiates an irreversible deletion sequence. There is no confirmation gate; the reconciler detects the lifecycle change and immediately begins the decommission saga. Teams operating this operator must treat lifecycle: Decommissioned as a one-way transition and enforce access controls on who can patch this field.

3.3 Kubernetes Events

The operator emits typed Kubernetes Events at each major phase transition: ProvisioningStarted, NamespaceCreated, SeedLoaded, ModuleDeployStarted, ModuleDeployComplete, IngressProvisioned, ProvisioningComplete, DeprovisionStarted, DeprovisionComplete. These events appear in kubectl describe evatenant output and in cluster event streams, providing an audit trail of the provisioning lifecycle without requiring access to operator logs.

3.4 Per-Tier Resource Quotas

Kubernetes ResourceQuota and LimitRange objects are applied to every capsule namespace. The quota is sized by tier:

Tier	CPU Requests	Memory	Max Pods
Starter	2 cores	4Gi	20
Customer	4 cores	8Gi	40
Enterprise	unrestricted	unrestricted	unrestricted
Platform	unrestricted	unrestricted	unrestricted

A LimitRange with container defaults (500m CPU, 512Mi memory) applies to all tiers. This prevents unbounded resource consumption from containers that do not specify their own resource requests — a class of misconfiguration that is common during early development.

4. Milestone 2: Bundling Helm Charts Into the Operator Image Enables Disconnected Deployment

The six module Helm charts (eva-forge, eva-gateway, eva-shell, eva-whisper, orbit-crm, orbit-itsm) are bundled directly into the operator Docker image. This is a deliberate distribution decision: the operator image is self-contained and does not require network access to a Helm registry at deployment time. The chart bundling is implemented in the Dockerfile with a single COPY directive:

# Bundle Helm charts at /charts/modules/ (CHART_BASE_PATH default: /charts)
COPY charts/modules/ /charts/modules/

ENV CHART_BASE_PATH=/charts

The CHART_BASE_PATH environment variable is read at operator startup and stored in the Context struct that is threaded through every reconcile call:

let chart_base_path = std::env::var("CHART_BASE_PATH")
    .unwrap_or_else(|_| "/charts".to_string());

let ctx = Arc::new(Context {
    kube_client: kube_client.clone(),
    provisioner,
    chart_base_path,           // → /charts/modules/{module}/
    operator_version,
    messaging: messaging_provider,
});

In development, this variable can point to a local charts/ checkout without rebuilding the image. In the operator container, it resolves to /charts, making each module’s chart path /charts/modules/{module_name}/.

Bundling charts into the operator image couples the chart release cycle to the operator release cycle. For teams where chart releases are more frequent than operator releases, an alternative is to use a Helm repository URL stored in the CR spec — but this introduces a network dependency at deploy time, which is incompatible with air-gap installations. The bundled approach was chosen specifically to support air-gap deployment as a first-class requirement.

5. Milestone 3: RAII Secret Files on tmpfs Guarantee Secrets Never Persist to Disk

5.1 The Security Requirement

Helm chart installations require credentials — database passwords, API keys, connection strings — to be passed as values. The two mechanisms are --set key=value and --values file.yaml. The --set approach embeds the secret value as a command-line argument, which appears in:

helm get values output (stored in a Kubernetes Secret by Helm itself)
Process argument lists readable from /proc/{pid}/cmdline on Linux
CI runner log output

The --values approach passes secrets through a file. If that file is on the container’s overlay filesystem, it persists until the container is terminated. The correct implementation writes the values file to a tmpfs-backed directory, ensuring the content lives only in memory and is never written to persistent storage.

5.2 The TmpfsSecretFile Pattern

The TmpfsSecretFile struct implements this guarantee using Rust’s Drop trait:

pub struct TmpfsSecretFile {
    path: std::path::PathBuf,
}

impl TmpfsSecretFile {
    pub fn create(capsule_code: &str, yaml_content: &str) -> std::io::Result<Self> {
        std::fs::create_dir_all(TMPFS_SECRET_DIR)?;

        let ts_ns = std::time::SystemTime::now()
            .duration_since(std::time::UNIX_EPOCH)
            .unwrap_or_default()
            .as_nanos();

        let filename = format!("{}-{}.yaml", capsule_code, ts_ns);
        let path = std::path::Path::new(TMPFS_SECRET_DIR).join(filename);

        std::fs::write(&path, yaml_content.as_bytes())?;
        Ok(Self { path })
    }

    pub fn path(&self) -> &std::path::Path { &self.path }
}

impl Drop for TmpfsSecretFile {
    fn drop(&mut self) {
        let _ = std::fs::remove_file(&self.path);
    }
}

The constant TMPFS_SECRET_DIR = "/run/helm-secrets" must be backed by a tmpfs volume in the operator pod spec:

volumes:
  - name: helm-secrets
    emptyDir:
      medium: Memory
volumeMounts:
  - name: helm-secrets
    mountPath: /run/helm-secrets

The medium: Memory directive instructs Kubernetes to back the emptyDir volume with a RAM-backed tmpfs mount. Content written to this directory is never flushed to disk, and the entire volume is discarded when the pod is evicted or terminated.

5.3 Why Drop Semantics Matter Here

The Drop impl runs when the TmpfsSecretFile value goes out of scope — including on panic and on early return from the function. In Rust, this is guaranteed by the compiler; there is no defer keyword or exception handler required. A function that creates a TmpfsSecretFile and then calls helm_upgrade_install cannot “forget” to clean up the file, because the cleanup is encoded in the type system. The timestamp suffix ({capsule_code}-{timestamp_ns}.yaml) handles the case where two reconcile loops run concurrently for different capsules — a realistic scenario when multiple tenants are provisioning simultaneously. Each loop creates its own file, and each file is deleted independently when its scope ends.

5.4 The Queue Values Render Function

The operator renders per-capsule NATS connection URLs into the Helm values format before writing to the tmpfs file:

pub fn render_capsule_queue_values(
    capsule_code: &str,
    queue_urls: &std::collections::HashMap<String, String>,
) -> String {
    let events = queue_urls.get("events").map(|s| s.as_str()).unwrap_or("");
    let commands = queue_urls.get("commands").map(|s| s.as_str()).unwrap_or("");
    format!(
        "capsule:\n  code: {capsule_code}\n  queues:\n    events: \"{events}\"\n    commands: \"{commands}\"\n"
    )
}

This produces Helm values compatible with the canonical chart schema:

capsule:
  code: DEVUSR
  queues:
    events: "nats://nats.svc.cluster.local:4222/eva-DEVUSR-events"
    commands: "nats://nats.svc.cluster.local:4222/eva-DEVUSR-commands"

6. Annotation-Driven Blue/Green Upgrades Let Two Operator Versions Coexist Without a Coordination Service

6.1 The Upgrade Problem

Upgrading a Kubernetes operator in a multi-tenant environment presents a risk that single-tenant operators do not face: if the new operator version immediately takes ownership of all CRs the moment it starts, it may begin reconciling tenants that the old operator was actively managing. This can produce duplicate Helm releases, conflicting status writes, and mid-reconciliation state corruption. A naive approach — simply applying the new operator Deployment and waiting for the rolling update — provides no protection against this race. The new operator pod may receive watch events for all CRs immediately on startup, before the old pod has finished its final reconcile cycles.

6.2 Annotation-Based Ownership

The operator solves this through a CR-level ownership annotation. Each operator reads its own version from the OPERATOR_VERSION environment variable (injected via the Deployment annotation eva.aidonow.com/operator-version). Before processing any CR, the reconciler checks the corresponding annotation on the EvaTenant object:

pub const OPERATOR_VERSION_ANNOTATION: &str = "eva.aidonow.com/operator-version";

match cr_annotation {
    None => {
        // CR is unclaimed — stamp our version and proceed
        gate_api.patch_metadata(
            &gate_name,
            &PatchParams::default(),
            &Patch::Merge(&json!({
                "metadata": {
                    "annotations": {
                        OPERATOR_VERSION_ANNOTATION: our_version.as_str()
                    }
                }
            })),
        ).await?;
        // fall through to reconcile
    }
    Some(v) if v == our_version => {
        // We own this CR — process normally
    }
    Some(other_version) => {
        // Another operator version owns this CR — yield
        return Ok(Action::await_change());
    }
}

The upgrade flow is:

Deploy the new operator version (v2) alongside the old (v1)
v2 claims CRs with absent annotations, v1 continues managing annotated CRs
CRs naturally migrate to v2 as they complete their current reconcile cycle and v1 removes its annotation (or the CR is recreated)
Once v2 owns all CRs, v1 can be terminated

An OPERATOR_VERSION environment variable that is empty or unset causes the operator to enter backward-compatibility mode: it claims all CRs with absent annotations. This is intentional for pre-annotation deployments, but means that a misconfigured operator (with an empty or wrong version) will silently take ownership of CRs that should belong to a different version. Verify the OPERATOR_VERSION value in the Deployment before deploying into a mixed-version environment.

6.3 The Wave Label for Staged Rollout

Separate from version ownership, the wave field on the EvaTenantSpec enables staged rollout. Wave 1 is internal or test capsules; wave 2 is a pilot cohort; wave 3 is the full fleet. The operator stamps this value as a label on the CR on every reconcile cycle:

async fn ensure_wave_label(
    kube_client: &Client,
    tenant: &EvaTenant,
    wave: u8,
) -> Result<(), OperatorError> {
    let patch = json!({
        "metadata": {
            "labels": {
                crate::types::WAVE_LABEL: wave.to_string()
            }
        }
    });
    api.patch_metadata(&name, &PatchParams::apply("eva-tenant-operator"), &Patch::Merge(&patch))
        .await
}

External tooling (smoke gates, canary controllers) can then select capsules by wave using standard label selectors, without needing to parse the CR spec.

7. Milestone 4: Per-Capsule JetStream Streams Isolate Tenant Messages at the Server Level

7.1 Architecture: Per-Capsule Stream Isolation

Each provisioned capsule receives two dedicated NATS JetStream streams:

Stream name	Subject filter	Purpose
`eva-{CODE}-events`	`capsule.{CODE}.events.>`	Domain events fan-out
`eva-{CODE}-commands`	`capsule.{CODE}.commands.>`	Intra-capsule commands

The subject filter is the isolation mechanism. A consumer subscribed to capsule.DEVUSR.events.> cannot receive messages published to capsule.OTHRT.events.> because JetStream’s subject-based routing enforces the filter at the server level. There is no application-layer filter required. The wildcard token > in NATS subject patterns matches all remaining tokens in the subject hierarchy, including tokens with dots. capsule.DEVUSR.events.> matches capsule.DEVUSR.events.order.created, capsule.DEVUSR.events.user.profile.updated, and all other subjects with that prefix.

7.2 The MessagingProvider Trait

The messaging implementation is abstracted behind a trait to support both the production NATS path and a test-time mock:

#[async_trait]
pub trait MessagingProvider: Send + Sync {
    async fn create_or_update_stream(
        &self,
        stream_name: &str,
        subject_filter: &str,
    ) -> Result<String, OperatorError>;

    async fn configure_routing(&self, capsule_code: &str) -> Result<(), OperatorError>;

    async fn configure_alarms(&self, capsule_code: &str)
        -> Result<HashMap<String, String>, OperatorError>;

    async fn delete_queue_pair(&self, capsule_code: &str) -> Result<(), OperatorError>;
    async fn delete_alarms(&self, capsule_code: &str) -> Result<(), OperatorError>;
}

configure_routing validates that subject filters are correctly formed for the given capsule code. On NATS JetStream, routing is implicit once the streams are created with the correct subject filters — the method verifies this assumption without making additional API calls. The method exists in the trait to provide a hook for future multi-cloud implementations where routing configuration is a separate API call. configure_alarms writes a Prometheus alerting rule YAML file to /etc/prometheus/alerts/eva-{CODE}-queue-depth.yaml. The rule fires when a stream’s message count exceeds 80% of its MaxMsgs limit for more than 5 minutes — a leading indicator of consumer lag before the stream reaches capacity.

7.3 Idempotency

The NATS JetStream API returns error code 10058 (“stream name already in use”) when a stream creation request names an existing stream. The NatsProvider implementation treats this as a success:

Err(e) => {
    let err_str = e.to_string();
    if err_str.contains("10058") || err_str.contains("stream name already in use") {
        info!(stream = %stream_name, "NATS stream already exists — idempotent OK");
    } else {
        return Err(OperatorError::Internal(
            format!("NATS create_stream '{stream_name}' failed: {e}")
        ));
    }
}

This makes the messaging provisioning step safe to retry on any requeue cycle without risk of duplicate stream creation.

7.4 Prometheus Metrics

The operator exposes a /metrics endpoint serving Prometheus format metrics. A capsule_queue_depth gauge tracks NATS JetStream stream message counts per capsule, updated on every reconcile cycle for Active tenants:

tokio::spawn(async move {
    crate::metrics::read_and_update_nats_queue_depth(&nats_url, &capsule).await;
});

The update is spawned as a background task to prevent NATS latency from blocking the reconcile loop.

8. Milestone 5: Digest-Pinned Image Manifests Enable Deployment Without Internet Access

8.1 The Air-Gap Distribution Problem

Air-gap environments — networks with no outbound internet access — require all container images to be available in an internal registry before installation begins. The typical failure mode for teams that do not plan for this: images are pulled by image tag at installation time, the tag is not available in the internal registry, the installation fails with an obscure pull error, and the root cause is a distribution problem that should have been solved before the deployment attempt. The correct solution is to build the image digest manifest at release time, not at deployment time.

8.2 The images.txt Manifest

The images.txt file is the authoritative list of all container images required to run the operator and its managed workloads:

# images.txt — Eva Deploy Image Digest Manifest
# Format: <image-ref>@sha256:<digest>
registry.example.com/eva/eva-operator:v0.1.0@sha256:PLACEHOLDER
registry.example.com/eva/eva-crm:v0.9.1@sha256:PLACEHOLDER
registry.example.com/eva/eva-itsm:v0.7.3@sha256:PLACEHOLDER
registry.example.com/eva/eva-whisper:v0.5.0@sha256:PLACEHOLDER
registry.example.com/eva/eva-forge:v0.4.0@sha256:PLACEHOLDER
registry.example.com/eva/eva-shell:v0.6.0@sha256:PLACEHOLDER
registry.example.com/eva/eva-gateway:v0.3.0@sha256:PLACEHOLDER

Digest pinning (@sha256:{digest}) rather than tag-only references ensures that the operator and its modules are version-locked at a cryptographic level. Tags are mutable — a tag pushed by a later build silently replaces the previous image. A digest reference cannot be changed without modifying the images.txt file explicitly. The CI pipeline populates PLACEHOLDER values with real digests at release time by capturing docker inspect --format='{{index .RepoDigests 0}}' after each image push. The release pipeline blocks on any remaining PLACEHOLDER strings before creating the release artifact.

8.3 The mirror-images.sh Script

The mirror-images.sh script consumes images.txt and mirrors each image to an internal registry:

while IFS= read -r line; do
  [[ -z "$line" || "$line" =~ ^# ]] && continue

  SOURCE_IMAGE="${line%%@sha256:*}"
  EXPECTED_DIGEST="sha256:${line##*@sha256:}"

  IMAGE_PATH=$(echo "$SOURCE_IMAGE" | cut -d/ -f2-)
  TARGET_IMAGE="${TARGET_REGISTRY}/${IMAGE_PATH}"

  $CONTAINER_TOOL pull "$SOURCE_IMAGE"
  $CONTAINER_TOOL tag "$SOURCE_IMAGE" "$TARGET_IMAGE"
  $CONTAINER_TOOL push "$TARGET_IMAGE"
done < "$IMAGES_FILE"

The script supports both docker and podman (auto-detected or specified via --podman), performs digest verification after pull and after push, and reports a non-zero exit code if any image fails to mirror. This makes the script suitable for use in automated pre-installation validation pipelines.

8.4 The quickstart.sh Bootstrap Flow

The quickstart.sh script provides a single-command path from a bare Kubernetes cluster to a provisioned capsule. Its five steps match the provisioning sequence:

Step 1: Apply CRDs     → kubectl apply -f CRDs/
Step 2: Deploy operator → kubectl apply -f operator-manifest/
Step 3: Wait for ready  → kubectl rollout status deployment/eva-tenant-operator
Step 4: Create CR       → kubectl apply with inline EvaTenant manifest
Step 5: Poll conditions → all five True → phase: Active

The --air-gap flag prevents the script from attempting any public registry pulls. The --wave flag stamps the wave label on the CR at creation time. The --dry-run flag runs through all steps with kubectl apply --dry-run=client, producing the full output without making any changes to the cluster.

The --dry-run flag is particularly useful during air-gap pre-installation validation: it verifies that the cluster is reachable, the operator manifest is syntactically valid, and the CRD schema accepts the CR spec — without triggering actual provisioning. Running quickstart.sh --dry-run --air-gap --capsule-code TSTCAP before a production deployment catches the majority of configuration errors.

9. The kube-core 0.88 API Break

9.1 The Breaking Change

During development, an upgrade to kube-core 0.88 (the kube-rs client crate) introduced a compile-time break in the ensure_wave_label function. The function reads the current wave label from the CR’s metadata to avoid issuing unnecessary patch requests. The code before the upgrade:

// Pre-0.88: labels() returns Option<&BTreeMap<String, String>>
let current_wave = tenant
    .labels()
    .and_then(|labels| labels.get(WAVE_LABEL))
    .and_then(|v| v.parse::<u8>().ok())
    .unwrap_or(0);

In kube-core 0.88, ResourceExt::labels() changed its return type from Option<&BTreeMap<String, String>> to &BTreeMap<String, String> directly. Calling .and_then() on a &BTreeMap — a value, not an Option — produces a type error that the Rust compiler reports as E0599 (method not found) with E0282 (type inference failure) as a secondary diagnostic.

9.2 The Fix

The fix is a direct .get() call on the map:

// Post-0.88: labels() returns &BTreeMap<String, String> directly
let current_wave = tenant
    .labels()
    .get(WAVE_LABEL)
    .and_then(|v| v.parse::<u8>().ok())
    .unwrap_or(0);

The .get() method on BTreeMap returns Option<&V>, so the .and_then() chain is still valid — it just starts from the Option returned by .get() rather than from the Option that labels() previously returned.

Any codebase that wraps ResourceExt::labels() in an Option chain (calling .map(), .and_then(), or .unwrap_or_default() directly on its return value) will fail to compile against kube-core 0.88. This affects any code that accesses labels in the reconciler, event emission, or condition reporting paths. A codebase-wide search for patterns matching .labels() followed by .and_then or .map will identify all affected call sites before the upgrade is attempted.

10. Milestone 6: SLA and Chaos Tests Verify Provisioning Latency and Reconciliation Idempotency Under Restart

10.1 Test Suite Design

Six E2E tests gate GA readiness. All are marked #[ignore] in the Rust test suite, which means they are excluded from normal cargo test runs and must be invoked explicitly with cargo test --test homelab_e2e_test -- --ignored. They require a live Kubernetes cluster with the operator deployed.

Test	SLA / Constraint	What Is Verified
SLA — Starter tier	All 5 conditions True within 300s	Provisioning latency under load
SLA — Enterprise tier	All 5 conditions True within 480s	Provisioning latency for larger module set
Blue/green upgrade	Annotation updated, IngressReady stable	Upgrade coordination without traffic disruption
Decommission saga	CR and namespace deleted within 300s	Full cleanup, zero data residue
Chaos — kill after NamespaceReady	Full recovery after pod kill	Reconciler idempotency after restart
Chaos — kill after Provisioned	Full recovery after pod kill	Idempotency at a later saga checkpoint

10.2 SLA Measurement

The SLA tests instrument per-condition timing by polling for each condition in order and recording elapsed time at each transition:

for &cond_type in ALL_CONDITIONS {
    let remaining = sla_secs - test_start.elapsed().as_secs_f64();
    let elapsed = poll_until_condition_true(
        &tenants, tenant_name, cond_type, Duration::from_secs_f64(remaining)
    ).await;
    // record condition_results
}

The test writes a JSON artifact to /tmp/sla-results-{tier}.json containing per-condition elapsed times, the total elapsed, and a boolean sla_pass. This artifact is available for post-run analysis in CI even if the test assertion fails. The SLA values — 300 seconds for Starter, 480 seconds for Enterprise — were chosen based on the module count difference between tiers: a Starter capsule provisions two modules (CRM, ITSM), while an Enterprise capsule provisions six. The 60% longer SLA for Enterprise (480s vs 300s) reflects the additional Helm chart deployment time.

10.3 Chaos Test Design

The chaos tests verify that the operator correctly resumes an in-progress provisioning saga after its pod is forcibly terminated. The test kills the operator pod at a known checkpoint — either after NamespaceReady becomes True or after Provisioned becomes True — and then verifies two properties:

The operator pod restarts and reaches available_replicas >= 1 within 60 seconds.
All five conditions reach True within the original 300-second budget.

A third assertion verifies that no duplicate namespaces were created — exactly one namespace matching eva-tenant-{tenant_name} must exist after recovery. This prevents a scenario where the operator creates a second namespace because it does not correctly detect the existing one after restart. The kill mechanism is a kubectl delete pod -l app=eva-tenant-operator -n eva-operator --wait=false call, which deletes the pod and allows Kubernetes to immediately create a replacement. The --wait=false flag returns immediately, simulating an abrupt termination without waiting for graceful shutdown.

The chaos tests demonstrate that the reconciler’s idempotency guarantees hold under restart conditions. Idempotency is not free — it requires every operation in the provisioning saga to be safe to repeat. The NATS stream creation (error 10058 = idempotent success), the namespace creation (server-side apply), and the Helm upgrade (--install flag) are each written to be safe on repeated invocation.

10.4 Air-Gap Verification

The air-gap E2E test — distinct from the homelab tests — runs in a kind cluster with outbound network access blocked via iptables DROP rules. It verifies two properties simultaneously:

The operator starts and processes a CR in the absence of outbound connectivity.
The iptables DROP counter does not increase during the operator’s lifecycle — no packets were blocked, meaning the operator made no outbound network calls.

A supplementary tcpdump capture on port 53 provides a secondary signal: zero external DNS queries during the test run. Both signals must pass for the air-gap test to succeed.

11. Recommendations

Adopt the TmpfsSecretFile RAII pattern for any Helm value that constitutes a secret in your operator: the pattern generalizes beyond NATS URLs to database passwords, API keys, and any value that should not persist on disk between reconcile cycles. The operator pod spec must include the medium: Memory emptyDir volume; without it the pattern silently falls back to the overlay filesystem.
Pin operator_version in your CR templates to the version from the active Deployment annotation: tooling that creates EvaTenant CRs should read this annotation from the live Deployment rather than hardcoding a version. A mismatch between the pinned version and any running operator version causes the CR to be permanently ignored.
Instrument per-condition timing in every provisioning operator you build: the SLA tests demonstrate that per-condition elapsed times are more actionable than total provisioning time. When an SLA is missed, knowing whether NamespaceReady took 200s or ModulesDeployed took 200s drives fundamentally different remediation paths.
Build the images.txt digest manifest as part of your operator image CI pipeline, not as a post-hoc step: the pipeline that pushes the image should immediately capture the pushed digest and write it to images.txt. Treating digest capture as a manual step consistently results in PLACEHOLDER values in production.
Test operator restart recovery explicitly and at multiple saga checkpoints in your operator validation suite: a reconciler that is idempotent for happy-path flows may not be idempotent after interruption at a specific checkpoint. The chaos test approach — kill after each major condition transition — discovers checkpoint-specific idempotency failures that integration tests do not exercise.
Before upgrading to kube-core 0.88, search your codebase for labels() followed by Option combinators: the ResourceExt::labels() API change is not backward compatible and produces compile errors rather than runtime failures. Addressing this before upgrading is preferable to discovering it during a time-sensitive release.

12. Conclusion

The six milestones documented here represent a complete lifecycle for a production Kubernetes operator: from CRD schema design through cryptographically-verified air-gap distribution and homelab chaos testing. The patterns that emerge from this sprint — RAII-based secret cleanup, annotation-driven upgrade coordination, and per-tenant stream isolation as a first-class provisioning primitive — are not specific to the operator described here. They apply to any system that must provision isolated, stateful workloads declaratively in a Kubernetes cluster. As self-hosted Kubernetes adoption continues to expand into environments where air-gap operation and multi-tenant isolation are requirements from day one rather than afterthoughts, the engineering investment in these patterns at the operator level pays compounding dividends. An operator that handles upgrade coordination, secret hygiene, and air-gap distribution correctly is one that teams can operate with confidence — independent of the specific workloads it manages.

All content represents personal learning from personal projects. Code examples are sanitized and generalized. No proprietary information is shared. Opinions are my own and do not reflect my employer’s views.

Overview

Workflows

Process

Infrastructure

Documentation Index

​Executive Summary

​Key Findings

​1. Introduction: The Provisioning Problem

​2. The EvaTenant CRD Schema

​2.1 Capsule Identity and the Six-Character Code

​2.2 Spec Fields

​2.3 The Five Status Conditions

​2.4 A Minimal Provisioning Request

​3. Operator Architecture

​3.1 Reconciler Phase Dispatch

​3.2 Finalizer and the Decommission Saga

​3.3 Kubernetes Events

​3.4 Per-Tier Resource Quotas

​4. Milestone 2: Bundling Helm Charts Into the Operator Image Enables Disconnected Deployment

​5. Milestone 3: RAII Secret Files on tmpfs Guarantee Secrets Never Persist to Disk

​5.1 The Security Requirement

​5.2 The TmpfsSecretFile Pattern

​5.3 Why Drop Semantics Matter Here

​5.4 The Queue Values Render Function

​6. Annotation-Driven Blue/Green Upgrades Let Two Operator Versions Coexist Without a Coordination Service

​6.1 The Upgrade Problem

​6.2 Annotation-Based Ownership

​6.3 The Wave Label for Staged Rollout

​7. Milestone 4: Per-Capsule JetStream Streams Isolate Tenant Messages at the Server Level

​7.1 Architecture: Per-Capsule Stream Isolation

​7.2 The MessagingProvider Trait

​7.3 Idempotency

​7.4 Prometheus Metrics

​8. Milestone 5: Digest-Pinned Image Manifests Enable Deployment Without Internet Access

​8.1 The Air-Gap Distribution Problem

​8.2 The images.txt Manifest

​8.3 The mirror-images.sh Script

​8.4 The quickstart.sh Bootstrap Flow

​9. The kube-core 0.88 API Break

​9.1 The Breaking Change

​9.2 The Fix

​10. Milestone 6: SLA and Chaos Tests Verify Provisioning Latency and Reconciliation Idempotency Under Restart

​10.1 Test Suite Design

​10.2 SLA Measurement

​10.3 Chaos Test Design

​10.4 Air-Gap Verification

​11. Recommendations

​12. Conclusion

Executive Summary

Key Findings

1. Introduction: The Provisioning Problem

2. The EvaTenant CRD Schema

2.1 Capsule Identity and the Six-Character Code

2.2 Spec Fields

2.3 The Five Status Conditions

2.4 A Minimal Provisioning Request

3. Operator Architecture

3.1 Reconciler Phase Dispatch

3.2 Finalizer and the Decommission Saga

3.3 Kubernetes Events

3.4 Per-Tier Resource Quotas

4. Milestone 2: Bundling Helm Charts Into the Operator Image Enables Disconnected Deployment

5. Milestone 3: RAII Secret Files on tmpfs Guarantee Secrets Never Persist to Disk

5.1 The Security Requirement

5.2 The TmpfsSecretFile Pattern

5.3 Why Drop Semantics Matter Here

5.4 The Queue Values Render Function

6. Annotation-Driven Blue/Green Upgrades Let Two Operator Versions Coexist Without a Coordination Service

6.1 The Upgrade Problem

6.2 Annotation-Based Ownership

6.3 The Wave Label for Staged Rollout

7. Milestone 4: Per-Capsule JetStream Streams Isolate Tenant Messages at the Server Level

7.1 Architecture: Per-Capsule Stream Isolation

7.2 The MessagingProvider Trait

7.3 Idempotency

7.4 Prometheus Metrics

8. Milestone 5: Digest-Pinned Image Manifests Enable Deployment Without Internet Access

8.1 The Air-Gap Distribution Problem

8.2 The images.txt Manifest

8.3 The mirror-images.sh Script

8.4 The quickstart.sh Bootstrap Flow

9. The kube-core 0.88 API Break

9.1 The Breaking Change

9.2 The Fix

10. Milestone 6: SLA and Chaos Tests Verify Provisioning Latency and Reconciliation Idempotency Under Restart

10.1 Test Suite Design

10.2 SLA Measurement

10.3 Chaos Test Design

10.4 Air-Gap Verification

11. Recommendations

12. Conclusion