
AI-assembled lab infrastructure stack: hardware, Proxmox, Kubernetes, platform services, and AI inference layer

Executive Summary

This paper documents the complete construction of a production-grade private lab environment — spanning physical blade hardware, enterprise networking, Proxmox virtualization, Kubernetes service orchestration, and self-hosted LLM inference — using an AI coding assistant as the primary infrastructure engineer. Over approximately four weeks and 550+ commits, Claude Code authored the majority of the infrastructure-as-code, diagnosed and resolved a cascade of constraint-discovery failures, and operated under a formal operational constitution encoded in a repository-level CLAUDE.md file. The lab culminates in a self-hosted inference tier running Gemma 4, DeepSeek, and Qwen3 models on Apple Silicon, fronted by an agent gateway layer that connects AI workloads to the broader platform. The evidence suggests that AI-driven infrastructure engineering is viable at this scale, but requires deliberate governance mechanisms — operational constitutions, memory persistence, GitOps mandates, and incident-encoded constraints — to remain coherent across sessions.

Key Findings

  • AI coding assistants can serve as primary infrastructure engineers for homelab-scale environments, authoring Kubernetes manifests, Ansible playbooks, CI pipelines, and operational protocols across hundreds of commits with minimal human-written code.
  • The primary failure mode in AI-driven infrastructure builds is constraint discovery, not code quality — the AI produces syntactically correct configurations that fail due to hardware incompatibilities, CPU instruction set gaps, and platform-specific behavioral differences that cannot be inferred from documentation alone.
  • Apple Silicon delivers a qualitatively different inference profile than x86 server hardware: an M4 Pro Mac mini achieves approximately 80 tokens/second on models in the 8–32B parameter range, whereas CPU-only inference on the E5-2640v4 blades is unsuitable for interactive workloads.
  • OpenClaw agent gateways provide the connective tissue between self-hosted LLMs and the platform service layer, enabling AI agents to interact with project management, version control, and workflow automation systems through a structured API surface rather than direct credential access.
  • A repository-level operational constitution (CLAUDE.md) is the most important governance artifact in an AI-driven infrastructure build: it encodes the decisions, constraints, and incident-derived rules that survive across Claude Code sessions, preventing regressions that would otherwise recur with every context reset.
  • Every significant incident produced a machine-checkable constraint: the three most expensive failures — a LoadBalancer DNAT interception, a CPU instruction set incompatibility cascade, and a data loss event caused by operating without a verified backup — each resulted in explicit checks committed to the operational constitution and enforced on subsequent operations.

1. Introduction: The AI-Assembled Lab

The standard narrative for AI-assisted development focuses on code generation within established codebases. This paper documents a different application: using an AI coding assistant to build infrastructure from scratch, beginning with physical hardware and ending with a self-hosted AI inference platform that runs AI workloads on that same infrastructure.
The motivation was straightforward. A production AI development organization requires infrastructure that mirrors the properties it is trying to create: version-controlled, reproducible, auditable, and capable of running AI workloads locally. Building that infrastructure with AI makes the build process itself a demonstration of the capability it is intended to support.
The stack that emerged spans five layers: bare-metal blade compute and an Apple Silicon inference node; Proxmox VE virtualization across three hypervisors; a k3s Kubernetes cluster running on VM workers; a platform service tier of approximately 30 services including Gitea, Redmine, n8n, HashiCorp Vault, Harbor, Kyverno, and a full observability stack; and an AI inference tier running multiple LLM models behind an OpenClaw agent gateway.

2. Hardware Architecture

2.1 Blade Compute

The compute foundation is a Dell PowerEdge FX2 blade chassis — a four-slot enclosure housing four compute blades, two internal 10GbE FN 410T switches, and a chassis management controller. All four compute nodes are Dell FC430 blades with dual Intel E5-2640v4 CPUs and 64GB RAM. A Cisco Catalyst 3850-48U-E serves as the core uplink switch, reclaimed from prior enterprise use and reconfigured from scratch using AI-generated IOS configuration.
The E5-2640v4 is a Broadwell-EP processor from 2016. It predates the x86-64-v2 microarchitecture baseline that several modern container images assume. This produced two distinct incompatibility failures during the build: MinIO and ScyllaDB 5.x both failed with illegal instruction errors on first deployment.
Three blades run Proxmox VE, forming a three-node hypervisor cluster. The fourth blade operates as bare-metal NFS storage, exporting filesystems consumed by Proxmox for VM images and by the Kubernetes NFS subdir provisioner for persistent volume claims.

2.2 Apple Silicon Inference Node

The AI inference tier runs on a separate Mac mini with an Apple Silicon M4 Pro chip. The node runs Ollama and hosts a curated set of models: Gemma 4, DeepSeek-R1 70B, DeepSeek Coder V2 16B, Qwen3 8B, and Qwen2.5-Coder 32B. Measured throughput in the 8–32B parameter range is approximately 80 tokens/second — sufficient for interactive development workflows and agent-driven task execution. The inference role was originally planned for one of the x86 blades; that plan was abandoned after observing that CPU-only inference on an E5-2640v4 produces throughput unsuitable for interactive use. The Mac mini was added to the lab explicitly to close this gap, and the blade was repurposed as a Proxmox hypervisor hosting the largest Kubernetes worker node.
Apple Silicon’s unified memory architecture makes it disproportionately capable for LLM inference relative to cost. A Mac mini with 64GB unified memory can run 32B parameter models at full context lengths that would require a discrete GPU with equivalent VRAM at 3–5× the cost. For homelab AI inference, this is the highest-leverage single hardware purchase.

2.3 Storage Architecture

Storage is tiered across three backends: local NVMe SSDs on each blade for VM ephemeral storage and Ollama model data; NFS exports from the dedicated storage blade for Kubernetes PVCs; and a Synology NAS (currently offline) designated as the backup destination. The intended primary storage tier — 15 SAS drives across the chassis — remains offline due to a bug in the NS02 drive firmware that corrupts manufacturer metadata and triggers the SED failsafe, presenting all drives as 0 bytes. Recovery requires a forced firmware flash via SeaChest tooling; this work is pending.

3. Network Architecture

3.1 Core Switching and VLANs

The Cisco 3850 operates as the core switch with VLANs separating homelab infrastructure, home WiFi, and storage traffic. An LACP bond connects the 3850 to the FX2 chassis internal switch via two 10GbE uplinks. The 3850 runs layer-3 switching for VLAN routing but does not support NAT — requiring static routes on the upstream ISP router for VLAN internet access, a constraint that is non-obvious and caused connectivity failures for several VLAN subnets until identified.
All internal services resolve via a dnsmasq VM running on the Proxmox cluster. The DNS VM also runs AdGuard Home for DNS filtering and query logging.
The wildcard domain resolution strategy warrants specific attention: an initial address=/*.lab/<ip> catch-all in dnsmasq introduced a subtle failure mode in which Kubernetes internal DNS — which appends search domain suffixes to cluster-local names — resolved service.namespace.svc.cluster.local.lab to the ingress controller IP rather than the cluster-internal service IP. This broke ArgoCD’s ability to reach the Gitea instance for repository polling and was diagnosed only after ruling out network routing, TLS, and credential failures.
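To make the mechanism concrete: with the Kubernetes default of ndots:5, a cluster-local name is expanded through the pod's search-domain list before it is ever tried as an absolute name, so the node-level .lab suffix hands the query to dnsmasq and the wildcard answers first. The sketch below is a hypothetical debugging pod illustrating that interaction; it is not the remediation used in this lab, and the service name is assumed.

```yaml
# Illustration only: with ndots:5, "gitea-http.gitea.svc.cluster.local" (4 dots) is
# expanded through the search list first, and the ".lab" candidate matches the dnsmasq
# wildcard, returning the ingress IP before the correct cluster answer is consulted.
apiVersion: v1
kind: Pod
metadata:
  name: dns-debug
spec:
  dnsConfig:
    options:
      - name: ndots
        value: "1"   # dotted names resolve as absolute first, bypassing the .lab suffix
  containers:
    - name: debug
      image: busybox:1.36
      command: ["sleep", "3600"]
```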

3.2 TLS and External Access

Internal services use a Let’s Encrypt wildcard certificate issued via Cloudflare DNS-01 challenge, covering the internal domain. All services accessible from internal clients are browser-trusted without certificate exceptions. External access is provided by a Cloudflare Tunnel with two replicas and pod anti-affinity, exposing select services to the public internet without requiring port forwarding or a static IP. Remote access for developers and AI agents is additionally provided by Tailscale, with a subnet router deployed in Kubernetes and an OAuth key rotation CronJob running every 80 days.
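The source does not name the issuance tooling. Assuming cert-manager, which is the common way to drive a Cloudflare DNS-01 challenge from inside Kubernetes, the wildcard issuance would look roughly like the sketch below; the domain, email, and secret names are placeholders.

```yaml
# Sketch, assuming cert-manager; not the lab's actual manifests.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns01
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com                     # placeholder
    privateKeySecretRef:
      name: letsencrypt-dns01-account-key
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token         # placeholder secret
              key: api-token
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: lab-wildcard
  namespace: ingress-nginx
spec:
  secretName: lab-wildcard-tls
  issuerRef:
    name: letsencrypt-dns01
    kind: ClusterIssuer
  dnsNames:
    - "*.lab.example.com"                        # placeholder internal domain
```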

4. Virtualization Layer

Proxmox VE is installed on three blades, forming a three-node HA cluster with live migration support between nodes. All VM provisioning is automated: a shell script wraps qm create and qm importdisk to provision from an Ubuntu 24.04 cloud image, with per-VM cloud-init userdata snippets stored in the repository under proxmox/snippets/. The VM topology allocates resources asymmetrically: the blade with the largest local NVMe SSD hosts the heaviest Kubernetes worker (32 vCPU, 48GB RAM, 200GB local storage). The two remaining hypervisors host two Kubernetes worker VMs each (8 vCPU, 24GB RAM, 40GB NFS-backed storage) and several utility VMs: the k3s control plane, DNS, an AI workstation (the persistent Claude Code operating environment), and a development workstation.
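The per-VM cloud-init snippets referenced above are not reproduced in this paper; the following is a hypothetical example of their general shape, with hostname, user, and key as placeholders.

```yaml
#cloud-config
# Hypothetical per-VM snippet of the kind stored under proxmox/snippets/ (sketch only).
hostname: k3s-worker-1
manage_etc_hosts: true
users:
  - name: ops
    groups: [sudo]
    shell: /bin/bash
    sudo: ALL=(ALL) NOPASSWD:ALL
    ssh_authorized_keys:
      - ssh-ed25519 AAAA... placeholder-key
package_update: true
packages:
  - qemu-guest-agent
runcmd:
  - systemctl enable --now qemu-guest-agent
```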
The decision to run all Kubernetes nodes as Proxmox VMs rather than bare-metal k3s workers reflects a deliberate trade-off: VM overhead (approximately 3–5% CPU, minimal memory) in exchange for snapshots, live migration, and the ability to resize resources without reinstalling the OS. For a lab environment that expects frequent reconfiguration, this trade-off strongly favors virtualization.

5. Kubernetes Architecture

5.1 Distribution and Configuration

The Kubernetes distribution is k3s v1.32, running a single control plane VM with three worker VMs. Traefik is disabled at installation in favor of ingress-nginx. The servicelb component (klipper-lb) is also disabled — the consequence of a significant production incident described below.
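k3s accepts these install-time choices either as CLI flags or in /etc/rancher/k3s/config.yaml; the actual file used in the lab is not shown in the source, but the configuration implied above reduces to roughly this:

```yaml
# /etc/rancher/k3s/config.yaml on the control plane VM (sketch)
disable:
  - traefik     # replaced by ingress-nginx
  - servicelb   # replaced by MetalLB after the DNAT incident described in Section 5.2
```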

5.2 The klipper-lb DNAT Incident

k3s ships with a built-in LoadBalancer implementation (klipper-lb) that achieves external IP assignment by installing iptables DNAT rules on every cluster node. These rules intercept all traffic arriving at port 22 and port 443 on any cluster node IP and redirect it to the corresponding service.
The failure mode is non-obvious: SSH connections to the control plane node — used for cluster administration and kubeconfig tunnel access — began redirecting to the ingress controller. Additionally, pods initiating HTTPS connections to external authentication endpoints received the ingress controller’s TLS certificate rather than the target server’s certificate, breaking every OIDC token exchange in the cluster. The diagnostic process required ruling out TLS misconfiguration, DNS poisoning, and certificate authority errors before identifying the iptables DNAT rules as the root cause.
Resolution required disabling klipper-lb at the k3s server level and migrating all LoadBalancer IP assignments to MetalLB in L2 mode. Several artifacts of the incident remain as permanent configuration: control plane SSH operates on a non-standard port rather than reverting to port 22 after the MetalLB migration, and all service-to-service OIDC calls use internal cluster DNS rather than public hostnames to avoid the external ingress path.
If deploying k3s with LoadBalancer services on a network where SSH access to cluster nodes matters, disable klipper-lb at installation time (--disable servicelb) and install MetalLB before creating any LoadBalancer services. Retroactively disabling klipper-lb after services exist requires cleaning up orphaned iptables rules that k3s does not remove automatically.
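A minimal MetalLB L2 configuration of the kind such a migration lands on is sketched below; the address range is a placeholder, not the lab's actual allocation.

```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: lab-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.10.240-192.168.10.250   # placeholder range for LoadBalancer IPs
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: lab-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - lab-pool
```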

5.3 Storage Classes and GitOps Mandate

Two storage classes are configured: local-path (k3s built-in, default) for node-local persistent volumes, and nfs-client (NFS subdir provisioner, reclaimPolicy: Retain) for shared persistent volumes. The Retain policy on the NFS class is deliberate — a direct consequence of the data loss incident described in Section 9. All Kubernetes resources are managed through ArgoCD using an app-of-apps pattern watching the k8s/ directory in the infrastructure repository. The operational constitution explicitly prohibits direct kubectl apply, helm install outside ArgoCD, or hand-editing of deployed resources. This constraint exists not as a best-practice recommendation but as an operationally enforced rule: the AI engineer is instructed to raise a failure rather than bypass the GitOps path, even under pressure to resolve an incident quickly.
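Sketches of the two artifacts described above, with assumed names: an nfs-client StorageClass carrying the deliberate Retain policy, and the root ArgoCD Application watching k8s/. The provisioner string and repository URL are placeholders, not the lab's actual values.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-client
provisioner: cluster.local/nfs-subdir-external-provisioner   # assumed provisioner name
reclaimPolicy: Retain            # deliberate: volumes survive PVC deletion (Section 9)
volumeBindingMode: Immediate
parameters:
  archiveOnDelete: "true"
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: platform-root            # the "app of apps"
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.lab.example/infra/homelab.git   # placeholder repository URL
    targetRevision: main
    path: k8s/
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```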

6. Platform Service Layer

Approximately 30 services run in the Kubernetes cluster, organized into functional groupings.
Identity and access management: Authentik provides SSO across all services via Google Workspace as the upstream identity provider. Every service in the stack — Gitea, Harbor, ArgoCD, Redmine, SonarQube, Vault, Grafana, Headlamp — authenticates through Authentik. A recurring configuration failure during initial deployment affected all services equally: Authentik creates OAuth2 providers with empty property mappings by default, resulting in tokens that contain no scopes and userinfo endpoints that return 403. This failure was diagnosed independently for each service before the systemic root cause was identified and corrected globally.
Version control and container registry: Gitea v1.24.7 serves as the self-hosted Git platform, with SSH exposed via a dedicated MetalLB IP on port 222. Gitea Actions drives CI/CD with act_runner replicas running in Kubernetes using Docker-in-Docker sidecars. Harbor provides the private container registry with Cosign content trust enforcement. All CI images are built from a base Alpine image and published to Harbor’s internal project.
Developer tooling: Redmine handles project management and issue tracking with OIDC SSO. n8n provides workflow automation for CI event routing, Telegram notifications, and agent task state machine orchestration. SonarQube community LTS provides static analysis.
Observability: The full kube-prometheus-stack is deployed, augmented with Loki for log aggregation, Grafana Tempo for distributed tracing, an OTel Collector routing both, IPMI exporter for iDRAC metrics, and PVE exporter for Proxmox metrics. Security monitoring runs Wazuh for host-based IDS with agents on all cluster nodes, and CrowdSec reading ingress-nginx access logs with a bouncer wired to the ingress controller.

Comparison: Platform Service Deployment Approaches

| Concern | Direct kubectl / Helm | ArgoCD GitOps |
|---|---|---|
| Drift detection | None — live state is authoritative | Continuous — ArgoCD alerts on drift |
| Incident recovery | Reconstruct from memory or notes | Resync from git — deterministic |
| AI agent operations | Risk of untracked changes | Enforced path — all changes in git |
| Audit trail | kubectl history, incomplete | Full git log — complete |
| Rollback | Manual, error-prone | Git revert, automated sync |

7. Self-Hosted AI Inference and Agent Gateways

7.1 Ollama on Apple Silicon

The inference tier runs Ollama on the Mac mini, accessible at a fixed IP on the lab network. Models currently deployed:
| Model | Parameters | Primary Use |
|---|---|---|
| Gemma 4 | | General reasoning, agent tasks via OpenClaw |
| DeepSeek-R1 | 70B | Complex reasoning, architecture decisions |
| DeepSeek Coder V2 | 16B | Code generation and review |
| Qwen3 | 8B | Fast interactive queries |
| Qwen2.5-Coder | 32B | Extended code context tasks |
The Ollama instance is exposed as an OpenAI-compatible API endpoint, which allows both the Open WebUI chat interface and AI development tools configured to use the openai provider to route to local inference transparently. This enables development workflows to run entirely on-premises, without external API calls, for latency-sensitive or privacy-sensitive tasks.
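One way this wiring can look, assuming Open WebUI's standard environment variables for an external OpenAI-compatible backend; the IP address, namespace, and image tag below are placeholders rather than the lab's actual values.

```yaml
# Sketch: pointing an OpenAI-compatible client at the local Ollama endpoint.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: open-webui
  namespace: ai
spec:
  replicas: 1
  selector:
    matchLabels: { app: open-webui }
  template:
    metadata:
      labels: { app: open-webui }
    spec:
      containers:
        - name: open-webui
          image: ghcr.io/open-webui/open-webui:main
          ports:
            - containerPort: 8080
          env:
            - name: OPENAI_API_BASE_URL
              value: "http://192.168.20.10:11434/v1"   # Ollama's OpenAI-compatible path
            - name: OPENAI_API_KEY
              value: "ollama"   # Ollama does not validate the key, but clients expect one
```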

7.2 OpenClaw Agent Gateways

OpenClaw provides the agent gateway layer — a structured framework for deploying AI agents with defined API surfaces, credential scoping, and platform service integrations. Two gateway instances run in the Kubernetes cluster: a primary instance serving general agent tasks and a secondary instance currently suspended at replica count zero. The gateway instances connect Ollama-hosted models to the platform service layer through a controlled interface: agents can interact with Redmine for task management, Gitea for repository operations, and n8n for workflow triggering, without holding direct credentials to those services. Credential injection is handled by HashiCorp Vault AppRole authentication, with per-agent secret scoping enforced at the Vault policy level.
The combination of OpenClaw gateways with Vault AppRole authentication creates a pattern where AI agents operate with least-privilege credentials that are scoped to their designated functions. An agent authorized to read and update Redmine tasks cannot access Gitea repositories, and vice versa. This replicates the RBAC model applied to human engineers in the same platform.
The Ollama Mac mini is also exposed to AI development tools running on the AI workstation VM via the same API endpoint, allowing Claude Code to delegate sub-tasks to local models to conserve API token spend while keeping the primary reasoning context on the Anthropic API.

8. AI as Primary Infrastructure Engineer

8.1 The Operational Constitution

The central governance artifact is a repository-level CLAUDE.md file — an 8,000-word operational constitution that instructs the AI on how to operate within the lab environment. It covers: the GitOps-only mandate and what constitutes a valid bypass (nothing), the Stateful Service Change Protocol (back up first, verify the backup, only then change), the Redmine time-tracking requirement for every task, the Vault secret layout, the ArgoCD app-of-apps structure, and the lessons derived from each significant incident.
The operational constitution solves a fundamental problem in AI-driven infrastructure engineering: Claude Code has no persistent memory across sessions. Each new session begins with full context of the repository but no memory of decisions made in prior sessions, incidents encountered, or constraints discovered. The CLAUDE.md functions as an externalized memory that is automatically loaded at session start, ensuring that constraints discovered through failure in session N are enforced in session N+1.

8.2 Memory Persistence

A structured memory system in docs/memory/ complements the operational constitution. This directory contains 20+ files covering infrastructure credentials, architectural decisions, lessons learned, and reference material. The CLAUDE.md instructs the AI to read docs/memory/MEMORY.md at session start and load relevant files before undertaking any task. On provisioning the AI workstation VM, the Ansible playbook copies this directory to the Claude Code memory path so that the auto-memory system loads it automatically.
For AI-driven infrastructure builds, commit the memory directory to the repository rather than treating it as ephemeral. Claude Code’s auto-memory at ~/.claude/projects/ is per-machine and per-path; the git-committed memory directory is path-independent and survives workstation reprovisioning, VM recreation, and context resets.
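A hypothetical Ansible task for the provisioning step described above; the source variable, destination path encoding, and cog-bot home directory are assumptions and may differ from the actual playbook.

```yaml
# Sketch only: syncs the git-committed memory into the per-path auto-memory location.
- name: Copy committed AI memory into the Claude Code memory path
  ansible.builtin.copy:
    src: "{{ repo_checkout }}/docs/memory/"                        # assumed variable
    dest: "/home/cog-bot/.claude/projects/-home-cog-bot-infra/"    # placeholder encoded path
    owner: cog-bot
    group: cog-bot
```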

8.3 Agent Identity and Credential Management

The AI workstation VM runs under a dedicated git identity (cog-bot) with SSH keys scoped to the infrastructure repository. A provisioning script creates Linux user accounts for each AI agent role on the workstation, each with a pre-configured environment: Gitea PATs, Redmine API keys loaded from Vault, kubeconfig with SSH tunnel helper, and Anthropic API credentials. Per-agent Vault AppRole credentials are injected at session start via agent-session-start.sh, ensuring that each agent role operates with the minimum credential set required for its designated tasks. Approximately 108 of the 550+ commits in the infrastructure repository carry a Co-Authored-By: Claude Sonnet 4.6 attribution, representing the commits the AI authored directly. The actual proportion of AI-authored infrastructure code is higher, as many commits were created by the AI and reviewed by the human operator before commit without explicit co-author attribution.

9. Critical Failure Modes

Three incidents produced the most significant operational impact and the most durable lessons:
The klipper-lb DNAT failure (described in Section 5.2): A default k3s component silently intercepted SSH and HTTPS traffic at the node level, producing failures that appeared to be TLS, DNS, and credential errors before the root cause was identified. Lesson encoded: disable servicelb at cluster installation, always.
The ScyllaDB CPU incompatibility: A decision to run ScyllaDB 5.x as a DynamoDB-compatible backend for a platform service failed because ScyllaDB 5.x requires SSE4.2 and PCLMUL CPU extensions absent on the E5-2640v4 blades. Eight diagnostic steps — adjusting ScyllaDB configuration flags — were attempted before identifying the CPU incompatibility as fundamental. Resolution: ScyllaDB was replaced with amazon/dynamodb-local, preserving DynamoDB API compatibility without the CPU requirements. Lesson encoded: before deploying any service to the cluster, check CPU instruction set requirements against the E5-2640v4 baseline.
The stateful service data loss event: A sequence of destructive operations on a stateful service — attempted without a verified backup — resulted in irrecoverable data loss for multiple services, including workflow automation configurations and project management data. The Stateful Service Change Protocol, committed to CLAUDE.md following this incident, mandates: pause dependent services, create a backup, verify the backup is restorable, and only then execute the change. PVC backup CronJobs for all stateful services were added to the cluster within 24 hours of the incident.
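A minimal sketch of the kind of PVC backup CronJob added after that incident; the service name, schedule, and backup-target PVC are assumptions, not the lab's actual manifests.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: n8n-pvc-backup            # hypothetical example for one stateful service
  namespace: n8n
spec:
  schedule: "0 3 * * *"           # nightly
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: alpine:3.20
              command: ["/bin/sh", "-c"]
              args:
                - tar czf "/backup/n8n-$(date +%F).tar.gz" -C /data .
              volumeMounts:
                - name: data
                  mountPath: /data
                  readOnly: true
                - name: backup
                  mountPath: /backup
          volumes:
            - name: data
              persistentVolumeClaim:
                claimName: n8n-data          # assumed source PVC
            - name: backup
              persistentVolumeClaim:
                claimName: backup-target     # assumed NFS-backed destination PVC
```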

10. Recommendations

  1. Begin with an operational constitution before writing any infrastructure code. The CLAUDE.md is not documentation — it is the governance layer that makes AI-driven infrastructure coherent across sessions. Define the GitOps mandate, the stateful service protocol, and the cluster constraints before the first Kubernetes manifest is written.
  2. Audit CPU instruction set requirements before selecting any service. For clusters running on server-generation hardware more than three years old, verify that planned services do not require x86-64-v2 or newer microarchitecture features. ScyllaDB and MinIO are known examples; others exist.
  3. Disable klipper-lb and install MetalLB before creating any LoadBalancer services in k3s deployments where SSH access to cluster nodes is required. The iptables DNAT behavior is not documented prominently and produces failure modes that are expensive to diagnose after the fact.
  4. Commit the AI memory directory to the infrastructure repository and wire it into the VM provisioning playbook. Treating AI memory as ephemeral creates a recurring onboarding cost each time the workstation is reprovisioned or a new session begins with a clean environment.
  5. Encode every incident as a machine-checkable constraint, not a retrospective note. The operational constitution should grow with each failure. A constraint that is only in someone’s memory — human or AI — will be violated again.
  6. Use Apple Silicon for LLM inference in homelab environments. The tokens-per-second profile at the 8–32B parameter range is substantially superior to x86 CPU inference, and the power envelope is substantially lower than GPU-based alternatives. For labs where inference is a supporting role (agent tasks, developer toolbox) rather than the primary workload, the M-series Mac mini is the current optimum.
  7. Scope agent credentials through Vault AppRole, not shared service accounts. Per-agent credential scoping at the platform level prevents a compromised or misbehaving agent from taking actions outside its designated scope. The overhead of maintaining per-agent AppRole credentials is low; the blast radius reduction is significant.

Conclusion

A complete production-grade lab environment — spanning physical hardware, enterprise networking, virtualization, container orchestration, 30+ platform services, and self-hosted LLM inference — was assembled primarily by an AI coding assistant over approximately four weeks. The result is a functional platform that runs AI agent workloads on the same infrastructure that the AI engineer built.
The limiting factor was not the AI’s ability to write correct infrastructure code. It was the AI’s inability to know, before deployment, which hardware constraints, platform behavioral differences, and version incompatibilities would invalidate configurations that were technically correct in isolation. The operational constitution pattern addresses this by ensuring that every constraint discovered through failure becomes a permanent part of the operating environment that shapes all subsequent work.
As AI coding assistants become more capable of operating autonomously over longer time horizons, the patterns established here — operational constitutions, committed memory, GitOps mandates, and incident-encoded constraints — will become baseline requirements rather than advanced practices. Infrastructure engineering was always about managing constraints. AI-driven infrastructure engineering is the same discipline, with different tooling for encoding and enforcing those constraints.
Code examples are sanitized and generalized. No proprietary information is shared. Opinions are my own and do not reflect my employer’s views.