Executive Summary
A Rust service CI pipeline began failing with a missing-assembler error after a routine dependency update. The root cause was a four-level transitive dependency chain (reqwest → hyper-rustls → rustls → aws-lc-rs → aws-lc-sys) in which the terminal crate builds BoringSSL from bundled C and assembly source, requiring cmake, NASM, Go, and Perl as build prerequisites. None of these tools were present in the CI container image. Eight commits over one afternoon traced the requirement chain incrementally, each revealing a new missing tool, before the resolution: upgrading aws-lc-sys from 0.37.1 to 0.40.0, which ships pre-generated assembly files and reduced the native toolchain requirement to gcc and as, both already present in build-essential. The same CI pipeline concurrently surfaced a separate but related issue: a .cargo-ok race condition on a shared NFS-backed cargo registry volume, which required an fd-based file lock for correct serialization. This paper documents both cascades as a study in transitive dependency cost discovery in containerized Rust CI.
Key Findings
- Transitive TLS backend selection silently introduces native toolchain dependencies: choosing rustls as the TLS backend in reqwest propagates through three crate boundaries to a C/assembly build requirement that does not appear in any direct dependency and produces no warning until the build fails.
- AWS_LC_SYS_NO_ASM=1 does not work for release builds: the environment variable that disables BoringSSL's assembly paths applies only to debug builds; release builds ignore it, making it an unreliable workaround.
- aws-lc-sys 0.40.0 eliminates the native toolchain requirement through pre-generated assembly: the upgrade from 0.37.1 to 0.40.0 ships platform-specific pre-generated .S files that the cc crate assembles using gcc/as, removing the dependency on cmake, nasm, golang, and perl simultaneously.
- Shared NFS-backed cargo registry volumes introduce a .cargo-ok race condition under concurrent job execution: cargo uses O_CREAT|O_EXCL for extraction sentinels, which fails with EEXIST when a concurrent job on the same NFS share has already written the sentinel, requiring explicit serialization via an fd-based flock.
- actix-web 4.x uses first-match routing, not last-match: registering a catch-all handler via configure() before a specific path handler causes the specific handler to be silently shadowed, producing HTTP 400 errors that are difficult to attribute to routing configuration.
- Pinning resolved dependency ranges in Cargo.toml prevents future cargo update regressions: without explicit version pins, a subsequent cargo update can silently regress to an older aws-lc-sys that reintroduces the native toolchain requirement.
1. The Dependency Chain
The cascade originates in a common Rust HTTP client choice. reqwest with TLS enabled selects rustls as its TLS backend by default when the rustls-tls feature flag is active. rustls in turn selects aws-lc-rs as its cryptographic provider. aws-lc-rs depends on aws-lc-sys, which provides Rust bindings to AWS-LC, a fork of BoringSSL.
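On an affected project, the chain can be made visible by inverting the dependency tree. The aws-lc-sys and aws-lc-rs versions below are the ones from this incident; the other version numbers are illustrative:

```shell
# Ask cargo which dependency paths pull in aws-lc-sys.
cargo tree -i aws-lc-sys
# aws-lc-sys v0.37.1
# └── aws-lc-rs v1.15.4
#     └── rustls v0.23.x
#         └── hyper-rustls v0.27.x
#             └── reqwest v0.12.x
```

Running this before merging a dependency update would have surfaced the new terminal crate, though not its native toolchain requirements.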
aws-lc-sys 0.37.1 builds BoringSSL from bundled source using cmake as the build driver. BoringSSL’s cmake build requires: NASM for x86_64 assembly optimization (AES-NI, SHA-NI, GHASH paths), Go for code generation and header file production during cmake configure, and Perl for cmake configure scripts.
None of these tools are present in a minimal Ubuntu-based CI container image. The cascade began when this dependency chain was first introduced — by a dependency update that changed the resolved version of rustls — and the build failed with nasm: not found.
2. The Eight-Commit Cascade
All eight commits occurred on a single afternoon, each representing a hypothesis, a test, and a discovery.
Commit 1: Add nasm
Hypothesis: The build requires NASM. Adding it will resolve the failure.
Commit 2: Try AWS_LC_SYS_NO_ASM=1
Hypothesis: The AWS_LC_SYS_NO_ASM=1 environment variable disables BoringSSL’s assembly paths, building in pure C and eliminating the NASM requirement.
Commit 3: Add golang
Hypothesis: After restoring NASM and removing the NO_ASM flag, the cmake configure step still fails. BoringSSL requires a Go compiler during cmake configuration to generate boringssl.h and other headers.
Commit 4: Add perl
Hypothesis: BoringSSL's cmake configure scripts require Perl. Even with gcc, NASM, and Go present, the cmake step invokes Perl scripts that fail silently or with an obscure error.
3. The Resolution: Upgrading to 0.40.0
aws-lc-sys 0.40.0 changed its build strategy for x86_64 Linux: rather than invoking cmake to build BoringSSL from C and assembly source, it ships pre-generated .S assembly files in a generated-src/linux-x86_64/ directory within the crate. The cc crate assembles these using gcc and as, both included in build-essential.
The native toolchain requirements under each version:
| Tool | aws-lc-sys 0.37.1 | aws-lc-sys 0.40.0 |
|---|---|---|
| cmake | Required (BoringSSL build driver) | Not required |
| nasm | Required (x86_64 assembly) | Not required |
| golang | Required (cmake configure step) | Not required |
| perl | Required (cmake configure scripts) | Not required |
| gcc / as | Required | Required |
The fix upgraded aws-lc-rs from 1.15.4 to 1.16.3 (which depends on the newer aws-lc-sys). Without explicit version pins, a future cargo update could resolve to an earlier aws-lc-rs that depends on aws-lc-sys 0.37.x, silently reintroducing the full native toolchain requirement. Pinning minimum versions in Cargo.toml prevents this regression:
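A sketch of the pin, listing the crates as direct dependencies solely to floor the resolver (the version numbers are the ones from this incident; adjust to the versions your project resolved):

```toml
[dependencies]
# Floor the transitive chain: any resolution below these versions
# reintroduces the cmake/nasm/golang/perl build requirement.
aws-lc-rs = ">=1.16.3"
aws-lc-sys = ">=0.40.0"
```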
The CI manifest after the upgrade:
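A sketch of the toolchain layer, assuming an Ubuntu-based image; the commented-out line shows the pre-upgrade requirement for contrast:

```shell
# Before (aws-lc-sys 0.37.1): full native toolchain required.
# apt-get update && apt-get install -y build-essential cmake nasm golang perl

# After (aws-lc-sys 0.40.0): gcc and as ship with build-essential.
apt-get update && apt-get install -y build-essential
```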
4. The Concurrent Race Condition: zstd-sys and Shared NFS
While the aws-lc-sys cascade was being resolved, a concurrent but independent build failure was occurring intermittently: error: failed to extract archive on zstd-sys, with the underlying cause File exists (os error 17) from O_CREAT|O_EXCL on .cargo-ok.
Root Cause
The CI runner operates as a Kubernetes Deployment with three replicas. All three replicas share a single NFS-backed PVC mounted at the cargo registry path. When multiple replicas run builds concurrently, each attempts to extract zstd-sys's bundled C source to the shared NFS path. Cargo uses O_CREAT|O_EXCL, an atomic create-if-not-exists operation, to write the .cargo-ok sentinel after successful extraction. If replica A has already extracted and written .cargo-ok, replica B's O_CREAT|O_EXCL call fails with EEXIST (error 17), aborting the build.
Fix Evolution
The correct fix serializes cargo builds across replicas at the registry extraction step, using an fd-based flock that keeps the lock held across the sentinel-clear and build operations. The fd-based form, rather than the subshell form flock <file> -c 'command', is necessary because the subshell form does not inherit the parent's exported environment variables, including the per-run CARGO_HOME isolation variable. Using the subshell form caused builds to fall back to the shared /root/.cargo path, defeating run isolation.
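A minimal sketch of the fd-based pattern; the lock path and fd number are illustrative, and the echo stands in for the sentinel-clear and cargo build steps:

```shell
LOCKFILE="${TMPDIR:-/tmp}/cargo-registry.lock"

exec 9>"$LOCKFILE"   # open fd 9 in the *current* shell: no subshell, env intact
flock -x 9           # block until the exclusive lock is acquired
echo "lock held"     # critical section: clear stale .cargo-ok, run cargo build
flock -u 9           # release explicitly; the fd also closes on shell exit
```

Because the lock lives on a file descriptor in the current shell, exported variables such as CARGO_HOME remain visible to everything inside the critical section.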
A complementary fix — ZSTD_SYS_USE_PKG_CONFIG=1 with libzstd-dev installed — directs zstd-sys to link against the system library rather than building from bundled source, eliminating zstd-sys’s source extraction to the shared registry path entirely:
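A sketch of the CI configuration for this approach, assuming a Debian/Ubuntu image (package names are the standard Debian ones):

```shell
apt-get install -y libzstd-dev pkg-config  # system zstd + pkg-config metadata
export ZSTD_SYS_USE_PKG_CONFIG=1           # zstd-sys links the system library
pkg-config --modversion libzstd            # confirm the version before relying on it
```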
ZSTD_SYS_USE_PKG_CONFIG=1 requires that the system libzstd version satisfies zstd-sys's minimum version constraint. Verify with pkg-config --modversion libzstd before relying on this approach. A version mismatch produces a link error at build time, not a pkg-config error at configure time.
5. Bonus Finding: actix-web First-Match Routing
The same service exhibited a separate failure unrelated to the dependency cascade: Kubernetes liveness probes on /health returned HTTP 400, causing the pods to enter a crash loop immediately after the first successful start.
Root Cause
The service's main function called configure() to register all routes contributed by platform crates via a distributed slice. One of those platform crates registered its own /health handler that required a specific application data type (DynamoDbEventStore) not present in this service's configuration. When the platform crate's /health handler was invoked, the missing app_data caused actix-web to return HTTP 400.
The initial diagnosis assumed actix-web uses last-match routing (the last registered handler wins). The attempted fix moved the service’s simple /health handler to after the configure() call. This had no effect.
actix-web 4.x uses first-match routing. The first registered handler for a given path receives all requests for that path. The platform crate’s /health handler, registered inside configure(), matched before the service’s handler regardless of where the service’s handler was registered in relation to configure().
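The semantics can be reproduced without actix at all. This is a minimal sketch of first-match dispatch over a registration-ordered table; the handler strings are illustrative stand-ins for the real handlers:

```rust
// First-match dispatch: scan routes in registration order and return the
// first handler whose path matches -- later registrations never run.
fn dispatch<'a>(routes: &'a [(&'a str, &'a str)], path: &str) -> Option<&'a str> {
    routes.iter().find(|(p, _)| *p == path).map(|(_, h)| *h)
}

fn main() {
    let routes = [
        ("/health", "platform handler (400: missing app_data)"), // via configure()
        ("/health", "service handler (200 OK)"),                 // registered later
    ];
    // The platform handler wins even though the service handler was added after it.
    assert_eq!(
        dispatch(&routes, "/health"),
        Some("platform handler (400: missing app_data)")
    );
}
```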
Fix
Register the service-specific /health handler before configure():
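A sketch of the corrected registration order, assuming actix-web 4.x. register_platform_routes stands in for the distributed-slice configure callback and is not a name from the source:

```rust
use actix_web::{web, App, HttpResponse, HttpServer, Responder};

async fn health() -> impl Responder {
    HttpResponse::Ok().body("ok") // simple liveness response, no app_data required
}

// Stand-in for the routes contributed by platform crates.
fn register_platform_routes(cfg: &mut web::ServiceConfig) { /* ... */ }

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    HttpServer::new(|| {
        App::new()
            // Service /health FIRST: first match wins in actix-web 4.x.
            .route("/health", web::get().to(health))
            // Platform routes (possibly including their own /health) come after.
            .configure(register_platform_routes)
    })
    .bind(("0.0.0.0", 8080))?
    .run()
    .await
}
```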
A second probe failure had a different origin: the GatewayTenantExtractor middleware was returning HTTP 400 (missing_tenant_context) on probe requests because Kubernetes health probes do not send the X-Eva-Tenant-Id identity header. Health and readiness paths were short-circuited in the middleware to bypass tenant extraction:
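A sketch of the short-circuit predicate; "/ready" is an assumption, since the source names only "health and readiness paths" and not the exact routes:

```rust
// Probe paths that bypass tenant extraction. "/ready" is assumed; the
// source does not name the readiness route explicitly.
fn is_probe_path(path: &str) -> bool {
    matches!(path, "/health" | "/ready")
}

fn main() {
    // In the middleware, a request whose path matches is forwarded to the
    // inner service without the X-Eva-Tenant-Id check.
    assert!(is_probe_path("/health"));
    assert!(!is_probe_path("/api/orders"));
}
```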
6. Lessons on Transitive Dependency Costs
The aws-lc-sys cascade illustrates a category of dependency cost that is largely invisible during development: native build requirements that propagate through multiple crate boundaries. The TLS backend selection in reqwest is a direct dependency decision. The native toolchain requirements in aws-lc-sys are four levels removed from that decision. There is no mechanism in Cargo to surface this cost at dependency resolution time.
Dependency Evaluation Checklist for TLS Backend Selection
| Question | Implication |
|---|---|
| Does the crate build native code from source? | Native toolchain required in CI container |
| Does it ship pre-built binaries or pre-generated source? | Check target platform coverage |
| What is the minimum compiler/assembler version required? | Verify against CI image baseline |
| Does it require cmake, Go, Perl, or other build-time tools? | Add to CI container or consider alternative crate |
| Is there a pure-Rust alternative with acceptable performance? | Evaluate ring vs aws-lc-rs for TLS workloads |
7. Recommendations
- Audit transitive TLS backend dependencies before finalizing reqwest feature flags. The rustls-tls feature selects aws-lc-rs by default. Evaluate whether ring (which needs no build-time tools beyond a C compiler on common targets) or native-tls (which uses the OS TLS stack) better fits the deployment environment's CI container baseline.
- Upgrade aws-lc-sys to 0.40.0 or later in any Rust service that includes it transitively. The pre-generated assembly approach eliminates cmake, nasm, golang, and perl from the build requirement. Pin the minimum version in Cargo.toml to prevent regression.
- Use ZSTD_SYS_USE_PKG_CONFIG=1 with libzstd-dev when zstd-sys is a transitive dependency and a shared NFS cargo cache is in use. This eliminates zstd-sys source extraction from the shared registry path, removing the .cargo-ok race condition for that crate.
- Use fd-based flocks, not subshell-based flocks, when serializing operations that depend on exported environment variables. The flock <file> -c 'command' form spawns a subshell that does not inherit the parent environment. Fd-based locking (exec N>file; flock -x N; ...; flock -u N) keeps the lock in the current shell environment.
- Register health and readiness route handlers before configure() in actix-web applications that use platform crates. Platform crates may register handlers for standard paths including /health. First-match routing means these handlers will shadow your service's handlers if configure() is called first.
- Pin minimum version constraints for dependencies in Cargo.toml after resolving a native toolchain cascade. Without pins, future cargo update invocations can silently regress to an older version that reintroduces the native build requirement.
Conclusion
The aws-lc-sys cascade represents a class of CI failure that is becoming more common as the Rust cryptographic ecosystem matures: crates with high-performance native implementations that bundle their C/assembly source introduce build toolchain requirements that are opaque to the Cargo.toml dependency graph. The resolution pattern — check whether a newer version of the terminal crate ships pre-built artifacts before installing native build tools — is generalizable beyond this specific crate.
The concurrent .cargo-ok race condition on shared NFS registry volumes represents a separate but equally generalizable finding: shared mutable state in CI environments requires explicit serialization, and cargo’s sentinel-based extraction mechanism was not designed for concurrent access from multiple nodes to a shared filesystem.
Both issues are artifacts of how Rust’s build ecosystem interacts with containerized CI infrastructure. As Rust adoption in cloud-native environments grows, these interaction patterns will affect more teams. The patterns documented here — version upgrade over toolchain installation, fd-based locking for NFS-shared caches, minimum version pinning against regression — represent the current resolution approaches until the underlying crates or tooling addresses the root causes.
Code examples are sanitized and generalized. No proprietary information is shared. Opinions are my own and do not reflect my employer’s views.