The aws-lc-sys build cascade: four failed toolchain additions resolved by upgrading to 0.40.0 with pre-generated assembly

Executive Summary

A Rust service CI pipeline began failing with a missing assembler error after a routine dependency update. The root cause was a four-level transitive dependency chain — reqwest → hyper-rustls → rustls → aws-lc-rs → aws-lc-sys — in which the terminal crate builds BoringSSL from bundled C and assembly source, requiring cmake, NASM, Go, and Perl as build prerequisites. None of these tools were present in the CI container image. Eight commits over one afternoon traced the requirement chain incrementally, each revealing a new missing tool, before the resolution: upgrading aws-lc-sys from 0.37.1 to 0.40.0, which ships pre-generated assembly files and reduced the native toolchain requirement to gcc and as — both already present in build-essential. The same CI pipeline concurrently surfaced a separate but related issue: a .cargo-ok race condition on a shared NFS-backed cargo registry volume, requiring an fd-based file lock for correct serialization. This paper documents both cascades as a study in transitive dependency cost discovery in containerized Rust CI.

Key Findings

  • Transitive TLS backend selection silently introduces native toolchain dependencies: choosing rustls as the TLS backend in reqwest propagates through three crate boundaries to a C/assembly build requirement that does not appear in any direct dependency and produces no warning until the build fails.
  • AWS_LC_SYS_NO_ASM=1 does not work for release builds: the environment variable that disables BoringSSL’s assembly paths applies only to debug builds; release builds ignore it, making it an unreliable workaround.
  • aws-lc-sys 0.40.0 eliminates the native toolchain requirement through pre-generated assembly: the upgrade from 0.37.1 to 0.40.0 ships platform-specific pre-generated .S files that cc assembles using gcc/as, removing the dependency on cmake, nasm, golang, and perl simultaneously.
  • Shared NFS-backed cargo registry volumes introduce a .cargo-ok race condition under concurrent job execution: cargo uses O_CREAT|O_EXCL for extraction sentinels, which fails with EEXIST when a concurrent job on the same NFS share has already written the sentinel — requiring explicit serialization via an fd-based flock.
  • actix-web 4.x uses first-match routing, not last-match: a platform crate registering its own handler for a path via configure() before the service's handler for the same path causes the service handler to be silently shadowed, producing HTTP 400 errors that are difficult to attribute to routing configuration.
  • Pinning resolved dependency ranges in Cargo.toml prevents future cargo update regressions: without explicit version pins, a subsequent cargo update can silently regress to an older aws-lc-sys that reintroduces the native toolchain requirement.

1. The Dependency Chain

The cascade originates in a common Rust HTTP client choice. reqwest with TLS enabled selects rustls as its TLS backend by default when the rustls-tls feature flag is active. rustls in turn selects aws-lc-rs as its cryptographic provider. aws-lc-rs depends on aws-lc-sys, which provides Rust bindings to AWS-LC — a fork of BoringSSL.
reqwest 0.12 (rustls-tls feature)
  └── hyper-rustls 0.27
        └── rustls 0.23.36
              └── aws-lc-rs 1.15.4
                    └── aws-lc-sys 0.37.1  ← builds BoringSSL from source
The same chain is reached via the AWS SDK dependency in the same service:
aws-config 1.1 / aws-sdk-dynamodb 1.19
  └── (same rustls → aws-lc-rs → aws-lc-sys chain)
aws-lc-sys 0.37.1 builds BoringSSL from bundled source using cmake as the build driver. BoringSSL’s cmake build requires: NASM for x86_64 assembly optimization (AES-NI, SHA-NI, GHASH paths), Go for code generation and header file production during cmake configure, and Perl for cmake configure scripts. None of these tools are present in a minimal Ubuntu-based CI container image. The cascade began when this dependency chain was first introduced — by a dependency update that changed the resolved version of rustls — and the build failed with nasm: not found.
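The chain is discoverable before the build fails. cargo tree's invert flag prints every path from the workspace down to a given crate, which surfaces the terminal crate and its resolved version directly; a sketch against the graph above:

# Show every dependency path that leads to aws-lc-sys
cargo tree -i aws-lc-sys

# Or inspect the chain from the top at limited depth
cargo tree -p reqwest --depth 5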

2. The Eight-Commit Cascade

All eight commits occurred on a single afternoon, each representing a hypothesis, a test, and a discovery.

Commit 1: Add nasm

Hypothesis: The build requires NASM. Adding it will resolve the failure.
# publish.yml — before
- run: apt-get install -y gcc build-essential cmake pkg-config libssl-dev

# publish.yml — after
- run: apt-get install -y gcc build-essential cmake nasm pkg-config libssl-dev
Outcome: Build advanced further, then failed. A different missing tool was now the blocker.

Commit 2: Try AWS_LC_SYS_NO_ASM=1

Hypothesis: The AWS_LC_SYS_NO_ASM=1 environment variable disables BoringSSL’s assembly paths, building in pure C and eliminating the NASM requirement.
env:
  AWS_LC_SYS_NO_ASM: "1"
Outcome: The variable worked for the debug build but was ignored in release mode; the release build used in CI continued to require NASM.
AWS_LC_SYS_NO_ASM=1 applies only to debug builds of aws-lc-sys. Release builds ignore it. This limitation is not prominently documented, and no warning is emitted when the variable is set but has no effect; it was discovered empirically.
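One way to confirm the flag's (non-)effect is a targeted rebuild of just the crate in question; a sketch, where the grep pattern is illustrative:

# Rebuild only aws-lc-sys and see whether the release profile still reaches for nasm
cargo clean -p aws-lc-sys
AWS_LC_SYS_NO_ASM=1 cargo build --release 2>&1 | grep -i nasm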

Commit 3: Add golang

Hypothesis: With NASM restored and the NO_ASM flag removed, the build now fails at the cmake configure step because BoringSSL requires a Go compiler during configuration to generate boringssl.h and other headers.
- run: apt-get install -y gcc build-essential cmake nasm golang-go pkg-config libssl-dev
Outcome: cmake configure advanced further, then failed. Another missing tool.

Commit 4: Add perl

Hypothesis: BoringSSL’s cmake configure scripts require Perl. Even with gcc, NASM, and Go present, the cmake step invokes Perl scripts that fail silently or with an obscure error.
- run: apt-get install -y gcc build-essential cmake nasm golang-go perl pkg-config libssl-dev
Outcome: The native toolchain dependency list was now four tools (cmake, nasm, golang, perl) plus build-essential — a significant addition to the CI container for a dependency that entered the graph transitively without any explicit decision. At this point, an alternative approach was evaluated: continue fixing toolchain availability, or upgrade the dependency.

3. The Resolution: Upgrading to 0.40.0

aws-lc-sys 0.40.0 changed its build strategy for x86_64 Linux: rather than invoking cmake to build BoringSSL from C and assembly source, it ships pre-generated .S assembly files in a generated-src/linux-x86_64/ directory within the crate. The cc crate assembles these using gcc and as, both included in build-essential. The native toolchain requirements under each version:
Tool     | aws-lc-sys 0.37.1                  | aws-lc-sys 0.40.0
---------|------------------------------------|------------------
cmake    | Required (BoringSSL build driver)  | Not required
nasm     | Required (x86_64 assembly)         | Not required
golang   | Required (cmake configure step)    | Not required
perl     | Required (cmake configure scripts) | Not required
gcc / as | Required                           | Required
The upgrade required updating aws-lc-rs from 1.15.4 to 1.16.3 (which depends on the newer aws-lc-sys). To prevent future cargo update regressions to an older version:
# Cargo.toml — explicit minimum version pins
[dependencies]
aws-config = { version = ">=1.5", ... }
aws-sdk-dynamodb = { version = ">=1.40", ... }
Without these pins, cargo update could resolve to an earlier aws-lc-rs that depends on aws-lc-sys 0.37.x, silently reintroducing the full native toolchain requirement. The CI manifest after the upgrade:
# publish.yml — after upgrade
- run: apt-get install -y gcc build-essential pkg-config libssl-dev
cmake, nasm, golang-go, and perl removed.
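One way to perform the bump in place, assuming the lockfile still resolves the older versions and the Cargo.toml constraints permit the newer ones (a sketch):

# Move only the two crates in the chain, then confirm the resolution
cargo update -p aws-lc-rs --precise 1.16.3
cargo update -p aws-lc-sys --precise 0.40.0
cargo tree -i aws-lc-sys    # should now show a single 0.40.x node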

4. The Concurrent Race Condition: zstd-sys and Shared NFS

While the aws-lc-sys cascade was being resolved, a concurrent but independent build failure was occurring intermittently: error: failed to extract archive on zstd-sys, with the underlying cause File exists (os error 17): O_CREAT|O_EXCL on .cargo-ok.

Root Cause

The CI runner operates as a Kubernetes Deployment with three replicas. All three replicas share a single NFS-backed PVC mounted at the cargo registry path. When multiple replicas run builds concurrently, each attempts to extract zstd-sys’s bundled C source to the shared NFS path. Cargo uses O_CREAT|O_EXCL — an atomic create-if-not-exists operation — to write the .cargo-ok sentinel after successful extraction. If replica A has already extracted and written .cargo-ok, replica B’s O_CREAT|O_EXCL call fails with EEXIST (error 17), aborting the build.
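The sentinel semantics can be reproduced in isolation with shell noclobber, which makes > create files with O_CREAT|O_EXCL, mirroring cargo's behavior (the mount path is hypothetical):

# Two replicas race to write the same sentinel on the shared mount
set -o noclobber
echo ok > /mnt/nfs/registry/src/.cargo-ok    # first writer succeeds
echo ok > /mnt/nfs/registry/src/.cargo-ok    # second writer fails:
                                             # "cannot overwrite existing file" (EEXIST, os error 17)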

Fix Evolution

The correct fix requires serializing cargo builds across replicas at the registry extraction step, using an fd-based flock that keeps the lock held across the sentinel-clear and build operations:
# Final fix: fd-based flock to serialize registry access
exec 200>/tmp/cargo-node-extract.lock
flock -x 200

# Clear stale sentinels before build
find /root/.cargo/registry/src -name '.cargo-ok' -delete

# Build with lock held
cargo build --release --workspace

flock -u 200
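For contrast, the subshell form that was tried first, sketched with a hypothetical per-run CARGO_HOME (the CI_JOB_ID variable is illustrative):

# Subshell form: the command runs in a new shell, so a per-run CARGO_HOME
# set as a shell-local variable is not visible to it
CARGO_HOME=/tmp/run-$CI_JOB_ID/.cargo
flock /tmp/cargo-node-extract.lock -c 'cargo build --release --workspace'
# cargo falls back to the shared /root/.cargo path, defeating run isolation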
The fd-based approach, rather than flock <file> -c 'command', is necessary because the -c form runs the command in a new shell: only exported variables survive into it, and the per-run CARGO_HOME isolation variable in this pipeline did not. Using the subshell form caused builds to fall back to the shared /root/.cargo path, defeating run isolation. A complementary fix, ZSTD_SYS_USE_PKG_CONFIG=1 with libzstd-dev installed, directs zstd-sys to link against the system library rather than building from bundled source, eliminating zstd-sys's source extraction to the shared registry path entirely:
env:
  ZSTD_SYS_USE_PKG_CONFIG: "1"
- run: apt-get install -y gcc build-essential libzstd-dev pkg-config libssl-dev
ZSTD_SYS_USE_PKG_CONFIG=1 requires that the system libzstd version satisfies zstd-sys’s minimum version constraint. Verify with pkg-config --modversion libzstd before relying on this approach. A version mismatch produces a link error at build time, not a pkg-config error at configure time.
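That late link error can be converted into an early, explicit CI failure; a sketch, with the minimum version as a placeholder assumption (check the resolved zstd-sys release for the real constraint):

# Fail fast if the system libzstd is older than zstd-sys requires
ver=$(pkg-config --modversion libzstd)
dpkg --compare-versions "$ver" ge "1.4.0" || {
  echo "libzstd $ver is too old for zstd-sys" >&2
  exit 1
}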

5. Bonus Finding: actix-web First-Match Routing

The same service exhibited a separate failure unrelated to the dependency cascade: Kubernetes liveness probes on /health returned HTTP 400, causing the pods to enter a crash loop immediately after the first successful start.

Root Cause

The service’s main function called configure() to register all routes contributed by platform crates via a distributed slice. One of those platform crates registered its own /health handler that required a specific application data type (DynamoDbEventStore) not present in this service’s configuration. When the platform crate’s /health handler was invoked, the missing app_data caused actix-web to return HTTP 400.

The initial diagnosis assumed actix-web uses last-match routing, where the last registered handler wins. The attempted fix moved the service’s simple /health handler to after the configure() call. This had no effect: actix-web 4.x uses first-match routing, so the first registered handler for a given path receives all requests for that path. The platform crate’s /health handler, registered inside configure(), matched before the service’s handler regardless of where the service’s handler was registered relative to configure().

Fix

Register the service-specific /health handler before configure():
// Before — service handler shadowed by platform handler
App::new()
    .configure(register_all_routes)  // platform /health registered here
    .route("/health", web::get().to(health_check))  // never reached

// After — service handler wins (first-match)
App::new()
    .route("/health", web::get().to(health_check))  // registered first
    .configure(register_all_routes)  // platform /health shadowed
A secondary fix was required: the GatewayTenantExtractor middleware was returning HTTP 400 (missing_tenant_context) on probe requests because Kubernetes health probes do not send the X-Eva-Tenant-Id identity header. Health and readiness paths were short-circuited in the middleware to bypass tenant extraction:
// Inside the GatewayTenantExtractor middleware's call(): short-circuit
// probe paths before tenant extraction runs
let path = req.path().to_owned();
if path == "/health" || path == "/health/ready" {
    // Forward directly to the inner service; probes carry no tenant header
    return Ok(svc.call(req).await?.map_into_left_body());
}
When using configure() to register routes from external crates, assume that any path registered inside configure() will match before any path registered after the configure() call. If a platform crate registers a handler for a path that your service needs to own, register your handler before calling configure().

6. Lessons on Transitive Dependency Costs

The aws-lc-sys cascade illustrates a category of dependency cost that is largely invisible during development: native build requirements that propagate through multiple crate boundaries. The TLS backend selection in reqwest is a direct dependency decision. The native toolchain requirements in aws-lc-sys are four levels removed from that decision. There is no mechanism in Cargo to surface this cost at dependency resolution time.

Dependency Evaluation Checklist for TLS Backend Selection

Question                                                      | Implication
--------------------------------------------------------------|--------------------------------------------------
Does the crate build native code from source?                 | Native toolchain required in CI container
Does it ship pre-built binaries or pre-generated source?      | Check target platform coverage
What is the minimum compiler/assembler version required?      | Verify against CI image baseline
Does it require cmake, Go, Perl, or other build-time tools?   | Add to CI container or consider alternative crate
Is there a pure-Rust alternative with acceptable performance? | Evaluate ring vs aws-lc-rs for TLS workloads
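Most of these questions can be answered from the resolved graph before touching the CI image; a rough sketch (the -sys naming convention is a heuristic for native-code carriers, not a guarantee):

# List the *-sys crates in the graph, the usual carriers of native builds
cargo tree --prefix none | grep -- '-sys' | sort -u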

7. Recommendations

  1. Audit transitive TLS backend dependencies before finalizing reqwest feature flags. The rustls-tls feature selects aws-lc-rs by default. Evaluate whether ring (pure Rust, no native toolchain) or native-tls (uses the OS TLS stack) better fits the deployment environment's CI container baseline; a sketch of selecting ring follows this list.
  2. Upgrade aws-lc-sys to 0.40.0 or later in any Rust service that includes it transitively. The pre-generated assembly approach eliminates cmake, nasm, golang, and perl from the build requirement. Pin the minimum version in Cargo.toml to prevent regression.
  3. Use ZSTD_SYS_USE_PKG_CONFIG=1 with libzstd-dev when zstd-sys is a transitive dependency and a shared NFS cargo cache is in use. This eliminates zstd-sys source extraction from the shared registry path, removing the .cargo-ok race condition for that crate.
  4. Use fd-based flocks, not subshell-based flocks, when serializing operations that depend on per-run shell variables. The flock <file> -c 'command' form runs the command in a new shell, which sees only exported variables; shell-local settings such as a per-run CARGO_HOME are lost. Fd-based locking (exec N>file; flock -x N; ...; flock -u N) keeps the lock in the current shell environment.
  5. Register health and readiness route handlers before configure() in actix-web applications that use platform crates. Platform crates may register handlers for standard paths including /health. First-match routing means these handlers will shadow your service’s handlers if configure() is called first.
  6. Pin minimum version constraints for dependencies in Cargo.toml after resolving a native toolchain cascade. Without pins, future cargo update invocations can silently regress to an older version that reintroduces the native build requirement.
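For recommendation 1, rustls 0.23 exposes a process-default crypto provider, so ring can be selected without changing the reqwest call sites. A minimal sketch, assuming rustls is present with its ring feature enabled; this is illustrative, not the service's actual startup code:

// main.rs: install ring as the process-wide rustls crypto provider
// (assumes rustls with the "ring" feature in Cargo.toml)
fn main() {
    rustls::crypto::ring::default_provider()
        .install_default()
        .expect("failed to install ring as the default crypto provider");

    // ... service startup continues here
}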

Conclusion

The aws-lc-sys cascade represents a class of CI failure that is becoming more common as the Rust cryptographic ecosystem matures: crates with high-performance native implementations that bundle their C/assembly source introduce build toolchain requirements that are opaque to the Cargo.toml dependency graph. The resolution pattern — check whether a newer version of the terminal crate ships pre-built artifacts before installing native build tools — is generalizable beyond this specific crate. The concurrent .cargo-ok race condition on shared NFS registry volumes represents a separate but equally generalizable finding: shared mutable state in CI environments requires explicit serialization, and cargo’s sentinel-based extraction mechanism was not designed for concurrent access from multiple nodes to a shared filesystem. Both issues are artifacts of how Rust’s build ecosystem interacts with containerized CI infrastructure. As Rust adoption in cloud-native environments grows, these interaction patterns will affect more teams. The patterns documented here — version upgrade over toolchain installation, fd-based locking for NFS-shared caches, minimum version pinning against regression — represent the current resolution approaches until the underlying crates or tooling addresses the root causes.
Code examples are sanitized and generalized. No proprietary information is shared. Opinions are my own and do not reflect my employer’s views.