Problem Statement
The openshell-gateway binary unconditionally compiles in all four compute drivers (Kubernetes, Docker, Podman, VM). Operators who only need one pay for the others in:
- Supply-chain surface. A CVE in
bollard, kube-rs, oci-client, etc., applies to the shipped binary even when that driver is never instantiated.
- Build time and binary size. Every clean build pays for every driver.
- Downstream packaging. Producing e.g. a "Kubernetes-only" or "Docker-only" gateway image today requires forking
Cargo.toml.
Measured against the gateway's current dependency graph, a single-driver build (e.g. K8s-only or Docker-only) sheds a substantial fraction of the transitive crate set. Trimming one driver from the full default set saves less, because the four drivers share most of their tonic/hyper/serde stack. The headline savings come from picking a single driver, not from removing one.
The same need exists for telemetry as a build-time switch — already partially solved (a telemetry cargo feature exists on openshell-server), so this issue mainly extends an established pattern.
Proposed Design
Add per-driver cargo features on openshell-server, all enabled by default. Building with --no-default-features and selecting a subset produces a gateway that contains only the requested drivers (and optionally telemetry).
Feature surface:
default = ["telemetry", "driver-kubernetes", "driver-docker", "driver-podman", "driver-vm"]
telemetry — already exists; retained as-is.
driver-kubernetes, driver-docker, driver-podman, driver-vm — new; each gates the corresponding driver crate as an optional dependency.
Behavior contract:
- Default build is byte-equivalent to today (all drivers + telemetry).
- Any non-empty combination of driver features must compile and produce a working gateway that supports exactly those drivers.
- A build with zero driver features must fail at compile time with a clear message — not at first sandbox creation.
- If
gateway.toml configures a driver that wasn't compiled in, the gateway must fail at startup with a clear error naming the missing driver and the cargo flag that re-enables it.
- Auto-detection (
detect_driver() when no driver is configured) must skip drivers not compiled in and try the next available one in a deterministic order, rather than failing.
- Runtime telemetry behavior (the
OPENSHELL_TELEMETRY_ENABLED env var) is unchanged. The cargo feature controls whether the code is present; the env var controls whether present code emits.
CI: A build matrix that exercises the default build, each driver in isolation, and at least one mixed combo. Reuse the approach already used by tasks/scripts/verify-telemetry-compiled-out.sh (grep the binary for known strings from omitted crates) to verify drivers actually compile out.
Alternatives Considered
- Runtime-only gating. Cheapest, but doesn't shrink the binary, dependency graph, or supply-chain surface. The explicit ask is to not include the code, not just disable it.
- A single
slim feature that drops all non-K8s drivers. Inflexible — a Docker-only operator can't use it. Per-driver features cost the same to implement and give every combination for free.
- One gateway binary per driver. Multiplies release artifacts, breaks the single-binary deployment story, forces every config/auth change to ripple across N binaries.
- Dynamic plugin loading (cdylib). True runtime selection but introduces ABI-stability obligations, breaks single-static-binary deploys, and complicates signing/distribution. Overkill for the stated need.
- Workspace
default-members exclusion. Doesn't change openshell-server's own dependency graph — solves a different problem.
Agent Investigation
- Dependency footprint was measured directly with
cargo tree -p <crate> --edges normal --prefix none --no-dedupe. Numbers in the Problem Statement table are reproducible from HEAD. Note that "exclusive crates given the other three drivers are present" is a different number from "crates added on top of openshell-core" — the former is small for Docker (1) and Podman (1) because they share their transitive graph with K8s through tonic/hyper/serde; the latter (+19, +33) is what an operator actually saves in a single-driver build. The proposal motivates the feature in terms of single-driver builds, where the savings are real.
- Driver wiring is centralized in three sites in
openshell-server: imports/re-exports in crates/openshell-server/src/compute/mod.rs:9-38, four ComputeRuntime::new_* constructors in crates/openshell-server/src/compute/mod.rs:295-413, and the match driver block in crates/openshell-server/src/lib.rs:716-780. No driver type leaks elsewhere into the gateway, so feature-gating is mechanically straightforward.
- The CLI does not depend on driver crates (confirmed in
crates/openshell-cli/Cargo.toml) — it talks to the gateway over gRPC. No CLI changes needed.
- The
telemetry feature already follows the exact pattern proposed here. crates/openshell-server/Cargo.toml:99-107 defines default = ["telemetry"] and propagates to openshell-core/telemetry. crates/openshell-driver-vm/Cargo.toml:49-54 does the same, and crates/openshell-core/src/telemetry.rs uses #[cfg(feature = "telemetry")] on imports, the background sender thread, the HTTP client, and emission helpers (which become no-ops when the feature is off). tasks/scripts/verify-telemetry-compiled-out.sh already proves out the compile-out property in CI by grepping for events.telemetry.data.nvidia.com and the telemetry client ID. The driver feature surface should reuse this exact shape — no new pattern to invent.
- The VM driver is already out-of-process — the gateway just spawns
openshell-driver-vm as a subprocess and talks to it over a UDS gRPC channel (crates/openshell-server/src/compute/vm.rs). Its in-server footprint is small (a spawn helper + RemoteComputeDriver gRPC client), so a driver-vm feature is essentially gating that helper module.
- Existing
TODO(driver-abstraction) at crates/openshell-server/src/lib.rs:12-20 already anticipates collapsing the per-arm wiring. This proposal does not require that refactor — but lands well alongside it whenever it happens.
- README already documents
OPENSHELL_TELEMETRY_ENABLED as the runtime knob (README.md mentions it around line 251). This issue does not change the runtime env-var contract; it adds a parallel compile-time knob, exactly as the existing telemetry cargo feature does.
Checklist
Problem Statement
The
openshell-gatewaybinary unconditionally compiles in all four compute drivers (Kubernetes, Docker, Podman, VM). Operators who only need one pay for the others in:bollard,kube-rs,oci-client, etc., applies to the shipped binary even when that driver is never instantiated.Cargo.toml.Measured against the gateway's current dependency graph, a single-driver build (e.g. K8s-only or Docker-only) sheds a substantial fraction of the transitive crate set. Trimming one driver from the full default set saves less, because the four drivers share most of their
tonic/hyper/serdestack. The headline savings come from picking a single driver, not from removing one.The same need exists for telemetry as a build-time switch — already partially solved (a
telemetrycargo feature exists onopenshell-server), so this issue mainly extends an established pattern.Proposed Design
Add per-driver cargo features on
openshell-server, all enabled by default. Building with--no-default-featuresand selecting a subset produces a gateway that contains only the requested drivers (and optionally telemetry).Feature surface:
default = ["telemetry", "driver-kubernetes", "driver-docker", "driver-podman", "driver-vm"]telemetry— already exists; retained as-is.driver-kubernetes,driver-docker,driver-podman,driver-vm— new; each gates the corresponding driver crate as an optional dependency.Behavior contract:
gateway.tomlconfigures a driver that wasn't compiled in, the gateway must fail at startup with a clear error naming the missing driver and the cargo flag that re-enables it.detect_driver()when no driver is configured) must skip drivers not compiled in and try the next available one in a deterministic order, rather than failing.OPENSHELL_TELEMETRY_ENABLEDenv var) is unchanged. The cargo feature controls whether the code is present; the env var controls whether present code emits.CI: A build matrix that exercises the default build, each driver in isolation, and at least one mixed combo. Reuse the approach already used by
tasks/scripts/verify-telemetry-compiled-out.sh(grep the binary for known strings from omitted crates) to verify drivers actually compile out.Alternatives Considered
slimfeature that drops all non-K8s drivers. Inflexible — a Docker-only operator can't use it. Per-driver features cost the same to implement and give every combination for free.default-membersexclusion. Doesn't changeopenshell-server's own dependency graph — solves a different problem.Agent Investigation
cargo tree -p <crate> --edges normal --prefix none --no-dedupe. Numbers in the Problem Statement table are reproducible from HEAD. Note that "exclusive crates given the other three drivers are present" is a different number from "crates added on top ofopenshell-core" — the former is small for Docker (1) and Podman (1) because they share their transitive graph with K8s throughtonic/hyper/serde; the latter (+19, +33) is what an operator actually saves in a single-driver build. The proposal motivates the feature in terms of single-driver builds, where the savings are real.openshell-server: imports/re-exports incrates/openshell-server/src/compute/mod.rs:9-38, fourComputeRuntime::new_*constructors incrates/openshell-server/src/compute/mod.rs:295-413, and thematch driverblock incrates/openshell-server/src/lib.rs:716-780. No driver type leaks elsewhere into the gateway, so feature-gating is mechanically straightforward.crates/openshell-cli/Cargo.toml) — it talks to the gateway over gRPC. No CLI changes needed.telemetryfeature already follows the exact pattern proposed here.crates/openshell-server/Cargo.toml:99-107definesdefault = ["telemetry"]and propagates toopenshell-core/telemetry.crates/openshell-driver-vm/Cargo.toml:49-54does the same, andcrates/openshell-core/src/telemetry.rsuses#[cfg(feature = "telemetry")]on imports, the background sender thread, the HTTP client, and emission helpers (which become no-ops when the feature is off).tasks/scripts/verify-telemetry-compiled-out.shalready proves out the compile-out property in CI by grepping forevents.telemetry.data.nvidia.comand the telemetry client ID. The driver feature surface should reuse this exact shape — no new pattern to invent.openshell-driver-vmas a subprocess and talks to it over a UDS gRPC channel (crates/openshell-server/src/compute/vm.rs). Its in-server footprint is small (aspawnhelper +RemoteComputeDrivergRPC client), so adriver-vmfeature is essentially gating that helper module.TODO(driver-abstraction)atcrates/openshell-server/src/lib.rs:12-20already anticipates collapsing the per-arm wiring. This proposal does not require that refactor — but lands well alongside it whenever it happens.OPENSHELL_TELEMETRY_ENABLEDas the runtime knob (README.mdmentions it around line 251). This issue does not change the runtime env-var contract; it adds a parallel compile-time knob, exactly as the existingtelemetrycargo feature does.Checklist