Part 5 — Sandboxed Execution Environments
Isolation, Sandboxing, and Fast Execution
Sections in this chapter
- 1. Why trust boundaries exist
- 2. The trust ladder
- 3. Why default Docker isn't enough
- 4. gVisor and Firecracker: the purpose-built rungs
- 5. Per-task worktree isolation
- 6. Network egress policies
- 7. Filesystem controls
- 8. Credential scoping
- 9. Fast-start patterns
- 10. Managed sandbox services
- 11. A worked example: the coding-agent sandbox
Key Takeaways
Insight
The interview trap: a candidate says "we use Docker." The follow-up is "why is Docker not a sandbox for untrusted code?" Three answers to have ready: shared kernel (kernel exploits break out), def
Insight
The "agent exfiltrated credentials via DNS despite network restrictions" interview scenario has a specific fix: an allowlist-enforcing DNS resolver that answers only for explicitly allowed hostnames
Common Trap
The "we'll build our own Firecracker-based sandbox" decision is the most commonly-regretted engineering choice in this area. The undifferentiated heavy lifting — pool management, snapshot/restore, n
Interview Questions
1. Explain why default Docker is insufficient for running untrusted agent-generated code.
Frame: six points. Shared kernel (escape via kernel bugs); unrestricted default egress (any reachable destination, not only HTTPS); root inside the container; risky default mounts; unset resource limits; broad capability set. Each is fixable, but defaults are where incidents live. Use a purpose-built sandbox (gVisor, Firecracker) for this workload.
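To make "each is fixable" concrete, here is a minimal sketch of a `docker run` command line that overrides the defaults listed above. The helper name `hardened_docker_args`, the specific limits, and the choice of the `runsc` runtime (which assumes gVisor is installed and registered with Docker) are illustrative assumptions, not a recommended baseline.

```python
from typing import List


def hardened_docker_args(image: str, command: List[str], runtime: str = "runsc") -> List[str]:
    """Build a `docker run` argv that overrides the risky defaults (hypothetical helper)."""
    return [
        "docker", "run", "--rm",
        f"--runtime={runtime}",                    # assumption: gVisor's runsc is registered with Docker
        "--network=none",                          # no default egress
        "--user=65534:65534",                      # run as an unprivileged user, not root
        "--read-only",                             # read-only root filesystem, no host mounts
        "--tmpfs=/tmp:rw,noexec,nosuid,size=64m",  # small scratch space only
        "--cap-drop=ALL",                          # drop the broad default capability set
        "--security-opt=no-new-privileges",        # block setuid-style privilege escalation
        "--memory=512m", "--cpus=1", "--pids-limit=128",  # explicit resource limits
        image, *command,
    ]
```

Passing the result to `subprocess.run(...)` would launch the container. The point of building the list in one place is that every workload gets the same overrides, rather than relying on each caller to remember them.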
2. Design a sandbox service targeting 500 ms cold start, 10k concurrent, full network isolation.
Frame: the worked example in 11.10. Firecracker microVMs, managed or equivalent; pre-warmed pool; snapshot/restore (Firecracker's native snapshots, or CRIU for container-based runtimes); COW worktrees; three-layer egress enforcement (DNS sinkhole, proxy, kernel); short-lived credentials; budgets; kill switch; per-sandbox tracing.
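The pre-warmed pool in that list can be sketched in a few lines. This is an in-memory illustration only: the `Sandbox` and `WarmPool` names are hypothetical, and `_boot` stands in for whatever actually creates a sandbox (for example, restoring a microVM from a snapshot).

```python
from collections import deque
from dataclasses import dataclass
from itertools import count
from typing import Deque


@dataclass
class Sandbox:
    sandbox_id: int
    warm: bool  # True if it was booted ahead of demand


class WarmPool:
    """Hands out pre-booted sandboxes and falls back to a cold boot when empty."""

    def __init__(self, target_size: int) -> None:
        self._ids = count(1)
        self._target = target_size
        self._idle: Deque[Sandbox] = deque()
        self.refill()

    def _boot(self, warm: bool) -> Sandbox:
        # Placeholder for the real, slow work of creating a sandbox.
        return Sandbox(next(self._ids), warm)

    def refill(self) -> None:
        # Would run in the background so the pool stays at its target size.
        while len(self._idle) < self._target:
            self._idle.append(self._boot(warm=True))

    def acquire(self) -> Sandbox:
        if self._idle:
            return self._idle.popleft()  # fast path: already booted
        return self._boot(warm=False)    # slow path: pool exhausted
```

In this sketch sandboxes are single-use: `acquire` never returns one to the pool, so no state leaks between tasks. Sizing `target_size` against peak demand is what keeps requests on the fast path.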
3. An agent exfiltrated credentials via a DNS channel despite network restrictions. What happened and how do you close it?
Frame: the DNS resolver is permissive: it resolves anything by default and blocks only explicitly denied names, so it is a denylist, not an allowlist. The attacker encodes data in subdomain labels of a domain they control, and the resolver forwards those queries to the attacker's authoritative nameserver. Close: sinkhole everything by default; resolve only explicitly allowlisted hostnames; log and alert on queries that would
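A minimal sketch of the default-deny decision such a resolver makes, assuming exact-match allowlisting. The function names and the example hostnames are illustrative; a real resolver would wrap this check around its query handler and return a sinkhole answer or NXDOMAIN for blocked names.

```python
import logging
from typing import FrozenSet

log = logging.getLogger("dns-policy")


def normalize(qname: str) -> str:
    """Lower-case the name and strip the trailing root dot."""
    return qname.rstrip(".").lower()


def should_resolve(qname: str, allowlist: FrozenSet[str]) -> bool:
    """Default-deny: resolve only names that appear verbatim in the allowlist."""
    name = normalize(qname)
    allowed = name in allowlist  # exact match only, no wildcard subdomains
    if not allowed:
        # Blocked queries are the exfiltration signal worth alerting on.
        log.warning("blocked DNS query: %s", name)
    return allowed
```

Exact matching is the important design choice here: allowing `*.example.com` would reopen the channel for anyone who can get a query forwarded under that suffix.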
4. Walk the trust ladder. When would you use each rung?
Frame: in-process for pure functions; Docker for trusted, known code (test runners with hardening); gVisor for workloads where Docker-like UX matters and you want stronger isolation; Firecracker microVMs for untrusted agent-generated code; separate accounts for multi-tenant code execution products. Match ru
5. Build or buy a sandbox service?
Frame: default is buy. Build only when specific measured requirements (cost at extreme scale, compliance, data residency) can't be met by managed services and a quarter of senior engineer time is a justified investment. The undifferentiated heavy lifting (pooling, snapshots, networking, quotas) is substant
6. A sandbox started in 300 ms but the first agent task in it took 12 seconds. What's wrong?
Frame: cold start of the sandbox is not cold start of the task. Language runtime initialisation, package imports, and first-time network connections to dependencies all contribute. Diagnose: trace the task start with a breakdown (runtime init, imports, first DNS query, first API call). Fix: pre-warm runtime
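The per-phase breakdown can be captured with a small timing helper. This is a sketch using only the standard library; the `StartupTrace` class and the phase names are hypothetical, and a production setup would more likely emit these as spans to an existing tracing system.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


class StartupTrace:
    """Records wall-clock time per named startup phase."""

    def __init__(self) -> None:
        self.phases: Dict[str, float] = {}

    @contextmanager
    def phase(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the phase raised.
            self.phases[name] = time.perf_counter() - start

    def slowest(self) -> str:
        """Name of the phase that took the longest."""
        return max(self.phases, key=lambda name: self.phases[name])
```

Wrapping each step of task startup in `with trace.phase("imports"):` and similar blocks, then calling `trace.slowest()`, shows which phase accounts for the gap between sandbox start and task start.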