Part 5 — Sandboxed Execution Environments

Isolation, Sandboxing, and Fast Execution

Sections in this chapter

1. Why trust boundaries exist
2. The trust ladder
3. Why default Docker isn't enough
4. gVisor and Firecracker: the purpose-built rungs
5. Per-task worktree isolation
6. Network egress policies
7. Filesystem controls
8. Credential scoping
9. Fast-start patterns
10. Managed sandbox services
11. A worked example: the coding-agent sandbox

Key Takeaways

Insight

The interview trap: a candidate says "we use Docker." The follow-up is "why is Docker not a sandbox for untrusted code?" Three answers to have ready: shared kernel (kernel exploits break out), def

Insight

The "agent exfiltrated credentials via DNS despite network restrictions" interview scenario has a specific fix: an allowlist-enforcing DNS resolver that answers only for explicitly allowed hostnames

Common Trap

The "we'll build our own Firecracker-based sandbox" decision is the most commonly-regretted engineering choice in this area. The undifferentiated heavy lifting — pool management, snapshot/restore, n

Interview Questions

Explain why default Docker is insufficient for running untrusted agent-generated code.

Frame: six points. Shared kernel (escape via kernel bugs); unrestricted default egress (any reachable destination, not only HTTPS); root inside the container by default; risky default mounts; unset resource limits; broad default capability set. Each is fixable, but defaults are where incidents live. Use a purpose-built sandbox (gVisor, Firecracker) for this workload.
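To make the "each is fixable" point concrete, here is a minimal sketch of a helper that builds a `docker run` command line overriding several of the defaults listed above. The function name `hardened_docker_args`, the image name, and the specific limits are illustrative assumptions, not values from this chapter. The flags themselves are standard Docker CLI options. Note that none of them changes the shared-kernel point, which is why the answer still ends at a purpose-built sandbox.

```python
from typing import List


def hardened_docker_args(image: str, command: List[str]) -> List[str]:
    """Build a `docker run` argument list that overrides risky defaults.

    Hypothetical helper for illustration; the limits are placeholder values.
    """
    return [
        "docker", "run", "--rm",
        "--network=none",                    # no egress at all
        "--cap-drop=ALL",                    # drop the default capability set
        "--security-opt=no-new-privileges",  # block setuid-style escalation
        "--user=65534:65534",                # run as an unprivileged user, not root
        "--read-only",                       # read-only root filesystem
        "--tmpfs=/tmp:rw,noexec,nosuid,size=64m",  # small scratch space only
        "--memory=512m",                     # explicit memory limit
        "--cpus=1",                          # explicit CPU limit
        "--pids-limit=128",                  # cap process count (fork bombs)
        image,
        *command,
    ]


if __name__ == "__main__":
    args = hardened_docker_args("python:3.12-slim", ["python", "-c", "print(1)"])
    print(" ".join(args))
```

The list form can be passed directly to `subprocess.run` without a shell, which avoids quoting problems when the command comes from an agent.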

Design a sandbox service targeting 500 ms cold start, 10k concurrent, full network isolation.

Frame: the worked example in 11.10. Firecracker microVMs, managed or equivalent; pre-warmed pool; snapshot/restore (Firecracker snapshots, or CRIU for container-based equivalents); COW worktrees; three-layer egress enforcement (DNS sinkhole, proxy, kernel); short-lived credentials; budgets; kill switch; per-sandbox tracing.
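The pre-warmed pool in that design can be sketched in a few lines. This is an in-memory illustration under stated assumptions: `WarmPool`, `Sandbox`, and `_boot` are hypothetical names, and `_boot` stands in for the expensive step (a microVM boot or snapshot restore) rather than calling any real sandbox API. Sandboxes are treated as single-use and are never returned to the pool.

```python
import itertools
from collections import deque
from dataclasses import dataclass
from typing import Deque


@dataclass
class Sandbox:
    sandbox_id: int


class WarmPool:
    """Keeps `target` sandboxes booted ahead of demand (illustrative only)."""

    def __init__(self, target: int) -> None:
        self._ids = itertools.count(1)
        self._target = target
        self._idle: Deque[Sandbox] = deque()
        self.cold_starts = 0  # requests that had to pay the full boot cost
        self.refill()

    def _boot(self) -> Sandbox:
        # Placeholder for the slow path: boot a VM or restore a snapshot.
        return Sandbox(next(self._ids))

    def refill(self) -> None:
        """Top the pool back up; in practice this runs in the background."""
        while len(self._idle) < self._target:
            self._idle.append(self._boot())

    def idle_count(self) -> int:
        return len(self._idle)

    def acquire(self) -> Sandbox:
        """Hand out a warm sandbox if one exists, else fall back to a cold boot."""
        if self._idle:
            return self._idle.popleft()
        self.cold_starts += 1
        return self._boot()
```

Tracking `cold_starts` separately is a deliberate choice: it is the number that tells you whether the pool target is too small for the arrival rate.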

An agent exfiltrated credentials via a DNS channel despite network restrictions. What happened and how do you close it?

Frame: the supposed allowlist DNS resolver is actually permissive. It resolves anything by default and only blocks explicitly denied names, so the attacker encodes data in subdomains of a resolvable name. Close: sinkhole everything by default; resolve only explicitly allowlisted hostnames; log and alert on queries that would
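The default-deny decision at the heart of that fix is small enough to sketch. This shows only the policy check, not a DNS server; `dns_decision`, `ALLOWED_HOSTS`, and the hostnames in it are illustrative assumptions. Matching is exact on purpose, so a subdomain of an allowed name is not itself allowed.

```python
from typing import FrozenSet

# Hypothetical allowlist; a real deployment would load this from policy config.
ALLOWED_HOSTS: FrozenSet[str] = frozenset({"pypi.org", "files.pythonhosted.org"})


def normalize(qname: str) -> str:
    """Lower-case the query name and strip the trailing root dot."""
    return qname.rstrip(".").lower()


def dns_decision(qname: str, allowed: FrozenSet[str] = ALLOWED_HOSTS) -> str:
    """Return "resolve" only for exact allowlisted names, else "sinkhole".

    Default-deny: anything not explicitly listed is sinkholed, including
    subdomains of allowed names, since subdomain labels can carry data.
    """
    if normalize(qname) in allowed:
        return "resolve"
    return "sinkhole"
```

For example, `dns_decision("PyPI.org.")` resolves, while a query such as `c2VjcmV0.attacker.example` is sinkholed, and is exactly the kind of query worth logging.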

Walk the trust ladder. When would you use each rung?

Frame: in-process for pure functions; hardened Docker for trusted, known code (such as test runners); gVisor for workloads where Docker-like UX matters and you want stronger isolation; Firecracker microVMs for untrusted agent-generated code; separate accounts for multi-tenant code execution products. Match ru

Build or buy a sandbox service?

Frame: default is buy. Build only when specific measured requirements (cost at extreme scale, compliance, data residency) can't be met by managed services and a quarter of senior engineer time is a justified investment. The undifferentiated heavy lifting (pooling, snapshots, networking, quotas) is substant

A sandbox started in 300 ms but the first agent task in it took 12 seconds. What's wrong?

Frame: cold start of the sandbox is not cold start of the task. Language runtime initialisation, package imports, and first-time network connections to dependencies all contribute. Diagnose: trace the task start with a breakdown (runtime init, imports, first DNS query, first API call). Fix: pre-warm runtime
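A per-phase breakdown of task start can be collected with a small timing helper. The sketch below is one possible shape, assuming a hypothetical `StartupTrace` class; the phase names mirror the ones in the answer, and the phase bodies are stand-ins for real work.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


class StartupTrace:
    """Records wall-clock time per named startup phase (illustrative only)."""

    def __init__(self) -> None:
        self.phases: Dict[str, float] = {}

    @contextmanager
    def phase(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the phase raised.
            self.phases[name] = time.perf_counter() - start

    def slowest(self) -> str:
        """Name of the phase that took the longest."""
        return max(self.phases, key=lambda name: self.phases[name])


if __name__ == "__main__":
    trace = StartupTrace()
    with trace.phase("imports"):
        import json  # stand-in for heavy package imports
    with trace.phase("first_api_call"):
        time.sleep(0.05)  # stand-in for a first network round trip
    for name, seconds in trace.phases.items():
        print(f"{name}: {seconds * 1000:.1f} ms")
    print("slowest:", trace.slowest())
```

Emitting these durations as span attributes on the per-sandbox trace makes the 300 ms versus 12 s gap visible without reproducing it by hand.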