Part 1 — Foundations
Anatomy of a Harness — The Seven Layers
Sections in this chapter
- 1. Why seven
- 2. Layer 1: Instruction
- 3. Layer 2: Tools
- 4. Layer 3: Memory and retrieval
- 5. Layer 4: Execution
- 6. Layer 5: Policy and approval
- 7. Layer 6: Observability
- 8. Layer 7: Evaluation
- 9. The dependency graph
- 10. Which layer to investigate first
- 11. Which layer to build first
- 12. A worked example: applying the seven layers
Key Takeaways
Insight
Memorise the order. It is not alphabetical and not accidental. It mirrors the lifecycle of a single request: the agent reads instructions, discovers tools, retrieves memory, executes actions, is checked against policy, is observed, and is evaluated.
Common Trap
Investigating agent failures by staring at individual failed conversations is a classic junior-engineer mistake. One conversation is an anecdote; ten thousand slices are data. Agents fail in distributions.
Interview Questions
1. Draw the seven layers of a production harness on a whiteboard and explain each one.
Frame: draw the agent loop in the centre, surround it with the layers, use the dependency graph in 3.8, and state one sentence per layer about what it owns and what breaks without it.
2. An agent is failing 40% of the time in production. Which layer do you investigate first and why?
Frame: observability first, because every other answer is uninformed. Slice the failure rate along the five axes (mode, task type, tool, model version, input characteristic) and let the slice point at the layer.
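A minimal sketch of that slicing step, in Python. It assumes each production trace has already been exported as a dict carrying the five axes plus a boolean `failed` flag; the field names (`mode`, `task_type`, `tool`, `model_version`, `input_bucket`) and the `min_count` threshold are illustrative assumptions, not a standard schema.

```python
from collections import defaultdict

# The five slicing axes named above; these field names are hypothetical.
# "mode" is treated as an opaque label here.
AXES = ("mode", "task_type", "tool", "model_version", "input_bucket")


def failure_rates(traces, axis):
    """Map each value of `axis` to (failures, total, failure_rate)."""
    counts = defaultdict(lambda: [0, 0])  # value -> [failures, total]
    for trace in traces:
        bucket = counts[trace[axis]]
        bucket[1] += 1
        if trace["failed"]:
            bucket[0] += 1
    return {value: (f, n, f / n) for value, (f, n) in counts.items()}


def worst_slice_per_axis(traces, min_count=30):
    """For each axis, return (value, failures, total, rate) for the slice with
    the highest failure rate, skipping slices smaller than `min_count`."""
    report = {}
    for axis in AXES:
        candidates = [
            (value, f, n, rate)
            for value, (f, n, rate) in failure_rates(traces, axis).items()
            if n >= min_count
        ]
        if candidates:
            report[axis] = max(candidates, key=lambda c: c[3])
    return report
```

Slices below `min_count` are skipped because a handful of conversations is still an anecdote; the right threshold is a judgment call that depends on traffic volume.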
3. For a brand-new coding agent, which of the seven layers would you build first?
Frame: observability first, then execution (the sandbox), then tools, instruction and memory together, then policy (dry runs and approval gates on writes), then evaluation by month two. Justify each step by what it unblocks and why skipping it is unsafe.
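The policy step can be sketched as a wrapper around every tool call. In this hypothetical sketch a `ToolCall` declares whether it writes; reads run immediately, while writes are first previewed as a dry run and only executed if an approver accepts the preview. `ToolCall`, `run_with_policy` and the three callables are illustrative names, not any framework's API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToolCall:
    """Hypothetical tool-call record; real harnesses define their own schema."""
    tool: str
    args: dict = field(default_factory=dict)
    writes: bool = False  # True if the call mutates files, databases or external systems


def run_with_policy(
    call: ToolCall,
    execute: Callable[[ToolCall], Any],
    preview: Callable[[ToolCall], str],
    approve: Callable[[ToolCall, str], bool],
) -> tuple[str, Any]:
    """Gate a single tool call.

    execute(call) performs the action for real.
    preview(call) describes what the call would do, with no side effects (the dry run).
    approve(call, plan) is the approval gate: a human, a rule set, or both.
    """
    if not call.writes:
        # Reads pass straight through without consulting the approver.
        return ("executed", execute(call))
    plan = preview(call)  # dry run first: nothing is mutated here
    if approve(call, plan):
        return ("executed", execute(call))
    # Rejected writes return the dry-run plan so the agent can see what was blocked.
    return ("rejected", plan)
```

Keeping `preview` free of side effects is what makes the dry run safe; a fuller harness would presumably also send both the preview and the approver's decision to the observability layer.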
4. Which layer is most often neglected in early deployments?
Frame: evaluation. Teams focus on shipping behaviour, treat evals as QA, and regret it on the first silent regression. Observability is a close second.
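One way to make regressions loud rather than silent is a gate that compares a candidate run of the eval suite against a baseline run. This is a sketch under simple assumptions: each run is a dict mapping case IDs to pass/fail, only cases present in both runs are compared, and the `max_drop` default is an arbitrary tolerance chosen for illustration.

```python
def regression_report(baseline, candidate, max_drop=0.02):
    """Compare two runs of the same eval suite.

    baseline, candidate: {case_id: passed (bool)}.
    Cases present in only one run are ignored.
    Returns (ok, newly_failing, baseline_rate, candidate_rate), where `ok` is
    False if the pass rate dropped by more than `max_drop`.
    """
    shared = baseline.keys() & candidate.keys()
    if not shared:
        raise ValueError("baseline and candidate share no eval cases")
    # Cases that passed before and fail now: the regressions an average can hide.
    newly_failing = sorted(c for c in shared if baseline[c] and not candidate[c])
    baseline_rate = sum(baseline[c] for c in shared) / len(shared)
    candidate_rate = sum(candidate[c] for c in shared) / len(shared)
    ok = (baseline_rate - candidate_rate) <= max_drop
    return ok, newly_failing, baseline_rate, candidate_rate
```

The report lists individual pass-to-fail flips even when `ok` is true, because a flat aggregate can hide a regression that is offset by unrelated improvements elsewhere in the suite.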
5. If a single layer had to be outsourced to a vendor, which one and why?
Frame: execution (sandboxing) is the layer most commonly, and most safely, bought. Building fast-starting Firecracker microVM infrastructure in-house is expensive, and the managed services (E2B, Modal, Daytona) are well differentiated. Observability is the second most often bought (LangSmith, Langfuse, Braintrust, Arize).
6. Describe one failure mode per layer.
Frame: instruction, drift across model upgrades; tools, malformed destructive calls; memory, memory poisoning; execution, sandbox escape; policy, bypass via indirect prompt injection; observability, PII leaking into traces; evaluation, judge-model miscalibration.