Part 4 — Memory, State, and Retrieval
State, Memory, and Context Windows
Sections in this chapter
- 1. Three kinds of memory, three kinds of problem
- 2. Scratch memory: the in-context surface
- 3. Episodic memory: learning across runs
- 4. Semantic memory: the domain model
- 5. Memory decay and re-validation
- 6. Memory poisoning and its defences
- 7. When memory makes the agent worse
- 8. A worked example: memory for a long-running coding agent
Key Takeaways
Insight
A production agent typically has all three. The interview question "design a memory system" is answered by naming all three kinds and specifying how each is written, read, evicted, and validated. A
Common Trap
A specific failure pattern worth naming: summarise-then-forget. An agent summarises its reasoning at step 10 and discards the verbatim reasoning. At step 20, a bug in the summary (the model lost a key
Common Trap
Memory poisoning is the bug class that takes the longest to diagnose in production. Symptoms include: the agent gives subtly wrong answers that worked last month; problems cluster around specific enti
Interview Questions
1. Design a memory system for a coding agent working on a long-running project over several weeks.
Frame: name the three memory types, describe what each stores, how each is indexed, what the decay policy is, and how poisoning is defended. Walk through a concrete retrieval scenario.
2. When would you remove
Frame: three situations. Over-reliance (memory contradicts fresh ground truth). Stale grounding (subtle drift compounds into wrong conclusions). Distraction (marginally-relevant retrievals pull focus). The discipline is retrieve less, better; optimise precision, not recall.
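A minimal sketch of "retrieve less, better", assuming each candidate memory already carries a similarity score in [0, 1]. The `Scored` type, the 0.75 threshold, and the cap of three items are illustrative assumptions, not values from the chapter. The point is that a hard score floor plus a small cap favours precision over recall.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scored:
    text: str
    score: float  # similarity in [0, 1]; the scoring model is assumed, not shown


def select_memories(candidates, min_score=0.75, max_items=3):
    """Keep only strong matches, best first, capped at max_items."""
    strong = [c for c in candidates if c.score >= min_score]
    strong.sort(key=lambda c: c.score, reverse=True)
    return strong[:max_items]


# Hypothetical candidates returned by a retriever.
candidates = [
    Scored("deploy script lives in tools/deploy.py", 0.91),
    Scored("team prefers tabs in Makefiles", 0.42),
    Scored("deploy needs the STAGING flag set", 0.80),
    Scored("last sprint's retro notes", 0.55),
]
selected = select_memories(candidates)
print([c.text for c in selected])
```

Marginally relevant items (the 0.42 and 0.55 entries here) never reach the context window, which is the defence against the distraction case.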
3. Your agent starts giving wrong answers after three days. Memory poisoning is suspected. Debug it.
Frame: walk backward from a specific wrong answer to the retrieved memory behind it to the earlier run that wrote it. Requires trace coverage on memory reads. Once isolated: write-time filter to prevent re-occurrence, invalidation of downstream memories that built on the poisoned one, and an audit of simil
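The walk-back and the downstream invalidation can be sketched as follows. This assumes two things the frame implies but does not specify: a trace log mapping each answer to the memory ids it read, and a `derived_from` field recording which memories were read when a new one was written. All names here (`MemoryRecord`, `read_log`, the ids) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryRecord:
    memory_id: str
    written_by_run: str
    derived_from: list = field(default_factory=list)  # ids read when this was written
    valid: bool = True


def memories_behind(answer_id, read_log):
    """Return the memory ids the trace says were read for this answer."""
    return read_log.get(answer_id, [])


def invalidate_with_descendants(root_id, store):
    """Mark root_id and everything transitively derived from it as invalid."""
    invalidated = []
    frontier = [root_id]
    while frontier:
        current = frontier.pop()
        record = store[current]
        if not record.valid:
            continue  # already handled
        record.valid = False
        invalidated.append(current)
        frontier.extend(
            m.memory_id for m in store.values() if current in m.derived_from
        )
    return invalidated


store = {
    "m1": MemoryRecord("m1", "run-07"),
    "m2": MemoryRecord("m2", "run-09", derived_from=["m1"]),
    "m3": MemoryRecord("m3", "run-10", derived_from=["m2"]),
    "m4": MemoryRecord("m4", "run-08"),
}
read_log = {"answer-42": ["m2"]}  # requires trace coverage on memory reads

suspects = memories_behind("answer-42", read_log)
origin_run = store["m1"].written_by_run  # m2 was derived from m1
invalidated = invalidate_with_descendants("m1", store)
```

Without the read log the first step is impossible, which is why trace coverage on memory reads is the precondition for the whole exercise.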
4. Scratch, episodic, and semantic memory: when do you use each?
Frame: scratch for the current-task working surface, lifetime one run. Episodic for learning across runs, lifetime weeks. Semantic for stable domain facts, lifetime indefinite with re-validation. Different writers, different retrieval patterns, different decay policies.
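One way to make the distinction concrete is a per-type policy table. This is a sketch under stated assumptions: the three-week episodic TTL is an illustrative stand-in for "weeks", and the field names are invented for the example.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class MemoryPolicy:
    purpose: str
    expires_with_run: bool     # True: discarded when the run ends
    ttl: Optional[timedelta]   # None: no fixed expiry
    revalidate: bool           # True: must be re-checked against the source


POLICIES = {
    "scratch": MemoryPolicy("current-task working surface", True, None, False),
    # Three weeks is an assumed value standing in for "lifetime weeks".
    "episodic": MemoryPolicy("learning across runs", False, timedelta(weeks=3), False),
    "semantic": MemoryPolicy("stable domain facts", False, None, True),
}


def is_expired(kind, age, run_finished):
    """Apply the decay rule for one memory type."""
    policy = POLICIES[kind]
    if policy.expires_with_run:
        return run_finished
    if policy.ttl is None:
        return False  # kept indefinitely; freshness handled by re-validation
    return age > policy.ttl
```

Keeping the policies in one table makes it obvious that the three types differ in decay rule, not just in storage location.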
5. How do you keep semantic memory fresh?
Frame: TTL globally, change-detection hooks where possible (CI on merge, pipeline webhooks), re-validation on read for high-stakes queries. Layer all three; prefer authored sources versioned with the code they describe.
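The three layers can be sketched as one freshness check. Assumptions: each fact records the source version it was derived from (for example a commit hash updated by a CI hook), and `revalidate` is a caller-supplied callable that checks the fact against ground truth. The names and the seven-day TTL are illustrative only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Fact:
    key: str
    value: str
    written_at: datetime
    source_version: str  # version of the source the fact was derived from


def needs_refresh(fact, now, ttl, current_version, high_stakes, revalidate):
    """Layered check: TTL, then change detection, then re-validation on read."""
    if now - fact.written_at > ttl:
        return True  # layer 1: global TTL
    if fact.source_version != current_version:
        return True  # layer 2: source changed since the fact was written
    if high_stakes and not revalidate(fact):
        return True  # layer 3: explicit check for high-stakes reads
    return False


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
fact = Fact("build_cmd", "make all", datetime(2024, 1, 9, tzinfo=timezone.utc), "abc")
ttl = timedelta(days=7)

fresh = needs_refresh(fact, now, ttl, "abc", False, lambda f: True)
changed = needs_refresh(fact, now, ttl, "def", False, lambda f: True)
```

The cheap checks run first, so the expensive re-validation call only happens for high-stakes reads of facts that already passed the TTL and version checks.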
6. What goes in the context window, in what order, and why?
Frame: cached prefix (system prompt, tool schemas), task context, retrieved content (middle), recent steps verbatim, critical constraints at the very end. Recite the lost-in-the-middle reasoning.
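The ordering above can be sketched as a small assembly function. The section labels and the bracketed header format are assumptions made for the example; only the order comes from the frame. The stable parts sit first so a prefix cache can reuse them, retrieved content sits in the middle, and constraints go last.

```python
def assemble_context(system_prompt, tool_schemas, task, retrieved, recent_steps, constraints):
    """Join context sections in a fixed order, skipping empty ones."""
    sections = [
        ("system", system_prompt),                  # cached prefix
        ("tools", "\n".join(tool_schemas)),         # cached prefix
        ("task", task),
        ("retrieved", "\n".join(retrieved)),        # middle of the window
        ("recent_steps", "\n".join(recent_steps)),  # kept verbatim
        ("constraints", "\n".join(constraints)),    # very end
    ]
    return "\n\n".join(f"[{name}]\n{body}" for name, body in sections if body)


# Hypothetical inputs for a coding agent.
prompt = assemble_context(
    system_prompt="You are a coding agent.",
    tool_schemas=["run_tests()", "read_file(path)"],
    task="Fix the failing login test.",
    retrieved=["auth module was refactored last week"],
    recent_steps=["step 11: ran tests, 1 failure"],
    constraints=["never push to main"],
)
print(prompt)
```

Anything that changes between calls is placed after the stable prefix, so editing the retrieved content or recent steps does not alter the opening sections.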