AI Harness Engineering, Chapter 9 of 19

Part 4: Memory, State, and Retrieval

09

State, Memory, and Context Windows

Sections in this chapter

  1. Three kinds of memory, three kinds of problem
  2. Scratch memory: the in-context surface
  3. Episodic memory: learning across runs
  4. Semantic memory: the domain model
  5. Memory decay and re-validation
  6. Memory poisoning and its defences
  7. When memory makes the agent worse
  8. A worked example: memory for a long-running coding agent

Key Takeaways

Insight

A production agent typically has all three. The interview question "design a memory system" is answered by naming all three kinds and specifying how each is written, read, evicted, and validated. A

Common Trap

A specific failure pattern worth naming: summarise-then-forget. An agent summarises its reasoning at step 10 and discards the verbatim reasoning. At step 20, a bug in the summary (the model lost a key

Common Trap

Memory poisoning is the bug class that takes the longest to diagnose in production. Symptoms include: the agent gives subtly wrong answers that worked last month; problems cluster around specific enti

Interview Questions

1

Design a memory system for a coding agent working on a long-running project over several weeks.

Frame: name the three memory types, describe what each stores, how each is indexed, what the decay policy is, and how poisoning is defended. Walk through a concrete retrieval scenario.

2

When would you remove

Frame: three situations. Over-reliance (memory contradicts fresh ground truth). Stale grounding (subtle drift compounds into wrong conclusions). Distraction (marginally-relevant retrievals pull focus). The discipline is retrieve less, better; optimise precision, not recall.
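
One way to make "retrieve less, better" concrete is a selection step that filters by a relevance threshold before capping the count, so weak matches never reach the context window. This is a minimal sketch, not a definitive implementation: the `Retrieved` type, the `select_for_context` function, the 0.75 threshold, and the cap of three items are all illustrative assumptions, not values from the chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Retrieved:
    """A candidate memory returned by some retriever (hypothetical shape)."""
    memory_id: str
    text: str
    score: float  # assumed similarity in [0, 1]; higher means more relevant


def select_for_context(candidates: list[Retrieved],
                       min_score: float = 0.75,
                       max_items: int = 3) -> list[Retrieved]:
    """Favour precision over recall.

    Drop every candidate below min_score first, then keep at most
    max_items of the strongest survivors. Returning an empty list is a
    valid outcome: no memory is better than a distracting one.
    """
    strong = [c for c in candidates if c.score >= min_score]
    strong.sort(key=lambda c: c.score, reverse=True)
    return strong[:max_items]
```

Filtering before truncating matters: a plain top-k always returns k items, including marginal ones, which is the distraction case described in the frame above.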

3

Your agent starts giving wrong answers after three days. Memory poisoning is suspected. Debug it.

Frame: walk backward from a specific wrong answer to the retrieved memory behind it to the earlier run that wrote it. Requires trace coverage on memory reads. Once isolated: write-time filter to prevent re-occurrence, invalidation of downstream memories that built on the poisoned one, and an audit of simil
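
The backward walk and the downstream invalidation can be sketched with a small provenance graph. The sketch assumes each memory records the run that wrote it and the memories it was derived from; `MemoryRecord`, `trace_to_writers`, and `invalidate_downstream` are hypothetical names introduced for illustration, and a real store would persist these links rather than hold them in a dict.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class MemoryRecord:
    memory_id: str
    written_by_run: str  # id of the run that wrote this memory
    derived_from: list[str] = field(default_factory=list)  # parent memory ids
    valid: bool = True


def trace_to_writers(answer_reads: list[str],
                     store: dict[str, MemoryRecord]) -> dict[str, str]:
    """Map each memory read while producing an answer to its writing run.

    answer_reads is assumed to come from trace coverage on memory reads.
    """
    return {mid: store[mid].written_by_run for mid in answer_reads}


def invalidate_downstream(poisoned_id: str,
                          store: dict[str, MemoryRecord]) -> list[str]:
    """Mark the poisoned memory and everything built on it as invalid.

    Breadth-first walk over the derived_from links, followed in reverse
    (parent to child). Returns the invalidated ids in visit order.
    """
    children: dict[str, list[str]] = {}
    for rec in store.values():
        for parent in rec.derived_from:
            children.setdefault(parent, []).append(rec.memory_id)

    invalidated: list[str] = []
    seen = {poisoned_id}
    queue = deque([poisoned_id])
    while queue:
        current = queue.popleft()
        store[current].valid = False
        invalidated.append(current)
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return invalidated
```

Without the `derived_from` links, invalidation stops at the one poisoned record and every memory that was built on it stays live.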

4

Scratch, episodic, and semantic memory — when do you use each?

Frame: scratch for the current-task working surface, lifetime one run. Episodic for learning across runs, lifetime weeks. Semantic for stable domain facts, lifetime indefinite with re-validation. Different writers, different retrieval patterns, different decay policies.
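
The distinctions above can be written down as a small policy table, which makes the different lifetimes explicit and checkable. This is a sketch under stated assumptions: the type names and the `is_expired` helper are hypothetical, and the four-week episodic TTL is a placeholder for the "weeks" lifetime, not a recommended value.

```python
from dataclasses import dataclass
from datetime import timedelta
from enum import Enum
from typing import Optional


class MemoryKind(Enum):
    SCRATCH = "scratch"
    EPISODIC = "episodic"
    SEMANTIC = "semantic"


@dataclass(frozen=True)
class MemoryPolicy:
    purpose: str
    ttl: Optional[timedelta]  # None means no time-based expiry
    expires_with_run: bool    # dropped when the run ends
    revalidate: bool          # must be re-checked against its source


POLICIES = {
    MemoryKind.SCRATCH: MemoryPolicy(
        "current-task working surface", None, True, False),
    MemoryKind.EPISODIC: MemoryPolicy(
        "learning across runs", timedelta(weeks=4), False, False),  # assumed TTL
    MemoryKind.SEMANTIC: MemoryPolicy(
        "stable domain facts", None, False, True),
}


def is_expired(kind: MemoryKind, age: timedelta, run_finished: bool) -> bool:
    """Apply the decay policy for one memory of the given kind."""
    policy = POLICIES[kind]
    if policy.expires_with_run and run_finished:
        return True
    return policy.ttl is not None and age > policy.ttl
```

Semantic memory never expires by age in this sketch; its freshness depends on re-validation instead, which is a separate mechanism from the TTL check.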

5

How do you keep semantic memory fresh?

Frame: TTL globally, change-detection hooks where possible (CI on merge, pipeline webhooks), re-validation on read for high-stakes queries. Layer all three; prefer authored sources versioned with the code they describe.
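
The three layers can be sketched in one small store: a global TTL checked on every read, a hook that a CI job or webhook handler would call when a source changes, and an optional re-validation callback for high-stakes reads. Everything here is illustrative: `SemanticStore`, `Fact`, `on_source_changed`, and the injected `clock` are hypothetical names, and the in-memory dict stands in for whatever backing store is actually used.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Fact:
    key: str
    value: str
    source: str        # the file or system this fact describes
    written_at: float  # epoch seconds
    stale: bool = False


class SemanticStore:
    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.time):
        self._facts: dict[str, Fact] = {}
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time

    def write(self, key: str, value: str, source: str) -> None:
        self._facts[key] = Fact(key, value, source, self._clock())

    def on_source_changed(self, source: str) -> int:
        """Layer 2: change-detection hook. Returns how many facts went stale."""
        count = 0
        for fact in self._facts.values():
            if fact.source == source and not fact.stale:
                fact.stale = True
                count += 1
        return count

    def read(self, key: str,
             revalidate: Optional[Callable[[Fact], bool]] = None
             ) -> Optional[str]:
        fact = self._facts.get(key)
        if fact is None or fact.stale:
            return None
        # Layer 1: global TTL.
        if self._clock() - fact.written_at > self._ttl:
            return None
        # Layer 3: re-validation on read, only when the caller asks for it.
        if revalidate is not None and not revalidate(fact):
            fact.stale = True
            return None
        return fact.value
```

A `None` result tells the caller to go back to the authoritative source, which is the safer failure mode for a fact that may have drifted.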

6

What goes in the context window, in what order, and why?

Frame: cached prefix (system prompt, tool schemas), task context, retrieved content (middle), recent steps verbatim, critical constraints at the very end. Recite the lost-in-the-middle reasoning.
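
The ordering above can be expressed as a simple assembly function. This is a minimal sketch: `build_context` and the `##` section labels are hypothetical choices, and a real harness would also enforce a token budget, which is omitted here. The stable content goes first so that a prompt-prefix cache, where one is available, can reuse it across calls; retrieved content sits in the middle, where attention is assumed to be weakest; critical constraints go last.

```python
def build_context(system_prompt: str,
                  tool_schemas: list[str],
                  task: str,
                  retrieved: list[str],
                  recent_steps: list[str],
                  constraints: list[str]) -> str:
    """Assemble the context window in a fixed order.

    1. Cached prefix (system prompt, tool schemas): identical across calls.
    2. Task context.
    3. Retrieved content: the middle of the window.
    4. Recent steps, verbatim.
    5. Critical constraints: at the very end.
    """
    sections = [
        ("SYSTEM", [system_prompt]),
        ("TOOLS", tool_schemas),
        ("TASK", [task]),
        ("RETRIEVED", retrieved),
        ("RECENT STEPS", recent_steps),
        ("CONSTRAINTS", constraints),
    ]
    parts = []
    for title, items in sections:
        if items:  # skip empty sections rather than emit bare headings
            parts.append(f"## {title}\n" + "\n".join(items))
    return "\n\n".join(parts)
```

Because the first two sections depend only on the system prompt and tool schemas, two calls with different tasks share the same leading text, which is what makes the prefix cacheable.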