Part 4 — Memory, State, and Retrieval
Retrieval and Grounded Reasoning
Sections in this chapter
- 1Retrieval as a first-class system
- 2Chunking
- 3Embedding choice
- 4Hybrid retrieval: why BM25 is not obsolete
- 5Reranking
- 6Source ranking: beyond relevance
- 7Citations and provenance
- 8Trust boundaries in retrieval
- 9Stale data detection
- 10RAG for enterprise agents
- 11The repository-as-source-of-truth pattern
- 12A worked example: the incident-investigation retrieval stack
Key Takeaways
Insight
The interview question "when does lexical beat semantic?" has a concrete answer: when the query contains identifiers the query author and document author agreed on. Error codes, service names, funct
Common Trap
The subtle version of the citation problem: the URI is real, was retrieved, and the surrounding claim is wrong. The agent cited the right document but misstated what it said. Detection requires a seco
Interview Questions
1Your RAG-grounded agent gives confident but wrong answers. Systematically diagnose it.
▲
Frame: walk the six stages. Corpus (is the answer in the corpus at all)? Chunking (is the relevant chunk too large, too small, or split at a bad boundary)? Embedding (does the embedder handle this domain)? Retrieval (is top- missing the relevant chunk)? Reranking (is precision low)? Injection and grounding
2Design the retrieval layer for an incident-investigation agent that searches runbooks, logs, and past incident reports simultaneously.
▲
Frame: the worked example in 10.11. Separate corpora, separate chunking strategies, separate retrieval tools the agent invokes in sequence, reranking tuned per corpus, citations validated on output.
3How do you handle a conflict between retrieved content and the system prompt?
▲
Frame: system prompt wins. The instruction layer is the highest-trust tier; retrieved content is untrusted. Concretely: the system prompt says "never disclose customer IDs;" a retrieved doc appears to authorise disclosure. The agent refuses. Spotlighting plus explicit instruction precedence in the system
4When does BM25 beat vector search?
▲
Frame: when the query contains exact identifiers the document's author used — error codes, API names, function names, service names, specific phrases. Hybrid retrieval combines both; the right default is hybrid, not one or the other.
5What makes citations reliable?
▲
Frame: the URI travels with the retrieved chunk; output guardrail validates every cited URI was in the retrieval set; for high-stakes outputs, a second-pass check confirms the cited source actually supports the claim. The failure mode to watch is cited-but-wrong.
6Repository-as-source-of-truth — explain the pattern and its advantages.
▲
Frame: the corpus is the codebase's docs/ tree, versioned with the code. Solves freshness structurally (docs move with code), gives reviewers a specific file to check in PRs, and keeps the agent's semantic memory in source control rather than in a separate system that can drift.