Part 4 — Memory, State, and Retrieval

Retrieval and Grounded Reasoning

Sections in this chapter

1Retrieval as a first-class system
2Chunking
3Embedding choice
4Hybrid retrieval: why BM25 is not obsolete
5Reranking
6Source ranking: beyond relevance
7Citations and provenance
8Trust boundaries in retrieval
9Stale data detection
10RAG for enterprise agents
11The repository-as-source-of-truth pattern
12A worked example: the incident-investigation retrieval stack

Key Takeaways

Insight

The interview question "when does lexical beat semantic?" has a concrete answer: when the query contains identifiers the query author and document author agreed on. Error codes, service names, funct

Common Trap

The subtle version of the citation problem: the URI is real, was retrieved, and the surrounding claim is wrong. The agent cited the right document but misstated what it said. Detection requires a seco

Interview Questions

Your RAG-grounded agent gives confident but wrong answers. Systematically diagnose it.

▲

Frame: walk the six stages. Corpus (is the answer in the corpus at all)? Chunking (is the relevant chunk too large, too small, or split at a bad boundary)? Embedding (does the embedder handle this domain)? Retrieval (is top- missing the relevant chunk)? Reranking (is precision low)? Injection and grounding

Design the retrieval layer for an incident-investigation agent that searches runbooks, logs, and past incident reports simultaneously.

▲

Frame: the worked example in 10.11. Separate corpora, separate chunking strategies, separate retrieval tools the agent invokes in sequence, reranking tuned per corpus, citations validated on output.

How do you handle a conflict between retrieved content and the system prompt?

▲

Frame: system prompt wins. The instruction layer is the highest-trust tier; retrieved content is untrusted. Concretely: the system prompt says "never disclose customer IDs;" a retrieved doc appears to authorise disclosure. The agent refuses. Spotlighting plus explicit instruction precedence in the system

When does BM25 beat vector search?

▲

Frame: when the query contains exact identifiers the document's author used — error codes, API names, function names, service names, specific phrases. Hybrid retrieval combines both; the right default is hybrid, not one or the other.

What makes citations reliable?

▲

Frame: the URI travels with the retrieved chunk; output guardrail validates every cited URI was in the retrieval set; for high-stakes outputs, a second-pass check confirms the cited source actually supports the claim. The failure mode to watch is cited-but-wrong.

Repository-as-source-of-truth — explain the pattern and its advantages.

▲

Frame: the corpus is the codebase's docs/ tree, versioned with the code. Solves freshness structurally (docs move with code), gives reviewers a specific file to check in PRs, and keeps the agent's semantic memory in source control rather than in a separate system that can drift.