AI Harness Engineering · Chapter 12 of 19

Part 6: Verification Loops and Quality Engineering

Guides, Sensors, and Self-Correction

Sections in this chapter

  1. The reframe
  2. Why you need both
  3. Computational versus inferential controls
  4. Writing checker messages for LLM consumption
  5. The approved-fixtures pattern
  6. Golden principles
  7. The recursive self-correction loop
  8. Validating the inferential controls
  9. The missing-sensor test
  10. A worked example: the pipeline-generation verification layer

Key Takeaways

Insight

The naming of guides and sensors comes from control theory. The agent's output is the control variable; guides are the feed-forward model; sensors are the feedback. The mental model ports cleanly: feedback-only converges slowly, feed-forward-only drifts, and the two together converge quickly and stay correct.

Insight

Golden principles are a good illustration of the broader pattern: the more mechanical a task is, the safer it is to automate with an agent. Tasks involving judgement — architectural decisions, prioritization — are better left to humans.

Common Trap

The "we improved our linter's messages by a weekend and the agent's first-attempt pass rate went from 45% to 68%" war story appears in multiple production teams' post-mortems. The investment is triv

Interview Questions

1. Design a verification layer for a coding agent. What are your three most important sensors?

Frame: name the split (computational vs inferential) first. Three critical sensors: (1) test runner on a human-authored test suite (the approved-fixtures pattern); (2) type checker / linter with LLM-friendly error messages; (3) an LLM-reviewer sensor for intent-match — does the code solve the stated task, not merely pass its checks?
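
A minimal sketch of how those three sensors might be wired together, assuming a repo checked by pytest and mypy; `llm` is an assumed text-in/text-out callable, and the `SensorResult` shape is illustrative rather than an API from the chapter:

```python
import subprocess
from dataclasses import dataclass

@dataclass
class SensorResult:
    name: str     # which sensor produced this result
    passed: bool  # did the check pass
    detail: str   # feedback the agent can act on

def test_sensor(repo_dir: str) -> SensorResult:
    """Computational sensor: run the human-authored test suite (approved fixtures)."""
    proc = subprocess.run(["pytest", "-q", "--tb=short"],
                          cwd=repo_dir, capture_output=True, text=True)
    return SensorResult("tests", proc.returncode == 0, proc.stdout[-4000:])

def type_sensor(repo_dir: str) -> SensorResult:
    """Computational sensor: type checker, output trimmed so the agent sees the tail."""
    proc = subprocess.run(["mypy", "."],
                          cwd=repo_dir, capture_output=True, text=True)
    return SensorResult("types", proc.returncode == 0, proc.stdout[-4000:])

def intent_sensor(task: str, diff: str, llm) -> SensorResult:
    """Inferential sensor: an LLM reviewer judging whether the diff matches intent."""
    verdict = llm(
        f"Task:\n{task}\n\nDiff:\n{diff}\n\n"
        "Does this diff solve the stated task? Reply PASS or FAIL, then one sentence why."
    )
    return SensorResult("intent", verdict.strip().upper().startswith("PASS"), verdict)
```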

2. What makes an LLM judge reliable versus unreliable? How do you validate the judge itself?

Frame: calibration against a human-scored set; pairwise over absolute; ensemble over single call. Measure judge-human agreement; treat sub-80% as unready. A judge used without calibration adds uncertainty rather than reducing it.
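
A sketch of that calibration step. `llm` is again an assumed callable, the human labels are pairwise preferences ("A" or "B"), and the three-vote majority is an illustrative ensemble choice:

```python
from collections import Counter

def judge_agreement(judge, calibration_set) -> float:
    """Fraction of human-labeled pairs where the judge picks the human's side."""
    hits = sum(
        judge(item["output_a"], item["output_b"]) == item["human_preference"]
        for item in calibration_set  # human_preference is "A" or "B"
    )
    return hits / len(calibration_set)

def ensemble_pairwise_judge(llm, n_votes: int = 3):
    """Pairwise over absolute, ensemble over single call: majority of n votes."""
    def judge(a: str, b: str) -> str:
        votes = [
            llm(f"Which output better satisfies the task?\nA:\n{a}\n\nB:\n{b}\n\n"
                "Reply with exactly A or B.").strip()[:1].upper()
            for _ in range(n_votes)
        ]
        return Counter(votes).most_common(1)[0][0]
    return judge

# agreement = judge_agreement(ensemble_pairwise_judge(llm), labeled_pairs)
# Below 0.80 agreement, the judge is not ready to be used as a sensor.
```

Pairwise comparison sidesteps the scale drift of absolute scores; the ensemble trades extra tokens for lower vote variance.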

3. The agent passes all tests but the code still doesn't do what we want. What sensor is missing?

Frame: the missing-sensor test. Either a coverage sensor (new code has untested paths) or a requirements sensor (does the code satisfy the spec, not just the tests?). The failure is that the tests themselves are incomplete relative to intent. The fix is a sensor that validates against intent, not against the test suite alone.
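
One shape the coverage sensor could take, assuming coverage.py is already set up for the repo; the 90% threshold and per-file granularity are placeholder choices:

```python
import json
import subprocess

def coverage_sensor(repo_dir: str, changed_files: set[str],
                    threshold: float = 0.9) -> dict[str, float]:
    """Report changed files whose line coverage falls below the threshold."""
    subprocess.run(["coverage", "run", "-m", "pytest", "-q"], cwd=repo_dir)
    subprocess.run(["coverage", "json", "-o", "cov.json"], cwd=repo_dir)
    with open(f"{repo_dir}/cov.json") as f:
        report = json.load(f)
    return {
        path: meta["summary"]["percent_covered"]
        for path, meta in report["files"].items()
        if path in changed_files
        and meta["summary"]["percent_covered"] < threshold * 100
    }

# A non-empty result means the agent shipped untested paths:
# the missing sensor firing, rather than the tests passing vacuously.
```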

4. Explain guides and sensors. Why do you need both?

Frame: guides are feed-forward (increase first-attempt quality); sensors are feedback (catch residual). Feedback-only converges slowly; feed-forward-only drifts. Both together converge quickly and stay correct. Control-theory analogy helps in the interview.
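
A compact sketch of the combined loop, reusing the `SensorResult` shape from the Q1 sketch; `agent` is an assumed text-in/text-out callable and the three-round budget is arbitrary:

```python
def self_correction_loop(task: str, agent, guides: str, sensors, max_rounds: int = 3):
    """Guides are feed-forward (shape the first attempt); sensors are feedback
    (catch the residual and route it back into the next attempt)."""
    prompt = f"{guides}\n\nTask: {task}"  # feed-forward: raise first-attempt quality
    output = agent(prompt)
    for _ in range(max_rounds):
        # Run every sensor; keep the results that failed.
        failures = [r for s in sensors if not (r := s(output)).passed]
        if not failures:
            return output  # converged: quickly, because the guides did their job
        feedback = "\n\n".join(f"[{r.name}] {r.detail}" for r in failures)
        output = agent(f"{prompt}\n\nYour previous attempt failed these checks:\n"
                       f"{feedback}\n\nFix the issues and produce a corrected version.")
    raise RuntimeError(f"sensors still failing after {max_rounds} rounds")
```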

5. Golden principles — what, why, and when are they appropriate?

Frame: mechanical repo rules that a scheduled agent enforces (line length, import order, missing types, no circular imports). Appropriate when the rule is mechanical, the fix is bounded, and the review is batch. Not appropriate for judgement-heavy rules or architectural changes.
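
A hypothetical manifest of golden principles. The `ruff`, `mypy`, and `pylint` invocations are real commands, but the manifest shape and the choice of rules are illustrative:

```python
# Each entry: a mechanical rule, a check command, and (where safe) a bounded
# auto-fix. Rules without a deterministic fix command go to the agent, and the
# resulting changes land as a single batch-reviewed PR.
GOLDEN_PRINCIPLES = [
    {"rule": "line-length",
     "check": "ruff check --select E501 .",
     "fix":   "ruff format ."},
    {"rule": "import-order",
     "check": "ruff check --select I .",
     "fix":   "ruff check --select I --fix ."},
    {"rule": "missing-types",
     "check": "mypy --disallow-untyped-defs .",
     "fix":   None},  # agent writes the annotations; humans review the batch
    {"rule": "no-circular-imports",
     "check": "pylint --disable=all --enable=cyclic-import src/",
     "fix":   None},  # restructuring imports is bounded but needs the agent
]
```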

6. How do you make linter and test-runner output useful to an agent?

Frame: structured messages with code, message, fix_hint, context, rule_url. Re-format existing tool output through an adapter layer if the tool's native output is designed for humans. The investment is small; first-attempt pass rate improvements are substantial.
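
A sketch of such an adapter, assuming flake8/ruff-style `file:line:col: CODE message` output; the `fix_hint` table and the `rule_url` base are placeholders:

```python
import re
from dataclasses import dataclass

@dataclass
class CheckerMessage:
    code: str      # stable rule identifier, e.g. "E501"
    message: str   # what is wrong
    fix_hint: str  # how to fix it, phrased as an instruction
    context: str   # the offending source line
    rule_url: str  # where the rule is documented

# Matches human-oriented lines like:
#   src/app.py:14:80: E501 line too long (93 > 79 characters)
LINE_RE = re.compile(
    r"(?P<file>[^:\n]+):(?P<line>\d+):(?P<col>\d+): (?P<code>[A-Z]\d+) (?P<msg>.+)"
)

FIX_HINTS = {"E501": "Break the line before column 79 or extract a variable."}

def adapt(raw_output: str, sources: dict[str, list[str]]) -> list[CheckerMessage]:
    """Adapter layer: re-shape linter output designed for humans into the
    structured form the agent consumes."""
    messages = []
    for m in LINE_RE.finditer(raw_output):
        messages.append(CheckerMessage(
            code=m["code"],
            message=m["msg"],
            fix_hint=FIX_HINTS.get(m["code"], "See rule_url for the fix pattern."),
            context=sources[m["file"]][int(m["line"]) - 1],
            rule_url=f"https://docs.example.com/rules/{m['code']}",  # placeholder
        ))
    return messages
```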