Part 3 — Tools, Actions, and MCP
Tool Calling and Safe Action Design
Sections in this chapter
- 1. The trust ladder
- 2. Function schema design
- 3. Input validation beyond the schema
- 4. Read versus write: the separation principle
- 5. Error messages for LLM consumption
- 6. Retries, idempotency, and at-most-once semantics
- 7. Human approval gates
- 8. Dry-run mode
- 9. Permission checks: the principal problem
- 10. Tool result formatting
- 11. A worked example: the Pipeline Mutation Tool
Key Takeaways
Insight
A common interview setup: "You're designing a delete_x tool. What safety properties do you enforce?" The expected answer walks the destructive-tool rung: dry-run default, idempotency key, approval
Insight
"Dry-run by default" is the closest thing agentic engineering has to "use HTTPS by default." It is a one-line change that eliminates a whole class of production incidents. If you take a second les
Common Trap
Teams that spend a weekend improving tool error messages routinely report double-digit reductions in agent loop rate and failed-run rate. Yet error messages are the last thing anyone instruments. If y
Common Trap
An agent with a service-account credential that has broader scope than the invoking user is a lateral-privilege-escalation vector. A user with read access can, via the agent, perform writes. This is t
Interview Questions
1. Design the schema for a delete_pipeline tool.
Frame: walk the destructive-tool rung. Required fields: pipeline_id (pattern-constrained), reason, and an explicit confirm_id (e.g., the pipeline's current name repeated, to prevent paste errors). dry_run=true default. Mandatory idempotency key. Approval gate for prod pipelines. Separate get_pipeline m
2. Your agent calls the same write tool four times identically. What went wrong, and how do you catch it?
Frame: three diagnoses. (a) No idempotency key, so transient failures caused true repeats. (b) No duplicate-call detection, so agent retry logic wasn't interrupted. (c) Tool error messages didn't give the agent actionable recovery info, so it retried the same call hoping for a different result. Defences: a
3. How do you make error messages from a failed tool call actually help the agent recover?
Frame: structured error with code, message, hint, retryable, suggested_tools. Example: pipeline_not_found with a hint naming similar-spelled pipelines and suggesting list_pipelines. This is disproportionately high ROI.
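The structured error described above could be sketched roughly as follows. The field names (code, message, hint, retryable, suggested_tools) come from the answer; the `ToolError` class, the `pipeline_not_found` helper, and the use of `difflib` for the "similar-spelled" lookup are illustrative assumptions, not a prescribed implementation.

```python
import difflib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ToolError:
    """Hypothetical error envelope returned to the agent instead of a bare string."""
    code: str
    message: str
    hint: str
    retryable: bool
    suggested_tools: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        # Serialise under an "error" key so the agent can tell it apart from a result.
        return json.dumps({"error": asdict(self)})


def pipeline_not_found(requested: str, known: list[str]) -> ToolError:
    # Assumption: fuzzy string matching is an acceptable way to find
    # similar-spelled pipeline names; up to three candidates are offered.
    similar = difflib.get_close_matches(requested, known, n=3, cutoff=0.6)
    if similar:
        hint = "Did you mean: " + ", ".join(similar) + "?"
    else:
        hint = "No similarly named pipeline found; call list_pipelines to see valid names."
    return ToolError(
        code="pipeline_not_found",
        message=f"No pipeline named '{requested}'.",
        hint=hint,
        retryable=False,  # repeating the identical call cannot succeed
        suggested_tools=["list_pipelines"],
    )
```

A call such as `pipeline_not_found("nightly-etll", ["nightly-etl", "hourly-sync"]).to_json()` gives the agent a next step (a corrected name or a listing tool) rather than an opaque failure it can only retry.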
4. Explain the principal-of-action question for agent tools.
Frame: three choices — service account, invoking user, scoped delegation. The right default is invoking user (RBAC transfers naturally). Service account leads to lateral privilege escalation and fails security review. Scoped delegation adds a layer of defence for high-risk agents. Implementation: short-liv
5. When do you require a human approval gate?
Frame: name the six triggers — tool class (destructive), scope (prod), scale (batch size), amount (monetary threshold), novelty (first call with these args this session), confidence (if calibrated). Policy is versioned and reviewed, not hardcoded.
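One way to express the six triggers as data-driven policy rather than hardcoded branches is sketched below. The class names, field names, and threshold parameters are hypothetical. The sketch only reports which triggers fired; how they are combined into a final decision (any single trigger, or some weighted rule) is left to the policy, since the answer above does not specify it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ApprovalPolicy:
    """Hypothetical versioned policy; thresholds live in reviewed config, not in code."""
    version: str
    max_batch_size: int
    max_amount: float
    min_confidence: float


@dataclass(frozen=True)
class ToolCall:
    tool_class: str                      # e.g. "read", "write", "destructive"
    environment: str                     # e.g. "dev", "staging", "prod"
    batch_size: int = 1
    amount: float = 0.0
    seen_this_session: bool = False      # same tool and args already called this session
    confidence: Optional[float] = None   # None when no calibrated score is available


def approval_triggers(call: ToolCall, policy: ApprovalPolicy) -> list[str]:
    """Return the names of the triggers that fired for this call."""
    fired = []
    if call.tool_class == "destructive":
        fired.append("tool_class")
    if call.environment == "prod":
        fired.append("scope")
    if call.batch_size > policy.max_batch_size:
        fired.append("scale")
    if call.amount > policy.max_amount:
        fired.append("amount")
    if not call.seen_this_session:
        fired.append("novelty")
    # Confidence is only consulted when a calibrated score exists.
    if call.confidence is not None and call.confidence < policy.min_confidence:
        fired.append("confidence")
    return fired
```

Returning the fired trigger names, rather than a bare boolean, also gives the approver and the audit log a reason for each gate, tied to the policy version that produced it.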
6. Dry-run by default: defend it in a design review.
Frame: three arguments. It eliminates a whole class of incidents from accidental execution. It gives approvers a concrete artefact to review. It enables end-to-end testing without touching production. The opt-in cost is one parameter. The cost of not having it is the first catastrophic incident.
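A minimal sketch of what "the opt-in cost is one parameter" can look like. The in-memory `store` dictionary, the `delete_pipelines` function, and the `DeleteResult` shape are stand-ins invented for illustration; a real tool would call whatever backend it manages.

```python
from dataclasses import dataclass


@dataclass
class DeleteResult:
    dry_run: bool          # True means nothing was changed
    affected: list[str]    # pipelines that would be (or were) deleted
    missing: list[str]     # requested IDs that do not exist


def delete_pipelines(
    store: dict[str, dict],
    pipeline_ids: list[str],
    dry_run: bool = True,  # execution requires an explicit dry_run=False
) -> DeleteResult:
    affected = [pid for pid in pipeline_ids if pid in store]
    missing = [pid for pid in pipeline_ids if pid not in store]
    if not dry_run:
        for pid in affected:
            del store[pid]
    # The same result shape is returned in both modes, so the dry-run
    # output can serve as the artefact an approver reviews.
    return DeleteResult(dry_run=dry_run, affected=affected, missing=missing)
```

Calling the function without the flag reports what would happen and leaves `store` untouched; only a deliberate `dry_run=False` mutates state, which is also what makes end-to-end tests safe to run by default.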