Part 3 — Tools, Actions, and MCP

Tool Calling and Safe Action Design

Sections in this chapter

1. The trust ladder
2. Function schema design
3. Input validation beyond the schema
4. Read versus write: the separation principle
5. Error messages for LLM consumption
6. Retries, idempotency, and at-most-once semantics
7. Human approval gates
8. Dry-run mode
9. Permission checks: the principal problem
10. Tool result formatting
11. A worked example: the Pipeline Mutation Tool

Key Takeaways

Insight

A common interview setup: "You're designing a delete_x tool. What safety properties do you enforce?" The expected answer walks the destructive-tool rung: dry-run default, idempotency key, approval

Insight

"Dry-run by default" is the closest thing agentic engineering has to "use HTTPS by default." It is a one-line change that eliminates a whole class of production incidents. If you take a second les

Common Trap

Teams that spend a weekend improving tool error messages routinely report double-digit reductions in agent loop rate and failed-run rate. Yet error messages are the last thing anyone instruments. If y

Common Trap

An agent with a service-account credential that has broader scope than the invoking user is a lateral-privilege-escalation vector. A user with read access can, via the agent, perform writes. This is t

Interview Questions

Design the schema for a delete_pipeline tool.

Frame: walk the destructive-tool rung. Required fields: pipeline_id (pattern-constrained), reason, and an explicit confirm_id (e.g., the pipeline's current name repeated, to prevent paste errors). dry_run=true default. Mandatory idempotency key. Approval gate for prod pipelines. Separate get_pipeline m
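As a rough illustration of the fields named in this answer, the schema could be sketched as a JSON-Schema-style dictionary plus a small check that runs before the tool does anything. The ID pattern, the length limits, the field name idempotency_key, and the helpers check_delete_args and effective_dry_run are all assumptions made for this sketch, not part of any particular framework.

```python
import re

# Hypothetical schema: the "pl-" ID pattern and length limits are assumptions.
DELETE_PIPELINE_SCHEMA = {
    "name": "delete_pipeline",
    "description": "Delete a pipeline. Defaults to a dry run that only reports the plan.",
    "parameters": {
        "type": "object",
        "properties": {
            "pipeline_id": {"type": "string", "pattern": "^pl-[a-z0-9]{8}$"},
            "reason": {"type": "string", "minLength": 10},
            "confirm_id": {
                "type": "string",
                "description": "The pipeline's current name, repeated to catch paste errors.",
            },
            "dry_run": {"type": "boolean", "default": True},
            "idempotency_key": {"type": "string", "minLength": 8},
        },
        "required": ["pipeline_id", "reason", "confirm_id", "idempotency_key"],
        "additionalProperties": False,
    },
}


def check_delete_args(args: dict, current_name: str) -> list[str]:
    """Return a list of problems; an empty list means the call may proceed."""
    params = DELETE_PIPELINE_SCHEMA["parameters"]
    problems = [f"missing required field: {f}" for f in params["required"] if f not in args]
    unknown = set(args) - set(params["properties"])
    problems += [f"unknown field: {f}" for f in sorted(unknown)]
    pattern = params["properties"]["pipeline_id"]["pattern"]
    if "pipeline_id" in args and not re.fullmatch(pattern, str(args["pipeline_id"])):
        problems.append("pipeline_id does not match the expected pattern")
    # confirm_id is checked against live state, which a static schema cannot express.
    if "confirm_id" in args and args["confirm_id"] != current_name:
        problems.append("confirm_id does not match the pipeline's current name")
    return problems


def effective_dry_run(args: dict) -> bool:
    """dry_run is opt-out: anything other than an explicit False stays a dry run."""
    return args.get("dry_run", True) is not False
```

The confirm_id comparison sits outside the schema on purpose: it depends on the pipeline's current name, so it belongs to the validation that happens after schema checks pass.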

Your agent calls the same write tool four times identically. What went wrong, and how do you catch it?

Frame: three diagnoses. (a) No idempotency key, so transient failures caused true repeats. (b) No duplicate-call detection, so agent retry logic wasn't interrupted. (c) Tool error messages didn't give the agent actionable recovery info, so it retried the same call hoping for a different result. Defences: a
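A minimal in-memory sketch of two of these defences, under stated assumptions: an idempotency key that replays the stored result instead of re-executing, and a per-session counter that interrupts identical repeated calls. The class name IdempotentExecutor, the limit of two identical calls, and the error shape are hypothetical; a production version would persist keys in a durable store.

```python
import hashlib
import json


class IdempotentExecutor:
    """Sketch: the first call with a key runs; repeats replay the stored result."""

    def __init__(self, max_identical_calls: int = 2):
        self._results = {}      # idempotency_key -> stored result
        self._call_counts = {}  # call fingerprint -> times seen this session
        self._max = max_identical_calls

    @staticmethod
    def fingerprint(tool: str, args: dict) -> str:
        # Canonical JSON so that argument order does not change the fingerprint.
        payload = json.dumps({"tool": tool, "args": args}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def call(self, tool: str, args: dict, idempotency_key: str, fn):
        fp = self.fingerprint(tool, args)
        self._call_counts[fp] = self._call_counts.get(fp, 0) + 1
        if self._call_counts[fp] > self._max:
            # Duplicate-call detection: break the loop and tell the agent why.
            return {
                "ok": False,
                "error": {
                    "code": "duplicate_call",
                    "retryable": False,
                    "hint": "This exact call was already made; inspect the earlier result.",
                },
            }
        if idempotency_key in self._results:
            return self._results[idempotency_key]  # replay, no second side effect
        result = {"ok": True, "result": fn(**args)}
        self._results[idempotency_key] = result
        return result
```

With this sketch, four identical calls produce one execution, one replay, and two duplicate_call errors, so the underlying write happens at most once.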

How do you make error messages from a failed tool call actually help the agent recover?

Frame: structured error with code, message, hint, retryable, suggested_tools. Example: pipeline_not_found with a hint naming similar-spelled pipelines and suggesting list_pipelines. This is disproportionately high ROI.
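One way the structured error could look, as a sketch rather than a fixed format: the five fields come from the answer above, while the ToolError dataclass, the pipeline_not_found helper, and the similarity cutoff are assumptions. The standard library's difflib.get_close_matches supplies the similar-spelled names for the hint.

```python
import difflib
from dataclasses import asdict, dataclass, field


@dataclass
class ToolError:
    code: str             # stable, machine-readable identifier
    message: str          # what went wrong
    hint: str             # what the agent should try next
    retryable: bool       # whether repeating the same call could succeed
    suggested_tools: list = field(default_factory=list)


def pipeline_not_found(requested: str, known: list) -> dict:
    """Build a pipeline_not_found error whose hint names similar-spelled pipelines."""
    similar = difflib.get_close_matches(requested, known, n=3, cutoff=0.6)
    if similar:
        hint = f"Did you mean: {', '.join(similar)}?"
    else:
        hint = "Call list_pipelines to see valid names."
    return asdict(
        ToolError(
            code="pipeline_not_found",
            message=f"No pipeline named '{requested}'.",
            hint=hint,
            retryable=False,  # the same arguments will fail again
            suggested_tools=["list_pipelines"],
        )
    )
```

Setting retryable to False matters as much as the hint: it tells the agent that repeating the identical call is pointless and that it should change the arguments or call a suggested tool instead.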

Explain the principal-of-action question for agent tools.

Frame: three choices — service account, invoking user, scoped delegation. The right default is invoking user (RBAC transfers naturally). Service account leads to lateral privilege escalation and fails security review. Scoped delegation adds a layer of defence for high-risk agents. Implementation: short-liv
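To make the invoking-user default concrete, here is a minimal sketch in which every tool call is authorised against the permissions of the user who invoked the agent, not against a broader service account. The Principal type, the permission strings, and run_tool are hypothetical names for illustration; how credentials are actually issued is outside this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    user_id: str
    permissions: frozenset  # e.g. {"pipelines:read"}; the naming is an assumption


class PermissionDenied(Exception):
    pass


def authorize(principal: Principal, required: str) -> None:
    """Raise unless the invoking user personally holds the required permission."""
    if required not in principal.permissions:
        raise PermissionDenied(f"{principal.user_id} lacks {required}")


def run_tool(principal: Principal, required_permission: str, fn, **args):
    # The check uses the invoking user's own grants, so the agent can never
    # do more than that user could do directly.
    authorize(principal, required_permission)
    return fn(**args)
```

Under this arrangement a user with only read access gets a PermissionDenied when the agent attempts a write on their behalf, which closes the escalation path a shared service account would open.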

When do you require a human approval gate?

Frame: name the six triggers — tool class (destructive), scope (prod), scale (batch size), amount (monetary threshold), novelty (first call with these args this session), confidence (if calibrated). Policy is versioned and reviewed, not hardcoded.
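The six triggers can be expressed as data plus one evaluation function, which keeps the policy versioned and reviewable instead of scattered through tool code. The sketch below assumes each trigger is independently sufficient to require approval; that rule, the thresholds, and the names ApprovalPolicy, CallContext, and approval_reasons are illustrative assumptions, and a real policy might combine triggers differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ApprovalPolicy:
    version: str = "v1-example"   # bumped and reviewed like any other config
    max_batch_size: int = 10      # assumed threshold
    max_amount: float = 500.0     # assumed monetary threshold
    min_confidence: float = 0.8   # only meaningful if confidence is calibrated


@dataclass
class CallContext:
    tool_class: str                     # e.g. "read", "write", "destructive"
    environment: str                    # e.g. "dev", "staging", "prod"
    batch_size: int = 1
    amount: float = 0.0
    seen_this_session: bool = True      # False on the first call with these args
    confidence: Optional[float] = None  # None when no calibrated score exists


def approval_reasons(ctx: CallContext, policy: ApprovalPolicy) -> list[str]:
    """Return every trigger that fired; an empty list means no gate is needed."""
    reasons = []
    if ctx.tool_class == "destructive":
        reasons.append("tool class: destructive")
    if ctx.environment == "prod":
        reasons.append("scope: prod")
    if ctx.batch_size > policy.max_batch_size:
        reasons.append("scale: batch size over limit")
    if ctx.amount > policy.max_amount:
        reasons.append("amount: over monetary threshold")
    if not ctx.seen_this_session:
        reasons.append("novelty: first call with these args this session")
    if ctx.confidence is not None and ctx.confidence < policy.min_confidence:
        reasons.append("confidence: below threshold")
    return reasons
```

Returning the list of reasons, not a bare boolean, gives the approver the context for why the gate fired, and the policy version can be logged alongside each decision.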

Dry-run by default — defend it in a design review.

Frame: three arguments. It eliminates a whole class of incidents from accidental execution. It gives approvers a concrete artefact to review. It enables end-to-end testing without touching production. The opt-in cost is one parameter. The cost of not having it is the first catastrophic incident.
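The "one parameter" claim can be illustrated with a small sketch. Here store is an in-memory stand-in for the real backend, and the plan fields are hypothetical; the point is only that dry_run defaults to True, and that the dry-run path returns the same plan the real path would execute.

```python
def delete_pipeline(store: dict, pipeline_id: str, dry_run: bool = True) -> dict:
    """Delete a pipeline from `store`; by default only report what would happen."""
    if pipeline_id not in store:
        return {"ok": False, "error": {"code": "pipeline_not_found"}}
    # The plan is the concrete artefact an approver can review.
    plan = {
        "would_delete": pipeline_id,
        "dependent_jobs": list(store[pipeline_id]["jobs"]),
    }
    if dry_run:
        return {"ok": True, "dry_run": True, "plan": plan}
    del store[pipeline_id]  # the only line with a side effect
    return {"ok": True, "dry_run": False, "deleted": plan}


# Usage: a call without dry_run=False leaves the store untouched.
pipelines = {"pl-1": {"jobs": ["extract", "load"]}}
preview = delete_pipeline(pipelines, "pl-1")
```

Because the caller has to pass dry_run=False explicitly, an accidental or underspecified call can only ever produce a preview.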