$ clawproof --check 04 --verbose
Governance #04

Human-in-the-Loop & Escalation

When should an agent stop and ask? Define the boundaries before production.


The Failure Scenario

An enterprise deploys an AI agent to handle procurement approvals under $10,000. The agent reviews purchase requests, checks budget allocations, and either approves or flags for human review. The policy is simple: approve routine purchases, escalate anything unusual. But "unusual" is never formally defined.

A department submits a $9,500 request for "consulting services" from a vendor with no prior relationship. The agent approves it because the amount is under the threshold, the budget category exists, and the request format is standard. The vendor turns out to be a shell company run by the department head's spouse. The agent had no way to know this, but a human reviewer would have recognized the vendor name and flagged the conflict of interest.

The post-mortem reveals the core problem: the escalation policy was based entirely on dollar amount and budget codes. There were no rules for vendor novelty, relationship checks, or pattern anomalies. The agent was given a binary decision (approve or escalate) without the contextual judgment framework that makes escalation decisions meaningful. The $10,000 threshold gave everyone false confidence that risk was being managed.

Why This Matters

Human-in-the-loop is not a safety blanket you throw over an agent deployment. It is an engineering discipline that requires explicit escalation policies, well-designed handoff interfaces, and measured response-time SLAs. Done poorly, it creates one of two failure modes: the agent never escalates (rubber-stamping risk), or the agent escalates everything (making humans the bottleneck and defeating the purpose of automation).

The stakes scale with the agent's authority. An agent that drafts emails can tolerate a "send and review later" model. An agent that executes financial transactions, modifies infrastructure, or communicates with external parties needs synchronous approval gates where execution blocks until a human explicitly approves. The autonomy level must match the reversibility of the action. Irreversible actions demand pre-approval; reversible actions can use post-review.

From a governance perspective, regulators increasingly expect documented escalation policies for AI systems. The EU AI Act requires human oversight mechanisms for high-risk AI. Even in unregulated contexts, the liability question is clear: if an agent makes a damaging decision without offering a human the opportunity to intervene, the organization is on the hook for a design failure, not just an AI error.

How to Implement

Define three to five autonomy tiers, each with explicit criteria for which actions fall into that tier. Tier 0 is fully autonomous; the agent acts without any human involvement. Tier 1 is act-then-notify; the agent executes but sends a notification for async review. Tier 2 is propose-and-wait; the agent prepares the action but does not execute until a human approves. Tier 3 is full handoff; the agent transfers the conversation or task to a human with full context.
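Because the tiers are ordered from least to most human involvement, they can be modeled as an ordered enum, which makes "move one tier higher" a well-defined operation. A minimal sketch (names are illustrative, not from any particular framework):

```python
from enum import IntEnum

class AutonomyTier(IntEnum):
    """Autonomy tiers, ordered from least to most human involvement."""
    AUTONOMOUS = 0       # agent acts without any human involvement
    ACT_THEN_NOTIFY = 1  # agent executes, sends a notification for async review
    PROPOSE_AND_WAIT = 2 # agent prepares the action, blocks until a human approves
    FULL_HANDOFF = 3     # agent transfers the task to a human with full context

def escalate(tier: AutonomyTier) -> AutonomyTier:
    """Bump an action one tier toward more oversight (capped at full handoff)."""
    return AutonomyTier(min(tier + 1, AutonomyTier.FULL_HANDOFF))
```

Using `IntEnum` keeps tiers comparable, so "this action needs at least propose-and-wait" is a simple `>=` check.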

Map every tool call and action type to a tier based on three factors: reversibility (can you undo it?), impact scope (how many people or dollars are affected?), and confidence (how certain is the agent about the right action?). Build a decision matrix and store it as configuration that can be updated without redeploying the agent. When in doubt about a tier assignment, move it one tier higher. It's always safer to over-escalate during the first month and then relax controls based on data.
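One way to keep that decision matrix as data rather than code is an ordered rule list evaluated first-match-wins. The thresholds and field names below are illustrative, loosely mirroring the example policy in this check:

```python
from enum import IntEnum

class Tier(IntEnum):
    AUTONOMOUS = 0
    ACT_THEN_NOTIFY = 1
    PROPOSE_AND_WAIT = 2
    FULL_HANDOFF = 3

# Ordered rules over the three factors; first match wins. Keeping this as
# data means it can live in config and be updated without redeployment.
RULES = [
    (lambda a: not a["reversible"] and a["impact_usd"] > 10_000, Tier.FULL_HANDOFF),
    (lambda a: a["confidence"] < 0.7,                            Tier.FULL_HANDOFF),
    (lambda a: not a["reversible"],                              Tier.PROPOSE_AND_WAIT),
    (lambda a: a["impact_usd"] > 2_000,                          Tier.PROPOSE_AND_WAIT),
    (lambda a: a["impact_usd"] > 0,                              Tier.ACT_THEN_NOTIFY),
]

def assign_tier(action: dict) -> Tier:
    """Map reversibility, impact, and confidence to a tier."""
    for predicate, tier in RULES:
        if predicate(action):
            return tier
    return Tier.AUTONOMOUS  # zero-impact, reversible, high-confidence actions
```

Note the ordering encodes the policy: irreversibility and low confidence are checked before impact, so a cheap but unconfident action still escalates.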

Design the escalation handoff to include full context: the user's original request, the agent's proposed action, the reasoning chain, the confidence score, and any relevant data the agent retrieved. A human reviewer who receives "approve this action? [yes/no]" without context will either rubber-stamp everything or become a bottleneck while they reconstruct what the agent was trying to do.
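A handoff payload along these lines might be sketched as follows (field names are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class EscalationHandoff:
    """Everything a reviewer needs to decide without reconstructing the task."""
    user_request: str          # the user's original ask, verbatim
    proposed_action: str       # tool name and arguments the agent wants to run
    reasoning: list[str]       # the agent's reasoning chain, step by step
    confidence: float          # agent's self-reported confidence, 0..1
    retrieved_data: dict = field(default_factory=dict)  # context the agent looked up

    def to_review_message(self) -> str:
        """Render a reviewable summary instead of a bare yes/no prompt."""
        steps = "\n".join(f"  {i + 1}. {s}" for i, s in enumerate(self.reasoning))
        return (
            f"Request: {self.user_request}\n"
            f"Proposed: {self.proposed_action} (confidence {self.confidence:.2f})\n"
            f"Reasoning:\n{steps}"
        )
```

The point of the render method is that the reviewer sees the full chain inline, not a link they have to chase while the agent blocks.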

escalation-policy.yaml
# escalation-policy.yaml - tiered autonomy configuration
agent: procurement-bot

tiers:
  autonomous:
    description: "Agent acts without human involvement"
    actions:
      - name: "lookup_budget"
      - name: "check_vendor_status"
      - name: "send_status_update"

  act_then_notify:
    description: "Agent acts, notifies reviewer within 1 hour"
    actions:
      - name: "approve_purchase"
        conditions:
          max_amount: 2000
          vendor_status: "existing"
          budget_utilization_below: 0.8
    notify_channel: "#procurement-log"

  propose_and_wait:
    description: "Agent proposes, blocks until human approves"
    timeout_minutes: 120
    actions:
      - name: "approve_purchase"
        conditions:
          max_amount: 10000
          vendor_status: "any"
      - name: "create_vendor_record"
    approval_channel: "#procurement-approvals"
    required_approvers: 1

  full_handoff:
    description: "Agent transfers to human with context"
    actions:
      - name: "approve_purchase"
        conditions:
          amount_above: 10000
      - name: "any_action"
        conditions:
          confidence_below: 0.7
          anomaly_score_above: 0.8
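A minimal resolver for a policy in this shape might look like the sketch below. The policy is shown already parsed, as `yaml.safe_load` would return it, trimmed to the fields needed for tier resolution; the confidence and anomaly rules are omitted for brevity, and all names mirror the example above:

```python
# The example policy, parsed and trimmed to what tier resolution needs.
POLICY = {
    "autonomous": [
        {"name": "lookup_budget"},
        {"name": "check_vendor_status"},
        {"name": "send_status_update"},
    ],
    "act_then_notify": [
        {"name": "approve_purchase",
         "conditions": {"max_amount": 2000, "vendor_status": "existing"}},
    ],
    "propose_and_wait": [
        {"name": "approve_purchase",
         "conditions": {"max_amount": 10000, "vendor_status": "any"}},
        {"name": "create_vendor_record"},
    ],
    "full_handoff": [
        {"name": "approve_purchase", "conditions": {"amount_above": 10000}},
    ],
}

def matches(conditions: dict, request: dict) -> bool:
    """True if the request satisfies every condition on a rule."""
    if "max_amount" in conditions and request["amount"] > conditions["max_amount"]:
        return False
    if "amount_above" in conditions and request["amount"] <= conditions["amount_above"]:
        return False
    wanted = conditions.get("vendor_status", "any")
    if wanted != "any" and request.get("vendor_status") != wanted:
        return False
    return True

def resolve_tier(action: str, request: dict) -> str:
    # Walk tiers from most to least autonomous; the first rule whose
    # conditions match wins. Actions with no matching rule escalate fully,
    # so an unmapped tool call can never run unattended.
    for tier in ("autonomous", "act_then_notify", "propose_and_wait", "full_handoff"):
        for rule in POLICY.get(tier, []):
            if rule["name"] == action and matches(rule.get("conditions", {}), request):
                return tier
    return "full_handoff"
```

The fail-closed default at the end is the important design choice: adding a new tool without updating the policy degrades to full handoff, never to silent autonomy.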

Production Checklist

  • ✓ Every agent action is mapped to an autonomy tier with explicit criteria documented in a reviewable config file.
  • ✓ Escalation handoffs include full context: user request, proposed action, reasoning, confidence score, and retrieved data.
  • ✓ Approval workflows have defined SLAs. If no human responds within the timeout, the action is rejected, not auto-approved.
  • ✓ Confidence thresholds are calibrated on historical data, not set arbitrarily. A 0.7 threshold should mean roughly 70% of actions above it are correct.
  • ✓ Escalation volume is monitored: if more than 30% of actions require human review, the tiers or the agent need adjustment.
  • ✓ Fallback routing exists. If the primary approver is unavailable, escalations route to a backup within the timeout window.
  • ✓ The agent cannot modify its own escalation policy or tier assignments through any tool call or prompt.
  • ✓ Monthly reviews compare escalation decisions against outcomes (did humans approve actions the agent should have handled, or vice versa?).
  • ✓ Users are informed when they are interacting with an agent versus a human, and when their request has been escalated.
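The calibration item in this list can be made concrete: given a log of past decisions with post-hoc correctness labels, a 0.7 threshold is only meaningful if roughly 70% of actions above it turned out correct. A minimal sketch (record fields are hypothetical):

```python
def precision_above_threshold(history: list[dict], threshold: float) -> float:
    """Fraction of past actions at or above the confidence threshold that
    were correct. Each record: {"confidence": float, "correct": bool}."""
    above = [h for h in history if h["confidence"] >= threshold]
    if not above:
        return 0.0
    return sum(h["correct"] for h in above) / len(above)

# Toy log from post-hoc review; in practice this comes from the monthly
# outcome reviews described in the checklist.
history = [
    {"confidence": 0.9, "correct": True},
    {"confidence": 0.8, "correct": True},
    {"confidence": 0.75, "correct": False},
    {"confidence": 0.6, "correct": True},
]
# Here only 2 of the 3 actions above 0.7 were correct, so a 0.7 threshold
# is miscalibrated: raise it, or retrain the confidence model.
```

Running this check at several candidate thresholds turns "set the threshold" into an empirical choice rather than a guess.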

Common Pitfalls

The most common pitfall is designing escalation around a single axis, usually dollar amount or confidence score, when real-world decisions involve multiple risk factors. A $500 purchase from a new vendor in an unusual budget category might warrant more scrutiny than a $5,000 purchase from a long-standing supplier for routine supplies. Your escalation policy needs to compose multiple signals, not just check one threshold.

Another failure mode is the "approve fatigue" problem. If your agent escalates too frequently with low-quality context, human reviewers start rubber-stamping approvals to clear their queue. This is worse than no human-in-the-loop because it creates an illusion of oversight while providing none. Monitor approval rates. If a reviewer is approving 98% of escalations in under 10 seconds, they are not reviewing them.
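That monitoring can be automated with a small sweep over the review log, flagging reviewers whose combined approval rate and review time suggest no real review is happening (field names and thresholds are hypothetical):

```python
from statistics import median

def rubber_stamp_alerts(reviews: list[dict],
                        approve_rate_limit: float = 0.95,
                        min_review_seconds: float = 10.0) -> list[str]:
    """Flag reviewers who approve nearly everything, nearly instantly.
    Each record: {"reviewer": str, "approved": bool, "seconds": float}."""
    by_reviewer: dict[str, list[dict]] = {}
    for r in reviews:
        by_reviewer.setdefault(r["reviewer"], []).append(r)
    alerts = []
    for name, rs in by_reviewer.items():
        rate = sum(r["approved"] for r in rs) / len(rs)
        med = median(r["seconds"] for r in rs)
        if rate > approve_rate_limit and med < min_review_seconds:
            alerts.append(f"{name}: {rate:.0%} approvals, median {med:.0f}s per review")
    return alerts

# alice clears her queue in seconds; bob actually deliberates.
reviews = (
    [{"reviewer": "alice", "approved": True, "seconds": 3.0} for _ in range(20)]
    + [{"reviewer": "bob", "approved": i % 2 == 0, "seconds": 60.0} for i in range(10)]
)
alerts = rubber_stamp_alerts(reviews)
```

Requiring both signals matters: a high approval rate alone may just mean a well-calibrated agent, but high approvals combined with sub-10-second reviews is the fatigue pattern.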

Finally, many teams implement human-in-the-loop as a synchronous blocker without considering the user experience. A customer waiting in a chat while the agent says "let me check with my team" for 45 minutes will leave. Design your escalation flows with the end user in mind: set expectations about wait times, offer to follow up asynchronously, and provide interim responses where possible.

Terminal Output

terminal
$ clawproof --check 04

  CHECK 04 - Human-in-the-Loop & Escalation
  ──────────────────────────────────────────

  [PASS] Autonomy tiers defined: 4 levels configured
  [PASS] All tool calls mapped to tiers (12/12 actions)
  [PASS] Escalation handoffs include context payload
  [PASS] Approval timeout configured: 120 min → auto-reject
  [WARN] Confidence threshold (0.7) not validated against historical accuracy
  [PASS] Escalation rate: 18% of actions (target: <30%)
  [PASS] Fallback approver routing configured
  [PASS] Agent cannot modify escalation policy via tool calls
  [FAIL] No monthly review of escalation outcomes found

  Result: 7 passed, 1 warning, 1 failed
  Status: NEEDS ATTENTION
$ clawproof --assess

Need help implementing this?

We help teams build agent governance frameworks and implement production-grade controls, from quick assessments to full implementation. Built by practitioners who run agents in production every day.

✓ Big 4 + DAX background · ✓ Daily agent operations · ✓ DACH compliance expertise
