Hardening OpenAI Function Calling Agents
OpenAI's function calling is powerful but dangerously easy to misconfigure. This playbook covers permission boundaries, input validation, and cost controls.
By Werner Plutat
The Risk Surface
OpenAI function calling lets the model decide which functions to invoke and what arguments to pass. This creates a unique risk surface: the LLM acts as an untrusted caller to your backend code. Unlike traditional API consumers, the model can hallucinate function names, fabricate argument values, or be manipulated through prompt injection to call functions the user never intended.
The core danger is that function definitions sent in the `tools` array describe capabilities, not permissions. If you define a `delete_user` function, the model can call it. There is no built-in authorization layer. Every function you expose is implicitly authorized for every conversation unless you enforce boundaries yourself.
Common failure modes include: the model calling administrative functions during regular user sessions, passing unsanitized SQL or shell commands as string arguments, chaining multiple function calls to escalate privileges (e.g., calling `list_users` to discover admin IDs, then `update_role`), and exploiting overly broad `enum` values or missing `maxLength` constraints in parameter schemas.
When your tool response reports an error, the model will often retry the call with modified arguments on the next turn, which means a partially blocked attack may succeed on the second or third attempt if your validation is inconsistent across retries.
Permission Boundaries
The most effective way to limit function calling risk is to scope the `tools` array per-request based on the authenticated user's permissions. Never send your full function catalog to every conversation. Instead, build a function registry that maps user roles to allowed functions.
Start by categorizing your functions into tiers: read-only operations (fetching data, searching), write operations (creating or updating records), and destructive operations (deleting data, modifying permissions). Only include functions in the `tools` array that match the current user's authorization level.
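A minimal sketch of such a registry, assuming a simple role model (the tier names, roles, and function names below are illustrative, not from any particular codebase):

```typescript
// Role-scoped function registry: only the tool definitions the
// authenticated user's role permits are ever sent to the model.
type Tier = "read" | "write" | "destructive";

interface RegisteredFunction {
  name: string;
  tier: Tier;
  definition: object; // the JSON Schema tool definition sent in the `tools` array
}

const REGISTRY: RegisteredFunction[] = [
  { name: "search_orders", tier: "read", definition: {} },
  { name: "update_address", tier: "write", definition: {} },
  { name: "delete_account", tier: "destructive", definition: {} },
];

// Map roles to the tiers they may use.
const ROLE_TIERS: Record<string, Tier[]> = {
  viewer: ["read"],
  support: ["read", "write"],
  admin: ["read", "write", "destructive"],
};

export function toolsForRole(role: string): RegisteredFunction[] {
  const tiers = ROLE_TIERS[role] ?? []; // unknown roles get no tools
  return REGISTRY.filter((f) => tiers.includes(f.tier));
}
```

Build the `tools` array from `toolsForRole(user.role)` on every request, so a permissions change takes effect on the next turn rather than at redeploy.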
For multi-turn conversations, consider narrowing the tools array as the conversation progresses. A customer support agent might start with broad search capabilities but should only receive refund-processing functions after the user has been verified and a specific order identified. This pattern (progressive function disclosure) reduces the attack surface at each step.
Implement a server-side allowlist rather than a blocklist. When the model returns a `tool_calls` response, validate the function name against the user's permitted functions before executing. This catches cases where cached or stale tool definitions include functions that should have been removed.
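One way to sketch that allowlist check, assuming the `tool_calls` shape returned by the Chat Completions API (the screening function itself is a hypothetical helper):

```typescript
// Screen the model's tool_calls against a per-user allowlist before executing.
// Blocked calls get an error tool message instead of execution, so the model
// receives consistent feedback and your handlers never see them.
interface ToolCall {
  id: string;
  function: { name: string; arguments: string };
}

interface ToolRejection {
  role: "tool";
  tool_call_id: string;
  content: string;
}

export function screenToolCalls(
  calls: ToolCall[],
  allowed: Set<string>
): { approved: ToolCall[]; rejections: ToolRejection[] } {
  const approved: ToolCall[] = [];
  const rejections: ToolRejection[] = [];
  for (const call of calls) {
    if (allowed.has(call.function.name)) {
      approved.push(call);
    } else {
      rejections.push({
        role: "tool",
        tool_call_id: call.id,
        content: `Error: function ${call.function.name} is not available.`,
      });
    }
  }
  return { approved, rejections };
}
```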
For `tool_choice` configuration, avoid `auto` in high-risk scenarios. Use `{"type": "function", "function": {"name": "specific_function"}}` to force the model to call exactly the function you expect at each step of a guided workflow. This eliminates the model's ability to choose unexpected functions entirely.
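A sketch of what the forced request body might look like (the model name and function name are placeholders; the payload shape follows the Chat Completions API):

```typescript
// Build a request body that forces the model to call one specific function.
export function forcedToolRequest(
  messages: { role: string; content: string }[],
  tool: { type: "function"; function: object },
  functionName: string
) {
  return {
    model: "gpt-4o", // placeholder model
    messages,
    tools: [tool], // only the one tool this workflow step needs
    tool_choice: { type: "function", function: { name: functionName } },
  };
}
```

Pairing a forced `tool_choice` with a `tools` array containing only that one function removes both the choice of function and the temptation to expose extras.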
Function Schema Hardening
Your function definitions in the `tools` array are both documentation for the model and your first line of defense. Tighten parameter schemas to constrain what the model can generate. Use `enum` for known-value fields, set `maxLength` on strings, and mark only truly required fields as required. Avoid open-ended `string` types where a more specific format will do.
The schema below demonstrates a hardened function definition that restricts argument values to safe ranges and known enums. Note the explicit `additionalProperties: false`. Without this, the model can inject unexpected fields that your handler might naively process.
{
  "type": "function",
  "function": {
    "name": "get_order_details",
    "description": "Retrieve details for a specific order. Only returns orders belonging to the authenticated user.",
    "strict": true,
    "parameters": {
      "type": "object",
      "required": ["order_id"],
      "additionalProperties": false,
      "properties": {
        "order_id": {
          "type": "string",
          "pattern": "^ORD-[0-9]{6,10}$",
          "description": "Order ID in format ORD-XXXXXX"
        },
        "include_fields": {
          "type": "array",
          "items": {
            "type": "string",
            "enum": ["status", "items", "shipping", "payment_summary"]
          },
          "maxItems": 4,
          "description": "Specific fields to include in the response"
        }
      }
    }
  }
}
Input Validation Layer
Even with tight schemas, treat every argument from the model as untrusted user input. The `strict` mode in OpenAI function calling enforces JSON Schema validation on the API side, but it does not cover semantic validation. The model can still pass a valid-format order ID that belongs to a different user, or a syntactically correct but malicious string.
Build a validation middleware that runs between the API response and your function execution. This middleware should: (1) verify the function name is in the current user's allowlist, (2) validate all arguments against your server-side schema (do not rely solely on OpenAI's schema enforcement), (3) check business logic constraints like ownership (does this order belong to the requesting user?), and (4) sanitize string arguments that will touch databases or external systems.
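The middleware's first three checks can be sketched as follows, assuming the order-ID pattern from the schema above; the ownership lookup is a stand-in for your real data access layer:

```typescript
// Minimal validation middleware sketch: allowlist, server-side schema
// check, then a business-logic ownership check. Sanitization of free-text
// arguments would follow as a fourth step.
interface ValidationResult {
  ok: boolean;
  error?: string;
}

export function validateCall(
  functionName: string,
  rawArguments: string, // the JSON string from the tool call
  allowed: Set<string>,
  ownsOrder: (orderId: string) => boolean // stand-in for a real DB lookup
): ValidationResult {
  // (1) Function name must be on the user's allowlist.
  if (!allowed.has(functionName)) {
    return { ok: false, error: "function not permitted for this user" };
  }
  // (2) Re-validate arguments server-side; never rely on API-side
  // schema enforcement alone.
  let args: Record<string, unknown>;
  try {
    args = JSON.parse(rawArguments);
  } catch {
    return { ok: false, error: "arguments are not valid JSON" };
  }
  const orderId = args["order_id"];
  if (typeof orderId !== "string" || !/^ORD-[0-9]{6,10}$/.test(orderId)) {
    return { ok: false, error: "order_id fails schema validation" };
  }
  // (3) Business logic: the order must belong to the requesting user.
  if (!ownsOrder(orderId)) {
    return { ok: false, error: "order does not belong to user" };
  }
  return { ok: true };
}
```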
For functions that accept free-text arguments, such as search queries or message content, apply the same input sanitization you would for any web form: escape special characters, enforce length limits, and reject inputs that match known injection patterns. Remember that the model can be prompt-injected to pass payloads like `'; DROP TABLE orders; --` as function arguments.
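A deliberately conservative sanitizer sketch for such free-text arguments (the pattern list is illustrative and will reject some legitimate input; parameterized queries remain the real defense):

```typescript
// Length cap plus rejection of common SQL-injection markers.
// Tune the pattern to your domain; this version also rejects benign
// apostrophes, which is the safe failure mode for a search box.
const INJECTION_PATTERN = /('|--|;|\/\*|\bdrop\b|\bunion\b)/i;

export function sanitizeQuery(input: string, maxLength = 200): string {
  const trimmed = input.slice(0, maxLength).trim();
  if (INJECTION_PATTERN.test(trimmed)) {
    throw new Error("input rejected: matches injection pattern");
  }
  return trimmed;
}
```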
Log every function call with its full arguments before execution. This creates an audit trail and enables you to detect patterns like repeated calls with incrementing IDs (enumeration attacks) or sudden spikes in destructive operations. Pair this with anomaly detection: if a conversation that has been doing read-only lookups suddenly attempts a delete, flag it for review.
Consider implementing a confirmation step for high-risk operations. Instead of executing destructive functions immediately, return a confirmation token that the model must present in a follow-up call. This forces a two-step process that gives your application layer a chance to verify intent.
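A sketch of that token flow, assuming an in-memory store (production code would use a shared store such as Redis; all names here are illustrative):

```typescript
import { randomUUID } from "node:crypto";

// Two-step confirmation: a destructive call first receives a token, and
// only executes when the same token and action are presented again.
const pending = new Map<string, { action: string; issuedAt: number }>();

export function requestConfirmation(action: string): string {
  const token = randomUUID();
  pending.set(token, { action, issuedAt: Date.now() });
  return token;
}

export function confirmAndExecute(
  token: string,
  action: string,
  ttlMs = 300_000 // tokens expire after 5 minutes
): boolean {
  const entry = pending.get(token);
  // Token must exist and must match the action it was issued for.
  if (!entry || entry.action !== action) return false;
  if (Date.now() - entry.issuedAt > ttlMs) {
    pending.delete(token);
    return false;
  }
  pending.delete(token); // single-use: a replayed token is rejected
  return true;
}
```

Binding the token to the specific action prevents the model from requesting confirmation for a harmless operation and spending the token on a destructive one.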
Rate Limiting and Cost Controls
Function calling agents can generate runaway costs through recursive tool use loops or by being manipulated into making expensive API calls repeatedly. Implement per-conversation and per-user rate limits on function calls, and set hard budget caps that terminate conversations when exceeded.
The middleware below tracks function call counts and estimated costs per conversation, rejecting calls that exceed configured thresholds. It also enforces a maximum chain depth to prevent infinite tool-use loops where the model calls functions that trigger more function calls.
interface ConversationLimits {
  maxFunctionCalls: number;
  maxChainDepth: number;
  maxCostCents: number;
  windowMs: number;
}

const DEFAULT_LIMITS: ConversationLimits = {
  maxFunctionCalls: 25,
  maxChainDepth: 5,
  maxCostCents: 500, // $5 per conversation
  windowMs: 3_600_000, // 1 hour
};

const conversationState = new Map<string, {
  callCount: number;
  chainDepth: number;
  costCents: number;
  windowStart: number;
}>();

// Call this when a new user message arrives, so chain depth measures
// only consecutive model-initiated tool calls.
export function resetChainDepth(conversationId: string): void {
  const state = conversationState.get(conversationId);
  if (state) state.chainDepth = 0;
}

export function checkFunctionCallLimit(
  conversationId: string,
  functionName: string,
  estimatedCostCents: number,
  limits: ConversationLimits = DEFAULT_LIMITS
): { allowed: boolean; reason?: string } {
  const now = Date.now();
  let state = conversationState.get(conversationId);
  if (!state || now - state.windowStart > limits.windowMs) {
    state = { callCount: 0, chainDepth: 0, costCents: 0, windowStart: now };
    conversationState.set(conversationId, state);
  }
  state.callCount++;
  state.chainDepth++;
  state.costCents += estimatedCostCents;
  if (state.callCount > limits.maxFunctionCalls) {
    return { allowed: false, reason: `Function call limit (${limits.maxFunctionCalls}) exceeded` };
  }
  if (state.chainDepth > limits.maxChainDepth) {
    return { allowed: false, reason: `Chain depth limit (${limits.maxChainDepth}) exceeded by ${functionName}` };
  }
  if (state.costCents > limits.maxCostCents) {
    return { allowed: false, reason: `Cost cap (${limits.maxCostCents}c) exceeded` };
  }
  return { allowed: true };
}
Monitoring and Alerting
- Log every function call with conversation ID, user ID, function name, arguments (redacting sensitive values), and execution result
- Track function call latency percentiles (p50, p95, p99). Sudden spikes may indicate the model is stuck in a retry loop
- Set up alerts for function error rates exceeding 10% over a 5-minute window, which often signals prompt injection attempts or schema mismatches
- Monitor for unusual function call sequences: a pattern like list → get → update → delete in rapid succession may indicate automated exploitation
- Track per-user daily function call volumes and alert on users exceeding 3x the median to catch abuse early
- Instrument token usage per conversation and alert when total tokens (input + output + function definitions) exceed your expected ceiling
- Monitor the ratio of function calls to user messages. Healthy conversations typically have 1-3 function calls per user turn; ratios above 5:1 suggest looping
- Set up a real-time dashboard showing active conversations with function calling enabled, grouped by function name, to spot coordinated attacks
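The call-to-message ratio check above reduces to a few lines; the 5:1 threshold is a starting point to tune against your own traffic:

```typescript
// Flag conversations whose function-call-to-user-message ratio suggests
// the model is stuck in a tool-use loop.
export function isLikelyLooping(
  functionCalls: number,
  userMessages: number,
  threshold = 5
): boolean {
  // Any tool call before the first user message is itself suspicious.
  if (userMessages === 0) return functionCalls > 0;
  return functionCalls / userMessages > threshold;
}
```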
Deployment Checklist
- All function definitions use `strict: true` and `additionalProperties: false` in parameter schemas
- Functions are scoped per user role. The `tools` array is built dynamically based on authenticated user permissions
- Server-side validation runs on every function call independently of OpenAI's schema enforcement
- String arguments that touch databases or shell commands are sanitized and parameterized
- Per-conversation rate limits are enforced: max function calls, max cost, and max chain depth
- Destructive operations (delete, modify permissions) require a two-step confirmation flow
- All function call arguments and results are logged with conversation and user context for audit trails
- Monitoring alerts are configured for error rate spikes, unusual call patterns, and cost threshold breaches
- A kill switch exists to disable function calling globally or per-user without redeploying
- The system prompt explicitly instructs the model not to call functions unless the user's request clearly requires it
- Secrets and API keys are never included in function definitions or system prompts. They are injected server-side at execution time
- Load testing has been performed with adversarial prompts to verify that validation and rate limiting hold under pressure