Hardening OpenAI Function Calling Agents
OpenAI's function calling is powerful but dangerously easy to misconfigure. This playbook covers permission boundaries, input validation, and cost controls.
By Werner Plutat
The Risk Surface
OpenAI function calling lets the model decide which functions to invoke and what arguments to pass. This creates a unique risk surface: the LLM acts as an untrusted caller to your backend code. Unlike traditional API consumers, the model can hallucinate function names, fabricate argument values, or be manipulated through prompt injection to call functions the user never intended.
The core danger is that function definitions sent in the `tools` array describe capabilities, not permissions. If you define a `delete_user` function, the model can call it. There is no built-in authorization layer. Every function you expose is implicitly authorized for every conversation unless you enforce boundaries yourself.
Common failure modes include: the model calling administrative functions during regular user sessions, passing unsanitized SQL or shell commands as string arguments, chaining multiple function calls to escalate privileges (e.g., calling `list_users` to discover admin IDs, then `update_role`), and exploiting overly broad `enum` values or missing `maxLength` constraints in parameter schemas.
When your tool response reports an error, the model will often retry the call with modified arguments on the next turn, which means a partially blocked attack may succeed on the second or third attempt if your validation is inconsistent across retries.
Permission Boundaries
The most effective way to limit function calling risk is to scope the `tools` array per-request based on the authenticated user's permissions. Never send your full function catalog to every conversation. Instead, build a function registry that maps user roles to allowed functions.
Start by categorizing your functions into tiers: read-only operations (fetching data, searching), write operations (creating or updating records), and destructive operations (deleting data, modifying permissions). Only include functions in the `tools` array that match the current user's authorization level.
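A minimal sketch of such a registry, assuming a simple role model (the tier names, roles, and function names below are illustrative, not from any particular codebase):

```typescript
// Role-scoped function registry: only the tool definitions the
// authenticated user's role permits are ever sent to the model.
type Tier = "read" | "write" | "destructive";

interface RegisteredFunction {
  name: string;
  tier: Tier;
  definition: object; // the JSON Schema tool definition sent in the `tools` array
}

const REGISTRY: RegisteredFunction[] = [
  { name: "search_orders", tier: "read", definition: {} },
  { name: "update_address", tier: "write", definition: {} },
  { name: "delete_account", tier: "destructive", definition: {} },
];

// Map roles to the tiers they may use.
const ROLE_TIERS: Record<string, Tier[]> = {
  viewer: ["read"],
  support: ["read", "write"],
  admin: ["read", "write", "destructive"],
};

export function toolsForRole(role: string): RegisteredFunction[] {
  const tiers = ROLE_TIERS[role] ?? []; // unknown roles get no tools
  return REGISTRY.filter((f) => tiers.includes(f.tier));
}
```

Build the `tools` array from `toolsForRole(user.role)` on every request, so a permissions change takes effect on the next turn rather than at redeploy.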
For multi-turn conversations, consider narrowing the tools array as the conversation progresses. A customer support agent might start with broad search capabilities but should only receive refund-processing functions after the user has been verified and a specific order identified. This pattern (progressive function disclosure) reduces the attack surface at each step.
Implement a server-side allowlist rather than a blocklist. When the model returns a `tool_calls` response, validate the function name against the user's permitted functions before executing. This catches cases where cached or stale tool definitions include functions that should have been removed.
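One way to sketch that allowlist check, assuming the `tool_calls` shape returned by the Chat Completions API (the screening function itself is a hypothetical helper):

```typescript
// Screen the model's tool_calls against a per-user allowlist before executing.
// Blocked calls get an error tool message instead of execution, so the model
// receives consistent feedback and your handlers never see them.
interface ToolCall {
  id: string;
  function: { name: string; arguments: string };
}

interface ToolRejection {
  role: "tool";
  tool_call_id: string;
  content: string;
}

export function screenToolCalls(
  calls: ToolCall[],
  allowed: Set<string>
): { approved: ToolCall[]; rejections: ToolRejection[] } {
  const approved: ToolCall[] = [];
  const rejections: ToolRejection[] = [];
  for (const call of calls) {
    if (allowed.has(call.function.name)) {
      approved.push(call);
    } else {
      rejections.push({
        role: "tool",
        tool_call_id: call.id,
        content: `Error: function ${call.function.name} is not available.`,
      });
    }
  }
  return { approved, rejections };
}
```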
For `tool_choice` configuration, avoid `auto` in high-risk scenarios. Use `{"type": "function", "function": {"name": "specific_function"}}` to force the model to call exactly the function you expect at each step of a guided workflow. This eliminates the model's ability to choose unexpected functions entirely.
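A sketch of what the forced request body might look like (the model name and function name are placeholders; the payload shape follows the Chat Completions API):

```typescript
// Build a request body that forces the model to call one specific function.
export function forcedToolRequest(
  messages: { role: string; content: string }[],
  tool: { type: "function"; function: object },
  functionName: string
) {
  return {
    model: "gpt-4o", // placeholder model
    messages,
    tools: [tool], // only the one tool this workflow step needs
    tool_choice: { type: "function", function: { name: functionName } },
  };
}
```

Pairing a forced `tool_choice` with a `tools` array containing only that one function removes both the choice of function and the temptation to expose extras.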
Function Schema Hardening
Your function definitions in the `tools` array are both documentation for the model and your first line of defense. Tighten parameter schemas to constrain what the model can generate. Use `enum` for known-value fields, set `maxLength` on strings, and mark only truly required fields as required. Avoid open-ended `string` types where a more specific format will do.
The schema below demonstrates a hardened function definition that restricts argument values to safe ranges and known enums. Note the explicit `additionalProperties: false`. Without this, the model can inject unexpected fields that your handler might naively process.
{
  "type": "function",
  "function": {
    "name": "get_order_details",
    "description": "Retrieve details for a specific order. Only returns orders belonging to the authenticated user.",
    "strict": true,
    "parameters": {
      "type": "object",
      "required": ["order_id"],
      "additionalProperties": false,
      "properties": {
        "order_id": {
          "type": "string",
          "pattern": "^ORD-[0-9]{6,10}$",
          "description": "Order ID in format ORD-XXXXXX"
        },
        "include_fields": {
          "type": "array",
          "items": {
            "type": "string",
            "enum": ["status", "items", "shipping", "payment_summary"]
          },
          "maxItems": 4,
          "description": "Specific fields to include in the response"
        }
      }
    }
  }
}
Input Validation Layer
Even with tight schemas, treat every argument from the model as untrusted user input. The `strict` mode in OpenAI function calling enforces JSON Schema validation on the API side, but it does not cover semantic validation. The model can still pass a valid-format order ID that belongs to a different user, or a syntactically correct but malicious string.
Build a validation middleware that runs between the API response and your function execution. This middleware should: (1) verify the function name is in the current user's allowlist, (2) validate all arguments against your server-side schema (do not rely solely on OpenAI's schema enforcement), (3) check business logic constraints like ownership (does this order belong to the requesting user?), and (4) sanitize string arguments that will touch databases or external systems.
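The middleware's first three checks can be sketched as follows, assuming the order-ID pattern from the schema above; the ownership lookup is a stand-in for your real data access layer:

```typescript
// Minimal validation middleware sketch: allowlist, server-side schema
// check, then a business-logic ownership check. Sanitization of free-text
// arguments would follow as a fourth step.
interface ValidationResult {
  ok: boolean;
  error?: string;
}

export function validateCall(
  functionName: string,
  rawArguments: string, // the JSON string from the tool call
  allowed: Set<string>,
  ownsOrder: (orderId: string) => boolean // stand-in for a real DB lookup
): ValidationResult {
  // (1) Function name must be on the user's allowlist.
  if (!allowed.has(functionName)) {
    return { ok: false, error: "function not permitted for this user" };
  }
  // (2) Re-validate arguments server-side; never rely on API-side
  // schema enforcement alone.
  let args: Record<string, unknown>;
  try {
    args = JSON.parse(rawArguments);
  } catch {
    return { ok: false, error: "arguments are not valid JSON" };
  }
  const orderId = args["order_id"];
  if (typeof orderId !== "string" || !/^ORD-[0-9]{6,10}$/.test(orderId)) {
    return { ok: false, error: "order_id fails schema validation" };
  }
  // (3) Business logic: the order must belong to the requesting user.
  if (!ownsOrder(orderId)) {
    return { ok: false, error: "order does not belong to user" };
  }
  return { ok: true };
}
```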
For functions that accept free-text arguments, such as search queries or message content, apply the same input sanitization you would for any web form: escape special characters, enforce length limits, and reject inputs that match known injection patterns. Remember that the model can be prompt-injected to pass payloads like `'; DROP TABLE orders; --` as function arguments.
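A deliberately conservative sanitizer sketch for such free-text arguments (the pattern list is illustrative and will reject some legitimate input; parameterized queries remain the real defense):

```typescript
// Length cap plus rejection of common SQL-injection markers.
// Tune the pattern to your domain; this version also rejects benign
// apostrophes, which is the safe failure mode for a search box.
const INJECTION_PATTERN = /('|--|;|\/\*|\bdrop\b|\bunion\b)/i;

export function sanitizeQuery(input: string, maxLength = 200): string {
  const trimmed = input.slice(0, maxLength).trim();
  if (INJECTION_PATTERN.test(trimmed)) {
    throw new Error("input rejected: matches injection pattern");
  }
  return trimmed;
}
```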
Log every function call with its full arguments before execution. This creates an audit trail and enables you to detect patterns like repeated calls with incrementing IDs (enumeration attacks) or sudden spikes in destructive operations. Pair this with anomaly detection: if a conversation that has been doing read-only lookups suddenly attempts a delete, flag it for review.
Consider implementing a confirmation step for high-risk operations. Instead of executing destructive functions immediately, return a confirmation token that the model must present in a follow-up call. This forces a two-step process that gives your application layer a chance to verify intent.
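A sketch of that token flow, assuming an in-memory store (production code would use a shared store such as Redis; all names here are illustrative):

```typescript
import { randomUUID } from "node:crypto";

// Two-step confirmation: a destructive call first receives a token, and
// only executes when the same token and action are presented again.
const pending = new Map<string, { action: string; issuedAt: number }>();

export function requestConfirmation(action: string): string {
  const token = randomUUID();
  pending.set(token, { action, issuedAt: Date.now() });
  return token;
}

export function confirmAndExecute(
  token: string,
  action: string,
  ttlMs = 300_000 // tokens expire after 5 minutes
): boolean {
  const entry = pending.get(token);
  // Token must exist and must match the action it was issued for.
  if (!entry || entry.action !== action) return false;
  if (Date.now() - entry.issuedAt > ttlMs) {
    pending.delete(token);
    return false;
  }
  pending.delete(token); // single-use: a replayed token is rejected
  return true;
}
```

Binding the token to the specific action prevents the model from requesting confirmation for a harmless operation and spending the token on a destructive one.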
Rate Limiting and Cost Controls
Function calling agents can generate runaway costs through recursive tool use loops or by being manipulated into making expensive API calls repeatedly. Implement per-conversation and per-user rate limits on function calls, and set hard budget caps that terminate conversations when exceeded.
The middleware below tracks function call counts and estimated costs per conversation, rejecting calls that exceed configured thresholds. It also enforces a maximum chain depth to prevent infinite tool-use loops where the model calls functions that trigger more function calls.
interface ConversationLimits {
  maxFunctionCalls: number;
  maxChainDepth: number;
  maxCostCents: number;
  windowMs: number;
}

const DEFAULT_LIMITS: ConversationLimits = {
  maxFunctionCalls: 25,
  maxChainDepth: 5,
  maxCostCents: 500, // $5 per conversation
  windowMs: 3_600_000, // 1 hour
};

const conversationState = new Map<string, {
  callCount: number;
  chainDepth: number;
  costCents: number;
  windowStart: number;
}>();

// Call this when a new user message arrives, so chain depth measures
// only consecutive model-initiated tool calls.
export function resetChainDepth(conversationId: string): void {
  const state = conversationState.get(conversationId);
  if (state) state.chainDepth = 0;
}

export function checkFunctionCallLimit(
  conversationId: string,
  functionName: string,
  estimatedCostCents: number,
  limits: ConversationLimits = DEFAULT_LIMITS
): { allowed: boolean; reason?: string } {
  const now = Date.now();
  let state = conversationState.get(conversationId);
  if (!state || now - state.windowStart > limits.windowMs) {
    state = { callCount: 0, chainDepth: 0, costCents: 0, windowStart: now };
    conversationState.set(conversationId, state);
  }
  state.callCount++;
  state.chainDepth++;
  state.costCents += estimatedCostCents;
  if (state.callCount > limits.maxFunctionCalls) {
    return { allowed: false, reason: `Function call limit (${limits.maxFunctionCalls}) exceeded` };
  }
  if (state.chainDepth > limits.maxChainDepth) {
    return { allowed: false, reason: `Chain depth limit (${limits.maxChainDepth}) exceeded by ${functionName}` };
  }
  if (state.costCents > limits.maxCostCents) {
    return { allowed: false, reason: `Cost cap (${limits.maxCostCents}c) exceeded` };
  }
  return { allowed: true };
}
Monitoring and Alerting
- Log every function call with conversation ID, user ID, function name, arguments (redacting sensitive values), and execution result
- Track function call latency percentiles (p50, p95, p99). Sudden spikes may indicate the model is stuck in a retry loop
- Set up alerts for function error rates exceeding 10% over a 5-minute window, which often signals prompt injection attempts or schema mismatches
- Monitor for unusual function call sequences: a pattern like list → get → update → delete in rapid succession may indicate automated exploitation
- Track per-user daily function call volumes and alert on users exceeding 3x the median to catch abuse early
- Instrument token usage per conversation and alert when total tokens (input + output + function definitions) exceed your expected ceiling
- Monitor the ratio of function calls to user messages. Healthy conversations typically have 1-3 function calls per user turn; ratios above 5:1 suggest looping
- Set up a real-time dashboard showing active conversations with function calling enabled, grouped by function name, to spot coordinated attacks
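The call-to-message ratio check above reduces to a few lines; the 5:1 threshold is a starting point to tune against your own traffic:

```typescript
// Flag conversations whose function-call-to-user-message ratio suggests
// the model is stuck in a tool-use loop.
export function isLikelyLooping(
  functionCalls: number,
  userMessages: number,
  threshold = 5
): boolean {
  // Any tool call before the first user message is itself suspicious.
  if (userMessages === 0) return functionCalls > 0;
  return functionCalls / userMessages > threshold;
}
```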
Deployment Checklist
- All function definitions use `strict: true` and `additionalProperties: false` in parameter schemas
- Functions are scoped per user role. The `tools` array is built dynamically based on authenticated user permissions
- Server-side validation runs on every function call independently of OpenAI's schema enforcement
- String arguments that touch databases or shell commands are sanitized and parameterized
- Per-conversation rate limits are enforced: max function calls, max cost, and max chain depth
- Destructive operations (delete, modify permissions) require a two-step confirmation flow
- All function call arguments and results are logged with conversation and user context for audit trails
- Monitoring alerts are configured for error rate spikes, unusual call patterns, and cost threshold breaches
- A kill switch exists to disable function calling globally or per-user without redeploying
- The system prompt explicitly instructs the model not to call functions unless the user's request clearly requires it
- Secrets and API keys are never included in function definitions or system prompts. They are injected server-side at execution time
- Load testing has been performed with adversarial prompts to verify that validation and rate limiting hold under pressure