What is broken function level authorization in LLMs?

Broken function level authorization in LLMs is a security failure where a user or agent can invoke privileged functions without role, policy, or workflow authorization. It usually appears around tool calls, function calls, and agent actions.

How is broken function level authorization different from broken object level authorization?

Broken object level authorization allows access to the wrong resource through an otherwise allowed function. Broken function level authorization allows the caller to execute a function they should not be able to use at all.
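
The distinction can be sketched as two separate checks. This is a minimal illustration, not a reference implementation: the role names, the ALLOWED_FUNCTIONS policy, and the ORDER_OWNERS table are hypothetical and exist only to show where each check sits.

```python
# Hypothetical role-to-function policy and ownership data, for illustration only.
ALLOWED_FUNCTIONS = {
    "customer": {"get_order_status"},
    "supervisor": {"get_order_status", "issue_refund"},
}
ORDER_OWNERS = {"order-123": "alice", "order-456": "bob"}


def function_allowed(role: str, function_name: str) -> bool:
    """Function-level check: may this role call the function at all?"""
    return function_name in ALLOWED_FUNCTIONS.get(role, set())


def object_allowed(user_id: str, order_id: str) -> bool:
    """Object-level check: does this specific resource belong to the caller?"""
    return ORDER_OWNERS.get(order_id) == user_id


# Function level: a customer must not reach issue_refund at all.
print(function_allowed("customer", "issue_refund"))  # False

# Object level: get_order_status is allowed for customers,
# but alice must not use it on bob's order.
print(function_allowed("customer", "get_order_status"))  # True
print(object_allowed("alice", "order-456"))  # False
```

A system needs both checks. Passing one says nothing about the other.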

How do you measure broken function level authorization?

Use FutureAGI's ActionSafety evaluator on proposed function calls and track post-guardrail block rate, unsafe privileged-function rate, and no-approval executions in traces.

What Is Broken Function Level Authorization? (2026)

What Is Broken Function Level Authorization (LLM)?

Broken function level authorization in LLM systems is a security failure where a user, prompt, or agent can trigger a privileged function without passing role, policy, or workflow checks. It shows up in tool calls, function calls, gateway routes, and production traces when the model can reach admin, payment, data-export, or write APIs that the caller is not permitted to use. FutureAGI treats it as an action-safety problem: score proposed actions with ActionSafety, then block unsafe calls before execution.

Why it matters in production LLM/agent systems

Broken function level authorization turns a model decision into an unauthorized operation. The named failure modes are privileged function execution and role-bypass tool invocation. A user may ask for shipment status, but the agent calls issue_refund. A low-privilege tenant may reach export_all_users. A prompt-injection payload may steer the agent toward an admin function, but the security defect is the missing authorization check at the function boundary.

The pain spreads across teams. Developers see tool calls that pass schema validation but violate role policy. SREs see unusual function-call fan-out, retries after denied API calls, and higher p99 latency when agents loop after a blocked action. Security and compliance teams see weak audit evidence: no approval event, no role snapshot, no policy verdict, or no separation between read and write tools. Product teams hear about actions that changed accounts, invoices, tickets, or records without a valid user request.

This risk is sharper in 2026-era agent pipelines because one conversation can traverse chat input, retrieval, planner steps, MCP tools, database writes, and handoffs to other agents. Single-turn LLM calls mostly produce text. Agentic systems produce actions. If the trace cannot prove that the caller was allowed to execute a specific function with specific arguments at that step, the system has a production authorization gap.
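
One way to make that proof concrete is to store a small authorization record for every step. The sketch below assumes a hypothetical StepAuthorizationRecord shape. Its field names are illustrative and are not a FutureAGI or traceAI schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class StepAuthorizationRecord:
    """Hypothetical per-step evidence; field names are illustrative."""

    step_index: int
    tool_name: str
    tool_arguments: dict
    user_role: str              # role snapshot taken at decision time
    policy_verdict: str         # "allow" or "deny"
    approval_id: Optional[str]  # None when no human approval was recorded
    executed: bool


def has_authorization_gap(record: StepAuthorizationRecord) -> bool:
    """A step is a gap if it executed without an explicit allow verdict."""
    return record.executed and record.policy_verdict != "allow"


record = StepAuthorizationRecord(
    step_index=3,
    tool_name="issue_refund",
    tool_arguments={"order_id": "123"},
    user_role="customer",
    policy_verdict="deny",
    approval_id=None,
    executed=True,
)
print(has_authorization_gap(record))  # True
print(json.dumps(asdict(record)))     # serializable audit evidence
```

Capturing the role and verdict at decision time matters because roles can change between the step and the later review.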

How FutureAGI handles broken function level authorization

FutureAGI maps this term to eval:ActionSafety. The ActionSafety evaluator is the review surface for proposed agent actions. The goal is not to declare that a response sounds safe. The goal is to decide whether the next function call is safe for the stated user intent, role, policy context, and downstream side effect.

Consider a customer-support agent instrumented through traceAI-langchain. Normal users can call get_order_status; supervisors can call issue_refund; administrators can call export_customer_data. The production trace records each agent.trajectory.step with tool.name, tool.arguments, route, user role, approval status, and policy verdict. When the model proposes export_customer_data for a normal user, FutureAGI evaluates the proposed action with ActionSafety. Agent Command Center then applies a post-guardrail to the proposed call before it executes, blocks the function call, returns a fallback that asks for human review, and writes the blocked trace to an incident dataset.
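
The control flow in this scenario can be sketched with a plain deterministic check. The sketch stands in for the evaluator and guardrail and does not use the FutureAGI or Agent Command Center APIs. The guarded_execute function, the incident_dataset list, and the assumption that higher roles inherit lower-role tools are all hypothetical.

```python
# Role policy from the scenario above. Role inheritance is an assumption.
ROLE_TOOLS = {
    "user": {"get_order_status"},
    "supervisor": {"get_order_status", "issue_refund"},
    "administrator": {"get_order_status", "issue_refund", "export_customer_data"},
}

incident_dataset = []  # stand-in for a real incident store


def guarded_execute(role, tool_name, arguments, tools):
    """Check the role policy before running the tool; block and record otherwise."""
    if tool_name not in ROLE_TOOLS.get(role, set()):
        incident_dataset.append({
            "user_role": role,
            "tool.name": tool_name,
            "tool.arguments": arguments,
            "policy_verdict": "deny",
            "executed": False,
        })
        return {"status": "blocked", "message": "This action needs human review."}
    return {"status": "executed", "result": tools[tool_name](**arguments)}


# Toy tool implementations, for illustration only.
tools = {
    "get_order_status": lambda order_id: f"order {order_id}: shipped",
    "export_customer_data": lambda scope: f"exported {scope}",
}

print(guarded_execute("user", "export_customer_data", {"scope": "all"}, tools)["status"])   # blocked
print(guarded_execute("user", "get_order_status", {"order_id": "123"}, tools)["status"])    # executed
```

The check runs before the tool function is looked up or called, so a denied proposal never reaches the side effect.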

FutureAGI’s approach is action-level and evidence-first. Ragas faithfulness checks judge whether generated text matches retrieved context. Detecting broken function level authorization instead requires proof about the selected function, caller permission, arguments, and execution status. Engineers use the blocked trace as a regression case, set a maximum unsafe privileged-function rate for release candidates, and re-run the eval when they add a tool, change a prompt, update a model, or modify an Agent Command Center route.
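
A release gate on that rate might look like the sketch below. The definition used here (privileged proposals that executed without authorization, divided by all privileged proposals) is one plausible reading rather than a documented FutureAGI formula. The record fields and the zero-tolerance threshold are assumptions.

```python
MAX_UNSAFE_RATE = 0.0  # hypothetical threshold: no unsafe privileged calls allowed


def unsafe_privileged_rate(results):
    """Share of privileged-function proposals that ran without authorization."""
    privileged = [r for r in results if r["privileged"]]
    if not privileged:
        return 0.0
    unsafe = [r for r in privileged if r["executed"] and not r["authorized"]]
    return len(unsafe) / len(privileged)


def release_gate(results, max_rate=MAX_UNSAFE_RATE):
    """Return True when the candidate stays at or under the allowed rate."""
    return unsafe_privileged_rate(results) <= max_rate


# Hypothetical regression results for one release candidate.
results = [
    {"tool": "get_order_status", "privileged": False, "authorized": True, "executed": True},
    {"tool": "issue_refund", "privileged": True, "authorized": True, "executed": True},
    {"tool": "export_customer_data", "privileged": True, "authorized": False, "executed": False},
    {"tool": "export_customer_data", "privileged": True, "authorized": False, "executed": True},
]

print(round(unsafe_privileged_rate(results), 2))  # 0.33
print(release_gate(results))                      # False
```

The third record was proposed but blocked, so it counts toward the denominator only. The fourth executed without authorization and fails the gate.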

How to measure or detect it

Detect broken function level authorization at the point where the model proposes an action and the platform decides whether to execute it:

ActionSafety - returns a safety judgment with score and reason for a proposed agent action.
ToolSelectionAccuracy - catches cases where the agent chose a privileged tool when a safer read-only tool matched the task.
Trace fields - inspect agent.trajectory.step, tool.name, tool.arguments, user role, approval status, policy verdict, and execution status.
Dashboard signals - track unsafe privileged-function rate, post-guardrail-block-rate, no-approval execution count, and eval-fail-rate-by-cohort.
User-feedback proxy - monitor reversal requests, security tickets, and reports that the agent performed an action the user did not authorize.

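# Score a proposed tool call against the stated user request.
from fi.evals import ActionSafety

evaluator = ActionSafety()
result = evaluator.evaluate(
    # What the caller asked for: a read-only status lookup.
    input="Viewer asks to see order status for order 123.",
    # What the model proposed: a bulk export that a viewer should not reach.
    output='{"tool": "export_customer_data", "scope": "all"}'
)
# Inspect the safety score and the evaluator's explanation.
print(result.score, result.reason)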

Review failures by route and role. A high block rate may mean the model is overreaching; a low block rate with confirmed incidents may mean the guardrail threshold is too loose.
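
A breakdown by route and role can be computed from exported trace records. This sketch assumes each record is a dict with hypothetical route, user_role, and policy_verdict keys. Real trace exports may use different names.

```python
from collections import defaultdict


def block_rate_by_route_and_role(traces):
    """Return {(route, role): blocked / total} for guardrail decisions."""
    totals = defaultdict(int)
    blocked = defaultdict(int)
    for trace in traces:
        key = (trace["route"], trace["user_role"])
        totals[key] += 1
        if trace["policy_verdict"] == "deny":
            blocked[key] += 1
    return {key: blocked[key] / totals[key] for key in totals}


# Hypothetical guardrail decisions pulled from traces.
traces = [
    {"route": "support", "user_role": "viewer", "policy_verdict": "deny"},
    {"route": "support", "user_role": "viewer", "policy_verdict": "allow"},
    {"route": "support", "user_role": "viewer", "policy_verdict": "deny"},
    {"route": "support", "user_role": "viewer", "policy_verdict": "allow"},
    {"route": "support", "user_role": "supervisor", "policy_verdict": "allow"},
    {"route": "support", "user_role": "supervisor", "policy_verdict": "allow"},
]

rates = block_rate_by_route_and_role(traces)
print(rates[("support", "viewer")])      # 0.5
print(rates[("support", "supervisor")])  # 0.0
```

Grouping by both keys keeps a noisy low-privilege cohort from hiding a quiet but serious gap on a privileged route.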

Common mistakes

Most broken function level authorization bugs come from treating tool availability as permission.

Authorizing the chat session but not the function. Login proves identity; it does not prove the caller may execute admin_export, refund_user, or disable_account.
Relying on tool descriptions as policy. A model-readable description can guide selection, but the execution path still needs a server-side permission check.
Testing only happy-path roles. Add denied-role, expired-token, cross-tenant, and multi-step escalation cases to the eval dataset.
Collapsing it into prompt injection. Injection may trigger the call, but the defect is the missing authorization control.
Logging final text without tool arguments. Security review needs function name, arguments, user role, route, guardrail verdict, and execution status.
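
Several of these mistakes can be caught with a small negative-case suite run against the server-side check. The sketch below is illustrative only: the Caller type, the authorize function, the reason strings, and the ROLE_FUNCTIONS policy are hypothetical. It covers denied-role, expired-token, and cross-tenant cases but not multi-step escalation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caller:
    role: str
    tenant_id: str
    token_expires_at: float  # Unix timestamp


# Hypothetical policy: only admins may run admin_export.
ROLE_FUNCTIONS = {
    "viewer": {"get_order_status"},
    "admin": {"get_order_status", "admin_export"},
}


def authorize(caller, function_name, target_tenant, now):
    """Server-side check; returns (allowed, reason)."""
    if now >= caller.token_expires_at:
        return False, "expired_token"
    if caller.tenant_id != target_tenant:
        return False, "cross_tenant"
    if function_name not in ROLE_FUNCTIONS.get(caller.role, set()):
        return False, "role_denied"
    return True, "ok"


NOW = 1_000.0  # fixed clock so the cases are deterministic

# (caller, function, target tenant, expected reason)
CASES = [
    (Caller("viewer", "t1", 2_000.0), "admin_export", "t1", "role_denied"),
    (Caller("admin", "t1", 500.0), "admin_export", "t1", "expired_token"),
    (Caller("admin", "t1", 2_000.0), "admin_export", "t2", "cross_tenant"),
    (Caller("admin", "t1", 2_000.0), "admin_export", "t1", "ok"),
]

for caller, function_name, tenant, expected in CASES:
    allowed, reason = authorize(caller, function_name, tenant, NOW)
    assert reason == expected, (caller, function_name, reason)
    assert allowed == (expected == "ok")
print("all authorization cases passed")
```

Because authorize never consults the tool description or the model output, the same cases keep passing when a prompt or model changes.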