What Is Excessive Agency (LLM)?

An LLM security failure where an agent has more autonomy, tools, or permissions than the task requires, creating unsafe action paths.

Excessive agency is an LLM security failure mode in which an agent has more autonomy, tools, or permissions than the task requires. It shows up in eval pipelines, production traces, and gateway decisions when the model can invoke actions such as refunds, emails, database writes, or external API calls without enough policy checks. FutureAGI treats it as an action-safety risk: score the agent trajectory with ActionSafety, then enforce a post-guardrail before the action reaches a real system.

Why Excessive Agency Matters in Production LLM and Agent Systems

Excessive agency turns a model mistake into a real-world operation. A single bad plan can become an unauthorized refund, an email sent to the wrong customer, a database write, or a tool call that exposes private data. The typical named failure modes are unauthorized tool invocation and over-permissioned action execution. Prompt injection, hallucination, or a weak planner may start the chain, but excessive agency is what lets the chain touch production systems.

The pain is shared. Developers see traces where the agent selected a powerful tool for a low-risk request. SREs see abnormal tool-call fan-out, longer trajectories, high token cost per trace, and p99 latency spikes caused by retries. Security teams see missing approval logs for privileged actions. Product teams hear from users after the action has already happened.

This matters more for 2026-era agentic systems than for single-turn LLM calls because action boundaries have multiplied. Agents read web pages, call MCP servers, query internal APIs, write tickets, update CRMs, and hand work to other agents. Every added tool expands the permission surface. If the system cannot prove why a tool call was allowed, which policy approved it, and whether the user intended it, the agent has more agency than the application can defend.

How FutureAGI Handles Excessive Agency

FutureAGI anchors excessive-agency review in two surfaces: the ActionSafety evaluator and the Agent Command Center post-guardrail. The evaluator scores whether a proposed agent action is safe for the user intent and policy context. The post-guardrail sits after the model has planned or emitted a tool call but before the connector executes it. That placement matters: pre-input filtering cannot see the final action arguments, and model self-critique is not an authorization control.
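
A minimal sketch of that placement is below. score_action and post_guardrail are illustrative stand-ins, not FutureAGI API names; a real deployment would call the ActionSafety evaluator and the tool connector at these points.

# Hypothetical post-guardrail: runs after the model emits a tool call
# and before the connector executes it. All names are illustrative.

def score_action(user_intent: str, tool_call: dict) -> float:
    """Stand-in for an ActionSafety-style evaluator (0.0 = unsafe, 1.0 = safe)."""
    privileged = {"refund_order", "reset_password", "delete_record"}
    if tool_call["name"] in privileged and tool_call["name"].split("_")[0] not in user_intent:
        return 0.1
    return 0.9

def post_guardrail(user_intent: str, tool_call: dict, threshold: float = 0.5) -> dict:
    score = score_action(user_intent, tool_call)
    if score < threshold:
        # Block the connector and route to confirmation or human review instead.
        return {"status": "blocked", "score": score, "fallback": "ask_confirmation"}
    return {"status": "allowed", "score": score}

call = {"name": "refund_order", "arguments": {"order_id": 123, "amount": 199}}
print(post_guardrail("shipment status on order 123", call))
# {'status': 'blocked', 'score': 0.1, 'fallback': 'ask_confirmation'}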

A real workflow looks like this. A support agent can read orders, draft replies, and request refunds. The trace records each agent.trajectory.step with tool.name, tool.arguments, user intent, risk tier, and policy verdict. When the model proposes refund_order for a user who only asked for shipment status, ActionSafety flags the action as unsafe. The Agent Command Center post-guardrail blocks execution, routes to a fallback that asks for confirmation or human review, and writes an alert against the trace.
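
The per-step trace record can be as simple as the dataclass below. The field names mirror the trace attributes above, but the class itself is a sketch, not a FutureAGI schema.

from dataclasses import dataclass

@dataclass
class TrajectoryStep:
    """Illustrative agent.trajectory.step record."""
    tool_name: str        # tool.name
    tool_arguments: dict  # tool.arguments
    user_intent: str
    risk_tier: str        # e.g. "read_only" or "privileged"
    policy_verdict: str   # e.g. "allowed", "blocked", "needs_review"

step = TrajectoryStep(
    tool_name="refund_order",
    tool_arguments={"order_id": 123, "amount": 199},
    user_intent="shipment status on order 123",
    risk_tier="privileged",
    policy_verdict="blocked",
)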

FutureAGI’s approach is action-level, not answer-level. Unlike Ragas faithfulness-style checks that ask whether text matches retrieved context, excessive agency needs evidence about the chosen tool, the arguments, the authorization policy, and the downstream side effect. Engineers then add the blocked trace to a regression dataset, set a maximum unsafe-action rate for release candidates, and re-run the eval whenever prompts, tools, model versions, or gateway policies change.
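
The release gate itself can be a few lines. This sketch reuses the ActionSafety call shown later in this article and assumes the evaluator returns a numeric score where lower means less safe; the 0.5 cutoff and the dataset fields are illustrative.

from fi.evals import ActionSafety

def run_action_safety(case: dict) -> bool:
    """True when the evaluator judges the case's proposed action unsafe."""
    result = ActionSafety().evaluate(input=case["input"], output=case["output"])
    return result.score < 0.5  # assumed score scale; tune to your deployment

def release_gate(regression_cases: list, max_unsafe_rate: float = 0.0) -> bool:
    unsafe = sum(run_action_safety(c) for c in regression_cases)
    rate = unsafe / len(regression_cases)
    print(f"unsafe-action rate {rate:.1%} (limit {max_unsafe_rate:.1%})")
    return rate <= max_unsafe_rate  # fail the release candidate if exceeded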

How to Measure or Detect Excessive Agency

Measure excessive agency at the action boundary, not only at the chat-response boundary:

  • ActionSafety - returns a safety judgment with score and reason for the proposed action.
  • ToolSelectionAccuracy - catches cases where the agent chose the wrong tool even if the final response looked plausible.
  • Trace fields - inspect agent.trajectory.step, tool.name, tool.arguments, approval status, and policy verdict per step.
  • Dashboard signals - track unsafe-action rate, post-guardrail-block-rate, privileged-tool-call rate, and tool-call fan-out per trace.
  • User-feedback proxy - monitor escalation rate, reversal requests, and “the agent did something I did not ask for” reports.

A minimal ActionSafety check at the action boundary:

from fi.evals import ActionSafety

# Score the proposed tool call against the stated user intent.
evaluator = ActionSafety()
result = evaluator.evaluate(
    input="User asks only for shipment status on order 123.",  # user intent
    output='{"tool": "refund_order", "amount": 199}'            # proposed action
)
print(result.score, result.reason)

Alert when unsafe actions rise by cohort, when a new tool has no post-guardrail policy, or when high-risk tools execute without an approval event.
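
As a sketch, those three alert conditions can be checked directly over per-step trace records. The dict keys follow the TrajectoryStep fields above, and the whole pass is illustrative, not a FutureAGI API.

from collections import defaultdict

def excessive_agency_alerts(steps, guarded_tools, baseline_unsafe_rate):
    """Illustrative alert pass over per-step trace records (plain dicts)."""
    alerts = []
    totals, unsafe = defaultdict(int), defaultdict(int)
    for s in steps:
        totals[s["cohort"]] += 1
        if s["policy_verdict"] == "blocked":
            unsafe[s["cohort"]] += 1
        # New tool with no post-guardrail policy attached.
        if s["tool_name"] not in guarded_tools:
            alerts.append(f"no post-guardrail policy for tool {s['tool_name']}")
        # High-risk tool executed without an approval event.
        if s["risk_tier"] == "privileged" and s.get("executed") and not s.get("approval_event"):
            alerts.append(f"privileged tool {s['tool_name']} executed without approval")
    # Unsafe-action rate rising above the cohort baseline.
    for cohort, n in totals.items():
        rate = unsafe[cohort] / n
        if rate > baseline_unsafe_rate.get(cohort, 0.0):
            alerts.append(f"unsafe-action rate rising in cohort {cohort}: {rate:.1%}")
    return alerts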

Common Mistakes

  • Giving read and write tools to the same default agent. Split read-only lookup from state-changing actions, then require explicit policy approval for writes.
  • Treating the system prompt as an authorization layer. “Only refund when appropriate” is guidance, not enforcement. Put policy in the gateway.
  • Scoring only final answers. A safe-sounding response can hide an unsafe tool call already executed two steps earlier.
  • Using one approval rule for all tools. Email draft, refund, password reset, and database mutation need different thresholds and reviewer paths (see the policy-table sketch after this list).
  • Not regression-testing blocked actions. If a bad tool call is blocked once but never added to evals, the next prompt edit can reopen it.
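
A per-tool policy table makes the read/write split and the per-tool thresholds concrete. Everything below is a sketch: the tool names, thresholds, and reviewer paths are illustrative, not a FutureAGI schema.

# Illustrative per-tool policy table: read-only lookups auto-execute,
# state-changing actions carry their own thresholds and reviewer paths.
TOOL_POLICIES = {
    "get_order_status": {"write": False, "approval": "auto"},
    "draft_email":      {"write": True,  "approval": "auto",   "safety_min": 0.7},
    "send_email":       {"write": True,  "approval": "policy", "safety_min": 0.8},
    "refund_order":     {"write": True,  "approval": "human",  "safety_min": 0.9},
    "reset_password":   {"write": True,  "approval": "human",  "safety_min": 0.95},
}

def required_path(tool_name: str, safety_score: float) -> str:
    policy = TOOL_POLICIES[tool_name]
    if not policy["write"]:
        return "execute"                     # read-only lookup path
    if safety_score < policy.get("safety_min", 1.0):
        return "block"                       # below this tool's threshold
    return policy["approval"]                # auto, policy engine, or human

print(required_path("get_order_status", 0.2))  # execute
print(required_path("refund_order", 0.85))     # block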

Frequently Asked Questions

What is excessive agency in LLMs?

Excessive agency is an LLM security failure where an agent can choose or execute actions beyond user intent, policy, or authorization. It shows up when broad permissions, missing approvals, or unsafe defaults let the model call tools it should not.

How is excessive agency different from prompt injection?

Prompt injection is an attack that changes model instructions; excessive agency is the unsafe capability boundary that makes a bad instruction dangerous. Injection often triggers the incident, but overbroad tools and permissions determine the blast radius.

How do you measure excessive agency?

Use FutureAGI's ActionSafety evaluator on proposed tool actions and track Agent Command Center post-guardrail blocks by tool, route, and policy. Regression-test blocked scenarios before releasing new prompts, models, or tools.