Failure Modes

What Is an Infinite Loop (Agent Failure)?

An agent failure mode where repeated reasoning, tool calls, or handoffs continue without meaningful progress toward the task goal.

An infinite loop is an agent failure mode where an AI agent repeats the same reasoning step, tool call, handoff, or user prompt without making meaningful progress. It appears in eval pipelines and production traces as duplicated agent.trajectory.step spans, rising token usage, and no terminal state. FutureAGI treats it as a trajectory-quality problem: detect repeated action patterns, score step efficiency, and halt or route the run before the user sees a stuck workflow.

Why It Matters in Production LLM and Agent Systems

An infinite loop turns a single bad planning decision into cost, latency, and trust damage. The common shape is simple: the agent asks a tool for missing data, receives an empty or ambiguous response, decides it still needs that data, and calls the same tool again. In a customer-support workflow, that can mean the agent repeatedly asks for an order ID even after the user already provided it. In a coding agent, it can mean searching the repository for the same symbol until the run times out.
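The failure shape above can be reproduced in a few lines. This is a hypothetical sketch, not real agent code: `lookup_order` and `plan_next_action` are illustrative names standing in for a tool and a planner. The point is that an empty tool result never updates state, so the planner re-issues the identical call.

```python
# Hypothetical reproduction of the loop shape: an empty tool result
# never updates the agent's state, so the planner repeats the call.
def lookup_order(order_id):
    # Simulates a tool that keeps returning nothing useful.
    return None

def plan_next_action(state):
    # The planner checks only whether the data is present, not whether
    # the same lookup already failed, so it decides to call again.
    if state.get("order") is None:
        return ("lookup_order", {"order_id": state["order_id"]})
    return ("respond", {})

state = {"order_id": "A-1001", "order": None}
calls = []
for _ in range(5):  # bounded here only so the demo terminates
    action, args = plan_next_action(state)
    calls.append((action, args["order_id"]))
    state["order"] = lookup_order(args["order_id"])

print(calls)  # five identical (tool, argument) pairs: a loop
```

Every iteration pays for another model step, which is why the cost and latency symptoms below follow directly from this pattern.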

The pain lands on several teams at once. Developers see traces with repeated tool spans but no final answer. SREs see p99 latency and worker utilization rise without a matching increase in completed tasks. Product teams see drop-offs, thumbs-down feedback, and “the bot is stuck” tickets. Finance sees token and tool-call cost move upward because every loop iteration pays for another model step.

This matters more in 2026-era agentic systems than in single-turn chat. Multi-step agents carry memory, tool state, retrieval context, handoffs, and gateway policies across a trajectory. A loop can cross those boundaries: one agent retries a failing tool, hands off to another agent with the same incomplete state, then receives the task back unchanged. Without loop detection, the system may look active while it is only repeating itself.

How FutureAGI Detects Infinite Loops

FutureAGI’s approach is to treat a loop as a trajectory defect, not as a vague “bad answer.” The two relevant evaluators are CustomerAgentLoopDetection and StepEfficiency. CustomerAgentLoopDetection evaluates customer-agent conversations for loop-like behavior, such as repeated requests, repeated clarifications, or failure to advance the interaction. StepEfficiency scores the efficiency of the agent’s trajectory, so a run that took ten steps where three would do can be flagged even when it eventually completes.

A practical workflow starts in traces. With a traceAI integration such as /traceai/langchain, each agent turn emits agent.trajectory.step, tool span names, timing, and token attributes such as llm.token_count.prompt. The engineer groups traces by session and compares adjacent steps: same tool, same arguments, same planner rationale, same missing state. Those candidates feed an eval job using CustomerAgentLoopDetection for customer-facing repetition and StepEfficiency for wasteful trajectories.
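The grouping-and-comparison step can be sketched in plain Python. This is not the FutureAGI SDK; the span field names (`session_id`, `tool`, `args`) are assumptions standing in for whatever your trace schema exposes.

```python
# Sketch of the candidate-finding step: group trace spans by session,
# then flag adjacent steps with the same tool and the same arguments.
from collections import defaultdict

def find_repeated_steps(spans):
    """spans: dicts with session_id, tool, args (assumed field names)."""
    by_session = defaultdict(list)
    for span in spans:
        by_session[span["session_id"]].append(span)
    repeats = []
    for session_id, steps in by_session.items():
        for prev, curr in zip(steps, steps[1:]):
            if prev["tool"] == curr["tool"] and prev["args"] == curr["args"]:
                repeats.append((session_id, curr["tool"]))
    return repeats

spans = [
    {"session_id": "s1", "tool": "search_repo", "args": {"symbol": "Foo"}},
    {"session_id": "s1", "tool": "search_repo", "args": {"symbol": "Foo"}},
    {"session_id": "s2", "tool": "get_order", "args": {"id": "A-1"}},
    {"session_id": "s2", "tool": "send_reply", "args": {"id": "A-1"}},
]
print(find_repeated_steps(spans))  # only s1 repeats the same call
```

The sessions this returns are exactly the candidates worth sending to the evaluators; exact-equality on arguments is a deliberate first pass, tightened later for paraphrased repeats.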

Unlike a plain OpenTelemetry span search, which only shows repeated events after the run, FutureAGI pairs trace repetition with evaluator outcomes. The next action is operational: set an alert on loop-rate by workflow, add a step budget for high-risk tools, and add a planner fallback that asks a human or returns a clear failure instead of continuing. For regression testing, keep a small dataset of historical loop traces and require new prompt, tool, or routing changes to improve or preserve both evaluator scores.
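A step budget for a high-risk tool can be enforced with a small wrapper. The sketch below is an assumption about how such a guard might look, not a FutureAGI API: the budget is exceeded, the run halts with an explicit failure the caller can route to a human instead of continuing to loop.

```python
# Hedged sketch of a per-tool step budget with an explicit fallback:
# once a tool exceeds its call budget, raise a clear failure rather
# than letting the planner repeat the call indefinitely.
from collections import Counter

class StepBudgetExceeded(Exception):
    pass

class BudgetedRunner:
    def __init__(self, tool_budgets):
        self.tool_budgets = tool_budgets  # e.g. {"lookup_order": 3}
        self.counts = Counter()

    def call(self, tool_name, fn, *args, **kwargs):
        self.counts[tool_name] += 1
        budget = self.tool_budgets.get(tool_name)
        if budget is not None and self.counts[tool_name] > budget:
            # Fallback path: surface the failure so the workflow can
            # escalate to a human or return a clear error.
            raise StepBudgetExceeded(
                f"{tool_name} exceeded its budget of {budget} calls"
            )
        return fn(*args, **kwargs)

runner = BudgetedRunner({"lookup_order": 3})
try:
    for _ in range(10):
        runner.call("lookup_order", lambda: None)
except StepBudgetExceeded as exc:
    print(exc)  # raised on the 4th call
```

The budget stops the bill; the evaluator scores still matter, because (as noted under Common Mistakes) a limit alone does not explain which planner, tool, or handoff caused the repetition.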

How to Measure or Detect It

Use multiple signals because loops often hide behind “still thinking” UI states:

  • CustomerAgentLoopDetection — evaluates whether a customer-agent conversation is repeating itself instead of moving toward resolution.
  • StepEfficiency — evaluates whether the trajectory used an efficient number of steps for the goal.
  • Trace field agent.trajectory.step — compare adjacent steps for repeated action names, tool names, arguments, and planner rationale.
  • Dashboard signal: loop-rate by workflow — percentage of traces with repeated steps above your threshold.
  • Cost signal: token-cost-per-trace — loops show as rising cost with flat task-completion rate.
  • User proxy: escalation-rate — repeated prompts often push users to human support.
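The dashboard signal in the list above reduces to a simple aggregation. This is an illustrative sketch; the `workflow` and `steps` field names are assumptions about how your traces are stored.

```python
# Sketch of the loop-rate signal: the share of traces per workflow
# that contain at least one adjacent repeated step.
def has_adjacent_repeat(steps):
    return any(a == b for a, b in zip(steps, steps[1:]))

def loop_rate_by_workflow(traces):
    totals, looped = {}, {}
    for trace in traces:
        wf = trace["workflow"]
        totals[wf] = totals.get(wf, 0) + 1
        if has_adjacent_repeat(trace["steps"]):
            looped[wf] = looped.get(wf, 0) + 1
    return {wf: looped.get(wf, 0) / totals[wf] for wf in totals}

traces = [
    {"workflow": "support", "steps": ["ask_id", "ask_id", "resolve"]},
    {"workflow": "support", "steps": ["ask_id", "resolve"]},
    {"workflow": "coding", "steps": ["search", "edit", "test"]},
]
print(loop_rate_by_workflow(traces))  # support loops in 1 of 2 traces
```

Alerting on this rate per workflow, rather than globally, keeps a noisy workflow from masking a regression in a quiet one.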

```python
# Eval-job shape (illustrative; confirm exact evaluator names and
# method signatures against your fi.evals SDK version).
from fi.evals import CustomerAgentLoopDetection, StepEfficiency

# Instantiate the two loop-focused evaluators.
loop_eval = CustomerAgentLoopDetection()
efficiency_eval = StepEfficiency()

# Score a stored agent trace for customer-facing repetition,
# then score its step trajectory for wasted steps.
loop_result = loop_eval.evaluate(trace=agent_trace)
efficiency_result = efficiency_eval.evaluate(trajectory=agent_trace.steps)
```
Treat the snippet as the eval-job shape: import the evaluators, run them against stored traces, and trend the results by prompt version, tool version, and workflow.

Common Mistakes

  • Counting steps without comparing state. A 12-step workflow may be valid; a four-step workflow can loop if every step repeats the same missing state.
  • Only setting a hard max-step limit. A limit stops the bill, but it does not explain which planner, tool, or handoff caused the repetition.
  • Retrying tools as a loop fix. Blind retries can deepen the loop when the tool returns the same empty result or stale context.
  • Ignoring near-duplicate language. Agents often paraphrase the same clarification, so exact-string matching misses customer-facing loops.
  • Mixing loop failures with generic latency. Long latency can come from slow tools; loop latency comes from repeated decisions with no progress.
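The near-duplicate pitfall from the list above is worth a concrete check. A minimal sketch using the standard library's difflib: the 0.8 threshold is an assumption to tune per workflow, and real pipelines often use embedding similarity instead.

```python
# Exact-string matching misses paraphrased repeats, so compare
# normalized similarity instead. Threshold is illustrative.
from difflib import SequenceMatcher

def is_near_duplicate(a, b, threshold=0.8):
    a, b = a.lower().strip(), b.lower().strip()
    return SequenceMatcher(None, a, b).ratio() >= threshold

prev = "Could you share your order ID so I can look this up?"
curr = "Can you share your order ID so I can look it up?"
print(prev == curr)                # exact match misses the repeat
print(is_near_duplicate(prev, curr))  # similarity check catches it
```

Running this fuzzy comparison over adjacent agent messages catches the customer-facing loops that step-count and exact-match checks overlook.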

Frequently Asked Questions

What is an infinite loop in an AI agent?

An infinite loop is an agent failure mode where the agent repeats the same reasoning step, tool call, handoff, or prompt without progress. It shows up as repeated trace steps, rising token cost, and a workflow that never reaches a terminal state.

How is an infinite loop different from a long-running agent task?

A long-running task keeps adding new state, evidence, or completed subtasks. An infinite loop repeats the same action pattern without measurable progress, often because the planner cannot recognize that its previous step failed.

How do you measure an infinite loop?

Use FutureAGI's CustomerAgentLoopDetection and StepEfficiency evaluators with trace fields such as agent.trajectory.step and repeated tool spans. Track loop rate, step count, token cost per trace, and terminal-state failures.