How is ReAct different from chain-of-thought?

Chain-of-thought (CoT) is reasoning only: the model writes out its thoughts but does not act. ReAct extends it by interleaving action steps, so the agent can call tools, observe the results, and update its reasoning. CoT is a single reasoning pass with no external feedback; ReAct is a loop.

How do you measure a ReAct agent?

FutureAGI uses ReasoningQuality on each thought, ToolSelectionAccuracy on each action, and TaskCompletion on the trajectory; per-step spans expose every triplet for debugging.

What Is ReAct? Definition & FutureAGI Guide (2026)

What is the ReAct pattern?

ReAct is an agent control pattern that interleaves reasoning thoughts and acting steps — usually tool calls — in a loop, with each action's observation feeding into the next thought, until the model emits a final answer.

What Is the ReAct Pattern?

ReAct is the canonical agent control pattern that interleaves Reasoning and Acting in a loop. Introduced by Yao et al. in 2022, it formalised what most agents now do: at each step the model writes an explicit thought (the reasoning), emits an action (typically a tool call), observes the result, and continues. The pattern produces a chain of (thought, action, observation) triplets. ReAct underpins the default agent loops in LangChain, the OpenAI Agents SDK, LangGraph, and CrewAI, and similar loops in most other 2026 agent frameworks. In a FutureAGI trace, each triplet appears as three correlated spans.
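
The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not any framework's implementation: `scripted_model` is a hypothetical stand-in for an LLM call, the two tools and the order id are invented, and the `Step` dataclass is just one way to hold a (thought, action, observation) triplet.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Step:
    """One (thought, action, observation) triplet."""
    thought: str
    action: str
    action_input: str
    observation: str


# Hypothetical tools; a real agent would call APIs or databases here.
TOOLS: Dict[str, Callable[[str], str]] = {
    "check_policy": lambda order_id: f"order {order_id} is within the 30-day window",
    "issue_refund": lambda order_id: f"refund issued for order {order_id}",
}


def scripted_model(query: str, history: List[Step]) -> Tuple[str, Optional[str], str]:
    """Stand-in for an LLM call: returns (thought, action, action_input).

    An action of None means the model is emitting its final answer,
    carried in the third element.
    """
    if len(history) == 0:
        return ("User wants a refund, I should check the policy first.", "check_policy", "A17")
    if len(history) == 1:
        return ("The order is eligible, so I can issue the refund.", "issue_refund", "A17")
    return ("The refund went through; I can answer now.", None,
            "Your refund for order A17 has been issued.")


def react_loop(query: str, max_steps: int = 5) -> Tuple[str, List[Step]]:
    """Run thought -> action -> observation until a final answer or the step cap."""
    history: List[Step] = []
    for _ in range(max_steps):
        thought, action, action_input = scripted_model(query, history)
        if action is None:
            return action_input, history  # final answer
        observation = TOOLS[action](action_input)  # act, then observe
        history.append(Step(thought, action, action_input, observation))
    return "Stopped: step limit reached.", history


answer, steps = react_loop("Please refund order A17")
for i, step in enumerate(steps):
    print(i, step.thought, "->", step.action, "->", step.observation)
print(answer)
```

Each observation is appended to `history`, which the next model call receives; that feedback is what separates this loop from a single reasoning pass.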

Why It Matters in Production LLM and Agent Systems

ReAct is popular because it makes reasoning legible. The model says what it’s thinking before it acts, which means every wrong action has a recorded justification. That sounds minor, but it isn’t. The thought trace is the difference between debugging “the agent picked the wrong tool” and debugging “the agent picked the wrong tool because it misread the user’s intent as a returns query when it was an exchange query.” Without the trace, you cannot tell whether the fix is a tool-description tweak or a system-prompt edit; with it, you can.

Each role gets value differently. A backend engineer reads thoughts to localise where the agent’s understanding diverged from the user’s intent. A product reviewer audits thoughts to catch agents that say one thing and do another. An SRE uses thought-token cost as a proxy for agent complexity — verbose ReAct agents are expensive ones. A QA engineer turns canonical thoughts into regression evals: “the agent should always reason about eligibility before issuing a refund.”

The flip side: thoughts cost tokens. A ReAct agent generates 2–4x the tokens of a no-thought agent for the same output. In 2026, teams increasingly mix patterns — ReAct for high-stakes paths, plan-and-execute for predictable workflows, no-thought tool calls for low-stakes lookups. The goal is to put thought tokens where they earn their cost, and FutureAGI traces let you see where they do.

How FutureAGI Handles ReAct

FutureAGI’s approach is to evaluate thoughts and actions as separate first-class spans. The traceAI integrations — traceAI-langchain, traceAI-langgraph, traceAI-openai-agents, traceAI-crewai, traceAI-haystack — wrap the ReAct loop so each thought becomes an LLM span, each action becomes a tool span, and the observation is the tool span’s result. All three carry agent.trajectory.step and an iteration index, which makes per-triplet evaluation trivial.
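
To make the per-triplet slicing concrete, here is a rough sketch that groups spans by their `agent.trajectory.step` attribute. The span shape (plain dicts with `kind`, `name`, `output`, and `attributes` keys) is an assumption made for illustration and is not the actual traceAI export format; only the attribute name comes from the text above.

```python
from collections import defaultdict
from typing import Any, Dict, List

STEP_KEY = "agent.trajectory.step"

# Hypothetical exported spans (plain dicts, not a real traceAI payload).
spans: List[Dict[str, Any]] = [
    {"kind": "llm", "output": "I should check the policy first.", "attributes": {STEP_KEY: 0}},
    {"kind": "tool", "name": "check_policy", "output": "eligible", "attributes": {STEP_KEY: 0}},
    {"kind": "llm", "output": "Eligible, so issue the refund.", "attributes": {STEP_KEY: 1}},
    {"kind": "tool", "name": "issue_refund", "output": "refund issued", "attributes": {STEP_KEY: 1}},
]


def group_triplets(spans: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Pair each step's LLM span (thought) with its tool span (action + observation)."""
    by_step: Dict[int, Dict[str, Dict[str, Any]]] = defaultdict(dict)
    for span in spans:
        step = span["attributes"][STEP_KEY]
        by_step[step][span["kind"]] = span

    triplets = []
    for step in sorted(by_step):
        llm = by_step[step].get("llm")
        tool = by_step[step].get("tool")
        triplets.append({
            "step": step,
            "thought": llm["output"] if llm else None,
            "action": tool["name"] if tool else None,
            "observation": tool["output"] if tool else None,
        })
    return triplets


for triplet in group_triplets(spans):
    print(triplet)
```

Once spans are grouped this way, a thought-level evaluator and an action-level evaluator can each be pointed at the matching field of the same step.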

Three evaluators cover the ReAct surface. ReasoningQuality (and the framework-eval ReasoningQualityEval) scores the logical validity of each thought given the prior observations — does the reasoning actually justify the next action? ToolSelectionAccuracy scores whether the action that followed the thought was the correct tool. TaskCompletion grades the trajectory end-to-end. Used together they tell you whether a ReAct failure is “the thought was wrong” (a reasoning bug, often a model swap or prompt regression) or “the thought was right but the action contradicted it” (a tool-spec or registry mismatch).
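
The triage logic in that paragraph can be written down as a small decision function. This is a sketch under assumptions: the `triage` helper and the 0.8 threshold are hypothetical, and the scores are taken to be floats in [0, 1] produced by whatever reasoning and tool-selection evaluators you run.

```python
def triage(reasoning_score: float, tool_score: float, threshold: float = 0.8) -> str:
    """Classify a ReAct failure from a reasoning score and a tool-selection score.

    The threshold is an illustrative assumption, not a recommended value.
    """
    reasoning_ok = reasoning_score >= threshold
    tool_ok = tool_score >= threshold
    if reasoning_ok and tool_ok:
        return "healthy"
    if not reasoning_ok:
        # The thought itself was wrong: look at the model or the prompt first.
        return "reasoning bug: inspect the model or prompt"
    # The thought was fine but the action did not follow from it.
    return "action contradicts thought: inspect tool specs or registry"


print(triage(0.91, 0.88))
print(triage(0.91, 0.74))
print(triage(0.55, 0.90))
```

When both scores fall below the threshold, this sketch reports the reasoning bug first, on the assumption that action scores are hard to interpret until the thoughts are sound.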

Concretely: a refund agent built on the LangChain ReAct executor with traceAI-langchain runs against a Scenario simulation. After a model swap, ReasoningQuality stays at 0.91, but ToolSelectionAccuracy drops from 0.88 to 0.74. The trace shows the model still reasons correctly (“user wants refund, I should check the policy first”) but then calls issue_refund directly instead of check_policy. The fix is in the tool descriptions, not the prompt — the new model is more aggressive about action and less attentive to ordering hints. The team adds a regression eval that pins the policy-check call before any refund issuance.
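
A regression check of the kind described could look like the following sketch. It assumes the trajectory has already been reduced to an ordered list of tool names; the function name is hypothetical, and only `check_policy` and `issue_refund` come from the example above.

```python
from typing import List


def policy_checked_before_refund(tool_calls: List[str]) -> bool:
    """True if no issue_refund call happens before a check_policy call."""
    policy_seen = False
    for name in tool_calls:
        if name == "check_policy":
            policy_seen = True
        elif name == "issue_refund" and not policy_seen:
            return False
    return True


# Ordered tool names from two hypothetical runs.
good_run = ["check_policy", "issue_refund"]
bad_run = ["issue_refund"]

print(policy_checked_before_refund(good_run))
print(policy_checked_before_refund(bad_run))
```

A trajectory with no refund call passes trivially, which keeps the check usable on conversations that never reach a refund.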

How to Measure or Detect It

ReAct’s three-part structure gives you three independent signals — measure each:

ReasoningQuality (local-metric) and ReasoningQualityEval (framework-eval): score logical validity of each thought given prior observations.
ToolSelectionAccuracy: scores whether the action following each thought was the correct tool.
TaskCompletion: end-to-end success rate on the trajectory.
thought-action consistency (custom): % of triplets where the action follows logically from the thought.
thought-token cost (dashboard signal): total reasoning tokens per trace; spikes flag verbose loops.
agent.trajectory.step (OTel attribute): per-iteration tag; combine with span kind to slice thought vs. action spans.

Minimal Python:

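# ToolSelectionAccuracy is imported alongside ReasoningQuality for scoring
# actions; only ReasoningQuality is called in this snippet.
from fi.evals import ReasoningQuality, ToolSelectionAccuracy

# user_query and react_spans are placeholders: supply the user's request and
# the ReAct spans captured for that run.
reasoning = ReasoningQuality().evaluate(
    input=user_query,
    trajectory=react_spans,
)
# Print the score and the evaluator's explanation for it.
print(reasoning.score, reasoning.reason)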
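
The thought-token cost signal from the list above can be approximated offline as well. The sketch below is an assumption-laden illustration: spans are plain dicts with hypothetical `kind` and `tokens` fields, and "spike" is defined arbitrarily as more than twice the median cost across traces.

```python
from statistics import median
from typing import Dict, List


def thought_token_cost(spans: List[dict]) -> int:
    """Total tokens across thought (LLM) spans in one trace."""
    return sum(s.get("tokens", 0) for s in spans if s["kind"] == "llm")


def flag_spikes(cost_by_trace: Dict[str, int], factor: float = 2.0) -> List[str]:
    """Trace ids whose thought-token cost exceeds factor times the median."""
    baseline = median(cost_by_trace.values())
    return sorted(t for t, c in cost_by_trace.items() if c > factor * baseline)


# Hypothetical traces; token counts are made up for the example.
traces = {
    "trace-a": [{"kind": "llm", "tokens": 120}, {"kind": "tool"}, {"kind": "llm", "tokens": 80}],
    "trace-b": [{"kind": "llm", "tokens": 150}, {"kind": "tool"}],
    "trace-c": [{"kind": "llm", "tokens": 900}, {"kind": "tool"}, {"kind": "llm", "tokens": 400}],
}
costs = {trace_id: thought_token_cost(spans) for trace_id, spans in traces.items()}
print(costs)
print(flag_spikes(costs))
```

A median baseline is used here so that one runaway trace does not drag the reference point upward the way a mean would.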

Common Mistakes

Letting thoughts hide in the prompt. If thoughts aren’t captured as their own span, you lose the legibility that makes ReAct worth using; instrument them explicitly.
Evaluating actions only. A wrong action with a reasonable thought is a tool-spec bug; a right action with a broken thought is a model bug — distinguish them.
No max-iteration cap. ReAct loops without bounds are runaway-cost candidates; cap turn count.
Treating thought tokens as free. Verbose thoughts can double inference cost; track thought-token cost per trace.
Pinning to ReAct everywhere. Plan-and-execute or direct tool calls beat ReAct on predictable workflows — match the pattern to the task.
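
Two of the mistakes above, the missing iteration cap and untracked thought tokens, can be guarded against with one small budget object. This is a sketch, not a framework feature: `LoopBudget`, `BudgetExceeded`, and `run_with_budget` are hypothetical names, and the token counts are passed in directly instead of being read from a model response.

```python
from typing import Iterable, Tuple


class BudgetExceeded(RuntimeError):
    """Raised when the loop spends more steps or thought tokens than allowed."""


class LoopBudget:
    def __init__(self, max_steps: int, max_thought_tokens: int) -> None:
        self.max_steps = max_steps
        self.max_thought_tokens = max_thought_tokens
        self.steps = 0
        self.thought_tokens = 0

    def charge(self, thought_tokens: int) -> None:
        """Record one iteration and fail fast if either cap is crossed."""
        self.steps += 1
        self.thought_tokens += thought_tokens
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step cap {self.max_steps} exceeded")
        if self.thought_tokens > self.max_thought_tokens:
            raise BudgetExceeded(f"thought-token cap {self.max_thought_tokens} exceeded")


def run_with_budget(step_token_counts: Iterable[int], budget: LoopBudget) -> Tuple[int, str]:
    """Simulate a loop where each iteration spends some thought tokens."""
    completed = 0
    try:
        for tokens in step_token_counts:
            budget.charge(tokens)  # call this once per thought in a real loop
            completed += 1
    except BudgetExceeded as exc:
        return completed, str(exc)
    return completed, "finished"


print(run_with_budget([100] * 10, LoopBudget(max_steps=3, max_thought_tokens=10_000)))
print(run_with_budget([100, 100], LoopBudget(max_steps=5, max_thought_tokens=150)))
```

Raising an exception instead of silently returning makes a capped run visible in traces and alerts, which matters when the cap is hit because of a regression.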
