What Is Recursive Chunking?

Recursive chunking splits documents by trying ordered separators until each RAG chunk fits size limits while keeping useful boundaries.

Recursive chunking is a retrieval-augmented generation (RAG) chunking technique that splits documents by trying separators in priority order (typically headings, then paragraphs, sentences, and finally words) until every passage fits a target size. It shows up in RAG ingestion, eval pipelines, and production traces because those boundaries control which evidence can be retrieved. FutureAGI evaluates recursive chunking with ChunkAttribution, ContextRelevance, and Groundedness so teams can see whether retrieved chunks actually support generated answers.
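The splitting loop itself is small. The sketch below is illustrative only: production splitters (such as LangChain's RecursiveCharacterTextSplitter) also merge small neighbors, preserve separators, and add overlap, none of which is shown here.

```python
def recursive_split(text, separators=("\n\n", ". ", " "), max_len=200):
    """Split text by trying separators in priority order; recurse with the
    next separator only when a piece is still too large."""
    if len(text) <= max_len:
        return [text] if text.strip() else []
    if not separators:
        # No boundary left: hard-cut into fixed windows as a last resort.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    sep, rest = separators[0], separators[1:]
    chunks = []
    for part in text.split(sep):
        if len(part) <= max_len:
            if part.strip():
                chunks.append(part)
        else:
            # Part is still too big: recurse with the next separator.
            chunks.extend(recursive_split(part, rest, max_len))
    return chunks
```

The priority order is the whole point: paragraph boundaries are tried before sentence boundaries, so a chunk only gets cut mid-sentence when nothing larger fits.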

Why It Matters in Production LLM and Agent Systems

Recursive chunking prevents one common RAG failure: evidence gets split at the wrong boundary, then the retriever returns a fragment that looks relevant but omits the condition that makes the answer true. A refund policy chunk might contain “30 days” while the next chunk contains “only for annual plans.” The model answers confidently, the citation looks plausible, and the end-user receives a false policy statement.

Developers feel this as unstable retrieval quality. A prompt passes when a paragraph boundary stays intact, then fails after a Markdown export, PDF parser, or tokenizer change shifts the same text into different chunks. SREs see higher token-cost-per-trace when teams compensate by raising top-k or stuffing parent sections into context. Product teams see thumbs-down feedback on cited answers. Compliance teams care because bad boundaries can turn a correct source document into a misleading generated answer.

The log symptoms are specific: high retrieval rank for irrelevant fragments, low attribution on answers with citations, rising context window pressure, and eval-fail-rate spikes by document type. Recursive chunking matters even more in 2026-era agentic RAG because retrieval rarely happens once. An agent may rewrite a query, retrieve evidence, call a tool, critique the response, and retrieve again. If the first retrieval step loses the exception, later steps build on incomplete context.

How FutureAGI Handles Recursive Chunking

FutureAGI’s approach is to treat recursive chunking as a versioned retrieval policy that must earn promotion with evals, not as a preprocessing detail hidden behind embeddings. The anchor surface is eval:ChunkAttribution, implemented as fi.evals.ChunkAttribution. It checks whether the final answer can be attributed to the retrieved chunks. Teams usually pair it with ChunkUtilization, ContextRelevance, and Groundedness to separate three failures: bad retrieval, unused evidence, and unsupported generation.

Example: a support agent indexes onboarding docs, pricing pages, and API references. The team tries a recursive splitter with separator order ["## ", "\n\n", ". ", " "], a 700-token limit, and 120-token overlap. FutureAGI logs the retrieval spans through traceAI-langchain, including retrieved chunk text, source URI, rank, chunk token count, and chunk-policy version. The engineer runs the old and new indexes against the same golden dataset, then checks eval-fail-rate-by-cohort for billing, setup, and error-code questions.
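The span fields listed in that example can be pictured as a simple record. The field names below are illustrative assumptions, not the actual traceAI-langchain schema:

```python
from dataclasses import dataclass, asdict

@dataclass
class RetrievalSpan:
    """Illustrative retrieval-span record; field names are assumptions."""
    chunk_text: str
    source_uri: str
    rank: int
    chunk_tokens: int
    chunk_policy_version: str  # ties the span back to a specific chunking config

span = RetrievalSpan(
    chunk_text="Annual plans have a 30-day refund window.",
    source_uri="docs/billing/refunds.md",
    rank=1,
    chunk_tokens=11,
    chunk_policy_version="recursive-v2-700tok-120overlap",
)
```

Keeping the policy version on every span is what makes the old-index-vs-new-index comparison possible: each retrieved chunk can be attributed to the exact splitter configuration that produced it.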

Unlike Ragas faithfulness, which mainly asks whether the answer is supported by the supplied context, this workflow keeps the chunking decision visible. If ContextRelevance improves but ChunkAttribution drops on pricing questions, the retriever found topical chunks but not enough contiguous evidence. The engineer changes overlap for pricing pages, re-embeds that corpus version, and blocks the rollout until the regression eval clears the agreed threshold.

How to Measure or Detect Recursive Chunking

Measure recursive chunking at both the index version and trace level:

  • ChunkAttribution: returns whether answer claims can be tied to retrieved chunks.
  • ChunkUtilization: shows whether retrieved evidence was actually used by the generator.
  • ContextRelevance: scores whether chunks answer the query before generation starts.
  • Dashboard signals: top-k recall on golden queries, eval-fail-rate-by-cohort, context token count, token-cost-per-trace, and cited-answer thumbs-down rate.
  • Trace fields: store chunk-policy version, source URI, rank, chunk token count, overlap size, and separator family.
A minimal attribution check with the fi.evals surface named above looks like this:

```python
from fi.evals import ChunkAttribution

# Check whether the generated answer can be attributed to the retrieved chunk.
result = ChunkAttribution().evaluate(
    output="Annual plans have a 30-day refund window.",
    context=["Billing policy: annual plans have a 30-day refund window."],
)
print(result.score, result.reason)
```

Detection is strongest when each retrieval span preserves the chunk policy. If attribution falls after a parser change while relevance stays flat, inspect boundary changes before tuning prompts.
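That triage rule can be sketched as a guard over before/after eval aggregates. The thresholds and dict keys here are placeholders, not FutureAGI defaults:

```python
def boundary_regression(before, after, drop=0.05, flat=0.02):
    """Flag a likely chunk-boundary regression: attribution fell while
    context relevance stayed roughly flat, meaning the retriever still
    finds topical chunks but the evidence no longer lines up."""
    attribution_fell = before["chunk_attribution"] - after["chunk_attribution"] > drop
    relevance_flat = abs(before["context_relevance"] - after["context_relevance"]) <= flat
    return attribution_fell and relevance_flat

before = {"chunk_attribution": 0.91, "context_relevance": 0.84}
after = {"chunk_attribution": 0.78, "context_relevance": 0.85}
# boundary_regression(before, after) -> True: inspect boundaries, not prompts
```

The point of separating the two signals is that a flat relevance score rules out the retriever as the culprit, which directs the investigation toward the parser or splitter change.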

Common Mistakes

Recursive chunking is simple enough to adopt quickly, which makes its failure modes easy to miss during ingestion work.

  • Using one separator order for every corpus. Markdown docs, PDFs, tables, and code need different boundary rules.
  • Optimizing only chunk size. A 500-token target is meaningless if it splits definitions from exceptions.
  • Adding overlap without measuring cost. More overlap can improve attribution while raising token-cost-per-trace and duplicate retrieval.
  • Re-embedding without policy versioning. Boundary changes create a new retrieval artifact and need regression comparison.
  • Trusting citations alone. A citation can point to a related chunk while omitting the sentence that proves the answer.
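One cheap way to avoid the re-embedding-without-versioning mistake (a sketch, not a FutureAGI feature) is to derive the policy version directly from the chunking parameters:

```python
import hashlib
import json

def chunk_policy_version(separators, chunk_size, overlap):
    """Derive a stable version id from the chunking parameters so any
    boundary-affecting change produces a new retrieval-artifact id."""
    payload = json.dumps(
        {"separators": list(separators), "chunk_size": chunk_size, "overlap": overlap},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()[:12]

v1 = chunk_policy_version(["## ", "\n\n", ". ", " "], 700, 120)
v2 = chunk_policy_version(["## ", "\n\n", ". ", " "], 700, 160)
# v1 != v2: changing overlap alone yields a new policy version
```

Because the id is a pure function of the configuration, two indexes built with the same policy automatically share a version, and any drift forces a regression comparison.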

Frequently Asked Questions

What is recursive chunking?

Recursive chunking is a RAG splitting method that tries separators such as headings, paragraphs, sentences, and words until each chunk fits a target size while preserving meaning.

How is recursive chunking different from fixed-size chunking?

Fixed-size chunking cuts text by character or token count. Recursive chunking tries larger semantic boundaries first, then falls back to smaller separators only when the text is still too large.

How do you measure recursive chunking?

FutureAGI measures recursive chunking with ChunkAttribution, ChunkUtilization, ContextRelevance, and trace fields for retrieved chunk rank, source URI, and chunk-policy version.