What Is a Chunk?
A retrieved passage-sized unit from a source document that a RAG pipeline embeds, indexes, retrieves, and passes to an LLM as context.
A chunk in RAG is a passage-sized unit cut from a source document so it can be embedded, indexed, retrieved, and passed into an LLM prompt as context. It is a retrieval-augmented generation evidence unit, not the whole document. Chunks show up in the retriever output, production trace, and eval pipeline. FutureAGI evaluates whether retrieved chunks were relevant, cited by the answer, and actually used before teams tune chunk size or model prompts.
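As a rough illustration of that cutting step (the word-window splitter and the 600/100 size and overlap values below are illustrative placeholders, not FutureAGI defaults), a chunker turns one document into many retrieval units that each carry their own ID and source metadata:

from dataclasses import dataclass

@dataclass
class Chunk:
    chunk_id: str
    document_id: str
    text: str

def chunk_document(document_id: str, text: str, size: int = 600, overlap: int = 100) -> list[Chunk]:
    # Illustrative splitter: slides a word window across the document.
    # A production chunker would count tokens and respect layout boundaries.
    words = text.split()
    chunks: list[Chunk] = []
    start = 0
    while start < len(words):
        window = words[start:start + size]
        chunks.append(Chunk(
            chunk_id=f"{document_id}-{len(chunks)}",
            document_id=document_id,
            text=" ".join(window),
        ))
        start += size - overlap  # each step shares `overlap` words with the previous chunk
    return chunks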
Why It Matters in Production LLM and Agent Systems
Bad chunks create silent RAG failures. The retriever may return a passage that looks close by embedding distance but lacks the deciding fact, cuts a table row in half, or repeats boilerplate from every policy page. The LLM then fills the gap with parametric memory. The end user sees a confident answer; the trace shows “retrieval succeeded”; the failure only appears when you inspect the chunk text and the answer together.
The pain is split across teams. Retrieval engineers see low top-k recall even after changing the embedding model. ML engineers see Groundedness or Faithfulness failures that are really chunk-boundary failures. SREs see token cost rise after overlap is increased across the corpus. Product and compliance teams see stale or partial citations in regulated workflows, especially when a chunk contains a policy heading but not the exception that follows it.
Agentic RAG makes chunk quality more important because one weak chunk can contaminate several later steps. A support agent may retrieve the wrong refund chunk, call a CRM tool with the wrong eligibility window, then write a final response that appears well-cited. In 2026 multi-step pipelines, the chunk is the evidence object that travels through retrieval, reranking, tool use, answer generation, and evaluation. If that object is noisy, every downstream decision inherits the noise.
How FutureAGI Handles Chunks
FutureAGI’s approach is to treat a chunk as a measurable evidence unit, not just an indexing artifact. In a FutureAGI RAG workflow, retrieved chunks are logged on the retrieval span, passed to the answer span, and scored with fi.evals.ChunkAttribution. That evaluator is the anchor for this term: it checks whether the generated answer referenced the retrieved chunks at all, rather than answering from model memory. Teams pair it with ChunkUtilization to see how much of the chunk was used and ContextRelevance to verify that the chunk was worth retrieving.
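A minimal sketch of that scoring step is below. It mirrors the evaluate(output=..., context=[...]) call shown for ChunkAttribution later in this section and assumes, purely for illustration, that ChunkUtilization accepts the same arguments and that ContextRelevance also needs the user query via a hypothetical input= parameter; check the SDK reference for the exact signatures.

from fi.evals import ChunkAttribution, ChunkUtilization, ContextRelevance

query = "Can I get a refund on an unused plan?"
answer = "Refunds are allowed within 30 days for unused plans."
chunks = ["Refund policy: unused plans may be refunded within 30 days."]

# Did the answer reference the retrieved chunks at all, or answer from model memory?
attribution = ChunkAttribution().evaluate(output=answer, context=chunks)

# How much of the chunk content shows up in the answer? (assumed same signature)
utilization = ChunkUtilization().evaluate(output=answer, context=chunks)

# Was the chunk worth retrieving for this query? (`input=` is a hypothetical parameter name)
relevance = ContextRelevance().evaluate(input=query, context=chunks)

print(attribution.score, utilization.score, relevance.score)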
Example: a product-doc assistant uses traceAI-llamaindex to capture retrieval.documents, document IDs, chunk IDs, and the final answer. A regression dataset includes questions whose gold evidence lives in tables and warning callouts. After a deploy, the dashboard shows ChunkAttribution pass-rate dropping from 95% to 81% for table-heavy questions while overall latency and top-k counts remain stable. The engineer opens failed traces, sees chunks ending before the relevant table value, and moves that corpus from fixed 600-token chunks to layout-aware recursive chunks.
Unlike Ragas faithfulness, which mainly asks whether the answer is supported by the retrieved context, chunk evaluation starts one step earlier: did the answer use the retrieved chunk, and was that chunk relevant enough to be useful? The next action is operational. Set an alert on attribution pass-rate by corpus, block release when a chunking strategy reduces ContextRelevance, and run regression evals before re-embedding a production knowledge base.
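One way to wire that release gate, sketched with hypothetical record shapes and thresholds rather than FutureAGI features: aggregate ChunkAttribution results per corpus and block the release when any corpus falls below its pass-rate floor.

from collections import defaultdict

def attribution_pass_rate(results: list[dict]) -> dict[str, float]:
    # results: one dict per evaluated trace, e.g. {"corpus": "refund-policies", "passed": True}
    totals, passes = defaultdict(int), defaultdict(int)
    for r in results:
        totals[r["corpus"]] += 1
        passes[r["corpus"]] += int(r["passed"])
    return {corpus: passes[corpus] / totals[corpus] for corpus in totals}

def gate_release(results: list[dict], threshold: float = 0.90) -> bool:
    # Block the release if any corpus drops below the attribution pass-rate threshold
    rates = attribution_pass_rate(results)
    failing = {corpus: rate for corpus, rate in rates.items() if rate < threshold}
    if failing:
        print(f"Blocking release; corpora below {threshold:.0%}: {failing}")
        return False
    return True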
How to Measure or Detect Chunks
Measure chunks at retrieval time, answer time, and dataset-regression time:
- fi.evals.ChunkAttribution: Pass/Fail signal for whether the answer referenced retrieved chunks at all.
- fi.evals.ChunkUtilization: 0-1 score for how much useful chunk content appears in the final answer.
- fi.evals.ContextRelevance: score for whether the retrieved chunk can answer the user query.
- Trace fields: store retrieval.documents, chunk ID, source document ID, rank, token count, and reranker score.
- Dashboard signals: chunk-attribution fail-rate by corpus, p99 chunk token count, top-k gold recall, and user escalation rate after cited answers.
from fi.evals import ChunkAttribution

# Did the answer reference the retrieved chunk, or was it answered from model memory?
result = ChunkAttribution().evaluate(
    output="Refunds are allowed within 30 days for unused plans.",
    context=["Refund policy: unused plans may be refunded within 30 days."],
)
print(result.score, result.reason)
A healthy chunking setup has high attribution pass-rate, high context relevance, bounded p99 chunk length, and stable recall on a golden dataset after every re-chunking or re-embedding job.
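Two of those signals are easy to recompute offline from logged trace fields. The record shape below is hypothetical; the point is that p99 chunk length and golden-set recall can be re-checked after every re-chunking or re-embedding job.

import math

def p99_chunk_tokens(chunk_token_counts: list[int]) -> int:
    # p99 of chunk token counts logged on retrieval spans
    ordered = sorted(chunk_token_counts)
    idx = min(len(ordered) - 1, math.ceil(0.99 * len(ordered)) - 1)
    return ordered[idx]

def topk_gold_recall(golden_traces: list[dict]) -> float:
    # Fraction of golden questions whose gold chunk ID appears in the retrieved top-k
    hits = sum(1 for t in golden_traces if t["gold_chunk_id"] in t["retrieved_chunk_ids"])
    return hits / len(golden_traces)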
Common Mistakes
- Treating chunk size as the strategy. Token count is only one parameter; structure, overlap, metadata, and parent links matter more on messy documents.
- Increasing overlap without checking cost. Overlap can raise recall, but it also duplicates embeddings and inflates prompt tokens per trace; a quick cost sketch follows this list.
- Evaluating retrieval without answer behavior. A chunk can rank first and still be ignored by the model; measure ChunkAttribution.
- Mixing stale and fresh chunks in one index. Version chunk IDs with source timestamps so old policy text cannot outrank current text.
- Chunking tables like prose. Tables, code, and policy exceptions need structure-aware boundaries or the key value often lands in the next chunk.
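To make the overlap-cost point above concrete (assumed numbers, not measurements): with 600-token chunks and 100 tokens of overlap, each new chunk advances the document by only 500 fresh tokens, so roughly 1.2x the raw corpus gets embedded and indexed.

def overlap_inflation(chunk_tokens: int = 600, overlap_tokens: int = 100) -> float:
    # Approximate multiplier on embedded/indexed tokens caused by chunk overlap
    new_tokens_per_chunk = chunk_tokens - overlap_tokens
    return chunk_tokens / new_tokens_per_chunk

print(overlap_inflation())          # ~1.2x at 600-token chunks with 100-token overlap
print(overlap_inflation(600, 200))  # ~1.5x when overlap doubles to 200 tokens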
Frequently Asked Questions
What is a chunk in RAG?
A chunk in RAG is a passage-sized unit cut from a source document, embedded, indexed, retrieved, and supplied to an LLM as context for an answer.
How is a chunk different from a document?
A document is the full source artifact, while a chunk is the retrieval unit derived from it. A single document may produce dozens or thousands of chunks.
How do you measure chunk quality?
FutureAGI uses fi.evals.ChunkAttribution to check whether an answer referenced retrieved chunks, with ChunkUtilization and ContextRelevance as companion signals.