What Is a Parent-Document Retriever?

A RAG retriever that indexes smaller child chunks but returns their larger source document sections as generation context.

A parent-document retriever is a RAG retrieval pattern that indexes small child chunks for precise vector search, then returns the larger parent document or section that contains the matching chunk. It shows up between the retriever and generator in production traces, where chunk IDs map back to parent IDs. FutureAGI evaluates this pattern with ChunkAttribution, ContextRelevance, and ChunkUtilization so teams can see whether parent context helped the answer or merely added distracting text.
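
The pattern can be sketched in a few lines. This is a minimal illustration, not any library's implementation: the corpus, the `split_children` and `retrieve` helpers, and the bag-of-words `embed` function (a stand-in for a real embedding model) are all assumptions made for the example.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy corpus: parent sections keyed by parent ID.
PARENTS = {
    "billing#refunds": "Trial refunds. Unused annual plans may be refunded within 30 days. "
                       "Monthly plans are not refundable.",
    "billing#invoices": "Invoices. Invoices are emailed on the first day of each billing cycle. "
                        "Tax IDs appear on every invoice.",
}

def split_children(parent_id, text):
    """Split a parent into sentence-sized child chunks that remember their parent."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return [
        {"child_chunk_id": f"{parent_id}/{i}", "parent_document_id": parent_id, "text": s}
        for i, s in enumerate(sentences)
    ]

def embed(text):
    """Stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Only the children are indexed for search.
CHILD_INDEX = [c for pid, text in PARENTS.items() for c in split_children(pid, text)]

def retrieve(query):
    """Search the children, then return the best child plus the parent that contains it."""
    q = embed(query)
    best = max(CHILD_INDEX, key=lambda c: cosine(q, embed(c["text"])))
    return {**best, "parent_text": PARENTS[best["parent_document_id"]]}

hit = retrieve("can annual plans be refunded")
print(hit["child_chunk_id"], "->", hit["parent_document_id"])
```

The returned record carries both IDs, so a trace can later show which small chunk pulled in which parent.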

Why It Matters in Production LLM and Agent Systems

Parent-document retrieval solves a real bug and creates a new one. Pure small-chunk retrieval often finds the exact term but returns too little surrounding evidence: the answer misses a caveat, date range, exception, or table header. Returning the parent section can fix that. The failure mode flips when the parent is too broad: the prompt now contains the matching clause plus unrelated policy text, and the model grounds itself on the wrong paragraph.

Developers feel it as retrieval tests that pass while generated answers still fail. SREs see token cost and latency rise because every matched child pulls a 2,000-token parent. Compliance teams see citations that point to the right document but not the specific clause. Product teams hear “the answer cited our handbook, but the answer is still wrong.”

The symptoms are visible if the trace keeps both child and parent IDs: high vector hit rate, flat reranker score, rising prompt tokens, lower ContextRelevance, and increased groundedness failures on long parent passages. Agentic systems amplify the risk. A research agent may retrieve a broad parent section, summarize the wrong subsection, call a tool with that result, then cite the original document as if the chain were grounded. In 2026 multi-step RAG pipelines, parent retrieval needs attribution checks at every handoff, not only final answer review.

How FutureAGI Handles Parent-Document Retrievers

FutureAGI’s approach is to score the child hit and parent context separately on /platform/evaluate. In a FutureAGI eval run, traceAI-langchain or traceAI-llamaindex captures retriever spans with app fields such as retrieval.documents, retrieval.child_chunk_id, retrieval.parent_document_id, and parent token count. The eval dataset stores the user query, returned child chunk, returned parent passage, generated answer, and source metadata. ChunkAttribution checks whether the answer referenced retrieved evidence. ContextRelevance checks whether the returned parent context was useful for the query, and ChunkUtilization helps flag parents that were mostly unused.
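
One way to keep the child hit and the parent context separable is to store them side by side in each eval row. The sketch below is an assumption about how such a row could look, not the FutureAGI dataset schema: the `ParentRetrievalRow` class and the `retrieval.parent_token_count` key are hypothetical, and whitespace splitting is only a rough proxy for a tokenizer.

```python
from dataclasses import dataclass, field

@dataclass
class ParentRetrievalRow:
    """One eval row that keeps the child hit and the parent context side by side."""
    query: str
    child_chunk_id: str
    child_text: str
    parent_document_id: str
    parent_text: str
    answer: str
    source_metadata: dict = field(default_factory=dict)

    def span_attributes(self):
        """Flatten the lineage into retriever-span attributes."""
        return {
            "retrieval.child_chunk_id": self.child_chunk_id,
            "retrieval.parent_document_id": self.parent_document_id,
            # Assumed key name; whitespace split is a rough token proxy.
            "retrieval.parent_token_count": len(self.parent_text.split()),
        }

    def eval_inputs(self):
        """Build separate inputs so child and parent context can be scored apart."""
        return {
            "child": {"context": [self.child_text], "output": self.answer},
            "parent": {"context": [self.parent_text], "output": self.answer},
        }

row = ParentRetrievalRow(
    query="Can I get a refund on an annual plan?",
    child_chunk_id="billing#refunds/1",
    child_text="Unused annual plans may be refunded.",
    parent_document_id="billing#refunds",
    parent_text="Trial refunds. Unused annual plans may be refunded. "
                "Monthly plans are not refundable.",
    answer="Unused annual plans may be refunded.",
    source_metadata={"source_version": "2026-01"},
)
print(row.span_attributes())
```

Each entry in `eval_inputs()` can then be passed to whichever evaluator scores context against the answer.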

Example: a support knowledge base indexes 200-token child chunks but returns the full policy subsection. After a release, FutureAGI shows ChunkAttribution pass-rate steady at 94%, while ContextRelevance on parent passages drops from 0.86 to 0.62 for billing questions and prompt tokens rise 41%. The engineer opens failed traces, sees a child match under “trial refunds” returning the entire billing chapter, and changes the parent boundary to the nearest H3 section. They then rerun the regression eval and set an alert when parent-token p95 or relevance fail-rate crosses threshold.
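
The alert at the end of that workflow could be sketched as follows. The thresholds, field names (`parent_tokens`, `context_relevance`), and sample rows are illustrative assumptions, not values from the example above, and the percentile uses the simple nearest-rank method.

```python
def p95(values):
    """Nearest-rank 95th percentile using integer arithmetic."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]

def check_alerts(rows, max_parent_tokens_p95=1500, min_relevance=0.7, max_fail_rate=0.2):
    """Return the names of the alerts that fire for a batch of eval rows."""
    tokens = [r["parent_tokens"] for r in rows]
    fails = sum(1 for r in rows if r["context_relevance"] < min_relevance)
    fail_rate = fails / len(rows)
    alerts = []
    if p95(tokens) > max_parent_tokens_p95:
        alerts.append("parent_token_p95")
    if fail_rate > max_fail_rate:
        alerts.append("relevance_fail_rate")
    return alerts

# Hypothetical batch: most parents are small, a few are chapter-sized.
token_counts = [300, 320, 350, 400, 410, 450, 500, 2100, 2400, 2600]
relevance = [0.90, 0.88, 0.85, 0.90, 0.80, 0.86, 0.82, 0.62, 0.55, 0.50]
rows = [
    {"parent_tokens": t, "context_relevance": s}
    for t, s in zip(token_counts, relevance)
]
print(check_alerts(rows))
```

Both checks run on the same rows, so a boundary change that fixes relevance but not token cost still surfaces.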

Unlike Ragas faithfulness, which mainly asks whether the final answer is supported by the supplied context, this workflow preserves the child-to-parent link. The team can tell whether the small chunk found the right evidence, whether the parent supplied helpful surrounding context, and whether the model used either one.

How to Measure or Detect It

Measure parent-document retrieval as a mapping problem, not just a top-k ranking:

  • Child hit quality: ContextRelevance on the child chunk shows whether vector search found evidence that can answer the query.
  • Parent usefulness: ContextRelevance on the returned parent passage catches parents that add unrelated sections around a good child hit.
  • Attribution: ChunkAttribution returns a pass/fail signal for whether the final answer referenced retrieved evidence.
  • Context use: ChunkUtilization helps quantify whether useful parent content appeared in the answer or sat unused in the prompt.
  • Trace signals: store retrieval.child_chunk_id, retrieval.parent_document_id, parent token count, rank, reranker score, and source version.
  • Dashboard signals: parent-to-child token ratio, prompt-token p95, attribution fail-rate by corpus, and thumbs-down rate on cited answers.
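
```python
# Assumes the FutureAGI evals SDK is installed and importable as fi.evals.
from fi.evals import ChunkAttribution

# Check whether the generated answer draws on the retrieved context.
result = ChunkAttribution().evaluate(
    output="Trial refunds are available for unused annual plans.",
    context=["Trial refund policy: unused annual plans may be refunded."]
)
print(result.score, result.reason)
```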
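
The dashboard signals in the list above can be derived from stored eval rows. A minimal sketch, assuming hypothetical row fields (`corpus`, `child_tokens`, `parent_tokens`, `attribution_pass`) and made-up sample data:

```python
from collections import defaultdict

# Hypothetical eval rows; the field names are assumptions for this sketch.
rows = [
    {"corpus": "billing", "child_tokens": 200, "parent_tokens": 1800, "attribution_pass": True},
    {"corpus": "billing", "child_tokens": 200, "parent_tokens": 2200, "attribution_pass": False},
    {"corpus": "api_docs", "child_tokens": 200, "parent_tokens": 600, "attribution_pass": True},
    {"corpus": "api_docs", "child_tokens": 200, "parent_tokens": 400, "attribution_pass": True},
]

def dashboard_signals(rows):
    """Aggregate parent-to-child token ratio and attribution fail-rate per corpus."""
    by_corpus = defaultdict(list)
    for r in rows:
        by_corpus[r["corpus"]].append(r)
    signals = {}
    for corpus, group in by_corpus.items():
        ratio = sum(r["parent_tokens"] for r in group) / sum(r["child_tokens"] for r in group)
        fail_rate = sum(1 for r in group if not r["attribution_pass"]) / len(group)
        signals[corpus] = {
            "parent_to_child_token_ratio": round(ratio, 2),
            "attribution_fail_rate": round(fail_rate, 2),
        }
    return signals

signals = dashboard_signals(rows)
print(signals)
```

A high ratio paired with a high fail-rate on one corpus is a hint that its parent boundary is too broad.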

| Retrieval shape | Child indexed | Returned to model | Trade-off |
| --- | --- | --- | --- |
| Single-vector dense | full chunk | matched chunk | Cheap, loses surrounding context |
| Parent-document (section) | 200-token child | nearest H2/H3 section | Best context-token balance |
| Parent-document (page) | 200-token child | full page | Broad, distractor-heavy |
| Sentence window | sentence | matched sentence + N neighbors | Tight, no header context |
| Hierarchical (auto-merging) | leaf node | merged ancestor | Adaptive, more index ops |

For external calibration, RAGBench (100K examples across five domains) and CRAG (Meta’s 4,409-question Comprehensive RAG benchmark) both compare flat-chunk vs hierarchical retrieval. Auto-merging and parent-document patterns typically lift ContextRecall by 6-10 points and RAGTruth Groundedness by 3-5 points over flat dense retrieval on long-form sources, while inflating prompt tokens by 30-60%.

Choosing parent boundaries

The most important design choice in a parent-document retriever is where the parent boundary lands. The three patterns we see in 2026 production:

  • Section boundary (H2 / H3): the parent is the nearest section header above the matching child chunk. Good for documentation, policy pages, and structured manuals. Keeps parents focused but can miss cross-section context.
  • Page boundary: the parent is the whole page or PDF page. Easy to implement, but often too broad, so distractors creep in.
  • Semantic boundary: the parent is computed by clustering or topic segmentation. Best accuracy on free-form documents but expensive to maintain when the corpus updates.

Tag every parent with parent_boundary_type so the eval can compare them. We’ve seen teams switch from page boundaries to section boundaries and gain 9 points of ContextRelevance on policy QA while dropping prompt tokens 35%. Compared to LangChain’s ParentDocumentRetriever configured with a fixed-size parent splitter, the section-boundary approach respects document semantics, which matters when the corpus contains headers that frame the rules below them.
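
A section-boundary parent can be found by walking up from the matched child line to the nearest H2 or H3 header. The sketch below assumes Markdown-style `##` and `###` headers and a hypothetical `section_parent` helper. Falling back to the whole page when no header exists is a design choice made for this example only.

```python
import re

HEADER = re.compile(r"^(#{2,3})\s+(.*)")  # matches H2 and H3 only

def section_parent(lines, child_line):
    """Return the nearest H2/H3 section that contains the line at index child_line."""
    start = level = title = None
    for i in range(child_line, -1, -1):
        m = HEADER.match(lines[i])
        if m:
            start, level, title = i, len(m.group(1)), m.group(2)
            break
    if start is None:
        # No header above the child: fall back to the whole page.
        return {"parent_boundary_type": "page", "title": None, "text": "\n".join(lines)}
    end = len(lines)
    for j in range(start + 1, len(lines)):
        m = HEADER.match(lines[j])
        if m and len(m.group(1)) <= level:  # next header at the same or a higher level
            end = j
            break
    return {
        "parent_boundary_type": "section",
        "title": title,
        "text": "\n".join(lines[start:end]),
    }

DOC = """## Billing
Billing runs monthly.
### Trial refunds
Unused annual plans may be refunded.
Refunds post within 5 days.
### Invoices
Invoices are emailed monthly.
## Security
All data is encrypted.""".splitlines()

parent = section_parent(DOC, child_line=3)
print(parent["parent_boundary_type"], "|", parent["title"])
```

Because the record carries `parent_boundary_type`, section and page parents can be compared in the same eval run.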

A second 2026-specific tip: when the parent itself is long (say 4,000+ tokens for Gemini 3 long-context flows), still rerank at the parent level. A long-context model does not need help with attention, but it does benefit from having distractor parents filtered out before generation. ChunkAttribution on the long-context output catches the cases where the parent was retrieved but ignored.
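
Reranking at the parent level can be sketched as deduplicating the parents behind the child hits, scoring each parent against the query, and dropping the weak ones. The `overlap_score` function here is a toy stand-in for a real reranker, and the threshold and sample data are assumptions.

```python
def overlap_score(query, text):
    """Toy relevance score: fraction of query terms that appear in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def rerank_parents(query, child_hits, parents, min_score=0.5, top_n=2):
    """Collapse child hits to unique parents, score the parents, filter distractors."""
    children_by_parent = {}
    for hit in child_hits:
        children_by_parent.setdefault(hit["parent_document_id"], []).append(hit["child_chunk_id"])
    scored = [
        {
            "parent_document_id": pid,
            "child_chunk_ids": cids,  # keep lineage for later attribution checks
            "score": overlap_score(query, parents[pid]),
        }
        for pid, cids in children_by_parent.items()
    ]
    kept = [s for s in scored if s["score"] >= min_score]
    return sorted(kept, key=lambda s: s["score"], reverse=True)[:top_n]

parents = {
    "refunds": "trial refunds apply to unused annual plans",
    "invoices": "invoices are emailed each billing cycle",
    "trials": "a trial lasts 14 days",
}
child_hits = [
    {"child_chunk_id": "refunds/0", "parent_document_id": "refunds"},
    {"child_chunk_id": "refunds/1", "parent_document_id": "refunds"},
    {"child_chunk_id": "trials/0", "parent_document_id": "trials"},
    {"child_chunk_id": "invoices/2", "parent_document_id": "invoices"},
]
ranked = rerank_parents("trial refunds annual plans", child_hits, parents)
print(ranked)
```

Only the parents that survive the filter reach the prompt, which keeps distractor sections out of long-context generations.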

Common Mistakes

  • Returning the whole document as the parent. A document-sized parent can bury the matched chunk under unrelated clauses and raise prompt-token cost.
  • Measuring only vector recall. The child can match correctly while the expanded parent lowers ContextRelevance or distracts the generator.
  • Dropping child IDs after parent expansion. Without child-to-parent lineage, failed traces cannot show which small chunk caused the large context block.
  • Using one parent boundary for every format. FAQs, tables, policy pages, and API docs usually need different parent scopes.
  • Letting parent context bypass reranking. Rerank expanded parents too, or a weak parent can outrank a precise child hit from another source.
  • No metadata propagation. When the parent is expanded, all child metadata (timestamps, owners, version) should travel with it; without that, downstream groundedness checks lose the evidence trail.
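
Two of the mistakes above, dropped child IDs and lost metadata, can be avoided by carrying the child records along during parent expansion. A minimal sketch, assuming hypothetical hit fields (`rank`, `metadata`) and a hypothetical `expand_with_lineage` helper:

```python
def expand_with_lineage(child_hits, parents):
    """Expand child hits to parents while keeping child IDs, ranks, and metadata."""
    expanded = {}
    for hit in child_hits:
        pid = hit["parent_document_id"]
        record = expanded.setdefault(
            pid,
            {"parent_document_id": pid, "parent_text": parents[pid], "children": []},
        )
        record["children"].append(
            {
                "child_chunk_id": hit["child_chunk_id"],
                "rank": hit["rank"],
                "metadata": dict(hit["metadata"]),  # copy so later edits do not leak back
            }
        )
    return list(expanded.values())

parents = {"billing#refunds": "Trial refunds. Unused annual plans may be refunded."}
child_hits = [
    {
        "child_chunk_id": "billing#refunds/1",
        "parent_document_id": "billing#refunds",
        "rank": 1,
        "metadata": {"owner": "billing-team", "version": "v7", "updated": "2026-01-12"},
    },
    {
        "child_chunk_id": "billing#refunds/0",
        "parent_document_id": "billing#refunds",
        "rank": 3,
        "metadata": {"owner": "billing-team", "version": "v7", "updated": "2026-01-12"},
    },
]
expanded = expand_with_lineage(child_hits, parents)
print(expanded[0]["parent_document_id"], [c["child_chunk_id"] for c in expanded[0]["children"]])
```

A failed trace can then show which child, at which rank and source version, caused a given parent block to enter the prompt.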

Frequently Asked Questions

What is a parent-document retriever?

A parent-document retriever searches small child chunks but returns the larger parent document or section that contains the match. It is used in RAG when tiny chunks need surrounding context for grounded generation.

How is a parent-document retriever different from sentence-window retrieval?

Sentence-window retrieval returns nearby sentences around a matched sentence. Parent-document retrieval returns a configured parent block, section, or document for a matched child chunk.
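
The difference can be seen in a small sketch. Both helpers and the sample data are hypothetical: `sentence_window` returns N neighbors on each side of the match, while `parent_block` returns every sentence in the configured parent section.

```python
def sentence_window(sentences, match_index, n=1):
    """Return the matched sentence plus up to n neighbors on each side."""
    lo = max(0, match_index - n)
    hi = min(len(sentences), match_index + n + 1)
    return sentences[lo:hi]

def parent_block(sentences, section_of, match_index):
    """Return every sentence that shares the matched sentence's parent section."""
    target = section_of[match_index]
    return [s for s, sec in zip(sentences, section_of) if sec == target]

sentences = [
    "Billing runs monthly.",
    "Trial refunds cover unused annual plans.",
    "Refunds post within 5 days.",
    "Monthly plans are not refundable.",
    "Invoices are emailed monthly.",
]
# Hypothetical mapping from each sentence to its parent section.
section_of = ["billing", "refunds", "refunds", "refunds", "invoices"]

window = sentence_window(sentences, match_index=1, n=1)
block = parent_block(sentences, section_of, match_index=1)
print(window)
print(block)
```

Here the window pulls in a sentence from the neighboring section, while the parent block stays inside the refunds section but includes a sentence the window missed.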

How do you measure a parent-document retriever?

FutureAGI uses ChunkAttribution, ContextRelevance, and trace fields such as child_chunk_id and parent_document_id to compare the child hit, returned parent, and final answer.