How is a parent-document retriever different from sentence-window retrieval?

Sentence-window retrieval returns nearby sentences around a matched sentence. Parent-document retrieval returns a configured parent block, section, or document for a matched child chunk.
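The contrast can be made concrete with a minimal pure-Python sketch. It assumes a naive regex sentence splitter and a hand-assigned parent ID per sentence; `split_sentences`, `sentence_window`, `parent_block`, and the sample text are hypothetical. A one-sentence window around the third sentence crosses into the next section. The parent block stays inside the configured section.

```python
import re

def split_sentences(text: str) -> list[str]:
    # Naive split on ., !, or ? followed by whitespace; real pipelines use a proper sentence tokenizer.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]

def sentence_window(sentences: list[str], hit: int, window: int = 1) -> str:
    # Matched sentence plus `window` neighbors on each side, regardless of section boundaries.
    start = max(0, hit - window)
    end = min(len(sentences), hit + window + 1)
    return " ".join(sentences[start:end])

def parent_block(sentences: list[str], hit: int, parent_of: list[int]) -> str:
    # Every sentence that shares the matched sentence's parent ID (a configured block or section).
    pid = parent_of[hit]
    return " ".join(s for s, p in zip(sentences, parent_of) if p == pid)

text = ("Refunds apply to annual plans. Trial users may request one. "
        "Requests close after 30 days. Invoices are emailed monthly. Taxes vary by region.")
sentences = split_sentences(text)
parent_of = [0, 0, 0, 1, 1]  # hypothetical: sentences 0-2 form the refunds section, 3-4 the invoices section

print(sentence_window(sentences, hit=2))
# Trial users may request one. Requests close after 30 days. Invoices are emailed monthly.
print(parent_block(sentences, hit=2, parent_of=parent_of))
# Refunds apply to annual plans. Trial users may request one. Requests close after 30 days.
```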

How do you measure a parent-document retriever?

FutureAGI uses ChunkAttribution, ContextRelevance, and trace fields such as child_chunk_id and parent_document_id to compare the child hit, returned parent, and final answer.

What Is a Parent-Document Retriever? FutureAGI Guide (2026)

Q: What is a parent-document retriever?

A parent-document retriever searches small child chunks but returns the larger parent document or section that contains the match. It is used in RAG when tiny chunks need surrounding context for grounded generation.

What Is a Parent-Document Retriever?

A parent-document retriever is a RAG retrieval pattern that indexes small child chunks for precise vector search, then returns the larger parent document or section that contains the matching chunk. It shows up between the retriever and generator in production traces, where chunk IDs map back to parent IDs. FutureAGI evaluates this pattern with ChunkAttribution, ContextRelevance, and ChunkUtilization so teams can see whether parent context helped the answer or merely added distracting text.
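The index-small, return-large flow can be sketched in a few lines of Python. This is an illustration under stated assumptions, not FutureAGI or LangChain code. Jaccard word overlap stands in for vector similarity, and `Child`, `retrieve_parents`, and the sample billing data are hypothetical. Each returned parent keeps the IDs of the child chunks that matched it.

```python
from dataclasses import dataclass

@dataclass
class Child:
    child_id: str
    parent_id: str
    text: str

def tokenize(text: str) -> set[str]:
    # Lowercased words with surrounding punctuation stripped.
    words = (w.strip(".,:;!?").lower() for w in text.split())
    return {w for w in words if w}

def similarity(query: str, text: str) -> float:
    # Jaccard word overlap stands in for embedding similarity in this sketch.
    q, t = tokenize(query), tokenize(text)
    return len(q & t) / len(q | t) if q | t else 0.0

def retrieve_parents(query: str, children: list[Child], parents: dict[str, str], k: int = 2) -> list[dict]:
    # 1. Search the small child chunks.
    top = sorted(children, key=lambda c: similarity(query, c.text), reverse=True)[:k]
    # 2. Group hits by parent, keeping every child ID for lineage; order follows the best child rank.
    grouped: dict[str, list[str]] = {}
    for c in top:
        grouped.setdefault(c.parent_id, []).append(c.child_id)
    # 3. Hand the parent text, not the child text, to the generator.
    return [
        {"parent_document_id": pid, "child_chunk_ids": ids, "parent_text": parents[pid]}
        for pid, ids in grouped.items()
    ]

parents = {
    "billing/refunds": "Trial refunds are available for unused annual plans. Monthly plans are not refundable.",
    "billing/invoices": "Invoices are emailed on the first day of each month.",
}
children = [
    Child("c1", "billing/refunds", "Trial refunds are available for unused annual plans."),
    Child("c2", "billing/refunds", "Monthly plans are not refundable."),
    Child("c3", "billing/invoices", "Invoices are emailed on the first day of each month."),
]
results = retrieve_parents("Can I get a trial refund on an annual plan?", children, parents)
print(results[0]["parent_document_id"], results[0]["child_chunk_ids"])  # billing/refunds ['c1']
```

Swapping `similarity` for an embedding model and a vector store changes the search step. It does not change the child-to-parent mapping, which is the part that defines the pattern.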

Why It Matters in Production LLM and Agent Systems

Parent-document retrieval solves a real problem and introduces a new one. Pure small-chunk retrieval often finds the exact term but returns too little surrounding evidence: the answer misses a caveat, date range, exception, or table header. Returning the parent section can fix that. The failure mode flips when the parent is too broad: the prompt now contains the matching clause plus unrelated policy text, and the model grounds itself on the wrong paragraph.

Developers notice retrieval tests that pass while generated answers still fail. SREs see token cost and latency rise because each matched child can pull in a 2,000-token parent. Compliance teams see citations that point to the right document but not to the specific clause. Product teams hear “it cited our handbook, but the answer is still wrong.”

The symptoms are visible if the trace keeps both child and parent IDs: high vector hit rate, flat reranker scores, rising prompt tokens, lower ContextRelevance, and more groundedness failures on long parent passages. Agentic systems amplify the risk. A research agent may retrieve a broad parent section, summarize the wrong subsection, call a tool with that result, and then cite the original document as if the whole chain were grounded. In the multi-step RAG pipelines common in 2026, parent retrieval needs attribution checks at every handoff, not only at final-answer review.

How FutureAGI Handles Parent-Document Retrievers

FutureAGI’s approach is to score the child hit and the parent context separately. In a FutureAGI eval run, traceAI-langchain or traceAI-llamaindex captures retriever spans with application-level fields such as retrieval.documents, retrieval.child_chunk_id, retrieval.parent_document_id, and parent token count. The eval dataset stores the user query, returned child chunk, returned parent passage, generated answer, and source metadata. ChunkAttribution checks whether the answer referenced retrieved evidence. ContextRelevance checks whether the returned parent context was useful for the query, and ChunkUtilization helps flag parents that were mostly unused.
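One way to keep child-to-parent lineage intact is to build each eval row and its span attributes from a single record. The sketch below is an illustration under assumptions. `ParentRetrievalRecord` is hypothetical. The `retrieval.child_chunk_id` and `retrieval.parent_document_id` keys come from the paragraph above, while `retrieval.parent_token_count` and `retrieval.source_version` are invented names. The code does not call any traceAI API.

```python
from dataclasses import dataclass

@dataclass
class ParentRetrievalRecord:
    # One eval-dataset row: user query, child chunk, parent passage, answer, and source metadata.
    query: str
    child_chunk_id: str
    child_text: str
    parent_document_id: str
    parent_text: str
    answer: str
    source_version: str = "unknown"

    def parent_token_count(self) -> int:
        # Whitespace word count as a rough proxy; a real pipeline would use the model's tokenizer.
        return len(self.parent_text.split())

    def span_attributes(self) -> dict:
        # Flat key/value pairs to attach to the retriever span. Key names are hypothetical,
        # modeled on the fields in this guide, not a documented traceAI schema.
        return {
            "retrieval.child_chunk_id": self.child_chunk_id,
            "retrieval.parent_document_id": self.parent_document_id,
            "retrieval.parent_token_count": self.parent_token_count(),
            "retrieval.source_version": self.source_version,
        }

record = ParentRetrievalRecord(
    query="Can I get a trial refund on an annual plan?",
    child_chunk_id="c1",
    child_text="Trial refunds are available for unused annual plans.",
    parent_document_id="billing/refunds",
    parent_text="Trial refunds are available for unused annual plans. Monthly plans are not refundable.",
    answer="Yes, trial refunds are available for unused annual plans.",
    source_version="2026-01",  # hypothetical corpus version label
)
print(record.span_attributes())
```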

Example: a support knowledge base indexes 200-token child chunks but returns the full policy subsection. After a release, FutureAGI shows ChunkAttribution pass rate steady at 94%, while ContextRelevance on parent passages drops from 0.86 to 0.62 for billing questions and prompt tokens rise 41%. The engineer opens failed traces, sees a child match under “trial refunds” returning the entire billing chapter, and changes the parent boundary to the nearest H3 section. They then rerun the regression eval and set an alert for when parent-token p95 or the relevance fail rate crosses its threshold.
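The boundary change in this example, from the whole chapter to the nearest H3 section, can be sketched for Markdown sources. The sketch assumes ATX-style `#` headings and that the child text appears verbatim in the source; `split_sections`, `parent_for_child`, and the sample document are hypothetical. Headings deeper than `max_level` (here an H4) stay inside their enclosing parent.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")

def split_sections(markdown: str, max_level: int = 3) -> list[dict]:
    # Start a new parent at every heading of level <= max_level; deeper headings stay inside it.
    sections, title, lines = [], None, []
    for line in markdown.splitlines():
        m = HEADING.match(line)
        if m and len(m.group(1)) <= max_level:
            if title is not None or lines:
                sections.append({"title": title, "text": "\n".join(lines)})
            title, lines = m.group(2).strip(), [line]
        else:
            lines.append(line)
    if title is not None or lines:
        sections.append({"title": title, "text": "\n".join(lines)})
    return sections

def parent_for_child(child_text: str, sections: list[dict]):
    # First section that contains the child chunk verbatim, or None if no section does.
    return next((s for s in sections if child_text in s["text"]), None)

doc = """# Billing
Billing overview.
## Refunds
### Trial refunds
Trial refunds are available for unused annual plans.
#### Exceptions
Promotional plans are excluded.
### Monthly refunds
Monthly plans are not refundable.
## Invoices
Invoices are emailed monthly."""

child = "Trial refunds are available for unused annual plans."
print(parent_for_child(child, split_sections(doc, max_level=1))["title"])  # Billing (whole chapter)
print(parent_for_child(child, split_sections(doc, max_level=3))["title"])  # Trial refunds (nearest H3)
```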

Unlike Ragas faithfulness, which mainly asks whether the final answer is supported by the supplied context, this workflow preserves the child-to-parent link. The team can tell whether the small chunk found the right evidence, whether the parent supplied helpful surrounding context, and whether the model used either one.

How to Measure or Detect It

Measure parent-document retrieval as a mapping problem, not just a top-k ranking:

Child hit quality: ContextRelevance on the child chunk shows whether vector search found evidence that can answer the query.
Parent usefulness: ContextRelevance on the returned parent passage catches parents that add unrelated sections around a good child hit.
Attribution: ChunkAttribution returns a pass/fail signal for whether the final answer referenced retrieved evidence.
Context use: ChunkUtilization helps quantify whether useful parent content appeared in the answer or sat unused in the prompt.
Trace signals: store retrieval.child_chunk_id, retrieval.parent_document_id, parent token count, rank, reranker score, and source version.
Dashboard signals: parent-to-child token ratio, prompt-token p95, attribution fail-rate by corpus, and thumbs-down rate on cited answers.
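Several of these dashboard signals reduce to simple aggregates over per-request trace rows. A minimal sketch follows. It assumes each row already carries child, parent, and prompt token counts plus a ChunkAttribution pass/fail flag. The row keys, `dashboard_signals`, and the alert limits are hypothetical.

```python
def p95(values: list[float]) -> float:
    # Nearest-rank percentile: the smallest sample with at least 95% of samples at or below it.
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]

def dashboard_signals(rows: list[dict], p95_limit: float, fail_rate_limit: float) -> dict:
    ratios = [r["parent_tokens"] / r["child_tokens"] for r in rows]
    prompt_p95 = p95([r["prompt_tokens"] for r in rows])
    fail_rate = sum(not r["attribution_pass"] for r in rows) / len(rows)
    return {
        "mean_parent_to_child_ratio": sum(ratios) / len(ratios),
        "prompt_token_p95": prompt_p95,
        "attribution_fail_rate": fail_rate,
        "alert": prompt_p95 > p95_limit or fail_rate > fail_rate_limit,
    }

# Hypothetical trace rows for one corpus.
rows = [
    {"child_tokens": 200, "parent_tokens": 800, "prompt_tokens": 1500, "attribution_pass": True},
    {"child_tokens": 200, "parent_tokens": 2000, "prompt_tokens": 3100, "attribution_pass": True},
    {"child_tokens": 200, "parent_tokens": 600, "prompt_tokens": 1200, "attribution_pass": False},
    {"child_tokens": 200, "parent_tokens": 1000, "prompt_tokens": 1800, "attribution_pass": True},
]
signals = dashboard_signals(rows, p95_limit=3000, fail_rate_limit=0.10)
print(signals)
# {'mean_parent_to_child_ratio': 5.5, 'prompt_token_p95': 3100, 'attribution_fail_rate': 0.25, 'alert': True}
```

Grouping rows by a corpus field before calling `dashboard_signals` gives the per-corpus fail rate. Nearest-rank p95 keeps the sketch simple. With fewer than 20 rows it returns the largest value.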

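from fi.evals import ChunkAttribution

# output is the generated answer; context holds the retrieved evidence passed to the
# generator (in a parent-document setup, the returned parent passage).
result = ChunkAttribution().evaluate(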
    output="Trial refunds are available for unused annual plans.",
    context=["Trial refund policy: unused annual plans may be refunded."]
)
print(result.score, result.reason)

Common Mistakes

Returning the whole document as the parent. A document-sized parent can bury the matched chunk under unrelated clauses and raise prompt-token cost.
Measuring only vector recall. The child can match correctly while the expanded parent lowers ContextRelevance or distracts the generator.
Dropping child IDs after parent expansion. Without child-to-parent lineage, failed traces cannot show which small chunk caused the large context block.
Using one parent boundary for every format. FAQs, tables, policy pages, and API docs usually need different parent scopes.
Letting parent context bypass reranking. Rerank expanded parents too, or a weak parent can outrank a precise child hit from another source.
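Two of these mistakes, dropping child IDs and letting parents skip reranking, can be addressed together. Rerank the expanded parents against the query and carry the child IDs along. The sketch assumes candidates arrive as dicts from a parent-expansion step. `rerank_parents`, the Jaccard scorer, and the sample data are hypothetical stand-ins for a real reranker.

```python
def tokenize(text: str) -> set[str]:
    # Lowercased words with surrounding punctuation stripped.
    words = (w.strip(".,:;!?").lower() for w in text.split())
    return {w for w in words if w}

def jaccard(query: str, text: str) -> float:
    # Toy relevance score; production systems often use a cross-encoder or hosted reranker instead.
    q, t = tokenize(query), tokenize(text)
    return len(q & t) / len(q | t) if q | t else 0.0

def rerank_parents(query: str, candidates: list[dict]) -> list[dict]:
    # Score the expanded parent text itself, not the child that retrieved it,
    # and copy each candidate so its child_chunk_ids lineage travels with the parent.
    scored = [{**c, "parent_score": jaccard(query, c["parent_text"])} for c in candidates]
    return sorted(scored, key=lambda c: c["parent_score"], reverse=True)

query = "trial refund annual plan"
candidates = [
    {"parent_document_id": "billing/chapter", "child_chunk_ids": ["c9"],
     "parent_text": "Billing chapter. Invoices, taxes, currencies, payment methods, trial refund rules, and account closure."},
    {"parent_document_id": "billing/trial-refunds", "child_chunk_ids": ["c1"],
     "parent_text": "Trial refund: unused annual plan purchases qualify."},
]
ranked = rerank_parents(query, candidates)
for c in ranked:
    print(c["parent_document_id"], c["child_chunk_ids"], round(c["parent_score"], 3))
# billing/trial-refunds ['c1'] 0.571
# billing/chapter ['c9'] 0.133
```

Because the scorer sees the whole parent, the broad billing chapter drops below the focused trial-refund section even though both contain the key terms. Jaccard overlap penalizes long parents by construction, since unrelated words enlarge the union; a learned reranker may not behave the same way.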