Evaluation

What Is Chunk Attribution?

Chunk attribution is a binary RAG evaluation metric that answers one causal question: did the model actually use the retrieved chunks when it produced its response, or did it ignore them and answer from parametric memory? The evaluator inspects the response against the supplied chunks and returns Pass when the output is recognisably anchored to the context and Fail when the chunks were effectively unused. It runs on every RAG span in production and on offline RAG datasets. Attribution is the cheapest sanity check on a RAG pipeline and the first metric to confirm before tuning anything else.

Why It Matters in Production LLM and Agent Systems

A well-documented production failure is “context neglect”: the retriever returns perfect chunks, the model receives them, and the model then generates an answer from its own parametric memory anyway, ignoring the context entirely. The user sees an answer that may be plausible but stale, or plausible but wrong. Faithfulness can still score high (no claims contradict the context) precisely because the context goes largely unreferenced, and groundedness can pass too if the response happens to align by coincidence. Without chunk attribution, this entire failure mode is invisible.

The pain falls on retrieval engineers and infra teams. An ML engineer ships a 2x larger context window and answer accuracy gets worse — because the model is now drowning in context and reverting to parametric answers. A product owner sees stale numbers in answers despite having a freshly-indexed knowledge base; the retriever is fine, the model is just not reading. A team running cost optimisation puts a smaller model behind a RAG pipeline and discovers the smaller model attends to context less reliably than the bigger one — invisible until attribution is measured.

In 2026 agentic-RAG and corrective-RAG patterns, chunk attribution is the signal that triggers a re-prompt: if the model was given context but did not use it, the agent rewrites the prompt to be more explicit before retrying — rather than retrying retrieval, which is not the broken step.
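That attribution-gated retry can be sketched in a few lines. This is a minimal illustration, not a FutureAGI API: the `generate` and `evaluate_attribution` callables, the function name, and the rewritten-prompt wording are all assumptions.

```python
def answer_with_attribution_gate(generate, evaluate_attribution, question, chunks,
                                 max_retries=1):
    """Re-prompt (not re-retrieve) when the model ignored the supplied context.

    generate(prompt, chunks) -> response string.
    evaluate_attribution(response, chunks) -> "Pass" or "Fail".
    Both callables are stand-ins for illustration, not a FutureAGI API.
    """
    prompt = question
    response = generate(prompt, chunks)
    for _ in range(max_retries):
        if evaluate_attribution(response, chunks) == "Pass":
            break
        # Retrieval was not the broken step, so rewrite the prompt to be
        # explicit about using the context before retrying generation.
        prompt = "Answer using only the provided documents.\n\n" + question
        response = generate(prompt, chunks)
    return response
```

The key design choice is that a Fail triggers a prompt rewrite against the same chunks, never a second retrieval call.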

How FutureAGI Handles Chunk Attribution

FutureAGI’s approach is to ship fi.evals.ChunkAttribution as the cloud-template binary evaluator paired with the continuous ChunkUtilization and ContextUtilization local metrics. Inputs to attribution are a context (string or list of strings) and an output — the evaluator returns Pass or Fail with a written reason explaining whether the context was referenced. The reason field is the diagnostic surface: when it says “the response repeats general knowledge that does not require the provided context,” you know the failure is context neglect, not poor retrieval.

Concretely: a documentation chat-bot team running on traceAI-llamaindex instruments retrieval and answer spans. They configure ChunkAttribution to score every answer span where retrieval ran. The Agent Command Center dashboard plots attribution pass-rate by model variant. When a cost-optimisation pre-guardrail swaps claude-3-5-haiku in for claude-sonnet-4, attribution pass-rate falls from 96% to 78% — the smaller model is ignoring context far more often. The team adds a pre-guardrail that prepends “Answer using only the provided documents” to the prompt, and attribution recovers to 93% without re-routing model traffic.
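The pass-rate-by-model-variant view from that dashboard can be reproduced offline from raw evaluation records. A minimal sketch, where the record shape (dicts with `model` and `score` keys) is an assumption for illustration:

```python
from collections import defaultdict

def attribution_pass_rate_by_variant(records):
    """Compute attribution pass-rate per model variant.

    records: iterable of dicts with a 'model' name and a binary
    'score' of "Pass" or "Fail". Returns {model: pass_rate}, so a
    regression caused by a model swap shows up as a rate drop.
    """
    totals = defaultdict(int)
    passes = defaultdict(int)
    for r in records:
        totals[r["model"]] += 1
        passes[r["model"]] += r["score"] == "Pass"
    return {m: passes[m] / totals[m] for m in totals}
```

Comparing the returned rates before and after a model swap is exactly the check that surfaced the 96% to 78% drop described above.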

We have found that pairing ChunkAttribution (binary, gate) with ChunkUtilization (continuous, gauge) is the only reliable way to separate “context neglected entirely” from “context partially used” — they answer different questions and you need both.
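The gate-plus-gauge pairing reduces to a small decision table. A minimal sketch of that combination, where the 0.3 utilization threshold and the label strings are illustrative assumptions, not product defaults:

```python
def diagnose_context_use(attribution, utilization, low=0.3):
    """Combine the binary gate with the continuous gauge.

    attribution: "Pass" or "Fail" from the binary attribution check.
    utilization: 0-1 score from the continuous utilization metric.
    """
    if attribution == "Fail":
        # The gate fires first: no utilization score rescues total neglect.
        return "context neglected entirely"
    if utilization < low:
        return "context partially used"
    return "context well used"
```

Neither metric alone can produce the middle label, which is why both are needed.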

How to Measure or Detect It

Chunk attribution is directly measurable. Wire up:

  • fi.evals.ChunkAttribution — Pass/Fail with a reason string explaining whether the chunks were referenced.
  • fi.evals.ChunkUtilization — continuous 0-1 partner that measures how much of each chunk was actually used.
  • fi.evals.ContextUtilization — entity-and-phrase-level utilisation score, complementary signal.
  • OTel attributes retrieval.documents and llm.output — the spans an attribution evaluator depends on.
  • Attribution pass-rate by model variant (dashboard) — exposes context-neglect regressions tied to model swaps.

Minimal Python:

from fi.evals import ChunkAttribution

evaluator = ChunkAttribution()

# context accepts a single string or a list of strings;
# output is the model response being checked against it.
result = evaluator.evaluate(
    output="Paris is the capital city of France.",
    context=[
        "Paris is the capital and largest city of France.",
        "Paris is known for its art museums and fashion districts."
    ]
)
print(result.score, result.reason)  # Pass/Fail plus the written reason

Common Mistakes

  • Conflating chunk attribution with chunk utilization. Attribution is binary — used or not. Utilization is continuous — how much was used. They answer different questions; do not collapse them.
  • Reading attribution Pass as proof of correctness. Pass only means the chunks were referenced — the answer can still be wrong, contradictory, or partial. Pair with Faithfulness and AnswerRelevancy.
  • Skipping attribution because groundedness passes. Groundedness can pass on coincidental alignment; attribution catches the case where the model ignored chunks but happened to answer correctly anyway.
  • Running attribution without storing the retrieved chunks on the trace. If retrieval.documents is missing, the evaluator silently fails open. Verify the attribute is present before deploying.
  • Treating a single attribution failure as a model bug. First check whether the prompt actually instructs the model to use the context; many “context neglect” failures are prompt failures.
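The missing-attribute mistake in the list above can be caught before deploy with a simple trace audit. A minimal sketch assuming spans are exported as attribute dicts; the export shape and function name are assumptions for illustration:

```python
REQUIRED_ATTRS = ("retrieval.documents", "llm.output")

def missing_attribution_inputs(spans):
    """Find spans missing the attributes the attribution evaluator needs.

    spans: iterable of {attribute_name: value} dicts. Returns a list of
    (span_index, [absent_attributes]) pairs; an empty list means the
    evaluator will not silently fail open for lack of inputs.
    """
    missing = []
    for i, attrs in enumerate(spans):
        absent = [a for a in REQUIRED_ATTRS if a not in attrs]
        if absent:
            missing.append((i, absent))
    return missing
```

Running a check like this against a sample of production traces turns the silent fail-open into a loud pre-deploy failure.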

Frequently Asked Questions

What is chunk attribution?

Chunk attribution is a Pass/Fail RAG metric that checks whether a model used the retrieved context at all when generating its answer. It does not measure how much it used — only whether the chunks are reflected in the response.

How is chunk attribution different from chunk utilization?

Attribution is binary — did the model use the chunks at all. Utilization is continuous — how much of each chunk did the model actually use. Attribution is the gate; utilization is the gauge.

How do you measure chunk attribution?

FutureAGI's fi.evals.ChunkAttribution takes the retrieved context (string or list of strings) and the model output, then returns Pass or Fail with a written reason explaining whether the context was acknowledged.