Evaluation

What Is Source Attribution?

Source attribution is an LLM-evaluation metric that checks whether generated claims cite the exact source passages, documents, or URLs that support them. It shows up in RAG eval pipelines, agent answer spans, and production traces where engineers need claim-level evidence, not just a plausible response. FutureAGI measures it with SourceAttribution for citation quality and CitationPresence for the simpler question of whether the answer includes citations at all.

Why Source Attribution Matters in Production LLM and Agent Systems

Bad source attribution turns a correct-looking answer into an unverifiable one. A RAG assistant can cite the right document family but the wrong paragraph; an agent can reuse a source from step 2 after a tool call has changed the facts; a summarizer can attach one citation to a sentence that contains three unrelated claims. The result is not always a classic hallucination. It is often a traceability failure: nobody can prove which evidence supported which claim.

Developers feel it during debugging because the retriever, reranker, and generator all look plausible in isolation. SREs see a rising thumbs-down rate, citation-click corrections, and support escalations without a clean failing span. Compliance teams feel it during audit review when an answer cites a policy page but the cited passage never says what the answer claims. End users feel it as low trust: the answer has citations, but the citations do not survive inspection.

Agentic systems make attribution harder. In a multi-step pipeline, one planning step retrieves context, a later tool call changes state, and the final answer blends both. Without source attribution on each answer span, teams only see the final sentence and a loose bibliography. Useful symptoms include missing source IDs, repeated citations across unrelated claims, a drop in citation click-through, and eval failures clustered by retriever version.

How FutureAGI Handles Source Attribution

FutureAGI’s approach is to separate “a citation exists” from “the citation supports the claim.” The specific FutureAGI surfaces are eval:SourceAttribution and eval:CitationPresence. SourceAttribution evaluates citation quality in RAG responses. CitationPresence is the lighter metric that checks whether the answer includes citations at all. In practice, teams run both: presence catches missing references, attribution catches wrong or vague references.
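
In code, the two-step gate looks like the sketch below. It assumes CitationPresence accepts the same evaluate(response=..., context=...) call as the SourceAttribution example later on this page, and that score is truthy on a pass; treat both as assumptions, not the definitive API:

from fi.evals import CitationPresence, SourceAttribution

presence = CitationPresence()
attribution = SourceAttribution()

def score_row(response, context):
    # Gate on presence first: an answer with no citations fails fast
    # and never reaches the heavier attribution check.
    # (Assumes score is truthy on a pass.)
    if not presence.evaluate(response=response, context=context).score:
        return "missing-citation"
    # Then verify the cited evidence actually supports the claims.
    result = attribution.evaluate(response=response, context=context)
    return "ok" if result.score else "bad-citation"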

Real example: a legal-support agent answers, “The vendor can terminate after 30 days’ notice [contract-42#termination].” The app is instrumented with traceAI-langchain. The retrieval span stores retrieval.documents; the answer span stores llm.output and citation markers; the dataset row stores expected source IDs from reviewed clauses. FutureAGI scores the answer with CitationPresence first, then SourceAttribution against the retrieved passages. If the answer cites the termination section but the 30-day language appears only in the renewal section, the row fails attribution even though a citation exists.
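
Pictured as one dataset row, the fields from that example look roughly like this; the schema is illustrative, not the traceAI-langchain wire format:

row = {
    # Retrieval span: passages available at generation time.
    "retrieval.documents": {
        "contract-42#termination": "Either party may terminate this agreement...",
        "contract-42#renewal": "Renewal requires 30 days' written notice...",
    },
    # Answer span: model output plus the citation markers it emitted.
    "llm.output": "The vendor can terminate after 30 days' notice [contract-42#termination].",
    "citations": ["contract-42#termination"],
    # Dataset row: expected source IDs from reviewed clauses.
    "expected_sources": ["contract-42#renewal"],
    # Cohort fields used to slice fail rates later.
    "retriever.version": "v2.3",
    "prompt.version": "v5",
}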

The engineer’s next action is concrete. They open failing traces by retriever version, find that a new reranker moved renewal clauses below termination clauses, and add a regression eval before release. Production alerts watch attribution fail rate by route and tenant. Unlike Ragas faithfulness, which asks whether answer claims are supported by context, source attribution adds the citation pointer: the evidence must support the claim and be the cited evidence.

How to Measure or Detect Source Attribution

Measure source attribution by comparing each generated claim, each citation marker, and the retrieved evidence available at generation time. The useful signals are:

  • fi.evals.SourceAttribution — scores whether cited evidence supports the claims in a RAG response.
  • fi.evals.CitationPresence — checks whether the answer includes citations before deeper attribution scoring.
  • fi.evals.Groundedness — validates claim support against context, even when citation pointers are absent.
  • Trace fields — keep retrieval.documents, llm.output, citation IDs, retriever version, and prompt version on the same trace.
  • Dashboard signal — attribution fail rate by model, route, retriever version, tenant, and document collection (aggregated in the sketch after the minimal example below).
  • User proxy — citation correction clicks, citation-open-with-bounce events, thumbs-down rate, and compliance escalations.

Minimal Python:

from fi.evals import SourceAttribution

# Score one response against the passages it could have cited.
metric = SourceAttribution()
result = metric.evaluate(
    response="Refunds are available within 30 days [doc-7].",
    context={"doc-7": "Customers may request refunds within 30 days."},
)
# score carries the attribution verdict; reason explains which
# claim-citation pair passed or failed.
print(result.score, result.reason)
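
To turn row-level verdicts into the dashboard signal above, aggregate the fail rate by cohort. A minimal sketch in plain Python; the verdict and cohort fields are the hypothetical ones from the earlier sketches, not a fixed FutureAGI schema:

from collections import Counter

# Hypothetical rows: one verdict per evaluated trace, plus cohort fields.
eval_rows = [
    {"retriever.version": "v2.3", "route": "contracts", "verdict": "ok"},
    {"retriever.version": "v2.3", "route": "contracts", "verdict": "bad-citation"},
    {"retriever.version": "v2.2", "route": "contracts", "verdict": "ok"},
]

fails, totals = Counter(), Counter()
for row in eval_rows:
    cohort = (row["retriever.version"], row["route"])
    totals[cohort] += 1
    if row["verdict"] != "ok":
        fails[cohort] += 1

for cohort in sorted(totals):
    print(cohort, f"attribution fail rate: {fails[cohort] / totals[cohort]:.0%}")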

Common Mistakes

The expensive errors usually come from treating references as decoration instead of evidence:

  • Checking citation presence but not citation correctness. A bracketed URL can point to an irrelevant or stale passage.
  • Scoring only the final answer in a multi-step agent. Intermediate unsupported claims often become later cited facts (see the per-step sketch after this list).
  • Trusting document-level citations for sentence-level claims. A correct document can still contain the wrong paragraph for the claim.
  • Letting retriever metadata vanish from traces. Without retrieval.documents, source IDs, and answer text, attribution becomes a manual audit.
  • Using attribution as a replacement for groundedness. Attribution proves where a claim points; Groundedness checks whether the evidence supports it.
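
To avoid the second mistake above, score every claim-bearing step, not just the final answer. A minimal sketch; the step structure is a hypothetical agent trace, and SourceAttribution is the same evaluator shown earlier:

from fi.evals import SourceAttribution

metric = SourceAttribution()

# Hypothetical agent trace: each claim-bearing step keeps the context
# that was retrieved before it ran, so attribution is scored per step.
steps = [
    {
        "name": "plan",
        "output": "The termination clause governs notice [contract-42#termination].",
        "context": {"contract-42#termination": "Either party may terminate..."},
    },
    {
        "name": "answer",
        "output": "The vendor can terminate after 30 days' notice [contract-42#termination].",
        "context": {"contract-42#termination": "Either party may terminate..."},
    },
]

for step in steps:
    result = metric.evaluate(response=step["output"], context=step["context"])
    print(step["name"], result.score, result.reason)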

Frequently Asked Questions

What is source attribution?

Source attribution checks whether generated claims cite the exact evidence that supports them. FutureAGI measures it with SourceAttribution for citation quality and CitationPresence for citation existence.

How is source attribution different from citation presence?

Citation presence only asks whether citations exist. Source attribution asks whether those citations point to the right supporting passage, document, or URL for the claims made.

How do you measure source attribution?

FutureAGI measures it with the SourceAttribution evaluator on response, context, and citation fields. Track attribution fail rate by retriever version, model, prompt, and trace cohort.