RAG

What Is a Knowledge Graph?

A knowledge graph (KG) is a structured representation of facts as a graph of typed entities and typed relationships — for example, (Aspirin, treats, Headache) or (FutureAGI, headquartered_in, San Francisco). Unlike a flat document corpus that vector RAG retrieves chunks from, a KG encodes explicit relationships, hierarchies, and constraints, which supports multi-hop reasoning the model can audit. Modern LLM stacks combine KGs with vector retrieval — sometimes called graph-augmented RAG — to ground multi-hop questions where dense retrieval alone misses connections, and to constrain agent reasoning with explicit, auditable facts.
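The triple structure above can be sketched as a minimal in-memory KG. This is an illustrative toy, not a production graph store: real systems use a graph database, but the traversal idea is the same.

```python
from collections import deque

# A KG as a set of (subject, predicate, object) triples -- the examples
# from the text, linked so a 2-hop path exists.
triples = {
    ("Aspirin", "treats", "Headache"),
    ("Aspirin", "is_a", "NSAID"),
    ("NSAID", "risk_of", "GI_bleeding"),
}

def neighbors(entity):
    """Outgoing edges from an entity."""
    return [(p, o) for (s, p, o) in triples if s == entity]

def find_path(start, goal):
    """BFS over triples: returns the hop chain linking two entities, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for pred, obj in neighbors(node):
            if obj not in seen:
                seen.add(obj)
                queue.append((obj, path + [(node, pred, obj)]))
    return None

# 2-hop chain: Aspirin -is_a-> NSAID -risk_of-> GI_bleeding
print(find_path("Aspirin", "GI_bleeding"))
```

This is exactly the connection a similarity-only retriever can miss: no single chunk needs to mention both "aspirin" and "GI bleeding" for the graph to link them.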

Why It Matters in Production LLM and Agent Systems

Vector RAG works well when the answer lives in one chunk. It struggles when the answer requires combining facts: “Which board members of companies that acquired startups founded by ex-Google engineers also serve on the board of a healthcare nonprofit?” That kind of multi-hop query is what knowledge graphs were built for. The cost of using vector RAG alone in those cases is silent failure — the retriever returns chunks loosely related to one entity, the LLM stitches plausible-sounding but unsupported claims, and faithfulness evals quietly drop.

ML engineers feel this when Faithfulness regresses on long-tail multi-hop questions but stays fine on single-hop queries. Product managers see it as “the chatbot can answer simple questions but anything multi-step is wrong.” Compliance teams see it in regulated domains (legal, medical) where the answer must cite a chain of explicit facts, not a vibe of relevance.

In 2026 agent stacks, knowledge graphs increasingly power agent memory and planning. An agent that reasons over a project graph (entities, dependencies, owners) makes better decisions than one that re-retrieves chunks each turn. The shift to KG-augmented retrieval also enables auditability: regulators can ask “which facts grounded this decision,” and the agent can return a subgraph instead of a paragraph. The principle: KGs add structure where vector RAG only has similarity.

How FutureAGI Handles Knowledge-Graph-Backed RAG

FutureAGI does not build knowledge graphs — we evaluate the LLM applications that consume them. At the eval level, fi.evals.Faithfulness and fi.evals.Groundedness validate whether the response is grounded in the retrieved subgraph (after the graph is serialized to context). fi.evals.ContextRelevance scores subgraph relevance to the query. For multi-hop QA, a CustomEvaluation can wrap a graph-aware golden answer set and grade whether the response’s claim chain matches the ground-truth path.

At the trace level, traceAI integrations such as traceAI-langchain, traceAI-llamaindex, and traceAI-langgraph emit OpenTelemetry spans for both vector and graph retrieval steps; the dashboard slices Faithfulness and ContextRelevance by retrieval mode (vector-only, graph-only, hybrid) so the team can quantify KG-augmentation lift.

Concretely: a legal-research agent team runs hybrid retrieval — vector RAG for breadth, KG traversal for citation chains. They build a 200-row multi-hop golden dataset with explicit hop paths and load it into a Dataset. Dataset.add_evaluation runs Faithfulness plus a CustomEvaluation that scores hop-path correctness. Adding KG retrieval lifts multi-hop Faithfulness from 0.66 to 0.84, and the regression eval pins this lift across releases. FutureAGI’s view: KGs add a different signal than vectors — measure that signal explicitly, don’t bury it in an aggregate score.

How to Measure or Detect It

KG-augmented RAG quality is observable through retrieval and generation evaluators:

  • fi.evals.Faithfulness — response grounded in retrieved subgraph (serialized as context).
  • fi.evals.ContextRelevance — subgraph or KG-fact relevance to the query.
  • fi.evals.Groundedness — overall response-grounding signal across vector and graph retrieval.
  • Hop-path correctness — custom eval scoring whether the response follows a valid graph path.
  • Multi-hop accuracy by hop count — sliced by 1-hop, 2-hop, 3-hop questions; KG augmentation should lift 2+ hops.
  • Subgraph-size distribution — how many entities/edges are retrieved per query; instability flags retriever drift.
For example, scoring grounding against a subgraph serialized as a fact list:

```python
from fi.evals import Faithfulness

# Subgraph serialized as a fact list before scoring
context = [
    "Aspirin treats Headache.",
    "Aspirin is_a NSAID.",
    "NSAID risk_of GI_bleeding.",
]

f = Faithfulness().evaluate(
    input="What is the GI risk of aspirin?",
    output="Aspirin is an NSAID, which carries GI bleeding risk.",
    context="\n".join(context),
)
print(f.score, f.reason)
```
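The hop-path correctness and per-hop-count metrics above can be sketched as plain Python. The helper names here (hop_path_score, golden_rows) are illustrative, not part of the fi.evals API; a real CustomEvaluation would wrap this logic.

```python
from collections import defaultdict

def hop_path_score(predicted_path, golden_path):
    """Fraction of golden hops matched, in order, by the predicted claim chain."""
    hits, i = 0, 0
    for hop in predicted_path:
        if i < len(golden_path) and hop == golden_path[i]:
            hits += 1
            i += 1
    return hits / len(golden_path)

# Tiny stand-in for a graph-aware golden dataset.
golden_rows = [
    {"hops": 1,
     "golden": [("Aspirin", "treats", "Headache")],
     "predicted": [("Aspirin", "treats", "Headache")]},
    {"hops": 2,
     "golden": [("Aspirin", "is_a", "NSAID"),
                ("NSAID", "risk_of", "GI_bleeding")],
     "predicted": [("Aspirin", "is_a", "NSAID")]},  # missed the second hop
]

# Slice scores by hop count so multi-hop regressions don't hide in the mean.
by_hops = defaultdict(list)
for row in golden_rows:
    by_hops[row["hops"]].append(hop_path_score(row["predicted"], row["golden"]))

for hops, scores in sorted(by_hops.items()):
    print(f"{hops}-hop mean score: {sum(scores) / len(scores):.2f}")
# 1-hop mean score: 1.00
# 2-hop mean score: 0.50
```

An aggregate over both rows would read 0.75 and look healthy; the per-hop slice exposes that every 2-hop answer dropped a link.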

Common Mistakes

  • Treating the KG as static. Real-world facts change; schedule KG refresh and rerun evals after every refresh.
  • Skipping subgraph serialization checks. The model only sees serialized context; verify the serialization preserves the graph’s structure.
  • Mixing entity types without typing. Untyped graphs lose the property that makes a KG superior to vector RAG.
  • Replacing vector RAG entirely. KGs miss long-tail content not yet captured as triples; hybrid retrieval beats pure-graph for breadth.
  • Not scoring multi-hop separately. Aggregate accuracy hides hop-specific regressions; slice by hop count.
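The serialization check in the list above can be made mechanical. A minimal sketch, assuming the fact-list format used earlier in this article: every retrieved triple must survive, verbatim, in the context string the model actually sees.

```python
triples = [
    ("Aspirin", "treats", "Headache"),
    ("Aspirin", "is_a", "NSAID"),
    ("NSAID", "risk_of", "GI_bleeding"),
]

def serialize(subgraph):
    """Render triples into the fact-list context format used above."""
    return "\n".join(f"{s} {p} {o}." for (s, p, o) in subgraph)

def check_serialization(subgraph, context):
    """Return triples that were dropped or mangled during serialization."""
    return [t for t in subgraph if f"{t[0]} {t[1]} {t[2]}." not in context]

context = serialize(triples)
assert check_serialization(triples, context) == []  # nothing lost
```

Run a check like this in CI after every KG refresh or serializer change, so grounding evals measure the model rather than a lossy serializer.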

Frequently Asked Questions

What is a knowledge graph?

A knowledge graph is a structured representation of facts as a graph of typed entities and typed relationships — supporting multi-hop reasoning, hierarchy, and explicit constraints that flat document corpora can't represent.

How is a knowledge graph different from vector RAG?

Vector RAG retrieves chunks by semantic similarity. A KG retrieves connected facts via graph traversal — better for multi-hop questions and explicit relationships. Modern KG-augmented RAG combines both.

How do you measure KG-augmented RAG quality?

Use FutureAGI's `Faithfulness` and `ContextRelevance` evaluators on retrieved subgraphs converted to context, plus `Groundedness` for response-level grounding. Track multi-hop answer accuracy on a graph-aware golden dataset.