What Is Enrichment?

Enrichment is an inference-time data preparation pattern that adds context, attributes, or metadata to raw inputs so downstream models, retrievers, or agents have better signal. In an LLM system it can attach customer-plan fields to a support query, tag a document before retrieval, expand a query before vector search, or add eval scores to a production trace. FutureAGI treats enrichment as measurable: compare enriched and unenriched paths for retrieval recall, answer faithfulness, and agent decision quality before shipping the pipeline.
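The pattern itself is small. A minimal sketch, assuming a hypothetical `enrich_query` helper and an in-memory plan-lookup table (neither is part of the FutureAGI API):

```python
def enrich_query(query: str, customer_id: str, plans: dict) -> dict:
    """Attach customer-plan fields to a raw support query before retrieval."""
    plan = plans.get(customer_id, {})
    return {
        "query": query,
        "customer_plan": plan.get("plan", "unknown"),
        "customer_tier": plan.get("tier", "unknown"),
    }

# Hypothetical plan store; in production this would be a CRM or billing lookup.
plans = {"c-42": {"plan": "Enterprise", "tier": "gold"}}
enriched = enrich_query("how do volume discounts work?", "c-42", plans)
# `enriched` now carries plan fields alongside the raw query text
```

Everything downstream (retriever, prompt template, trace logger) reads from the enriched record instead of the bare string.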

Why enrichment matters in production LLM and agent systems

The quality ceiling of an LLM application is often set by the prompt context, not the model. A good enrichment layer raises that ceiling for free; a bad one lowers it silently. In RAG, attaching customer_tier and product_line filters to a query routes the retriever to the right shard before vector search runs, cutting out a class of irrelevant chunks. In an agent stack, enriching a tool call with region, language, and risk_score lets the agent skip an unnecessary clarification turn. In analytics, enriching every trace with eval scores converts raw logs into a dataset you can slice by failure mode.
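The RAG case above reduces to translating enrichment fields into a metadata filter before similarity search. A sketch with a hypothetical `build_retrieval_filter` helper (the filter-dict shape varies by vector store):

```python
def build_retrieval_filter(enriched: dict) -> dict:
    """Translate enrichment fields into a metadata filter for the vector store."""
    filt = {}
    for key in ("customer_tier", "product_line"):
        if key in enriched:
            filt[key] = enriched[key]
    return filt

filt = build_retrieval_filter({
    "query": "volume discount terms?",
    "customer_tier": "enterprise",
    "product_line": "billing",
})
# The retriever applies `filt` before vector search, so chunks from other
# tiers and product lines never enter the candidate set.
```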

The pain shows up across roles. A retrieval engineer watches ContextRelevance stay flat after weeks of prompt tuning, then realizes the chunk metadata never made it into the index — every chunk lacked category and published_at, so the metadata filter was a no-op. An agent developer sees the agent re-asking the user for region on every turn because the user-profile enrichment is dropped between sessions. A data scientist trying to isolate a regression cohort cannot, because the traces were logged without prompt_version.

In multi-step agents, enrichment is the difference between a planner that can route on tool_risk_score and one that brute-forces every tool call. Step-level enrichment (attaching evaluator scores to each planner step) is what turns a multi-step trace into something you can debug and run regressions against.
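Step-level enrichment can be as simple as mapping a scoring function over the planner trace. A sketch with plain dicts standing in for planner steps and a placeholder `score_fn` (a real evaluator call would go there):

```python
def enrich_steps(steps: list, score_fn) -> list:
    """Attach a per-step evaluator score so each planner step is debuggable."""
    return [{**step, "eval_score": score_fn(step)} for step in steps]

trace = [
    {"step": 1, "tool": "search_kb", "tool_risk_score": 0.1},
    {"step": 2, "tool": "issue_refund", "tool_risk_score": 0.9},
]
# Placeholder scorer: invert risk so low-risk steps score high.
scored = enrich_steps(trace, lambda s: 1.0 - s["tool_risk_score"])
```

After this, a regression query like "show me steps with eval_score below 0.5" becomes possible on the trace itself.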

How FutureAGI evaluates enrichment

FutureAGI does not transform your raw business data, but it evaluates whether your enrichment actually helps. FutureAGI's approach is to score the enriched and unenriched paths side by side, then make the enrichment version visible on the trace. The fi.evals.ContextRelevance evaluator scores whether the chunks retrieved after enrichment match the query. Faithfulness scores whether the final answer respects the enriched context. Inside Agent Command Center, every gateway request can be enriched at the route layer with prompt_version, model_id, customer_tier, and route, and those fields land on the traceAI span automatically, so cohort analysis later is a query instead of an export-and-rejoin.
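Schematically, the route-layer enrichment copies a fixed set of fields onto the span's attributes. A sketch using plain dicts as stand-ins for traceAI spans (field values are illustrative, not real identifiers):

```python
def enrich_span(span: dict, route_ctx: dict) -> dict:
    """Copy route-layer fields onto the span so cohort queries need no rejoin."""
    keys = ("prompt_version", "model_id", "customer_tier", "route")
    attrs = {**span.get("attributes", {})}
    attrs.update({k: route_ctx[k] for k in keys if k in route_ctx})
    span["attributes"] = attrs
    return span

span = enrich_span({"name": "llm.call"}, {
    "prompt_version": "v12",
    "model_id": "example-model",
    "customer_tier": "enterprise",
    "route": "/support",
})
```

Because the fields ride on the span, slicing traces by prompt_version or customer_tier later is a filter on span attributes rather than a join against an external table.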

A concrete pattern: a customer-support team adds customer_plan, region, and last_ticket_topic to every agent request before it reaches the LLM. They run the same eval cohort with and without the enrichment, score ContextRelevance and Faithfulness, and find a 7-point lift over the unenriched baseline. The enrichment pipeline ships behind a gateway feature flag; if the lift collapses on a new model version, the team can disable the enrichment without redeploying app code. Compared with a Langfuse-style trace-attribute approach that records but does not score, FutureAGI lets the eval layer attribute the lift to the right pipeline stage.
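The feature-flag gate is a few lines at the request-preparation step. A sketch with a hypothetical `prepare_request` helper and a module-level flag (in production this would come from the gateway's flag service):

```python
ENRICHMENT_ENABLED = True  # gateway feature flag; flip off without an app redeploy

def prepare_request(query: str, profile: dict) -> dict:
    """Build the agent request, attaching enrichment only when the flag is on."""
    request = {"query": query}
    if ENRICHMENT_ENABLED:
        request.update({
            "customer_plan": profile.get("plan"),
            "region": profile.get("region"),
            "last_ticket_topic": profile.get("last_ticket_topic"),
        })
    return request

req = prepare_request("reset my password", {
    "plan": "Pro", "region": "eu-west", "last_ticket_topic": "billing",
})
```

Because the flag lives at the gateway rather than in app code, the enriched and unenriched paths stay available for side-by-side scoring.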

The engineer’s next step on a regression is to compare the enriched-vs-unenriched cohort scores and decide whether the enrichment, the model, or the retriever is the cause. The trace already has the answer.

How to measure enrichment

Treat enrichment as an A/B problem with eval coverage:

  • ContextRelevance — query-vs-chunk relevance score; should rise when the enrichment improves retrieval.
  • ContextRecall — fraction of needed chunks retrieved; rises with better metadata filtering.
  • Faithfulness — whether the answer respects the enriched context.
  • Span attributes — customer_tier, route, prompt_version, model_id, enrichment_version on every trace.
  • Cohort lift dashboard — pass-rate delta between enriched and unenriched cohorts on the same dataset.

For example, a single-evaluator check on an enriched query:

from fi.evals import ContextRelevance

result = ContextRelevance().evaluate(
    input="enterprise customer asking about volume discount",
    context="Volume discount policy for enterprise customers (Plan: Enterprise).",
)
print(result.score)
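The pass-rate delta behind the cohort lift dashboard can be computed directly from the two cohorts' evaluator scores. A minimal sketch with illustrative numbers (the 0.7 pass threshold is an assumption, not a FutureAGI default):

```python
def pass_rate(scores: list, threshold: float = 0.7) -> float:
    """Fraction of evaluator scores at or above the pass threshold."""
    return sum(s >= threshold for s in scores) / len(scores)

baseline = [0.55, 0.72, 0.61, 0.68, 0.75]  # unenriched cohort, same dataset
enriched = [0.74, 0.81, 0.69, 0.77, 0.83]  # enriched path, same dataset

lift = pass_rate(enriched) - pass_rate(baseline)
# A positive lift justifies keeping the enrichment; a collapse after a model
# change is the signal to flip the gateway flag off.
```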

Common mistakes

  • Enriching without measuring. Adding metadata that nobody scores is overhead with no proof of lift.
  • Stale enrichment. A customer_tier attached at request creation but never refreshed on plan changes is worse than no enrichment.
  • Schema drift. Adding a new enrichment field without updating downstream prompts means the field is invisible to the model.
  • Over-enrichment. Stuffing 20 metadata fields into the prompt context burns tokens and dilutes the actual signal.
  • Skipping enrichment of the trace itself. Without prompt_version and enrichment_version on the span, regression debugging is guesswork.

Frequently Asked Questions

What is enrichment in AI?

Enrichment is adding context, attributes, or metadata to raw data so downstream models or retrievers have more signal — for example, tagging a support query with customer plan and tier before sending it to an agent.

How is enrichment different from data augmentation?

Data augmentation creates new training examples (paraphrases, synthetic mutations) to improve model robustness. Enrichment adds attributes to existing data points to improve inference-time decisions, retrieval, or analytics.

How do you measure whether enrichment is working?

Compare ContextRelevance, ContextRecall, and final answer-quality scores between enriched and unenriched cohorts. FutureAGI's regression evaluators expose the lift directly per evaluator.