Models

What Is a Feature Vector?

The numerical representation of an input as a fixed-length array of values that a model can consume, forming the contract between preprocessing and inference.

A feature vector is a fixed-length numerical representation of an input that a model can read. In classical ML, it is the encoded row of values used for training or inference; in text and RAG systems, it is usually a token sequence, prompt embedding, or document embedding. The vector’s shape, type, order, and meaning must stay consistent between preprocessing, training, retrieval, and serving. FutureAGI treats feature-vector stability as an evaluation and tracing concern because drift in this representation can cause silent prediction or retrieval regressions.

Why feature vectors matter in production LLM and agent systems

The feature vector is where many production bugs hide. A categorical encoder trained on 47 cities silently maps a 48th to “unknown” and degrades accuracy in a new market. A tokenizer version mismatch between training and inference shifts every token id by one and produces invalid outputs. An embedding model upgrade leaves the vector store populated with old vectors that no longer cluster the same way. Each is a feature-vector contract violation; none of them shows up in code review.
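The categorical-encoder failure above is easy to reproduce with a minimal hand-rolled encoder (a sketch, not FutureAGI code; the city names and the reserved unknown id are illustrative):

```python
def fit_vocab(values):
    """Map each known category to an index; reserve 0 for unknowns."""
    return {v: i + 1 for i, v in enumerate(sorted(set(values)))}

def encode(value, vocab):
    """Unseen categories silently collapse to the 'unknown' id 0."""
    return vocab.get(value, 0)

train_cities = [f"city_{i}" for i in range(47)]  # 47 cities seen in training
vocab = fit_vocab(train_cities)

print(encode("city_3", vocab) > 0)   # known city gets a real id
print(encode("city_47", vocab))      # the 48th city maps to 0, silently
```

No exception is raised and no log line is emitted; the model simply receives the unknown id for every row from the new market, which is exactly why this class of bug survives code review.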

The pain falls on different roles. ML engineers debug a model that “worked yesterday” and find the upstream feature pipeline emitted a different shape. SREs see latency spikes that trace to a feature vector containing NaNs. Data scientists rerun the eval after a “trivial” preprocessing change and watch every number move. Compliance teams reading the EU AI Act’s data-governance article need provenance for every feature, not just the model.

In 2026-era stacks the feature vector is often an embedding produced by a learned model. That moves the failure mode from preprocessing bugs to embedding drift: when the embedding model upgrades, the entire vector store needs re-embedding or downstream retrieval starts surfacing the wrong chunks. FutureAGI’s RAG evaluators (ChunkAttribution, ChunkUtilization, Faithfulness) implicitly depend on a stable embedding contract, and EmbeddingSimilarity is the canonical drift detector: production-time distributions versus training-time should stay close.
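The drift signal described above can be sketched without any SDK: compare the centroid of production-time embeddings against the training-time centroid (toy 2-d vectors; the 0.5 threshold is illustrative, not a FutureAGI default):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

reference = [[1.0, 0.0], [0.9, 0.1]]   # training-time embeddings
production = [[0.0, 1.0], [0.1, 0.9]]  # production-time embeddings

drift = cosine(centroid(reference), centroid(production))
print(drift < 0.5)  # low centroid similarity flags embedding drift
```

A real deployment would compare larger samples and track the score over time rather than a single snapshot, but the contract is the same: the two distributions should stay close.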

How FutureAGI handles feature vector drift

FutureAGI’s approach is to surface embedding distribution and similarity as production signals, not just training artefacts. EmbeddingSimilarity runs as both an offline metric on Dataset and an online evaluator on production traces, tracking how close production prompts/responses are to the reference distribution the model was tuned on. When the distribution shifts past threshold, the dashboard flags it as embedding drift, and engineers know to check the embedding model version, the tokenizer, and the upstream preprocessor.

A concrete workflow: a RAG team upgrades from text-embedding-3-small to text-embedding-3-large. They run the new model on a sample of recent queries, compute EmbeddingSimilarity between the new and old vectors for the same inputs, and find a 0.78 mean similarity: close, but not identical. They re-embed the entire vector store rather than mixing vectors from two embedding spaces, then attach Faithfulness and ChunkAttribution to a Dataset.add_evaluation run to confirm no retrieval regression. Unlike a one-off Ragas faithfulness check, this joins vector drift, retrieval attribution, and answer quality in the same release record.

For tabular ML logged through fi.datasets.Dataset, schema-drift checks compare the production feature distribution to the dataset baseline. The Agent Command Center supports traffic mirroring, so a new embedding stack can run in parallel and its outputs can be compared before cutover. We’ve found that feature-vector contract violations are easier to catch with mirroring than with offline eval alone, because production data carries shape edge cases the dataset never saw.

How to measure or detect feature vector issues

Feature vector health is measured through distribution and contract checks:

  • EmbeddingSimilarity: cosine similarity between production-time and reference embeddings; drops flag drift.
  • Schema drift checks: production feature shape, type, and null rate compared to baseline.
  • Categorical novelty rate: proportion of category values unseen in training.
  • Numeric range checks: production min/max per feature against training quantile range.
  • Embedding-space PCA / projection: visual drift detection across releases.
  • Training-serving skew dashboard: per-feature distribution divergence (PSI or KL).
  • Trace fields: compare llm.token_count.prompt, embedding model version, and traceAI-langchain retriever spans across releases.
A minimal check with the SDK (the string fields here are placeholders for the actual embedding sources):

```python
from fi.evals import EmbeddingSimilarity

# Compare a production-time input against its training-time reference.
sim = EmbeddingSimilarity()
result = sim.evaluate(
    input="reference embedding source",
    output="production embedding source",
)
print("similarity:", result.score)  # a drop against baseline signals drift
```
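The training-serving skew check from the list above can be sketched as a per-feature PSI computation (the binning scheme and the 0.2 alert threshold are common conventions, not FutureAGI specifics):

```python
import math

def psi(expected, actual, bins=4):
    """Population Stability Index between a baseline and a production sample."""
    lo, hi = min(expected), max(expected)

    def frac(sample):
        counts = [0] * bins
        for x in sample:
            i = min(int((x - lo) / (hi - lo) * bins), bins - 1) if hi > lo else 0
            counts[max(i, 0)] += 1
        # Smooth empty bins so the log term stays defined.
        return [(c + 1e-6) / (len(sample) + 1e-6 * bins) for c in counts]

    e, a = frac(expected), frac(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [0.1 * i for i in range(100)]        # training distribution
shifted = [0.1 * i + 5.0 for i in range(100)]   # production shifted upward

print(psi(baseline, baseline) < 0.01)  # identical distributions score near zero
print(psi(baseline, shifted) > 0.2)    # shifted distribution crosses the alert level
```

KL divergence can be swapped in with the same structure; PSI is shown because it is symmetric in usage (binned on the baseline) and has widely used alert conventions (0.1 investigate, 0.2 act).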

Common mistakes

  • Skipping schema validation in serving. A column reorder or type change silently breaks the vector contract.
  • Mixing embedding versions in a vector store. Different embedding models live in different spaces; cosine similarity across them is meaningless.
  • No tokenizer version pinning. A tokenizer upgrade shifts every token id; pin in production.
  • Treating numeric drift as ignorable. A small mean shift can move a model across its decision boundary.
  • No production-time feature logging. Without raw feature logs, drift can’t be diagnosed retroactively.
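The first mistake above, skipping schema validation in serving, can be guarded against with a cheap runtime check on every vector before inference (the expected dimension is illustrative; a real contract would also pin feature order and semantics):

```python
import math

EXPECTED_DIM = 4  # contract agreed with the training pipeline (illustrative)

def validate_vector(vec, dim=EXPECTED_DIM):
    """Reject vectors that violate the shape/type/NaN contract before inference."""
    if len(vec) != dim:
        raise ValueError(f"expected {dim} features, got {len(vec)}")
    for i, x in enumerate(vec):
        if not isinstance(x, (int, float)):
            raise TypeError(f"feature {i} is not numeric: {x!r}")
        if isinstance(x, float) and math.isnan(x):
            raise ValueError(f"feature {i} is NaN")
    return vec

validate_vector([1.0, 2.0, 3.0, 4.0])  # passes silently
try:
    validate_vector([1.0, float("nan"), 3.0, 4.0])
except ValueError as e:
    print("rejected:", e)
```

The check costs microseconds per request and turns the silent contract violations listed above into loud, attributable failures at the serving boundary.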

Frequently Asked Questions

What is a feature vector?

A feature vector is the numerical representation of an input as a fixed-length array of values that a model can consume. It is the contract between preprocessing and inference — shape, type, and semantics must match across the pipeline.

How is a feature vector different from an embedding?

An embedding is a feature vector produced by a learned model rather than hand-engineered. All embeddings are feature vectors; not all feature vectors are embeddings. In modern LLM stacks the two terms are often interchangeable.

How do you detect feature vector drift?

FutureAGI tracks `EmbeddingSimilarity` between training-time and production-time inputs; significant divergence between the two distributions is an embedding-drift signal that flags upstream pipeline changes.