What Is a Vector DB?

A vector DB is the short form of vector database — a system designed to store, index, and query high-dimensional embedding vectors using approximate nearest-neighbor (ANN) search. In LLM stacks, the vector DB sits between the embedding model and the LLM: documents become embeddings, embeddings are indexed, queries are matched by similarity, and top-k chunks are returned for grounding. Common engines include Pinecone, pgvector, Weaviate, Qdrant, Milvus, ChromaDB, and LanceDB. FutureAGI does not replace any of them — we evaluate the retrievals they produce, regardless of engine.
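
A minimal, engine-agnostic sketch of that flow, with brute-force cosine similarity standing in for a real ANN index and an `embed()` stub standing in for a real embedding model:

import numpy as np

def embed(texts):
    # Stand-in for a real embedding model; returns L2-normalized vectors.
    rng = np.random.default_rng(0)
    vecs = rng.normal(size=(len(texts), 8))
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

docs = ["refund policy text ...", "shipping times text ...", "warranty terms text ..."]
index = embed(docs)  # the "index" here is just a matrix; a vector DB builds an ANN structure over it

def top_k(query, k=2):
    qv = embed([query])[0]
    sims = index @ qv                 # cosine similarity, since vectors are normalized
    best = np.argsort(-sims)[:k]      # exact search; ANN trades a little recall for speed
    return [(docs[i], float(sims[i])) for i in best]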

Why It Matters in Production LLM and Agent Systems

The vector DB is the highest-impact component in a RAG pipeline; it routinely makes a 10-20 point difference on faithfulness scores. A bad retrieval feeds the LLM the wrong context, and the model hallucinates around the gap even when the model itself is fine. A good retrieval lets a smaller, cheaper model answer correctly, because the right facts are in front of it.

Different roles see different failures. ML engineers tune chunk size and embedding model without measuring downstream retrieval quality. Data engineers re-index after a corpus refresh and discover that thresholds tuned against the old distribution no longer hold. SREs get paged on similarity-search p99 latency under load. Compliance leads worry about cross-tenant leakage: a near-duplicate query can pull another tenant’s chunks if partition keys are missing. Product owners see “the bot doesn’t know about X” because ingestion silently dropped a document.

In 2026 agent stacks the vector DB sees more traffic and more variation. Agents may issue several queries per user request — initial retrieval, query refinement, re-retrieval based on tool outputs. Each retrieval is a span in the trajectory and an opportunity for failure to compound. Without trace-level retrieval evals, you cannot tell whether the answer was wrong because the model failed or because the retrieval failed.
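
To make that concrete, here is a sketch of per-span retrieval scoring, reusing the `fi.evals` pattern shown later in this entry; the `trajectory` list is a hypothetical stand-in for the retrieval spans `traceAI` captures:

from fi.evals import ContextRelevance

# Hypothetical agent trajectory: one entry per retrieval span.
trajectory = [
    {"query": "original user question", "chunks": ["chunk text ..."]},
    {"query": "refined query after a tool call", "chunks": ["chunk text ..."]},
]

for i, span in enumerate(trajectory):
    score = ContextRelevance().evaluate(input=span["query"], context=span["chunks"])
    print(f"retrieval span {i}: ContextRelevance = {score}")  # pinpoint which hop failed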

How FutureAGI Handles Vector DBs

FutureAGI’s approach is to be vendor-neutral: whatever vector DB you run, the evaluator suite is the same. With traceAI-langchain or traceAI-llamaindex, every retrieval call becomes a span carrying the chunks returned, similarity scores, and the downstream model output. The same fi.evals evaluators run regardless of whether the index is Pinecone, pgvector, or Weaviate.

Concretely: a knowledge-base RAG application stores 800K chunks in pgvector. After a chunk-size change, evals show ContextRelevance dropping from 0.82 to 0.71 on a regression cohort. The team uses traceAI to drill into specific failed traces, sees that long policy chunks are now being split in half, and rolls back the chunk-size change. They then run an A/B test through the Agent Command Center: 10% of traffic uses the new chunk size, 90% the old, and per-cohort Groundedness confirms the regression is intent-specific. Unlike a vector-DB-native dashboard that reports only ANN latency and recall@k against synthetic queries, FutureAGI reports retrieval quality as a function of user outcomes.
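
The per-cohort comparison is a simple aggregation once eval scores are attached to traces; a minimal sketch, assuming hypothetical exported trace records carrying a cohort label and a ContextRelevance score:

from collections import defaultdict

traces = [  # hypothetical records exported from the trace store
    {"cohort": "new_chunk_size", "context_relevance": 0.70},
    {"cohort": "old_chunk_size", "context_relevance": 0.83},
]

by_cohort = defaultdict(list)
for t in traces:
    by_cohort[t["cohort"]].append(t["context_relevance"])

for cohort, scores in by_cohort.items():
    print(cohort, sum(scores) / len(scores))  # compare cohort means to confirm the regression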

How to Measure or Detect It

Signals to track:

  • fi.evals.ContextRelevance: 0–1 score per retrieval; primary retrieval-quality signal.
  • fi.evals.Groundedness: scores answer support against retrieved chunks.
  • fi.evals.ChunkAttribution: which chunks were used in the response.
  • Recall@k against a labeled query set: classical IR metric, useful for offline tuning; see the sketch after the snippet below.
  • Similarity-search p99 latency: leading indicator of index health under load.
  • Embedding-distribution drift: shift in the embedding distribution after a corpus refresh; thresholds tuned on the old distribution may no longer hold.
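
A minimal usage sketch, assuming `q` (the user query), `retrieved` (the chunks returned by the vector DB), and `answer` (the downstream model output) are already in scope: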
from fi.evals import ContextRelevance, Groundedness

cr = ContextRelevance().evaluate(input=q, context=retrieved)  # retrieval quality, 0-1
gr = Groundedness().evaluate(input=q, output=answer, context=retrieved)  # answer support against the retrieved chunks
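
Recall@k from the signals list can be computed offline against a labeled query set; a minimal sketch, assuming you have labeled relevant-chunk IDs per query:

def recall_at_k(retrieved_ids, relevant_ids, k):
    # Fraction of the labeled relevant chunks that appear in the top-k results.
    hits = len(set(retrieved_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids) if relevant_ids else 0.0

# Example: 2 of 3 labeled-relevant chunks retrieved in the top 5 -> recall@5 = 0.67.
print(recall_at_k(["c1", "c7", "c2", "c9", "c3"], ["c1", "c2", "c4"], k=5))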

Common Mistakes

  • Picking a vector DB by feature list rather than benchmark. Run your own corpus and your own queries; published benchmarks rarely match your distribution.
  • Mixing embedding-model versions in one index. A re-embedding migration must be a full re-index; partial migration produces silent retrieval garbage.
  • One global k. Different intents need different k; a fixed k=5 hurts recall on long-form questions and inflates latency on short ones.
  • No partition keys for multi-tenant systems. Tenant isolation belongs in the index, not the application code; see the sketch after this list.
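
As referenced in the last item, tenant isolation can live in the index query itself. A pgvector sketch, assuming a hypothetical `chunks` table with `tenant_id` and `embedding` columns (the schema is illustrative):

import psycopg2
from pgvector.psycopg2 import register_vector

conn = psycopg2.connect("dbname=rag")
register_vector(conn)  # lets psycopg2 pass numpy arrays as pgvector values

def search(tenant_id, query_vec, k=5):
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT id, content
            FROM chunks
            WHERE tenant_id = %s          -- isolation enforced in the index query, not app code
            ORDER BY embedding <=> %s     -- pgvector cosine-distance operator
            LIMIT %s
            """,
            (tenant_id, query_vec, k),
        )
        return cur.fetchall()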

Frequently Asked Questions

What is a vector DB?

A vector DB is the short form of vector database — a system that stores, indexes, and queries high-dimensional embedding vectors using approximate nearest-neighbor search. It is the retrieval engine for RAG and semantic search.

How do you choose a vector DB?

Choose by the latency-vs-recall tradeoff, scale and operational model (e.g., Pinecone as a managed cloud service, pgvector inside an existing Postgres), metadata-filter capabilities, and integration with your embedding pipeline. Run a benchmark on your own corpus before committing.

How does FutureAGI work with a vector DB?

FutureAGI evaluates retrieval quality on top of any vector DB. With `traceAI` instrumented, every retrieval becomes a span; `fi.evals.ContextRelevance` and `Groundedness` score the retrieved chunks and downstream answer.