What Is a Vector DB?
A vector DB is a database optimized for storing high-dimensional embedding vectors and running similarity search — usually approximate nearest-neighbor — at low latency and high recall. It is the retrieval engine for retrieval-augmented generation (RAG), semantic search, recommendation, deduplication, and clustering in LLM systems. Common engines include Pinecone, pgvector, Weaviate, Qdrant, Milvus, ChromaDB, and LanceDB. The vector DB sits between the embedding model and the LLM: documents go in, embeddings are stored, the query is embedded, top-k chunks come back for grounding. FutureAGI evaluates the quality of that retrieval via fi.evals.
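That flow can be sketched in a few lines of Python. This is a toy illustration with brute-force cosine similarity (real engines use ANN indices such as HNSW), and `embed`, `chunks`, and `user_query` are placeholders for your embedding model, chunked corpus, and incoming query:

```python
import numpy as np

def top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 5) -> np.ndarray:
    # Normalize so the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return np.argsort(-(d @ q))[:k]  # indices of the k most similar chunks

# embed(), chunks, and user_query are stand-ins for your own pipeline.
doc_vecs = np.stack([embed(c) for c in chunks])   # ingest: documents -> stored vectors
idx = top_k(embed(user_query), doc_vecs, k=5)     # query: embedded, top-k returned
context = [chunks[i] for i in idx]                # grounding passed to the LLM
```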
Why It Matters in Production LLM and Agent Systems
The vector DB is the single highest-impact component in a RAG pipeline. A small change in chunk size, embedding model, or k can swing answer quality by 10–20 points on RAG faithfulness scores. A bad retrieval makes even the strongest LLM hallucinate, because the context is wrong; a good retrieval makes a smaller LLM answer correctly. Yet most teams instrument the LLM and forget the index.
Different roles see different problems. ML engineers chase chunk-size hyperparameters without measuring per-cohort recall. Data engineers see corpus refreshes that change embedding distributions and break previously tuned similarity thresholds. SREs see latency spikes when a query asks for k=100 instead of k=10. Compliance leads worry about cross-tenant leakage via near-duplicate queries. Product owners see “the bot doesn’t know about X” complaints when ingestion silently dropped a document.
In 2026-era agent stacks the failure surface expands further. Agents don’t just retrieve once — they may rerank, refine the query, and re-retrieve based on intermediate tool output. Each retrieval call is a step in the trajectory; each is an opportunity to fail. Trace-level evaluators catch retrieval failures at the right step instead of attributing them to the final response.
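A rough sketch of that pattern, with `retrieve` and `llm_step` as hypothetical stand-ins for the vector-DB call and the model call, shows why each step needs its own evaluation:

```python
# Each retrieval call is one step in the trajectory; recording them separately
# lets a trace-level evaluator score the step that actually failed.
trajectory = []
query = user_question
for step in range(max_steps):
    chunks = retrieve(query, k=10)                            # hypothetical vector-DB call
    answer, refined_query = llm_step(user_question, chunks)   # hypothetical model call
    trajectory.append({"step": step, "query": query, "chunks": chunks, "answer": answer})
    if refined_query is None:                                 # model is satisfied with its context
        break
    query = refined_query                                     # otherwise refine and re-retrieve
```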
How FutureAGI Handles Vector Databases
FutureAGI does not replace your vector store — Pinecone, pgvector, and Weaviate are excellent at what they do. We are the evaluation layer above them. With traceAI-langchain or traceAI-llamaindex instrumented, every retrieval becomes a span in the trace; the chunks returned, their similarity scores, and the downstream model output are all captured.
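Setup is a one-time registration. The sketch below follows the usual traceAI pattern of registering a tracer provider and instrumenting the framework; treat the exact module and argument names as assumptions and check the traceAI-langchain docs for your version:

```python
# Sketch only: module paths, register() arguments, and instrumentor names are
# assumptions based on the typical traceAI setup; verify against the docs.
from fi_instrumentation import register
from traceai_langchain import LangChainInstrumentor

trace_provider = register(project_name="enterprise-search-rag")
LangChainInstrumentor().instrument(tracer_provider=trace_provider)
# From here on, every LangChain retrieval and LLM call is captured as a span.
```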
Concretely: an enterprise-search RAG pipeline indexes 1.2 M documents in Qdrant, embedded with a 1536-d OpenAI model. With FutureAGI instrumented, every query produces a trace with the retrieved chunks attached. fi.evals.ContextRelevance scores whether each retrieved chunk matches the user intent; fi.evals.Groundedness scores whether the final answer is supported by the chunks; fi.evals.ChunkAttribution flags claims that point back to no chunk at all (a hallucination signal). A nightly regression eval runs the full suite against a 500-row golden set; a Slack alert fires when ContextRelevance drops more than 5 points week-over-week. Unlike vector-DB-native dashboards that report system metrics, FutureAGI reports retrieval quality as a metric tied to user outcomes.
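A nightly job like that can be a short script. In the sketch below, `retrieve()`, `load_last_week_mean()`, `SLACK_WEBHOOK_URL`, and the `.score` attribute on the eval result are assumptions to adapt to your pipeline and the actual fi.evals return types:

```python
import json
import statistics
import urllib.request

from fi.evals import ContextRelevance

# 500-row golden set: one labeled query per line.
with open("golden_set.jsonl") as f:
    rows = [json.loads(line) for line in f]

scores = []
for row in rows:
    chunks = retrieve(row["query"], k=5)      # hypothetical call into the vector DB
    result = ContextRelevance().evaluate(input=row["query"], context=chunks)
    scores.append(result.score)               # assumption: result exposes a numeric score

mean_now = 100 * statistics.mean(scores)
mean_prev = load_last_week_mean()             # hypothetical: last week's mean from a metrics store

if mean_prev - mean_now > 5:                  # >5-point week-over-week drop
    msg = {"text": f"ContextRelevance dropped {mean_prev - mean_now:.1f} pts week-over-week"}
    req = urllib.request.Request(
        SLACK_WEBHOOK_URL,                    # hypothetical Slack webhook URL
        data=json.dumps(msg).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```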
How to Measure or Detect It
Signals for vector-DB retrieval quality:
- fi.evals.ContextRelevance: 0–1 score per retrieval; flags off-topic chunks before they reach generation.
- fi.evals.Groundedness: scores whether the final answer is supported by the retrieved chunks.
- fi.evals.ChunkAttribution and ChunkUtilization: which retrieved chunks were actually used, and how thoroughly.
- Recall@k: percent of relevant chunks present in the top-k for a labeled query set; tracked per tenant or corpus segment.
- Latency p99 of similarity search: a leading indicator of index health under load and after a re-index.
- Embedding-distribution drift: shifts after corpus refresh or embedding-model upgrade, useful as an early warning before quality scores drop.
- Empty-result rate: queries returning zero candidates often indicate ingestion gaps or filter mismatches.
from fi.evals import ContextRelevance, Groundedness

# q: the user query; retrieved: chunk texts returned by the vector DB;
# answer: the model's final response
ctx = ContextRelevance().evaluate(input=q, context=retrieved)
grd = Groundedness().evaluate(input=q, output=answer, context=retrieved)
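The Recall@k and empty-result-rate signals above can be computed offline without the platform. A minimal sketch, assuming each labeled row records the ids the index returned and the ids a reviewer marked relevant:

```python
def recall_at_k(rows, k=10):
    # Fraction of labeled-relevant chunks present in the top-k, averaged over queries.
    hits = [
        len(set(r["retrieved_ids"][:k]) & set(r["relevant_ids"])) / max(len(r["relevant_ids"]), 1)
        for r in rows
    ]
    return sum(hits) / len(hits)

def empty_result_rate(rows):
    # Share of queries for which the index returned no candidates at all.
    return sum(1 for r in rows if not r["retrieved_ids"]) / len(rows)
```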
Common Mistakes
- Treating chunk size as a global hyperparameter. Code, prose, and tables need different chunkers; one global setting caps recall on heterogeneous corpora.
- Upgrading the embedding model without re-indexing. Mixing old and new embeddings produces silent retrieval garbage; full re-index is mandatory.
- No per-tenant partition. Shared indices leak chunks across tenants on near-duplicate queries; partition keys are not optional.
- Picking k by tradition. k=5 may be wasteful or insufficient depending on the task; tune k against ContextRelevance and downstream answer scores (see the sketch after this list).
- Confusing system-level p99 with retrieval quality. A vector DB can serve fast, on-spec ANN responses while the chunks returned are off-topic; always pair latency dashboards with eval scores.
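A minimal sketch of that k sweep, assuming a `retrieve(query, k)` helper into your vector DB, a `sample_queries` list, and that the eval result exposes a numeric score:

```python
from fi.evals import ContextRelevance

# Score retrieval relevance at several k values over a sample of real queries,
# then weigh the quality gain against the latency and token cost of a larger k.
for k in (3, 5, 10, 20):
    scores = [
        ContextRelevance().evaluate(input=q, context=retrieve(q, k=k)).score  # .score is an assumption
        for q in sample_queries
    ]
    print(f"k={k}: mean ContextRelevance {sum(scores) / len(scores):.3f}")
```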
Frequently Asked Questions
What is a vector database?
A vector database is a database optimized for storing high-dimensional embedding vectors and running fast approximate nearest-neighbor similarity search — the retrieval engine for RAG, semantic search, and recommendation.
How is a vector database different from a traditional database?
A traditional database queries by exact key, range, or full-text. A vector database queries by similarity in embedding space, returning the top-k closest vectors using ANN indices like HNSW or IVF-PQ.
How do you evaluate retrieval quality from a vector database?
FutureAGI runs `fi.evals.ContextRelevance`, `Groundedness`, and `ChunkAttribution` against every retrieval, attributing answer quality to specific retrieved chunks regardless of whether the index is Pinecone, pgvector, or Weaviate.