What Is a Vector DB?

A database designed to store, index, and query high-dimensional embedding vectors using approximate nearest-neighbor search.

A vector DB (short for vector database) is a system designed to store, index, and query high-dimensional embedding vectors using approximate nearest-neighbor (ANN) search. In LLM stacks, the vector DB sits between the embedding model and the LLM: documents become embeddings, embeddings are indexed, queries are matched by similarity, and the top-k chunks are returned for grounding. Common engines include Pinecone, pgvector, Weaviate, Qdrant, Milvus, ChromaDB, LanceDB, Turbopuffer, and MongoDB Atlas Vector Search. FutureAGI does not replace any of them; we evaluate the retrievals they produce, regardless of engine. The 2026 shift worth noting is that disk-first engines (Turbopuffer, LanceDB) have taken share from in-memory-only stores for corpora over ~10M chunks, where cost matters more than absolute QPS.
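The embed, index, match, return-top-k flow described above can be sketched with an exact (brute-force) search in plain Python. This is only an illustration: the three-dimensional vectors, the chunk ids, and the helper names `cosine_similarity` and `top_k` are all made up for the example, and a production engine would replace the linear scan with an ANN index.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float],
          index: List[Tuple[str, Sequence[float]]],
          k: int) -> List[Tuple[str, float]]:
    """Exact search: score every stored vector and return the k most similar."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real models emit hundreds to thousands of dimensions.
toy_index = [
    ("refund-policy", [0.9, 0.1, 0.0]),
    ("shipping-times", [0.1, 0.9, 0.0]),
    ("privacy-notice", [0.0, 0.1, 0.9]),
]
query_vec = [0.8, 0.2, 0.0]  # stand-in for an embedded question about refunds
results = top_k(query_vec, toy_index, k=2)
```

The chunk ids in `results` are what a RAG pipeline would pass to the LLM as grounding context. ANN indexes trade a small amount of recall against this exact scan in exchange for much lower latency at scale.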

Why vector DBs matter in production LLM and agent systems

The vector DB is the highest-impact component in a RAG pipeline; it routinely makes a 10-20 point difference on faithfulness scores. A bad retrieval feeds the LLM the wrong context and the model hallucinates around the gap, even though the model itself is fine. A good retrieval makes a smaller, cheaper model answer correctly because the right facts are in front of it.

Different roles see different failures. ML engineers tune chunk size and embedding model without measuring downstream retrieval quality. Data engineers re-index after a corpus refresh and discover that thresholds tuned against the old distribution no longer hold. SREs get paged on similarity-search p99 latency under load. Compliance leads worry about cross-tenant leaks: a near-duplicate query can pull another tenant’s chunks if partition keys are missing. Product owners see “the bot doesn’t know about X” because ingestion silently dropped a document.

In 2026 agent stacks the vector DB sees more traffic and more variation. Agents may issue several queries per user request: initial retrieval, query refinement, and re-retrieval based on tool-use outputs. Each retrieval is a span in the agent trajectory and an opportunity for failure to compound. Without trace-level retrieval evals, you cannot tell whether the answer was wrong because the model failed or because the retrieval failed.
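One way to picture the retrieval-versus-model question is a small triage function over per-span scores. The sketch below is hypothetical throughout: `RetrievalSpan`, `diagnose`, and both thresholds are invented for illustration and are not part of traceAI or fi.evals.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RetrievalSpan:
    step: str                 # e.g. "initial", "refined", "post-tool"
    context_relevance: float  # 0-1 relevance score for the chunks this span returned


def diagnose(spans: List[RetrievalSpan],
             groundedness: float,
             relevance_floor: float = 0.6,      # assumed threshold, tune per corpus
             groundedness_floor: float = 0.7) -> str:  # assumed threshold
    """Attribute a bad answer to retrieval spans first, then to generation."""
    weak_steps = [s.step for s in spans if s.context_relevance < relevance_floor]
    if weak_steps:
        return "retrieval:" + ",".join(weak_steps)
    if groundedness < groundedness_floor:
        return "generation"
    return "ok"


trajectory = [
    RetrievalSpan("initial", 0.91),
    RetrievalSpan("refined", 0.42),   # the refined query drifted off-topic
    RetrievalSpan("post-tool", 0.88),
]
verdict = diagnose(trajectory, groundedness=0.55)
```

Checking retrieval spans before the final answer score reflects the argument above: a low-groundedness answer built on weak chunks is a retrieval problem first.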

How FutureAGI handles vector DBs

FutureAGI’s approach is to be vendor-neutral: whatever vector DB you run, the evaluator suite is the same. With traceAI-langchain or traceAI-llamaindex, every retrieval call becomes a span carrying the chunks returned, similarity scores, and the downstream model output. The same fi.evals evaluators run regardless of whether the index is Pinecone, pgvector, or Weaviate.

For offline calibration, public RAG benchmarks anchor the “is this acceptable?” question: RAGTruth (18K labeled chunks) reports frontier groundedness failure rates of roughly 5-8%, RAGBench spans 100K examples across five industry verticals, and CRAG (4,409 questions, KDD Cup 2024) covers five domains with deliberately noisy retrieval. Running a candidate vector index through these gives a public baseline before any production swap.

Concretely: a knowledge-base RAG application stores 800K chunks in pgvector. After a chunk-size change, evals show ContextRelevance dropping from 0.82 to 0.71 on a regression eval cohort. The team uses traceAI to drill into specific failed traces, sees that long policy chunks are now split in half, and rolls back the chunk-size change. They then run an A/B test through the Agent Command Center: 10% of traffic uses the new chunk size, 90% the old, and per-cohort Groundedness confirms the regression is intent-specific.

Unlike a vector-DB-native dashboard reporting only ANN latency and recall@k against synthetic queries, FutureAGI reports retrieval quality as a function of user outcomes.
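The 10%/90% split and the per-cohort comparison in the example above can be sketched as follows. This is not how the Agent Command Center implements routing; the hash-bucket assignment, the `assign_arm` and `per_cohort_delta` helpers, the intent labels, and the scores are all assumptions made up for illustration.

```python
import hashlib
from statistics import mean
from typing import Dict, List, Tuple


def assign_arm(request_id: str, treatment_share: float = 0.10) -> str:
    """Deterministically map a request id to the 'new' or 'old' chunking arm."""
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "new" if bucket < treatment_share else "old"


def per_cohort_delta(rows: List[dict]) -> Dict[str, float]:
    """Mean groundedness of the 'new' arm minus the 'old' arm, per intent cohort."""
    by_key: Dict[Tuple[str, str], List[float]] = {}
    for row in rows:
        by_key.setdefault((row["intent"], row["arm"]), []).append(row["groundedness"])
    deltas: Dict[str, float] = {}
    for intent in sorted({intent for intent, _ in by_key}):
        new_scores = by_key.get((intent, "new"))
        old_scores = by_key.get((intent, "old"))
        if new_scores and old_scores:  # skip cohorts seen in only one arm
            deltas[intent] = round(mean(new_scores) - mean(old_scores), 3)
    return deltas


# Made-up scores: the regression shows up only for policy questions.
rows = [
    {"intent": "policy", "arm": "new", "groundedness": 0.60},
    {"intent": "policy", "arm": "new", "groundedness": 0.62},
    {"intent": "policy", "arm": "old", "groundedness": 0.82},
    {"intent": "policy", "arm": "old", "groundedness": 0.80},
    {"intent": "faq", "arm": "new", "groundedness": 0.85},
    {"intent": "faq", "arm": "old", "groundedness": 0.85},
]
deltas = per_cohort_delta(rows)
```

Hashing the request id keeps assignment stable across retries, and breaking the delta out by intent is what makes an intent-specific regression visible instead of being averaged away.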

Index types worth knowing in 2026

| Index | Recall vs latency | Memory | Best for |
|---|---|---|---|
| HNSW | High recall, sub-10ms | RAM-heavy | hot search on under 100M chunks |
| IVF-PQ | Lower recall, very fast | quantized | large corpora, billion-scale |
| ScaNN | Strong tradeoff | mid | mixed scale, Google ecosystem |
| DiskANN | High recall on SSD | low RAM | cost-sensitive cold corpora |
| Flat (exact) | 100% recall, slow | linear in N | small dev indices, benchmarks |
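The memory column can be made concrete with back-of-the-envelope arithmetic. The sketch below assumes float32 vectors for the uncompressed case and 8-bit product-quantization codes for the quantized case; the 1536-dimension and 96-subquantizer figures are illustrative assumptions, and the estimates ignore codebooks, graph links, ids, and metadata.

```python
def raw_float32_bytes(num_vectors: int, dim: int) -> int:
    """Bytes needed to hold uncompressed float32 vectors (4 bytes per component)."""
    return num_vectors * dim * 4


def pq_code_bytes(num_vectors: int, num_subquantizers: int, bits_per_code: int = 8) -> int:
    """Bytes needed for product-quantization codes only (codebooks excluded)."""
    return num_vectors * num_subquantizers * bits_per_code // 8


def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB."""
    return num_bytes / 2**30


num_chunks = 800_000   # corpus size borrowed from the pgvector example above
dim = 1536             # assumed embedding dimension
raw = raw_float32_bytes(num_chunks, dim)
pq = pq_code_bytes(num_chunks, num_subquantizers=96)
print(f"raw float32: {to_gib(raw):.2f} GiB, PQ codes: {to_gib(pq):.2f} GiB")
```

Under these assumptions the raw vectors need about 4.6 GiB while the PQ codes need about 0.07 GiB, which is why quantized and disk-resident indexes become attractive as N grows even though they give up some recall.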

How to measure or detect vector DB retrieval quality

Signals to track:

  • fi.evals.ContextRelevance: 0–1 score per retrieval; primary retrieval-quality signal.
  • fi.evals.Groundedness: scores answer support against retrieved chunks.
  • fi.evals.ChunkAttribution: which chunks were used in the response.
  • Recall@k against a labeled query set: classical IR metric, useful for offline tuning.
  • Similarity-search p99 latency: leading indicator of index health under load.
  • Embedding-distribution drift: distribution shift after corpus refresh.
  • Empty-result rate: a sudden rise often signals ingestion gaps or filter mismatches.
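
```python
from fi.evals import ContextRelevance, Groundedness

# rag_dataset is assumed to be an iterable of rows exposing
# query, retrieved, and answer fields plus an attach_scores method.
for row in rag_dataset:
    # Score how relevant the retrieved chunks are to the query.
    cr = ContextRelevance().evaluate(input=row.query, context=row.retrieved)
    # Score how well the answer is supported by those chunks.
    gr = Groundedness().evaluate(input=row.query, output=row.answer, context=row.retrieved)
    row.attach_scores(context_relevance=cr.score, groundedness=gr.score)
```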
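The classical signals in the list above (recall@k, empty-result rate, p99 latency) need no evaluator library. A minimal sketch, assuming you already have retrieved chunk ids, a labeled set of relevant ids per query, and a list of latency samples; the helper names and the nearest-rank percentile method are choices made for this example.

```python
import math
from typing import List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the labeled relevant chunks that appear in the top-k results."""
    if not relevant:
        raise ValueError("recall@k is undefined when no chunk is labeled relevant")
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def empty_result_rate(result_lists: List[List[str]]) -> float:
    """Share of queries for which the index returned nothing."""
    if not result_lists:
        raise ValueError("need at least one query result")
    return sum(1 for r in result_lists if not r) / len(result_lists)


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile, e.g. pct=99 for p99 latency."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up observations for illustration.
recall = recall_at_k(["c1", "c7", "c3"], relevant={"c1", "c3", "c9"}, k=2)
empties = empty_result_rate([["c1"], [], ["c4", "c2"], []])
p99_ms = percentile([float(ms) for ms in range(1, 101)], 99)
```

Tracking these alongside ContextRelevance and Groundedness covers both the classical IR view and the outcome view of retrieval quality.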

Common mistakes

  • Picking a vector DB by feature list rather than benchmark. Run your own corpus and your own queries; published benchmarks rarely match your distribution.
  • Mixing embedding-model versions in one index. A re-embedding migration must be a full re-index; partial migration produces silent retrieval garbage.
  • One global k. Different intents need different k; a fixed k=5 hurts recall on long-form questions and inflates latency on short ones.
  • No partition keys for multi-tenant systems. Tenant isolation belongs in the index, not the application code.
  • Skipping reranking on the top-50. A small cross-encoder rerank often closes the gap between cheap embeddings and expensive ones.
  • Trusting recall@k as the only metric. A high recall with off-topic top-1 still hurts ContextRelevance; pair both signals.
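Two of the mistakes above, mixed embedding versions and missing partition keys, can be guarded against at the index boundary. The toy in-memory class below is a sketch of the idea only: `PartitionedIndex` is hypothetical, uses a plain dot product for scoring, and stands in for whatever namespace, partition, or filter mechanism your engine provides.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class PartitionedIndex:
    """Toy index: one partition per tenant, one embedding model version overall."""

    def __init__(self, embedding_version: str) -> None:
        self.embedding_version = embedding_version
        self._partitions: Dict[str, List[Tuple[str, Sequence[float]]]] = defaultdict(list)

    def add(self, tenant_id: str, chunk_id: str,
            vector: Sequence[float], embedding_version: str) -> None:
        # Refuse vectors from another model version instead of mixing them silently.
        if embedding_version != self.embedding_version:
            raise ValueError(
                f"index holds {self.embedding_version} vectors, got {embedding_version}"
            )
        self._partitions[tenant_id].append((chunk_id, vector))

    def search(self, tenant_id: str, query: Sequence[float], k: int) -> List[str]:
        # Only this tenant's partition is ever scanned, so results cannot cross tenants.
        candidates = self._partitions.get(tenant_id, [])
        scored = sorted(
            candidates,
            key=lambda item: sum(q * v for q, v in zip(query, item[1])),
            reverse=True,
        )
        return [chunk_id for chunk_id, _ in scored[:k]]


index = PartitionedIndex(embedding_version="embed-v2")  # hypothetical version label
index.add("tenant-a", "a-handbook", [1.0, 0.0], "embed-v2")
index.add("tenant-b", "b-handbook", [1.0, 0.0], "embed-v2")  # near-duplicate content
hits = index.search("tenant-a", [1.0, 0.0], k=5)
```

Because the tenant id selects the partition before any similarity scoring happens, a near-duplicate chunk owned by another tenant never becomes a candidate, and a forgotten filter in application code cannot leak it.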

Frequently Asked Questions

What is a vector DB?

A vector DB (short for vector database) is a system that stores, indexes, and queries high-dimensional embedding vectors using approximate nearest-neighbor search. It is the retrieval engine for RAG and semantic search.

How do you choose a vector DB?

Choose by latency-vs-recall tradeoff, scale and deployment model (for example, Pinecone for a managed cloud service, pgvector for vectors stored inside an existing Postgres database), filter capabilities, and integration with your embedding pipeline. Run a benchmark on your own corpus before committing.

How does FutureAGI work with a vector DB?

FutureAGI evaluates retrieval quality on top of any vector DB. With `traceAI` instrumented, every retrieval becomes a span; `fi.evals.ContextRelevance` and `Groundedness` score the retrieved chunks and downstream answer.