What Is a Reranker?
A second-pass scoring model, typically a cross-encoder, that re-orders the top-k passages from a first-stage retriever to lift top-1 relevance precision.
A reranker is a second-pass scoring model that re-orders the top-k passages returned by a first-stage retriever (usually vector search) to put the most relevant ones on top before generation. Most production rerankers are cross-encoders: they take the query and a candidate passage as a joint input and output a single relevance score — slower per pair than a bi-encoder embedding lookup, but far more precise because they attend across query and passage tokens together. The pattern is retrieve-then-rerank: fetch the top-20 with cheap vector search, rerank to top-3 with the cross-encoder, generate from the top-3. Rerankers do not replace vector search; they boost its precision.
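A minimal sketch of the retrieve-then-rerank pattern, using the open-source sentence-transformers CrossEncoder as a stand-in for a hosted reranker (the checkpoint name and the example passages are illustrative):

```python
from sentence_transformers import CrossEncoder

# Illustrative open-source checkpoint; a hosted reranker (Cohere, Voyage,
# Jina) slots in behind the same interface in production.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, candidates: list[str], final_k: int = 3) -> list[str]:
    """Re-order first-stage candidates and keep the best final_k.
    candidates should be the top-20 to top-50 from your vector search."""
    # The cross-encoder scores each (query, passage) pair jointly,
    # attending across query and passage tokens together.
    scores = reranker.predict([(query, p) for p in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [passage for passage, _ in ranked[:final_k]]

# Example usage with stand-in candidates:
top_passages = rerank("How do I rotate an API key?", candidates=[
    "API keys can be regenerated from the account settings page.",
    "Rotation policies should be reviewed quarterly.",
    "Our office rotates on-call duty weekly.",
])
```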
Why It Matters in Production LLM and Agent Systems
Vector search has structural precision limits. Embedding models are bi-encoders — query and document are encoded independently, then compared with a single dot product or cosine similarity. That is fast but loses query-specific signal. The result is decent recall (the right doc is usually in the top-20) but mediocre top-1 precision (the right doc is often not at top-1). A reranker closes that gap by attending across query and document tokens jointly, surfacing the actually-best match from the candidate pool.
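For contrast, here is the bi-encoder path described above, with an illustrative open-source model: query and document never attend to each other; only their pooled vectors meet at scoring time.

```python
from sentence_transformers import SentenceTransformer, util

bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model

# Query and document are encoded independently of each other...
query_vec = bi_encoder.encode("How do I rotate an API key?")
doc_vec = bi_encoder.encode("API keys can be regenerated from the settings page.")

# ...then compared with a single cosine similarity. All query-specific
# signal has to survive the pooling into one fixed vector per side.
print(util.cos_sim(query_vec, doc_vec))
```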
The pain shows up across roles. ML engineers see RAG hallucinations that turn out to be ranking errors: the right chunk was retrieved but not selected. Retrieval engineers see eval failure rates plateau on long-tail queries because dense-only retrieval cannot distinguish near-duplicates. Product managers see “the bot quotes the wrong section” tickets when two policy clauses are similarly worded. Cost engineers see token waste from over-fetching k=20 to compensate for poor top-1 precision.
In 2026, rerankers have become standard in any RAG system that prioritises answer quality over absolute latency. Cohere Rerank, Voyage Rerank, and Jina Reranker are the dominant hosted options; cross-encoders fine-tuned on domain pairs are common for high-stakes verticals. Newer LLM-as-reranker patterns use a small LLM to score query-passage pairs, trading more latency for better calibration. The decision is not whether to rerank — it is which reranker.
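A minimal sketch of that LLM-as-reranker pattern; the prompt wording and the call_llm parameter are placeholders for whichever small-model client you use, not a specific vendor API:

```python
# Hypothetical sketch of the LLM-as-reranker pattern. call_llm is a
# placeholder for a small-model completion client you supply.
SCORING_PROMPT = """Rate how well the passage answers the query on a 0-10 scale.
Reply with the number only.

Query: {query}
Passage: {passage}
Score:"""

def llm_rerank(query: str, candidates: list[str], call_llm) -> list[str]:
    scored = []
    for passage in candidates:
        raw = call_llm(SCORING_PROMPT.format(query=query, passage=passage))
        scored.append((passage, float(raw.strip())))  # assumes a numeric reply
    # Highest LLM-judged relevance first.
    return [p for p, _ in sorted(scored, key=lambda x: x[1], reverse=True)]
```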
How FutureAGI Handles Rerankers
FutureAGI’s approach is to score the rerank step independently in eval and to expose it as a primitive in the gateway. On the eval side, fi.evals.Ranking scores whether the top-1 post-rerank result is actually the best of the candidate set — the canonical reranker-quality metric. fi.evals.ContextRelevance confirms the post-rerank context is useful for the query, and ChunkAttribution confirms the LLM downstream actually used it. Run these on a golden set before any reranker change, and on sampled production traces continuously to catch drift.
On the gateway side, the Agent Command Center exposes a rerank SDK resource so teams can route reranker calls through a centralised endpoint with caching, retries, and provider-fallback. Switching from Cohere Rerank to Voyage is a config change, not a code change, and traces capture which provider served each request.
A typical FutureAGI workflow: a team adds a Cohere Rerank step between their Pinecone vector search and their LLM, expecting a precision lift. They sample the same golden dataset before and after, run Ranking, ContextRelevance, and end-to-end RAGScore, and read the deltas. Ranking improves top-1 by 18 points; ContextRelevance lifts a comparable amount; RAGScore lifts 9 points. The latency cost is 80 ms at p99 — acceptable. They ship. That is reranker decision-making driven by empirical signals, not vendor benchmarks.
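That before/after loop can be sketched with the Ranking evaluator shown later on this page; golden_set and run_pipeline are hypothetical harness names standing in for the team's own dataset and RAG pipeline:

```python
from fi.evals import Ranking

# Hypothetical harness: golden_set is a list of (query, expected) pairs,
# and run_pipeline(query, rerank=...) returns the post-retrieval context
# list for that configuration.
def top1_ranking_score(golden_set, run_pipeline, rerank: bool) -> float:
    scores = []
    for query, _ in golden_set:
        context = run_pipeline(query, rerank=rerank)
        result = Ranking().evaluate(input=query, context=context)
        scores.append(result.score)
    return sum(scores) / len(scores)

# Read the delta before shipping: the lift must justify the latency tax.
# baseline = top1_ranking_score(golden, run_pipeline, rerank=False)
# treated  = top1_ranking_score(golden, run_pipeline, rerank=True)
```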
How to Measure or Detect It
Reranker quality is measurable on every request:
- fi.evals.Ranking: scores whether top-1 is the best of the candidates — the canonical reranker-quality metric.
- fi.evals.ContextRelevance: 0–1 score on whether post-rerank context matches query intent.
- MRR (Mean Reciprocal Rank): position of the first correct hit on a labelled set — sensitive to top-k ordering quality. A worked computation follows this list.
- NDCG@k: normalised discounted cumulative gain — measures graded relevance across the ranked list.
- Reranker-span latency: p99 on the rerank span — captured automatically by traceAI integrations.
- Top-1 to top-5 quality lift: compare RAGScore with rerank disabled vs enabled — the cost-benefit signal.
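MRR and NDCG@k are straightforward to compute offline. A minimal sketch, assuming binary 0/1 relevance labels in ranked order for MRR and graded relevance gains for NDCG:

```python
import math

def mrr(runs: list[list[int]]) -> float:
    """Mean Reciprocal Rank: each inner list is the 0/1 relevance of the
    ranked results for one query (1 = correct hit)."""
    total = 0.0
    for labels in runs:
        for rank, rel in enumerate(labels, start=1):
            if rel:
                total += 1.0 / rank
                break
    return total / len(runs)

def ndcg_at_k(gains: list[float], k: int) -> float:
    """NDCG@k for one query: graded relevance gains in ranked order,
    discounted by log2 of position and normalised by the ideal ordering."""
    def dcg(gs):
        return sum(g / math.log2(i + 1) for i, g in enumerate(gs, start=1))
    ideal = dcg(sorted(gains, reverse=True)[:k])
    return dcg(gains[:k]) / ideal if ideal > 0 else 0.0

print(mrr([[0, 1, 0], [1, 0, 0]]))   # first hits at rank 2 and rank 1 -> 0.75
print(ndcg_at_k([3, 2, 0, 1], k=3))  # ~0.90: near-ideal top-3 ordering
```

The fi.evals Ranking evaluator from the list above runs directly on a query and its candidate contexts: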
```python
from fi.evals import Ranking

# Score whether the first context entry is the best of the candidate set.
result = Ranking().evaluate(
    input="What is the solar system?",
    context=[
        "The solar system consists of the Sun and celestial objects bound to it.",
        "Our solar system formed 4.6 billion years ago.",
    ],
)
print(result.score, result.reason)
```
Common Mistakes
- Treating the reranker as a replacement for vector search. It is a second pass on top of recall. Without a first-stage retriever feeding candidates, there is nothing to rerank.
- Reranking too few candidates. A reranker on top-3 has nothing to work with; feed it top-20 to top-50 and let it pick top-3.
- Skipping the eval delta before/after. Adding a reranker without measuring its quality lift is paying latency tax for an unproven gain.
- Using a reranker model trained on a different domain. Generic reranker models drop sharply on legal, medical, and code corpora — fine-tune or pick a domain-specific reranker.
- Caching rerank scores by exact query string. Rerankers are deterministic per (query, candidate) pair, but users rarely phrase the same question identically, so exact-match hits are rare; cache at the embedding-similarity layer, not the rerank layer (a sketch follows this list).
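A minimal semantic-cache sketch following that advice; embed is a placeholder for your embedding model, and the 0.95 threshold is illustrative and needs tuning per corpus:

```python
import numpy as np

class SemanticCache:
    """Cache keyed by query embedding rather than exact query string.
    embed is a placeholder for your embedding model; vectors are assumed
    L2-normalised so the dot product equals cosine similarity."""

    def __init__(self, embed, threshold: float = 0.95):
        self.embed = embed
        self.threshold = threshold  # illustrative; tune on your traffic
        self.keys: list[np.ndarray] = []
        self.values: list[object] = []

    def get(self, query: str):
        if not self.keys:
            return None
        q = self.embed(query)
        sims = np.stack(self.keys) @ q
        best = int(np.argmax(sims))
        # Hit if a previously seen query is close enough in embedding space.
        return self.values[best] if sims[best] >= self.threshold else None

    def put(self, query: str, reranked_results) -> None:
        self.keys.append(self.embed(query))
        self.values.append(reranked_results)
```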
Frequently Asked Questions
What is a reranker?
A reranker is a second-pass scoring model that re-orders the top-k passages from a first-stage retriever to put the most relevant ones on top. Most production rerankers are cross-encoders that score query and passage jointly for higher precision than vector similarity alone.
How is a reranker different from vector search?
Vector search is a first-stage retriever that fetches top-k by embedding similarity — fast but imprecise on top-1. A reranker is a second-pass scorer (typically a cross-encoder) that re-orders that top-k. Rerankers do not replace vector search; they boost its top-1 precision.
How do you measure reranker quality?
FutureAGI's fi.evals Ranking evaluator scores whether the top-1 result is the best of the candidate set, while ContextRelevance confirms the post-rerank context is actually useful. Pair with MRR and NDCG@k on a labelled set for offline evaluation.