What Is a Reranker?
A second-pass scoring model, typically a cross-encoder, that re-orders the top-k passages from a first-stage retriever to lift top-1 relevance precision.
A reranker is a second-pass scoring model that re-orders the top-k passages returned by a first-stage retriever (usually vector search) so that the most relevant ones sit on top before generation. Most production rerankers are cross-encoders: they take the query and a candidate passage as a joint input and output a single relevance score. That is slower per pair than a bi-encoder embedding lookup, but far more precise, because the model can attend across query and passage tokens together. The pattern is retrieve-then-rerank: fetch top-20 with cheap vector search, rerank to top-3 with the cross-encoder, and generate from the top-3. Rerankers do not replace vector search; they boost its precision.
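The retrieve-then-rerank pattern can be sketched in a few lines. This is a minimal illustration, not a production pipeline: `retrieve_then_rerank`, `word_overlap`, and `jaccard` are hypothetical names, and the two toy scorers merely stand in for vector search and a cross-encoder so the example runs with the standard library alone.

```python
from typing import Callable, List, Tuple

Scorer = Callable[[str, str], float]


def retrieve_then_rerank(
    query: str,
    corpus: List[str],
    first_stage_score: Scorer,
    rerank_score: Scorer,
    fetch_k: int = 20,
    keep_n: int = 3,
) -> List[Tuple[str, float]]:
    """Fetch fetch_k candidates cheaply, then re-order them with a costlier scorer."""
    # Stage 1: cheap score over the whole corpus, keep only the top fetch_k.
    candidates = sorted(
        corpus, key=lambda p: first_stage_score(query, p), reverse=True
    )[:fetch_k]
    # Stage 2: expensive score, applied to the small candidate pool only.
    scored = [(p, rerank_score(query, p)) for p in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:keep_n]


def word_overlap(query: str, passage: str) -> float:
    # Toy first-stage scorer (assumption): count of shared words.
    return float(len(set(query.lower().split()) & set(passage.lower().split())))


def jaccard(query: str, passage: str) -> float:
    # Toy second-stage scorer (assumption): shared words over all words.
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / len(q | p) if q | p else 0.0


corpus = [
    "refund policy for monthly plans and annual plans and enterprise plans and trial plans",
    "refund policy for annual plans",
    "shipping times for international orders",
]
query = "refund policy annual plans"

top = retrieve_then_rerank(query, corpus, word_overlap, jaccard, fetch_k=2, keep_n=1)
print(top)
```

In this toy run the first stage ties the two refund passages, and the second stage promotes the tighter match. In a real system the first scorer would be a vector index lookup and the second a cross-encoder or hosted rerank call.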
Why It Matters in Production LLM and Agent Systems
Vector search has structural precision limits. Embedding models are bi-encoders: query and document are encoded independently, then compared with a single dot product or cosine similarity. That is fast but loses query-specific signal. The result is decent recall (the right doc is usually in top-20) but mediocre top-1 precision (the right doc is often not in top-1). A reranker closes that gap by attending across query and document tokens jointly, surfacing the actually-best match from the candidate pool.
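To make the bi-encoder shape concrete, here is a small sketch of independent encoding followed by a single cosine comparison. The `encode` function is a deliberately crude stand-in (bag-of-words counts, an assumption for illustration) for a learned embedding model; the point is only that document vectors are computed without ever seeing the query.

```python
import math
from collections import Counter


def encode(text: str) -> Counter:
    # Toy encoder (assumption): word counts stand in for a learned embedding.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Dot product over shared terms, divided by the product of vector norms.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


docs = ["cancel an annual plan", "cancel a monthly plan"]

# Document vectors are built once, offline, with no knowledge of any query.
doc_vectors = [encode(d) for d in docs]

# At query time only the query is encoded; each comparison is one cosine.
query_vector = encode("how do I cancel an annual plan")
scores = [cosine(query_vector, v) for v in doc_vectors]
print(scores)
```

Because each side is compressed to a vector before the two ever meet, the comparison cannot weigh individual query tokens against individual document tokens, which is the signal a cross-encoder recovers.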
The pain shows up across roles. ML engineers see RAG hallucinations that turn out to be ranking errors: the right chunk was retrieved but not selected. Retrieval engineers see eval-fail-rates flatten on long-tail queries because dense-only retrieval cannot distinguish near-duplicates. Product managers see “the bot quotes the wrong section” tickets when two policy clauses are similarly worded. Cost engineers see token waste from over-fetching k=20 to compensate for poor top-1.
In 2026, rerankers have become standard in any RAG system that prioritises answer quality over absolute latency. Cohere Rerank, Voyage Rerank, and Jina Reranker are the dominant hosted options; cross-encoders fine-tuned on domain pairs are common for high-stakes verticals. Newer LLM-as-reranker patterns use a small LLM to score query-passage pairs, trading more latency for better calibration. The decision is not whether to rerank; it is which reranker to use.
How FutureAGI Handles Rerankers
FutureAGI’s approach is to score the rerank step independently in eval and to expose it as a primitive in the gateway. On the eval side, fi.evals.Ranking scores whether the top-1 post-rerank result is actually the best of the candidate set, which makes it the canonical reranker-quality metric. fi.evals.ContextRelevance confirms the post-rerank context is useful for the query, and ChunkAttribution confirms the downstream LLM actually used it. Run these on a golden set before any reranker change, and continuously on sampled production traces to catch drift.
On the gateway side, the Agent Command Center exposes a rerank SDK resource so teams can route reranker calls through a centralised endpoint with caching, retries, and provider-fallback. Switching from Cohere Rerank to Voyage is a config change, not a code change, and traces capture which provider served each request.
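The provider-fallback idea can be illustrated independently of any SDK. The sketch below is not the Agent Command Center API; `rerank_with_fallback` and both toy providers are hypothetical names, and a real deployment would wrap hosted rerank clients instead.

```python
from typing import Callable, List, Sequence, Tuple

# A provider takes (query, passages) and returns one score per passage.
RerankFn = Callable[[str, Sequence[str]], List[float]]


def rerank_with_fallback(
    query: str,
    passages: Sequence[str],
    providers: Sequence[Tuple[str, RerankFn]],
) -> Tuple[str, List[float]]:
    """Try providers in order; return (provider_name, scores) from the first success."""
    errors = []
    for name, fn in providers:
        try:
            return name, fn(query, passages)
        except Exception as exc:  # a real gateway would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all rerank providers failed: " + "; ".join(errors))


def flaky_primary(query: str, passages: Sequence[str]) -> List[float]:
    # Hypothetical provider that always times out, to exercise the fallback path.
    raise TimeoutError("primary timed out")


def overlap_backup(query: str, passages: Sequence[str]) -> List[float]:
    # Hypothetical backup provider: shared-word count as a toy relevance score.
    q = set(query.lower().split())
    return [float(len(q & set(p.lower().split()))) for p in passages]


served_by, scores = rerank_with_fallback(
    "annual plan refund",
    ["refund for an annual plan", "shipping times"],
    [("primary", flaky_primary), ("backup", overlap_backup)],
)
print(served_by, scores)
```

Returning the provider name alongside the scores mirrors the point about traces: whichever provider served the request should be recorded with the result.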
A typical FutureAGI workflow: a team adds a Cohere Rerank step between their Pinecone vector search and their LLM, expecting a precision lift. They sample the same golden dataset before and after, run Ranking, ContextRelevance, and end-to-end RAGScore, and read the deltas. Ranking improves top-1 by 18 points, ContextRelevance lifts by a comparable amount, and RAGScore lifts 9 points. The latency cost is 80 ms at p99, which is acceptable, so they ship. That is reranker decision-making with empirical signals, not vendor benchmarks.
How to Measure or Detect It
Reranker quality is measurable on every request:
- fi.evals.Ranking: scores whether top-1 is the best of the candidates; the canonical reranker-quality metric.
- fi.evals.ContextRelevance: 0–1 on whether post-rerank context matches query intent.
- MRR (Mean Reciprocal Rank): position of the first correct hit on a labelled set; sensitive to top-k ordering quality.
- NDCG@k: normalised discounted cumulative gain; measures graded relevance across the ranked list.
- Reranker-span latency: p99 on the rerank span; captured automatically by traceAI integrations.
- Top-1 to top-5 quality lift: compare RAGScore with rerank disabled vs enabled; the cost-benefit signal.
from fi.evals import Ranking

result = Ranking().evaluate(
    input="What is the solar system?",
    context=[
        "The solar system consists of the Sun and celestial objects bound to it.",
        "Our solar system formed 4.6 billion years ago.",
    ],
)
print(result.score, result.reason)
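MRR and NDCG@k from the list above are simple enough to compute by hand on a labelled set. The sketch below uses only the standard library; the function names and the tiny labelled examples are made up for illustration, and the DCG here uses the linear-gain variant (grade divided by log2 of position plus one), which is one of several common conventions.

```python
import math
from typing import Dict, List, Set


def mrr(ranked_lists: Dict[str, List[str]], relevant: Dict[str, Set[str]]) -> float:
    """Mean of 1/rank of the first relevant doc per query (0 if none is found)."""
    total = 0.0
    for qid, ranked in ranked_lists.items():
        for pos, doc in enumerate(ranked, start=1):
            if doc in relevant[qid]:
                total += 1.0 / pos
                break
    return total / len(ranked_lists)


def dcg(gains: List[float]) -> float:
    # Linear-gain DCG: each grade is discounted by log2(position + 1).
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))


def ndcg_at_k(ranked: List[str], grades: Dict[str, float], k: int) -> float:
    """DCG of the top-k ranking divided by the DCG of the ideal ordering."""
    gains = [grades.get(doc, 0.0) for doc in ranked[:k]]
    ideal = sorted(grades.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg else 0.0


# Made-up labelled data: q1's first hit is at rank 2, q2's at rank 1.
ranked_lists = {"q1": ["a", "b", "c"], "q2": ["x", "y", "z"]}
relevant = {"q1": {"b"}, "q2": {"x"}}
mrr_value = mrr(ranked_lists, relevant)

# Graded relevance for one query: "b" is the best doc but is ranked second.
grades = {"a": 1.0, "b": 3.0, "c": 0.0}
ndcg_value = ndcg_at_k(["a", "b", "c"], grades, k=3)
ndcg_ideal = ndcg_at_k(["b", "a", "c"], grades, k=3)
print(mrr_value, ndcg_value, ndcg_ideal)
```

MRR only looks at the first correct hit, so it tracks top-of-list quality; NDCG@k rewards getting the whole graded ordering right, which matters when several passages feed the generator.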
Reranker landscape in 2026
In our 2026 evals, a few rerankers consistently lift top-1 precision on RAG, and the market has consolidated around them. The table below shows how we think about the choices:
| Reranker | Hosting | 2026 fit |
|---|---|---|
| Cohere Rerank v3+ | Managed | Strong default; multilingual coverage |
| Voyage Rerank 2 | Managed | High precision on technical corpora |
| Jina Reranker v2 | Managed or self-hosted | Cost-sensitive deployments |
| BGE Reranker v2 | Self-hosted | Open-source, fine-tuneable |
| ColBERT v2 / ColBERTv2.5 | Self-hosted | Late-interaction precision |
| LLM-as-reranker (Claude Sonnet 4.6, GPT-5-mini) | Managed | Best precision, highest latency |
| Cross-encoder (BERT-style, fine-tuned) | Self-hosted | Domain-specific verticals |
Frontier 2026 models (Claude Opus 4.7, GPT-5.1, Gemini 3 Pro) can serve as rerankers themselves through structured prompts, which is the most accurate option but adds 200-500 ms p99 to the rerank span. On RAGTruth’s 18K labeled chunks and RAGBench, hybrid retrieval plus a Cohere or Voyage reranker typically lifts top-1 precision 15-25 points over dense-only retrieval, the single largest gain available in a RAG stack short of a model upgrade. The trade-off is real. Unlike a single vendor benchmark, FutureAGI scores Ranking, ContextRelevance, ContextPrecision, and end-to-end RAGScore against the same golden cohort with reranking enabled and disabled, so the lift is measured on your corpus, not the provider’s. The recommended 2026 default is hybrid retrieval at k=30-50, then a Cohere or Voyage reranker to top-5, with Ranking and Groundedness wired as release gates.
Common Mistakes
- Treating the reranker as a replacement for vector search. It is a second pass on top of recall. Without a first-stage retriever feeding candidates, there is nothing to rerank.
- Reranking too few candidates. A reranker on top-3 has nothing to work with; feed it top-20 to top-50 and let it pick top-3.
- Skipping the eval delta before/after. Adding a reranker without measuring its quality lift is paying latency tax for an unproven gain.
- Using a reranker model trained on a different domain. Generic reranker models drop sharply on legal, medical, and code corpora; fine-tune or pick a domain-specific reranker.
- Caching rerank scores by exact input. Rerankers are deterministic per (query, candidate), but query phrasing varies; cache at the embedding-similarity layer, not the rerank layer.
- Skipping cohort cuts on reranker eval. A reranker that lifts global top-1 by 18 points can still regress on Spanish-locale RAG pipeline queries; slice Ranking, ContextPrecision, and Groundedness by tenant, locale, and intent before promoting.
- Ignoring traceAI rerank-span attributes. Without gen_ai.tool.name, provider id, top-n in/out, and rerank latency on the span, an incident becomes guesswork. The evaluator cannot help if the span is opaque.
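The cohort-cut mistake above is easy to guard against with a small slicing helper. This is a sketch under assumptions: the record fields (`locale`, `top1_correct`) and the function names are hypothetical, and the data is invented to show a global gain that hides a per-locale regression.

```python
from collections import defaultdict
from typing import Dict, List


def top1_by_cohort(records: List[dict], cohort_key: str) -> Dict[str, float]:
    """Top-1 accuracy per cohort value (for example per locale or tenant)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        cohort = r[cohort_key]
        totals[cohort] += 1
        hits[cohort] += int(r["top1_correct"])
    return {c: hits[c] / totals[c] for c in totals}


def regressed_cohorts(before: List[dict], after: List[dict], cohort_key: str) -> List[str]:
    """Cohorts whose top-1 accuracy dropped after the change."""
    b = top1_by_cohort(before, cohort_key)
    a = top1_by_cohort(after, cohort_key)
    return sorted(c for c in b if c in a and a[c] < b[c])


# Invented eval records: one row per golden-set query.
before = [
    {"locale": "en", "top1_correct": True},
    {"locale": "en", "top1_correct": False},
    {"locale": "en", "top1_correct": False},
    {"locale": "es", "top1_correct": True},
    {"locale": "es", "top1_correct": True},
]
after = [
    {"locale": "en", "top1_correct": True},
    {"locale": "en", "top1_correct": True},
    {"locale": "en", "top1_correct": True},
    {"locale": "es", "top1_correct": True},
    {"locale": "es", "top1_correct": False},
]

global_before = sum(r["top1_correct"] for r in before) / len(before)
global_after = sum(r["top1_correct"] for r in after) / len(after)
regressed = regressed_cohorts(before, after, "locale")
print(global_before, global_after, regressed)
```

Here global top-1 rises while the `es` slice falls, which is exactly the case a single aggregate number hides. The same slicing applies to any per-request score, not only top-1 correctness.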
Frequently Asked Questions
What is a reranker?
A reranker is a second-pass scoring model that re-orders the top-k passages from a first-stage retriever to put the most relevant ones on top. Most production rerankers are cross-encoders that score query and passage jointly for higher precision than vector similarity alone.
How is a reranker different from vector search?
Vector search is a first-stage retriever that fetches top-k by embedding similarity; it is fast but imprecise on top-1. A reranker is a second-pass scorer (typically a cross-encoder) that re-orders that top-k. Rerankers do not replace vector search; they boost its top-1 precision.
How do you measure reranker quality?
FutureAGI's fi.evals Ranking evaluator scores whether the top-1 result is the best of the candidate set, while ContextRelevance confirms the post-rerank context is actually useful. Pair with MRR and NDCG@k on a labelled set for offline evaluation.