What Is a Reranker?

A reranker is a second-pass scoring model that re-orders the top-k passages returned by a first-stage retriever (usually vector search) to put the most relevant on top before generation. Most production rerankers are cross-encoders: they take the query and a candidate passage as a joint input and output a single relevance score — slower per pair than a bi-encoder embedding lookup, but far more precise because they can attend across query and passage tokens together. The pattern is retrieve-then-rerank: fetch top-20 with cheap vector search, rerank to top-3 with the cross-encoder, generate from the top-3. Rerankers do not replace vector search; they boost its precision.
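The retrieve-then-rerank shape can be sketched end to end in a few lines. Everything below is a toy stand-in: `first_stage_score` substitutes token overlap for a real embedding index, and `rerank_score` substitutes a joint-overlap measure for a real cross-encoder. Only the two-pass structure is the point — cheap scoring over the whole corpus, expensive scoring over the small candidate pool.

```python
def first_stage_score(query, passage):
    # Cheap stand-in for vector similarity: query-token overlap.
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / max(len(q), 1)

def rerank_score(query, passage):
    # Stand-in for a cross-encoder: scores the (query, passage) pair
    # jointly; here, overlap weighted by the combined vocabulary.
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / max(len(q | p), 1)

def retrieve_then_rerank(query, corpus, k_retrieve=20, k_final=3):
    # Pass 1: cheap scoring over the whole corpus, keep top-k_retrieve.
    candidates = sorted(corpus, key=lambda d: first_stage_score(query, d),
                        reverse=True)[:k_retrieve]
    # Pass 2: expensive scoring over the candidate pool only.
    return sorted(candidates, key=lambda d: rerank_score(query, d),
                  reverse=True)[:k_final]
```

In production the second pass is the latency hot spot, which is why it runs over 20-50 candidates rather than the full corpus.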

Why It Matters in Production LLM and Agent Systems

Vector search has structural precision limits. Embeddings are bi-encoders: query and document are encoded independently, then compared with a single dot product or cosine. That is fast but loses query-specific signal. The result is decent recall (the right doc is usually in top-20) but mediocre top-1 precision (the right doc is often not in top-1). A reranker closes that gap by attending across query and document tokens jointly, surfacing the genuinely best match from the candidate pool.
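The bi-encoder bottleneck is visible in a few lines: once query and document are compressed to fixed vectors, the only interaction left between them is a single dot product, so similarly worded passages land at nearly identical scores regardless of what the query asks. A sketch with toy, hand-picked embeddings:

```python
import numpy as np

# Independently encoded, L2-normalised embeddings (toy values).
query_vec = np.array([0.6, 0.8, 0.0])
doc_a = np.array([0.6, 0.8, 0.0])   # the genuinely relevant passage
doc_b = np.array([0.8, 0.6, 0.0])   # similarly worded, subtly wrong

# Bi-encoder scoring: one dot product per document. All query-specific
# nuance was compressed away before the comparison ever happened.
print(float(query_vec @ doc_a))  # ≈ 1.0
print(float(query_vec @ doc_b))  # ≈ 0.96
```

A 0.04 gap between the right and wrong clause is exactly the near-duplicate failure mode a cross-encoder resolves, because it sees both token sequences at once instead of two frozen vectors.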

The pain shows up across roles. ML engineers see RAG hallucinations that turn out to be ranking errors: the right chunk was retrieved but not selected. Retrieval engineers see eval failure rates plateau on long-tail queries because dense-only retrieval cannot distinguish near-duplicates. Product managers see “the bot quotes the wrong section” tickets when two policy clauses are similarly worded. Cost engineers see token waste from over-fetching k=20 to compensate for poor top-1 precision.

In 2026, rerankers have become standard in any RAG system that prioritises answer quality over absolute latency. Cohere Rerank, Voyage Rerank, and Jina Reranker are the dominant hosted options; cross-encoders fine-tuned on domain pairs are common for high-stakes verticals. Newer LLM-as-reranker patterns use a small LLM to score query-passage pairs, trading more latency for better calibration. The decision is not whether to rerank — it is which reranker.
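The LLM-as-reranker pattern is a short scoring loop over (query, passage) pairs. A provider-agnostic sketch: `complete` below is a placeholder for any chat-completion call that returns text, not a specific vendor API.

```python
def llm_rerank(query, passages, complete):
    # Ask a small LLM to grade each (query, passage) pair on a fixed
    # scale, then sort candidates by the parsed score.
    prompt_tmpl = (
        "Rate how well the passage answers the query on a 0-10 scale. "
        "Reply with only the number.\n\nQuery: {q}\nPassage: {p}"
    )
    scored = []
    for p in passages:
        reply = complete(prompt_tmpl.format(q=query, p=p))
        try:
            score = float(reply.strip())
        except ValueError:
            score = 0.0  # an unparseable reply counts as irrelevant
        scored.append((score, p))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [p for _, p in scored]
```

One LLM call per candidate is the latency trade mentioned above, which is why this pattern is usually reserved for small candidate pools.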

How FutureAGI Handles Rerankers

FutureAGI’s approach is to score the rerank step independently in eval and to expose it as a primitive in the gateway. On the eval side, fi.evals.Ranking scores whether the top-1 post-rerank result is actually the best of the candidate set — the canonical reranker-quality metric. fi.evals.ContextRelevance confirms the post-rerank context is useful for the query, and ChunkAttribution confirms the LLM downstream actually used it. Run these on a golden set before any reranker change, and on sampled production traces continuously to catch drift.

On the gateway side, the Agent Command Center exposes a rerank SDK resource so teams can route reranker calls through a centralised endpoint with caching, retries, and provider-fallback. Switching from Cohere Rerank to Voyage is a config change, not a code change, and traces capture which provider served each request.

A typical FutureAGI workflow: a team adds a Cohere Rerank step between their Pinecone vector search and their LLM, expecting a precision lift. They sample the same golden dataset before and after, run Ranking, ContextRelevance, and end-to-end RAGScore, and read the deltas. Ranking improves top-1 by 18 points; ContextRelevance lifts a comparable amount; RAGScore lifts 9 points. The latency cost is 80 ms at p99, which fits the budget. They ship. That is reranker decision-making driven by empirical signals, not vendor benchmarks.

How to Measure or Detect It

Reranker quality is measurable on every request:

  • fi.evals.Ranking: scores whether top-1 is the best of the candidates — the canonical reranker-quality metric.
  • fi.evals.ContextRelevance: 0–1 on whether post-rerank context matches query intent.
  • MRR (Mean Reciprocal Rank): position of first correct hit on a labelled set — sensitive to top-k ordering quality.
  • NDCG@k: discounted cumulative gain — measures graded relevance across the ranked list.
  • Reranker-span latency: p99 on the rerank span — captured automatically by traceAI integrations.
  • Top-1 to top-5 quality lift: compare RAGScore with rerank disabled vs enabled — the cost-benefit signal.
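MRR and NDCG@k from the list above are straightforward to compute offline on a labelled set; a minimal, self-contained sketch:

```python
import math

def mrr(ranked_relevance):
    # Mean Reciprocal Rank over queries. Each inner list is 1/0
    # relevance per position for one query's ranked results.
    total = 0.0
    for rels in ranked_relevance:
        for i, r in enumerate(rels, start=1):
            if r:
                total += 1.0 / i
                break
    return total / len(ranked_relevance)

def ndcg_at_k(rels, k):
    # NDCG@k for one query with graded relevance labels, in rank order.
    def dcg(scores):
        return sum(s / math.log2(i + 2) for i, s in enumerate(scores))
    ideal = dcg(sorted(rels, reverse=True)[:k])
    return dcg(rels[:k]) / ideal if ideal > 0 else 0.0
```

MRR rewards putting the first correct hit early; NDCG@k rewards the whole graded ordering, so it is the better signal when passages vary in usefulness rather than being simply right or wrong.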
For example, scoring a candidate set with the Ranking evaluator:

from fi.evals import Ranking

# Score whether the top-ranked candidate is the best answer to the query.
result = Ranking().evaluate(
    input="What is the solar system?",
    context=[
        "The solar system consists of the Sun and celestial objects bound to it.",
        "Our solar system formed 4.6 billion years ago."
    ]
)
print(result.score, result.reason)

Common Mistakes

  • Treating the reranker as a replacement for vector search. It is a second pass on top of recall. Without a first-stage retriever feeding candidates, there is nothing to rerank.
  • Reranking too few candidates. A reranker on top-3 has nothing to work with; feed it top-20 to top-50 and let it pick top-3.
  • Skipping the eval delta before/after. Adding a reranker without measuring its quality lift is paying latency tax for an unproven gain.
  • Using a reranker model trained on a different domain. Generic reranker models drop sharply on legal, medical, and code corpora — fine-tune or pick a domain-specific reranker.
  • Caching rerank scores by exact input. Rerankers are deterministic per (query, candidate), but query phrasing varies; cache at the embedding-similarity layer, not the rerank layer.

Frequently Asked Questions

What is a reranker?

A reranker is a second-pass scoring model that re-orders the top-k passages from a first-stage retriever to put the most relevant ones on top. Most production rerankers are cross-encoders that score query and passage jointly for higher precision than vector similarity alone.

How is a reranker different from vector search?

Vector search is a first-stage retriever that fetches top-k by embedding similarity — fast but imprecise on top-1. A reranker is a second-pass scorer (typically a cross-encoder) that re-orders that top-k. Rerankers do not replace vector search; they boost its top-1 precision.

How do you measure reranker quality?

FutureAGI's fi.evals Ranking evaluator scores whether the top-1 result is the best of the candidate set, while ContextRelevance confirms the post-rerank context is actually useful. Pair with MRR and NDCG@k on a labelled set for offline evaluation.