
What Is an LLM Router?

The component inside an LLM gateway that selects which provider and model serve each request, using strategies like round-robin, weighted, least-latency, or cost-optimized.

What Is an LLM Router?

An LLM router is the component inside an LLM gateway that picks which provider and model handle a given request. It runs after authentication and pre-guardrails and before the provider call, evaluating a configured strategy — round-robin, weighted, least-latency, cost-optimized, or a conditional rule tree — against attributes like the requested model, user tier, region, session, or arbitrary metadata. The router returns a target (provider + model + weight) to the gateway, which then issues the upstream call. FutureAGI’s Agent Command Center exposes this surface as routing-policies.
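The shape of that handoff can be sketched in a few lines of Go. This is a minimal illustration with made-up type and method names (Target, Request, Strategy), not the gateway's actual internal types:

package routing

// Target is what the router hands back to the gateway: which provider and
// model to call, plus the weight that selected it. Illustrative only.
type Target struct {
    Provider string
    Model    string
    Weight   int
}

// Request carries the attributes a strategy or conditional rule can inspect.
type Request struct {
    Model    string
    User     string
    Stream   bool
    Metadata map[string]string // e.g. Metadata["tier"], Metadata["region"]
}

// Strategy selects one target from the candidates configured for the
// requested model; the gateway then issues the upstream call.
type Strategy interface {
    Select(req Request, candidates []Target) (Target, error)
}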

Why it matters in production LLM/agent systems

A single hard-coded provider = "openai" works until it doesn’t. The first failure mode is rate limits — OpenAI 429s during peak hours take the entire app down. The second is cost — paying GPT-4o prices for queries gpt-4o-mini could answer. The third is region — EU users hitting a US-only provider violate data-residency rules.

An LLM router fixes all three in one place:

  • Load distribution — split a model’s traffic across two regions or two API keys to stay under per-key rate limits.
  • Cost optimization — route low-stakes queries to a cheaper model based on metadata.tier or input length.
  • Latency targeting — least-latency routing watches a rolling p50 across providers and prefers the fastest healthy one.
  • Compliance routing — a conditional rule like metadata.region == "eu" pins traffic to an EU-hosted Bedrock provider.
  • Canary traffic — route 10% of GPT-4o traffic to a new fine-tune via a weighted target list.

For agent systems where one task triggers 10–50 model calls, router decisions compound — choosing the wrong strategy turns a 2-second task into a 20-second one, or a $0.05 task into a $0.50 one.
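The weighted and canary cases above come down to weight-proportional selection. A minimal sketch in Go, with illustrative names rather than the gateway's actual code:

package routing

import "math/rand"

type weightedTarget struct {
    Provider string
    Weight   int
}

// pickWeighted selects a target with probability proportional to its weight;
// weights of 90 and 10 send roughly 10% of traffic to the canary provider.
func pickWeighted(targets []weightedTarget, rng *rand.Rand) weightedTarget {
    total := 0
    for _, t := range targets {
        total += t.Weight
    }
    n := rng.Intn(total) // assumes at least one target with a positive weight
    for _, t := range targets {
        if n < t.Weight {
            return t
        }
        n -= t.Weight
    }
    return targets[len(targets)-1] // not reached when weights sum correctly
}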

How FutureAGI handles it

FutureAGI’s router lives inside Agent Command Center’s Go-based gateway and is configured via the routing-policies resource. The strategy switch in internal/routing/strategy.go resolves five names: round-robin, weighted, least-latency, cost-optimized, and adaptive (plus a fastest race-mode handled separately). For conditional routing, internal/routing/conditional.go evaluates a recursive Condition tree with operators $eq, $ne, $in, $nin, $regex, $gt, $lt, $gte, $lte, and $exists, combinable with $and, $or, $not. Fields available include model, user, stream, provider, session_id, request_id, and any metadata.<key>.
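A sketch of how such a recursive condition tree can be evaluated. The Condition shape below is an assumption modeled on the operators listed above, not the actual struct in internal/routing/conditional.go:

package routing

// Condition is an illustrative shape for a recursive rule tree.
type Condition struct {
    Field string      // e.g. "metadata.region"
    Op    string      // "$eq", "$ne", "$in", "$exists", ...
    Value any
    And   []Condition // "$and": every child must match
    Or    []Condition // "$or": any child must match
    Not   *Condition  // "$not": the child must not match
}

// eval walks the tree against flattened request attributes,
// e.g. attrs["metadata.region"] == "eu".
func eval(c Condition, attrs map[string]any) bool {
    switch {
    case len(c.And) > 0:
        for _, sub := range c.And {
            if !eval(sub, attrs) {
                return false
            }
        }
        return true
    case len(c.Or) > 0:
        for _, sub := range c.Or {
            if eval(sub, attrs) {
                return true
            }
        }
        return false
    case c.Not != nil:
        return !eval(*c.Not, attrs)
    }

    v, ok := attrs[c.Field]
    switch c.Op {
    case "$exists":
        return ok
    case "$eq":
        return ok && v == c.Value
    case "$ne":
        return !ok || v != c.Value
    default:
        return false // $in, $nin, $regex and numeric comparators omitted for brevity
    }
}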

A typical production policy:

routing:
  default_strategy: "least-latency"
  conditional_routes:
    - name: "enterprise-tier"
      priority: 10
      condition: { field: "metadata.tier", op: "$eq", value: "enterprise" }
      action: { provider: "openai-dedicated" }
    - name: "eu-residency"
      priority: 20
      condition: { field: "metadata.region", op: "$eq", value: "eu" }
      action: { provider: "bedrock-eu" }
  targets:
    gpt-4o:
      - { provider: openai-east, weight: 50 }
      - { provider: openai-west, weight: 50 }
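In a policy like this, the conditional routes take precedence over the default strategy (that is what lets the eu-residency rule pin traffic) and the first match wins. One plausible way the evaluation fits together, reusing the hypothetical types from the sketches above; treating a lower priority value as checked first is an assumption:

package routing

import "sort"

type Route struct {
    Name      string
    Priority  int
    Condition Condition
    Provider  string
}

type Policy struct {
    Routes  []Route
    Targets map[string][]Target // keyed by model, e.g. "gpt-4o"
    Default Strategy
}

// decide checks conditional routes first (first match wins), then falls back
// to the default strategy over the requested model's weighted targets.
func (p Policy) decide(req Request, attrs map[string]any) (Target, error) {
    routes := append([]Route(nil), p.Routes...)
    sort.Slice(routes, func(i, j int) bool { return routes[i].Priority < routes[j].Priority })
    for _, r := range routes {
        if eval(r.Condition, attrs) {
            return Target{Provider: r.Provider, Model: req.Model}, nil
        }
    }
    return p.Default.Select(req, p.Targets[req.Model])
}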

The router emits an OTel span on each decision, carrying agentcc.routing.strategy and agentcc.routing.target attributes that join the rest of the traceAI tree. Unlike LiteLLM’s Python Router class, this runs as a horizontally scaled Go service with Redis-coordinated latency tracking — every replica sees the same provider health window. FutureAGI also wires the router output into the same trace tree as pre-guardrail and post-guardrail spans, so a trace explains routing, safety, and cost together.
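A rough sketch of emitting that decision span with the OpenTelemetry Go SDK. The attribute keys are the ones named above; the tracer name, span name, and target serialization are assumptions:

package routing

import (
    "context"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
)

// recordDecision emits a span for one routing decision, carrying the
// agentcc.routing.* attributes referenced above.
func recordDecision(ctx context.Context, strategy, provider, model string) {
    _, span := otel.Tracer("agentcc/router").Start(ctx, "routing.decision")
    defer span.End()

    span.SetAttributes(
        attribute.String("agentcc.routing.strategy", strategy),
        attribute.String("agentcc.routing.target", provider+"/"+model),
    )
}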

How to measure or detect it

Track router behaviour with these signals:

  • Target distribution — count of selections per (strategy, provider, model) tuple. Confirms weighted policies match intent.
  • Strategy decision latency — p99 of the router-decision span. Should be sub-millisecond; a spike means health-check or condition-eval is degraded.
  • Per-target healthy ratio — fraction of targets passing circuit-breaker checks at decision time.
  • Conditional-rule match rate — how often each named conditional route fires; helps detect dead rules.
  • agentcc.routing.strategy OTel attribute — joins per-trace cost and latency back to the routing decision that produced them.

When the routing-decision span shows the wrong strategy or target for a request, the operator updates the policy via the Agent Command Center routing-policies API on the FutureAGI control plane.
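To produce the first of those signals yourself, a selection counter keyed by (strategy, provider, model) is enough. A minimal sketch with OpenTelemetry metrics in Go; the meter, instrument, and attribute names are illustrative, not the gateway's actual metric names:

package routing

import (
    "context"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/metric"
)

var selections, _ = otel.Meter("agentcc/router").Int64Counter("router.selections")

// countSelection increments the target-distribution counter for one routing
// decision; compare the resulting distribution against the configured weights.
func countSelection(ctx context.Context, strategy, provider, model string) {
    selections.Add(ctx, 1, metric.WithAttributes(
        attribute.String("strategy", strategy),
        attribute.String("provider", provider),
        attribute.String("model", model),
    ))
}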

Common mistakes

  • Conflating the router with the gateway. The router only picks a target; caching, retry, and fallback live elsewhere.
  • Using least-latency without a warmup window — a freshly deployed provider with zero traffic always looks “fastest”.
  • Writing conditional rules with overlapping priorities. First match wins, so order matters.
  • Hard-coding routing in app code via if/else on model — every change becomes a deploy. Move the logic into a routing policy.
  • Mixing routing weights with model fallbacks and expecting both to balance traffic. Weighted routing distributes load; fallback only fires on errors.

Frequently Asked Questions

What is an LLM router?

An LLM router is the component inside a gateway that picks the provider and model for each request based on a strategy like round-robin, weighted, least-latency, cost-optimized, or conditional rules.

Is an LLM router the same as an LLM gateway?

No. The router is one component of the gateway. The gateway also handles caching, guardrails, retries, fallback, and observability around the routing decision.

How does FutureAGI's LLM router work?

Agent Command Center exposes a routing-policies resource that supports round-robin, weighted, least-latency, cost-optimized, and conditional strategies with eq/ne/in/regex/gt/lt comparators on request attributes.