What Is a Model Store?

A governed catalog of model endpoints, versions, capabilities, limits, and routing metadata used by an LLM gateway.

A model store is a gateway catalog that records which LLMs, embedding models, rerankers, and audio models can be called in production. It stores endpoint names, provider families, versions, credentials, capabilities, context limits, cost metadata, and routing eligibility. The gateway consults it before each provider call and again when it retries, falls back, or mirrors traffic. In FutureAGI, the model store appears in Agent Command Center’s models surface so engineers can control model traffic without hard-coding provider details in application code.
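
A minimal sketch of what one catalog entry might hold, written in Python with hypothetical field names; FutureAGI's actual schema may differ:

```python
from dataclasses import dataclass, field

@dataclass
class ModelStoreEntry:
    # Hypothetical fields for illustration -- not the actual Agent Command Center schema.
    alias: str                       # product-facing name referenced by routing policies
    provider: str                    # provider family, e.g. "openai" or "anthropic"
    provider_model: str              # exact provider model ID sent on the wire
    endpoint: str                    # resolved endpoint or region
    context_window: int              # maximum prompt + completion tokens
    capabilities: set[str] = field(default_factory=set)  # e.g. {"tools", "json_mode", "streaming"}
    cost_tier: str = "standard"
    routing_eligible: bool = False   # must be enabled before any routing policy can select it

entry = ModelStoreEntry(
    alias="fast-chat-model",
    provider="openai",
    provider_model="gpt-4o-mini",
    endpoint="us-east-1",
    context_window=128_000,
    capabilities={"tools", "json_mode", "streaming"},
)
```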

Why it matters in production LLM/agent systems

Model selection becomes fragile when it lives in scattered constants, environment variables, and provider-specific clients. The first failure mode is stale-model routing: an app keeps sending requests to a deprecated provider model, so traces fill with 404s or unsupported-parameter errors. The second is unsafe fallback: a primary model supports tool calling or JSON mode, but its fallback does not, causing schema-validation failures downstream.
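
A sketch of the guard that blocks the unsafe-fallback case, reusing the hypothetical ModelStoreEntry from the sketch above; the function name and logic are illustrative, not a FutureAGI API:

```python
def pick_safe_fallback(candidates: list[ModelStoreEntry],
                       required_capabilities: set[str]) -> ModelStoreEntry | None:
    """Return the first fallback candidate that covers every capability the request needs.

    Guards against the unsafe-fallback failure mode: a tool-calling or JSON-mode
    request silently landing on a model that cannot honor it.
    """
    for candidate in candidates:
        if candidate.routing_eligible and required_capabilities <= candidate.capabilities:
            return candidate
    return None  # surfacing an error beats returning schema-breaking output
```

A gateway would run a check like this with the failed request's required capabilities before rerouting it to any fallback model.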

A model store gives developers, SREs, and compliance teams one place to answer operational questions (a catalog-query sketch follows the list):

  • Which provider, region, and endpoint serve gpt-4o traffic?
  • Which models support function calling, long context, streaming, embeddings, or audio?
  • Which model aliases are approved for regulated workloads?
  • Which costs, rate limits, and timeout ceilings should routing policies respect?
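
A rough sketch of answering questions like these against an in-memory catalog, again using the hypothetical ModelStoreEntry; a real deployment would query the gateway's models resource instead:

```python
catalog: list[ModelStoreEntry] = [entry]  # in practice, loaded from the gateway's model store

# Which models support function calling and streaming?
tool_capable = [m.alias for m in catalog if {"tools", "streaming"} <= m.capabilities]

# Which provider and endpoint serve gpt-4o traffic?
gpt4o_routes = [(m.provider, m.endpoint) for m in catalog if m.provider_model == "gpt-4o"]

# Which entries exist in the store but are excluded from all routing?
unrouted = [m.alias for m in catalog if not m.routing_eligible]
```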

The symptoms are easy to spot when the store is missing: unknown-model errors, sudden cost shifts after a provider default changes, p99 latency spikes after traffic moves to a slower endpoint, and fallback chains that return lower-quality outputs because the candidate model was never qualified.

For 2026-era agentic pipelines, the risk compounds. A single user task may trigger planner, retriever, tool-selection, summarization, and final-answer calls. If each step resolves model names differently, debugging becomes trace archaeology instead of engineering work.

How FutureAGI handles model stores

For gateway:models, FutureAGI maps model-store data to Agent Command Center’s models SDK resource. The inventory includes provider families such as OpenAI, Anthropic, Google, Meta, Mistral, and OpenAI Embeddings, with examples like gpt-4o, gpt-4o-mini, claude-sonnet-4, gemini-2.0-flash, llama-3.1-70b, and text-embedding-3-large. The same gateway also exposes routing-policies, prompts, guardrails, logs, and webhooks, so the model catalog is not isolated from runtime behavior.

A practical workflow looks like this:

  1. Add a provider model to the Agent Command Center models surface with its provider, alias, context window, streaming support, cost tier, and region.
  2. Attach that model to a routing policy: cost-optimized, a model fallback chain, or a traffic-mirroring experiment.
  3. Observe requests through traceAI attributes such as gen_ai.request.model, gen_ai.system, and llm.token_count.prompt (see the tracing sketch after this list).
  4. If a new model increases eval-fail-rate or p99 latency, remove it from routing eligibility or lower its weight.
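
For step 3, a sketch of stamping those attributes on a span with the OpenTelemetry Python API; the gen_ai.* and llm.* keys are the conventions named above, model_store.alias is a hypothetical custom attribute, and tracer setup is assumed to come from your traceAI integration:

```python
from opentelemetry import trace

tracer = trace.get_tracer("gateway.models")  # assumes the traceAI / OTel pipeline is already configured

def call_with_tracing(alias: str, resolved: ModelStoreEntry, prompt_tokens: int) -> None:
    # Record enough metadata that every request can be tied back to a catalog entry.
    with tracer.start_as_current_span("llm.request") as span:
        span.set_attribute("gen_ai.system", resolved.provider)
        span.set_attribute("gen_ai.request.model", resolved.provider_model)
        span.set_attribute("llm.token_count.prompt", prompt_tokens)
        span.set_attribute("model_store.alias", alias)  # hypothetical custom attribute
        # ... issue the provider call here ...
```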

FutureAGI’s approach is to treat model metadata as production control data, not documentation. Unlike LiteLLM alias maps that often sit beside application code, Agent Command Center ties the model entry to routing, fallback, trace fields, and gateway dashboards. That lets an engineer test gpt-4o-mini on mirrored traffic, compare cost and latency, then promote it into the live policy without changing the calling service.

How to measure or detect it

Measure a model store by checking whether runtime traffic matches the catalog and whether catalog changes produce expected route changes (a coverage-check sketch follows the list):

  • Catalog coverage - percentage of production gen_ai.request.model values that resolve to a model-store entry. Anything below 100% means traffic is bypassing governance.
  • Stale-model rate - requests to deprecated, disabled, or unqualified models per 1,000 gateway calls.
  • Capability mismatch rate - JSON, tool-calling, streaming, or embedding requests routed to models without that capability.
  • Routing eligibility drift - count of models available in the store but excluded from all routing-policies or fallback chains.
  • Cost attribution check - compare model-store price metadata with observed llm.token_count.prompt, completion tokens, and token-cost-per-trace.
  • Incident correlation - provider 404, 429, timeout, and fallback-trigger spikes grouped by model alias and provider family.
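
A sketch of computing the first two metrics from exported trace rows; the row format (one dict of attributes per request) is an assumption, not a FutureAGI export format:

```python
def catalog_metrics(trace_rows: list[dict], catalog: list[ModelStoreEntry]) -> dict:
    """Compute catalog coverage and stale-model rate from trace attributes."""
    known = {m.provider_model for m in catalog}
    disabled = {m.provider_model for m in catalog if not m.routing_eligible}

    requested = [row["gen_ai.request.model"] for row in trace_rows
                 if "gen_ai.request.model" in row]
    if not requested:
        return {"catalog_coverage": None, "stale_model_rate": None}

    covered = sum(1 for m in requested if m in known)
    stale = sum(1 for m in requested if m not in known or m in disabled)
    return {
        "catalog_coverage": covered / len(requested),       # target: 1.0
        "stale_model_rate": 1000 * stale / len(requested),  # per 1,000 gateway calls
    }
```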

Healthy model stores make every production call explainable: a trace should show the requested alias, resolved provider model, route, cost basis, and fallback outcome.

Common mistakes

  • Treating the model store as a static spreadsheet. Runtime routing needs machine-readable metadata, not a human-maintained list copied into code.
  • Mixing model aliases with provider IDs. fast-chat-model is a product alias; claude-sonnet-4 is a provider model.
  • Storing costs without units. Per-1M-token input, output, cached-input, and audio pricing need separate fields (a pricing sketch appears after this list).
  • Marking a model as fallback-safe before testing tool calls, JSON mode, refusal behavior, and latency under load.
  • Forgetting region and data-residency metadata. A cheap endpoint can still be invalid for regulated user traffic.
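
For the cost-units and residency mistakes, a sketch of metadata that keeps units and regions explicit; all field names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class ModelPricing:
    # All prices in USD per 1M tokens; putting the unit in the field name removes ambiguity.
    input_usd_per_1m: float
    output_usd_per_1m: float
    cached_input_usd_per_1m: float | None = None
    audio_input_usd_per_1m: float | None = None

@dataclass
class ResidencyPolicy:
    region: str                                        # where the endpoint physically runs
    allowed_workloads: tuple[str, ...] = ("general",)  # e.g. exclude "regulated" for some regions
```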

Frequently Asked Questions

What is a model store?

A model store is the governed catalog of model endpoints, versions, capabilities, limits, and routing metadata that an LLM gateway uses to decide which models can serve production traffic.

How is a model store different from a model registry?

A model registry usually tracks trained model artifacts and lifecycle state. A model store is runtime-facing: it tracks endpoint metadata, provider credentials, costs, regions, limits, and gateway eligibility.

How do you measure a model store?

In FutureAGI, measure it through Agent Command Center's models surface, routed-request coverage, stale-model rate, and trace fields such as gen_ai.request.model, gen_ai.system, and llm.token_count.prompt.