Infrastructure

What Is Portkey?

An LLM gateway platform for routing, caching, retries, fallbacks, key management, and observability across model providers.

Portkey is an LLM gateway and AI-infrastructure platform that sits between an application and model providers to route requests, apply retries and fallbacks, cache responses, manage keys, and collect observability data. In production, it appears in gateway and trace layers around OpenAI, Anthropic, Bedrock, self-hosted models, and agent frameworks. FutureAGI treats Portkey as an integration surface through traceAI:portkey, so teams can connect provider routing and latency signals to evaluator results.
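
For orientation, a single call routed through the gateway looks roughly like the sketch below, assuming the portkey_ai Python SDK and its OpenAI-compatible chat interface; the API key, virtual key, and model name are placeholders.

from portkey_ai import Portkey

# Placeholder credentials: the Portkey API key authenticates to the gateway,
# the virtual key maps to a provider credential managed inside Portkey.
portkey = Portkey(api_key="PORTKEY_API_KEY", virtual_key="openai-prod")

# One chat call routed through the gateway instead of a direct provider SDK.
response = portkey.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize the refund policy."}],
)
print(response.choices[0].message.content)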

Why Portkey matters in production LLM/agent systems

Portkey matters because gateway behavior can turn a healthy model into a production incident. A bad retry policy can multiply a provider outage into runaway cost. A fallback route can send regulated traffic to the wrong model family. A stale cache entry can preserve an answer that was fixed in the prompt. A missing trace hop can leave SREs staring at provider logs while the actual failure came from routing, auth, cache, or timeout configuration.

The pain lands on several teams. Developers see intermittent failures that do not reproduce in direct SDK calls. SREs see p99 latency, 429s, 5xx bursts, queueing, and retry storms. Finance sees token spend rise faster than traffic. Product teams see users receive slower or less accurate answers after a provider switch. Compliance teams care because virtual keys, tenant budgets, and guardrail bypasses become audit evidence.

Agentic systems make this sharper. A single 2026-era support workflow may run a planner call, retrieval synthesis, tool selection, tool repair, final answer generation, and a post-response check. If Portkey sits on those calls, every route, cache hit, retry, and fallback changes the trace. Gateway success alone is not enough: the route also has to preserve task completion, grounding, JSON shape, and safety policy.

How FutureAGI handles Portkey

FutureAGI handles Portkey through the specific traceAI surface traceAI:portkey. The integration exists for Python and TypeScript and is used when Portkey is the gateway hop between the application, agent framework, and model providers. It does not make Portkey an evaluator. It makes the gateway leg observable inside the same trace tree as prompts, tool calls, retrieval, and model outputs.
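
Wiring the integration typically looks like the sketch below. This assumes traceAI-portkey follows the register-then-instrument pattern of other traceAI packages; the module and class names shown (fi_instrumentation.register, ProjectType, PortkeyInstrumentor) are assumptions to verify against the current traceAI documentation.

# Sketch only: assumes the register-then-instrument pattern of traceAI packages.
from fi_instrumentation import register
from fi_instrumentation.fi_types import ProjectType
from traceai_portkey import PortkeyInstrumentor

# Bind a tracer provider to a FutureAGI project, then instrument the Portkey SDK
# so each gateway hop becomes a span in the same trace tree as prompts and tools.
trace_provider = register(project_type=ProjectType.OBSERVE, project_name="portkey-gateway")
PortkeyInstrumentor().instrument(tracer_provider=trace_provider)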

A real workflow starts with a team routing OpenAI, Anthropic, Bedrock, and a self-hosted endpoint through Portkey. The app is instrumented with traceAI-portkey; the trace records provider target, model name, llm.token_count.prompt, llm.token_count.completion, latency, retry outcome, cache outcome, and fallback outcome. Agent Command Center can then mirror traffic to a candidate route, block unsafe prompts with a pre-guardrail, and keep model fallback decisions tied to the same trace id.
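
For illustration, the routing half of that workflow can be expressed as a Portkey config; the sketch below assumes Portkey's fallback-strategy config shape with ordered targets, bounded retries, and a semantic cache, passed inline to the portkey_ai client. Virtual key names and exact field names are placeholders to verify against Portkey's config reference.

from portkey_ai import Portkey

# Illustrative route: primary OpenAI target, Anthropic fallback, bounded retries,
# semantic caching at the gateway. Virtual key names are placeholders.
route_config = {
    "strategy": {"mode": "fallback"},
    "targets": [
        {"virtual_key": "openai-prod"},
        {"virtual_key": "anthropic-prod"},
    ],
    "retry": {"attempts": 2},
    "cache": {"mode": "semantic"},
}

portkey = Portkey(api_key="PORTKEY_API_KEY", config=route_config)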

FutureAGI’s approach is to separate gateway health from answer health. Unlike a Portkey-only or LiteLLM-only gateway dashboard, this pairing lets the engineer put routing data next to Groundedness, TaskCompletion, ToolSelectionAccuracy, or JSONValidation on the returned answer. If fallback rate improves while TaskCompletion drops for the support-agent cohort, the next action is not “ship the route.” It is to tighten the routing policy, disable the unsafe cache key, pin a prompt version, or run a regression eval before expanding traffic.

How to measure or detect Portkey issues

Measure Portkey as the gateway layer that shapes each provider call (a sketch after this list shows how several of these signals roll up from per-call trace records):

  • Fallback rate by route — shows whether the primary provider, model, or timeout policy is failing under real traffic.
  • Retry count and retry latency — catches retry storms before they inflate cost and user-visible latency.
  • Cache hit rate and cache age — separates useful semantic-cache wins from stale answers that should have expired.
  • llm.token_count.prompt and llm.token_count.completion — expose prompt bloat, completion drift, and cost changes after route edits.
  • p99 gateway latency — includes gateway overhead, provider latency, retries, and cache lookup time.
  • Eval-fail-rate-by-cohort — pairs Portkey route cohorts with Groundedness, TaskCompletion, or JSONValidation.
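
The sketch below shows how several of these signals might roll up from per-call trace records; the record field names mirror the span attributes above but are assumptions about how a team stores them.

import statistics

# Illustrative roll-up over per-call gateway records; field names are assumptions.
def gateway_rollup(records: list[dict]) -> dict:
    latencies = sorted(r["latency_ms"] for r in records)
    p99 = latencies[max(0, int(len(latencies) * 0.99) - 1)]
    return {
        "fallback_rate": sum(r["fallback_used"] for r in records) / len(records),
        "retry_rate": sum(r["retry_count"] > 0 for r in records) / len(records),
        "cache_hit_rate": sum(r["cache_hit"] for r in records) / len(records),
        "p99_latency_ms": p99,
        "mean_prompt_tokens": statistics.mean(r["llm.token_count.prompt"] for r in records),
    }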

Minimal quality check after a Portkey route change:

from fi.evals import Groundedness

# Placeholder inputs: in practice these come from the application and the Portkey trace span.
answer = "Refunds are processed within 5 business days."
context = "Policy: refunds are processed within 5 business days of approval."
trace_id, portkey_route, fallback_used = "trace-abc123", "openai-primary", False

metric = Groundedness()
result = metric.evaluate(response=answer, context=context)
print(trace_id, portkey_route, fallback_used, result.score)

Treat a Portkey rollout as healthy only when gateway metrics and evaluator metrics both stay inside threshold. A route that is cheaper but less grounded is not a reliability win.
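
A sketch of that gate, with placeholder thresholds that a team would tune per route and cohort:

# Illustrative rollout gate: the gateway side and the answer side must both pass.
def route_is_healthy(gateway: dict, evals: dict) -> bool:
    gateway_ok = (
        gateway["fallback_rate"] <= 0.05
        and gateway["retry_rate"] <= 0.10
        and gateway["p99_latency_ms"] <= 4000
    )
    answer_ok = (
        evals["groundedness_pass_rate"] >= 0.95
        and evals["task_completion_pass_rate"] >= 0.90
    )
    return gateway_ok and answer_ok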

Common mistakes

Engineers usually misread Portkey when they treat the gateway as plumbing instead of a policy boundary:

  • Routing regulated traffic by provider availability alone; tenant, region, and data-retention rules must stay part of the route decision.
  • Caching by prompt text without tenant, tool state, prompt version, or retrieved-context fingerprint (see the cache-key sketch after this list).
  • Retrying schema failures through another provider; malformed JSON usually needs JSONValidation, repair, or prompt changes.
  • Measuring only provider status codes while ignoring fallback quality and evaluator pass rate.
  • Comparing Portkey and direct SDK calls without matching temperature, max tokens, stop sequences, and model aliases.
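
For the caching item above, a safer key scopes entries by tenant, prompt version, and a fingerprint of the retrieved context rather than prompt text alone; the sketch below is illustrative and every field name is an assumption.

import hashlib
import json

# Illustrative cache key: scope entries by tenant, prompt version, and a fingerprint
# of the retrieved context instead of prompt text alone.
def cache_key(tenant_id: str, prompt_version: str, prompt: str, retrieved_chunks: list[str]) -> str:
    context_fingerprint = hashlib.sha256("\n".join(retrieved_chunks).encode()).hexdigest()
    payload = json.dumps(
        {
            "tenant": tenant_id,
            "prompt_version": prompt_version,
            "prompt": prompt,
            "context": context_fingerprint,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()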

Frequently Asked Questions

What is Portkey?

Portkey is an LLM gateway that sits between applications and model providers to route requests, cache responses, manage retries, apply fallbacks, and collect observability data.

How is Portkey different from LiteLLM?

LiteLLM is commonly used as an OpenAI-compatible proxy and provider abstraction. Portkey adds gateway governance features such as routing rules, virtual keys, caching, observability, and policy controls around provider calls.

How do you measure Portkey?

Measure Portkey with traceAI:portkey spans, provider latency, retry rate, fallback rate, cache hit rate, token cost, and eval-fail-rate using evaluators such as Groundedness or TaskCompletion.