Best 5 Cloudflare AI Gateway Alternatives in 2026
Five Cloudflare AI Gateway alternatives scored on routing intelligence, observability depth, per-tenant chargeback, ecosystem portability, and the eval/optimizer loop that Cloudflare's gateway never grew.
Cloudflare AI Gateway is the obvious first gateway. It’s free, ships behind a one-line base_url change, and any team already on Workers, R2, or D1 gets it without a new procurement loop. In 2026, the exits are running ahead of the installs.
The pattern in HN threads, /r/LLMDevs migration posts, and the AI Gateway GitHub issue tracker repeats: teams adopt Cloudflare for the easy on-ramp, hit a ceiling at three to six months of production traffic, and look for a gateway that does more than relay-and-log. The ceiling is the same five capabilities: routing intelligence beyond round-robin, observability deeper than per-request rows, per-tenant chargeback at scale, escape from Cloudflare ecosystem lock, and a feedback loop where trace data actually changes how the gateway behaves.
This guide ranks five gateways worth migrating to and walks through the migration step that always trips teams up: the URL prefix pattern.
TL;DR: pick by exit reason
| Why you are leaving Cloudflare AI Gateway | Pick | Why |
|---|---|---|
| You want trace data to feed back into routing and prompts | Future AGI Agent Command Center | Closes the loop from trace through eval to optimizer to route |
| You want a hosted gateway with the same proprietary polish but more routing depth | Portkey | Hosted virtual keys, prompt registry, and richer routing rules |
| You want a self-hosted, open-source proxy | LiteLLM | MIT-licensed proxy that runs entirely inside your VPC |
| You need enterprise SLA and plugin ecosystem on a Tier-1 API gateway | Kong AI Gateway | Extends a Kong stack with AI-specific policies and audit |
| You want a hosted gateway tightly bound to the Vercel platform | Vercel AI Gateway | Same one-line ergonomics, broader provider list, framework-native |
Why people are leaving Cloudflare AI Gateway in 2026
Five exit drivers show up repeatedly. They aren’t all equal: the first three drive most migrations.
1. Routing intelligence stops at round-robin
Cloudflare AI Gateway’s routing surface, as documented in May 2026, supports fallback chains, per-provider rate limits, and basic retry on 5xx. What it doesn’t support: cost-aware, latency-aware, or model-aware routing (Sonnet for code, Haiku for classification, GPT-4o for vision, decided per request). The fallback chain is static and round-robin within tier. Teams running hot-swap A/B tests on Sonnet 4.6 vs GPT-4.1 vs Gemini 2.5 hit the ceiling fast and either bolt a routing layer in front (defeating the point) or migrate.
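To make the distinction concrete, here is a minimal sketch of per-request, cost- and latency-aware selection, the kind of decision a static fallback chain cannot make. The model labels, prices, and latency figures are invented for illustration; they are not taken from any provider's price list or any gateway's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str                 # hypothetical model label
    usd_per_1k_tokens: float  # illustrative price, not a real quote
    p50_latency_ms: float     # illustrative latency figure
    tasks: frozenset          # task classes this model may serve


def pick_model(candidates, task, latency_budget_ms):
    """Return the cheapest candidate that serves `task` within the latency budget.

    If no eligible candidate meets the budget, return the fastest eligible one.
    """
    eligible = [c for c in candidates if task in c.tasks]
    if not eligible:
        raise ValueError(f"no candidate serves task {task!r}")
    within_budget = [c for c in eligible if c.p50_latency_ms <= latency_budget_ms]
    if within_budget:
        return min(within_budget, key=lambda c: c.usd_per_1k_tokens)
    return min(eligible, key=lambda c: c.p50_latency_ms)


CANDIDATES = [
    Candidate("large-model", 0.015, 1800, frozenset({"code", "vision"})),
    Candidate("small-model", 0.001, 400, frozenset({"classification", "code"})),
    Candidate("mid-model", 0.005, 900, frozenset({"code", "classification", "vision"})),
]
```

A real router would refresh the price and latency inputs from live traffic rather than hard-coding them; the point here is only that the choice is made per request from those inputs.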
2. Observability is per-request rows, not spans
The dashboard shows a flat list of requests with timestamp, model, latency, token counts, and cost. Useful for debugging a single failing call. Insufficient for an agent workload where one user action triggers eight LLM calls plus four tool calls: there’s no native span hierarchy, no session view, no agent-trace timeline. Cloudflare’s documented workaround is to emit OTel from your application code, but then the gateway isn’t the source of truth, your collector is, and you’re paying for two stacks.
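The difference between flat rows and a span hierarchy can be sketched in a few lines. The example below nests flat records under their parents to form one trace tree per user action. The field names (span_id, parent_id, kind, name) are assumptions that loosely follow OpenTelemetry's parent-child span model; they are not Cloudflare's log schema.

```python
from collections import defaultdict


def build_trace_tree(rows):
    """Nest flat span rows under their parents and return the list of root spans."""
    children = defaultdict(list)
    for row in rows:
        children[row["parent_id"]].append(row)

    def attach(row):
        return {
            "name": row["name"],
            "kind": row["kind"],
            "children": [attach(child) for child in children[row["span_id"]]],
        }

    # Rows whose parent_id is None are the top-level user actions.
    return [attach(root) for root in children[None]]


def render(node, depth=0):
    """Return indented lines for one span and everything nested under it."""
    lines = ["  " * depth + f"{node['kind']}: {node['name']}"]
    for child in node["children"]:
        lines.extend(render(child, depth + 1))
    return lines


# Hypothetical rows for a single user action.
ROWS = [
    {"span_id": "a", "parent_id": None, "kind": "agent", "name": "answer_ticket"},
    {"span_id": "b", "parent_id": "a", "kind": "llm", "name": "plan"},
    {"span_id": "c", "parent_id": "a", "kind": "tool", "name": "search_orders"},
    {"span_id": "d", "parent_id": "c", "kind": "llm", "name": "summarize_results"},
]
```

Calling `render(build_trace_tree(ROWS)[0])` yields one indented timeline for the whole action, which is the view a per-request list cannot give without the parent links.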
3. Per-tenant chargeback gets harder at scale
The gateway exposes a cf-aig-metadata header that the client sets to tag requests by user, session, or team. The data lands in Logpush exports. For a small team, fine. For a multi-tenant SaaS billing 200 customers on per-token usage, the workflow (Logpush to R2, Athena or BigQuery query, join with the application identity table, generate invoice CSV) is a weekly batch job, not a live dashboard. /r/LLMDevs threads describe three sprints to ship a chargeback pipeline that other gateways ship native.
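Stripped of the storage and query layers, that batch job reduces to tagging each request and later grouping exported records by the tag. The sketch below assumes each exported record carries the metadata as a JSON string alongside token and cost fields; the record field names (metadata, tokens, cost_usd) and the tenant tag are hypothetical, not the actual Logpush schema.

```python
import csv
import io
import json
from collections import defaultdict


def tag_request(tenant, user):
    """Build request headers carrying the tenant tag as JSON in cf-aig-metadata."""
    return {"cf-aig-metadata": json.dumps({"tenant": tenant, "user": user})}


def invoice_csv(records):
    """Sum tokens and cost per tenant and return the result as CSV text."""
    totals = defaultdict(lambda: {"tokens": 0, "cost_usd": 0.0})
    for record in records:
        # Records without a tenant tag are grouped so they stay visible.
        tenant = json.loads(record["metadata"]).get("tenant", "untagged")
        totals[tenant]["tokens"] += record["tokens"]
        totals[tenant]["cost_usd"] += record["cost_usd"]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["tenant", "tokens", "cost_usd"])
    for tenant in sorted(totals):
        row = totals[tenant]
        writer.writerow([tenant, row["tokens"], f"{row['cost_usd']:.4f}"])
    return out.getvalue()
```

The logic is short; the three sprints go into everything around it: scheduling the export, joining against the identity table, and handling untagged traffic.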
4. Lock-in to the Cloudflare ecosystem
The free price is the headline benefit and the lock-in vector. AI Gateway integrates cleanly with Workers AI, R2 for Logpush, D1 for application data, and the Cloudflare dashboard for routing config. A team with prompt versions in R2, audit logs in Logpush, and routing config in the dashboard has three Cloudflare touchpoints to migrate, not one. Most teams discover this after a year of accumulated artifacts.
5. No eval or optimizer loop
The largest functional gap, under-weighted at adoption. Cloudflare AI Gateway is a relay with a dashboard. It captures traces; it doesn’t score them, cluster failures, drive prompt rewrites, or update routing rules from trace data. In 2026, where Agent Command Center, Portkey’s Guardrails+Optimizer stack, and Maxim’s eval pipelines all close that loop, a relay-only gateway means prompt and routing improvements happen in spreadsheets and pull requests, not in the gateway.
What to look for in a Cloudflare AI Gateway replacement
Score replacements on the seven axes that map to the surfaces you’re actually outgrowing:
| Axis | What it measures |
|---|---|
| 1. Routing intelligence | Cost-aware, latency-aware, model-aware — or just round-robin? |
| 2. Observability depth | Span hierarchies, session views, agent traces — or per-request rows? |
| 3. Per-tenant chargeback | Native dashboard slicing by tenant — or a Logpush batch job? |
| 4. Ecosystem portability | Does the gateway lock you to a single cloud’s storage and config? |
| 5. Self-host posture | Can the gateway run inside your VPC, fully air-gapped from the vendor? |
| 6. Eval + optimizer loop | Does the gateway use its own trace data to improve routing and prompts? |
| 7. Migration tooling | Are there published guides or importers for Cloudflare specifically? |
1. Future AGI Agent Command Center: Best for closing the loop
Verdict: Future AGI is the only gateway here that fixes Cloudflare AI Gateway’s largest gap: captured traces never feed back into how the gateway behaves. Agent Command Center captures the trace, scores it with the eval library, clusters failures, runs the optimizer, and pushes the updated route or prompt back into the gateway on the next request. The other four are observation and policy layers. FAGI is an observation layer wired to an optimizer.
What it fixes versus Cloudflare AI Gateway:
- Routing intelligence beyond round-robin. Cost-aware and latency-aware routing rules ship native. Model-aware routing keys off prompt classification (Sonnet for code, Haiku for classification, GPT-4o for vision) and the classifier is itself a prompt the optimizer rewrites against eval scores.
- Span hierarchies and session views. traceAI (Apache 2.0) instruments any agent framework (LangChain, LlamaIndex, CrewAI, plain SDK calls), producing OpenTelemetry spans with agent-trace conventions. One user action shows as one trace tree with LLM, tool, and retrieval calls nested correctly.
- Per-tenant chargeback as a dashboard, not a batch job. Cost slices by session, user, repo, route, and tag are native dashboard filters. Invoice CSV is one export, not a three-sprint pipeline.
- Native eval, not bolt-on. Every captured trace is scored against task-completion, faithfulness, and tool-use rubrics by default with ai-evaluation (Apache 2.0). Cost and quality data sit in the same row, so the routing policy can use both.
- Self-improving loop. agent-opt (Apache 2.0) rewrites prompts via six optimizers (ProTeGi, GEPA, Bayesian, MetaPrompt, RandomSearch, PromptWizard), driven by eval scores. The routing policy is itself a target the optimizer can tune. Cloudflare’s gateway is static configuration; FAGI’s is a feedback loop.
- OSS instrumentation, hosted control plane. traceAI, ai-evaluation, and agent-opt are all Apache 2.0. The hosted Command Center adds RBAC, failure-cluster views, the Protect guardrails layer (median 65 ms text-mode latency per arXiv 2510.13351, 107 ms image), and AWS Marketplace procurement.
Migration from Cloudflare AI Gateway: Cloudflare’s invocation pattern sets the SDK’s base_url to https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/{provider} plus an API token. FAGI uses the same shape: change the base_url and swap the token. Routing rules, fallback chains, and metadata headers map directly. Eval and optimizer are additive: ship a like-for-like swap in two days, then turn on the loop later. Timeline: three to five engineering days for the proxy cutover with shadow traffic; another sprint for eval and optimizer.
Where it falls short:
- agent-opt is opt-in: start with traceAI + ai-evaluation in week one and turn the optimizer on once eval baselines stabilize. The loop compounds value over weeks rather than at day one.
- For all-Cloudflare stacks (Workers, R2, D1), FAGI’s strongest surfaces sit outside the Cloudflare control plane. The integration is clean, but a single-vendor team gains a second vendor.
Pricing: Free tier with 100K traces/month. Scale tier from $99/month with linear per-trace scaling above 5M (no add-on multipliers). Enterprise with SOC 2 Type II and AWS Marketplace.
Score: 7 of 7 axes.
2. Portkey: Best for hosted depth
Verdict: Portkey is the natural step-up from Cloudflare AI Gateway when the requirement is “stay hosted but get more routing depth and a real prompt registry.” Closest functional match in hosted shape; the Palo Alto Networks April 2026 acquisition has added enterprise weight without (yet) reshaping the SMB SKU.
What it fixes versus Cloudflare AI Gateway:
- Richer routing. Conditional routing matches on metadata, prompt class, or user attribute, then sends to the right provider. Cloudflare’s fallback is static; Portkey’s is rule-driven.
- Native prompt registry. Prompt Studio stores prompts as versioned objects with diff views and rollback. Cloudflare has no first-party prompt store.
- Virtual keys. Every developer or service gets a Portkey-issued key that fans out to one underlying provider key, preserving bulk pricing while exposing per-identity attribution. Stronger than Cloudflare’s metadata header.
- Guardrails and Audit Logs as a bundled surface. Both configurable from the dashboard. Cloudflare leaves guardrails to upstream code.
Migration from Cloudflare AI Gateway: Both use the URL prefix pattern, so the base_url change is mechanical: point at https://api.portkey.ai/v1/proxy and add a virtual key header. Routing rules need re-expression in Portkey’s config (richer, so more work). The Logpush-to-Athena pipeline can be retired in favor of Portkey’s native dashboard. Timeline: five to seven engineering days.
Where it falls short:
- No optimizer. Traces inform humans, not the gateway.
- Pricing escalates above 5M requests/month; marginal cost compounds with add-ons (Guardrails, Prompt Studio, Audit Logs) enabled.
- Post-Palo Alto acquisition, the standalone SMB SKU’s pricing trajectory is uncertain on a 12-to-24-month horizon.
- Prompt Studio’s template dialect is Portkey-specific (handlebars-shaped with proprietary filters), so a future exit needs a rewrite step.
Pricing: Free tier with 10K requests/month. Scale tier from $99/month. Enterprise custom.
Score: 5 of 7 axes (missing: optimizer, fully open self-host).
3. LiteLLM: Best for self-hosted exit
Verdict: LiteLLM is the pick when the exit driver is ecosystem lock-in and the requirement is “this proxy runs entirely on our infrastructure, with source we can audit.” MIT-licensed, Python-native, the most popular self-hosted AI proxy on GitHub. You give up Cloudflare’s hosted polish; you gain full sovereignty.
What it fixes versus Cloudflare AI Gateway:
- Ecosystem portability. A Python package and a container. Runs on Fly, AWS, GCP, bare metal. Cloudflare’s gateway is bound to Cloudflare. For multi-cloud or single-cloud-but-not-Cloudflare strategies, the cleanest swap.
- Self-host posture. The proxy runs in your VPC. No telemetry leaves unless you configure an OTel sink. The answer for security reviews that flagged Cloudflare AI Gateway as a third-party data path.
- Cost curve at scale. Open-source means no per-request licensing. Compute and storage scale linearly. Enterprise tier (from ~$250/month) adds SSO, audit, and SLA without per-request escalation.
- Per-identity keys via team_id/user_id. A cleaner chargeback primitive than Cloudflare’s metadata header, with a native dashboard rather than a Logpush batch job.
Migration from Cloudflare AI Gateway: OpenAI-compatible endpoint, provider keys, metadata-as-key all map directly. URL prefix swap is one line. LiteLLM has no first-party prompt registry; pair with Langfuse, Future AGI, or in-repo Jinja2 files. You lose Cloudflare’s hosted dashboard UX and the Logpush-to-R2 pipeline. Timeline: five to seven engineering days plus a week for whatever observability sink you choose.
Where it falls short:
- No optimizer.
- The bundled UI is the weakest in this list; polish lives in the Enterprise tier.
- Prompt-library story is a separate purchase or build.
- Self-hosted ops carries an SRE cost the Cloudflare free tier didn’t.
Pricing: Open source under MIT. Enterprise from ~$250/month for small teams.
Score: 5 of 7 axes (missing: native prompt registry, optimization loop).
4. Kong AI Gateway: Best for enterprise platform teams
Verdict: Kong AI Gateway is the pick when your platform team already runs Kong for the company’s REST APIs and the path of least resistance is to extend the existing stack with AI-specific policies. Strengths: SLA, plugin ecosystem, audit posture. Weakness: AI-specific surfaces (prompt registry, eval, optimizer) live in plugins or upstream tools, not the product.
What it fixes versus Cloudflare AI Gateway:
- Enterprise SLA and procurement. A Tier-1 API-gateway vendor for a decade. Compliance covers SOC 2, ISO 27001, and HIPAA-eligible workloads, the bar Cloudflare AI Gateway’s free tier was never sold against.
- Plugin ecosystem. Existing Kong customers reuse rate-limiting, auth, request-transformation, and OTel plugins. The AI Proxy plugin (Kong 3.6+) handles Anthropic, OpenAI, Bedrock, and Vertex passthrough including tool calls. The OTel plugin gives the span hierarchies Cloudflare’s per-request log never had.
- Self-host posture. Runs anywhere, bare metal, VPC, hybrid. Konnect (managed) is optional. No single-cloud lock-in.
- Per-tenant chargeback via consumer + tag. Consumer abstraction plus tagging plus OTel feeds Grafana or Datadog for live per-tenant dashboards. Replaces the Logpush-to-Athena batch job.
Migration from Cloudflare AI Gateway: OpenAI-compatible endpoint via the AI Proxy plugin, provider keys, consumer + tag as the metadata analog, OTel for observability, all map directly. The URL prefix swap is one line. Kong has no first-party prompt registry; pair with Langfuse, Future AGI, or an in-house Git-backed store. You lose Cloudflare’s free tier and zero-ops posture. Timeline: ten to fifteen engineering days because work spans platform (AI Proxy plugin, OTel sink, Grafana for chargeback) and application (prompt-registry replacement) teams.
Where it falls short:
- AI-specific observability is plugin-driven. The default dashboard is the API-gateway view, not the LLM-cost view.
- No optimizer, no prompt registry, no eval library.
- The two-week-plus setup means migration ROI shows up later than lighter alternatives.
- Free tier is open-source self-host only; Konnect managed starts free but scales toward enterprise plans from ~$1.5K/month.
Pricing: Kong AI Gateway is open source. Konnect (managed) starts free. Enterprise plans from ~$1.5K/month.
Score: 5 of 7 axes (missing: native prompt registry, optimizer, native AI cost dashboard).
5. Vercel AI Gateway: Best for Vercel-platform teams
Verdict: Vercel AI Gateway is the pick when the application is already on Vercel and the goal is to swap Cloudflare’s URL prefix for Vercel’s without changing anything else. Same one-line ergonomics, broader provider list, framework-native bindings to the AI SDK, procurement on the existing Vercel invoice.
What it fixes versus Cloudflare AI Gateway:
- Framework-native integration. Wired into Vercel’s AI SDK so the gateway feels like a framework feature rather than a separate vendor. For teams already on Next.js + AI SDK, the smallest delta.
- Broader provider list. As of May 2026, routes to OpenAI, Anthropic, Google, Mistral, Groq, xAI, Cohere, and Bedrock with first-class support. Failover-plus-aliasing is slightly richer than Cloudflare’s for cross-provider swaps.
- Procurement and billing on existing Vercel. Already paying Vercel for hosting? The gateway shows up on the same invoice. No new vendor onboarding, no new security review.
- Edge runtime alignment. Co-located with the application’s edge functions, removing one hop teams sometimes hit when running Vercel functions through Cloudflare AI Gateway.
Migration from Cloudflare AI Gateway: Pure URL prefix swap. Replace https://gateway.ai.cloudflare.com/v1/... with https://gateway.ai.vercel.app/v1/... and swap the token. Metadata headers are similar; observability is comparable depth (per-request, with session grouping). Timeline: two to three days.
Where it falls short:
- No optimizer.
- Observability is still per-request, slightly richer than Cloudflare’s dashboard but not span-hierarchy native. Production agent teams hit the same ceiling six months later.
- No first-party prompt registry, no eval library, no audit-log depth at the level of Kong or Portkey.
- Ecosystem lock-in is real: you trade Cloudflare’s lock for Vercel’s. For teams whose exit driver was specifically “single-vendor risk,” this gateway doesn’t solve the problem.
- Pricing once usage scales is competitive with Cloudflare’s paid tier but no longer free.
Pricing: Free credit on Vercel Pro/Enterprise; usage-based above the credit, with rates per provider.
Score: 4 of 7 axes (missing: optimizer, native prompt registry, depth on chargeback and span hierarchies).
Capability matrix
| Axis | Future AGI | Portkey | LiteLLM | Kong AI Gateway | Vercel AI Gateway |
|---|---|---|---|---|---|
| Routing intelligence | Cost + latency + model-aware, optimizer-tuned | Conditional rules | Cost + latency rules | Plugin-driven | Failover + aliasing |
| Observability depth | Span hierarchies + sessions | Per-request + session | Functional UI | OTel plugin + Grafana | Per-request + grouping |
| Per-tenant chargeback | Native dashboard | Virtual-key dashboard | team_id / user_id | Consumer + tag + Grafana | Per-request usage |
| Ecosystem portability | BYOC + OSS | Hosted (closed) | MIT, runs anywhere | OSS, runs anywhere | Vercel-bound |
| Self-host posture | OSS instrumentation, hosted control plane | Hosted only | MIT, full VPC | OSS, runs anywhere | Hosted only |
| Eval + optimizer loop | Yes (ai-evaluation + agent-opt) | No | No | No | No |
| Cloudflare migration tooling | Routing + metadata mapping | URL prefix swap | URL prefix swap | Manual setup | URL prefix swap |
Migration notes: what breaks when leaving Cloudflare AI Gateway
Cloudflare AI Gateway uses the URL prefix pattern, so the cutover is a base_url change. Three surfaces around that one-line change need attention.
Re-routing client base URLs
Cloudflare AI Gateway is invoked by setting the SDK’s base_url to https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/{provider} plus an API token header. Every alternative here uses the same URL prefix shape, so the cutover is one line. But services hard-code the URL in three places: SDK initialization, runtime config (often a feature-flag file or Workers env binding), and the deployment manifest. Teams with the URL in one place finish in an afternoon; teams with it pasted across services budget two days for the audit alone.
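The audit described above can be partly automated. The following is a rough sketch that scans source and config files for hard-coded gateway URLs, using the host from the URL pattern quoted earlier. The list of file types and the `.env` handling are assumptions to adapt per repository.

```python
import re
from pathlib import Path

# Matches the gateway host quoted in this article plus whatever path follows it.
GATEWAY_URL = re.compile(r"https://gateway\.ai\.cloudflare\.com/v1/[^\s\"'`)]*")

# Assumed set of file types worth scanning; adjust for your repositories.
SCAN_SUFFIXES = {".py", ".ts", ".js", ".json", ".yaml", ".yml", ".toml", ".env"}


def find_gateway_urls(text):
    """Return (line_number, url) pairs for every gateway URL in `text`."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in GATEWAY_URL.finditer(line):
            hits.append((number, match.group(0)))
    return hits


def audit_tree(root):
    """Walk `root` and map each file path to the gateway URLs found in it."""
    report = {}
    for path in Path(root).rglob("*"):
        # Dotfiles such as ".env" have an empty suffix, so check the name too.
        wanted = path.suffix in SCAN_SUFFIXES or path.name.startswith(".env")
        if path.is_file() and wanted:
            hits = find_gateway_urls(path.read_text(errors="ignore"))
            if hits:
                report[str(path)] = hits
    return report
```

Running `audit_tree(".")` at the root of each service gives a per-file list of lines to change. It will not catch URLs assembled at runtime from separate account and gateway IDs, so treat an empty report as a starting point rather than proof.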
Replacing the Logpush chargeback pipeline
If you built per-tenant chargeback as Logpush → R2 → Athena → CSV, the replacement gateway probably ships chargeback native and the pipeline can be retired. Three steps: stand up the new dashboard, validate that totals match the last 30 days of Athena exports within a few percent, then turn off Logpush. Portkey and Future AGI expose per-tenant cost slices natively; LiteLLM and Kong need the consumer/tag pattern plus a Grafana board.
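The validation step is worth scripting so it can run daily during the overlap period. Here is a minimal sketch that compares per-tenant totals from the old pipeline and the new dashboard. The 3% default tolerance is an assumed reading of "a few percent", and the input shape (a dict of tenant to total cost) is hypothetical.

```python
def reconcile(old_totals, new_totals, tolerance=0.03):
    """Compare per-tenant totals from two systems.

    Returns a dict of tenants whose relative difference exceeds `tolerance`
    or who appear in only one of the two inputs. Values are (old, new) pairs.
    """
    mismatches = {}
    for tenant in sorted(set(old_totals) | set(new_totals)):
        old = old_totals.get(tenant)
        new = new_totals.get(tenant)
        if old is None or new is None:
            # A tenant missing on either side is always a mismatch.
            mismatches[tenant] = (old, new)
            continue
        baseline = max(abs(old), abs(new))
        if baseline and abs(old - new) / baseline > tolerance:
            mismatches[tenant] = (old, new)
    return mismatches
```

An empty result for several consecutive days is a reasonable signal that Logpush can be switched off; a tenant that appears on only one side usually points to a tagging gap rather than a pricing difference.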
Reshaping routing rules
Cloudflare’s fallback chain is static and round-robin within tier. Every alternative supports richer rules, but they don’t translate one-to-one. Start with a like-for-like fallback chain on the new gateway, validate parity in shadow mode for a week, then add cost-aware, latency-aware, or model-aware routing. Redesigning routing during the cutover compounds risk.
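A like-for-like starting point is simply an ordered chain with retry on 5xx. The sketch below shows that behavior with provider calls injected as plain functions so it can run in shadow mode against recorded prompts. ProviderError and the chain format are hypothetical, not any gateway's configuration schema.

```python
class ProviderError(Exception):
    """Hypothetical error type carrying an HTTP-style status code."""

    def __init__(self, status):
        super().__init__(f"provider returned {status}")
        self.status = status


def call_with_fallback(chain, prompt, retries_per_provider=1):
    """Try each (name, call) pair in order; retry on 5xx, then move to the next.

    Returns (provider_name, response). Errors below 500 are raised immediately,
    since retrying or falling back rarely fixes a client-side problem.
    """
    last_error = None
    for name, call in chain:
        for _ in range(retries_per_provider + 1):
            try:
                return name, call(prompt)
            except ProviderError as err:
                if err.status < 500:
                    raise
                last_error = err
    raise RuntimeError("all providers in the chain failed") from last_error
```

Once this static chain reproduces the old gateway's behavior on shadow traffic, richer rules can replace the ordered list without changing the calling code.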
Decision framework: Choose X if
Choose Future AGI if your reason for leaving is more than swapping vendors: you also want trace data to drive prompt rewrites and routing-policy updates, so the cost curve bends down over time. Pick this when production agent workloads are a significant line item and the OSS instrumentation (traceAI, ai-evaluation, agent-opt) plus the hosted Command Center together justify the migration.
Choose Portkey if you want the closest hosted match to Cloudflare’s shape with more routing depth, a real prompt registry, and virtual keys for multi-tenant chargeback. Pick this when you’re comfortable with the post-acquisition pricing trajectory and willing to accept the proprietary prompt-template dialect.
Choose LiteLLM if ecosystem lock-in is the dealbreaker and the requirement is “runs on our hardware, with source we can audit.” Pick this when self-host posture beats hosted polish and you have engineering budget for a separate prompt store and observability sink.
Choose Kong AI Gateway if your platform team already runs Kong and the path of least resistance is to extend the existing stack. Pick this when SLA, plugin ecosystem, and operational familiarity outweigh AI-specific shallowness.
Choose Vercel AI Gateway if your application is already on Vercel and the goal is the smallest possible delta. Pick this when “single-vendor risk” isn’t your exit driver and framework-native integration with the AI SDK is worth more than the depth you give up.
What we did not include
Three products show up in other 2026 listicles that we left out: OpenRouter (consumer-facing model marketplace, not the shape for an enterprise gateway replacement with chargeback and routing depth); Helicone (capable hosted observation layer, but routing is basic and chargeback depth is comparable to Cloudflare’s, so it solves only one of the five exit drivers); TrueFoundry (capable MLOps gateway, but Cloudflare-specific migration tooling isn’t published yet, so it is worth a second look in Q3 2026).
Related reading
- Best 5 Portkey Alternatives in 2026
- Best LLM Gateways in 2026
- What Is an AI Gateway? The 2026 Definition
- Best AI Gateways for Agentic AI in 2026
- Cloudflare AI Gateway vs OpenRouter in 2026
Sources
- Cloudflare AI Gateway documentation, developers.cloudflare.com/ai-gateway
- Cloudflare AI Gateway routing and fallback reference, developers.cloudflare.com/ai-gateway/configuration
- Cloudflare Logpush for AI Gateway, developers.cloudflare.com/ai-gateway/observability
- Hacker News threads on AI Gateway migration patterns, Q1-Q2 2026, news.ycombinator.com
- Reddit /r/LLMDevs migration discussions, January-May 2026
- Portkey product documentation, portkey.ai/docs
- LiteLLM GitHub repository, github.com/BerriAI/litellm
- Kong AI Gateway product page, konghq.com/products/kong-ai-gateway
- Vercel AI Gateway documentation, vercel.com/docs/ai-gateway
- Future AGI Agent Command Center, futureagi.com/platform/monitor/command-center
- Future AGI traceAI, github.com/future-agi/traceAI (Apache 2.0)
- Future AGI ai-evaluation, github.com/future-agi/ai-evaluation (Apache 2.0)
- Future AGI agent-opt, github.com/future-agi/agent-opt (Apache 2.0)
- Future AGI Protect latency benchmark, arxiv.org/abs/2510.13351 (65 ms text, 107 ms image)