Best 5 Self-Hosted AI Gateways in 2026

Five self-hosted AI gateways scored on Kubernetes and Docker support, HA topology, self-observability, upgrade path, license, footprint, ops burden.

April 14, 2026

The decision to self-host an AI gateway isn’t a license decision. It’s a deployment decision. Two teams can pick the same Apache 2.0 binary and end up with very different operational realities: one team runs a single-process container that pages on-call twice a year; the other runs a five-replica Kubernetes deployment with a Redis sidecar, a Postgres cluster, a Grafana dashboard, and a quarterly upgrade rehearsal. Same source. Same license. Wildly different burden.

This post scores five gateways on the seven axes that decide what self-hosting actually costs your platform team. The question is not “is the LICENSE file Apache 2.0”; our open-source listicle answers that. The question here is harder: once you commit to running the binary on your own infrastructure, how many hours does that cost per month, and which binary gives you the least painful path to a production posture?

Self-hosting is the right choice when one of four constraints is binding: a security policy that prohibits sending prompts to a hosted gateway, an air-gapped or sovereign-cloud environment, a compliance regime requiring data residency, or a procurement preference for open-source-only software in the critical path. If the gateway concept itself is new to your team, our explainer on what an AI gateway is covers the governance, routing, and observability surface in full.

TL;DR: pick by deployment posture

Posture	Pick	Why
Self-host with the gateway feeding back into prompts and routes	Future AGI Agent Command Center (BYOC)	Only entry that combines self-host with the trace-to-optimizer loop
Self-host as Python proxy your team can read line-by-line	LiteLLM	Open-source (MIT) Python; engineers can audit and patch in-place
Self-host on top of the API gateway your platform team already runs	Kong AI Gateway	If Kong is already your control plane, AI Proxy plugin fits the existing ops model
Self-host as Kubernetes-native data-plane sidecar	agentgateway.dev	Pure-Go binary built for service-mesh patterns; clean K8s-native ops
Self-host as single Go binary with smallest ops surface	Maxim Bifrost	Static Go binary, optional external state, zero-config startup

The 7 axes we score on

Three years ago “self-hosted” meant “you can docker run the binary on a VPS.” In 2026 the bar is higher. A production gateway has to clear five operational checks: pinnable container digest, documented HA topology, native self-observability, simple upgrade path, and license clarity for commercial use. The seven axes operationalise those checks.

Axis	What it measures
1. Kubernetes / Docker support	Official Helm chart? Container image with semver digest? Documented `docker-compose.yml`?
2. HA topology	Documented multi-replica deployment with shared state and rolling-restart semantics
3. Gateway self-observability	Native Prometheus + OTel traces about the gateway itself, not the LLM calls it proxies
4. Upgrade path simplicity	Minor upgrades are `helm upgrade`. Major upgrades ship a migration script. Documented rollback.
5. License commercial-use clarity	Apache 2.0, MIT, or OSI permissive — not AGPLv3, not source-available-with-no-commercial-clause
6. Single-binary vs framework	One static binary you `chmod +x` and run, or a framework with external dependencies?
7. Operational burden	Estimated platform-engineer hours per month at >99.9% on production traffic

We deliberately didn’t pick on raw throughput. Throughput at 5,000 RPS is rarely the binding constraint for self-hosters. The binding constraint is how many evenings the on-call platform engineer loses to upgrades, degraded-mode debugging, and Helm-chart drift.

1. Future AGI Agent Command Center (BYOC): Best for self-host plus closed loop

Verdict: The only gateway here that ships a fully self-hostable deployment AND closes the loop from captured traces back into optimised prompts and routing policies. The other four are observation and routing layers. Agent Command Center BYOC observes, routes, AND uses the same traces to drive prompt rewrites through agent-opt.

Deployment posture:

Kubernetes / Docker support. Official Helm chart at charts.futureagi.com/agent-command-center with a published values schema. Container image futureagi/agent-command-center with semver tags and pinnable digest. docker-compose.yml for single-node dev ships in the repo.
HA topology. Documented 3-replica deployment behind a Kubernetes Service with a Redis cluster (3 master + 3 replica) for shared rate-limit and budget state and a Postgres cluster for trace metadata. Rolling restart drains in-flight requests via preStop hooks; the published topology is load-tested at 8,000 RPS sustained on a 6-node deployment.
Gateway self-observability. Native Prometheus endpoint at /-/metrics exposes 47 metrics about the gateway itself (request rate by upstream, error rate, cache hit ratio, queue depth, Redis latency, Postgres connection pool saturation). OTel traces for the gateway’s own request lifecycle are exported via OTLP, separate from the LLM-call traces, so the platform team gets gateway health without parsing application traces.
Upgrade path. Minor upgrades are helm upgrade --version X.Y.Z. Major upgrades ship a migration script in the same chart and a --dry-run mode that reports schema changes before applying them. Rollback is documented and tested.
License commercial-use clarity. The three open-source building blocks (traceAI, ai-evaluation, agent-opt) are Apache 2.0. The BYOC deployment is licensed under a commercial agreement that explicitly permits self-host inside your VPC and air-gapped environments.
Single-binary vs framework. Hybrid. The gateway is a single Go binary; Redis and Postgres are external dependencies for HA. Single-node mode can run with embedded SQLite.
Operational burden estimate. ~4-6 platform-engineer hours per month at steady state. Most of that’s Helm-chart upgrades during minor releases and quarterly review of Prometheus alert thresholds.
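
The "pinnable digest" check in the first bullet can be automated before a chart is applied. The following is a minimal sketch, assuming image references follow the usual `name:tag@sha256:<64 hex chars>` form; the helper names, the `1.2.3` tag, and the placeholder digest are hypothetical and are not taken from the Future AGI chart.

```python
import re

# A digest-pinned reference ends with "@sha256:" followed by 64 hex characters.
_DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image_ref: str) -> bool:
    """Return True when the reference is pinned to an immutable sha256 digest."""
    return bool(_DIGEST_RE.search(image_ref))


def unpinned_images(image_refs: list[str]) -> list[str]:
    """Return the references that rely on a mutable tag only."""
    return [ref for ref in image_refs if not is_digest_pinned(ref)]


if __name__ == "__main__":
    placeholder_digest = "a" * 64  # placeholder, not a real image digest
    refs = [
        f"futureagi/agent-command-center:1.2.3@sha256:{placeholder_digest}",
        "futureagi/agent-command-center:1.2.3",  # hypothetical tag, no digest
    ]
    for ref in unpinned_images(refs):
        print("not pinned:", ref)
```

A check like this fits naturally in CI, run against the rendered manifests, so that a tag-only reference fails the pipeline rather than reaching the cluster.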

The loop, in a self-hosted deployment. Spans captured by traceAI get scored by ai-evaluation for faithfulness, task-completion, and tool-use correctness. Low-scoring sessions cluster by failure mode, and agent-opt (ProTeGi, Bayesian, GEPA) proposes prompt rewrites or routing-policy adjustments. The new policy ships back through Helm; the gateway picks it up on the next request. The loop runs entirely inside your infrastructure if you deploy the eval and optimizer services alongside the gateway.

The Protect guardrail layer (PII, prompt injection, hallucination) runs inline at ~67ms p50 added latency for text and ~109ms p50 for image, measured on the harness in arXiv 2510.13351. The sub-100ms text latency is why teams keep Protect inline rather than async; image checks land just above that mark.

Where it falls short:

HA topology requires Redis and Postgres. Teams wanting a literal single-binary footprint with zero external state should look at Bifrost or agentgateway.dev.
agent-opt is opt-in: deploy with the optimizer off and turn it on once eval baselines stabilize. The optimizer compounds value, so teams typically light it up after the first month.
The chart assumes a standard Kubernetes environment (1.27+, CSI volumes, NetworkPolicy). Air-gapped clusters with custom CNI sometimes need a custom values file the support team helps construct.

Pricing. BYOC is licensed per instance with annual commercial agreements. Apache 2.0 components are free for any use. AWS Marketplace listing for procurement.

Score: 7/7 axes.

2. LiteLLM: Best for Python-native self-host your team can read

Verdict: Right pick when the security team’s first question is “can our engineers read every line of code that touches a prompt?” and the answer needs to be yes by Tuesday. MIT-licensed, open-source Python, and the most-forked self-host gateway in the category. It doesn’t optimize prompts or routes; it observes, routes, and reports, but it does so in code your team can patch.

Deployment posture:

Kubernetes / Docker support. Official Helm chart in BerriAI/litellm. Container image ghcr.io/berriai/litellm with semver tags. docker-compose.yml in the repo.
HA topology. Multi-replica deployment with Postgres for virtual keys, spend tracking, budget state. Redis recommended for rate-limit and cache. Tested at 3 replicas; teams above 5,000 RPS typically scale to 5 replicas. Python GIL caps per-replica throughput earlier than the Go binaries in this list.
Gateway self-observability. Prometheus metrics at /metrics cover request rate, latency by upstream, virtual-key usage, spend. OTel trace export is supported, but LLM-call spans and the gateway’s own internal spans share the same exporter, so you filter in the collector to separate gateway health from application traces.
Upgrade path. pip install 'litellm>=X.Y.Z' or helm upgrade. Minor upgrades uneventful most months. Major upgrades occasionally require Postgres schema migration; a script ships in the repo. Berri AI’s post-incident CI/CD v2 ships Sigstore-signed builds verifiable via Rekor from v1.83.0-nightly onward.
License commercial-use clarity. MIT, verified against the upstream LICENSE. Commercial use unrestricted. Enterprise tier adds SLA, SSO, audit logs.
Single-binary vs framework. Framework. Python with FastAPI; depends on Postgres, Redis, and a Python interpreter on the host. Teams already operating Python services have this stack; teams coming from Go or Rust feel the dependency footprint.
Operational burden estimate. ~6-10 platform-engineer hours per month. The Python runtime adds operational surface a static binary doesn’t have: interpreter version pinning, virtualenv management, and post-March-2026 Sigstore verification.
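
The collector-side filtering mentioned under self-observability comes down to partitioning spans on some distinguishing attribute. The sketch below shows that logic on plain dictionaries. The `span.origin` attribute and its `gateway` value are hypothetical placeholders, so check which attribute or instrumentation scope your LiteLLM version actually emits before relying on it.

```python
# Hypothetical marker: assumes gateway-internal spans carry
# attributes["span.origin"] == "gateway". Real attribute names may differ.
ORIGIN_KEY = "span.origin"
GATEWAY_ORIGIN = "gateway"


def split_spans(spans: list[dict]) -> tuple[list[dict], list[dict]]:
    """Partition spans into (gateway_health_spans, llm_call_spans)."""
    gateway_spans, llm_spans = [], []
    for span in spans:
        origin = span.get("attributes", {}).get(ORIGIN_KEY)
        if origin == GATEWAY_ORIGIN:
            gateway_spans.append(span)
        else:
            llm_spans.append(span)
    return gateway_spans, llm_spans


if __name__ == "__main__":
    sample = [
        {"name": "router.select_upstream", "attributes": {ORIGIN_KEY: "gateway"}},
        {"name": "chat.completion", "attributes": {ORIGIN_KEY: "llm_call"}},
        {"name": "untagged", "attributes": {}},
    ]
    gateway, llm = split_spans(sample)
    print(len(gateway), "gateway spans;", len(llm), "LLM-call spans")
```

Untagged spans fall into the LLM-call bucket here, which keeps the gateway-health stream strict; the opposite default is equally defensible if missing gateway signals is the bigger risk.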

Where it falls short:

Python runtime tops out earlier than a Go binary under high concurrency. Teams above 10,000 RPS run several replicas behind a load balancer.
No optimizer. Traces feed humans, not the gateway.
Install path remains a Python package, which has more attack surface than a static Go binary by language design. Container-only deployments mitigate; the March 2026 PyPI compromise is why “container-only” is now a procurement requirement for many teams.
UI is functional, not polished. Slicing spend by repository or feature flag means going to a SQL dashboard.

Pricing. MIT open source. Enterprise tier starts around $250/month for small teams with SLA, SSO, audit.

Score: 6/7 axes (missing: optimizer / feedback loop).

3. Kong AI Gateway: Best for self-host on the API gateway you already run

Verdict: Right pick when the platform team already runs Kong for north-south REST traffic and the path of least resistance is to extend the existing stack rather than introduce a second control plane. Strengths are operational familiarity, plugin ecosystem, and a battle-tested HA topology that pre-dates the AI workload. The weakness is AI-specific feature depth: much of it lives in Kong Enterprise, not the Apache 2.0 edition.

Deployment posture:

Kubernetes / Docker support. One of the most mature K8s stories in the API-gateway category. Official Helm chart at kong/charts. Kong Ingress Controller integrates with the Kubernetes Gateway API. Container image with semver and digest.
HA topology. Multi-replica deployment with declarative config (DB-less mode) or Postgres backing store. DB-less is a clean fit for GitOps: config lives in Git and is applied via decK. HA at 3-5 replicas is standard; teams above 20,000 RPS sometimes scale to 10+ replicas. The base proxy is nginx + Lua, with decades of production hardening underneath.
Gateway self-observability. Native Prometheus plugin exposes 30+ metrics about Kong itself (request rate, latency percentiles, upstream health, plugin execution time). OTel plugin exports traces. Operational maturity is deeper than the newer AI-native projects.
Upgrade path. Mature. Minor upgrades are helm upgrade. Major upgrades ship migration scripts; Kong’s docs include a multi-version upgrade matrix. Rollback documented.
License commercial-use clarity. Apache 2.0 for Kong Gateway core. The complication is open-core: AI Semantic Cache, AI Semantic Prompt Guard, AI Anthropic Advanced, AI RAG Injector, and AI Proxy Advanced are Kong Enterprise. The pure Apache 2.0 core ships routing and basic transformations, not the full marketed AI surface.
Single-binary vs framework. Framework. Kong runs on Lua + Go plugins on top of nginx. One process to the operator, but the internals are a stack.
Operational burden estimate. ~3-5 platform-engineer hours per month for teams already running Kong. Greenfield Kong introduction costs 1-2 weeks of platform-engineer time before steady-state ops begins.

Where it falls short:

AI-specific observability is plugin-driven, not native. The default dashboard is the API-gateway view, not the LLM-cost view; finance-ready chargeback requires wiring several plugins or an external observability stack.
Open-core split is a real procurement consideration. Teams that need semantic caching or AI Anthropic Advanced under self-host pay the Enterprise license; teams happy with the open-source core get routing and basic transformations only.
No optimizer loop.
Greenfield teams pay the Kong learning curve before the AI part begins to deliver value. The pick only saves time when Kong is already in the stack.

Pricing. Kong Gateway core is Apache 2.0. Kong Konnect starts free for small teams. Kong Enterprise for SLA, support, advanced AI plugins starts around $1,500/month and scales with traffic.

Score: 6/7 axes (missing: optimizer; open-core split is a soft penalty on commercial-use clarity).

4. agentgateway.dev: Best for Kubernetes-native data-plane self-host

Verdict: Youngest entrant on this list (v0.7 in May 2026) and the right pick when the platform team prefers a Kubernetes-native data-plane built in Go that fits service-mesh patterns. Apache 2.0, pure-Go binary, integrates with the Kubernetes Gateway API as a first-class deployment surface. The trade-off is feature depth: it is intentionally narrow, optimised for the data-plane case rather than the full observability + budgets + optimization stack.

Deployment posture:

Kubernetes / Docker support. Helm chart in the project repo. Container image is a scratch-based Go binary (under 20 MB compressed) with a published SBOM. Deployment manifest is a Kubernetes Gateway API Gateway plus an HTTPRoute, with no custom CRDs unless you want fine-grained policy.
HA topology. Stateless multi-replica from day one. Rate-limit and budget state delegated to an external store (Redis recommended). Rolling restart is uneventful because the binary boots in under 200 ms. Teams report 5-replica deployments handling 12,000 RPS sustained on modest hardware.
Gateway self-observability. Native Prometheus metrics at /metrics (request rate, latency percentiles, upstream health, in-flight connection counts). OTel traces via OTLP. Narrower than Future AGI’s 47-metric set but covers the operational essentials cleanly.
Upgrade path. Minor upgrades complete in under 60 seconds for a 5-replica deployment because the binary is small and stateless. Upgrades were clean across v0.5 → v0.6 → v0.7, but the API surface is still maturing, so expect occasional breaking changes during the v0.x phase.
License commercial-use clarity. Apache 2.0. Community-governed with no pending acquisition as of May 2026.
Single-binary vs framework. Closest to a literal single binary on this list. One Go binary that depends on Redis only if you enable stateful features.
Operational burden estimate. ~2-3 platform-engineer hours per month. Stateless data-plane is genuinely low-maintenance once Helm values are pinned. Most time is on v0.x version upgrades while the API stabilises toward v1.0.
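
Why a stateless data-plane still delegates rate-limit state to an external store is easiest to see with two replicas. The sketch below is illustrative only: `InMemoryStore` stands in for Redis (an atomic increment per key), and `Replica` and `admitted` are hypothetical names, not agentgateway.dev APIs.

```python
class InMemoryStore:
    """Stand-in for an external store such as Redis (atomic INCR per key)."""

    def __init__(self) -> None:
        self._counts: dict[str, int] = {}

    def incr(self, key: str) -> int:
        self._counts[key] = self._counts.get(key, 0) + 1
        return self._counts[key]


class Replica:
    """A stateless gateway replica applying a fixed-window request limit."""

    def __init__(self, store: InMemoryStore, limit_per_window: int) -> None:
        self.store = store
        self.limit = limit_per_window

    def allow(self, api_key: str, window: int) -> bool:
        # The counter lives in the store, so every replica sees the same total.
        return self.store.incr(f"{api_key}:{window}") <= self.limit


def admitted(replicas: list[Replica], requests_each: int) -> int:
    """Count admitted requests when each replica receives the same load."""
    return sum(
        replica.allow("team-a", window=0)
        for replica in replicas
        for _ in range(requests_each)
    )


if __name__ == "__main__":
    shared = InMemoryStore()
    print("shared store:", admitted([Replica(shared, 10), Replica(shared, 10)], 10))
    print("local state: ", admitted([Replica(InMemoryStore(), 10),
                                     Replica(InMemoryStore(), 10)], 10))
```

With per-replica state the effective limit scales with the replica count, which is why budgets and rate limits are the features that pull Redis back into an otherwise stateless deployment.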

Where it falls short:

Feature surface is intentionally narrow. No semantic caching in the data-plane, no native virtual-key hierarchy beyond the basics, no optimizer.
Provider list is narrower (~10 providers as of May 2026) than LiteLLM’s 100+ or Future AGI’s 100+.
Pre-v1.0 versioning means major-version upgrades carry more risk. Teams wanting “set it and forget it” should wait for v1.0 or run on a vendored fork.
Deep per-tenant cost attribution and trace-to-eval linking require wiring an external observability stack.

Pricing. Apache 2.0 open source. No vendor cloud tier; community-governed.

Score: 6/7 axes (missing: optimizer; narrow provider list is a soft penalty).

5. Maxim Bifrost: Best for single Go binary with the smallest ops surface

Verdict: Right pick when the operational principle is “smallest number of moving parts on disk.” Single static Go binary, Apache 2.0, zero external runtime dependencies in single-node mode, lowest published per-hop overhead in the category. Trade-offs are guardrail depth and observability (both thinner than Future AGI Agent Command Center or Kong) and a published supply-chain hygiene posture that isn’t as deep as LiteLLM’s post-incident posture.

Deployment posture:

Kubernetes / Docker support. Helm chart published. Container image maximhq/bifrost with semver tags and pinnable digest. docker run maximhq/bifrost is the documented entry point.
HA topology. Multi-replica deployment with optional external state (Redis for budgets and rate limits). Rolling restart is fast because the binary boots in under 500 ms. 3-replica deployments tested at 5,000 RPS sustained, with the vendor publishing 11-microsecond P50 single-hop overhead at that load on a t3.xlarge harness.
Gateway self-observability. Prometheus metrics and OTel trace export are supported. Metrics surface is the operational essentials (request rate, error rate, latency, upstream health, cache hits) without the deep multi-tenant cost-attribution view Future AGI ships.
Upgrade path. Minor upgrades are helm upgrade or docker pull. Velocity is high and the upgrade cadence is faster than Kong’s, so plan for a monthly minor-version refresh rather than a quarterly one.
License commercial-use clarity. Apache 2.0. Maxim runs a managed cloud tier on top of the same binary; the open-source path is fully commercial-usable.
Single-binary vs framework. True single binary. One Go executable; external state is optional. For teams wanting literal “copy one file to a server” deployment, Bifrost is the cleanest fit.
Operational burden estimate. ~2-4 platform-engineer hours per month. Single-binary footprint genuinely reduces operational tax: no Python runtime, no Lua plugin ecosystem, no CRD set to maintain.

Where it falls short:

Provider list is narrower than LiteLLM’s. Bifrost ships ~23 providers and ~1,000 model identifiers as of Q2 2026.
Built-in guardrail depth is PII plus basic moderation. Teams needing 18+ scanners under one license get a thinner library here than Future AGI Agent Command Center ships.
Supply-chain hygiene posture not publicly disclosed as of May 2026: no Sigstore signing, no public SBOM, no external incident-response engagement on record. The static Go binary structurally avoids the install-time attack surface that caused the March 2026 LiteLLM PyPI compromise, but the absence of a published artifact-signing posture is a procurement gap.
No optimizer loop.

Pricing. Apache 2.0 open source. Managed cloud tier on top of the same binary, priced commercially via Maxim.

Score: 6/7 axes (missing: optimizer; thinner published supply-chain hygiene posture).

Capability matrix

Axis	Future AGI ACC	LiteLLM	Kong AI Gateway	agentgateway.dev	Bifrost
K8s / Docker support	Helm + Docker + digest	Helm + Docker + digest	Helm + Docker + Gateway API	Helm + scratch container	Helm + Docker + digest
HA topology	3-replica + Redis + Postgres; 8K RPS tested	3-5 replica + Postgres; GIL caps per-replica	3-5 replica DB-less or Postgres; 20K+ RPS at scale	Stateless N-replica + optional Redis; 12K RPS at 5 replicas	3-replica + optional Redis; 5K RPS published
Self-observability	47 native metrics + 2-stream OTel	Prometheus + shared OTel	Prometheus + OTel plugins	Prometheus + OTel native	Prometheus + OTel
Upgrade path	Helm minor; migration script + dry-run for major	pip / Helm; Postgres migrations on major	Helm + decK; mature migration matrix	Helm; v0.x stability warning	Helm / docker pull; frequent minors
Commercial-use clarity	Apache 2.0 OSS components; commercial BYOC license	MIT	Apache 2.0 core; open-core for AI plugins	Apache 2.0	Apache 2.0
Single-binary vs framework	Hybrid (Go binary + Redis + Postgres)	Framework (Python + Postgres)	Framework (nginx + Lua + Go)	Closest to single binary	True single binary
Operational burden (hrs/mo)	4-6	6-10	3-5 (existing Kong)	2-3	2-4
Closed-loop optimization	Yes (`agent-opt`)	No	No	No	No
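
To put the operational-burden row in staffing terms, the ranges can be converted to a fraction of one engineer. This is a small sketch using the hours from the matrix above; the 160-hour working month is an assumption, and ranking by midpoint is just one reasonable way to order ranges.

```python
# Hours per month, copied from the operational-burden row of the matrix.
BURDEN_HOURS = {
    "Future AGI ACC": (4, 6),
    "LiteLLM": (6, 10),
    "Kong AI Gateway": (3, 5),  # assumes Kong is already in the stack
    "agentgateway.dev": (2, 3),
    "Bifrost": (2, 4),
}

HOURS_PER_FTE_MONTH = 160  # assumption: 40 h/week x 4 weeks


def fte_fraction(hours_per_month: float) -> float:
    """Convert monthly hours to a fraction of one full-time engineer."""
    return hours_per_month / HOURS_PER_FTE_MONTH


def rank_by_midpoint(burden: dict[str, tuple[int, int]]) -> list[str]:
    """Order gateways from lowest to highest midpoint of their hour range."""
    return sorted(burden, key=lambda name: sum(burden[name]) / 2)


if __name__ == "__main__":
    for name in rank_by_midpoint(BURDEN_HOURS):
        low, high = BURDEN_HOURS[name]
        print(f"{name}: {fte_fraction(low):.1%} to {fte_fraction(high):.1%} of an FTE")
```

Even the upper bound for the highest-burden pick is a single-digit percentage of one engineer under this assumption, so the differences matter mostly for small platform teams.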

Decision framework: Choose X if

Choose Future AGI Agent Command Center BYOC if you want self-host plus the closed loop: the gateway captures traces, the eval service scores them, and the optimizer rewrites the prompts and routing policies that go back into the gateway. Pick this when AI spend is large enough that the cost curve has to bend downward over time, and compliance requires the entire loop inside your VPC.

Choose LiteLLM if your security team requires engineers to read every line of code that touches a prompt, and your team is already comfortable operating Python services.

Choose Kong AI Gateway if your platform team already operates Kong for the company’s REST APIs and the path of least resistance is to extend the existing stack with AI Proxy plugins.

Choose agentgateway.dev if you run a Kubernetes-native stack and want a stateless Go data-plane that fits the Gateway API patterns your platform team already uses, and the AI-specific feature surface beyond routing isn’t load-bearing.

Choose Maxim Bifrost if your principle is “smallest number of moving parts on disk” and the workload fits inside the ~23 providers Bifrost supports.

The five mistakes teams make when picking a self-hosted gateway

Mistake	What goes wrong	Fix
Treating “OSS license” as the only deployment-posture signal	Two Apache 2.0 gateways can have wildly different ops burdens	Score on K8s/Docker, HA, observability, upgrade, burden — not just license
Choosing on benchmark throughput	11 µs P50 is real, but operational burden in hours/month is what your platform team actually pays	Use throughput as a floor check, pick on burden
Single-replica deployment “for now”	First production outage on day 47 takes the AI app down with the gateway	Deploy 3 replicas from day one
Ignoring the upgrade path until the first major release	You discover the upgrade is destructive on a Friday at 4 PM	Pin the version, dry-run upgrades, document rollback
Skipping gateway self-observability because LLM observability is enough	The gateway degrades silently; LLM-call traces look normal because the calls that get through are normal	Wire Prometheus alerts on gateway error rate and p95 latency from day one
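
The last fix in the table, alerting on gateway error rate and p95 latency, reduces to two small computations over a window of requests. The sketch below works on in-memory samples rather than a real Prometheus query; the thresholds (1% errors, 250 ms p95) and the alert names are placeholders to tune, not recommendations from any of the five vendors.

```python
from dataclasses import dataclass


@dataclass
class Request:
    latency_ms: float
    status: int


def error_rate(requests: list[Request]) -> float:
    """Share of requests that ended in a 5xx response."""
    if not requests:
        return 0.0
    return sum(r.status >= 500 for r in requests) / len(requests)


def p95_latency(requests: list[Request]) -> float:
    """Nearest-rank 95th percentile of latency, in milliseconds."""
    if not requests:
        return 0.0
    ordered = sorted(r.latency_ms for r in requests)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def firing_alerts(
    requests: list[Request],
    max_error_rate: float = 0.01,  # placeholder threshold
    max_p95_ms: float = 250.0,     # placeholder threshold
) -> list[str]:
    """Return the names of the alerts that would fire for this window."""
    alerts = []
    if error_rate(requests) > max_error_rate:
        alerts.append("gateway_error_rate")
    if p95_latency(requests) > max_p95_ms:
        alerts.append("gateway_p95_latency")
    return alerts
```

Feeding this only the gateway's own request records, not the LLM-call traces, is the point: requests the gateway dropped never appear in application traces.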

How Future AGI closes the loop on self-hosted gateway spend

The other four picks treat self-hosting as the end state. Future AGI Agent Command Center BYOC treats it as the foundation, and adds a feedback loop on top. Six stages, all running inside your infrastructure if your deployment posture requires it:

Trace. Every gateway request produces a span tree via traceAI (Apache 2.0). Spans capture inputs, outputs, tool calls, model, upstream latency, cache hit/miss, and application-supplied attributes.
Evaluate. ai-evaluation (Apache 2.0) scores every interaction against task-completion, faithfulness, and code-correctness rubrics. Scores live alongside cost data.
Cluster. Low-scoring sessions cluster by failure mode. A common pattern: the team picked one model for all traffic and the eval data shows 60% would have been served just as well by a smaller, cheaper model.
Optimize. agent-opt (Apache 2.0; ProTeGi, Bayesian, GEPA) rewrites the system prompt or adjusts the routing policy against the clustered failures. Output is a versioned policy that ships back through Helm.
Route. The data-plane picks up the new policy on the next request. No restart; the policy reload is a hot update.
Re-deploy and roll back. The new prompt and route are versioned. If the score regresses, automatic rollback. The experiment is bounded, which makes the loop safe on production traffic.
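
The final stage, versioned policies with rollback on score regression, can be sketched in a few lines. This is an illustration of the idea only, not agent-opt's implementation: `PolicyVersion`, `PolicyRollout`, and the `tolerance` parameter are hypothetical names, and the eval score is assumed to be a single number where higher is better.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyVersion:
    version: int
    name: str
    score: float  # assumed: aggregate eval score, higher is better


class PolicyRollout:
    """Keeps a version history and refuses candidates that regress the score."""

    def __init__(self, baseline: PolicyVersion, tolerance: float = 0.0) -> None:
        self.history = [baseline]
        self.tolerance = tolerance

    @property
    def active(self) -> PolicyVersion:
        return self.history[-1]

    def evaluate(self, name: str, observed_score: float) -> PolicyVersion:
        """Promote the candidate, or keep the previous version on regression."""
        previous = self.active
        if observed_score < previous.score - self.tolerance:
            return previous  # rollback: the prior policy stays active
        candidate = PolicyVersion(previous.version + 1, name, observed_score)
        self.history.append(candidate)
        return candidate


if __name__ == "__main__":
    rollout = PolicyRollout(PolicyVersion(1, "baseline-prompt", 0.80))
    print(rollout.evaluate("shorter-system-prompt", 0.84).name)
    print(rollout.evaluate("aggressive-routing", 0.70).name)
```

Because a regressing candidate never becomes the active version, the worst case of any single experiment is bounded by the evaluation window, which is what makes the loop tolerable on production traffic.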

Net effect: the same traces that prove “the gateway is healthy” become the input that makes the gateway smarter. Teams running this loop end-to-end typically see cost-per-task fall 15-30% within four to six weeks, with no developer behaviour change.

The three building blocks are traceAI, ai-evaluation, and agent-opt, all Apache 2.0 on github.com/future-agi. The BYOC deployment adds the failure-cluster view, the live Protect guardrail layer with ~67ms text and ~109ms image p50 latency per arXiv 2510.13351, RBAC, SOC 2 Type II certification, and an AWS Marketplace listing for procurement.

What we did not include

We deliberately left three gateways out of the main five:

Envoy AI Gateway. Strong Apache 2.0 project built on Envoy filters, but deployment assumes Kubernetes + Istio fluency and the feature surface is narrower than agentgateway.dev’s at v0.5.
Helicone. Apache 2.0 with a clean self-host path, but the Mintlify acquisition in March 2026 moved the project to maintenance posture. Greenfield self-host decisions shouldn’t bet a 2026 roadmap on pre-acquisition velocity.
Apache APISIX. Strong Apache 2.0 ASF top-level project with an AI plugin layer added in 2026. Deployment posture is excellent but the AI feature surface is still maturing relative to Kong’s.

Related reading

Best 7 Open Source AI Gateways in 2026: the license-class companion to this deployment-posture post
Best 5 AI Gateways to Monitor Claude Code Token Usage in 2026
Best AI Gateways for Compliance and Audit Trails in 2026

Sources

Future AGI Agent Command Center BYOC, futureagi.com/platform/monitor/command-center
Future AGI Protect latency benchmarks, arxiv.org/abs/2510.13351 (67ms text p50, 109ms image p50)
traceAI, github.com/future-agi/traceAI (Apache 2.0)
ai-evaluation, github.com/future-agi/ai-evaluation (Apache 2.0)
agent-opt, github.com/future-agi/agent-opt (Apache 2.0)
LiteLLM, github.com/BerriAI/litellm (MIT)
Kong AI Gateway, konghq.com/products/kong-ai-gateway
agentgateway.dev, agentgateway.dev
Maxim Bifrost, github.com/maximhq/bifrost (Apache 2.0)

Frequently asked questions

What is the difference between 'open source' and 'self-hosted' for AI gateways?

Open source is a license property of the source. Self-hosted is a deployment posture — where the binary runs. The two correlate but are not identical. OpenRouter is self-hostable only via the closed-source vendor offering. AGPLv3 gateways are open source but legally complicated to self-host inside a commercial product because of the network clause. Apache 2.0 and MIT gateways are clean to self-host commercially.

Can I self-host a gateway in an air-gapped environment?

Yes, with care. The five gateways here all ship container images that can be pulled into a private registry and deployed inside an air-gapped cluster. The complication is that the upstream LLM provider itself is rarely accessible from a true air-gapped environment, so most 'air-gapped' self-host deployments are actually 'egress-only to a small allowlist of provider IPs.' Future AGI Agent Command Center BYOC supports this posture explicitly, including eval and optimizer services running inside the same network boundary.

How many platform engineers does it take to operate a self-hosted AI gateway?

At steady state, less than 1 FTE for all five picks if your team is already comfortable with Kubernetes. Bifrost and agentgateway.dev come in lowest (2-4 hours/month); LiteLLM is highest (6-10 hours/month) because the Python runtime adds maintenance surface Go binaries do not. Initial deployment cost is 3-10 platform-engineer days for all five.

Is self-hosting cheaper than the hosted gateway tier?

Sometimes. The trade-off is operational cost (engineer hours + cluster cost) vs vendor markup. For workloads above ~$20K/month in LLM spend, self-hosting typically pays off. Below that threshold, the hosted tier is usually cheaper once you account for engineer hours. The decision is rarely purely financial — most self-host decisions are driven by data-residency, compliance, or air-gap requirements rather than cost.
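
The trade-off in this answer can be made concrete with a break-even calculation. The sketch below models the hosted tier's markup as a flat percentage of LLM spend, which is an assumption, and every number in the example (5% markup, 5 hours at $100/hour, $500/month of cluster cost) is an illustrative placeholder rather than a figure from any vendor.

```python
def self_host_monthly_cost(
    engineer_hours: float, hourly_rate: float, cluster_cost: float
) -> float:
    """Monthly cost of running the gateway yourself."""
    return engineer_hours * hourly_rate + cluster_cost


def hosted_monthly_cost(llm_spend: float, markup: float) -> float:
    """Monthly hosted-tier cost, assuming a flat markup on LLM spend."""
    return llm_spend * markup


def break_even_spend(
    engineer_hours: float, hourly_rate: float, cluster_cost: float, markup: float
) -> float:
    """LLM spend above which self-hosting is cheaper than the hosted tier."""
    if markup <= 0:
        raise ValueError("markup must be positive")
    return self_host_monthly_cost(engineer_hours, hourly_rate, cluster_cost) / markup


if __name__ == "__main__":
    # Illustrative placeholders only; substitute your own quotes and rates.
    spend = break_even_spend(
        engineer_hours=5, hourly_rate=100.0, cluster_cost=500.0, markup=0.05
    )
    print(f"break-even LLM spend: ${spend:,.0f} per month")
```

The result is very sensitive to the markup and the loaded engineer rate, so it is worth running with your own numbers before treating any single threshold as a rule.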

Do I need Redis or Postgres for production self-host?

It depends. agentgateway.dev's data-plane is stateless and Redis is optional. Bifrost is single-binary with optional external state. Future AGI Agent Command Center and LiteLLM assume Redis and Postgres in their documented HA topologies; Kong needs Postgres unless you run it in DB-less mode. If your platform team prefers to avoid running additional databases, agentgateway.dev or Bifrost are the cleanest fits.
