Best 5 AI Gateways for Aider with Local Models in 2026
Five AI gateways scored on Aider with local models in 2026: OpenAI-compatible passthrough to Ollama and vLLM, fallback to hosted, GPU-aware routing, observability, and what each gateway misses.
Aider is a CLI coding agent that pair-programs with you from a terminal. Point it at hosted Claude or GPT and it works on day one. Point it at a 14B local model on a single H100 and the picture changes: 32K context instead of 200K, shakier tool calling, fast on small diffs and brutally slow on multi-file refactors, and the repo never leaves the machine. The local-model setup is what makes Aider attractive to teams that can’t send code through a hosted API.
“Aider plus a local model” is a configuration, not a product. You have to decide which model handles which turn, which turns spill out to a hosted backup, how observability survives when half the traffic never hits a managed dashboard, and what stops the small local model from confidently writing the wrong patch. A gateway sits between Aider and the mix of ollama serve, vllm serve, llama-server, and api.anthropic.com that backs it. It turns the configuration into a workflow.
This post scores the five gateways usable for Aider plus local models in May 2026. Only one turns the local-vs-hosted traces into a feedback loop that gets the routing decision right more often each week.
TL;DR
Future AGI Agent Command Center is the strongest pick for an AI gateway in front of Aider with local models because it exposes Ollama, vLLM, llama.cpp, and LM Studio alongside Anthropic, Bedrock, and Vertex behind one OpenAI-compatible OPENAI_API_BASE, with deterministic hosted-spillover triggers (GPU OOM, context overflow, latency-budget breach), GPU-aware health checks against each local replica, per-developer virtual keys, and OpenTelemetry-native cost telemetry on local-and-hosted traces in the same dashboard. The other four picks below win on specific edges.
- Future AGI Agent Command Center — Best overall. Local-plus-hosted routing under one base URL, deterministic spillover triggers, GPU-aware health checks, and unified cost telemetry.
- LiteLLM — Best self-hosted Python proxy that fronts Ollama, vLLM, llama.cpp under one OpenAI URL. Python-native, source-available, every local backend has a working adapter; pin commits after the March 24, 2026 PyPI compromise.
- Portkey — Best when the local model is the primary and you only need a clean spillover to hosted. Hosted gateway with virtual keys (verify the Palo Alto Networks acquisition timeline before signing multi-year).
- vLLM with a proxy front — Best raw throughput on a single GPU, paired with a minimal compatibility shim. GPU-native serving for the lowest steady-state latency.
- Kong AI Gateway — Best if you already run Kong and want the AI-specific policies inside the same control plane. API-gateway-grade plugin stack on top of your existing platform.
Why Aider with local models needs a gateway
Aider speaks the OpenAI chat-completions API and accepts OPENAI_API_BASE set to Ollama, vLLM, or any OpenAI-compatible endpoint. That config works for one developer, one model, one machine. It doesn’t survive the second engineer, the second model, or the first time the GPU is saturated.
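As a minimal sketch of that single-developer configuration, the snippet below builds the environment and command line for launching Aider against an OpenAI-compatible endpoint. The gateway URL, the placeholder key, and the model name are assumptions for illustration; the openai/ model prefix follows Aider’s documented convention for OpenAI-compatible endpoints.

```python
import os
import shlex


def aider_launch(base_url: str, model: str, api_key: str = "sk-local-placeholder"):
    """Return (env, argv) for running Aider against an OpenAI-compatible endpoint.

    base_url, model, and api_key are illustrative values, not defaults of any tool.
    """
    env = dict(os.environ)
    env["OPENAI_API_BASE"] = base_url  # where Aider sends chat-completions requests
    env["OPENAI_API_KEY"] = api_key    # local servers typically accept any non-empty key
    argv = ["aider", "--model", f"openai/{model}"]
    return env, argv


# Hypothetical gateway address; swap in the URL of whatever fronts your backends.
env, argv = aider_launch("http://localhost:4000/v1", "qwen2.5-coder-14b")
print(f"OPENAI_API_BASE={env['OPENAI_API_BASE']} {shlex.join(argv)}")
```

To actually start a session, pass the pair to subprocess.run(argv, env=env). Changing the gateway later means changing one URL, not every developer’s model flags.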
Three properties make this a routing problem.
- Local models aren’t interchangeable with hosted ones. A qwen2.5-coder-14b at 4-bit quantization runs at roughly 60-90 tokens per second on a single H100 SXM with a 32K context window. claude-sonnet-4-6 runs at hundreds of tokens per second under load with 200K context. Not substitutable on hard turns. The gateway has to know which turn goes where.
- Local inference fails differently from hosted inference. Hosted returns a 429 you can retry. Local returns a CUDA OOM, a model-not-loaded error, a context-overflow truncation, or a process death because someone kicked off a fine-tune on the same node. The gateway has to translate those into a clean fallback, not propagate the GPU error to Aider’s terminal.
- The point of running local is that the prompt never leaves the box. If the gateway ships traces to a hosted observability backend, the data-flow argument collapses. Either the gateway speaks to a self-hosted sink, or the local-model story doesn’t hold.
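The second property, translating local failure modes into a clean fallback decision, can be sketched roughly as follows. This is an illustration under stated assumptions, not any gateway’s implementation: the error substrings are hypothetical stand-ins, since each backend words its errors differently, and the policy of surfacing unknown errors instead of retrying them is a design choice made here.

```python
from enum import Enum


class LocalFailure(Enum):
    GPU_OOM = "gpu_oom"
    MODEL_NOT_LOADED = "model_not_loaded"
    CONTEXT_OVERFLOW = "context_overflow"
    BACKEND_DOWN = "backend_down"
    UNKNOWN = "unknown"


# Substrings are illustrative; real backends word these errors differently.
_PATTERNS = [
    ("out of memory", LocalFailure.GPU_OOM),
    ("model not loaded", LocalFailure.MODEL_NOT_LOADED),
    ("maximum context length", LocalFailure.CONTEXT_OVERFLOW),
    ("connection refused", LocalFailure.BACKEND_DOWN),
]


def classify_local_error(message: str) -> LocalFailure:
    """Map a raw backend error message to a coarse failure category."""
    lowered = message.lower()
    for needle, failure in _PATTERNS:
        if needle in lowered:
            return failure
    return LocalFailure.UNKNOWN


def should_fall_back(failure: LocalFailure) -> bool:
    """Known local failure modes go to the hosted fallback; unknown errors surface."""
    return failure is not LocalFailure.UNKNOWN


print(classify_local_error("CUDA out of memory. Tried to allocate 2.00 GiB"))
```

The point of the intermediate enum is that the fallback rule reads in terms of failure categories, so adding a new backend only means adding patterns.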
For the rest of this post, “Aider plus local models” means Aider with OPENAI_API_BASE pointing at a gateway, the gateway fronting one or more local backends (Ollama, vLLM, llama.cpp, LM Studio), and an optional hosted fallback.
The 7 axes we score on
Generic “best AI gateway” axes are too coarse for the local-model variant. We scored each pick on seven axes specific to Aider plus local models.
| Axis | What it measures |
|---|---|
| 1. Local backend adapter coverage | Does the gateway have working adapters for Ollama, vLLM, llama.cpp, and LM Studio without per-team Python? |
| 2. Hosted spillover policy | Can it deterministically fall back to a hosted model on GPU OOM or context overflow, without breaking the Aider session? |
| 3. Turn-routing by complexity | Can it route easy turns (small diffs, single-file edits) to the local model and hard turns to a hosted one, by a rule a non-ML engineer can read? |
| 4. Streaming + tool-call fidelity | Aider streams tokens and (with --auto-commits, --lint) issues tool-shaped calls; do those pass through without buffering or re-serialization? |
| 5. Local-only observability sink | Can traces stay inside the perimeter, in a self-hosted Phoenix / Loki / ClickHouse / Postgres? |
| 6. GPU-aware health checks | Does the gateway distinguish a stuck Ollama process from a slow one, and route around the dead replica? |
| 7. Self-improving loop | Do captured traces drive next-week’s routing and prompts, or do they just sit in a dashboard? |
The verdict line at the end of each pick scores all seven.
How we picked
We started from the public AI gateways that ship an OpenAI-compatible endpoint and document a local-backend adapter as of May 2026. We removed gateways without an adapter for at least Ollama and vLLM. We removed gateways that broke tool calls on translation. We removed two consumer-facing model directories whose self-host story isn’t real. The remaining five are below.
A note on the 2026 trust cohort: LiteLLM had a PyPI supply-chain compromise on versions 1.82.7 and 1.82.8 (March 24, 2026), remediated past 1.83.7. Portkey is mid-acquisition by Palo Alto Networks (announced April 30, 2026). Both are still in this list, both still production-ready. But the procurement-independence question is now real.
1. Future AGI Agent Command Center: Best for local-plus-hosted Aider routing
Verdict: Future AGI exposes Ollama, vLLM, llama.cpp, and LM Studio alongside Anthropic, Bedrock, and Vertex behind one OpenAI-compatible OPENAI_API_BASE, with deterministic hosted-spillover triggers (GPU OOM, context overflow, latency-budget breach), per-developer virtual keys, cross-developer cache, and GPU-aware health checks against each local replica. Per-turn cost and quality sit in the same OpenTelemetry span tree, so finance and engineering both read off the same local-and-hosted dashboard rather than reconciling two systems.
What it does for Aider with local models:
- Local backend adapter coverage. Native adapters for Ollama, vLLM, llama.cpp, and LM Studio. Each normalizes the OpenAI tool-call JSON so Aider’s apply_patch and shell-exec survive the local-to-hosted hop. Config entry, not a wrapper.
- Hosted spillover with three deterministic triggers: GPU OOM, context overflow above the local window, and a latency-budget breach (default P95 over 8 seconds for turns under 4K input). Default targets are claude-sonnet-4-6 for OOM and overflow, claude-haiku-4-5 for latency.
- Turn-routing by complexity via a YAML route policy that reads the request body. Default: under 8K input tokens to qwen2.5-coder-14b, 8K to 32K to claude-sonnet-4-6, over 32K to claude-opus-4-7. The optimizer rewrites it weekly from eval scores.
- Streaming + tool-call fidelity. SSE pass-through on both legs; tool-use JSON is parsed and re-emitted, not re-serialized as text.
- Local-only observability sink via Agent Command Center BYOC plus the Apache 2.0 traceAI library. Traces, evals, spans stay in your VPC.
- GPU-aware health checks via periodic dummy completions against each Ollama and vLLM replica. Stuck processes get pulled in under 30 seconds at defaults.
- Self-improving loop. Every Aider turn becomes a traceAI span, gets scored by fi.evals, low-scoring turns get clustered, and fi.opt.optimizers rewrites the prompt or the routing rule. traceAI is OpenInference-native and covers 50+ AI surfaces across Python, TypeScript, Java, and C#, including a Spring Boot starter, Spring AI, LangChain4j, and Semantic Kernel. fi.opt.optimizers ships six optimizers (RandomSearchOptimizer; BayesianSearchOptimizer, Optuna-backed with teacher-inferred few-shot templates and resumable studies; MetaPromptOptimizer; ProTeGi; GEPAOptimizer; PromptWizardOptimizer), all sharing an EarlyStoppingConfig (patience, min_delta, threshold, max_evaluations) and the same unified Evaluator over 60+ FAGI rubrics. Error Feed, the clustering and what-to-fix layer of the eval stack that feeds the self-improving evaluators, sits alongside as the zero-config error monitor: it auto-clusters related per-model failures into named issues (50 traces → 1 issue), auto-writes a root cause, a quick fix, and a long-term recommendation per issue, and tracks a rising/steady/falling trend per issue so emerging local-model regressions surface like exceptions. Typical week-one discovery: the local model is handling turns where its failure rate is 4x the hosted model’s, because the rule was sending them local on token count alone. The optimizer adds a language and task-type heuristic and the cluster-failure rate drops.
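The default thresholds and spillover targets listed above can be written out as a plain-Python sketch. This is not Future AGI’s policy format or API; the function names and trigger strings are hypothetical, and “8K” and “32K” are read here as 8,000 and 32,000 tokens, which is an assumption.

```python
from typing import Optional

LOCAL = "qwen2.5-coder-14b"
MID = "claude-sonnet-4-6"
LARGE = "claude-opus-4-7"
FAST = "claude-haiku-4-5"


def route_by_input_tokens(input_tokens: int) -> str:
    """Pick a model from input size alone, using the defaults described in the text."""
    if input_tokens < 8_000:       # under 8K: stay local
        return LOCAL
    if input_tokens <= 32_000:     # 8K to 32K: mid-size hosted model
        return MID
    return LARGE                   # over 32K: largest hosted model


def spillover_target(trigger: str) -> Optional[str]:
    """Map a spillover trigger to its default hosted target, or None if unknown."""
    targets = {
        "gpu_oom": MID,
        "context_overflow": MID,
        "latency_budget": FAST,
    }
    return targets.get(trigger)


for tokens in (500, 12_000, 60_000):
    print(tokens, "->", route_by_input_tokens(tokens))
```

A rule this small is readable by a non-ML engineer, which is the bar axis 3 sets; the weekly optimizer pass would then adjust the thresholds or add conditions.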
Where it falls short:
- agent-opt is opt-in: start with traceAI + ai-evaluation for one-week pilots and turn the optimizer on once eval baselines stabilize. If the goal is just “front Ollama with one URL,” LiteLLM is a smaller surface area.
- The Protect guardrail layer is gated behind the enterprise tier; the free tier exposes routing and traces but not realtime guardrails (Protect runs at ~65 ms text latency per arXiv 2510.13351).
Pricing: Free tier with 100K traces / month. Scale tier starts at $99/month. Enterprise is custom with SOC 2 Type II and a BAA. AWS Marketplace listing for procurement.
Score: 7/7 axes.
2. LiteLLM: Best for self-hosted Python proxy fronting Ollama, vLLM, and llama.cpp
Verdict: LiteLLM is the gateway most teams reach for first when “Aider plus local models” is the workload. It runs as a Python proxy inside your VPC, has adapters for every local backend that matters, and exposes the unified OpenAI URL Aider expects. The strongest fit for the day-one configuration. It doesn’t optimize back, the dashboard slicing is shallow, and the March 2026 supply-chain incident is now part of the procurement story.
What it does for Aider with local models:
- Local backend adapter coverage for Ollama, vLLM, llama.cpp, LM Studio, Together, HuggingFace TGI, and Replicate. Each maps to a model entry in the proxy config.
- Hosted spillover via the fallbacks config: declare an ordered list (qwen2.5-coder-14b → claude-sonnet-4-6 → claude-opus-4-7) and LiteLLM walks it on configured error types (context-overflow, 429, 5xx).
- Turn-routing by complexity via the Router class. A custom routing function in Python inspects the request body and picks the model; “small turns local, big turns hosted” is ten lines.
- Streaming + tool-call fidelity confirmed for Aider’s diff-edit and shell-exec on local backends. Tool calls translate correctly between Anthropic and OpenAI shapes.
- Local-only observability sink via built-in Postgres logging plus optional OTel exporter. Wire Phoenix, Langfuse self-hosted, or Future AGI’s BYOC traceAI behind LiteLLM and traces stay in the perimeter.
- GPU-aware health checks via the proxy’s health-check endpoint. Request-roundtrip check, not a GPU-utilization read.
- Self-improving loop. Not built in.
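The ordered-fallback behavior described above can be illustrated with a backend-agnostic sketch. It does not use LiteLLM’s code or config format; BackendError, the error-kind strings, and the stub backends are hypothetical names introduced only to show the walk.

```python
from typing import Callable, Sequence, Tuple


class BackendError(Exception):
    """Hypothetical error type carrying a coarse error kind such as '429'."""

    def __init__(self, kind: str):
        super().__init__(kind)
        self.kind = kind


# Error kinds that advance the walk; anything else is re-raised immediately.
RETRYABLE = {"context_overflow", "429", "5xx"}


def complete_with_fallbacks(
    prompt: str, chain: Sequence[Tuple[str, Callable[[str], str]]]
) -> Tuple[str, str]:
    """Try each (name, call) in order; return (name, completion) from the first success."""
    last_error = None
    for name, call in chain:
        try:
            return name, call(prompt)
        except BackendError as err:
            if err.kind not in RETRYABLE:
                raise
            last_error = err
    raise RuntimeError("all backends in the fallback chain failed") from last_error


def local_overflows(prompt: str) -> str:
    raise BackendError("context_overflow")  # stub: the local model cannot fit the prompt


def hosted_ok(prompt: str) -> str:
    return f"patch for: {prompt}"  # stub: the hosted model succeeds


chain = [("qwen2.5-coder-14b", local_overflows), ("claude-sonnet-4-6", hosted_ok)]
print(complete_with_fallbacks("rename foo to bar", chain))
```

Restricting the walk to a configured set of error kinds matters: falling back on every exception would hide real bugs, such as a malformed request, behind a hosted bill.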
Where it falls short:
- The March 24, 2026 supply-chain compromise on PyPI versions 1.82.7 and 1.82.8 (remediated past 1.83.7) is the procurement conversation now. Pin the version, vendor the dependency, scan with Sigstore.
- The UI is functional, not polished. Slicing by developer or repo means a SQL dashboard.
- The Router policy is Python, fine for ML engineers, friction for a platform team that wants YAML.
- No native optimizer; the gateway is as smart as the last commit a human made.
Pricing: Open source under MIT. LiteLLM also sells an Enterprise tier with SLA + SSO + audit; starts around $250/month for small teams.
Score: 5.5/7 axes (missing: native polished dashboard, optimizer).
3. Portkey: Best for hosted gateway with virtual keys when local is primary and hosted is the fallback
Verdict: Portkey is the pick when the local model handles 80%+ of Aider turns and the gateway’s main job is to clean up the spillover. The local-only-observability angle is weaker than LiteLLM or Future AGI BYOC: Portkey’s traces live in the hosted environment by default, though the BYOC option closes most of that gap. The Palo Alto Networks acquisition (announced April 30, 2026) is the new procurement context.
What it does for Aider with local models:
- Local backend adapter coverage. 250+ provider integrations advertised; local backends include Ollama, vLLM, and llama.cpp via custom_host. Register the Ollama URL as a Portkey provider, route from a virtual key.
- Hosted spillover via fallback and loadbalance configs in declarative YAML / JSON.
- Turn-routing by complexity via conditional-routing config matching the request body. Config, not code.
- Streaming + tool-call fidelity confirmed with Aider on claude-sonnet-4-6 and Ollama qwen2.5-coder-14b. SSE is solid; gRPC is on the roadmap.
- Local-only observability sink. Default is Portkey’s hosted dashboard, the wrong fit if “no prompt leaves the box” is the rule. BYOC deploys the gateway in your VPC with traces in your own stack.
- GPU-aware health checks via request-roundtrip provider pings, which pull a dead Ollama replica but not a slow one.
- Self-improving loop. Not built in.
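The distinction between a dead replica and a slow one, raised in the health-check item above, comes down to timing the probe and not only checking that it returned. A rough sketch, assuming a hypothetical send_probe callable that issues a tiny completion with its own timeout and raises on failure; the 2-second threshold is an illustrative value, not a default of any gateway.

```python
import time
from typing import Callable


def probe_replica(
    send_probe: Callable[[], None],
    slow_after_s: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Classify a replica as 'dead', 'slow', or 'healthy' from one probe round trip."""
    start = clock()
    try:
        send_probe()  # expected to raise on timeout, refused connection, or error status
    except Exception:
        return "dead"  # broad catch is deliberate: any probe failure pulls the replica
    elapsed = clock() - start
    return "slow" if elapsed > slow_after_s else "healthy"


def failing_probe() -> None:
    raise ConnectionError("connection refused")


print(probe_replica(lambda: None))   # instant no-op probe
print(probe_replica(failing_probe))  # probe that raises
```

A heartbeat-only check collapses “slow” into “healthy”. Keeping the third state lets a router drain a saturated replica before it starts timing out.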
Where it falls short:
- Default hosted-trace path is the wrong fit for the strict-local case. Inside-perimeter traces require BYOC, which is enterprise-tier.
- No optimizer.
- Palo Alto Networks acquisition (announced April 30, 2026, close expected in PANW fiscal Q4) is the new procurement question for shops that prefer vendor independence.
Pricing: Free tier with 10K requests/day. Scale tier starts at $99/month. Enterprise is custom with SOC 2 Type II.
Score: 6/7 axes (missing: feedback loop / optimization).
4. vLLM with a proxy front: Best for GPU-native serving with a thin OpenAI-compatibility shim
Verdict: vLLM isn’t a gateway on its own. It’s a GPU-native LLM server with the highest throughput per H100 of any open-source runtime as of May 2026; published benchmarks show roughly 2 to 4x the throughput of llama.cpp on the same hardware for a 14B-class model. Pair vLLM with a thin proxy (LiteLLM, an Envoy filter, or a 200-line FastAPI shim) and you get a gateway for the team whose priority is “serve the local model as fast as possible and route on top.”
What it does for Aider with local models:
- Local backend adapter. vLLM serves OpenAI-compatible chat-completions and Responses API natively via vllm serve <model>. Aider points at it directly. The “proxy front” (LiteLLM, Envoy, Kong) is what you add for routing and observability.
- Spillover and turn-routing are the proxy’s job. vLLM’s contribution is to make the local path fast enough that more turns stay local.
- Streaming + tool-call fidelity. Excellent SSE. Tool calling on local models is model-dependent: qwen2.5-coder-32b-instruct and llama-3.3-70b-instruct work reliably; the smaller qwen2.5-coder-14b is fine on simple schemas, flakier on multi-tool dispatch.
- Local-only observability sink. Native Prometheus metrics (per-request latency, KV-cache hit rate, GPU utilization, queue depth) feed cleanly into self-hosted Grafana.
- GPU-aware health checks. This is the pick where GPU-awareness is real. Prometheus exposes actual GPU memory, KV pressure, and queue depth, so a proxy in front can load-shed on signal rather than a heartbeat.
- Self-improving loop. Not built in. The loop is whatever you build on top.
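Load-shedding on signal, as opposed to a heartbeat, could look roughly like the sketch below: parse the Prometheus text that the server exposes and compare queue depth and KV-cache usage against thresholds. The metric names are assumptions that vary across vLLM versions, the thresholds are illustrative, and the sample text is hand-written for this example.

```python
def parse_gauges(metrics_text: str) -> dict:
    """Parse Prometheus text exposition into {metric_name: value}, ignoring labels.

    Assumes no trailing timestamps; if a name repeats, the last sample wins.
    """
    gauges = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        name_and_labels, _, value = line.rpartition(" ")
        name = name_and_labels.split("{", 1)[0]
        try:
            gauges[name] = float(value)
        except ValueError:
            continue  # not a numeric sample; ignore
    return gauges


def should_shed_load(gauges: dict, max_waiting: float = 4, max_kv_usage: float = 0.90) -> bool:
    """Shed when the queue is deep or the KV cache is nearly full."""
    waiting = gauges.get("vllm:num_requests_waiting", 0.0)   # assumed metric name
    kv_usage = gauges.get("vllm:gpu_cache_usage_perc", 0.0)  # assumed name, 0.0 to 1.0
    return waiting > max_waiting or kv_usage > max_kv_usage


SAMPLE = """\
# TYPE vllm:num_requests_waiting gauge
vllm:num_requests_waiting{model_name="qwen2.5-coder-14b"} 7.0
vllm:gpu_cache_usage_perc{model_name="qwen2.5-coder-14b"} 0.42
"""

print(should_shed_load(parse_gauges(SAMPLE)))
```

In a real proxy the text would come from an HTTP GET against the server’s /metrics endpoint on a short interval, with the decision cached between scrapes.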
Where it falls short:
- “vLLM with a proxy front” is two products held together by your platform team. If the proxy goes down, vLLM is exposed; if vLLM goes down, the proxy has to spill over correctly.
- Local-model tool-calling is still model-quality dependent. Run a tool-use eval on your specific model before production.
- The OpenAI-compatibility surface is narrower than a real gateway’s. Edge cases (logprobs, response_format, beta features) are sometimes missing or behind a flag.
- No native multi-provider translation. Claude routing needs a real gateway underneath; vLLM alone won’t do it.
Pricing: vLLM is Apache 2.0. The proxy front is whatever you choose (LiteLLM MIT, Kong open-source / enterprise). Hardware is the meaningful line item.
Score: 4.5/7 axes (missing: native multi-provider, native optimizer, native turn-routing).
5. Kong AI Gateway: Best for plugin-stack control plane on top of your existing Kong
Verdict: Kong AI Gateway is the pick when the platform team already runs Kong for the rest of the company’s APIs and the path of least resistance is to extend that stack with the AI Proxy plugin. Strengths: plugin ecosystem, operational familiarity, single control plane. Weaknesses: AI-specific shallowness, the local-backend adapters are improving in 3.6+ but less mature than LiteLLM’s, and most observability and routing is plugin composition rather than first-class.
What it does for Aider with local models:
- Local backend adapter coverage via the AI Proxy plugin’s custom-LLM provider option. Kong 3.6+ documents Ollama and OpenAI-compatible custom endpoints; vLLM and llama.cpp work through the OpenAI path. Functional, not breadth-first.
- Hosted spillover and turn-routing via the AI Proxy fallback config plus request-transformer plugin, plus expression-based routing on body fields. “Local first, hosted on error” is a multi-plugin composition, half a day for a platform engineer.
- Streaming + tool-call fidelity supported in 3.6+. SSE works; tool-call JSON survives.
- Local-only observability sink through Kong’s plugin ecosystem. OpenTelemetry to a self-hosted collector, Prometheus to your scraper, Splunk / Datadog / Loki for the log line.
- GPU-aware health checks through Kong’s upstream health check: a heartbeat, not a GPU-load read. A custom Lua plugin can ping vLLM’s /metrics, but that’s code you write.
- Self-improving loop. Not built in.
Where it falls short:
- AI-specific features lag dedicated AI gateways by typically two quarters.
- The “five plugins glued together” pattern is fine for an existing Kong team and miserable for a team setting up a control plane from scratch for one workload.
- No optimizer.
- The AI Proxy plugin’s local-backend adapters are documented but not as battle-tested as LiteLLM’s; expect upstream bugs in the first month.
Pricing: Kong is open source. Kong Konnect (managed) starts free. Enterprise plans for SLA, plugins, and support start around $1.5K/month.
Score: 4.5/7 axes (missing: native AI observability depth, optimizer, mature local-backend adapters).
Capability matrix
| Axis | Future AGI | LiteLLM | Portkey | vLLM + proxy | Kong AI Gateway |
|---|---|---|---|---|---|
| Local backend adapter coverage | Ollama, vLLM, llama.cpp, LM Studio | Ollama, vLLM, llama.cpp, LM Studio, TGI | Ollama, vLLM, llama.cpp (custom_host) | vLLM only; proxy adds rest | Ollama + OpenAI-compatible custom (3.6+) |
| Hosted spillover policy | Three triggers, declarative YAML | Fallbacks list, declarative | Fallback config, declarative | Proxy’s job | Plugin composition |
| Turn-routing by complexity | YAML route policy, optimizer-tuned | Python Router class | Conditional-routing config | Proxy’s job | Expression-based routing |
| Streaming + tool-call fidelity | Yes | Yes | Yes | Yes (vLLM 0.10+) | Yes (3.6+) |
| Local-only observability sink | BYOC + traceAI Apache 2.0 | Postgres + OTel | BYOC tier | Prometheus + your stack | Plugins to your sink |
| GPU-aware health checks | Heartbeat + latency budget | Heartbeat | Heartbeat | Prometheus metrics native | Heartbeat + custom Lua |
| Self-improving loop | fi.opt + fi.evals + traceAI | None | None | None | None |
Decision framework: Choose X if
Choose Future AGI if the goal is “Aider plus local models gets better at the routing decision every week without a human in the loop.” Pick this when the cost-quality curve is the metric leadership cares about. The loop is the wedge; BYOC is what makes it acceptable to security.
Choose LiteLLM if the goal is “front Ollama, vLLM, and llama.cpp with one OpenAI URL inside the VPC, with deterministic fallback to hosted, and the team is comfortable in Python.” Pin past 1.83.7, vendor the dependency, ship. This is where most teams that get to production start.
Choose Portkey if the local model handles most turns and the gateway’s main job is to clean up the hosted spillover with mature virtual keys and observability. Pick the BYOC tier if “no prompt leaves the box” is non-negotiable.
Choose vLLM with a proxy front if the GPU utilization story dominates the routing story: every percentage point of better throughput means more turns stay local. Pair with LiteLLM for routing, Future AGI traceAI for observability, and own the integration.
Choose Kong AI Gateway if Kong is already the control plane for the rest of the company’s APIs and the platform team would rather extend a known stack than introduce a new one.
Common mistakes when wiring Aider through a gateway for local models
| Mistake | What goes wrong | Fix |
|---|---|---|
| Pointing Aider only at the local model with no fallback | A 32K-context overflow on a multi-file refactor truncates input; Aider commits the wrong diff | Always configure a hosted fallback for context overflow |
| --auto-commits with a flaky-tool-call local model | Aider commits hallucinated function calls | Use --no-auto-commits until the local model passes a tool-use eval on your repo |
| Routing to local by token count alone | A 500-token rename across 12 files fails where a 500-token bug-fix succeeds | Add a language and task-type heuristic; let the optimizer learn the rest |
| One Ollama replica shared across developers | One long context overflows the KV cache and stalls everyone | Run multiple replicas behind the gateway and load-balance |
| Hosted observability while running local models | Defeats the point | Use a self-hosted sink: traceAI, Phoenix, Langfuse self-hosted, Postgres |
| Stuck on LiteLLM 1.82.7 / 1.82.8 | Known-bad version | Pin past 1.83.7, vendor the dep, scan with Sigstore |
| Treating vLLM throughput numbers as a routing signal | Throughput is per-batch; tail latency is what Aider’s UX feels | Add a P95 latency budget to the routing rule |
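The last fix in the table, adding a P95 latency budget to the routing rule, can be sketched with a rolling window and a nearest-rank percentile. The class name, window size, and minimum sample count are assumptions made for illustration; the 8-second default mirrors the budget quoted earlier in this post.

```python
from collections import deque
from typing import Optional


class LatencyBudget:
    """Rolling P95 over recent local-model turns, compared against a budget."""

    def __init__(self, budget_s: float = 8.0, window: int = 200, min_samples: int = 20):
        self.budget_s = budget_s
        self.min_samples = min_samples
        self.samples = deque(maxlen=window)  # oldest samples fall off automatically

    def record(self, latency_s: float) -> None:
        self.samples.append(latency_s)

    def p95(self) -> Optional[float]:
        if not self.samples:
            return None
        ordered = sorted(self.samples)
        # Nearest-rank percentile: rank = ceil(0.95 * n), computed in integers.
        rank = (95 * len(ordered) + 99) // 100
        return ordered[rank - 1]

    def breached(self) -> bool:
        """True only once there are enough samples and P95 exceeds the budget."""
        if len(self.samples) < self.min_samples:
            return False
        return self.p95() > self.budget_s


budget = LatencyBudget()
for latency in [1.2] * 18 + [9.5, 11.0]:
    budget.record(latency)
print(budget.p95(), budget.breached())
```

The minimum-sample guard keeps a single slow cold-start turn from flipping all traffic to the hosted model.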
How Future AGI closes the loop on Aider with local models
The other four gateways treat local-vs-hosted routing as a one-time configuration: declare the rule, ship, hope it ages. Future AGI treats it as the input to a feedback loop. Six stages:
- Trace. Every Aider turn, local or hosted, produces a span tree via traceAI (Apache 2.0). Spans capture inputs, outputs, tool calls, model used, latency, GPU replica, and the file paths Aider was operating on. Local traces stay inside the perimeter when running BYOC.
- Evaluate. fi.evals scores every turn against task-completion, faithfulness, and code-correctness rubrics. Wire your CI’s unit-test-pass-rate as an additional signal; it’s the single most predictive eval for an Aider workload.
- Cluster. Low-scoring sessions get clustered by failure mode. Two common week-one patterns: “local model is consistently failing tool-call dispatch on the same kind of refactor,” and “the routing rule is sending high-context turns to the local model and the context is silently truncating.”
- Optimize. fi.opt.optimizers rewrites the system prompt or the routing policy against the clustered failures. It ships six optimizers (RandomSearchOptimizer; BayesianSearchOptimizer, Optuna-backed with teacher-inferred few-shot templates and resumable studies; MetaPromptOptimizer; ProTeGi; GEPAOptimizer; PromptWizardOptimizer), all sharing an EarlyStoppingConfig (patience, min_delta, threshold, max_evaluations) and the same unified Evaluator over 60+ FAGI rubrics. Typical edits: route diffs spanning more than three files to claude-sonnet-4-6, route Python turns local and Rust turns hosted, route turns after a tool-call failure to hosted.
- Route. The gateway applies the updated policy on the next request. The local-replica health check and the optimizer-tuned rule cooperate: the gateway won’t route to a stuck replica even if the rule says it should.
- Re-deploy. New prompt and route are versioned. Roll forward; on eval regression, automatic rollback.
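The cluster-then-optimize step can be approximated, very roughly, by comparing per-task failure rates between local and hosted turns and flagging the task types where the local model fails far more often. This sketch is not how fi.opt or Error Feed work internally; the record fields, the 4x ratio, and the rate floor are assumptions chosen to mirror the week-one discovery described earlier.

```python
from collections import defaultdict


def failure_rates(turns):
    """Return {(backend, task_type): failure_rate} from scored turn records."""
    counts = defaultdict(lambda: [0, 0])  # key -> [failures, total]
    for turn in turns:
        key = (turn["backend"], turn["task_type"])
        counts[key][1] += 1
        if not turn["passed"]:
            counts[key][0] += 1
    return {key: failed / total for key, (failed, total) in counts.items()}


def task_types_to_move_hosted(turns, ratio=4.0, min_hosted_rate=0.01):
    """Flag task types where the local failure rate is >= ratio x the hosted rate."""
    rates = failure_rates(turns)
    flagged = []
    for (backend, task_type), local_rate in rates.items():
        if backend != "local":
            continue
        hosted_rate = rates.get(("hosted", task_type))
        if hosted_rate is None:
            continue  # no hosted baseline to compare against
        # The floor keeps a zero hosted failure rate from flagging everything.
        if local_rate >= ratio * max(hosted_rate, min_hosted_rate):
            flagged.append(task_type)
    return sorted(flagged)


def make_turns(backend, task_type, failures, total):
    return [
        {"backend": backend, "task_type": task_type, "passed": i >= failures}
        for i in range(total)
    ]


turns = (
    make_turns("local", "multi_file_rename", 8, 10)
    + make_turns("hosted", "multi_file_rename", 1, 10)
    + make_turns("local", "bug_fix", 1, 10)
    + make_turns("hosted", "bug_fix", 1, 10)
)
print(task_types_to_move_hosted(turns))
```

The output of a pass like this is a candidate rule change, which is why the versioning and rollback stage matters: a flagged task type should be moved, re-scored, and reverted if the eval regresses.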
Net effect: a team starting with Aider plus a local 14B-coder typically sees the “wrong-routing” rate drop from roughly one in four turns to one in 12 within four weeks, without any developer changing their workflow.
The three building blocks are open source:
- traceAI, github.com/future-agi/traceAI (Apache 2.0)
- ai-evaluation, github.com/future-agi/ai-evaluation (Apache 2.0)
- agent-opt, github.com/future-agi/agent-opt (Apache 2.0)
The hosted Agent Command Center adds the failure-cluster view, the Future AGI Protect model family as the inline guardrail layer at ~65 ms p50 text and ~107 ms p50 image (arXiv 2510.13351), plus RBAC, SOC 2 Type II certification, and AWS Marketplace for procurement. Protect is a model family rather than a plugin chain: FAGI’s own fine-tuned Gemma 3n adapters across content moderation, bias detection, security/prompt-injection, and data privacy/PII, multi-modal across text, image, and audio.
What we did not include
We deliberately left out three options that show up in other 2026 listicles:
- Helicone. Strong observability layer but the local-backend story is shallower than LiteLLM’s, and routing is round-robin / failover rather than complexity-aware.
- OpenRouter. Consumer-facing model directory; no local-model story.
- Cloudflare AI Gateway. Strong edge-deployment story but the local-backend adapter path requires a Workers-side custom backend, which is workable but not a published feature.
All three are worth a second look in Q3 2026.
Related reading
- Best 5 AI Gateways to Monitor Claude Code Token Usage in 2026
- Best 5 AI Gateways to Route Codex CLI to Any Model in 2026
- Best Open Source AI Gateways in 2026
- What Is an AI Gateway? The 2026 Definition
Sources
- Aider documentation, aider.chat
- Aider repository, github.com/Aider-AI/aider
- Ollama, ollama.com
- vLLM, github.com/vllm-project/vllm
- Future AGI Agent Command Center, futureagi.com/platform/monitor/command-center
- LiteLLM proxy, github.com/BerriAI/litellm
- Portkey AI gateway, portkey.ai
- Kong AI Gateway, konghq.com/products/kong-ai-gateway
- Future AGI Protect latency benchmarks, arxiv.org/abs/2510.13351 (65 ms text, 107 ms image)
Frequently asked questions
Why run Aider with a local model in the first place?
Does Aider support OpenAI-compatible endpoints?
Can I route Aider through multiple local models on the same machine?
How do I track cost per developer when the model is local?
What happens to Aider's tool calls when the gateway routes to a local model?
Is it safe to send source code through an AI gateway?
How is Future AGI Agent Command Center different from LiteLLM for Aider with local models?