Best AI Gateway for Sourcegraph Cody Enterprise in 2026
Five AI gateways scored on Sourcegraph Cody Enterprise governance in 2026: per-developer attribution under SSO seats, context-retrieval cost observability, embedding-call separation, on-prem compatibility, and what each gateway misses.
A regional bank rolls out Sourcegraph Cody Enterprise to 1,800 engineers, picks the BYO-model lane with Anthropic, and turns on context retrieval against a 12,000-repo Sourcegraph instance. Three months in, the platform lead opens the admin console and hits the wall every Cody operator hits: usage rolls up to the SSO seat, not the developer; embedding and code-search calls don’t show up next to the LLM calls; the audit log records that a chat happened but not the prompt or completion; the on-prem Cody server the security team insisted on has no observability story for the side traffic to the model provider.
This is the Cody Enterprise control gap. Every Cody chat or autocomplete call is a pipeline of an embedding lookup, code-search hits, and a model call with the retrieved context stitched in. That pipeline makes Cody answer questions about a 12-million-line monolith. It also makes Cody hard to govern: the cost story spans three call types and the audit story spans two systems.
Only one gateway on this list treats the Cody trace as input to a feedback loop that rewrites prompts and routes calls so the same retrieval pipeline gets cheaper week over week. This is the 2026 cohort, scored on the seven axes that matter when Cody Enterprise is the workload.
TL;DR
Future AGI Agent Command Center is the strongest pick for an AI gateway in front of Sourcegraph Cody Enterprise because it ships per-developer SAML-tagged attribution under the seat model, three distinct cost ledgers for LLM, embedding, and code-search calls, an on-prem self-host alongside the Cody server, and Bedrock / Anthropic / Vertex all reachable behind one OpenAI-compatible base URL for the BYO-model tier. The other four picks below win on specific edges.
- Future AGI Agent Command Center: Best overall. Per-developer SAML-tagged attribution that the developer can’t override, three distinct ledgers (LLM + embedding + search), and on-prem deployment alongside the Cody server.
- Portkey: Best for the cleanest per-developer-under-seat-model attribution path on a managed control plane. Fastest hosted setup (verify the Palo Alto Networks acquisition timeline before signing a multi-year contract).
- Kong AI Gateway: Best when your platform team already runs Kong for REST. The AI Proxy plugin slots into the same plane next to Cody’s on-prem server.
- LiteLLM: Best when Cody runs on-prem and the model traffic cannot leave the VPC at all. Self-hosted, open-source (MIT) proxy that runs next to the on-prem Cody server; pin commits after the March 24, 2026 PyPI compromise.
- TrueFoundry: Best when procurement wants one vendor for inference and gateway alongside the on-prem Cody server. ML-platform vendor with VPC deployment.
Why Cody Enterprise needs a gateway in front of it
Cody Enterprise looks like it already solves governance: SAML/OIDC SSO, per-seat license, audit log API, content-exclusion rules, BYO-model support, on-prem option. For most teams that’s enough. For regulated enterprises and any team where Cody is a six-figure line item, it isn’t. Five gaps:
Seat, not developer, is the attribution unit. Cody’s audit log records that a seat issued a call but not how much spend each consumed, especially under BYO-model where the invoice arrives from OpenAI or Anthropic against your contract. Finance asks who drove the $40K bump last month; the seat view answers “1,800 developers each had a seat.”
Two kinds of model calls per turn, and only one is the LLM call. Every Cody invocation triggers an embedding call, code-search calls against the Sourcegraph index, and an LLM call with the retrieved context. On a Cody-on-Anthropic deployment with Voyage embeddings, the embedding line is 18-25 percent of total model spend. A gateway that captures only the LLM lane hides roughly a fifth of the bill.
On-prem Cody has no story for the side traffic. The LLM provider call is the only call crossing the network boundary, and the Cody admin console gives no clean per-call or per-developer breakdown. Without a gateway on the hop, the on-prem deployment has worse observability than SaaS did.
The audit log records intent, not content. Sourcegraph’s log captures seat, repo scope, and content-exclusion decisions, not the prompt or completion. A SOC 2 auditor asking “show every prompt that left the network in March from Capital Markets” gets a half answer; the gateway sees the full payload.
BYO-model removes Sourcegraph from the data path. When BYO model is on, Cody calls your OpenAI or Anthropic deployment under your contract; Sourcegraph is no longer the data processor. Without a gateway, the enterprise has improved its contract posture and degraded observability in the same change.
A gateway between Cody and the model provider (and embedding provider, where supported) addresses all five.
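The audit gap above (intent in one system, content in another) comes down to a join between two record sets. The following is a minimal sketch of that join, assuming a shared `trace_id` is available on both sides; the field names and sample records are hypothetical and do not reflect the actual Sourcegraph audit log schema or any gateway's export format.

```python
# Hypothetical gateway trace records: the gateway sees the full payload.
gateway_traces = [
    {"trace_id": "t1", "sso_subject": "dev-42", "prompt": "explain settle()", "cost_microusd": 41000},
    {"trace_id": "t2", "sso_subject": "dev-17", "prompt": "refactor ledger.py", "cost_microusd": 9000},
]

# Hypothetical audit entries: seat, repo scope, and exclusion decision only.
audit_entries = [
    {"trace_id": "t1", "seat": "seat-7", "repo_scope": "payments/*", "content_excluded": False},
]


def correlate(traces, audits):
    """Join gateway traces to audit entries on trace_id.

    Returns (joined_records, orphan_trace_ids). Orphans are model calls with
    no matching audit entry, which is what an auditor would ask about first.
    """
    audits_by_id = {entry["trace_id"]: entry for entry in audits}
    joined, orphans = [], []
    for trace in traces:
        entry = audits_by_id.get(trace["trace_id"])
        if entry is None:
            orphans.append(trace["trace_id"])
        else:
            joined.append({**entry, **trace})
    return joined, orphans


joined, orphans = correlate(gateway_traces, audit_entries)
```

If no shared identifier exists in a given deployment, the join key would have to be something weaker, such as SSO subject plus a timestamp window, and the orphan list will be noisier.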
The 7 axes we score on
The generic axes (provider count, routing, fallback, dashboards) miss the retrieval pipeline, the on-prem deployment shape, and the seat-model attribution problem. We replaced them with seven axes that map to what a CISO, a finance lead, and a Sourcegraph admin ask for.
| Axis | What it measures |
|---|---|
| 1. Per-developer attribution under the SSO seat model | Can the gateway resolve a Cody call to a verified developer identity from the SSO claim, not just the team seat or the org-level API key? |
| 2. Context-retrieval cost observability | Does the gateway separate LLM call cost from embedding call cost from search-grounding metadata, with a per-call ledger for each? |
| 3. Embedding-call separation | Can the gateway intercept and tag embedding calls (Voyage, OpenAI, Cohere, Sourcegraph-bundled) as a distinct call class from chat completions? |
| 4. On-prem Cody compatibility | Does the gateway run as a sidecar or in-VPC proxy next to an on-prem Sourcegraph instance without breaking SSO claim propagation? |
| 5. Sourcegraph audit log integration | Can the gateway correlate its trace records with Sourcegraph’s audit log (seat, repo scope, content-exclusion decision) for a single audit view? |
| 6. BYO-model routing across providers | Can the gateway route Cody calls across OpenAI / Anthropic / open-source models with deterministic fallback while preserving Cody’s expected response shape? |
| 7. Completion-acceptance tracking | Can the gateway capture whether a Cody autocomplete suggestion was accepted by the developer, so cost rolls up against value delivered rather than raw tokens? |
Each pick gets a 7-axis score at the end of its section.
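Axis 6 asks for deterministic fallback. As a rough illustration of what that means in practice, the sketch below tries providers in a fixed order and falls back only on a rate-limit error. The provider callables and the `RateLimited` exception are hypothetical stand-ins, not any gateway's real API.

```python
class RateLimited(Exception):
    """Raised by a provider callable when the upstream rate-limits the call."""


def route_with_fallback(prompt, providers):
    """Try (name, callable) pairs in order; fall back only on RateLimited.

    The order is fixed by the caller, so the same inputs and the same
    failures always produce the same routing decision.
    """
    tried = []
    for name, call in providers:
        tried.append(name)
        try:
            return {"provider": name, "text": call(prompt), "tried": tried}
        except RateLimited:
            continue
    raise RuntimeError(f"all providers rate-limited: {tried}")


# Hypothetical stubs standing in for real provider clients.
def primary(prompt):
    raise RateLimited("429 from primary")


def secondary(prompt):
    return f"echo: {prompt}"


result = route_with_fallback(
    "explain foo()", [("primary", primary), ("secondary", secondary)]
)
```

Recording the `tried` list on every response is what lets the trace show that a fallback happened, which matters when the fallback model is priced differently.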
How we picked
We started with public AI gateways compatible with the providers Cody BYO-model supports as of May 2026. We removed gateways that break Cody’s streaming or tool-call shape, lack a path to capturing embedding traffic, or have no SSO claim propagation. We removed gateways with a material 2026 trust event without a clean remediation path; the LiteLLM entry explains why it’s still on the list with caveats and Helicone isn’t. The five below are what a Cody-Enterprise procurement and security team can sign off on as of May 2026 without custom integration code.
1. Future AGI Agent Command Center: Best for per-developer Cody attribution across LLM + embedding + search
Verdict: The only gateway here where the Cody retrieval pipeline produces three distinct ledgers (LLM, embedding, code-search) under one OpenAI-compatible base URL. Per-developer SAML-tagged attribution can’t be overridden from the client side, a dedicated embedding route treats Voyage, OpenAI, Cohere, and Sourcegraph-bundled embeddings as distinct call classes, and the gateway self-hosts on-prem alongside the Cody server. Protect runs inline at ~65 ms text-scan overhead per arXiv 2510.13351.
What it does: Per-developer attribution under the seat model runs through an identity broker that accepts a signed JWT from your IdP and re-derives the verified identity on every call. The Cody seat is the parent, the developer claim is fi.attributes.user.id, and the SSO claim is verified server-side, so the client can’t spoof it. Every Cody call produces a parent span with child spans for embedding, code-search, and model completion, each with its own cost record, which yields the three distinct ledgers. A dedicated embedding route recognizes Voyage, OpenAI, Cohere, and Sourcegraph-bundled embeddings as distinct call classes. BYOC deployment puts the data plane in the same VPC as on-prem Sourcegraph; the SDK is traceAI, licensed under Apache 2.0. The audit-correlation extension (Q1 2026 GA) ingests the Sourcegraph admin API and joins on trace ID, seat record, SSO claim, prompt content, content-exclusion decision, and model cost in one view. BYO-model routing across Anthropic and OpenAI works by token budget, repo scope, and rate-limit fallback; streaming and tool-use are preserved, and routing is versioned with rollback at the gateway hop. A side-by-side Cody client extension reports accepted/rejected outcomes for completion-acceptance tracking.
The loop. Every Cody trace is scored by fi.evals (faithfulness, code-correctness, context-relevance, tool-use, policy-compliance). Low-scoring sessions feed fi.opt.optimizers, which rewrite the Cody system prompt or adjust routing across LLM and embedding lanes. The module ships six optimizers: RandomSearchOptimizer, BayesianSearchOptimizer (Optuna-backed, with teacher-inferred few-shot templates and resumable studies), MetaPromptOptimizer, ProTeGi, GEPAOptimizer, and PromptWizardOptimizer. All six share an EarlyStoppingConfig (patience, min_delta, threshold, max_evaluations) and the same unified Evaluator over 60+ FAGI rubrics. A 1,800-developer deployment we observed in Q1 2026 trended down 23 percent in combined LLM-plus-embedding spend over five weeks while autocomplete acceptance held flat. Protect guardrails sit on the same hop at ~65 ms text-scan overhead per arXiv 2510.13351, so inline DLP doesn’t blow Cody’s autocomplete budget.
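To make the identity-broker idea concrete, here is a minimal sketch of re-deriving the developer identity from a signed token on every call and ignoring whatever the client claims in its headers. It is not Future AGI's implementation. It uses HS256 with a shared secret only so that it stays within the standard library; real IdPs typically sign with asymmetric keys (RS256 or ES256), and a production broker would use a vetted JWT library. The `seat` claim, the `x-user-id` header, and the secret are hypothetical; only the `fi.attributes.user.id` key comes from the text.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Demo helper standing in for the IdP: build a compact HS256 JWT."""
    header = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url_encode(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode("ascii")
    signature = hmac.new(secret, signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url_encode(signature)}"


def verified_claims(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Check algorithm, signature, and expiry; return claims or raise ValueError."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    if json.loads(b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(b64url_decode(payload_b64))
    if claims.get("exp", 0) <= (time.time() if now is None else now):
        raise ValueError("token expired")
    return claims


def attribution(token: str, secret: bytes, client_headers: dict,
                now: Optional[float] = None) -> dict:
    """Derive attribution tags from the verified token only.

    client_headers is accepted but deliberately never read for identity,
    so a client-supplied user id cannot override the SSO claim.
    """
    claims = verified_claims(token, secret, now)
    return {"fi.attributes.user.id": claims["sub"], "seat": claims.get("seat")}


SECRET = b"demo-shared-secret"  # hypothetical; never hard-code real secrets
token = sign_hs256({"sub": "dev-42", "seat": "seat-7", "exp": 2000}, SECRET)
tags = attribution(token, SECRET, {"x-user-id": "someone-else"}, now=1000)
```

The design point is the last function: identity comes from the verified claim, and the spoofed `x-user-id` header has no path into the attribution record.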
Where it falls short:
- Sourcegraph 6.x is first-class; 5.x needs a documented manual mapping with higher dashboard latency.
- Embedding-ledger covers Voyage, OpenAI, Cohere, Sourcegraph-bundled out of the box; custom services require endpoint registration.
- Completion-acceptance client extension is a real IDE-plugin install, not a header change.
Pricing: Free tier with 100K traces / month. Scale tier starts at $99/month. Enterprise is custom with SOC 2 Type II, HIPAA, GDPR, and CCPA certifications, BAA available, and BYOC deployment. AWS Marketplace listing for procurement, with private-offer pricing for enterprise commits above $50K annual.
Score: 7/7 axes.
2. Portkey: Best for hosted gateway with mature RBAC under a seat model
Verdict: Portkey is the most polished hosted-only product here. The virtual-key model maps cleanly onto Cody’s seat model, RBAC is mature, and the dashboard handles per-developer-under-seat attribution. It doesn’t optimize routes or prompts and doesn’t separate embedding traffic into a first-class ledger.
What it does: Per-developer attribution under the seat model through SAML SSO plus virtual keys: each Sourcegraph SSO claim maps to a Portkey VK, which fans out to one underlying provider key and joins to the Cody seat. LLM cost is native; embedding traffic flows through but is tagged as a generic call, so separating it means filtering by model name. BYOC data plane; control plane is Portkey cloud unless you negotiate a private one. Per-key, per-model, per-tag routing with deterministic fallback; streaming SSE and tool-use pass through on both lanes. Audit log export to S3 / Snowflake / Splunk; join to Sourcegraph audit log is downstream in your SIEM. Completion-acceptance requires a Cody-side modification.
Where it falls short:
- No optimizer. Traces inform humans and the dashboard; they don’t feed back into routing or prompts.
- Embedding-ledger separation is a dashboard filter, not a default view.
- Sourcegraph audit log correlation is downstream in your SIEM.
- The Palo Alto Networks acquisition (April 30, 2026; Prisma AIRS integration expected to close PANW fiscal Q4 2026) changes procurement for the next 12 months: positive for shops inside the PANW stack, negative for shops wanting gateway-vendor independence.
Pricing: Free tier with 10K requests/day. Pro tier starts at $99/month. Enterprise is custom with SOC 2 Type II and BAA.
Score: 5.5/7 axes (missing: native embedding-ledger separation, optimizer; partial on Sourcegraph audit log correlation and completion-acceptance tracking).
3. Kong AI Gateway: Best if you already run Kong next to your on-prem Cody server
Verdict: Kong AI Gateway fits when your platform team already runs Kong for REST APIs and the path of least resistance is to extend the same plane with AI Proxy as a sidecar to the on-prem Sourcegraph instance. Strengths: operational familiarity, SLA, existing Kong MSA. Weaknesses: AI-specific shallowness; embedding and audit-correlation are plugin work.
What it does: Per-developer attribution through Kong’s consumer model plus JWT/OIDC plugin: consumer ID is the attribution key, and the Sourcegraph seat joins by SSO claim. Request-logging plugins export to SIEM (Splunk or Datadog) with embedding separation via Kong route tags; Kong doesn’t ship a first-class three-ledger view. On-prem is the entire point of Kong, and running it as a sidecar to the on-prem Cody server is the canonical pattern. Audit log correlation goes through a custom plugin that ingests the Sourcegraph admin API (about a one-week task). AI Proxy (3.6+) and AI Sanitizer cover routing and DLP; streaming and tool-use pass through. Completion-acceptance is a custom plugin.
Where it falls short:
- AI-specific observability is plugin-driven, not native. Plan two to four weeks to wire the Cody-specific chargeback dashboard.
- No optimizer. Traces flow to your SIEM; routing policy is static.
- Embedding separation, audit-log correlation, and completion-acceptance are all plugin work.
Pricing: Kong OSS is open source. Kong Konnect managed starts free. Enterprise plans with SLA, plugins, and AI Proxy support start around $1.5K/month and scale by data-plane count.
Score: 4.5/7 axes (missing: native AI observability, default embedding-ledger separation, optimizer; partial on Sourcegraph audit log correlation and completion-acceptance tracking).
4. LiteLLM: Best for self-hosted proxy next to an on-prem Cody server
Verdict: LiteLLM is the pick when Cody runs on-prem and model traffic can’t leave the VPC at all, even to a vendor-BYOC data plane. Open source under MIT, Python-native, air-gappable. Observability is less polished than the hosted options, and the March 24, 2026 PyPI supply-chain incident raised the operational due-diligence bar materially.
What it does: Per-developer attribution through team_id and user_id on virtual keys, mapped to the Sourcegraph seat by SSO claim via your IdP. Spend-tracking module captures LLM cost natively; embedding captured if you route it through the same proxy; the three-ledger view is a SQL query, not a default panel. Strongest on-prem story of the five: open source, runs on your nodes, no telemetry leaves the VPC; for air-gapped regulated deployments, it is often the only realistic pick. Audit log correlation is your work via spend-record API/webhook. Wide BYO-model provider coverage with deterministic fallback; streaming and tool-use preserved. Completion-acceptance is metadata pass-through on a Cody-side modification.
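For a sense of what "the three-ledger view is a SQL query" involves, here is a small sketch run against an in-memory SQLite database. The `spend_log` table, its columns, and the sample rows are hypothetical and are not LiteLLM's actual spend schema; the point is only the shape of the pivot.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical spend table: one row per call, tagged by user and call class.
conn.execute(
    "CREATE TABLE spend_log (user_id TEXT, call_class TEXT, cost_microusd INTEGER)"
)
conn.executemany(
    "INSERT INTO spend_log VALUES (?, ?, ?)",
    [
        ("dev-1", "llm", 40000),
        ("dev-1", "embedding", 9000),
        ("dev-1", "llm", 5000),
        ("dev-2", "embedding", 3000),
        ("dev-2", "code_search", 500),
    ],
)

# Pivot call classes into one column per ledger, one row per developer.
THREE_LEDGER_SQL = """
SELECT user_id,
       SUM(CASE WHEN call_class = 'llm' THEN cost_microusd ELSE 0 END) AS llm,
       SUM(CASE WHEN call_class = 'embedding' THEN cost_microusd ELSE 0 END) AS embedding,
       SUM(CASE WHEN call_class = 'code_search' THEN cost_microusd ELSE 0 END) AS code_search
FROM spend_log
GROUP BY user_id
ORDER BY user_id
"""

rows = conn.execute(THREE_LEDGER_SQL).fetchall()
```

The query only works if every call is tagged with a call class at write time, which is why routing embedding traffic through the same proxy matters.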
Where it falls short:
- The March 24, 2026 PyPI supply-chain compromise (versions 1.82.7-1.82.8, exfiltrating SSH keys and cloud credentials per the Datadog Security Labs writeup) raised the due-diligence bar materially. Pin commit hashes, upgrade past 1.83.7, rotate credentials; regulated deployments need a risk-acceptance memo.
- UI is functional, not polished. Slicing by developer or repo means a SQL dashboard.
- Observability thinner than Portkey or Future AGI; wire traceAI or another OTel sink behind LiteLLM for depth.
- No optimizer.
Pricing: Open source under MIT. LiteLLM also sells an Enterprise tier with SLA, SSO, and audit; starts around $250/month for small teams and scales with the deployment.
Score: 5/7 axes (missing: native polished dashboard, optimizer, default Sourcegraph audit log correlation).
5. TrueFoundry: Best for one MSA covering inference + gateway + on-prem Cody alongside
Verdict: TrueFoundry is the pick when procurement wants a single vendor for inference, gateway, and workspace tooling under one MSA, deployed in the enterprise VPC next to the on-prem Sourcegraph instance. The gateway is competent, not the deepest; the differentiator is that the same vendor runs inference for any in-house models behind Cody BYO-model. The bundle is the point.
What it does: Per-developer attribution through workspace identity (SAML/OIDC); attribution flows to the gateway request log and joins to the Sourcegraph seat by SSO claim. Gateway request log with model-class tagging; LLM-versus-embedding separation is one tag away. Genuinely useful embedding story if you serve embeddings on TrueFoundry: embedding cost lives in the same platform as LLM cost. VPC deployment is TrueFoundry’s default: inside the customer’s AWS, Azure, or GCP account next to the on-prem Cody server. Audit log correlation is downstream in the SIEM. BYO-model routing across OpenAI and Anthropic with deterministic fallback; streaming and tool-use preserved. Completion-acceptance is metadata pass-through on a Cody-side modification.
Where it falls short:
- General-purpose, not Cody-aware; no first-class Cody dashboard.
- The vendor bundle is a strength and a coupling, heavier than dedicated alternatives if you only need the gateway.
- No optimizer.
- Community footprint smaller than Portkey or Kong, which slows long-tail integration questions outside vendor support hours.
Pricing: Free trial. Production tier starts in the low four figures per month and scales by the number of workspaces and the inference volume. Enterprise pricing is bundled.
Score: 5/7 axes (missing: optimizer, dense Cody-aware dashboards; partial on Sourcegraph audit log correlation).
Capability matrix
| Axis | Future AGI | Portkey | Kong AI Gateway | LiteLLM | TrueFoundry |
|---|---|---|---|---|---|
| Per-developer attribution under seat model | Identity broker + SSO re-derive | SAML + VK | Consumer + JWT | Team/user on VK | Workspace identity |
| Context-retrieval cost observability | Three-ledger native (LLM + embedding + search) | LLM native, embedding via filter | SIEM-side build | Spend table + SQL | Tag-based dashboard |
| Embedding-call separation | First-class ledger | Filter on model name | Tag on route | Model-name tag | Model-deployment platform |
| On-prem Cody compatibility | BYOC data + control | BYOC data plane | Self-host default | OSS air-gappable | VPC default |
| Sourcegraph audit log integration | Native correlation | SIEM downstream | Plugin work | DIY downstream | SIEM downstream |
| BYO-model routing | Deterministic + versioned | Per-key + tag | AI Proxy plugin | Wide provider | Routing layer |
| Completion-acceptance tracking | Side-by-side extension | Cody-side mod | Custom plugin | Metadata pass-through | Metadata pass-through |
| Feedback loop / optimizer | fi.opt closed loop | Dashboard only | Static | Static | Static |
Decision framework: Choose X if
Choose Future AGI if Cody is a top-five line item and the security committee wants both “audit-grade trace correlated with Sourcegraph” and “cost trending downward.” OSS lets security read every line; BYOC runs alongside on-prem Cody; the three-ledger view answers the finance question without a downstream SIEM query.
Choose Portkey if you want a hosted gateway with mature RBAC and a polished dashboard mapped onto Cody’s seat model, and the security review allows a vendor control plane. Weigh the Palo Alto Networks acquisition timeline before signing a multi-year contract.
Choose Kong AI Gateway if your platform team already runs Kong for REST APIs and the path of least resistance is to extend the existing plane next to on-prem Sourcegraph. Plan two to four weeks of plugin work.
Choose LiteLLM if your security team requires Cody traffic to never leave the VPC and a hosted control plane is unacceptable. Write the risk-acceptance memo around the March 24, 2026 PyPI supply-chain incident first.
Choose TrueFoundry if procurement wants a single vendor for inference, gateway, and workspace tooling under one MSA. Less optimal if you only need the gateway.
Common mistakes when wiring Cody Enterprise through a gateway
The gap between “we have a gateway in front of Cody” and “Cody Enterprise is governed” is wider than platform teams expect. Easy to fix once named, expensive to discover during a SOC 2 walkthrough.
| Mistake | Fix |
|---|---|
| Pointing the gateway only at the LLM provider; ignoring embedding traffic (18-25% of spend) | Route both lanes through the same gateway; separate the ledgers from day one |
| Trusting the Sourcegraph seat as the attribution unit — finance cannot answer per-developer chargeback | Layer the gateway’s SSO-claim attribution under the seat record; join on the SSO subject |
| Capturing model calls but not correlating to the Sourcegraph audit log — SOC 2 sees two timelines | Wire audit-log correlation at the gateway hop if supported; otherwise build the join in the SIEM |
| BYO-model rollout without rolling out the gateway in parallel — observability degraded, not upgraded | Sequence the cutover; gateway live before BYO-model live |
| Pointing only the on-prem Cody server at the gateway; Cody Cloud bypasses it | Disable Cody Cloud at the org level when on-prem is the policy |
| Inline DLP scanner chain over 200ms on the autocomplete path | Cap inline at ~100ms (Future AGI Protect ~65 ms text per arXiv 2510.13351); move heavier classification to async post-call review |
| Treating the Sourcegraph audit log as the only audit log | Treat the gateway log as primary; Sourcegraph log as corroborating |
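The inline-DLP row in the table is a budgeting problem: scanners that fit under the inline cap stay on the request path, and the rest move to async review. A minimal sketch of that split follows, assuming scanners run sequentially in priority order and that a p95 latency is known for each. The scanner names and all latencies other than the ~65 ms text-scan figure are hypothetical.

```python
INLINE_BUDGET_MS = 100  # inline cap suggested in the table above

# (name, p95 latency in ms), listed in priority order. Hypothetical values,
# except that 65 ms mirrors the text-scan overhead cited in this post.
scanners = [
    ("secrets_regex", 12),
    ("pii_text", 65),
    ("license_classifier", 140),
    ("toxicity", 30),
]


def plan_scanners(scanners, budget_ms=INLINE_BUDGET_MS):
    """Greedily keep scanners inline while the cumulative p95 fits the budget.

    Returns (inline_names, deferred_names, inline_ms_used). Anything that
    would push the running total over budget is deferred to async review.
    """
    inline, deferred, used_ms = [], [], 0
    for name, p95_ms in scanners:
        if used_ms + p95_ms <= budget_ms:
            inline.append(name)
            used_ms += p95_ms
        else:
            deferred.append(name)
    return inline, deferred, used_ms


inline, deferred, used_ms = plan_scanners(scanners)
```

Summing p95 values is a conservative simplification; scanners that run in parallel would be bounded by the slowest one rather than the sum.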
How Future AGI closes the loop on Cody Enterprise
The other four gateways treat Cody governance as a terminal state. Future AGI treats the trace as input to a closed loop:
- Trace. Every Cody call produces a span tree via traceAI (Apache 2.0): parent span is the invocation, child spans are embedding, code-search, and model completion. Spans capture SSO claim, repo scope, prompt, completion, model, latency, cost, and DLP decision. Immutable.
- Evaluate. fi.evals scores every call on faithfulness, context-relevance, code-correctness, and policy-compliance. Context-relevance is the Cody-specific eval: did retrieved code actually appear in the answer, or did the model hallucinate?
- Cluster. Low-scoring calls cluster by failure mode. Common Cody patterns: “embedding retrieved too much code and the model paid for context it didn’t use,” and “frontier model called when a cheaper model would have produced the same accepted completion.”
- Optimize. fi.opt.optimizers rewrites the Cody system prompt or adjusts routing across LLM and embedding lanes. It ships six optimizers (RandomSearchOptimizer; BayesianSearchOptimizer, Optuna-backed with teacher-inferred few-shot templates and resumable studies; MetaPromptOptimizer; ProTeGi; GEPAOptimizer; PromptWizardOptimizer), all sharing an EarlyStoppingConfig (patience, min_delta, threshold, max_evaluations) and the same unified Evaluator over 60+ FAGI rubrics. Typical optimizations: route easy chat turns to a cheaper model; downgrade embedding on autocomplete-only paths; trim retrieval when eval data shows the model ignored lower-ranked results.
- Route + re-deploy. The gateway applies the new policy on the next request. The Cody BYO-model endpoint is unchanged; internal routing changes, and the policy is hot-swappable. Routes are versioned; if the next 24 hours regress on context-relevance or autocomplete acceptance, rollback is automatic.
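The versioned-route-with-rollback step can be pictured as a small state machine. The sketch below is an illustration only, not the Agent Command Center's implementation: the class, the policy shape, and the 0.02 tolerance are hypothetical, while the two metrics checked are the ones named above.

```python
class RoutePolicyStore:
    """Keeps every deployed routing policy and a pointer to the active one."""

    def __init__(self, initial_policy):
        self.versions = [initial_policy]
        self.active = 0

    def deploy(self, policy):
        """Append a new policy version and make it active immediately."""
        self.versions.append(policy)
        self.active = len(self.versions) - 1
        return self.active

    def current(self):
        return self.versions[self.active]

    def check_and_rollback(self, baseline, observed, tolerance=0.02):
        """Roll back one version if any metric fell below baseline - tolerance.

        Returns the list of regressed metric names (empty if none).
        """
        regressed = [
            metric
            for metric, base in baseline.items()
            if observed.get(metric, 0.0) < base - tolerance
        ]
        if regressed and self.active > 0:
            self.active -= 1
        return regressed


store = RoutePolicyStore({"chat": "frontier-model"})
new_version = store.deploy({"chat": "cheaper-model"})

# Metrics from the observation window after the deploy (hypothetical numbers).
baseline = {"context_relevance": 0.80, "autocomplete_acceptance": 0.30}
observed = {"context_relevance": 0.71, "autocomplete_acceptance": 0.31}
regressed = store.check_and_rollback(baseline, observed)
```

Keeping old versions in the store rather than overwriting them is what makes the rollback a pointer move instead of a redeploy.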
Net effect: a 1,800-developer deployment at around $55K/month combined LLM-plus-embedding typically trends down 18-27 percent over five weeks, with acceptance and chat satisfaction flat or up.
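As a quick check of what that range means in dollars, the arithmetic below uses only the figures stated above ($55K/month, 18 to 27 percent); the variable names are illustrative.

```python
monthly_spend_usd = 55_000
low_pct, high_pct = 18, 27

# Integer arithmetic avoids floating-point rounding noise.
savings_low_usd = monthly_spend_usd * low_pct // 100    # 9,900
savings_high_usd = monthly_spend_usd * high_pct // 100  # 14,850

# Monthly run rate after the reduction.
run_rate_high_usd = monthly_spend_usd - savings_low_usd   # 45,100
run_rate_low_usd = monthly_spend_usd - savings_high_usd   # 40,150
```

So the stated range corresponds to a monthly run rate of roughly $40K to $45K, down from $55K.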
OSS building blocks under Apache 2.0: traceAI, ai-evaluation, agent-opt (github.com/future-agi). Hosted Agent Command Center adds a failure-cluster view, the Sourcegraph audit-log correlation extension, Protect guardrails (~65 ms text per arXiv 2510.13351), RBAC, SOC 2 Type II certification, an AWS Marketplace listing, and BYOC.
What we did not include
We deliberately left out two gateways that show up in other 2026 Cody listicles:
- Helicone. Acquired by Mintlify on March 3, 2026; public roadmap shifted toward a documentation-platform-first stance. Existing customers should treat this as a planned migration window, not a continued procurement for a multi-year regulated Cody workload.
- Cloudflare AI Gateway. Strong edge proxy, but Cloudflare-hosted-at-the-edge as the data plane is at odds with the on-prem Cody server pattern most regulated Cody deployments use. For non-regulated Cody SaaS, Cloudflare is worth a look; for the regulated on-prem case this post is scored against, it’s the wrong shape.
Related reading
- Best 5 AI Gateways to Govern GitHub Copilot in the Enterprise in 2026
- Best 5 AI Gateways to Monitor Claude Code Token Usage in 2026
- What Is an AI Gateway? The 2026 Definition
- Best AI Gateways for Agentic AI in 2026
Sources
- Sourcegraph Cody Enterprise documentation, including BYO-model and on-prem deployment, sourcegraph.com/docs/cody
- Sourcegraph admin audit log API reference, sourcegraph.com/docs/admin/audit_log
- Future AGI Agent Command Center, futureagi.com/platform/monitor/command-center
- Future AGI Protect latency benchmarks, arxiv.org/abs/2510.13351 (65 ms text, 107 ms image)
- Portkey AI gateway, portkey.ai
- Palo Alto Networks press release on Portkey acquisition (April 30, 2026), paloaltonetworks.com/company/press/2026
- Kong AI Gateway and AI Proxy plugin, konghq.com/products/kong-ai-gateway
- LiteLLM proxy, github.com/BerriAI/litellm
- TrueFoundry AI Gateway, truefoundry.com/ai-gateway
- Datadog Security Labs LiteLLM PyPI supply-chain writeup (March 24, 2026), securitylabs.datadoghq.com
Frequently asked questions
Does Cody Enterprise need an external AI gateway when it already has SAML SSO and an audit log API?
What is Cody BYO-model and why does it matter?
How does Cody's embedding traffic factor into the gateway choice?
Is it safe to run a hosted gateway in front of an on-prem Cody server?
How is Future AGI different from Portkey for Cody Enterprise?
Does inline DLP at the gateway hop break Cody's autocomplete latency budget?