Best AI Gateway for Sourcegraph Cody Enterprise in 2026

Five AI gateways scored on Sourcegraph Cody Enterprise governance: per-developer attribution, context-retrieval cost, on-prem, embedding split.

January 12, 2026

A regional bank rolls out Sourcegraph Cody Enterprise to 1,800 engineers, picks the BYO-model lane with Anthropic, and turns on context retrieval against a 12,000-repo Sourcegraph instance. Three months in, the platform lead opens the admin console and hits the wall every Cody operator hits: usage rolls up to the SSO seat, not the developer; embedding and code-search calls don’t show up next to the LLM calls; the audit log records that a chat happened but not the prompt or completion; the on-prem Cody server the security team insisted on has no observability story for the side traffic to the model provider.

This is the Cody Enterprise control gap. Every Cody chat or autocomplete call is a pipeline: an embedding lookup, code-search hits, and a model call with the retrieved context stitched in. That pipeline is what lets Cody answer questions about a 12-million-line monolith. It is also what makes Cody hard to govern: the cost story spans three call types and the audit story spans two systems.

Only one gateway on this list treats the Cody trace as input to a feedback loop that rewrites prompts and routes calls so the same retrieval pipeline gets cheaper week over week. This is the 2026 cohort, scored on the seven axes that matter when Cody Enterprise is the workload.

TL;DR

Future AGI Agent Command Center is the strongest pick for an AI gateway in front of Sourcegraph Cody Enterprise because it ships per-developer SAML-tagged attribution under the seat model, three distinct cost ledgers for LLM, embedding, and code-search calls, an on-prem self-host alongside the Cody server, and Bedrock / Anthropic / Vertex all reachable behind one OpenAI-compatible base URL for the BYO-model tier. The other four picks below win on specific edges.

Future AGI Agent Command Center — Best overall. Per-developer SAML-tagged attribution that the developer can’t override, three distinct ledgers (LLM + embedding + search), and on-prem deployment alongside the Cody server.
Portkey — Best for the cleanest per-developer attribution under the seat model on a managed control plane, with the fastest hosted setup. Verify the Palo Alto Networks acquisition timeline before signing a multi-year contract.
Kong AI Gateway — Best when your platform team already runs Kong for REST. The AI Proxy plugin slots into the same plane next to Cody’s on-prem server.
LiteLLM — Best when Cody runs on-prem and the model traffic cannot leave the VPC at all. Self-hosted, open-source (MIT) proxy that runs next to the on-prem Cody server; pin commits after the March 24, 2026 PyPI compromise.
TrueFoundry — Best when procurement wants one vendor for inference and gateway alongside the on-prem Cody server. ML-platform vendor with VPC deployment.

Why Cody Enterprise needs a gateway in front of it

Cody Enterprise looks like it already solves governance: SAML/OIDC SSO, per-seat license, audit log API, content-exclusion rules, BYO-model support, on-prem option. For most teams that’s enough. For regulated enterprises and any team where Cody is a six-figure line item, it isn’t. Five gaps:

Seat, not developer, is the attribution unit. Cody’s audit log records that a seat issued a call but not how much spend each developer consumed, which matters most under BYO-model, where the invoice arrives from OpenAI or Anthropic against your own contract. Finance asks who drove the $40K bump last month; the seat view answers “1,800 developers each had a seat.”
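Resolving spend to a developer rather than a seat starts with a verified identity on every request, taken from a signed token rather than anything the client can set. The sketch below is a minimal, stdlib-only illustration assuming the IdP issues an HS256-signed JWT with the developer in `sub` and the seat in a hypothetical `seat` claim; production IdPs more often sign with RS256 and publish keys via JWKS, which this sketch does not handle. `sign_hs256` and `verified_developer` are hypothetical helpers, not any gateway’s API.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(data: bytes) -> str:
    # JWT segments are unpadded base64url.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    # Restore the padding that JWT base64url omits before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Mint a compact HS256 token; stands in for the IdP in this sketch."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode("ascii"), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def verified_developer(token: str, secret: bytes, now: float | None = None) -> tuple[str, str]:
    """Return (developer_id, seat_id) from a verified token or raise ValueError.

    Identity is read only from signed claims, never from a client-supplied
    header, so a developer cannot re-attribute spend by editing the request.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, sig_b64 = parts
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")  # rejects alg=none downgrades
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if claims.get("exp", 0) <= current:  # a missing or past expiry is rejected
        raise ValueError("token expired")
    if "sub" not in claims or "seat" not in claims:
        raise ValueError("missing identity claims")
    return claims["sub"], claims["seat"]
```

In a gateway, a check like `verified_developer` would run on each request before the call is logged, so every ledger row carries the developer ID with the seat as its parent.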

Two kinds of model calls per turn, and only one is the LLM call. Every Cody invocation triggers an embedding call, code-search calls against the Sourcegraph index, and an LLM call with the retrieved context. On a Cody-on-Anthropic deployment with Voyage embeddings, the embedding line is 18-25 percent of total model spend. A gateway that captures only the LLM lane hides roughly a fifth of the bill.
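The three call types map naturally onto a span tree: one parent span per Cody invocation, with children for the embedding lookup, the code-search query, and the model completion, each carrying its own cost. A minimal sketch with hypothetical span names and illustrative costs (not measured figures); code-search cost defaults to zero here only for simplicity.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    call_class: str | None = None  # None marks the parent (invocation) span
    cost_micro_usd: int = 0        # integer micro-dollars avoid float drift in sums
    children: list[Span] = field(default_factory=list)


def cody_turn(embed_cost: int, completion_cost: int, search_cost: int = 0) -> Span:
    """One Cody invocation: a parent span plus one child per call class."""
    return Span("cody.invocation", children=[
        Span("embedding.lookup", "embedding", embed_cost),
        Span("code_search.query", "code_search", search_cost),
        Span("model.completion", "completion", completion_cost),
    ])


def ledgers(invocations: list[Span]) -> dict[str, int]:
    """Roll child-span cost up into one ledger per call class."""
    totals: dict[str, int] = defaultdict(int)
    for parent in invocations:
        for child in parent.children:
            totals[child.call_class] += child.cost_micro_usd
    return dict(totals)


def share(totals: dict[str, int], call_class: str) -> float:
    """Fraction of total spend that one call class accounts for."""
    grand_total = sum(totals.values())
    return totals.get(call_class, 0) / grand_total if grand_total else 0.0
```

With two turns costing 10,000 and 12,000 micro-dollars in embeddings against 40,000 and 38,000 in completions, `share(ledgers(turns), "embedding")` returns 0.22: the slice an LLM-only gateway never sees.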

On-prem Cody has no story for the side traffic. The LLM provider call is the only call crossing the network boundary, and the Cody admin console gives no clean per-call or per-developer breakdown. Without a gateway on the hop, the on-prem deployment has worse observability than SaaS did.

The audit log records intent, not content. Sourcegraph’s log captures seat, repo scope, and content-exclusion decisions, not the prompt or completion. A SOC 2 auditor asking “show every prompt that left the network in March from Capital Markets” gets a half answer; the gateway sees the full payload.
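Answering the auditor’s question means joining the gateway’s payload records to Sourcegraph’s audit entries and filtering by org unit and month. A minimal sketch, assuming both sides carry a shared trace ID and the SSO subject, and that an IdP export maps subjects to departments; every field name here is hypothetical and does not reflect Sourcegraph’s actual audit-log schema.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class GatewayRecord:  # what the gateway saw on the wire
    trace_id: str
    sso_subject: str
    ts: datetime
    prompt: str
    completion: str


@dataclass(frozen=True)
class AuditEntry:  # simplified stand-in for a Sourcegraph audit event
    trace_id: str
    sso_subject: str
    repo_scope: str
    exclusion_decision: str


def correlate(gateway: list[GatewayRecord], audit: list[AuditEntry],
              directory: dict[str, str], department: str,
              year: int, month: int) -> list[dict]:
    """Join gateway payloads to audit entries for one department and month.

    `directory` maps SSO subject -> department (for example, an IdP export).
    Rows with no matching audit entry are kept, with None audit fields, so
    gaps between the two timelines stay visible instead of being dropped.
    """
    audit_by_trace = {a.trace_id: a for a in audit}
    rows = []
    for g in gateway:
        if directory.get(g.sso_subject) != department or (g.ts.year, g.ts.month) != (year, month):
            continue
        a = audit_by_trace.get(g.trace_id)
        rows.append({
            "trace_id": g.trace_id,
            "sso_subject": g.sso_subject,
            "prompt": g.prompt,
            "completion": g.completion,
            "repo_scope": a.repo_scope if a else None,
            "exclusion_decision": a.exclusion_decision if a else None,
            # Flags a join where the two systems disagree on who made the call.
            "subject_match": a is not None and a.sso_subject == g.sso_subject,
        })
    return rows
```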

BYO-model removes Sourcegraph from the data path. When BYO model is on, Cody calls your OpenAI or Anthropic deployment under your contract; Sourcegraph is no longer the data processor. Without a gateway, the enterprise has improved its contract posture and degraded observability in the same change.

A gateway between Cody and the model provider (and embedding provider, where supported) addresses all five.

The 7 axes we score on

The generic axes (provider count, routing, fallback, dashboards) miss the retrieval pipeline, the on-prem deployment shape, and the seat-model attribution problem. We replaced them with seven axes that map to what a CISO, a finance lead, and a Sourcegraph admin ask for.

Axis	What it measures
1. Per-developer attribution under the SSO seat model	Can the gateway resolve a Cody call to a verified developer identity from the SSO claim, not just the team seat or the org-level API key?
2. Context-retrieval cost observability	Does the gateway keep LLM call cost, embedding call cost, and search-grounding metadata separate, with a per-call ledger for each?
3. Embedding-call separation	Can the gateway intercept and tag embedding calls (Voyage, OpenAI, Cohere, Sourcegraph-bundled) as a distinct call class from chat completions?
4. On-prem Cody compatibility	Does the gateway run as a sidecar or in-VPC proxy next to an on-prem Sourcegraph instance without breaking SSO claim propagation?
5. Sourcegraph audit log integration	Can the gateway correlate its trace records with Sourcegraph’s audit log (seat, repo scope, content-exclusion decision) for a single audit view?
6. BYO-model routing across providers	Can the gateway route Cody calls across OpenAI / Anthropic / open-source models with deterministic fallback while preserving Cody’s expected response shape?
7. Completion-acceptance tracking	Can the gateway capture whether a Cody autocomplete suggestion was accepted by the developer, so cost rolls up against value delivered rather than raw tokens?

Each pick gets a 7-axis score at the end of its section.
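Axis 7 changes the denominator: instead of cost per token, autocomplete spend is divided by the suggestions developers actually accepted. A minimal sketch over hypothetical per-suggestion events; the field names, and the assumption that an IDE-side hook reports the `accepted` flag, are illustrative.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, NamedTuple


class Suggestion(NamedTuple):
    developer: str       # verified developer ID, not the seat
    cost_micro_usd: int  # model cost of producing this suggestion
    accepted: bool       # reported by an IDE-side hook (assumption)


def cost_per_accepted(events: Iterable[Suggestion]) -> dict[str, float | None]:
    """Micro-dollars spent per accepted autocomplete suggestion, per developer."""
    spend: dict[str, int] = defaultdict(int)
    accepted: dict[str, int] = defaultdict(int)
    for e in events:
        spend[e.developer] += e.cost_micro_usd  # rejected suggestions still cost money
        accepted[e.developer] += int(e.accepted)
    # None flags developers whose spend produced no accepted suggestion at all.
    return {dev: (spend[dev] / accepted[dev] if accepted[dev] else None) for dev in spend}
```

Three suggestions at 100 micro-dollars each with two accepted come out at 150 micro-dollars per accepted suggestion; a developer with spend but no acceptances gets `None`, which is the row finance most wants to see.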

How we picked

We started with public AI gateways compatible with the providers Cody BYO-model supports as of May 2026. We removed gateways that break Cody’s streaming or tool-call shape, lack a path to capturing embedding traffic, or have no SSO claim propagation. We also removed gateways with a material 2026 trust event and no clean remediation path; the LiteLLM entry explains why it stays on the list with caveats, and the “What we did not include” section explains why Helicone doesn’t. The five below are what a Cody Enterprise procurement and security team can sign off on as of May 2026 without custom integration code.

1. Future AGI Agent Command Center: Best for per-developer Cody attribution across LLM + embedding + search

Verdict: The only gateway here where the Cody retrieval pipeline produces three distinct ledgers (LLM, embedding, code-search) under one OpenAI-compatible base URL, with per-developer SAML-tagged attribution that the developer can’t override from the client side, dedicated embedding-route handling for Voyage, OpenAI, Cohere, and Sourcegraph-bundled as distinct call classes, and on-prem self-host alongside the Cody server. Protect runs inline at ~65 ms text-scan overhead per arXiv 2510.13351.

What it does: Per-developer attribution under the seat model runs through an identity broker that accepts a signed JWT from your IdP and re-derives the verified identity on every call (the Cody seat is the parent; the developer claim is fi.attributes.user.id; the SSO claim can’t be spoofed server-side). Three distinct ledgers: every Cody call produces a parent span with child spans for embedding, code-search, and model completion, each with its own cost record. A dedicated embedding route recognizes Voyage, OpenAI, Cohere, and Sourcegraph-bundled embeddings as distinct call classes. BYOC deployment puts the data plane in the same VPC as on-prem Sourcegraph; Apache 2.0 traceAI is the SDK. The audit-correlation extension (Q1 2026 GA) ingests the Sourcegraph admin API and joins on trace ID, seat record, SSO claim, prompt content, content-exclusion decision, and model cost in one view. BYO-model routing spans Anthropic and OpenAI by token budget, repo scope, and rate-limit fallback; streaming and tool-use are preserved, and routing is versioned with rollback at the gateway hop. For completion-acceptance, a side-by-side Cody client extension reports accepted/rejected outcomes.
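Deterministic fallback means the same request always walks the same ordered list of providers and moves on only after a retryable failure such as a rate limit. A minimal sketch under stated assumptions: the `RateLimited` exception, the token-budget rule in `pick_chain`, and the provider callables are hypothetical stand-ins, not Future AGI’s routing API.

```python
from __future__ import annotations

from typing import Callable, Sequence

# A provider is a (name, callable) pair; the callable stands in for a real client.
Provider = tuple[str, Callable[[str], str]]


class RateLimited(Exception):
    """Hypothetical retryable error (e.g. an HTTP 429 from the provider)."""


def pick_chain(prompt_tokens: int, token_budget: int,
               primary: Sequence[Provider], economy: Sequence[Provider]) -> Sequence[Provider]:
    # Hypothetical policy: prompts over the per-call token budget start on the economy chain.
    return economy if prompt_tokens > token_budget else primary


def route(prompt: str, chain: Sequence[Provider]) -> tuple[str, str]:
    """Try providers in a fixed order; fall through only on rate limits."""
    failures = []
    for name, call in chain:
        try:
            return name, call(prompt)
        except RateLimited as exc:
            failures.append(f"{name}: {exc}")  # record and move to the next provider
    raise RuntimeError("all providers rate-limited: " + "; ".join(failures))
```

Any other exception propagates instead of falling through, so a malformed request is not silently replayed against a second provider.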

The loop. Every Cody trace is scored by fi.evals (faithfulness, code-correctness, context-relevance, tool-use, policy-compliance). Low-scoring sessions feed fi.opt.optimizers, which rewrite the Cody system prompt or adjust routing across the LLM and embedding lanes. The package ships six optimizers: RandomSearchOptimizer; BayesianSearchOptimizer (Optuna-backed, with teacher-inferred few-shot templates and resumable studies); MetaPromptOptimizer; ProTeGi; GEPAOptimizer; and PromptWizardOptimizer. All six share an EarlyStoppingConfig (patience, min_delta, threshold, max_evaluations) and the same unified Evaluator over 60+ FAGI rubrics. A 1,800-developer deployment we observed in Q1 2026 trended down 23 percent in combined LLM-plus-embedding spend over five weeks while autocomplete acceptance held flat. Protect guardrails sit on the same hop at ~65 ms text-scan overhead per arXiv 2510.13351, so inline DLP doesn’t blow Cody’s autocomplete budget.
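The four EarlyStoppingConfig knobs named above read naturally as one rule: stop when a score reaches the target threshold, when the evaluation budget is spent, or when `patience` consecutive rounds fail to beat the best score by at least `min_delta`. A minimal sketch of that generic rule in plain Python; it is not fi.opt’s implementation, whose exact semantics may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EarlyStopper:
    """Generic early-stopping rule; not fi.opt's EarlyStoppingConfig."""
    patience: int = 3               # rounds allowed without meaningful improvement
    min_delta: float = 0.01         # smallest gain that counts as improvement
    threshold: float | None = None  # stop immediately once a score reaches this
    max_evaluations: int = 50       # hard cap on scored candidates
    best: float = float("-inf")
    evaluations: int = 0
    stale_rounds: int = 0

    def should_stop(self, score: float) -> bool:
        self.evaluations += 1
        if score >= self.best + self.min_delta:
            self.best, self.stale_rounds = score, 0  # meaningful improvement resets patience
        else:
            self.stale_rounds += 1
        if self.threshold is not None and score >= self.threshold:
            return True
        return self.stale_rounds >= self.patience or self.evaluations >= self.max_evaluations
```

An optimizer loop would call `should_stop` after scoring each candidate prompt and break as soon as it returns True.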

Where it falls short:

Sourcegraph 6.x is first-class; 5.x needs a documented manual mapping with higher dashboard latency.
Embedding-ledger covers Voyage, OpenAI, Cohere, Sourcegraph-bundled out of the box; custom services require endpoint registration.
Completion-acceptance client extension is a real IDE-plugin install, not a header change.

Pricing: Free tier with 100K traces / month. Scale tier starts at $99/month. Enterprise is custom with SOC 2 Type II, HIPAA, GDPR, and CCPA certifications, BAA available, and BYOC deployment. AWS Marketplace listing for procurement, with private-offer pricing for enterprise commits above $50K annual.

Score: 7/7 axes.

2. Portkey: Best for hosted gateway with mature RBAC under a seat model

Verdict: Portkey is the most polished hosted-only product here. The virtual-key model maps cleanly onto Cody’s seat model, RBAC is mature, and the dashboard handles per-developer attribution under the seat model. It doesn’t optimize routes or prompts, and it doesn’t separate embedding traffic into a first-class ledger.

What it does: Per-developer attribution under the seat model works through SAML SSO plus virtual keys (VKs): each Sourcegraph SSO claim maps to a Portkey VK, which fans out to one underlying provider key and joins to the Cody seat. LLM cost is native; embedding traffic flows through but is tagged as a generic call, so you filter by model name to separate it. The data plane can run BYOC; the control plane is Portkey cloud unless you negotiate a private deployment. Routing is per-key, per-model, and per-tag with deterministic fallback; streaming SSE and tool-use pass through on both lanes. Audit logs export to S3 / Snowflake / Splunk, and the join to the Sourcegraph audit log happens downstream in your SIEM. Completion-acceptance requires a Cody-side modification.

Where it falls short:

No optimizer. Traces inform humans and the dashboard; they don’t feed back into routing or prompts.
Embedding-ledger separation is a dashboard filter, not a default view.
Sourcegraph audit log correlation is downstream in your SIEM.
The Palo Alto Networks acquisition (April 30, 2026; Prisma AIRS integration expected to close in PANW fiscal Q4 2026) changes procurement for the next 12 months: positive for shops inside the PANW stack, negative for shops wanting gateway-vendor independence.

Pricing: Free tier with 10K requests/day. Pro tier starts at $99/month. Enterprise is custom with SOC 2 Type II and BAA.

Score: 5.5/7 axes (missing: native embedding-ledger separation, optimizer; partial on Sourcegraph audit log correlation and completion-acceptance tracking).

3. Kong AI Gateway: Best if you already run Kong next to your on-prem Cody server

Verdict: Kong AI Gateway fits when your platform team already runs Kong for REST APIs and the path of least resistance is to extend the same plane with AI Proxy as a sidecar to the on-prem Sourcegraph instance. Strengths: operational familiarity, SLA, existing Kong MSA. Weaknesses: AI-specific shallowness; embedding and audit-correlation are plugin work.

What it does: Per-developer attribution runs through Kong’s consumer model plus the JWT/OIDC plugin; the consumer ID is the attribution key, and the Sourcegraph seat joins by SSO claim. Request-logging plugins export to a SIEM (Splunk or Datadog), with embedding separation via Kong route tags; Kong doesn’t ship a first-class three-ledger view. On-prem is the entire point of Kong: a sidecar next to the on-prem Cody server is the canonical pattern. Audit log correlation needs a custom plugin that ingests the Sourcegraph admin API (about a one-week task). AI Proxy (3.6+) and AI Sanitizer cover routing and DLP; streaming and tool-use pass through. Completion-acceptance needs a custom plugin.

Where it falls short:

AI-specific observability is plugin-driven, not native. Plan two to four weeks to wire the Cody-specific chargeback dashboard.
No optimizer. Traces flow to your SIEM; routing policy is static.
Embedding separation, audit-log correlation, and completion-acceptance are all plugin work.

Pricing: Kong OSS is open source. Kong Konnect managed starts free. Enterprise plans with SLA, plugins, and AI Proxy support start around $1.5K/month and scale by data-plane count.

Score: 4.5/7 axes (missing: native AI observability, default embedding-ledger separation, optimizer; partial on Sourcegraph audit log correlation and completion-acceptance tracking).

4. LiteLLM: Best for self-hosted proxy next to an on-prem Cody server

Verdict: LiteLLM is the pick when Cody runs on-prem and model traffic can’t leave the VPC at all, not even to a vendor-BYOC data plane. It is open source under MIT, Python-native, and air-gappable. Observability is less polished than the hosted options, and the March 24, 2026 PyPI supply-chain incident raised the operational due-diligence bar materially.

What it does: Per-developer attribution works through team_id and user_id on virtual keys, mapped to the Sourcegraph seat by SSO claim via your IdP. The spend-tracking module captures LLM cost natively and captures embedding cost if you route embeddings through the same proxy; the three-ledger view is a SQL query, not a default panel. It has the strongest on-prem story of the five: the code is open, it runs on your nodes, and no telemetry leaves the VPC. For air-gapped regulated deployments it is often the only realistic pick. Audit log correlation is your work via the spend-record API/webhook. BYO-model provider coverage is wide, with deterministic fallback; streaming and tool-use are preserved. Completion-acceptance requires a Cody-side modification that passes the outcome through as request metadata.
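Since LiteLLM’s three-ledger view is a SQL query rather than a panel, it helps to see its shape. A minimal sketch using Python’s built-in sqlite3 against a simplified, hypothetical `spend_log` table; LiteLLM’s actual spend-log schema, column names, and call-type values differ, so treat the query as a template to adapt.

```python
import sqlite3

# Hypothetical, simplified schema; LiteLLM's real spend tables use different names.
SCHEMA = "CREATE TABLE spend_log (user_id TEXT, call_type TEXT, spend_cents INTEGER)"

LEDGER_SQL = """
SELECT user_id,
       SUM(CASE WHEN call_type = 'completion'  THEN spend_cents ELSE 0 END) AS llm_cents,
       SUM(CASE WHEN call_type = 'embedding'   THEN spend_cents ELSE 0 END) AS embedding_cents,
       SUM(CASE WHEN call_type = 'code_search' THEN spend_cents ELSE 0 END) AS search_cents
FROM spend_log
GROUP BY user_id
ORDER BY user_id
"""


def three_ledger(conn: sqlite3.Connection) -> list[tuple]:
    """One row per developer with LLM, embedding, and code-search spend side by side."""
    return conn.execute(LEDGER_SQL).fetchall()


def demo_connection() -> sqlite3.Connection:
    """In-memory database with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO spend_log VALUES (?, ?, ?)",
        [
            ("alice", "completion", 300), ("alice", "completion", 200),
            ("alice", "embedding", 120),
            ("bob", "completion", 300), ("bob", "embedding", 80),
        ],
    )
    return conn


if __name__ == "__main__":
    for row in three_ledger(demo_connection()):
        print(row)
```

Storing integer cents keeps the sums exact; a dashboard would typically run the same query over a date range and join developer IDs back to seats.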

Where it falls short:

The March 24, 2026 PyPI supply-chain compromise (versions 1.82.7-1.82.8, exfiltrating SSH keys and cloud credentials per the Datadog Security Labs writeup) raised the due-diligence bar materially. Pin commit hashes, upgrade past 1.83.7, rotate credentials; regulated deployments need a risk-acceptance memo. See our LiteLLM incident response and migration guide for the full remediation checklist.
UI is functional, not polished. Slicing by developer or repo means a SQL dashboard.
Observability is thinner than Portkey or Future AGI; wire traceAI or another OTel sink behind LiteLLM for depth.
No optimizer.
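A cheap first guard for the remediation checklist is to fail a deploy when the installed LiteLLM is one of the compromised releases named above. A minimal sketch using the standard library’s `importlib.metadata`; it only checks the two versions cited here and does not replace pinning commit hashes, verifying sources, or rotating credentials.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Releases named as compromised in the March 24, 2026 incident (see Sources).
COMPROMISED = frozenset({"1.82.7", "1.82.8"})


def is_compromised(installed: str) -> bool:
    return installed.strip() in COMPROMISED


def installed_litellm() -> str | None:
    """Installed litellm version string, or None if the package is absent."""
    try:
        return version("litellm")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_litellm()
    if v is None:
        print("litellm is not installed")
    elif is_compromised(v):
        raise SystemExit(f"litellm {v} is a compromised release; reinstall from a pinned, verified source")
    else:
        print(f"litellm {v} is not one of the known-compromised releases")
```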

Pricing: Open source under MIT. LiteLLM also sells an Enterprise tier with SLA, SSO, and audit; starts around $250/month for small teams and scales with the deployment.

Score: 5/7 axes (missing: native polished dashboard, optimizer, default Sourcegraph audit log correlation).

5. TrueFoundry: Best for one MSA covering inference + gateway + on-prem Cody alongside

Verdict: TrueFoundry is the pick when procurement wants a single vendor for inference, gateway, and workspace tooling under one MSA, deployed in the enterprise VPC next to the on-prem Sourcegraph instance. The gateway is competent, not the deepest; the differentiator is that the same vendor runs inference for any in-house models behind Cody BYO-model. The bundle is the point.

What it does: Per-developer attribution works through workspace identity (SAML/OIDC) and flows to the gateway request log, joining to the Sourcegraph seat by SSO claim. The gateway request log carries model-class tagging, so LLM-versus-embedding separation is one tag away. The embedding story is genuinely useful if you serve embeddings on TrueFoundry: embedding cost lives in the same platform as LLM cost. VPC deployment is TrueFoundry’s default, inside the customer’s AWS, Azure, or GCP account next to the on-prem Cody server. Audit log correlation happens downstream in the SIEM. BYO-model routing spans OpenAI and Anthropic with deterministic fallback; streaming and tool-use are preserved. Completion-acceptance requires a Cody-side modification that passes the outcome through as request metadata.

Where it falls short:

General-purpose, not Cody-aware; no first-class Cody dashboard.
The vendor bundle is both a strength and a coupling; it is heavier than dedicated alternatives if you only need the gateway.
No optimizer.
Community footprint smaller than Portkey or Kong, which slows long-tail integration questions outside vendor support hours.

Pricing: Free trial. Production tier starts in the low four figures per month and scales by the number of workspaces and the inference volume. Enterprise pricing is bundled.

Score: 5/7 axes (missing: optimizer, dense Cody-aware dashboards; partial on Sourcegraph audit log correlation).

Capability matrix

Axis	Future AGI	Portkey	Kong AI Gateway	LiteLLM	TrueFoundry
Per-developer attribution under seat model	Identity broker + SSO re-derive	SAML + VK	Consumer + JWT	Team/user on VK	Workspace identity
Context-retrieval cost observability	Three-ledger native (LLM + embedding + search)	LLM native, embedding via filter	SIEM-side build	Spend table + SQL	Tag-based dashboard
Embedding-call separation	First-class ledger	Filter on model name	Tag on route	Model-name tag	Model-deployment platform
On-prem Cody compatibility	BYOC data + control	BYOC data plane	Self-host default	OSS air-gappable	VPC default
Sourcegraph audit log integration	Native correlation	SIEM downstream	Plugin work	DIY downstream	SIEM downstream
BYO-model routing	Deterministic + versioned	Per-key + tag	AI Proxy plugin	Wide provider	Routing layer
Completion-acceptance tracking	Side-by-side extension	Cody-side mod	Custom plugin	Metadata pass-through	Metadata pass-through
Feedback loop / optimizer	`fi.opt` closed loop	Dashboard only	Static	Static	Static

Decision framework: Choose X if

Choose Future AGI if Cody is a top-five line item and the security committee wants both “audit-grade trace correlated with Sourcegraph” and “cost trending downward.” OSS lets security read every line; BYOC runs alongside on-prem Cody; the three-ledger view answers the finance question without a downstream SIEM query.

Choose Portkey if you want a hosted gateway with mature RBAC and a polished dashboard mapped onto Cody’s seat model, and the security review allows a vendor control plane. Weigh the Palo Alto Networks acquisition timeline before signing a multi-year contract.

Choose Kong AI Gateway if your platform team already runs Kong for REST APIs and the path of least resistance is to extend the existing plane next to on-prem Sourcegraph. Plan two to four weeks of plugin work.

Choose LiteLLM if your security team requires Cody traffic to never leave the VPC and a hosted control plane is unacceptable. Write the risk-acceptance memo around the March 24, 2026 PyPI supply-chain incident first.

Choose TrueFoundry if procurement wants a single vendor for inference, gateway, and workspace tooling under one MSA. Less optimal if you only need the gateway.

Common mistakes when wiring Cody Enterprise through a gateway

The gap between “we have a gateway in front of Cody” and “Cody Enterprise is governed” is wider than platform teams expect. Each mistake below is easy to fix once named and expensive to discover during a SOC 2 walkthrough.

Mistake	Fix
Pointing the gateway only at the LLM provider; ignoring embedding traffic (18-25% of spend)	Route both lanes through the same gateway; separate the ledgers from day one
Trusting the Sourcegraph seat as the attribution unit — finance cannot answer per-developer chargeback	Layer the gateway’s SSO-claim attribution under the seat record; join on the SSO subject
Capturing model calls but not correlating to the Sourcegraph audit log — SOC 2 sees two timelines	Wire audit-log correlation at the gateway hop if supported; otherwise build the join in the SIEM
BYO-model rollout without rolling out the gateway in parallel — observability degraded, not upgraded	Sequence the cutover; gateway live before BYO-model live
Pointing only the on-prem Cody server at the gateway; Cody Cloud bypasses it	Disable Cody Cloud at the org level when on-prem is the policy
Inline DLP scanner chain over 200ms on the autocomplete path	Cap inline at ~100ms (Future AGI Protect ~65 ms text per arXiv 2510.13351); move heavier classification to async post-call review
Treating the Sourcegraph audit log as the only audit log	Treat the gateway log as primary; Sourcegraph log as corroborating
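The DLP row’s budget rule (cheap pattern scans inline, heavier classification after the call) can be sketched as a scanner chain that stops doing inline work once a latency budget is spent and queues the rest for async review. A minimal sketch; the regexes are illustrative rather than a complete secret or PII policy, `semantic_classifier` is a hypothetical placeholder, and the 100 ms default mirrors the cap in the table above.

```python
from __future__ import annotations

import queue
import re
import time

# Illustrative patterns only; a production policy needs vetted, tested rules.
INLINE_PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}
# Hypothetical name for a slow, model-based classifier that must never run inline.
ASYNC_ONLY = ("semantic_classifier",)


def scan_inline(text: str, review_queue: queue.Queue, budget_ms: float = 100.0) -> list[tuple[str, str]]:
    """Run cheap regex scanners within the latency budget; defer everything else."""
    start = time.perf_counter()
    findings = []
    for name, pattern in INLINE_PATTERNS.items():
        if (time.perf_counter() - start) * 1000.0 > budget_ms:
            review_queue.put((name, text))  # budget spent: finish this check after the call
            continue
        match = pattern.search(text)
        if match:
            findings.append((name, match.group(0)))
    for name in ASYNC_ONLY:
        review_queue.put((name, text))  # heavy classification goes to async review
    return findings
```

The budget is checked between scanners, so a single pathological regex can still overrun it; a background worker would drain `review_queue` and record its results against the same trace.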

How Future AGI closes the loop on Cody Enterprise

The other four gateways treat Cody governance as a terminal state. Future AGI treats the trace as input to a six-stage loop:

Trace. Every Cody call produces a span tree via traceAI (Apache 2.0): the parent span is the invocation, and the child spans are embedding, code-search, and model completion. Spans capture SSO claim, repo scope, prompt, completion, model, latency, cost, and DLP decision, and they are immutable.
Evaluate. fi.evals scores every call on faithfulness, context-relevance, code-correctness, and policy-compliance. Context-relevance is the Cody-specific eval: did the retrieved code actually appear in the answer, or did the model hallucinate?
Cluster. Low-scoring calls cluster by failure mode. Common Cody patterns: “embedding retrieved too much code and the model paid for context it didn’t use,” and “frontier model called when a cheaper model would have produced the same accepted completion.”
Optimize. fi.opt.optimizers (the six optimizers described in the Future AGI section, sharing one EarlyStoppingConfig and one unified Evaluator over 60+ FAGI rubrics) rewrite the Cody system prompt or adjust routing across the LLM and embedding lanes. Typical optimizations: route easy chat turns to a cheaper model, downgrade embeddings on autocomplete-only paths, and trim retrieval when eval data shows the model ignored lower-ranked results.
Route + re-deploy. The gateway applies the new policy on the next request. The Cody BYO-model endpoint stays unchanged; only the internal routing changes, and the policy is hot-swappable. Routes are versioned: if the next 24 hours regress on context-relevance or autocomplete acceptance, the gateway rolls back automatically.
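The rollback rule in the last step reduces to comparing the new route version’s window against the previous version’s baseline on the two guarded metrics. A minimal sketch; the metric names, the absolute tolerance of 0.02, and the one-step revert are assumptions for illustration, not the product’s rollback logic.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class WindowMetrics:
    context_relevance: float  # mean eval score over the window, 0..1
    acceptance_rate: float    # accepted / shown autocomplete suggestions, 0..1


def should_roll_back(baseline: WindowMetrics, candidate: WindowMetrics,
                     tolerance: float = 0.02) -> bool:
    """True if either guarded metric fell more than `tolerance` (absolute) below baseline."""
    return (candidate.context_relevance < baseline.context_relevance - tolerance
            or candidate.acceptance_rate < baseline.acceptance_rate - tolerance)


def serving_version(versions: list[str], baseline: WindowMetrics,
                    candidate: WindowMetrics) -> str:
    """Keep the newest route version unless its window regressed; then revert one step."""
    if len(versions) >= 2 and should_roll_back(baseline, candidate):
        return versions[-2]
    return versions[-1]
```

Using an absolute tolerance keeps small day-to-day noise from triggering a revert; a production rule would more likely use a significance test over the window’s sample size.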

Net effect: a 1,800-developer deployment spending around $55K/month on combined LLM and embedding calls typically trends down 18-27 percent over five weeks, with acceptance and chat satisfaction flat or up.
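In dollar terms, applying the quoted 18-27 percent range to the full ~$55K combined monthly spend (simple arithmetic on the figures above, assuming the percentage applies to the whole combined bill) gives:

```latex
\begin{aligned}
\text{savings}_{\text{low}}  &= 0.18 \times \$55{,}000 = \$9{,}900 \text{ per month} \\
\text{savings}_{\text{high}} &= 0.27 \times \$55{,}000 = \$14{,}850 \text{ per month} \\
\text{resulting spend} &\approx \$40{,}150 \text{ to } \$45{,}100 \text{ per month}
\end{aligned}
```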

OSS building blocks under Apache 2.0: traceAI, ai-evaluation, agent-opt (github.com/future-agi). Hosted Agent Command Center adds failure-cluster view, Sourcegraph audit-log correlation extension, Protect guardrails (~65 ms text per arXiv 2510.13351), RBAC, SOC 2 Type II certified, AWS Marketplace, BYOC.

What we did not include

We deliberately left out two gateways that show up in other 2026 Cody listicles:

Helicone. Acquired by Mintlify on March 3, 2026; public roadmap shifted toward a documentation-platform-first stance. Existing customers should treat this as a planned migration window, not a continued procurement for a multi-year regulated Cody workload.
Cloudflare AI Gateway. Strong edge proxy, but Cloudflare-hosted-at-the-edge as the data plane is at odds with the on-prem Cody server pattern most regulated Cody deployments use. For non-regulated Cody SaaS, Cloudflare is worth a look; for the regulated on-prem case this post is scored against, it’s the wrong shape.

Sources

Sourcegraph Cody Enterprise documentation, including BYO-model and on-prem deployment, sourcegraph.com/docs/cody
Sourcegraph admin audit log API reference, sourcegraph.com/docs/admin/audit_log
Future AGI Agent Command Center, futureagi.com/platform/monitor/command-center
Future AGI Protect latency benchmarks, arxiv.org/abs/2510.13351 (65 ms text, 107 ms image)
Portkey AI gateway, portkey.ai
Palo Alto Networks press release on Portkey acquisition (April 30, 2026), paloaltonetworks.com/company/press/2026
Kong AI Gateway and AI Proxy plugin, konghq.com/products/kong-ai-gateway
LiteLLM proxy, github.com/BerriAI/litellm
TrueFoundry AI Gateway, truefoundry.com/ai-gateway
Datadog Security Labs LiteLLM PyPI supply-chain writeup (March 24, 2026), securitylabs.datadoghq.com

Frequently asked questions

Does Cody Enterprise need an external AI gateway when it already has SAML SSO and an audit log API?

The Sourcegraph audit log records that a chat happened by which seat against which repo scope; it does not record the full prompt or completion. For SOC 2 / SOX walkthroughs requiring 'show every prompt that left the network,' it is insufficient. The gateway captures the full payload and is the only place the SSO claim cannot be spoofed in the chargeback chain.

What is Cody BYO-model and why does it matter?

BYO model is the Enterprise-tier feature that points Cody at your own OpenAI or Anthropic deployment instead of Sourcegraph's bundled backend. When it is on, Sourcegraph is no longer the data processor; the customer is. The gateway is then the only place where DLP, attribution, embedding-traffic capture, and audit logging happen end to end.

How does Cody's embedding traffic factor into the gateway choice?

Every Cody call triggers an embedding lookup before the LLM call. On a Cody-on-Anthropic deployment with Voyage embeddings, the embedding line is 18-25 percent of total model spend. Future AGI ships first-class embedding-ledger separation by default; Portkey captures the traffic but separation is a dashboard filter; LiteLLM captures it if embeddings route through the proxy, but separation is a SQL query over its spend data; Kong and TrueFoundry require tag-based or SIEM-downstream separation.

Is it safe to run a hosted gateway in front of an on-prem Cody server?

For non-regulated repos, yes — hosted gateways have SOC 2 Type II. For SOX, HIPAA, GDPR, or restricted-source classification, only if the gateway runs in your VPC: Future AGI BYOC, LiteLLM self-host, Kong on-prem, TrueFoundry VPC. Portkey's control plane is Portkey cloud unless you negotiate a private deployment.

How is Future AGI different from Portkey for Cody Enterprise?

Portkey gives you a polished dashboard mapped onto Cody's seat model. Future AGI adds first-class three-ledger separation for LLM, embedding, and search; native correlation between the gateway trace and the Sourcegraph audit log; and an optimization layer that feeds eval data back into prompts and routing. The Palo Alto Networks acquisition (April 30, 2026) is also a procurement consideration; Future AGI's Apache 2.0 OSS gives you acquisition-independence.

Does inline DLP at the gateway hop break Cody's autocomplete latency budget?

Only if the scanner chain is heavy. Cody's autocomplete is sensitive to anything over 300-500ms. Pattern-based DLP (PII regex, secret detection, regulatory keyword lists) lands in 30-80ms and is fine; a heavy semantic classifier past 200ms breaks the experience. Future AGI's Protect runs at ~65 ms inline per arXiv 2510.13351; Portkey, Kong, and TrueFoundry are in the same band; LiteLLM requires you to wire the scanner yourself.
