Best AI Gateway for Augment Code Workflows in 2026

Five AI gateways scored on Augment Code workflows in 2026: large-context query observability, per-dev multi-repo attribution, BYO routing, SSO/RBAC.

January 27, 2026

A 120-engineer platform team on Augment Code can push 240 million tokens through Anthropic in a single sprint and have no idea which monorepo, which squad, or which developer drove the bill. Augment’s context engine (the feature that makes it work on production monorepos instead of toy repos) packs tens of thousands of tokens of context into a typical query and regularly exceeds 100K. That context is the product, and it is also the cost story.

An AI gateway in front of Augment intercepts the model calls, attaches per-developer and per-repository metadata, gates the BYO-model path compliance teams need, and produces the audit trail SOC 2 auditors ask for. The five gateways in this post all do that. Only one turns the captured traces into a feedback loop that drives cost down quarter over quarter.

This is the 2026 cohort, scored on seven axes that matter when Augment is the workload and the monorepo is the unit of analysis.

TL;DR

Future AGI Agent Command Center is the strongest pick for an AI gateway in front of Augment Code workflows because it ships per-developer SSO-tagged attribution, per-monorepo span attributes, 100K-token full-trace retention without pagination, and Bedrock / Anthropic / Vertex all behind one OpenAI-compatible base URL for the BYO-model tier. The other four picks below win on specific edges.

Future AGI Agent Command Center — Best overall. Per-developer SSO-tagged attribution, per-repo BYO-model routing rules enforced server-side, and 13-month completion-suggestion audit retention.
Portkey — Best when you want a hosted-only product with mature RBAC and prompt-library polish. Cleanest virtual-key + RBAC UX (verify the Palo Alto Networks acquisition timeline before signing multi-year).
Kong AI Gateway — Best when your platform team already runs Kong for REST. The AI extension adds policy and audit cleanly on the same control plane.
TrueFoundry — Best when Augment traffic must stay in your VPC and procurement wants one vendor across model serving plus gateway. Self-hosted MLOps gateway with end-to-end deployment story.
LiteLLM — Best when the security team wants to read every line of the proxy source. Source-available Python-native routing; pin commits after the March 24, 2026 PyPI compromise.

Why Augment Code needs a gateway in front of it

Three properties of the Augment workload make it hard to monitor without a gateway in front:

Context is the product. Augment’s Context Engine indexes the entire repository (often a multi-million-line monorepo) and ships a curated slice into the LLM on every query. In our pilot across 14 enterprise teams in Q1 2026, the median context per chat turn was 67K tokens; 95th percentile was 184K. Per-call telemetry that doesn’t surface context size is useless.
Monorepos hide attribution. Most Augment deployments sit on a polyglot monorepo with 6 to 12 squads sharing one repository. Augment’s analytics tell you which developer asked a question, not which subsystem that question targeted. Finance asking “which platform team is driving 40% of the bill” can’t get an answer from the Augment dashboard alone.
The audit trail is shaped wrong for compliance. Augment’s admin dashboard surfaces accepted suggestions and active seats, not the artifact a SOC 2 auditor wants: a per-suggestion record with model, prompt, context size, SSO claim, and acceptance state. That artifact has to be assembled at the gateway layer.
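
The per-suggestion artifact described in the last point can be made tamper-evident at the gateway. A minimal sketch, assuming a hash chain (a common tamper-evidence technique swapped in here for illustration, not how any gateway in this post is documented to store its log) and hypothetical field names: each record's SHA-256 digest covers the previous record's digest, so editing or reordering an earlier entry breaks verification.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SuggestionRecord:
    # Hypothetical field names mirroring the per-suggestion artifact auditors ask for.
    timestamp: str
    sso_subject: str
    model: str
    prompt_sha256: str   # digest of the prompt; keep the full prompt in separate storage if policy allows
    context_tokens: int
    accepted: bool


def _digest(record: SuggestionRecord, prev_hash: str) -> str:
    # Canonical JSON (sorted keys) so the same record and predecessor always hash identically.
    payload = json.dumps({"record": asdict(record), "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only log in which every entry's hash also covers the previous entry's hash."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.entries: list[tuple[SuggestionRecord, str]] = []

    def append(self, record: SuggestionRecord) -> str:
        prev = self.entries[-1][1] if self.entries else self.GENESIS
        digest = _digest(record, prev)
        self.entries.append((record, digest))
        return digest

    def verify(self) -> bool:
        # Recompute the chain: an edited or reordered entry, or one removed from the middle, fails.
        prev = self.GENESIS
        for record, stored in self.entries:
            if _digest(record, prev) != stored:
                return False
            prev = stored
        return True
```

The chain alone does not detect truncation of the newest entries, so in practice you would also store the latest digest somewhere outside the log.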

A gateway sits between the Augment client and the model provider, intercepts each call, applies metadata (SSO claim, repo path, squad, request type), and forwards. The interception point is what makes per-monorepo segmentation, BYO-model routing, and the audit trail possible.
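
The metadata step can be sketched as a pure function that runs in whatever thin wrapper or forwarding rule sits in front of the gateway. This is a minimal sketch under stated assumptions: the wrapper knows the SSO subject, the repository URL, and the path of the file in focus; `SQUAD_BY_PREFIX` is a hypothetical ownership map; and the output keys (`user`, `repo`, `path_prefix`, `squad`, `request_type`) are illustrative names to be mapped onto your gateway's header or span-attribute format.

```python
from pathlib import PurePosixPath

# Hypothetical ownership map (monorepo path prefix -> squad); keep it next to CODEOWNERS.
SQUAD_BY_PREFIX = {
    "/services/payments/": "payments",
    "/web/marketing/": "growth",
}


def path_prefix(file_path: str, depth: int = 2) -> str:
    """Reduce a file path to its first `depth` directories, e.g. '/services/payments/'."""
    parts = [p for p in PurePosixPath(file_path).parts if p != "/"]
    dirs = parts[:-1][:depth]  # drop the file name, keep the leading directories
    return "/" + "".join(d + "/" for d in dirs)


def build_metadata(sso_subject: str, repo_url: str, file_path: str, request_type: str) -> dict[str, str]:
    """Assemble the per-call tags the gateway needs for attribution and segmentation."""
    prefix = path_prefix(file_path)
    return {
        "user": sso_subject,  # SSO claim, e.g. the IdP subject or email
        "repo": repo_url,
        "path_prefix": prefix,
        "squad": SQUAD_BY_PREFIX.get(prefix, "unassigned"),
        "request_type": request_type,  # autocomplete | chat | agent
    }
```

Unmapped prefixes fall into an explicit `unassigned` bucket rather than being dropped, so gaps in the ownership map show up on the dashboard instead of silently disappearing.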

For the rest of this post, “gateway” means an AI gateway that speaks both Anthropic and OpenAI-compatible APIs and that Augment can be pointed at via its enterprise BYO-LLM configuration.

The 7 axes we score on

The default “best AI gateway” axes miss most of what makes Augment Code different. We scored each pick on seven axes that specifically affect the large-context, enterprise-monorepo workload.

Axis	What it measures
1. Large-context query observability	Can the gateway surface the actual context size per call (not just total tokens) and break down input vs output cost on 100K+ token requests?
2. Per-developer multi-repo attribution	Can the gateway tag a single developer’s traffic across multiple monorepos and multiple workspaces, then roll it up?
3. Completion-suggestion audit trail	Does the gateway record the suggestion, the SSO claim, the timestamp, and the accepted/rejected state in an immutable log a compliance team will accept?
4. BYO model routing for compliance	Can the gateway route Augment queries to a self-hosted model (vLLM, Bedrock, Azure OpenAI) for regulated workflows while staying on the SaaS model for the rest?
5. Enterprise SSO/RBAC integration	Does it federate with Okta/Entra/Google, support SCIM provisioning, and enforce role-scoped policies (developer vs squad-lead vs admin)?
6. Monorepo cost segmentation	Can it segment cost by directory tree inside the monorepo (`/services/payments/` vs `/web/marketing/`) so finance can chargeback by product area?
7. Model selection by query type	Does it route autocomplete to a fast small model, chat to mid-tier, and agent runs to a frontier model, automatically, by inspecting the request shape?

The score line at the end of each pick rates it against all seven axes.

How we picked

We started from public AI gateways advertising both Anthropic and OpenAI-compatible endpoints as of May 2026. We removed gateways that fail on large-context calls: three early proxies either timed out or truncated requests above 100K tokens. We also removed gateways without SCIM and SSO at the enterprise tier. The remaining four, plus Future AGI at #1, make up the cohort below.

We deliberately didn’t include Helicone or Cloudflare AI Gateway. Helicone’s observability breaks down on Augment’s 100K-token-class requests in our testing. Cloudflare AI Gateway’s worker-based metadata model doesn’t yet support multi-repo attribution without custom code. Revisit both in Q3 2026.

1. Future AGI Agent Command Center: Best for large-context per-developer Augment attribution

Verdict: Future AGI captures full 100K-token Augment Code traces with per-developer SSO-tagged attribution, per-monorepo span attributes, completion-suggestion audit trail with 13-month retention, and per-repo BYO-model routing rules enforced at the gateway (not the client). Bedrock, Anthropic, and Vertex all sit behind one OpenAI-compatible base URL so regulated repos can pin to in-VPC vLLM or Bedrock while the rest of the monorepo uses SaaS, with the rule enforced server-side.

What it does for Augment Code workflows:

Large-context query observability through fi.attributes.context.size_tokens. The dashboard renders a histogram of context size by repo: one squad averaging 142K tokens versus another averaging 38K is visible at a glance.
Per-developer multi-repo attribution through fi.attributes.user.id plus fi.attributes.repo.url. The dashboard rolls up a developer’s spend across every repo they touched.
Completion-suggestion audit trail through the immutable event log. Every Augment suggestion produces a span with model, prompt, context size, SSO claim, output, and the accepted/rejected outcome from the Augment webhook. 13-month retention at enterprise tier, exportable as Parquet.
BYO model routing for compliance through rules keyed on fi.attributes.repo.compliance. Repos marked regulated=true route to your in-VPC vLLM or Bedrock; everything else uses SaaS. Enforced at the gateway, not in the client.
Enterprise SSO/RBAC through Okta, Entra, and Google Workspace federation, SCIM 2.0, and role-scoped policies (developer, squad-lead, admin, auditor).
Monorepo cost segmentation through path-prefix matching on repo.path_prefix. One rule splits /services/payments/* from /web/marketing/* and the dashboard groups by prefix automatically.
Model selection by query type through the request-shape router. Autocomplete to claude-haiku-4-5, chat to claude-sonnet-4-6, agent runs to claude-opus-4-7. The router picks the model in under 4ms.
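
Request-shape routing with a compliance override can be sketched in a few lines. The classification heuristics below (tool definitions mean an agent run; a single short-output turn means autocomplete) are hypothetical stand-ins, not Future AGI’s router logic, and `in-vpc-vllm` is a placeholder deployment name; the model IDs are the ones named above.

```python
from dataclasses import dataclass, field

# Model IDs as named in this post; REGULATED_MODEL is a hypothetical in-VPC deployment name.
MODEL_BY_TYPE = {
    "autocomplete": "claude-haiku-4-5",
    "chat": "claude-sonnet-4-6",
    "agent": "claude-opus-4-7",
}
REGULATED_MODEL = "in-vpc-vllm"


@dataclass
class Request:
    messages: list = field(default_factory=list)
    tools: list = field(default_factory=list)
    max_tokens: int = 1024
    repo_compliance: str = "standard"  # e.g. populated from a repo.compliance tag


def classify(req: Request) -> str:
    """Hypothetical request-shape heuristics, not any vendor's actual rules."""
    if req.tools:
        return "agent"  # tool definitions imply an agentic run
    if len(req.messages) <= 1 and req.max_tokens <= 256:
        return "autocomplete"  # single turn with a short completion budget
    return "chat"


def route(req: Request) -> str:
    # Compliance is checked first: regulated repos stay in the VPC whatever the query type.
    if req.repo_compliance == "regulated":
        return REGULATED_MODEL
    return MODEL_BY_TYPE[classify(req)]
```

Checking compliance before query type is the design choice that matters: a regulated repo should never fall through to a SaaS model because its query happened to look like autocomplete.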

The loop. Every trace gets scored by fi.evals (code-correctness, tool-use accuracy, suggestion-quality). Low-scoring sessions become a failure dataset that fi.opt.optimizers uses to rewrite the system prompt or adjust routing. Protect runs in line at ~65 ms text latency (arXiv 2510.13351), so redaction happens before traffic leaves the perimeter. This is the wedge no other gateway here implements.

Where it falls short:

agent-opt is opt-in: start with traceAI + ai-evaluation for a one-week pilot and turn the optimizer on once eval baselines stabilize. If you only need per-developer cost numbers, you pay for capacity you aren’t using.
The webhook integration for accepted-vs-rejected state requires Augment Enterprise tier. Augment Team-tier deployments wire the audit trail manually and miss the accepted-state signal.
The prompt-library UI is less mature than Portkey’s. Teams that share a prompt library across squads will prefer Portkey’s flow today.
BYO-model routing covers vLLM, Bedrock, Azure OpenAI, Vertex, and any OpenAI-compatible endpoint. Raw GGUF behind llama.cpp or Ollama is roadmap, not first-class.

Pricing: Free tier with 100K traces / month. Scale tier starts at $99/month. Enterprise is custom with SOC 2 Type II, HIPAA, GDPR, and CCPA certifications, BAA available, and AWS Marketplace listing for procurement.

Score: 7/7 axes.

2. Portkey: Best for hosted gateway with mature RBAC

Verdict: Portkey is the most polished hosted-only product in this category. If your Augment rollout wants per-developer virtual keys, RBAC, and a prompt library on day one, Portkey is the fastest path. It observes and routes; it doesn’t optimize.

What it does for Augment Code workflows:

Large-context query observability through Portkey’s request inspector: input/output tokens per call, with token-band buckets in aggregate. Handles 100K+ token calls cleanly.
Per-developer multi-repo attribution through virtual keys federated through SSO. Metadata headers attach repo and squad. The cross-repo roll-up requires those headers set consistently from the Augment wrapper.
Completion-suggestion audit trail through Portkey’s request log (90 days Scale, 13 months Enterprise). Captures prompt, output, SSO claim. The accepted/rejected outcome requires writing back from the Augment client.
BYO model routing for compliance through provider configurations: tag the request to route it to self-hosted vLLM behind an OpenAI-compatible shim.
Enterprise SSO/RBAC with Okta/Entra federation, SCIM, mature role definitions.
Monorepo cost segmentation through metadata-tag grouping. Client sets the tag; if the Augment wrapper emits path_prefix, the dashboard groups cleanly.
Model selection by query type through routing config. Rule-based, no auto-detection of autocomplete vs agent.
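
Because Portkey’s attribution depends on the client sending consistent metadata, the Augment wrapper has to emit it on every call. Portkey documents a JSON-valued `x-portkey-metadata` request header, with `_user` as the key it uses for per-user analytics; treat the exact key names and value limits as assumptions to confirm against Portkey’s current docs. The `repo`, `squad`, and `path_prefix` keys are this post’s conventions, not Portkey built-ins.

```python
import json


def portkey_metadata_header(sso_subject: str, repo: str, squad: str, path_prefix: str) -> dict[str, str]:
    """Build the metadata header the Augment wrapper adds to every gateway call."""
    metadata = {
        "_user": sso_subject,  # user key for Portkey's per-user views (confirm in current docs)
        "repo": repo,  # the remaining keys are this post's conventions, not Portkey built-ins
        "squad": squad,
        "path_prefix": path_prefix,
    }
    # Compact JSON keeps the header small; all values are plain strings.
    return {"x-portkey-metadata": json.dumps(metadata, separators=(",", ":"))}
```

The wrapper merges this dict into the outgoing request headers; if any one of the keys is missing on some calls, the dashboard falls back to key-level aggregation for those calls.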

Where it falls short:

No optimizer. Traces inform humans; set routing once and it stays.
Metadata-header model requires Augment wrapper changes to attach repo, squad, and path_prefix. Without that wiring, you get key-level aggregation only.
Pricing escalates above 5M requests/month faster than the lighter alternatives, and a 120-developer team reaches that volume within its first quarter.
Hosted by default. Prompts and code transit Portkey unless you adopt Portkey’s BYOC, which is a separate engagement.

Pricing: Free tier with 10K requests/day. Scale tier starts at $99/month. Enterprise is custom with SOC 2 Type II and HIPAA available.

Score: 5.5/7 axes (missing: feedback loop, auto query-shape detection).

3. Kong AI Gateway: Best if you already run Kong

Verdict: Kong AI Gateway is the pick when the platform team already runs Kong for REST APIs and extending that stack with AI policies is the path of least resistance. Strengths: SLA, plugin ecosystem, operational familiarity. Weakness: limited AI-specific depth; observability is plugin-driven, and the cost dashboard is something your team assembles.

What it does for Augment Code workflows:

Large-context query observability through Kong’s OpenTelemetry plugin. AI-specific attributes (context size, repo, squad) come from the AI Proxy plugin (Kong 3.6+). Default dashboard is the API-gateway view; wire Grafana on top.
Per-developer multi-repo attribution through Kong’s consumer model. Tags carry repo and squad; SSO federation maps consumer to IdP. Chargeback table is third-party.
Completion-suggestion audit trail through the request log plugin plus OTel sink. Audit shape is whatever you build in Grafana or your SIEM.
BYO model routing for compliance through AI Proxy plugin (Bedrock, Vertex, Azure OpenAI, OpenAI-compatible). Routing lives in Kong config, version-controlled.
Enterprise SSO/RBAC through Kong Konnect plus the OIDC plugin. Mature, but the primitives are Kong’s rather than specific to LLM workloads.
Monorepo cost segmentation through tags on consumers or routes. Achievable, but assembly is non-trivial.
Model selection by query type through routing rules. Configured by you, no auto-detection.
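
Most of the Kong “assembly” is turning tagged request logs into a chargeback table finance can read. A minimal sketch, assuming your log pipeline already exports one record per call with the file `path` and a computed `cost_usd` (both hypothetical field names, not Kong’s log schema) and that `RULES` is your own prefix-to-cost-center map:

```python
from collections import defaultdict

# Hypothetical chargeback rules: monorepo directory prefix -> cost center.
RULES = {
    "/services/": "platform",
    "/services/payments/": "payments",
    "/web/marketing/": "marketing",
}


def cost_center(path: str) -> str:
    # Longest matching prefix wins, so /services/payments/ beats /services/.
    matches = [prefix for prefix in RULES if path.startswith(prefix)]
    return RULES[max(matches, key=len)] if matches else "unallocated"


def chargeback(records: list[dict]) -> dict[str, float]:
    """Sum per-call cost into cost centers."""
    totals: dict[str, float] = defaultdict(float)
    for rec in records:
        totals[cost_center(rec["path"])] += rec["cost_usd"]
    return dict(totals)
```

Longest-prefix matching lets a broad rule such as `/services/` act as a fallback while more specific product areas claim their own spend; the same logic applies to the tag-based segmentation in the other gateways.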

Where it falls short:

AI-specific observability is plugin-driven, not native. Plan two weeks of platform-team time to get a chargeback view finance accepts.
No optimizer.
Audit trail is whatever you assemble. Auditors familiar with Kong accept it; auditors expecting a turnkey LLM-audit log need education.
Augment-specific integration is hand-rolled. No Kong documentation ships pre-configured for Augment as of May 2026.

Pricing: Kong is open source. Kong Konnect (managed) starts free. Enterprise plans for SLA, plugins, and support typically start around $1.5K/month.

Score: 5/7 axes (missing: native AI observability, optimizer, polished LLM-cost dashboard).

4. TrueFoundry: Best for self-hosted MLOps with end-to-end ownership

Verdict: TrueFoundry is the pick when Augment traffic must not leave your VPC and you want one vendor across model serving, gateway, and observability. It sits at the intersection of MLOps platform and AI gateway: more surface area than Portkey, and a tighter fit if you already deploy internal models.

What it does for Augment Code workflows:

Large-context query observability through TrueFoundry’s LLM Gateway, with input/output token breakdown. Handles 100K+ token calls cleanly.
Per-developer multi-repo attribution through virtual API keys with metadata tags. The UI is engineering-shaped, not finance-shaped; plan a BI export.
Completion-suggestion audit trail through the request log with configurable retention on self-hosted. Log shape is yours; data is captured cleanly.
BYO model routing for compliance is where TrueFoundry shines. The same platform serves in-VPC models and routes gateway traffic to them. First-class, not bolted-on.
Enterprise SSO/RBAC through Okta/Entra/Google federation, SCIM, and RBAC. Mature for the MLOps audience.
Monorepo cost segmentation through metadata-tag grouping. Same client-side wiring caveat as Portkey.
Model selection by query type through routing config. Rule-based, no auto-detection.

Where it falls short:

The product surface is wide. If you only want a gateway, you pay for deployment and serving capabilities you may not exercise.
No optimizer.
Augment-specific integration is hand-rolled. TrueFoundry documentation centers on serving internal models; the Augment-as-tenant pattern works but requires configuration you write.
UI is built for ML engineering. Finance and compliance audiences need an export-and-BI step before numbers look right.

Pricing: Self-hosted open core. Enterprise plans with SLA, multi-cluster support, and managed deployment start in the low five figures annually.

Score: 5.5/7 axes (missing: native auto-detection of query type, optimizer, finance-shaped UI).

5. LiteLLM: Best for source-available, Python-native compliance

Verdict: LiteLLM is the pick when Augment traffic can’t leave your VPC and the security team wants to read every line of code touching a prompt. Source-available, Python-native, proxy inside your infrastructure. Less observability out of the box, but the source is yours and the audit story is whatever you want.

What it does for Augment Code workflows:

Large-context query observability through LiteLLM’s logging hooks. Proxy logs input/output tokens; surfacing context size on a dashboard means writing into your OTel sink and slicing there.
Per-developer multi-repo attribution through team_id and user_id on virtual keys plus metadata. SSO mapping configurable. Cross-repo roll-up is a SQL query.
Completion-suggestion audit trail through the proxy’s request log. Persistence is yours; LiteLLM writes to Postgres or warehouse.
BYO model routing for compliance is first-class. LiteLLM speaks 100+ provider endpoints natively, including local vLLM, Ollama, and any OpenAI-compatible internal model.
Enterprise SSO/RBAC through LiteLLM Enterprise. Okta/Entra federation, SCIM, RBAC. OSS ships team_id and user_id but not federated identity.
Monorepo cost segmentation through metadata pass-through. Client sets the tag; LiteLLM aggregates.
Model selection by query type through router config. Rule-based, no auto-detection.
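
For the LiteLLM case, the cross-repo roll-up mentioned above is a short `GROUP BY`. A minimal sketch using an in-memory SQLite database with sample rows; the `spend_log` table and its columns are hypothetical, not LiteLLM’s actual spend-log schema, so map them onto whatever your proxy writes to Postgres or the warehouse:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spend_log (user_id TEXT, repo TEXT, path_prefix TEXT, spend_usd REAL)")

# Sample rows for illustration only.
conn.executemany(
    "INSERT INTO spend_log VALUES (?, ?, ?, ?)",
    [
        ("alice@example.com", "acme/monorepo", "/services/payments/", 4.0),
        ("alice@example.com", "acme/infra", "/terraform/", 1.0),
        ("bob@example.com", "acme/monorepo", "/web/marketing/", 2.5),
    ],
)

# Per-developer total across every repo, plus how many distinct repos each developer touched.
ROLLUP_SQL = """
SELECT user_id,
       COUNT(DISTINCT repo) AS repos,
       SUM(spend_usd)       AS total_usd
FROM spend_log
GROUP BY user_id
ORDER BY total_usd DESC
"""

rows = conn.execute(ROLLUP_SQL).fetchall()
for user_id, repos, total in rows:
    print(f"{user_id}: ${total:.2f} across {repos} repo(s)")
```

Swapping `GROUP BY user_id` for `GROUP BY path_prefix` gives the monorepo segmentation view from the same table.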

Where it falls short:

No optimizer.
UI is functional, not polished. Slicing by developer or repo without SQL means using the LiteLLM admin UI, which is improving but not yet at the level of Portkey or Future AGI.
Observability is thinner than the hosted options. Plan to wire Future AGI traceAI or another OTel sink behind LiteLLM for depth.
Augment-specific integration is hand-rolled. No pre-built template.
Enterprise tier required for SSO. OSS teams needing SSO + audit on day one can’t run on the open-source release.

Pricing: Open source under MIT. Enterprise tier with SLA + SSO + audit starts around $250/month for small teams; production-scale Enterprise is custom.

Score: 5/7 axes (missing: feedback loop, native polished dashboard, automatic query-shape detection).

Capability matrix

Axis	Future AGI	Portkey	Kong AI Gateway	TrueFoundry	LiteLLM
Large-context query observability	Native histogram by context size + repo	Per-call + aggregate	Plugin + Grafana	Native, MLOps-shaped	Logs + your sink
Per-developer multi-repo attribution	Native span attributes + roll-up	Virtual key + headers	Consumer + tags	Virtual key + metadata	team/user/metadata
Completion-suggestion audit trail	Immutable 13-month log + accepted-state webhook	13-month log (Enterprise)	Plugin + your SIEM	Your retention policy	Postgres / warehouse
BYO model routing for compliance	Tag-based, first-class	Provider configs	AI Proxy plugin	First-class with internal serving	100+ provider endpoints incl. local vLLM, Ollama, and OpenAI-compatible internal models, first-class
Enterprise SSO/RBAC	Okta/Entra/Google + SCIM + auditor role	Okta/Entra + SCIM	Konnect + OIDC plugin	Okta/Entra/Google + SCIM	Enterprise tier required
Monorepo cost segmentation	Path-prefix matching, native	Tag-based, client-side	Tags + Grafana	Tag-based, client-side	Metadata + SQL
Model selection by query type	Auto-detect (autocomplete/chat/agent)	Rule-based	Rule-based	Rule-based	Rule-based
Feedback loop / optimizer	Yes (`fi.opt`)	No	No	No	No

Decision framework: Choose X if

Choose Future AGI if you want the gateway to do more than monitor, if traces should drive prompt and route optimization over time, and if the auditor needs an immutable accepted/rejected log. Pick this when Augment is becoming a significant line item ($30K+/month) across multiple monorepos.

Choose Portkey if you want a hosted gateway with mature RBAC, virtual keys, and a polished prompt library, and the BYO-model story is satisfied by provider configurations rather than tight integration with an in-VPC serving platform.

Choose Kong AI Gateway if you already operate Kong for REST APIs and extending the existing stack is the path of least resistance.

Choose TrueFoundry if you run internal models on your own infrastructure and want one vendor across model serving and gateway. Pick when BYO-model is the primary path, not the exception.

Choose LiteLLM if security or compliance requires Augment traffic to never leave the VPC and the team is comfortable assembling observability and UI themselves.

Common mistakes when wiring Augment Code through a gateway

Mistake	What goes wrong	Fix
Pointing only the IDE plugin at the gateway	The terminal CLI usage still hits the model provider directly; the audit trail is incomplete	Configure the BYO-LLM endpoint at the workspace level in Augment Enterprise, not the IDE level
Sharing one team key across all developers	The audit log says “team key” on every row; SOC 2 auditors will flag the lack of attribution	Issue virtual keys per developer (Future AGI / Portkey / LiteLLM / TrueFoundry all support this)
Tagging by developer only, not by repo and squad	Finance asks “which platform team is causing this” and the dashboard cannot answer	Set `repo`, `squad`, and `path_prefix` metadata on every call — wire it once in the gateway forwarding rule
Treating BYO-model routing as binary	All-internal kills suggestion quality; all-SaaS leaks regulated code	Route by `repo.compliance` tag — regulated repos to in-VPC model, everything else to SaaS
Buffering streaming responses	Augment’s interactive chat freezes mid-turn, and developers blame the IDE when the gateway is actually at fault	Confirm the gateway forwards SSE without buffer-and-batch; test specifically on a 150K-token query
Capping context size at the gateway	Augment’s context engine starts truncating, suggestion quality drops, developers turn it off	Do not cap context. Cap cost via budget alerts; let context flow
Setting budget caps below the 95th-percentile context query cost	The gateway pauses Augment mid-conversation on a single large query, breaking flow	Set soft alerts at 80% of monthly budget; hard cap at 130%; never below the 99th-percentile single-query cost
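
The budget rule in the last row is simple arithmetic, sketched below under stated assumptions: `per_query_costs` is a sample of recent single-query costs pulled from the gateway, the percentile uses the nearest-rank method, and `per_request_cap_usd` is whatever per-request ceiling your gateway supports (a hypothetical parameter here).

```python
import math


def p99_nearest_rank(costs: list[float]) -> float:
    """Nearest-rank 99th percentile: smallest sample with at least 99% of samples at or below it."""
    ordered = sorted(costs)
    rank = math.ceil(0.99 * len(ordered))
    return ordered[rank - 1]


def budget_policy(monthly_budget_usd: float, per_query_costs: list[float], per_request_cap_usd: float) -> dict[str, float]:
    p99 = p99_nearest_rank(per_query_costs)
    return {
        "soft_alert_usd": 0.80 * monthly_budget_usd,  # alert, do not block
        "hard_cap_usd": 1.30 * monthly_budget_usd,
        # A per-request cap below the p99 single-query cost would cut off legitimate large-context turns.
        "per_request_cap_usd": max(per_request_cap_usd, p99),
    }
```

For a $50,000 monthly budget this yields a soft alert at $40,000 and a hard cap at $65,000, while any per-request ceiling is raised to at least the observed p99 query cost.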

How Future AGI closes the loop on Augment Code spend

The other four gateways treat observability as an end state: capture, dashboard, alert. Future AGI treats it as input to a feedback loop. Six stages, tuned for Augment:

Trace. Every turn produces a span tree via traceAI (Apache 2.0). Spans capture inputs, outputs, tool calls, model, context size, repo URL, path prefix, squad, SSO claim, and accepted/rejected outcome from the Augment webhook.
Evaluate. fi.evals (Apache 2.0) scores every turn against code-correctness, suggestion-quality, and tool-use accuracy. Because scores live alongside cost data, you can ask “which monorepo pays most per high-quality suggestion” instead of only “which monorepo pays most.”
Cluster. Low-scoring sessions cluster by failure mode. Common Augment patterns: “context engine packed dependency files that didn’t influence the answer” and “Opus called when Sonnet would have produced the same accepted suggestion.”
Optimize. fi.opt.optimizers (Apache 2.0) rewrites the system prompt or adjusts routing. It ships six optimizers (RandomSearchOptimizer; BayesianSearchOptimizer, which is Optuna-backed with teacher-inferred few-shot templates and resumable studies; MetaPromptOptimizer; ProTeGi; GEPAOptimizer; PromptWizardOptimizer), all sharing an EarlyStoppingConfig (patience, min_delta, threshold, max_evaluations) and the same unified Evaluator over 60+ FAGI rubrics. For Augment, the typical optimization is a three-tier routing rule: autocomplete and short chat to claude-haiku-4-5, mid-tier chat to claude-sonnet-4-6, and agent runs with more than 50K tokens of context to claude-opus-4-7.
Route. The gateway applies the updated policy on the next request. Protect runs in line at roughly 65 ms text latency (per arXiv 2510.13351), so PII and proprietary-code redaction happens before traffic leaves your perimeter.
Re-deploy. New prompt and route are versioned. Quality regression triggers automatic rollback. Any production routing rule traces back to the optimizer run that produced it and the eval dataset that justified it.

Net effect: a 120-developer team starting at $52,000/month on Augment model spend typically sees costs trend down 20-35% within four to six weeks without changing developer behavior. Average context size doesn’t go down; that’s the product. Cost per accepted suggestion does.
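
Cost per accepted suggestion is the metric that moves. A minimal sketch of computing it per repo from trace records, assuming each record carries a `repo`, a `cost_usd`, and the accepted/rejected flag from the Augment webhook (field names hypothetical):

```python
from collections import defaultdict


def cost_per_accepted(turns: list[dict]) -> dict[str, float]:
    """Spend divided by accepted suggestions, per repo; repos with zero acceptances map to inf."""
    spend: dict[str, float] = defaultdict(float)
    accepted: dict[str, int] = defaultdict(int)
    for turn in turns:
        spend[turn["repo"]] += turn["cost_usd"]
        accepted[turn["repo"]] += 1 if turn["accepted"] else 0
    return {
        repo: (spend[repo] / accepted[repo]) if accepted[repo] else float("inf")
        for repo in spend
    }
```

Two repos with identical total spend can differ sharply on this metric, which is why it, rather than raw spend, is the number to track. Mapping zero-acceptance repos to infinity instead of raising keeps them at the top of any descending sort.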

The three building blocks are Apache 2.0:

traceAI, github.com/future-agi/traceAI
ai-evaluation, github.com/future-agi/ai-evaluation
agent-opt, github.com/future-agi/agent-opt

The hosted Agent Command Center adds the failure-cluster view; live Protect guardrails (the Future AGI Protect model family: Gemma 3n fine-tuned adapters across Content Moderation, Bias Detection, Security, and Data Privacy Compliance, covering text, image, and audio); RBAC with SCIM; SOC 2 Type II certification; and an AWS Marketplace listing for procurement. BYOC runs the same control plane inside your VPC for teams that can’t send code to a hosted gateway.

What we did not include

We deliberately left out three gateways that show up in other 2026 Augment listicles:

Helicone. Strong drop-in observability for smaller per-request workloads, but the dashboard slicing degraded on Augment’s 100K-token-class requests in our May 2026 testing.
Cloudflare AI Gateway. Strong primitives, but the worker-based metadata model doesn’t yet support Augment’s multi-repo attribution pattern without custom code.
OpenRouter. Great for model exploration, but consumer-facing routing makes the enterprise chargeback story difficult.

All three are worth a second look later in 2026.

Sources

Augment Code documentation, augmentcode.com/docs
Future AGI Agent Command Center, futureagi.com/platform/monitor/command-center
Portkey AI gateway, portkey.ai
Kong AI Gateway, konghq.com/products/kong-ai-gateway
TrueFoundry LLM Gateway, truefoundry.com/llm-gateway
LiteLLM proxy, github.com/BerriAI/litellm
Future AGI Protect latency benchmarks, arxiv.org/abs/2510.13351 (65 ms text, 107 ms image)
Future AGI traceAI, github.com/future-agi/traceAI (Apache 2.0)
Future AGI ai-evaluation, github.com/future-agi/ai-evaluation (Apache 2.0)
Future AGI agent-opt, github.com/future-agi/agent-opt (Apache 2.0)

Frequently asked questions

What is the cheapest way to monitor Augment Code token usage?

LiteLLM's OSS release gives per-request cost and metadata pass-through at zero license cost. Per-developer SSO and an audit trail require LiteLLM Enterprise or the free tier of a hosted gateway. Future AGI's free tier covers 100K traces/month, enough for a pilot.

Does Augment Code support OpenAI-compatible endpoints?

Augment's enterprise BYO-LLM configuration speaks both Anthropic and OpenAI-compatible APIs. All five gateways in this post support both protocols.

Can I route Augment Code through multiple model providers?

Yes. The safe pattern is to route by query type and repo-compliance tag: autocomplete to a fast small model, chat to mid-tier, agent runs to a frontier model. Regulated repos go to your in-VPC model regardless. Only Future AGI auto-detects the query type without you writing the heuristic.

How do I track Augment Code cost per developer across multiple monorepos?

Use a gateway with virtual keys plus repo-attribution metadata. Federate keys through SSO. Set `repo` and `path_prefix` on every call. The gateway rolls up by developer regardless of which monorepo the call came from.

What happens to Augment's tool calls when running through a gateway?

All five gateways pass tool calls through intact as of May 2026. The risk is streaming: a gateway that buffers SSE will freeze Augment's interactive chat. Test against a 150K-token agent run with multiple tool calls before company-wide rollout.

Is it safe to send proprietary source code through an AI gateway?

With a hosted gateway, code flows from Augment to the gateway vendor and then to the model provider, so both see it. If compliance forbids sending code to either, the only safe pick is a self-hosted gateway (LiteLLM in-VPC, TrueFoundry self-hosted, or Future AGI BYOC). For partially regulated workflows, route regulated repos to an in-VPC model via the gateway and let the rest use SaaS. Future AGI's Protect layer (~65 ms text latency per arXiv 2510.13351) redacts PII and proprietary tokens before traffic leaves your perimeter.

How is Future AGI Agent Command Center different from Portkey for Augment Code?

Portkey is a hosted observation and routing layer with a polished prompt library. Future AGI adds an optimization layer: trace data feeds back into prompt rewrites and routing-policy updates, so the gateway gets better over time. Augment-specific advantages on the FAGI side: auto-detection of query type, native monorepo segmentation via path-prefix matching, and the accepted/rejected webhook integration.

Does Augment's Context Engine work the same way through a gateway?

Yes. The Context Engine runs on the Augment side and produces the prompt. The gateway forwards the assembled prompt without interfering with how Augment indexes or selects context. What the gateway adds is visibility: how big each prompt was, which repo it came from, and how the suggestion was received.
