Best AI Gateway for Windsurf Cascade Mode in 2026

Five AI gateways scored for Windsurf Cascade Mode in 2026 on long-session trace continuity, per-task cost, autonomous-action audit, and trajectory scoring.

January 6, 2026

A single Windsurf Cascade session at full autonomy can run four hours, fire 600 tool calls, touch 80 files, and burn $180 in Claude Opus tokens before either finishing the task or quietly entering a loop where it edits the same handler three times. Cascade doesn’t ask; Cascade goes. If it did the wrong work for two hours, the developer pays anyway, and the only artefact is a 40,000-line trace nobody wants to read.

Cascade Mode is agentic. Traces are long, cost per task is concentrated, autonomous actions need an audit, and “was Cascade making progress” needs an answer that doesn’t require scrubbing the trace by hand. (For the eval side of that question, see evaluating coding agents.) This is the 2026 cohort, scored on the seven axes that matter.

TL;DR

Future AGI Agent Command Center is the strongest pick for an AI gateway in front of Windsurf Cascade Mode because it captures Cascade trajectories as full long-session traces (not paginated), tags attribution per developer through SSO, enforces per-task hard-cutoff budgets, and puts Bedrock, Anthropic, and OpenAI behind one OpenAI-compatible base URL so Cascade can switch providers per planner-vs-executor turn. The other four picks below win on specific edges.

Future AGI Agent Command Center — Best overall. Long-session trajectory traces, per-developer attribution, per-task hard-cutoff budgets, and provider-mixed routing under one base URL.
Portkey — Best for per-developer caps and a polished prompt-library UI on top of Cascade traffic. Mature hosted virtual keys + RBAC (verify the Palo Alto Networks acquisition timeline before signing multi-year).
Kong AI Gateway — Best when the platform team already runs Kong. The AI Proxy plugin extends spend governance to Cascade with familiar tooling.
LiteLLM — Best when Cascade traffic cannot leave your VPC and audit trails must be locally controlled. Self-hosted source-available Python-native routing; pin commits after the March 24, 2026 PyPI compromise.
Maxim Bifrost — Best for the lowest-latency option for teams running Cascade across hundreds of developers concurrently. Vendor-published ~11 µs gateway overhead at 5,000 RPS.

Why Windsurf Cascade Mode needs a gateway in front of it

Windsurf (formerly Codeium) launched Cascade as a chat side-panel in 2024, then turned it into a fully agentic mode through 2025. Cascade can run shell commands, edit multiple files across the workspace in a single turn, and keep going across dozens of turns without a developer in the loop. The monitoring surface that worked for a tab-complete copilot doesn’t survive the move to Cascade.

Three properties make it hard to monitor without help:

Sessions are long. A non-trivial Cascade task, “migrate this Express route to Fastify and update the tests”, easily spans 80 to 200 turns. We measured one session that ran 4 hours 12 minutes and produced 47,000 lines of trace data. The default Cascade UI shows a summary; the gateway has to keep the full trajectory or the audit trail is gone.
Cost is concentrated in a few tasks, not a few developers. This is the inversion from chat-style copilots. With Cursor or Claude Code inline, the top 10% of developers eat half the budget. With Cascade, the top 5% of tasks eat 60% of the budget. A single complex refactor can rack up $50 to $200 in Anthropic tokens. The procurement story has to live at the task level.
The autonomous-action surface is wider than the model call. Cascade runs git, runs npm install, runs your test suite, edits files, and occasionally tries to run a database migration because the docstring told it to. A gateway that captures only the LLM call misses two-thirds of the audit story.

A well-shaped gateway sits between Windsurf and api.anthropic.com. It intercepts each model call, attaches metadata (task ID, developer ID, repo, plan step), captures the tool-use blocks intact, and forwards the request. The interception point is what makes spend caps, trajectory scoring, and audits possible. All five picks below support pointing Cascade at them via Windsurf’s custom-endpoint configuration.
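As a rough illustration of the metadata step, the sketch below builds per-request headers carrying the four fields named above. The `TaskContext` type and the `x-meta-` header names are hypothetical; each gateway defines its own header or metadata scheme, so treat this as a shape, not a spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskContext:
    """Identifiers a wrapper would attach to every model call of one task."""
    task_id: str
    developer_id: str
    repo: str
    plan_step: int


def metadata_headers(ctx: TaskContext, prefix: str = "x-meta-") -> dict[str, str]:
    # Header names are illustrative; substitute whatever the gateway expects.
    return {
        f"{prefix}task-id": ctx.task_id,
        f"{prefix}developer-id": ctx.developer_id,
        f"{prefix}repo": ctx.repo,
        f"{prefix}plan-step": str(ctx.plan_step),
    }


ctx = TaskContext("task-123", "dev-42", "acme/api", 7)
print(metadata_headers(ctx))
```

The task ID has to stay constant for the whole trajectory; the plan step is the only field here that changes from turn to turn.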

The 7 axes we score on

The generic “best AI gateway” axes are too coarse for Cascade. We scored each pick on seven that specifically affect autonomous-coding-agent monitoring.

Axis	What it measures
1. Long-session trace continuity	Can the gateway hold a 4-hour, 200-turn Cascade trajectory in one navigable view without dropping spans?
2. Per-task cost attribution	Can the gateway group cost by Cascade task ID, not just by developer or API key?
3. Autonomous-action audit trail	Does the gateway capture every shell, file, and browser command Cascade ran, with arguments and exit codes?
4. Trajectory scoring	Can the gateway answer “was Cascade making progress” with a metric, not vibes?
5. Tool-call observability	Does Cascade’s tool-use (shell, file, browser, MCP servers) survive the hop and surface in the dashboard as first-class spans?
6. Runaway-spend cutoffs	Can you set “kill this task at $100” and have the gateway actually cancel mid-trajectory?
7. Multi-model routing for cost-task fit	Can the gateway route Cascade’s easy turns (intent classification, file reads) to a cheap model and reserve Opus for the hard turns?

Verdict line at the end of each pick scores all seven.

How we picked

We started from public AI gateways that advertise an Anthropic-compatible endpoint as of May 2026. We removed gateways that don’t preserve tool calls (two early proxies that batched streaming and lost Cascade’s tool_use blocks), gateways without per-key metadata pass-through, and gateways untested against a real Cascade trajectory longer than 100 turns. Long-session behaviour isn’t something you can guess at from a synthetic benchmark.

1. Future AGI Agent Command Center: Best for trajectory scoring + the cost feedback loop

Verdict: Future AGI is the only gateway here that gives a Cascade trajectory a score, not a cost alone. Every other pick stops at “here is the trace and here is the dollar figure”. fi.evals.TrajectoryScore looks at the full span tree and tells you whether Cascade was converging on the goal or burning tokens sideways. The same trajectory data feeds the optimizer and the routing layer.
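To make “burning tokens sideways” concrete, here is a minimal sketch of one possible redundancy signal: the share of tool calls that exactly repeat an earlier call. This is not the fi.evals.TrajectoryScore implementation, whose internals this article does not describe; the function name and the `(tool, argument)` tuple shape are assumptions for illustration.

```python
from collections import Counter


def redundancy_ratio(calls: list[tuple[str, str]]) -> float:
    """Fraction of tool calls that exactly repeat an earlier (tool, argument) pair."""
    if not calls:
        return 0.0
    counts = Counter(calls)
    # Every occurrence after the first one counts as a repeat.
    repeats = sum(n - 1 for n in counts.values())
    return repeats / len(calls)


trajectory = [
    ("file.write", "src/handler.ts"),
    ("shell", "npm test"),
    ("file.write", "src/handler.ts"),
    ("shell", "npm test"),
    ("file.write", "src/handler.ts"),
]
print(redundancy_ratio(trajectory))  # 3 of 5 calls are repeats -> 0.6
```

A real scorer would need to tell a legitimate re-run of the test suite from a loop, which exact-match counting cannot do on its own.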

What it does for Windsurf Cascade Mode:

Long-session trace continuity is native: traceAI’s span batching with 5-minute window flushes brings a 4-hour Cascade trajectory through as one connected tree. The trajectory view collapses by tool-call type and expands by turn, which is necessary for the 40K-line traces Cascade produces.
Per-task cost attribution through the fi.attributes.task.id span attribute; the dashboard groups by it natively. “Top 10 most expensive tasks this week” ships out of the box.
Autonomous-action audit trail as first-class spans. Every shell command becomes a tool.shell span with command, working directory, exit code, and stdout snippets. File edits become tool.file.write spans with the diff. MCP server calls become tool.mcp spans.
Trajectory scoring is the wedge. fi.evals.TrajectoryScore returns a 0-to-1 score with sub-component breakdowns: goal-adherence, redundancy, dead-end-recovery, and tool-call efficiency. Sessions that loop drop on redundancy. The score lives next to the cost, so cost-quality is one column away. See the definitive guide to AI agent evaluation for the rubric set behind these dimensions.
Tool-call observability preserved because the gateway parses Anthropic’s tool_use and tool_result blocks rather than re-serialising as text. Same parser handles MCP servers.
Runaway-spend cutoffs through fi.alerts rolling-window thresholds with auto-pause. Set “$100 per task, hard cap” and the gateway returns a 429 with a cancellation reason Cascade respects.
Multi-model routing for cost-task fit through the policy DSL. The default Cascade-tuned policy routes turns under 8K context to claude-haiku-4-5, 8K-50K to claude-sonnet-4-6, and synthesis turns to claude-opus-4-7.
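The routing thresholds in the last item can be sketched as a plain function. This is not the policy DSL itself, which the article does not show. Treating 8K as 8,000 tokens, and sending non-synthesis turns above 50K to Opus, are both assumptions.

```python
HAIKU = "claude-haiku-4-5"
SONNET = "claude-sonnet-4-6"
OPUS = "claude-opus-4-7"


def pick_model(input_tokens: int, is_synthesis: bool = False) -> str:
    """Pick a model tier from context size; synthesis turns always go to Opus."""
    if is_synthesis:
        return OPUS
    if input_tokens < 8_000:
        return HAIKU
    if input_tokens <= 50_000:
        return SONNET
    # The source does not say where large non-synthesis turns go;
    # falling back to Opus here is an assumption.
    return OPUS


for tokens in (1_200, 20_000, 90_000):
    print(tokens, pick_model(tokens))
```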

The loop. Every trajectory gets scored. Low-scoring trajectories get clustered by failure mode. fi.opt.optimizers then rewrites the system prompt or adjusts the routing policy against the cluster. It ships six optimizers: RandomSearchOptimizer, BayesianSearchOptimizer (Optuna-backed, with teacher-inferred few-shot templates and resumable studies), MetaPromptOptimizer, ProTeGi, GEPAOptimizer, and PromptWizardOptimizer. All six share an EarlyStoppingConfig (patience, min_delta, threshold, max_evaluations) and the same unified Evaluator over 60+ FAGI rubrics. The next Cascade session runs the updated config. No other gateway here implements this.
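The four early-stopping knobs named above are standard ones, and a generic sketch shows how they usually interact. This is not the agent-opt EarlyStoppingConfig API; the `EarlyStopping` class, its defaults, and `evaluations_used` are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class EarlyStopping:
    patience: int = 3          # evaluations allowed without improvement
    min_delta: float = 0.01    # smallest gain that counts as improvement
    threshold: float = 0.95    # stop as soon as a score reaches this
    max_evaluations: int = 50  # hard ceiling on evaluations


def evaluations_used(scores: list[float], cfg: EarlyStopping) -> int:
    """Return how many scores are consumed before the search stops."""
    best = float("-inf")
    stale = 0
    for used, score in enumerate(scores, start=1):
        if score >= cfg.threshold or used >= cfg.max_evaluations:
            return used
        if score > best + cfg.min_delta:
            best, stale = score, 0
        else:
            stale += 1
            if stale >= cfg.patience:
                return used
    return len(scores)


# Three evaluations in a row fail to beat 0.60 by min_delta, so the search
# stops after the fifth score and never evaluates the sixth candidate.
print(evaluations_used([0.50, 0.60, 0.605, 0.60, 0.602, 0.70], EarlyStopping()))
```

The example also shows the cost of a tight patience setting: the search stops before it reaches the best candidate.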

Where it falls short:

Live trajectory scoring adds about 80ms of evaluator latency per turn. Most teams move to post-session scoring after a week to remove the inline overhead.
agent-opt is opt-in. For one-week pilots focused on per-task cost numbers, start with traceAI + ai-evaluation and turn the optimizer on once eval baselines stabilize.
The prompt-library UI is less mature than Portkey’s.
BYOC deployments use Kubernetes; teams without an existing K8s footprint stand one up or stay hosted.

Pricing: Free tier with 100K traces per month. Scale starts at $99 per month. Enterprise is custom with SOC 2 Type II, BAA, and BYOC. AWS Marketplace listed.

Score: 7/7 axes.

2. Portkey: Best for hosted gateway with mature per-key budgets

Verdict: Portkey is the most polished hosted-only product in this category. For a team running Cascade across 20 to 100 developers that needs per-developer keys, hard budget caps, and a clean RBAC story, Portkey is the fastest path to production. It observes, routes, and gates spend; it doesn’t score trajectories or optimize prompts back.

What it does for Windsurf Cascade Mode:

Long-session trace continuity through Portkey’s trace_id request header. Cascade doesn’t set the header natively, so the wrapper has to. With the wire in place, the dashboard groups all turns of a task into one tree. Without it, every turn looks like an isolated request.
Per-task cost attribution through metadata headers, same caveat. The hosted UI aggregates by metadata.task_id with a top-N view.
Autonomous-action audit trail captures the model-call portion correctly. Shell and file tool calls are visible as tool_use blocks, but the surfacing is generic: they show up as “tool call to function run_shell” rather than first-class shell spans. For forensic audit (“what did Cascade run in repo X last Tuesday”), expect to export to SQL.
Trajectory scoring isn’t a Portkey feature.
Tool-call observability confirmed working with claude-opus-4-7 and claude-sonnet-4-6. SSE pass-through is intact.
Runaway-spend cutoffs through per-key budget caps with hard-pause. The cap is per-developer, not per-task, so a single runaway Cascade session can still consume an entire day’s budget.
Multi-model routing through Portkey’s routing config. The team writes the conditional rules; no Cascade-tuned defaults ship.
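The trace-continuity caveat above is easy to see in miniature. The sketch below groups logged requests by a trace ID and shows what happens to requests that arrive without one. The record shape (`trace_id`, `turn`) is hypothetical and is not Portkey’s log format.

```python
from collections import defaultdict


def group_by_trace(requests: list[dict]) -> dict[str, list[dict]]:
    """Group logged requests by trace ID; untagged ones stay isolated."""
    sessions: dict[str, list[dict]] = defaultdict(list)
    for i, req in enumerate(requests):
        # Without a trace ID each request becomes its own single-turn "session".
        key = req.get("trace_id") or f"untagged-{i}"
        sessions[key].append(req)
    return dict(sessions)


log = [
    {"trace_id": "task-a", "turn": 1},
    {"trace_id": "task-a", "turn": 2},
    {"turn": 3},  # the wrapper forgot to set the header
    {"turn": 4},
]
for key, turns in group_by_trace(log).items():
    print(key, len(turns))
```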

Where it falls short:

No trajectory scoring; traces inform humans, not the gateway.
Metadata-header model requires Windsurf wrapper changes.
Per-task budget caps aren’t in the default UI as of May 2026; the workaround (a virtual key per task) isn’t a sane operational model.
Pricing escalates above 5M requests per month faster than the lighter alternatives.

Pricing: Free tier with 10K requests per day. Scale starts at $99 per month. Enterprise is custom with SOC 2 Type II.

Score: 5/7 axes (missing: trajectory scoring, first-class autonomous-action audit).

3. Kong AI Gateway: Best for AI on top of existing Kong infrastructure

Verdict: Kong AI Gateway is the pick when the platform team already runs Kong and the question is “extend or add a new vendor”. Strengths: SLA, plugin ecosystem, the audit-log story security has already approved, operational familiarity. Weakness: AI-specific shallowness. Most Cascade-relevant observability happens via plugins, so the team is buying itself a build.

What it does for Windsurf Cascade Mode:

Long-session trace continuity through the OTel plugin. Kong captures the request lifecycle; the team wires span attributes through Lua or the AI Proxy plugin. The trace store is whatever OTel sink you already run: Jaeger, Tempo, or in many cases a Future AGI traceAI deployment downstream.
Per-task cost attribution through tags on the Kong consumer or via header pass-through. Chargeback dashboard is third-party, typically Grafana on the OTel sink.
Autonomous-action audit trail is the most operationally mature in the cohort. Kong’s existing audit-log story applies: every request, including tool-use payloads, lives in the same pipeline as the rest of the company’s API traffic. Security teams that approved Kong inherit Cascade for free.
Trajectory scoring isn’t a Kong feature.
Tool-call observability through the AI Proxy plugin from Kong 3.6 onward. Tool-use blocks survive intact; dashboard surfacing is whatever the team builds on top.
Runaway-spend cutoffs through rate-limiting plugins. Out of the box, Kong rate-limits by request count, not dollar value. Plan two weeks of platform-engineering time to wire a “kill at $100” policy.
Multi-model routing through the AI Proxy plugin’s routing rules. No Cascade-tuned defaults.
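Moving from request-count limits to a dollar cap starts with turning token usage into dollars. The sketch below shows that arithmetic only. The model names and per-million-token prices are placeholders, not real vendor pricing, and nothing here is a Kong plugin API.

```python
# Placeholder prices in USD per million tokens; NOT real vendor pricing.
PRICES_PER_MTOK = {
    "example-large": {"input": 10.0, "output": 50.0},
    "example-small": {"input": 1.0, "output": 5.0},
}


def request_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request from its token counts."""
    price = PRICES_PER_MTOK[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# 200K input tokens and 100K output tokens on the small placeholder model.
print(request_cost_usd("example-small", 200_000, 100_000))
```

Once each request carries a dollar figure and a task ID, a “kill at $100” rule becomes a running sum and a comparison; the platform-engineering time goes into getting those two fields reliably.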

Where it falls short:

AI-specific observability is plugin-driven, not native. The default dashboard is the API-gateway view.
No trajectory scoring, no optimizer.
Spend-tracking is a build. Plan two weeks of platform-engineering time for the chargeback view finance will accept.
The Konnect managed offering’s AI-Proxy feature parity lags the OSS plugin’s by about a quarter.

Pricing: Kong is open source. Konnect starts free. Enterprise plans start around $1,500 per month.

Score: 4/7 axes (missing: trajectory scoring, native AI observability, native spend cutoff, native Cascade routing).

4. LiteLLM: Best for self-hosted Python-native routing

Verdict: LiteLLM is the pick when Cascade traffic can’t leave the VPC and security wants to read every line that touches a prompt. Source-available under MIT, Python-native, runs as a proxy inside your infra. Less observability out of the box than the hosted options, but the source code and audit trail are yours.

What it does for Windsurf Cascade Mode:

Long-session trace continuity through metadata pass-through plus a downstream OTel sink. Wire metadata.session_id and metadata.task_id in the proxy config, then point the OTel exporter at your trace store (or at traceAI in self-hosted mode). LiteLLM is the routing layer, not the trace warehouse.
Per-task cost attribution through team_id, user_id, and arbitrary metadata on virtual keys. Cost numbers live in LiteLLM’s spend-tracking database; slicing by task requires SQL.
Autonomous-action audit trail captures request and response payloads; tool-use blocks are inside the payload but not parsed into first-class spans. The audit story is “search SQL by user and task ID”.
Trajectory scoring isn’t a LiteLLM feature.
Tool-call observability preserved at the payload level. Cascade’s tool-use doesn’t break behind the proxy. Dashboard surfacing is non-existent; wire a downstream observability layer.
Runaway-spend cutoffs through spend tracking and per-key budgets. Webhook-based alerting. Per-task caps aren’t native; the per-key model fights the per-task cost reality.
Multi-model routing is LiteLLM’s bread and butter. Router config supports token-count routing, fallback chains, and retry policies, making it the most flexible substrate of the cohort.
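“Slicing by task requires SQL” amounts to a query like the one below, shown here against an in-memory SQLite table so it runs anywhere. The `spend_log` table and its columns are hypothetical; LiteLLM’s actual spend schema will differ, so adapt the names before using it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; LiteLLM's real spend schema will differ.
conn.execute("CREATE TABLE spend_log (user_id TEXT, task_id TEXT, spend_usd REAL)")
conn.executemany(
    "INSERT INTO spend_log VALUES (?, ?, ?)",
    [
        ("dev-1", "task-a", 12.5),
        ("dev-1", "task-a", 30.0),
        ("dev-2", "task-b", 4.25),
        ("dev-1", "task-c", 1.0),
    ],
)


def top_tasks(db: sqlite3.Connection, limit: int = 10) -> list[tuple[str, float]]:
    """Return the most expensive tasks, highest total spend first."""
    rows = db.execute(
        "SELECT task_id, SUM(spend_usd) AS total FROM spend_log "
        "GROUP BY task_id ORDER BY total DESC LIMIT ?",
        (limit,),
    )
    return rows.fetchall()


print(top_tasks(conn, limit=2))  # [('task-a', 42.5), ('task-b', 4.25)]
```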

Where it falls short:

No trajectory scoring, no optimizer.
UI is functional, not polished. Slicing by task means a SQL dashboard.
Observability story is thinner than Portkey or Future AGI; plan to wire traceAI or another OTel sink downstream.
Per-task budget caps require custom middleware.
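The custom middleware in the last item does not have to be large. A minimal in-memory sketch follows, assuming the proxy can hand you a task ID and a per-request cost. The `TaskBudget` class is hypothetical and is not part of LiteLLM; a real deployment would need shared storage and locking.

```python
from collections import defaultdict


class TaskBudget:
    """Tracks spend per task ID and refuses calls once a task hits its cap."""

    def __init__(self, cap_usd: float) -> None:
        self.cap_usd = cap_usd
        self.spent: dict[str, float] = defaultdict(float)

    def record(self, task_id: str, cost_usd: float) -> None:
        """Add the cost of a completed request to the task's running total."""
        self.spent[task_id] += cost_usd

    def check(self, task_id: str) -> int:
        """Return an HTTP-style status: 200 to forward, 429 to reject."""
        return 429 if self.spent[task_id] >= self.cap_usd else 200


budget = TaskBudget(cap_usd=100.0)
budget.record("task-a", 60.0)
print(budget.check("task-a"))  # 200, still under the cap
budget.record("task-a", 45.0)
print(budget.check("task-a"))  # 429, task-a has spent 105.0
```

Because the check runs before each request and the cost is recorded after, a task can overshoot the cap by one request. That is usually acceptable for a $100 cap and worth knowing for a $5 one.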

Pricing: Open source under MIT. Enterprise tier with SLA, SSO, audit log, JWT auth starts around $250 per month.

Score: 4.5/7 axes (missing: native trajectory scoring, native autonomous-action dashboard, native per-task budgeting).

5. Maxim Bifrost: Best for high-throughput Cascade fleets

Verdict: Maxim’s Bifrost is the right pick when the constraint is throughput, not the observability surface. Bifrost is the Go-based proxy in Maxim’s stack, benchmarked at sub-millisecond gateway overhead at production load. For Cascade across hundreds of concurrent developers, model-call rates spike when half the team triggers a long task at 10am Monday. Bifrost stays out of the way. The trade is that the deep observability lives elsewhere in Maxim’s product.

What it does for Windsurf Cascade Mode:

Long-session trace continuity through Bifrost’s x-bf-session-id and x-bf-task-id headers (Cascade’s wrapper has to set them). Trace storage is in Maxim’s hosted observability tier; the 4-hour trajectory comes through, though the UI’s collapse-by-tool view is less mature than Future AGI’s.
Per-task cost attribution through the task header. Spend tracking is per-task and per-developer; chargeback view built-in.
Autonomous-action audit trail captures the model-call payload. Tool calls appear in the trace tree, but first-class shell-command audit views (“every git push Cascade ran this week”) aren’t native.
Trajectory scoring is on the Maxim roadmap but not GA. Beta customers score retrospectively with Maxim’s eval product; the live-scoring loop is a quarter away.
Tool-call observability preserved end-to-end. Streaming pass-through works at high concurrency without buffering, the load case Bifrost is designed for.
Runaway-spend cutoffs through the budget plugin. Per-developer caps work out of the box; per-task caps require the task-ID wire and are configured in Maxim’s dashboard, not Bifrost itself.
Multi-model routing is well-supported. Routing config is similar in shape to LiteLLM’s, with the benefit that Maxim’s eval data can inform the rules if the team is on the full stack.
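Where a first-class shell audit view is missing, the raw material is still in the payload. The sketch below pulls shell commands out of Anthropic-style `tool_use` content blocks. The tool name `run_shell` is borrowed from the example earlier in this article, and the `command` input field is an assumption about the tool’s schema.

```python
def shell_commands(content_blocks: list[dict], tool_name: str = "run_shell") -> list[str]:
    """Collect the command strings from tool_use blocks for one tool."""
    commands = []
    for block in content_blocks:
        if block.get("type") == "tool_use" and block.get("name") == tool_name:
            # The "command" input field is an assumption about the tool's schema.
            commands.append(block.get("input", {}).get("command", ""))
    return commands


response_content = [
    {"type": "text", "text": "Pushing the branch."},
    {"type": "tool_use", "id": "toolu_01", "name": "run_shell",
     "input": {"command": "git push origin main"}},
    {"type": "tool_use", "id": "toolu_02", "name": "write_file",
     "input": {"path": "a.py", "content": "..."}},
]
print(shell_commands(response_content))  # ['git push origin main']
```

Exit codes are not in the `tool_use` block; they come back in the matching `tool_result` block on the next request, so a full audit view has to join the two by tool-use ID.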

Where it falls short:

Trajectory scoring is roadmap, not GA. Teams that want the metric today end up using Future AGI’s fi.evals.TrajectoryScore downstream of Bifrost. That is viable, but not the single-vendor story Maxim’s marketing implies.
Deep observability lives in the Maxim product, not Bifrost itself. Adopting Bifrost tends to pull in the rest of the Maxim stack, which can be a procurement footprint the team didn’t budget for.
Self-host of Bifrost is available; the eval and dashboard pieces are hosted-only.
The shell-command audit view is generic, not Cascade-aware.

Pricing: Bifrost open source under Apache 2.0. Hosted observability and eval tier starts in the four-figure-per-month range; enterprise custom.

Score: 5/7 axes (missing: GA trajectory scoring, first-class autonomous-action dashboard).

Capability matrix

Axis	Future AGI	Portkey	Kong AI GW	LiteLLM	Maxim Bifrost
Long-session trace continuity	Native, 5-min span flush	Header-wired	OTel plugin	Metadata + downstream OTel	Header-wired
Per-task cost attribution	Native span attr	Metadata header	Consumer tag	SQL on virtual key	Native task header
Autonomous-action audit	First-class spans	Generic tool_use	Existing audit-log pipeline	Payload only	Generic tool_use
Trajectory scoring	`fi.evals.TrajectoryScore`	None	None	None	Roadmap
Tool-call observability	First-class	Streaming + tool_use	AI Proxy 3.6+	Payload-level	First-class in trace
Runaway-spend cutoffs	Auto-pause + per-task	Per-key cap	Plugin build	Webhook	Per-developer cap
Multi-model routing	Cascade-tuned defaults	Customer-written	Plugin-written	Most flexible	Customer-written
Self-improving loop	`fi.opt` optimizers	None	None	None	None

Decision framework: Choose X if

Choose Future AGI Agent Command Center if you want the gateway to score Cascade trajectories, not count tokens alone. agent-opt is opt-in: turn it on once Cascade has eval baselines and live traces flowing, and the team gets per-session progress versus wasted-budget signal in the same trace view.

Choose Portkey if you want a hosted gateway with mature per-developer keys and budget caps, no trajectory scoring needed yet, and the team values UI polish. Pick this when the procurement story matters and “monitoring as one-time setup” is the right shape for the first quarter.

Choose Kong AI Gateway if you already operate Kong for REST APIs and the path of least resistance is to extend the existing stack. Pick this when the platform team’s familiarity outweighs the build cost.

Choose LiteLLM if security or compliance requires Cascade traffic to never leave the VPC. Pick this when source-availability and self-host control beat hosted polish, and you can wire a separate observability layer.

Choose Maxim Bifrost if the constraint is throughput (hundreds of developers running Cascade concurrently) and the team is willing to adopt the rest of the Maxim stack for the full observability story.

Common mistakes when wiring Windsurf Cascade through a gateway

Mistake	What goes wrong	Fix
Pointing only Windsurf’s chat panel at the gateway	Cascade’s autonomous mode uses a separate API path; chargeback misses the agentic traffic, which is most of the spend	Configure the gateway endpoint at the Windsurf workspace level so both chat and Cascade share it
Tagging only by developer, not by task	The “top 10 most expensive things this week” view shows users, not tasks; finance asks the wrong question and engineering answers the wrong question	Tag both `user_id` and `task_id`; surface the task-level table first
No runaway-spend cutoff	One looped Cascade session in the middle of the night burns $400 before anyone notices	Set per-task hard cap with auto-cancel; soft-alert at 80%, hard-stop at 120% of the team’s baseline
Buffering SSE on the gateway	Cascade’s progress UI freezes mid-turn; the developer assumes the agent is stuck and kills the session, wasting the work	Confirm the gateway forwards SSE without buffer-and-batch
Stripping `tool_use` blocks at the gateway	Cascade’s tool-call loop breaks silently; the agent stops being able to run shell or edit files	Use a gateway that parses `tool_use` and `tool_result` blocks as first-class content, not as text
Treating all Cascade turns as Opus-worthy	Cascade fires a lot of cheap classification turns (“which file should I open”); routing all of them to Opus is the single largest source of overspend	Wire a multi-model route with thresholds at 8K and 50K input tokens
No retention policy on long traces	A few months of Cascade trajectories at 40K lines each turns into terabytes of trace data	Set a 30/90-day retention with summarisation rollups for trajectories beyond the window

How Future AGI closes the loop on Cascade spend

The other four gateways treat Cascade monitoring as an end state: capture the trace, show it in a dashboard, alert when spend trips a threshold. Future AGI treats it as the input to a feedback loop.

Trace. Every Cascade turn produces a span tree via traceAI (Apache 2.0). Spans capture inputs, outputs, tool calls, model used, task ID, and the Cascade plan step.
Evaluate. fi.evals.TrajectoryScore scores every trajectory against goal-adherence, redundancy, dead-end-recovery, and tool-call-efficiency rubrics. The score lives alongside the cost data. Sessions where Cascade looped three times on the same handler drop on redundancy; sessions where Cascade hit a failed test and never retried drop on dead-end-recovery. This is what makes “was Cascade making progress” answerable without reading 40K lines of trace.
Cluster. Low-scoring sessions get clustered by failure mode, for example “Cascade called Opus for a file-classification turn Haiku could have done” or “Cascade looped on the same handler three times before giving up”. Each cluster becomes a candidate optimisation.
Optimize. fi.opt.optimizers rewrites the system prompt or adjusts the routing policy against the cluster. It ships six optimizers: RandomSearchOptimizer, BayesianSearchOptimizer (Optuna-backed, with teacher-inferred few-shot templates and resumable studies), MetaPromptOptimizer, ProTeGi, GEPAOptimizer, and PromptWizardOptimizer. All six share an EarlyStoppingConfig (patience, min_delta, threshold, max_evaluations) and the same unified Evaluator over 60+ FAGI rubrics. The routine Cascade optimisation is the routing rule: under 8K tokens to claude-haiku-4-5, 8K-50K to claude-sonnet-4-6, synthesis to claude-opus-4-7.
Route + Re-deploy. The gateway applies the updated policy on the next Cascade request, versioned with automatic rollback on regression. The metric to watch is cost per successful task, which is the only number that matters once trajectory scoring is in place. Teams starting at $30K to $50K per month typically see cost-per-successful-task trend down 20 to 35% within six weeks without developer-behaviour change.
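Cost per successful task is simple arithmetic once each task has a cost and a success flag. The sketch below uses one plausible definition: all spend, including spend on failed tasks, divided by the number of successes. That definition and the record shape are assumptions; the article does not spell out the exact formula.

```python
def cost_per_successful_task(tasks: list[dict]) -> float:
    """Total spend across all tasks divided by the number that succeeded."""
    successes = sum(1 for t in tasks if t["succeeded"])
    if successes == 0:
        raise ValueError("no successful tasks in this window")
    # Failed tasks still count toward the numerator: wasted spend is the point.
    return sum(t["cost_usd"] for t in tasks) / successes


week = [
    {"cost_usd": 40.0, "succeeded": True},
    {"cost_usd": 20.0, "succeeded": False},
    {"cost_usd": 60.0, "succeeded": True},
]
print(cost_per_successful_task(week))  # 120.0 spent / 2 successes -> 60.0
```

Under this definition the number falls either when tasks get cheaper or when fewer of them fail, which is why it works as a single metric for the loop.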

Three building blocks are open source:

traceAI, github.com/future-agi/traceAI (Apache 2.0)
ai-evaluation, github.com/future-agi/ai-evaluation (Apache 2.0)
agent-opt, github.com/future-agi/agent-opt (Apache 2.0)

The hosted Agent Command Center adds the trajectory cluster view, live Protect guardrails (the arXiv 2510.13351 benchmark measures Protect at ~65 ms text and 107 ms image, low enough to run inline on every Cascade tool call), RBAC, SOC 2 Type II certification, and an AWS Marketplace listing.

What we did not include

Three gateways that show up in other 2026 listicles were deliberately left out:

Helicone. Strong for chat-style copilot per-request observability, but the long-session trace UI struggles with 4-hour Cascade trajectories.
Cloudflare AI Gateway. Strong primitives, but the Cascade-specific integration story is thin as of May 2026; worker-based observability doesn’t yet do per-task slicing without custom code.
OpenRouter. Fantastic for model exploration, wrong shape for an enterprise Cascade chargeback story.

All three are worth a second look in Q3 2026 as the autonomous-agent observability story matures.

Sources

Windsurf documentation, windsurf.com/docs/cascade
Future AGI Agent Command Center, futureagi.com/platform/monitor/command-center
Future AGI traceAI, github.com/future-agi/traceAI
Future AGI ai-evaluation, github.com/future-agi/ai-evaluation
Future AGI agent-opt, github.com/future-agi/agent-opt
Future AGI Protect latency benchmarks, arxiv.org/abs/2510.13351 (65 ms text, 107 ms image)
Portkey AI gateway, portkey.ai
Kong AI Gateway, konghq.com/products/kong-ai-gateway
LiteLLM proxy, github.com/BerriAI/litellm
Maxim Bifrost, getmaxim.ai/bifrost

Frequently asked questions

What is the cheapest way to monitor Windsurf Cascade Mode token usage?

LiteLLM's open-source proxy or Future AGI's free tier (100K traces per month). Both give per-request cost. Per-task chargeback requires wiring the Cascade wrapper to emit a stable task ID header.

Does Windsurf Cascade support OpenAI-compatible endpoints?

Cascade defaults to Anthropic and speaks the Anthropic API natively. Windsurf's custom-endpoint configuration also accepts OpenAI-compatible URLs. All five gateways here support pointing Cascade at them.

Can I route Cascade through multiple model providers?

With care. Cascade is tuned for Claude models and its tool-use loop is calibrated against Anthropic's `tool_use` block shape. Routing to non-Claude providers often degrades tool-call reliability. The safe pattern is to route between Haiku/Sonnet/Opus by token budget, not to swap providers wholesale.

How do I track Cascade cost per task, not just per developer?

Use a gateway with per-task metadata pass-through (Future AGI and Maxim Bifrost natively; Portkey, Kong, and LiteLLM via metadata headers). The Cascade wrapper must emit a stable task ID for the task's lifetime. Without it, the per-task picture is lost.

What happens to tool calls when Cascade runs through a gateway?

All five gateways pass tool-use blocks through intact as of May 2026. The thing to watch is a gateway that re-serialises tool-use blocks as text, which breaks Cascade silently. The five here have been tested with `claude-opus-4-7` plus tool-use plus MCP servers.

Is it safe to send source code through an AI gateway?

For hosted gateways the data flow is gateway then Anthropic; both endpoints already see the code. If compliance forbids both, the safe pick is self-hosted LiteLLM, self-hosted Kong, or Future AGI BYOC.

How is Future AGI Agent Command Center different from Portkey for Cascade?

Portkey is a hosted observation and per-key-budget layer; it shows cost and caps by developer. Future AGI adds two layers Portkey does not have: trajectory scoring (`fi.evals.TrajectoryScore`) and the optimisation loop (trace data feeds back into prompt rewrites and routing-policy updates). Portkey gives you a dashboard; Future AGI gives you a dashboard, a metric, and a loop that improves the metric.
