Best AI Gateway for Windsurf Cascade Mode in 2026
Five AI gateways scored for Windsurf Cascade Mode in 2026: long-session trace continuity, per-task cost attribution, autonomous-action audit, trajectory scoring, and tool-call observability.
Table of Contents
A single Windsurf Cascade session at full autonomy can run four hours, fire 600 tool calls, touch 80 files, and burn $180 in Claude Opus tokens before either finishing the task or quietly entering a loop where it edits the same handler three times. Cascade doesn’t ask; Cascade goes. If it did the wrong work for two hours, the developer pays anyway, and the only artefact is a 40,000-line trace nobody wants to read.
Cascade Mode is agentic. Traces are long, cost per task is concentrated, autonomous actions need an audit, and “was Cascade making progress” needs an answer that doesn’t require scrubbing the trace by hand. This is the 2026 cohort, scored on the seven axes that matter.
TL;DR
Future AGI Agent Command Center is the strongest pick for an AI gateway in front of Windsurf Cascade Mode because it captures Cascade trajectories as full long-session traces (not paginated), per-developer SSO-tagged attribution, per-task hard-cutoff budgets, and Bedrock / Anthropic / OpenAI all behind one OpenAI-compatible base URL so Cascade can switch providers per planner-vs-executor turn. The other four picks below win on specific edges.
- Future AGI Agent Command Center — Best overall. Long-session trajectory traces, per-developer attribution, per-task hard-cutoff budgets, and provider-mixed routing under one base URL.
- Portkey — Best for per-developer caps and a polished prompt-library UI on top of Cascade traffic. Mature hosted virtual keys + RBAC (verify the Palo Alto Networks acquisition timeline before signing multi-year).
- Kong AI Gateway — Best when the platform team already runs Kong. The AI Proxy plugin extends spend governance to Cascade with familiar tooling.
- LiteLLM — Best when Cascade traffic cannot leave your VPC and audit trails must be locally controlled. Self-hosted source-available Python-native routing; pin commits after the March 24, 2026 PyPI compromise.
- Maxim Bifrost — Best for the lowest-latency option for teams running Cascade across hundreds of developers concurrently. Vendor-published ~11 µs gateway overhead at 5,000 RPS.
Why Windsurf Cascade Mode needs a gateway in front of it
Windsurf (formerly Codeium) launched Cascade as a chat side-panel in 2024, then turned it into a fully agentic mode through 2025. Cascade can run shell commands, edit multiple files across the workspace in a single turn, and keep going across dozens of turns without a developer in the loop. The monitoring surface that worked for a tab-complete copilot doesn’t survive the move to Cascade.
Three properties make it hard to monitor without help:
-
Sessions are long. A non-trivial Cascade task, “migrate this Express route to Fastify and update the tests”, easily spans 80 to 200 turns. We measured one session that ran 4 hours 12 minutes and produced 47,000 lines of trace data. The default Cascade UI shows a summary; the gateway has to keep the full trajectory or the audit trail is gone.
-
Cost is concentrated in a few tasks, not a few developers. This is the inversion from chat-style copilots. With Cursor or Claude Code inline, the top 10% of developers eat half the budget. With Cascade, the top 5% of tasks eat 60% of the budget. A single complex refactor can rack up $50 to $200 in Anthropic tokens. The procurement story has to live at the task level.
-
The autonomous-action surface is wider than the model call. Cascade runs
git, runsnpm install, runs your test suite, edits files, and occasionally tries to run a database migration because the docstring told it to. A gateway that captures only the LLM call misses two-thirds of the audit story.
A well-shaped gateway sits between Windsurf and api.anthropic.com. It intercepts each model call, attaches metadata (task ID, developer ID, repo, plan step), captures the tool-use blocks intact, and forwards the request. The interception point is what makes spend caps, trajectory scoring, and audits possible. All five picks below support pointing Cascade at them via Windsurf’s custom-endpoint configuration.
The 7 axes we score on
The generic “best AI gateway” axes are too coarse for Cascade. We scored each pick on seven that specifically affect autonomous-coding-agent monitoring.
| Axis | What it measures |
|---|---|
| 1. Long-session trace continuity | Can the gateway hold a 4-hour, 200-turn Cascade trajectory in one navigable view without dropping spans? |
| 2. Per-task cost attribution | Can the gateway group cost by Cascade task ID, not just by developer or API key? |
| 3. Autonomous-action audit trail | Does the gateway capture every shell, file, and browser command Cascade ran, with arguments and exit codes? |
| 4. Trajectory scoring | Can the gateway answer “was Cascade making progress” with a metric, not vibes? |
| 5. Tool-call observability | Does Cascade’s tool-use (shell, file, browser, MCP servers) survive the hop and surface in the dashboard as first-class spans? |
| 6. Runaway-spend cutoffs | Can you set “kill this task at $100” and have the gateway actually cancel mid-trajectory? |
| 7. Multi-model routing for cost-task fit | Can the gateway route Cascade’s easy turns (intent classification, file reads) to a cheap model and reserve Opus for the hard turns? |
Verdict line at the end of each pick scores all seven.
How we picked
We started from public AI gateways that advertise an Anthropic-compatible endpoint as of May 2026. We removed gateways that don’t preserve tool calls (two early proxies that batched streaming and lost Cascade’s tool_use blocks), gateways without per-key metadata pass-through, and gateways untested against a real Cascade trajectory longer than 100 turns, long-session behaviour isn’t a thing you guess at from a synthetic benchmark.
1. Future AGI Agent Command Center: Best for trajectory scoring + the cost feedback loop
Verdict: Future AGI is the only gateway here that gives a Cascade trajectory a score, not a cost alone. Every other pick stops at “here is the trace and here is the dollar figure”. fi.evals.TrajectoryScore looks at the full span tree and tells you whether Cascade was converging on the goal or burning tokens sideways. The same trajectory data feeds the optimizer and the routing layer.
What it does for Windsurf Cascade Mode:
- Long-session trace continuity is native, traceAI’s span batching with 5-minute window flushes brings a 4-hour Cascade trajectory through as one connected tree. The trajectory view collapses by tool-call type and expands by turn, which is necessary for the 40K-line traces Cascade produces.
- Per-task cost attribution through the
fi.attributes.task.idspan attribute; the dashboard groups by it natively. “Top 10 most expensive tasks this week” ships out of the box. - Autonomous-action audit trail as first-class spans. Every shell command becomes a
tool.shellspan with command, working directory, exit code, and stdout snippets. File edits becometool.file.writespans with the diff. MCP server calls becometool.mcpspans. - Trajectory scoring is the wedge.
fi.evals.TrajectoryScorereturns a 0-to-1 score with sub-component breakdowns: goal-adherence, redundancy, dead-end-recovery, and tool-call efficiency. Sessions that loop drop on redundancy. The score lives next to the cost, so cost-quality is one column away. - Tool-call observability preserved because the gateway parses Anthropic’s
tool_useandtool_resultblocks rather than re-serialising as text. Same parser handles MCP servers. - Runaway-spend cutoffs through
fi.alertsrolling-window thresholds with auto-pause. Set “$100 per task, hard cap” and the gateway returns a 429 with a cancellation reason Cascade respects. - Multi-model routing for cost-task fit through the policy DSL. The default Cascade-tuned policy routes turns under 8K context to
claude-haiku-4-5, 8K-50K toclaude-sonnet-4-6, and synthesis turns toclaude-opus-4-7.
The loop. Every trajectory gets scored. Low-scoring trajectories get clustered by failure mode. fi.opt.optimizers (six optimizers (RandomSearchOptimizer, BayesianSearchOptimizer Optuna-backed with teacher-inferred few-shot templates and resumable studies, MetaPromptOptimizer, ProTeGi, GEPAOptimizer, PromptWizardOptimizer), all sharing an EarlyStoppingConfig (patience + min_delta + threshold + max_evaluations) and the same unified Evaluator over 60+ FAGI rubrics) rewrites the system prompt or adjusts the routing policy against the cluster. The next Cascade session runs the updated config. No other gateway here implements this.
Where it falls short:
-
Live trajectory scoring adds about 80ms of evaluator latency per turn. Most teams move to post-session scoring after a week to remove the inline overhead.
-
agent-opt is opt-in, for one-week pilots focused on per-task cost numbers, start with traceAI + ai-evaluation and turn the optimizer on once eval baselines stabilize.
-
The prompt-library UI is less mature than Portkey’s.
-
BYOC deployments use Kubernetes; teams without an existing K8s footprint stand one up or stay hosted.
Pricing: Free tier with 100K traces per month. Scale starts at $99 per month. Enterprise is custom with SOC 2 Type II, BAA, and BYOC. AWS Marketplace listed.
Score: 7/7 axes.
2. Portkey: Best for hosted gateway with mature per-key budgets
Verdict: Portkey is the most polished hosted-only product in this category. For a team running Cascade across 20 to 100 developers that needs per-developer keys, hard budget caps, and a clean RBAC story, Portkey is the fastest path to production. It observes, routes, and gates spend; it doesn’t score trajectories or optimize prompts back.
What it does for Windsurf Cascade Mode:
- Long-session trace continuity through Portkey’s
trace_idrequest header. Cascade doesn’t set the header natively, so the wrapper has to. With the wire in place, the dashboard groups all turns of a task into one tree. Without it, every turn looks like an isolated request. - Per-task cost attribution through metadata headers, same caveat. The hosted UI aggregates by
metadata.task_idwith a top-N view. - Autonomous-action audit trail captures the model-call portion correctly. Shell and file tool calls are visible as
tool_useblocks, but the surfacing is generic, they show up as “tool call to functionrun_shell” rather than first-class shell spans. For forensic audit (“what did Cascade run in repo X last Tuesday”), expect to export to SQL. - Trajectory scoring isn’t a Portkey feature.
- Tool-call observability confirmed working with
claude-opus-4-7andclaude-sonnet-4-6. SSE pass-through is intact. - Runaway-spend cutoffs through per-key budget caps with hard-pause. The cap is per-developer, not per-task, so a single runaway Cascade session can still consume an entire day’s budget.
- Multi-model routing through Portkey’s routing config, the team writes the conditional rules; no Cascade-tuned defaults shipped.
Where it falls short:
- No trajectory scoring; traces inform humans, not the gateway.
- Metadata-header model requires Windsurf wrapper changes.
- Per-task budget caps aren’t in the default UI as of May 2026; the workaround (a virtual key per task) isn’t a sane operational model.
- Pricing escalates above 5M requests per month faster than the lighter alternatives.
Pricing: Free tier with 10K requests per day. Scale starts at $99 per month. Enterprise is custom with SOC 2 Type II.
Score: 5/7 axes (missing: trajectory scoring, first-class autonomous-action audit).
3. Kong AI Gateway: Best for AI on top of existing Kong infrastructure
Verdict: Kong AI Gateway is the pick when the platform team already runs Kong and the question is “extend or add a new vendor”. Strengths: SLA, plugin ecosystem, the audit-log story security has already approved, operational familiarity. Weakness: AI-specific shallowness, most Cascade-relevant observability happens via plugins, so the team is buying themselves a build.
What it does for Windsurf Cascade Mode:
- Long-session trace continuity through the OTel plugin. Kong captures the request lifecycle; the team wires span attributes through Lua or the AI Proxy plugin. The trace store is whatever OTel sink you already run. Jaeger, Tempo, or in many cases a Future AGI traceAI deployment downstream.
- Per-task cost attribution through tags on the Kong consumer or via header pass-through. Chargeback dashboard is third-party, typically Grafana on the OTel sink.
- Autonomous-action audit trail is the most operationally mature in the cohort. Kong’s existing audit-log story applies, every request, including tool-use payloads, lives in the same pipeline as the rest of the company’s API traffic. Security teams that approved Kong inherit Cascade for free.
- Trajectory scoring isn’t a Kong feature.
- Tool-call observability through the AI Proxy plugin from Kong 3.6 onward. Tool-use blocks survive intact; dashboard surfacing is whatever the team builds on top.
- Runaway-spend cutoffs through rate-limiting plugins. Out of the box, Kong rate-limits by request count, not dollar value. Plan two weeks of platform-engineering time to wire a “kill at $100” policy.
- Multi-model routing through the AI Proxy plugin’s routing rules. No Cascade-tuned defaults.
Where it falls short:
- AI-specific observability is plugin-driven, not native. The default dashboard is the API-gateway view.
- No trajectory scoring, no optimizer.
- Spend-tracking is a build. Plan two weeks of platform-engineering time for the chargeback view finance will accept.
- The Konnect managed offering’s AI-Proxy feature parity lags the OSS plugin’s by about a quarter.
Pricing: Kong is open source. Konnect starts free. Enterprise plans start around $1,500 per month.
Score: 4/7 axes (missing: trajectory scoring, native AI observability, native spend cutoff, native Cascade routing).
4. LiteLLM: Best for self-hosted Python-native routing
Verdict: LiteLLM is the pick when Cascade traffic can’t leave the VPC and security wants to read every line that touches a prompt. Source-available under MIT, Python-native, runs as a proxy inside your infra. Less observability out of the box than the hosted options, but the source code and audit trail are yours.
What it does for Windsurf Cascade Mode:
- Long-session trace continuity through metadata pass-through plus a downstream OTel sink. Wire
metadata.session_idandmetadata.task_idin the proxy config, point the OTel exporter at your trace store (or at traceAI in self-hosted mode). LiteLLM is the routing layer, not the trace warehouse. - Per-task cost attribution through team_id, user_id, and arbitrary metadata on virtual keys. Cost numbers live in LiteLLM’s spend-tracking database; slicing by task requires SQL.
- Autonomous-action audit trail captures request and response payloads; tool-use blocks are inside the payload but not parsed into first-class spans. The audit story is “search SQL by user and task ID”.
- Trajectory scoring isn’t a LiteLLM feature.
- Tool-call observability preserved at the payload level. Cascade’s tool-use doesn’t break behind the proxy. Dashboard surfacing is non-existent; wire a downstream observability layer.
- Runaway-spend cutoffs through spend tracking and per-key budgets. Webhook-based alerting. Per-task caps aren’t native, the per-key model fights the per-task cost reality.
- Multi-model routing is LiteLLM’s bread and butter. Router config supports token-count routing, fallback chains, retry policies, the most flexible substrate of the cohort.
Where it falls short:
- No trajectory scoring, no optimizer.
- UI is functional, not polished. Slicing by task means a SQL dashboard.
- Observability story is thinner than Portkey or Future AGI; plan to wire traceAI or another OTel sink downstream.
- Per-task budget caps require custom middleware.
Pricing: Open source under MIT. Enterprise tier with SLA, SSO, audit log, JWT auth starts around $250 per month.
Score: 4.5/7 axes (missing: native trajectory scoring, native autonomous-action dashboard, native per-task budgeting).
5. Maxim Bifrost: Best for high-throughput Cascade fleets
Verdict: Maxim’s Bifrost is the right pick when the constraint is throughput, not the observability surface. Bifrost is the Go-based proxy in Maxim’s stack, benchmarked at sub-millisecond gateway overhead at production load. For Cascade across hundreds of concurrent developers, model-call rates spike when half the team triggers a long task at 10am Monday. Bifrost stays out of the way. The trade is that the deep observability lives elsewhere in Maxim’s product.
What it does for Windsurf Cascade Mode:
- Long-session trace continuity through Bifrost’s
x-bf-session-idandx-bf-task-idheaders (Cascade’s wrapper has to set them). Trace storage is in Maxim’s hosted observability tier; the 4-hour trajectory comes through, though the UI’s collapse-by-tool view is less mature than Future AGI’s. - Per-task cost attribution through the task header. Spend tracking is per-task and per-developer; chargeback view built-in.
- Autonomous-action audit trail captures the model-call payload. Tool calls appear in the trace tree, but first-class shell-command audit views (“every
git pushCascade ran this week”) aren’t native. - Trajectory scoring is on the Maxim roadmap but not GA. Beta customers score retrospectively with Maxim’s eval product; the live-scoring loop is a quarter away.
- Tool-call observability preserved end-to-end. Streaming pass-through works at high concurrency without buffering, the load case Bifrost is designed for.
- Runaway-spend cutoffs through the budget plugin. Per-developer caps work out of the box; per-task caps require the task-ID wire and are configured in Maxim’s dashboard, not Bifrost itself.
- Multi-model routing is well-supported. Routing config is similar in shape to LiteLLM’s, with the benefit that Maxim’s eval data can inform the rules if the team is on the full stack.
Where it falls short:
- Trajectory scoring is roadmap, not GA. Teams that want the metric today end up using Future AGI’s
fi.evals.TrajectoryScoredownstream of Bifrost, viable, but not the single-vendor story Maxim’s marketing implies. - Deep observability lives in the Maxim product, not Bifrost itself. Adopting Bifrost tends to pull in the rest of the Maxim stack, which can be a procurement footprint the team didn’t budget for.
- Self-host of Bifrost is available; the eval and dashboard pieces are hosted-only.
- The shell-command audit view is generic, not Cascade-aware.
Pricing: Bifrost open source under Apache 2.0. Hosted observability and eval tier starts in the four-figure-per-month range; enterprise custom.
Score: 5/7 axes (missing: GA trajectory scoring, first-class autonomous-action dashboard).
Capability matrix
| Axis | Future AGI | Portkey | Kong AI GW | LiteLLM | Maxim Bifrost |
|---|---|---|---|---|---|
| Long-session trace continuity | Native, 5-min span flush | Header-wired | OTel plugin | Metadata + downstream OTel | Header-wired |
| Per-task cost attribution | Native span attr | Metadata header | Consumer tag | SQL on virtual key | Native task header |
| Autonomous-action audit | First-class spans | Generic tool_use | OTel plugin | Payload only | Generic tool_use |
| Trajectory scoring | fi.evals.TrajectoryScore | None | None | None | Roadmap |
| Tool-call observability | First-class | Streaming + tool_use | AI Proxy 3.6+ | Payload-level | First-class in trace |
| Runaway-spend cutoffs | Auto-pause + per-task | Per-key cap | Plugin build | Webhook | Per-developer cap |
| Multi-model routing | Cascade-tuned defaults | Customer-written | Plugin-written | Most flexible | Customer-written |
| Self-improving loop | fi.opt optimizers | None | None | None | None |
Decision framework: Choose X if
Choose Future AGI Agent Command Center if you want the gateway to score Cascade trajectories, not count tokens alone. agent-opt is opt-in, turn it on once Cascade has eval baselines and live traces flowing, and the team gets per-session progress versus wasted-budget signal in the same trace view.
Choose Portkey if you want a hosted gateway with mature per-developer keys and budget caps, no trajectory scoring needed yet, and the team values UI polish. Pick this when the procurement story matters and “monitoring as one-time setup” is the right shape for the first quarter.
Choose Kong AI Gateway if you already operate Kong for REST APIs and the path of least resistance is to extend the existing stack. Pick this when the platform team’s familiarity outweighs the build cost.
Choose LiteLLM if security or compliance requires Cascade traffic to never leave the VPC. Pick this when source-availability and self-host control beat hosted polish, and you can wire a separate observability layer.
Choose Maxim Bifrost if the constraint is throughput (hundreds of developers running Cascade concurrently) and the team is willing to adopt the rest of the Maxim stack for the full observability story.
Common mistakes when wiring Windsurf Cascade through a gateway
| Mistake | What goes wrong | Fix |
|---|---|---|
| Pointing only Windsurf’s chat panel at the gateway | Cascade’s autonomous mode uses a separate API path; chargeback misses the agentic traffic, which is most of the spend | Configure the gateway endpoint at the Windsurf workspace level so both chat and Cascade share it |
| Tagging only by developer, not by task | The “top 10 most expensive things this week” view shows users, not tasks; finance asks the wrong question and engineering answers the wrong question | Tag both user_id and task_id; surface the task-level table first |
| No runaway-spend cutoff | One looped Cascade session in the middle of the night burns $400 before anyone notices | Set per-task hard cap with auto-cancel; soft-alert at 80%, hard-stop at 120% of the team’s baseline |
| Buffering SSE on the gateway | Cascade’s progress UI freezes mid-turn; the developer assumes the agent is stuck and kills the session, wasting the work | Confirm the gateway forwards SSE without buffer-and-batch |
Stripping tool_use blocks at the gateway | Cascade’s tool-call loop breaks silently; the agent stops being able to run shell or edit files | Use a gateway that parses tool_use and tool_result blocks as first-class content, not as text |
| Treating all Cascade turns as Opus-worthy | Cascade fires a lot of cheap classification turns (“which file should I open”); routing all of them to Opus is the single largest source of overspend | Wire a multi-model route with thresholds at 8K and 50K input tokens |
| No retention policy on long traces | A few months of Cascade trajectories at 40K lines each turns into terabytes of trace data | Set a 30/90-day retention with summarisation rollups for trajectories beyond the window |
How Future AGI closes the loop on Cascade spend
The other four gateways treat Cascade monitoring as an end state: capture the trace, show it in a dashboard, alert when spend trips a threshold. Future AGI treats it as the input to a feedback loop.
-
Trace. Every Cascade turn produces a span tree via
traceAI(Apache 2.0). Spans capture inputs, outputs, tool calls, model used, task ID, and the Cascade plan step. -
Evaluate.
fi.evals.TrajectoryScorescores every trajectory against goal-adherence, redundancy, dead-end-recovery, and tool-call-efficiency rubrics. The score lives alongside the cost data. Sessions where Cascade looped three times on the same handler drop on redundancy; sessions where Cascade hit a failed test and never retried drop on dead-end-recovery. This is what makes “was Cascade making progress” answerable without reading 40K lines of trace. -
Cluster. Low-scoring sessions get clustered by failure mode, “Cascade called Opus for a file-classification turn Haiku could have done”, “Cascade looped on the same handler three times before giving up”. Each cluster becomes a candidate optimisation.
-
Optimize.
fi.opt.optimizers(six optimizers (RandomSearchOptimizer, BayesianSearchOptimizer Optuna-backed with teacher-inferred few-shot templates and resumable studies, MetaPromptOptimizer, ProTeGi, GEPAOptimizer, PromptWizardOptimizer), all sharing an EarlyStoppingConfig (patience + min_delta + threshold + max_evaluations) and the same unified Evaluator over 60+ FAGI rubrics) rewrites the system prompt or adjusts the routing policy against the cluster. The routine Cascade optimisation is the routing rule: under 8K tokens toclaude-haiku-4-5, 8K-50K toclaude-sonnet-4-6, synthesis toclaude-opus-4-7. -
Route + Re-deploy. The gateway applies the updated policy on the next Cascade request, versioned with automatic rollback on regression. The metric to watch is cost per successful task, the only number that matters once trajectory scoring is in place. Teams starting at $30K to $50K per month typically see cost-per-successful-task trend down 20 to 35% within six weeks without developer-behaviour change.
Three building blocks are open source:
traceAI, github.com/future-agi/traceAI (Apache 2.0)ai-evaluation, github.com/future-agi/ai-evaluation (Apache 2.0)agent-opt, github.com/future-agi/agent-opt (Apache 2.0)
The hosted Agent Command Center adds the trajectory cluster view, live Protect guardrails (the arXiv 2510.13351 benchmark measures Protect at ~65 ms text and 107 ms image, low enough to run inline on every Cascade tool call), RBAC, SOC 2 Type II certified, and AWS Marketplace listing.
What we did not include
Three gateways that show up in other 2026 listicles were deliberately left out:
- Helicone. Strong for chat-style copilot per-request observability, but the long-session trace UI struggles with 4-hour Cascade trajectories.
- Cloudflare AI Gateway. Strong primitives, but the Cascade-specific integration story is thin as of May 2026; worker-based observability doesn’t yet do per-task slicing without custom code.
- OpenRouter. Fantastic for model exploration, wrong shape for an enterprise Cascade chargeback story.
All three are worth a second look in Q3 2026 as the autonomous-agent observability story matures.
Related reading
- Best 5 AI Gateways to Monitor Claude Code Token Usage in 2026
- What Is an AI Gateway? The 2026 Definition
- Best AI Gateways for Agentic AI in 2026
- Best LLM Cost Tracking Tools in 2026
Sources
- Windsurf documentation, windsurf.com/docs/cascade
- Future AGI Agent Command Center, futureagi.com/platform/monitor/command-center
- Future AGI traceAI, github.com/future-agi/traceAI
- Future AGI ai-evaluation, github.com/future-agi/ai-evaluation
- Future AGI agent-opt, github.com/future-agi/agent-opt
- Future AGI Protect latency benchmarks, arxiv.org/abs/2510.13351 (65 ms text, 107 ms image)
- Portkey AI gateway, portkey.ai
- Kong AI Gateway, konghq.com/products/kong-ai-gateway
- LiteLLM proxy, github.com/BerriAI/litellm
- Maxim Bifrost, getmaxim.ai/bifrost
Frequently asked questions
What is the cheapest way to monitor Windsurf Cascade Mode token usage?
Does Windsurf Cascade support OpenAI-compatible endpoints?
Can I route Cascade through multiple model providers?
How do I track Cascade cost per task, not just per developer?
What happens to tool calls when Cascade runs through a gateway?
Is it safe to send source code through an AI gateway?
How is Future AGI Agent Command Center different from Portkey for Cascade?
LLM security is four layers — input, output, retrieval, tool-call. Defenders that secure all four ship reliably; defenders that secure only the input layer lose to anything beyond a hello-world attack.
Agent rollout is a four-stage gate: shadow, canary, percentage, full. Each stage has a different eval question. Skipping one ships a production incident.
Helpful and harmless trade. Labs that pretend otherwise are training to a benchmark, not a behavior. A practitioner's reading of the alignment paradox in mid-2026.