Home / Changelog / 2026 Week 10

Feb 17 – Mar 2, 2026 2026 W10

Agent Command Center, Agent Playground, and ClickHouse Migration

Agent Command Center: routing, guardrails, fallbacks, per-key cost controls. Agent Playground: visual multi-step graph builder. Plus ClickHouse migration.

Platform Guard Agents Monitor Evaluate API

15+ LLM providers

6 load balancing strategies

2 data regions

What's in this digest

ClickHouse migration for trace storage

Dataset and simulation analytics

Guard Improved

Real-time updates for key revocation

API Improved

Per-key authentication for Agent Command Center

Agent Command Center: The Gateway Behind Every Scalable AI Application

Scaling AI in production means scaling the layer in front of every model call. Agent Command Center is that layer: a single API that sits between your application and every LLM provider (OpenAI, Anthropic, Google, and 15+ others) and handles the things every production AI app eventually needs:

Routing. Picking which provider handles each request.
Reliability. Retries, fallbacks, and provider failover when one is slow or down.
Guardrails. Content, PII, and output checks that run on every request and response.
Cost and usage controls. Per-team and per-key budgets, alerts, and live cost tracking.

A single API. A single dashboard. The control plane every scalable AI application eventually needs, ready on day one.

What’s new

Drop-in replacement for the OpenAI API. Point your existing OpenAI client at Agent Command Center and your code keeps working. No SDK changes, no rewrites, just a new base URL.
15+ providers under one endpoint. OpenAI, Anthropic, Google, Azure, and more, all reachable from the same API call.
6 routing strategies. Send each request to the cost-optimized provider, the fastest, the least-latency one, round-robin across providers, weighted by your priorities, or adaptive based on live signal. Pick one, or combine several for different routes.
Guardrails that run inline. Content policies, PII detection, and output validation execute before a response leaves the gateway, so a blocked output never reaches your application.
Automatic fallbacks. If a provider errors out or gets too slow, the request reroutes to a backup model automatically. Your app sees a successful response; the failover is invisible.
Per-API-key budgets. Give each team or workload its own key, each with a monthly spend limit. Owners get alerted as a key nears its budget. Past the limit, requests are rejected, so there are no surprise invoices.
One live dashboard. Request volume, response times (median and tail), error rates, and cost per provider, updated in real time. No CSV exports or external tools needed.

Why it matters

Every team that scales AI in production ends up needing the same things: routing, fallbacks, guardrails, cost tracking, and a single place to see what’s happening. Agent Command Center ships those as one product so you don’t build, maintain, and quietly inherit a home-grown version of each. Governance, reliability, cost controls, and observability are built in from day one, not retrofitted after the first incident.

Who it’s for

Teams running more than one LLM in production.
Orgs where different teams use different models, each with different budgets and different tolerance for failure.
Engineers whose finance or security teams have been asking for cost visibility or policy enforcement, who can now point them at a dashboard instead of exporting CSVs.

Read the docs →

Agent Playground: Build Agents Without Writing Agent Code

Building a multi-step AI agent has had two paths until now, and neither is comfortable:

Adopt an orchestration framework (LangGraph, LlamaIndex) and spend your first week learning its state model.
Write the glue between LLM calls by hand and spend the next month chasing bugs in state that only exists in process memory.

Agent Playground is the third path. Design the graph, configure each step, publish, all from the browser, with no framework sitting between you and your logic.

What’s new

Visual canvas editor. Drop a node, drop another, draw an edge between them.
Two node types.
LLM prompt node. A model call with its configured prompt and parameters.
Agent node. A sub-graph that runs as a single unit, so you can compose a multi-step agent inside another agent.
Per-node configuration. Click any node; a side panel opens with its full setup.
Global variables. Share data across the whole graph instead of passing it through every connection by hand.
Typed input/output ports. Each node exposes ports with defined types (text, structured data, tool results, image content). Draw connections between compatible ports to chain retrieval, generation, evaluation, and conditional logic into multi-step pipelines.
Design-time validation. The editor blocks type mismatches, circular dependencies, and unexitable loops while you build, not at 2 a.m. when requests start hanging in production.
Live port labels and execution traces. Port labels update in real time as data flows. When you run the graph, traces highlight the active path so you can see exactly how data moved through your agent.
Workflow execution control. Monitor, pause, resume, or cancel runs directly from the Playground.
Version management. Create, browse, compare, restore, and activate named versions with one-click switching, with auto-save and conflict resolution.
Templates and drafts. Start blank when the agent is clear in your head, from a template when it isn’t. Iterate on a draft while the published version keeps serving traffic. Promotion is one click.
Programmatic graph API. APIs for node connections, ports, and edge mappings, for teams that generate agents from code or keep configurations in version control.
Editor errors that name the failing component. Cache invalidation runs after every state change, recovery flows let you keep editing in place.

Why it matters

Agent orchestration code is hard to debug and hard to review. Replacing it with a graph you can see (and validate before you ship) turns one of the trickier parts of an AI stack into something a non-engineer can read, a reviewer can sign off on, and a new hire can understand on day one.

Who it’s for

Teams carrying agent orchestration code as technical debt.
Teams standing up their first agent who would rather not learn a framework before their first demo.
Product managers and designers who want to propose agent logic without waiting on engineering to wire it up.

Read the docs →

ClickHouse Migration: Trace Queries at a Different Scale

Trace storage has moved from PostgreSQL to ClickHouse. (A trace is the step-by-step recording of an AI call or agent run, with every model call, retrieval, and tool invocation captured in order.) For teams ingesting thousands of traces per minute, the change is order-of-magnitude: complex queries go from seconds to milliseconds, and aggregations across millions of spans are fast enough to feel interactive when you are chasing a production regression.

The migration was executed with zero downtime and full data continuity. Every historical trace is queryable through the same interfaces you already use.

Annotation Queue: Structured Human Review

The annotation queue introduces a formal workflow for human review across every data type in the platform. Create a queue, assign reviewers, add items (traces, sessions, dataset rows, or simulation outputs), and track completion. Each queue keeps its own review criteria and progress metrics, so different workflows don’t share state.

This fills the gap between automated evaluation and human judgment. When an LLM judge (one LLM scoring another’s output against criteria) flags something borderline, route it to an annotation queue instead of forcing a binary pass/fail. The annotations feed back into your evaluation pipeline, improving automated scoring over time.

Multi-Region and Security

Multi-region support is live with US and EU availability. Pick a region at workspace creation and all traces, evaluations, datasets, and simulation results stay within it. Useful for GDPR, data residency requirements, and internal policies that mandate geographic boundaries.

Instant API key revocation via real-time updates closes the window between revoking a key and enforcement. When you revoke a key from the dashboard or API, every Agent Command Center replica sees the event within milliseconds, and the auth middleware validates it against every incoming request.

Also

Dataset and simulation analytics API. A unified endpoint covering dataset quality metrics and simulation result trends, so analytics dashboards can pull both from one place.

Deep Space dark mode polish. Follow-up theme polish on the Deep Space migration: refined contrast, fixed rendering issues, and final visual consistency pass.

Voice call filter. Filter voice observability traces by call-specific attributes.

Chat simulation runs on a dedicated path. Chat simulation now runs on its own dedicated runtime, decoupled from the voice infrastructure stack. Chat-only workspaces get a leaner runtime and independent reliability.

Optimisation stop flow. Dataset optimisation runs can now be stopped mid-flight without leaving the workspace in a half-state. Useful when an optimiser is clearly heading the wrong direction and you want to abort before it finishes.

Older

ai-evaluation 1.0, Deep Space Theme, Multi-Language SDKs, and Multimodal Workbench

Newer

Custom Dashboards, MCP Server, 2FA with Passkeys, and Annotation Queues

All changelog entries