Production monitoring without alerting is just logging with extra steps. You can have the most detailed traces (end-to-end records of how your agent handled each request) and evaluations (tests that score agent outputs against criteria you define) in the world, but if nobody sees a problem until a user complains, you’re always reacting to problems instead of preventing them.

Alerts and monitors close that gap. Define thresholds on any evaluation metric, trace property, or system measurement, and get notified the moment something crosses the line.

What’s new

Pick the metric. Hallucination rate, average latency, error count, evaluation score, or any custom metric you track.
Set the threshold and condition. Above, below, or percentage change relative to a baseline.
Choose the channel. Slack for team visibility, email for individual alerts. The monitor checks your metric at configurable intervals and fires when the condition is met.
Actionable context in every alert. The notification includes the metric value that triggered it, a link to the relevant traces or evaluations, and a sparkline of recent metric history — so you can see whether it’s a spike or a trend.

Why it matters

For teams running agents in production, alerting is the difference between catching a regression within minutes and hearing about it from a customer hours later. Set an alert when hallucination rates exceed 5%, when p95 latency crosses 3 seconds, or when error rates spike above baseline.
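
As a rough illustration of the three condition types (above, below, percentage change relative to a baseline), the check a monitor performs at each interval could be sketched as follows. This is a minimal sketch, not the product's API: the `Monitor` class, its field names, and the `is_triggered` function are all hypothetical, and treating percentage change as an absolute (direction-agnostic) change is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Monitor:
    """Hypothetical monitor definition; names are illustrative only."""
    metric: str
    condition: str                    # "above", "below", or "pct_change"
    threshold: float                  # absolute value, or percent for "pct_change"
    baseline: Optional[float] = None  # only needed for "pct_change"


def is_triggered(monitor: Monitor, value: float) -> bool:
    """Return True when the latest metric value meets the monitor's condition."""
    if monitor.condition == "above":
        return value > monitor.threshold
    if monitor.condition == "below":
        return value < monitor.threshold
    if monitor.condition == "pct_change":
        if not monitor.baseline:
            raise ValueError("pct_change needs a non-zero baseline")
        # Assumption: change is measured in either direction from the baseline.
        change = abs(value - monitor.baseline) / abs(monitor.baseline) * 100
        return change > monitor.threshold
    raise ValueError(f"unknown condition: {monitor.condition}")


# The three example alerts from the text, with made-up metric names.
hallucination = Monitor("hallucination_rate", "above", 0.05)
latency = Monitor("p95_latency_seconds", "above", 3.0)
errors = Monitor("error_rate", "pct_change", 50.0, baseline=0.02)

print(is_triggered(hallucination, 0.07))  # 7% is above the 5% threshold
print(is_triggered(latency, 2.4))         # 2.4 s is under the 3 s threshold
print(is_triggered(errors, 0.035))        # 0.035 is 75% away from the 0.02 baseline
```

Keeping the condition as data rather than code means the same check can run for any metric, which mirrors the "pick a metric, set a condition, choose a channel" flow described above.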

Who it’s for

MLOps and platform engineering teams responsible for agent uptime and quality in production, and quality assurance (QA) teams setting up automated quality gates on live traffic.

gRPC Trace Ingestion — 60% Less Latency

Every millisecond of overhead in your trace ingestion pipeline is a millisecond added to your application’s request path. HTTP/JSON has been the transport since day one and works well, but for high-throughput production deployments, the serialization overhead and connection management add up.

What’s new

Protocol Buffers replace JSON for serialization — smaller payloads, faster encoding.
HTTP/2 multiplexing eliminates connection overhead for concurrent trace submissions.
60% lower latency for trace ingestion compared to the HTTP/JSON path.
Supported in Python and TypeScript SDKs. Switch with a single configuration change:

from traceai import configure
configure(transport="grpc")

Why it matters

HTTP/JSON remains the default and is fully supported. gRPC is recommended for production deployments processing more than 1,000 traces per minute, where the per-trace latency savings accumulate into a meaningful reduction in overhead.
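
If you want to apply that guideline programmatically, a small helper can pick the transport from your observed trace volume. This is only a sketch: `choose_transport` is a hypothetical helper, and the `"http"` value is an assumed name for the default transport (only `transport="grpc"` is confirmed by the snippet above).

```python
# Threshold taken from the recommendation above: more than 1,000 traces per minute.
GRPC_THRESHOLD_TRACES_PER_MINUTE = 1_000


def choose_transport(traces_per_minute: float) -> str:
    """Return "grpc" for high-volume deployments, otherwise the default.

    Note: "http" is an assumed identifier for the HTTP/JSON transport.
    """
    if traces_per_minute > GRPC_THRESHOLD_TRACES_PER_MINUTE:
        return "grpc"
    return "http"


# The result would then be passed to the SDK, for example:
#   configure(transport=choose_transport(observed_rate))
print(choose_transport(250))
print(choose_transport(4_000))
```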

Who it’s for

Platform and infrastructure teams running high-volume trace ingestion in production — especially on latency-sensitive request paths where tracing overhead would otherwise be visible to end users.

Observe Graph — Agent Execution as a Graph

Trace waterfalls show timing. The new Observe graph shows structure. It renders your agent’s execution as an interactive directed graph where each node is a span (an individual step inside a trace — an LLM call, a tool invocation, a retrieval, a piece of branching logic) and each edge is the flow of execution.

What’s new

Color-coded nodes by type. LLM calls, tool invocations, retrieval operations, and custom spans each have distinct visual treatments.
Click-to-expand details. Every node shows inputs, outputs, latency, token counts, and evaluation scores when available.
Zoom, pan, and collapse. Navigate large graphs easily; collapse subtrees to focus on the part that matters.

Why it matters

For simple linear LLM chains, a timeline view is fine. For complex agents with tool calls, branching logic, parallel execution, and retry loops, the graph reveals the architecture of your agent’s decisions in a way a timeline can’t. When your agent picks the wrong tool or enters an unexpected branch, the graph makes the decision path immediately visible.
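
To make the idea concrete, here is a minimal sketch of how a flat list of spans can be turned into a directed graph and how the decision path to a given span can be read off it. The span fields (`id`, `parent_id`, `type`, `name`) are hypothetical, and modeling each edge as a parent-to-child link is an assumption about how execution flow is represented, not a description of the Observe data model.

```python
from collections import defaultdict

# Made-up spans from a single trace; field names are illustrative only.
spans = [
    {"id": "s1", "parent_id": None, "type": "custom", "name": "agent_run"},
    {"id": "s2", "parent_id": "s1", "type": "llm", "name": "plan"},
    {"id": "s3", "parent_id": "s1", "type": "tool", "name": "search"},
    {"id": "s4", "parent_id": "s3", "type": "retrieval", "name": "fetch_docs"},
]


def build_graph(spans):
    """Build an adjacency list mapping each span id to its child span ids."""
    children = defaultdict(list)
    for span in spans:
        if span["parent_id"] is not None:
            children[span["parent_id"]].append(span["id"])
    return dict(children)


def path_to(spans, target_id):
    """Return the chain of span ids from the root down to target_id."""
    parents = {span["id"]: span["parent_id"] for span in spans}
    path = []
    current = target_id
    while current is not None:
        path.append(current)
        current = parents[current]
    return list(reversed(path))


print(build_graph(spans))    # {'s1': ['s2', 's3'], 's3': ['s4']}
print(path_to(spans, "s4"))  # ['s1', 's3', 's4']
```

The adjacency list is what a graph view renders as nodes and edges; the root-to-node path is the "decision path" that a timeline flattens away.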

Who it’s for

Agent developers debugging complex multi-step agents, and engineering teams running production agents with dynamic tool selection where the path taken matters as much as the final output.

Developer Keys and Infrastructure

Developer keys. The old single-API-key model is replaced with proper key management. Create multiple API keys with scoped permissions — read-only for dashboards, write for trace ingestion, admin for configuration. Each key tracks its own usage metrics, and key rotation is a one-click operation with no downtime.
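
The scoping model can be pictured as a simple permission check on each request. The sketch below is illustrative only: the `ApiKey` class, the scope strings, and `is_allowed` are hypothetical, and it assumes scopes are independent (an admin key does not implicitly include read or write), which the release notes do not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiKey:
    """Hypothetical API key record with a set of granted scopes."""
    name: str
    scopes: frozenset


def is_allowed(key: ApiKey, required_scope: str) -> bool:
    """Return True if the key was granted the scope an operation requires."""
    return required_scope in key.scopes


# One key per use case, matching the examples in the text.
dashboard_key = ApiKey("dashboards", frozenset({"read"}))
ingest_key = ApiKey("trace-ingestion", frozenset({"write"}))

print(is_allowed(dashboard_key, "read"))   # True
print(is_allowed(dashboard_key, "write"))  # False
print(is_allowed(ingest_key, "write"))     # True
```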

Model serving infrastructure. New internal infrastructure powers both evaluation runs and Protect Flash guardrails (runtime checks that block or flag bad agent outputs before they reach users). It autoscales based on demand and is optimized for low-latency inference, so evaluation runs and guardrail checks stay fast under peak load.

Eval tab revamp. Better data visualization for evaluation results: improved charts, more flexible filtering, and one-click export to CSV and JSON.

Prompt evaluation updates. Batch execution is now available: score multiple prompt versions in a single run and compare results side by side.
