2025 W34

Summary Dashboards, Alerts Revamp, and Prompt SDK

This week: rebuilt summary dashboards with three chart types, a revamped alerts system with Slack and email notifications, and new Prompt SDK capabilities for caching, A/B testing, and multi-environment deployment.

What's in this digest

Monitor: Summary screen revamp (New)
Monitor: Alerts revamp with Slack and email (New)
SDK: Prompt SDK upgrades (New)
Platform: Workspaces RBAC (New)
Platform: AWS Marketplace integration (Improved)
SDK: Error localizer via SDK (Improved)
Evaluate: Critical issue detection on datasets (Improved)
Monitor: Prompt metrics in Observe (Improved)
SDK: traceAI optional dependencies cleanup (Fixed)

Summary Screen Revamp

Data without visualization is just noise. The rebuilt summary screen transforms raw evaluation and monitoring data into clear, actionable visual stories. Three new chart types give you multiple perspectives on your agent’s performance.

Spider charts map multi-dimensional evaluation scores onto a single view. See how your agent performs across faithfulness, relevance, completeness, and safety simultaneously. Bar charts break down performance by category, time period, or prompt version. Pie charts show distribution of quality tiers, error types, or evaluation outcomes.

The comparison view is where this gets powerful. Place any two evaluation runs, prompt versions, or time periods side by side and see exactly what changed, with no more switching between tabs and trying to hold numbers in your head. The visual diff makes regressions obvious and improvements measurable.
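
Outside the dashboard, the same comparison can be approximated as a per-metric delta between two runs. The following is a minimal sketch, assuming each run is a plain mapping from metric name to score and that higher scores are better. The function names `diff_runs` and `regressions` and the sample scores are hypothetical and are not part of the product API.

```python
from typing import Dict, List


def diff_runs(baseline: Dict[str, float], candidate: Dict[str, float]) -> Dict[str, float]:
    """Return candidate minus baseline for every metric present in both runs."""
    shared = baseline.keys() & candidate.keys()
    return {name: round(candidate[name] - baseline[name], 4) for name in sorted(shared)}


def regressions(deltas: Dict[str, float], tolerance: float = 0.0) -> List[str]:
    """Metrics whose score dropped by more than `tolerance` (higher is assumed better)."""
    return [name for name, delta in deltas.items() if delta < -tolerance]


# Illustrative scores only; real runs would come from exported evaluation results.
run_a = {"faithfulness": 0.91, "relevance": 0.88, "completeness": 0.80, "safety": 0.99}
run_b = {"faithfulness": 0.93, "relevance": 0.84, "completeness": 0.80, "safety": 0.99}

deltas = diff_runs(run_a, run_b)
print(deltas)
print("regressed:", regressions(deltas))
```

A tolerance above zero keeps small run-to-run noise from being reported as a regression.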

Alerts Revamp

The previous alerting system was functional but limited. The revamped alerts infrastructure is built for production teams that need to know the moment quality degrades.

Slack and email notification channels are now first-class citizens. Set thresholds on any metric (for example, hallucination rate above 5%, latency above 2 seconds, or evaluation scores below baseline) and get notified in the channels your team already uses as soon as a threshold is crossed. Intelligent alert grouping prevents notification fatigue by consolidating related alerts into a single, actionable message.
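
One way to picture alert grouping is to bucket alerts that share a metric and a source within a fixed time window, then emit one message per bucket. This is only a sketch under that assumption; the `Alert` fields, the `group_alerts` function, and the five-minute window are hypothetical, and the product's actual grouping logic is not documented here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Alert:
    metric: str       # e.g. "hallucination_rate"
    project: str      # hypothetical grouping key
    timestamp: float  # seconds since epoch
    value: float      # observed metric value


def group_alerts(alerts: List[Alert], window_s: float = 300.0) -> List[str]:
    """Consolidate alerts sharing metric and project within the same time bucket."""
    buckets: Dict[Tuple[str, str, int], List[Alert]] = defaultdict(list)
    for alert in alerts:
        bucket = int(alert.timestamp // window_s)
        buckets[(alert.metric, alert.project, bucket)].append(alert)

    messages = []
    for (metric, project, _bucket), group in sorted(buckets.items()):
        peak = max(a.value for a in group)
        messages.append(f"{project}: {metric} breached {len(group)}x (peak {peak})")
    return messages


alerts = [
    Alert("latency_s", "checkout", 10.0, 2.4),
    Alert("latency_s", "checkout", 50.0, 3.1),
    Alert("hallucination_rate", "checkout", 60.0, 0.07),
]
for message in group_alerts(alerts):
    print(message)
```

Fixed buckets are simple but can split two alerts that straddle a bucket boundary; a sliding window avoids that at the cost of more bookkeeping.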

Alert rules are composable. Combine multiple conditions with AND/OR logic. Set different severity levels. Route critical alerts to PagerDuty while sending informational alerts to a Slack channel. The alerting system adapts to your operational workflow, not the other way around.
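
Composable rules of this kind can be modelled as small predicate functions combined with AND/OR helpers. The sketch below assumes metrics arrive as a dictionary of floats; the helper names (`above`, `below`, `all_of`, `any_of`), the `Rule` class, the severity labels, the route table, and the 0.80 baseline are all illustrative assumptions rather than the product's rule syntax.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Metrics = Dict[str, float]
Condition = Callable[[Metrics], bool]


def above(metric: str, threshold: float) -> Condition:
    """True when the metric is present and strictly above the threshold."""
    return lambda m: metric in m and m[metric] > threshold


def below(metric: str, threshold: float) -> Condition:
    """True when the metric is present and strictly below the threshold."""
    return lambda m: metric in m and m[metric] < threshold


def all_of(*conditions: Condition) -> Condition:
    """AND: every condition must hold."""
    return lambda m: all(c(m) for c in conditions)


def any_of(*conditions: Condition) -> Condition:
    """OR: at least one condition must hold."""
    return lambda m: any(c(m) for c in conditions)


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Condition
    severity: str  # "critical" or "info" in this sketch


# Hypothetical routing table: severity -> notification channel.
ROUTES = {"critical": "pagerduty", "info": "slack"}


def evaluate(rules: List[Rule], metrics: Metrics) -> List[Tuple[str, str]]:
    """Return (channel, rule name) for every rule whose condition fires."""
    return [(ROUTES[r.severity], r.name) for r in rules if r.condition(metrics)]


RULES = [
    Rule("slow-and-hallucinating",
         all_of(above("hallucination_rate", 0.05), above("latency_s", 2.0)),
         "critical"),
    Rule("score-below-baseline", below("eval_score", 0.80), "info"),
]

fired = evaluate(RULES, {"hallucination_rate": 0.08, "latency_s": 2.5, "eval_score": 0.90})
print(fired)
```

Because conditions are plain functions, nesting such as `any_of(all_of(...), below(...))` needs no extra machinery.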

Prompt SDK — Production-Grade Prompt Management

Prompts in production need more than version control. The upgraded Prompt SDK introduces three capabilities that teams have been requesting since launch.

Caching eliminates redundant prompt fetches. The SDK caches prompt versions locally with configurable TTL, reducing API calls and improving response times for high-throughput applications.

A/B testing is now built into the SDK itself. Define traffic splits between prompt versions and the SDK handles routing automatically, collecting performance data for each variant.

Multi-environment deployment lets you promote prompts through development, staging, and production environments with explicit gates between each stage.
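
To make the caching and traffic-split ideas concrete, here is a minimal sketch of both. It is not the Prompt SDK's API: `TTLPromptCache`, `pick_variant`, the `fetch` callback, and the version labels are hypothetical names, and hashing the user ID is just one common way to keep variant assignment stable across requests.

```python
import hashlib
import time
from typing import Callable, Dict, Tuple


class TTLPromptCache:
    """Cache prompt text per (name, version) and refetch after ttl_s seconds."""

    def __init__(self, fetch: Callable[[str, str], str], ttl_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # hypothetical remote fetch function
        self._ttl_s = ttl_s
        self._clock = clock      # injectable so expiry can be tested
        self._entries: Dict[Tuple[str, str], Tuple[float, str]] = {}

    def get(self, name: str, version: str) -> str:
        key = (name, version)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl_s:
            return hit[1]        # still fresh: no remote call
        text = self._fetch(name, version)
        self._entries[key] = (now, text)
        return text


def pick_variant(user_id: str, split: Dict[str, float]) -> str:
    """Deterministically map a user to a variant according to traffic weights."""
    if not split:
        raise ValueError("split must contain at least one variant")
    total = sum(split.values())
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash onto [0, total).
    point = int.from_bytes(digest[:8], "big") / 2**64 * total
    cumulative = 0.0
    for variant, weight in split.items():
        cumulative += weight
        if point < cumulative:
            return variant
    return variant  # guard against float rounding at the upper edge


cache = TTLPromptCache(lambda name, version: f"<{name}@{version}>", ttl_s=30.0)
variant = pick_variant("user-42", {"v1": 0.9, "v2": 0.1})
print(variant, cache.get("support-greeting", variant))
```

Hashing rather than random sampling means the same user always sees the same prompt version, which keeps per-variant performance data comparable.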

Workspaces RBAC

As organizations scale their AI operations, access control becomes critical. The new RBAC framework introduces four distinct role levels: Owner has full administrative control, Admin manages team members and configurations, Member can create and run evaluations and simulations, and Viewer has read-only access to dashboards and results.

Permissions are granular. Control who can modify evaluation criteria, who can trigger simulations, and who can access production traces. For regulated industries, this provides the audit trail and access governance that compliance requires.
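
The four roles can be pictured as an ordered hierarchy with a minimum role per action. The sketch below assumes a strictly ordered model, which is a simplification: the text above describes granular permissions, so the real system may not be a pure hierarchy. The action names, including the Owner-only `delete_workspace`, are hypothetical.

```python
from enum import IntEnum


class Role(IntEnum):
    VIEWER = 1  # read-only access to dashboards and results
    MEMBER = 2  # can create and run evaluations and simulations
    ADMIN = 3   # manages team members and configurations
    OWNER = 4   # full administrative control


# Minimum role required per action (action names are illustrative).
REQUIRED_ROLE = {
    "view_dashboard": Role.VIEWER,
    "run_evaluation": Role.MEMBER,
    "run_simulation": Role.MEMBER,
    "manage_members": Role.ADMIN,
    "edit_configuration": Role.ADMIN,
    "delete_workspace": Role.OWNER,
}


def is_allowed(role: Role, action: str) -> bool:
    """Allow an action only if the role meets its minimum; deny unknown actions."""
    required = REQUIRED_ROLE.get(action)
    if required is None:
        return False
    return role >= required


print(is_allowed(Role.MEMBER, "run_simulation"))
print(is_allowed(Role.VIEWER, "run_evaluation"))
```

Denying unknown actions by default keeps a newly added action from being accidentally open to every role.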

Additional Updates

Future AGI is now available on AWS Marketplace, enabling procurement through existing AWS agreements with consolidated billing. The error localizer is available as a standalone SDK function for both sync and async workflows, helping teams pinpoint exactly where in an agent’s execution chain a failure occurred. Datasets now surface critical issues automatically with specific mitigation advice. And traceAI has slimmed down its dependency tree by making framework-specific packages optional, so you only install what you actually use.
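
For readers unfamiliar with optional dependencies, a common general pattern is to import a framework-specific package only when it is needed and to fail with a clear message otherwise. This sketch shows that generic pattern using the standard library; it does not describe how traceAI implements it, and `load_optional` and the install hint are hypothetical.

```python
import importlib
import importlib.util
from types import ModuleType


def load_optional(module_name: str, install_hint: str) -> ModuleType:
    """Import a top-level module if it is installed, else raise a helpful ImportError."""
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"'{module_name}' is not installed; install it with: {install_hint}"
        )
    return importlib.import_module(module_name)


# Demonstrated with a standard-library module so the sketch runs anywhere.
json_module = load_optional("json", "n/a (standard library)")
print(json_module.dumps({"ok": True}))
```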