Home / Changelog / 2025 Week 48

Nov 24 – Nov 28, 2025 2025 W48

Scenario Branches and Custom Background Noises

Branching scenarios reveal how agents handle divergent conversations, and custom background noises push simulations closer to production reality.

Simulate Agents Platform Evaluate Monitor

10+ Background noise profiles

3 Scenario branch types

What's in this digest

Simulate Scenario generation with branch visibility New

Agents Enable Others option for agent definition New

Simulate Custom background noises for simulation New

Simulate Multi-branch scenario generation New

Platform JSON input/output in session view Improved

Evaluate Eval explanation summary for simulations Improved

Platform Prompt WebSocket streaming Improved

Evaluate Edit evaluation variable remapping Improved

Monitor Observe enhancements Improved

Simulate Simulated assistant call ending fixes Fixed

Multi-Branch Scenario Generation

Real conversations are not linear. A customer asking about billing might pivot to a feature question, circle back to pricing, then escalate to a complaint — all within a single session. Linear test scenarios miss these transitions entirely.

Multi-branch scenario generation changes how teams think about simulation coverage. When you generate a scenario, the system now produces a branching conversation tree rather than a single path. Each branch represents a plausible fork: the user changes their mind, introduces a new topic, asks a clarifying question, or expresses frustration.

Three branch types are supported out of the box. Intent forks model cases where the user’s goal shifts mid-conversation. Escalation branches simulate increasing urgency or emotional intensity. Digression branches test the agent’s ability to handle off-topic detours and return to the main thread.

The scenario viewer has been updated with branch visibility, letting you navigate the full tree structure, inspect individual paths, and run targeted simulations against specific branches. This is especially powerful for identifying failure modes that only emerge in multi-turn, non-linear conversations.

Custom Background Noises

Voice agents in production do not operate in silence. They handle calls from busy offices, moving cars, crowded restaurants, and noisy streets. Testing in pristine audio conditions produces results that do not transfer to the real world.

Custom background noises bring over 10 ambient noise profiles to the simulation engine. Coffee shop chatter, traffic noise, keyboard typing, wind, crowds, office HVAC hum — each profile has been calibrated to match common real-world conditions. Layer them onto any voice simulation to test how your agent performs when the audio signal is degraded.

The impact on evaluation scores is often revealing. Agents that perform flawlessly in clean conditions can struggle dramatically with ambient noise, particularly around entity extraction and intent classification.

Enable Others: Custom Provider Support

Agent definitions are no longer limited to Vapi and Retell. The new “Enable Others” option opens agent configuration to any voice or LLM provider. If your stack uses a custom-built provider, a niche vendor, or an internal model serving infrastructure, you can now define agents against it and run the full simulation and evaluation suite.

This is a foundational change for teams with heterogeneous infrastructure. Test the same agent definition against multiple providers, compare performance, and make data-driven decisions about your stack.

Eval Explanation Summaries

Evaluation scores are more useful when you understand why they landed where they did. Every simulation evaluation now includes a human-readable explanation summary. Instead of just seeing “Faithfulness: 0.72,” you see a breakdown of which claims were supported by context, which were not, and what the model’s reasoning was for each judgment.

These summaries make it practical for non-technical stakeholders to review evaluation results. Product managers and quality leads can read the explanations and understand agent quality without interpreting raw metric outputs.

Platform and Observe Updates

The backend migration from string to JSON fields for session data is complete. Session views now render structured JSON input and output natively, with collapsible trees and syntax highlighting. Prompt execution gets real-time streaming via WebSocket connections, eliminating the wait-for-completion pattern.

Observe receives a batch of usability improvements: filters now persist across page navigations, pagination handles large result sets more gracefully, metadata is displayed inline, and pricing logic has been updated to reflect current provider rates.

Older

Logs, Latency, and the Simulate Revamp

Newer

Fix My Agent -- AI-Powered Debugging

All changelog entries

Mastering AI Agent Evaluation

The Agentic RAG Playbook

Platform

Audience

LEARN

DEVELOPERS

Featured

Mastering AI Agent Evaluation

The Agentic RAG Playbook

Scenario Branches and Custom Background Noises

What's in this digest

Multi-Branch Scenario Generation

Custom Background Noises

Enable Others: Custom Provider Support

Eval Explanation Summaries

Platform and Observe Updates

Mastering AI Agent Evaluation

The Agentic RAG Playbook

Scenario Branches and Custom Background Noises

What's in this digest

Multi-Branch Scenario Generation

Custom Background Noises

Enable Others: Custom Provider Support

Eval Explanation Summaries

Platform and Observe Updates

FutureAGI AI Assistant