Scenario Builder and Session Tracking
Upload SOPs and transcripts to auto-generate test scenarios with edge cases, plus simplified session-level observability.
Automated Scenario and Workflow Builder
Writing test scenarios by hand is one of the biggest bottlenecks in voice agent testing. A typical enterprise voice agent handles dozens of conversation flows, each with its own edge cases, error paths, and compliance requirements. Manually authoring scenarios for all of these is a project in itself.
The Scenario Builder eliminates this bottleneck. Upload your Standard Operating Procedures, call transcripts, or product documentation, and the system automatically generates comprehensive test scenarios. It does not just extract the happy path. It identifies edge cases your team might miss: what happens when a caller interrupts mid-sentence, provides contradictory information, or asks a question outside the agent’s domain.
Each generated scenario includes branching conversation flows, the expected agent behavior at each decision point, and success criteria for evaluation. The result is scenario generation that is 10x faster than manual authoring, with broader coverage of edge cases and failure modes.
Generated scenarios are fully editable. Refine the auto-generated output, add custom edge cases, or merge multiple scenarios into complex multi-turn workflows. The Scenario Builder handles the heavy lifting while you retain full control over what gets tested.
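To make the shape of a generated scenario concrete, here is a minimal sketch of how branching flows, expected behaviors, and success criteria could be modeled. The class and field names (Scenario, Branch, count_paths) are hypothetical illustrations and not the Scenario Builder's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Branch:
    """One decision point: what the caller does and what the agent should do."""
    caller_action: str
    expected_agent_behavior: str
    follow_ups: list[Branch] = field(default_factory=list)


@dataclass
class Scenario:
    """Hypothetical container for one generated test scenario."""
    name: str
    branches: list[Branch]
    success_criteria: list[str]

    def count_paths(self) -> int:
        """Count distinct root-to-leaf conversation paths."""
        def leaves(branch: Branch) -> int:
            if not branch.follow_ups:
                return 1
            return sum(leaves(child) for child in branch.follow_ups)

        return sum(leaves(branch) for branch in self.branches)


refund = Scenario(
    name="refund_request",
    branches=[
        Branch(
            "asks for a refund",
            "verify the order before promising anything",
            follow_ups=[
                Branch("gives a valid order number", "confirm refund eligibility"),
                Branch("gives contradictory order details", "ask a clarifying question"),
            ],
        ),
        Branch("interrupts mid-sentence", "stop speaking and address the interruption"),
    ],
    success_criteria=["agent never confirms a refund before verification"],
)

# Generated scenarios are editable: add a custom edge case by hand.
refund.branches.append(
    Branch("asks an out-of-domain question", "decline politely and redirect")
)
```

Counting root-to-leaf paths is one rough way to see how much of the conversation tree a scenario covers before and after manual edits.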
Agent Definition Versioning
Agent configurations evolve continuously. Prompts change, tools get added, guardrails are tuned. Without version control, it is impossible to know which configuration produced which results.
Agent definition versioning brings commit-style version control to your agent configurations. Every change gets a commit message describing what changed and why. Consolidated test reports show how each version performed across your evaluation suite. Roll back to any previous version with a single click if a change degrades performance.
This is particularly valuable for teams with multiple people iterating on the same agent. The version history serves as a changelog for your agent’s behavior, making it clear who changed what and what impact it had.
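The commit-and-rollback model described above can be sketched in a few lines. This is an illustration under stated assumptions, not the platform's implementation: AgentHistory, AgentVersion, and their methods are hypothetical names, and rollback is modeled as a new commit so the history stays append-only.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class AgentVersion:
    """One immutable entry in the version history (hypothetical structure)."""
    number: int
    author: str
    message: str
    config: dict[str, Any]


@dataclass
class AgentHistory:
    versions: list[AgentVersion] = field(default_factory=list)

    def commit(self, config: dict[str, Any], author: str, message: str) -> AgentVersion:
        """Store a snapshot of the config with a commit message."""
        version = AgentVersion(
            number=len(self.versions) + 1,
            author=author,
            message=message,
            config=copy.deepcopy(config),  # snapshot, so later edits do not leak in
        )
        self.versions.append(version)
        return version

    def current(self) -> AgentVersion:
        return self.versions[-1]

    def rollback(self, number: int, author: str) -> AgentVersion:
        """Restore an earlier config by committing it again as the newest version."""
        if not 1 <= number <= len(self.versions):
            raise ValueError(f"unknown version: {number}")
        target = self.versions[number - 1]
        return self.commit(target.config, author, f"Roll back to v{number}")


history = AgentHistory()
history.commit({"prompt": "Be concise.", "tools": []}, "ana", "Initial prompt")
history.commit(
    {"prompt": "Be concise and polite.", "tools": ["lookup_order"]},
    "ben",
    "Add order lookup tool",
)
history.rollback(1, "ana")
```

Recording the rollback as its own entry keeps the changelog honest: the history still shows who made the degrading change and who reverted it.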
Simplified Session Tracking
Tracing individual requests is valuable. Understanding complete user sessions is transformative. The new session tracking feature makes this simple: add a single session.id attribute to your spans and Future AGI automatically groups all related traces into a coherent session view.
No complex session management code. No custom middleware. One attribute on your spans and you get session-level dashboards showing how users navigate through multi-turn conversations, where they drop off, and which conversation patterns lead to successful outcomes.
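The grouping that a session.id attribute enables can be illustrated with a small standalone sketch. The Span class and group_traces_by_session function below are hypothetical stand-ins, not Future AGI's ingestion code. In an OpenTelemetry-style SDK the attribute would typically be set with span.set_attribute("session.id", session_id); whether your instrumentation exposes that exact call is an assumption to check against your setup.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Span:
    """Simplified stand-in for a tracing span (hypothetical)."""
    trace_id: str
    name: str
    attributes: dict[str, str] = field(default_factory=dict)


def group_traces_by_session(spans: list[Span]) -> dict[str, list[str]]:
    """Map each session.id to the ordered, de-duplicated trace IDs that carry it."""
    sessions: dict[str, list[str]] = defaultdict(list)
    for span in spans:
        session_id = span.attributes.get("session.id")
        if session_id is None:
            continue  # spans without the attribute are not part of any session
        if span.trace_id not in sessions[session_id]:
            sessions[session_id].append(span.trace_id)
    return dict(sessions)


spans = [
    Span("t1", "greeting", {"session.id": "call-42"}),
    Span("t1", "llm_call", {"session.id": "call-42"}),
    Span("t2", "order_lookup", {"session.id": "call-42"}),
    Span("t3", "greeting", {"session.id": "call-43"}),
    Span("t4", "healthcheck"),  # no session.id, so it stays ungrouped
]

sessions = group_traces_by_session(spans)
```

Here traces t1 and t2 end up in the same session because their spans share one session.id value, which is the only thing the instrumented code had to supply.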
Voice Simulation Enhancements
The multi-channel audio player separates agent and caller audio streams during playback. Listen to just the agent’s responses to evaluate tone and accuracy, or focus on the caller’s input to understand how the agent handles different speaking patterns. Call recordings are now downloadable in three audio formats, supporting offline analysis, compliance archiving, and cross-team sharing.
Platform and Performance
Behind the scenes, this release includes significant infrastructure work. A dedicated Celery worker pool for trace ingestion separates trace processing from other background jobs, eliminating resource contention during high-volume ingestion periods. The trace ingestion pipeline itself has been optimized end-to-end, reducing the time between when a trace is emitted and when it appears in the dashboard.
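For teams running a similar stack, the general technique of isolating one workload on its own Celery workers looks roughly like the configuration sketch below. It uses Celery's standard task_routes and task_default_queue settings; the module name, task name, queue name, and project name are hypothetical, and this is not the platform's actual configuration.

```python
# celeryconfig.py: hypothetical settings module, loaded with
# app.config_from_object("celeryconfig")

# Tasks without an explicit route fall back to this queue.
task_default_queue = "default"

# Send trace ingestion tasks (hypothetical task name) to their own queue
# so they never wait behind unrelated background jobs.
task_routes = {
    "traces.tasks.ingest_trace": {"queue": "trace_ingestion"},
}

# Workers are then started per queue, for example:
#   celery -A proj worker -Q trace_ingestion --concurrency=8
#   celery -A proj worker -Q default
# "proj" and the concurrency value are placeholders.
```

Because each worker only consumes from the queue it was started with, a burst of trace ingestion fills the ingestion workers without starving other jobs, and the two pools can be scaled independently.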
Prompt collaboration features enable team-based prompt development with real-time collaborative editing and commenting. Multiple team members can work on the same prompt simultaneously, leave feedback, and track changes through the revision history.
Evaluation group management now supports full CRUD operations, allowing CI/CD pipelines to create, update, and delete evaluation groups programmatically as part of automated testing workflows.