Sep 15 – Sep 19, 2025 (2025 W38)

Scenario Builder and Session Tracking

Upload SOPs and transcripts to auto-generate test scenarios with edge cases, plus simplified session-level observability.

Simulate Agents Monitor Evaluate Platform

10x Faster scenario generation

3 Audio download formats

What's in this digest

Simulate: Automated scenario and workflow builder (New)

Agents: Agent definition versioning (New)

Monitor: Simplified session tracking (Improved)

Evaluate: Advanced evaluation group management (Improved)

Simulate: Enhanced call management with multi-channel audio player (Improved)

Simulate: Flexible call recording downloads (Improved)

Platform: Prompt collaboration features (Improved)

Platform: Annotation and prompt import fixes in datasets (Fixed)

Platform: Dedicated Celery worker pool for trace ingestion (Improved)

Platform: Optimized trace ingestion pipeline (Improved)

Automated Scenario and Workflow Builder

Writing test scenarios by hand is one of the biggest bottlenecks in voice agent testing. A typical enterprise voice agent handles dozens of conversation flows, each with its own edge cases, error paths, and compliance requirements. Manually authoring scenarios for all of these is a project in itself.

The Scenario Builder eliminates this bottleneck. Upload your Standard Operating Procedures, call transcripts, or product documentation, and the system automatically generates comprehensive test scenarios. It does not just extract the happy path. It identifies edge cases your team might miss: what happens when a caller interrupts mid-sentence, provides contradictory information, or asks a question outside the agent’s domain.

Each generated scenario includes branching conversation flows, the expected agent behavior at each decision point, and success criteria for evaluation. The result is scenario generation that is 10x faster than manual authoring, with better coverage of edge cases and failure modes.
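To make the shape of a generated scenario more concrete, here is a minimal sketch of how branching flows, expected behaviors, and success criteria could be modeled. The `Step` and `Scenario` classes, their field names, and the refund example are all hypothetical illustrations, not the Scenario Builder's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One decision point in a scenario (hypothetical structure)."""
    caller_says: str
    expected_agent_behavior: str
    branches: list["Step"] = field(default_factory=list)


@dataclass
class Scenario:
    """A named conversation tree plus the criteria used to judge it."""
    name: str
    root: Step
    success_criteria: list[str]


def enumerate_paths(step: Step, prefix: tuple[str, ...] = ()) -> list[tuple[str, ...]]:
    """Return every root-to-leaf path as a tuple of caller utterances."""
    path = prefix + (step.caller_says,)
    if not step.branches:
        return [path]
    paths = []
    for branch in step.branches:
        paths.extend(enumerate_paths(branch, path))
    return paths


# Illustrative data only: a happy path and an out-of-domain edge case.
refund = Scenario(
    name="refund request",
    root=Step(
        caller_says="I want a refund",
        expected_agent_behavior="Ask for the order number",
        branches=[
            Step("It's order 1234", "Confirm the order and start the refund"),
            Step("Actually, what's the weather today?",
                 "Politely decline the out-of-domain question"),
        ],
    ),
    success_criteria=["Agent verifies the order before promising a refund"],
)

paths = enumerate_paths(refund.root)
```

Enumerating root-to-leaf paths like this is one way to see how many distinct conversations a single branching scenario covers.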

Generated scenarios are fully editable. Refine the auto-generated output, add custom edge cases, or merge multiple scenarios into complex multi-turn workflows. The Scenario Builder handles the heavy lifting while you retain full control over what gets tested.

Agent Definition Versioning

Agent configurations evolve continuously. Prompts change, tools get added, guardrails are tuned. Without version control, it is impossible to know which configuration produced which results.

Agent definition versioning brings commit-style version control to your agent configurations. Every change gets a commit message describing what changed and why. Consolidated test reports show how each version performed across your evaluation suite. Roll back to any previous version with a single click if a change degrades performance.

This is particularly valuable for teams with multiple people iterating on the same agent. The version history serves as a changelog for your agent’s behavior, making it clear who changed what and what impact it had.
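The commit-and-rollback idea can be sketched in a few lines. This is only an illustration under stated assumptions: the `AgentVersion` and `AgentHistory` names are hypothetical, and recording a rollback as a new version (rather than discarding later versions) is a design choice of this sketch, not a description of how the platform stores history.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentVersion:
    """An immutable snapshot of an agent configuration (hypothetical)."""
    number: int
    message: str
    author: str
    config: dict


@dataclass
class AgentHistory:
    versions: list[AgentVersion] = field(default_factory=list)

    def commit(self, config: dict, message: str, author: str) -> AgentVersion:
        """Store a copy of the config with a message saying what changed and why."""
        version = AgentVersion(len(self.versions) + 1, message, author, dict(config))
        self.versions.append(version)
        return version

    def rollback(self, number: int, author: str) -> AgentVersion:
        """Re-commit an earlier config so the history itself is never rewritten."""
        if not 1 <= number <= len(self.versions):
            raise ValueError(f"unknown version: {number}")
        target = self.versions[number - 1]
        return self.commit(target.config, f"Roll back to v{number}", author)

    @property
    def current(self) -> AgentVersion:
        return self.versions[-1]


history = AgentHistory()
history.commit({"prompt": "You are a billing assistant.", "tools": []},
               "Initial prompt", "ana")
history.commit({"prompt": "You are a billing assistant.", "tools": ["lookup_invoice"]},
               "Add invoice lookup tool", "ben")
history.rollback(1, "ana")
```

Because every entry carries an author and a message, the list of versions doubles as the changelog the paragraph above describes.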

Simplified Session Tracking

Tracing individual requests is valuable. Understanding complete user sessions is transformative. The new session tracking feature makes this simple: add a single session.id attribute to your spans and Future AGI automatically groups all related traces into a coherent session view.

No complex session management code. No custom middleware. One attribute on your spans and you get session-level dashboards showing how users navigate through multi-turn conversations, where they drop off, and which conversation patterns lead to successful outcomes.
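As a rough illustration of what the grouping amounts to, the sketch below collects trace IDs by the `session.id` attribute found on their spans. The span dictionaries, trace IDs, and `group_by_session` helper are hypothetical stand-ins, not the platform's data model. If your instrumentation is OpenTelemetry-based, the attribute would typically be set with `span.set_attribute("session.id", session_id)`.

```python
from collections import defaultdict

# Hypothetical spans as plain dicts; real spans come from your tracing SDK.
spans = [
    {"trace_id": "t1", "name": "greeting", "attributes": {"session.id": "sess-42"}},
    {"trace_id": "t2", "name": "lookup_order", "attributes": {"session.id": "sess-42"}},
    {"trace_id": "t3", "name": "greeting", "attributes": {"session.id": "sess-77"}},
    {"trace_id": "t4", "name": "healthcheck", "attributes": {}},
]


def group_by_session(spans):
    """Map each session.id to the ordered list of trace IDs that carry it."""
    sessions = defaultdict(list)
    for span in spans:
        session_id = span["attributes"].get("session.id")
        if session_id is None:
            continue  # spans without the attribute belong to no session
        if span["trace_id"] not in sessions[session_id]:
            sessions[session_id].append(span["trace_id"])
    return dict(sessions)


sessions = group_by_session(spans)
```

In this sketch, traces `t1` and `t2` end up in one session, `t3` in another, and the untagged health check is left out.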

Voice Simulation Enhancements

The multi-channel audio player separates agent and caller audio streams during playback. Listen to just the agent’s responses to evaluate tone and accuracy, or focus on the caller’s input to understand how the agent handles different speaking patterns. Call recordings are now downloadable in three audio formats, supporting offline analysis, compliance archiving, and cross-team sharing.
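For offline analysis of a downloaded recording, the same separation can be approximated locally. The sketch below assumes a 16-bit stereo WAV file with one speaker per channel. Both the file format and the channel assignment are assumptions for illustration, since the release notes do not specify them. It uses only the Python standard library.

```python
import io
import struct
import wave


def split_stereo(stereo_wav: bytes) -> tuple[bytes, bytes]:
    """Split a 16-bit stereo WAV into two mono WAVs: (left, right)."""
    with wave.open(io.BytesIO(stereo_wav), "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo audio")
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    # Samples are interleaved L, R, L, R, ... as little-endian int16.
    samples = struct.unpack(f"<{len(frames) // 2}h", frames)
    channels = (samples[0::2], samples[1::2])

    outputs = []
    for channel in channels:
        buffer = io.BytesIO()
        with wave.open(buffer, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(2)
            dst.setframerate(rate)
            dst.writeframes(struct.pack(f"<{len(channel)}h", *channel))
        outputs.append(buffer.getvalue())
    return outputs[0], outputs[1]
```

Which channel holds the agent and which holds the caller depends on how the recording was produced, so check a short sample before relying on the split.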

Platform and Performance

Behind the scenes, this release includes significant infrastructure work. A dedicated Celery worker pool for trace ingestion separates trace processing from other background jobs, eliminating resource contention during high-volume ingestion periods. The trace ingestion pipeline itself has been optimized end-to-end, reducing the time between when a trace is emitted and when it appears in the dashboard.
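For readers running their own Celery deployments, this kind of isolation is usually expressed through task routing plus a worker bound to a single queue. The fragment below is a generic sketch, not Future AGI's configuration: the module name, task name pattern, and queue names are hypothetical, while `task_routes` and `task_default_queue` are standard Celery settings.

```python
# celeryconfig.py (hypothetical module; load it with app.config_from_object)

# Send every ingestion task to its own queue. Celery matches glob patterns
# in task_routes against the task name.
task_routes = {
    "traces.ingest.*": {"queue": "trace_ingestion"},
}

# Everything else keeps using the shared default queue.
task_default_queue = "default"

# A dedicated pool is then a worker that consumes only the ingestion queue:
#   celery -A proj worker -Q trace_ingestion --concurrency=8
# while the general-purpose workers consume only the default queue:
#   celery -A proj worker -Q default
```

With this split, a burst of ingestion tasks fills only the `trace_ingestion` queue, so other background jobs are not stuck waiting behind it.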

Prompt collaboration features enable team-based prompt development with real-time collaborative editing and commenting. Multiple team members can work on the same prompt simultaneously, leave feedback, and track changes through the revision history.

Evaluation group management now supports full CRUD operations, allowing CI/CD pipelines to create, update, and delete evaluation groups programmatically as part of automated testing workflows.
