
Jul 21 – Jul 25, 2025 (2025 W30)

Voice Simulation is Here

Test your voice agents with real AI-conducted phone calls powered by ultra-low-latency LiveKit infrastructure.

Areas: Simulate, Platform, SDK, Evaluate

60-70% cost reduction vs. manual QA

Sub-second voice latency

What's in this digest

Simulate: Call Simulation (New)
Simulate: LiveKit-based ultra-low-latency voice testing (New)
Simulate: Simulator agent form and agent definition dropdowns (Improved)
Simulate: Add scenarios from datasets (Improved)
Platform: Refresh token cycle for session management (Fixed)
Platform: Mixpanel analytics integration (Improved)
SDK: traceAI TypeScript Vercel instrumentor (Improved)
Evaluate: CRUD on custom evaluations (Improved)
Platform: Span name display in traces (Fixed)
Evaluate: Add feedback to evals (Improved)

Call Simulation — A New Category of Testing

Manual QA for voice agents is expensive, slow, and inconsistent. You hire testers, write scripts, schedule calls, and hope the coverage is broad enough. It never is. Starting today, Future AGI can conduct those calls for you.

Call Simulation introduces AI-powered agents that place real phone calls to your voice agents. These simulator agents follow conversation scenarios, probe edge cases, and evaluate responses in real time. They handle interruptions, long pauses, accent variations, and the kind of conversational chaos that real users bring to every interaction.

The result is a 60-70% reduction in QA costs along with broader coverage. Where a human tester might run through 20 scenarios in a day, Call Simulation can handle hundreds in the same timeframe, and every run is reproducible.

LiveKit-Powered Infrastructure

Voice testing demands performance that web-based testing never needed. A 500-millisecond delay in a text response is invisible. A 500-millisecond delay in a phone call is a dealbreaker.

We built Call Simulation on LiveKit infrastructure specifically for this reason. Every simulation call operates at sub-second latency, ensuring the conversational dynamics mirror what your real customers experience. Turn-taking, interruptions, and natural speech patterns all behave correctly because the infrastructure treats latency as a first-class concern.
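To make the latency point concrete, here is a minimal sketch of how a per-turn response gap could be measured and checked against the 500-millisecond figure mentioned above. The `TurnTiming` record, its field names, and both helper functions are hypothetical illustrations; they are not a LiveKit or Future AGI data format.

```typescript
// Hypothetical timing record for one conversational turn (not a LiveKit type).
interface TurnTiming {
  callerStoppedMs: number; // when the simulated caller finished speaking
  agentStartedMs: number; // when the agent under test began replying
}

// Response gap per turn: time from end of caller speech to start of agent speech.
function turnLatencies(turns: TurnTiming[]): number[] {
  return turns.map((t) => t.agentStartedMs - t.callerStoppedMs);
}

// Indices of turns whose response gap exceeded the given budget.
function slowTurns(turns: TurnTiming[], budgetMs: number): number[] {
  return turnLatencies(turns)
    .map((latency, i) => ({ latency, i }))
    .filter(({ latency }) => latency > budgetMs)
    .map(({ i }) => i);
}

// Example call with three turns (timestamps in milliseconds).
const call: TurnTiming[] = [
  { callerStoppedMs: 1000, agentStartedMs: 1350 },
  { callerStoppedMs: 4200, agentStartedMs: 4950 },
  { callerStoppedMs: 8000, agentStartedMs: 8400 },
];

console.log(turnLatencies(call)); // [ 350, 750, 400 ]
console.log(slowTurns(call, 500)); // [ 1 ]
```

Only the second turn crosses the 500 ms budget here; in a real run the timestamps would come from whatever event stream the voice infrastructure exposes.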

Scenario Management

Building test scenarios from scratch is tedious. The new “Add scenarios from datasets” feature lets you import conversation patterns directly from your existing datasets. Have a collection of real customer transcripts? Turn them into simulation scenarios with a few clicks. Each scenario becomes a repeatable test case that your simulator agents execute faithfully.
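As a rough illustration of what turning transcripts into scenarios involves, the sketch below splits each transcript into caller lines (for the simulator to say) and agent lines (kept as reference answers). The `TranscriptRow` and `Scenario` shapes and the `datasetToScenarios` helper are assumptions made for this example; the changelog does not document Future AGI's actual dataset or scenario schema.

```typescript
// Hypothetical shapes, for illustration only.
interface TranscriptTurn {
  speaker: "customer" | "agent";
  text: string;
}

interface TranscriptRow {
  id: string;
  turns: TranscriptTurn[];
}

interface Scenario {
  name: string;
  callerLines: string[]; // utterances the simulator agent replays, in order
  expectedAgentLines: string[]; // original agent replies, kept as references
}

function rowToScenario(row: TranscriptRow): Scenario {
  const callerLines: string[] = [];
  const expectedAgentLines: string[] = [];
  for (const turn of row.turns) {
    if (turn.speaker === "customer") callerLines.push(turn.text);
    else expectedAgentLines.push(turn.text);
  }
  return { name: `scenario-${row.id}`, callerLines, expectedAgentLines };
}

// Rows with no customer speech give the simulator nothing to say, so skip them.
function datasetToScenarios(rows: TranscriptRow[]): Scenario[] {
  return rows.map(rowToScenario).filter((s) => s.callerLines.length > 0);
}

const dataset: TranscriptRow[] = [
  {
    id: "42",
    turns: [
      { speaker: "customer", text: "Hi, I want to reschedule my appointment." },
      { speaker: "agent", text: "Sure, what day works for you?" },
      { speaker: "customer", text: "Thursday afternoon." },
    ],
  },
  { id: "43", turns: [{ speaker: "agent", text: "Hello?" }] },
];

const scenarios = datasetToScenarios(dataset);
console.log(scenarios.length); // 1
```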

The simulator agent form and agent definition dropdowns make configuration straightforward. Select your target agent, define the simulator persona, choose your scenarios, and launch. No YAML files, no deployment scripts.

Platform and SDK Updates

This release also brings improvements across the broader platform. Mixpanel analytics integration is now live, giving teams visibility into how their organization uses Future AGI. Every feature interaction, evaluation run, and simulation session is tracked to help you understand adoption and identify workflow bottlenecks.

For TypeScript teams deploying on Vercel, the new traceAI Vercel instrumentor brings automatic observability to serverless AI functions. Import the instrumentor, wrap your handler, and every LLM call, tool invocation, and response is captured as a trace span without manual instrumentation.
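The "wrap your handler" pattern can be illustrated with a small, self-contained sketch. This is not the traceAI API: `traced`, `Span`, and `recordedSpans` are hypothetical names, and the sketch only shows the general idea an instrumentor automates, where each wrapped LLM call, tool call, and the handler itself records a span with timing and status.

```typescript
// Illustrative only; NOT the traceAI instrumentor API.
interface Span {
  name: string;
  startMs: number;
  endMs: number;
  status: "ok" | "error";
}

const recordedSpans: Span[] = [];

// Wraps an async function so every call records one span, even on failure.
function traced<A extends unknown[], R>(
  name: string,
  fn: (...args: A) => Promise<R>,
): (...args: A) => Promise<R> {
  return async (...args: A): Promise<R> => {
    const startMs = Date.now();
    try {
      const result = await fn(...args);
      recordedSpans.push({ name, startMs, endMs: Date.now(), status: "ok" });
      return result;
    } catch (err) {
      recordedSpans.push({ name, startMs, endMs: Date.now(), status: "error" });
      throw err;
    }
  };
}

// Hypothetical stand-ins for an LLM call and a tool invocation.
const callModel = traced("llm.generate", async (prompt: string) => `echo: ${prompt}`);
const lookupOrder = traced("tool.lookupOrder", async (id: string) => ({ id, status: "shipped" }));

// A serverless-style handler; its inner calls each produce their own span.
const handler = traced("handler", async (question: string) => {
  const order = await lookupOrder("A-1");
  return callModel(`${question} (order ${order.id} is ${order.status})`);
});
```

Calling `handler("Where is my order?")` once records three spans in completion order: the tool call, the model call, then the enclosing handler. A real instrumentor would also link child spans to their parent and export them to a tracing backend, which this sketch leaves out.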

Evaluation Improvements

Custom evaluations now support full CRUD operations. Create evaluations tailored to your specific quality criteria, iterate on scoring rubrics, and manage your evaluation library as it grows. Combined with the new feedback feature, which lets you attach human judgments directly to evaluation results, teams can build a continuous improvement loop in which human expertise refines automated evaluation over time.
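A minimal in-memory sketch can show how CRUD on evaluations and attached human feedback fit together. `EvalStore`, `CustomEval`, `Feedback`, and the `agreementRate` metric are hypothetical and do not reflect Future AGI's API; the point is that once feedback is linked to an evaluation, you can measure how often humans agree with it and tune the rubric or threshold accordingly.

```typescript
// Hypothetical in-memory model; not the Future AGI API.
interface CustomEval {
  id: string;
  name: string;
  rubric: string;
  passThreshold: number; // score in [0, 1] required to pass
}

interface Feedback {
  evalId: string;
  resultId: string;
  humanVerdict: "agree" | "disagree";
  note?: string;
}

class EvalStore {
  private evals = new Map<string, CustomEval>();
  private feedback: Feedback[] = [];
  private nextId = 1;

  create(data: Omit<CustomEval, "id">): CustomEval {
    const ev: CustomEval = { id: `eval-${this.nextId++}`, ...data };
    this.evals.set(ev.id, ev);
    return ev;
  }

  read(id: string): CustomEval | undefined {
    return this.evals.get(id);
  }

  update(id: string, patch: Partial<Omit<CustomEval, "id">>): CustomEval {
    const current = this.evals.get(id);
    if (!current) throw new Error(`unknown evaluation ${id}`);
    const updated: CustomEval = { ...current, ...patch };
    this.evals.set(id, updated);
    return updated;
  }

  delete(id: string): boolean {
    return this.evals.delete(id);
  }

  // Links a human judgment to one result of an existing evaluation.
  addFeedback(fb: Feedback): void {
    if (!this.evals.has(fb.evalId)) throw new Error(`unknown evaluation ${fb.evalId}`);
    this.feedback.push(fb);
  }

  // Share of feedback items where humans agreed with the automated verdict.
  agreementRate(evalId: string): number | undefined {
    const items = this.feedback.filter((f) => f.evalId === evalId);
    if (items.length === 0) return undefined;
    return items.filter((f) => f.humanVerdict === "agree").length / items.length;
  }
}

const store = new EvalStore();
const tone = store.create({ name: "polite-tone", rubric: "Agent stays courteous", passThreshold: 0.8 });
store.update(tone.id, { passThreshold: 0.7 });
store.addFeedback({ evalId: tone.id, resultId: "r1", humanVerdict: "agree" });
store.addFeedback({ evalId: tone.id, resultId: "r2", humanVerdict: "disagree", note: "Too strict on greetings" });
console.log(store.agreementRate(tone.id)); // 0.5
```

A low agreement rate is a signal to revisit the rubric or threshold, which is the loop the paragraph above describes.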

Span names now display directly in trace views, a small change that makes a real difference when navigating traces with dozens of nested operations. No more clicking into each span to figure out what it represents.

