2026 Week 16

Voice AI Production-to-Simulation, Annotation Queue Assignment, and API Docs Improvements

Turn any live voice call into a simulation test case, manually assign annotation queue items, and navigate API docs with full context on a single page.


Voice AI — Production to Simulation in One Click

Every voice agent team hits the same wall: the most valuable test cases are the ones that already happened in production, but reproducing them by hand is slow, lossy, and usually skipped.

Voice AI Production-to-Simulation closes that gap. Pick any real call from Observe (the live traffic view) and turn it into a repeatable test case you can rerun against any prompt version.

What’s new

  • One-click conversion. Select a call in Observe and the platform extracts its scenario — the conversation flow, the caller’s intent, the edge cases that surfaced — into a simulation test case.
  • Rerun against any prompt version. Compare outputs side by side and catch regressions before they reach the next real caller.
  • No manual reproduction. The test case carries the real inputs, real turn-taking, and real edge cases with it — nothing has to be guessed or re-authored.
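As an illustration only — the class, field names, and call layout below are invented for this sketch, not the platform's actual schema — the extraction described above amounts to carrying a call's real intent and turn-taking into a reusable case instead of re-authoring it by hand:

```python
from dataclasses import dataclass

@dataclass
class SimulationCase:
    """Hypothetical shape of a test case extracted from a production call."""
    call_id: str
    caller_intent: str
    turns: list[str]  # real utterances, preserved verbatim in order

def case_from_call(call: dict) -> SimulationCase:
    # Nothing is guessed: the real inputs and turn-taking ride along.
    return SimulationCase(
        call_id=call["id"],
        caller_intent=call["intent"],
        turns=[t["text"] for t in call["transcript"]],
    )

case = case_from_call({
    "id": "call_123",
    "intent": "reschedule appointment",
    "transcript": [
        {"text": "Hi, I need to move my appointment."},
        {"text": "Sure — what day works for you?"},
    ],
})
print(case.caller_intent)  # reschedule appointment
```

Once a call is captured in a form like this, rerunning it against a new prompt version is just replaying the same `turns` and diffing the agent's replies.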

Why it matters

Your test coverage grows with your traffic. Every interesting call, every unexpected edge case, every failure worth investigating becomes a repeatable scenario you can run on demand. The feedback loop from “something went wrong in production” to “here is a test that catches it” collapses from days to seconds.

Who it’s for

Voice agent teams shipping into live traffic — especially those whose hardest test cases live in production but who have never had a clean way to bring them back into the test suite.

Read the docs ->

API Reference — Everything You Need on One Page

The previous API reference split every endpoint across separate views — overview, parameters, and response in different tabs — so writing integration code meant constantly clicking between them. The redesign puts the whole endpoint on a single page.

What’s new

  • Curl example and expected response at the top of every endpoint page, ready to copy-paste.
  • Full parameter details render inline right below — types, descriptions, enum values, and auth requirements, all visible at once.
  • No more context switching. Everything an endpoint needs — from the request signature to the response schema — lives in a single scroll.
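To make the copy-paste flow concrete — the endpoint URL, bearer key, and payload below are placeholders, not real API surface — a curl example at the top of a docs page maps one-to-one onto a request you can assemble in code:

```python
import json
import urllib.request

# Mirrors a hypothetical copy-paste curl from the top of an endpoint page:
#   curl -X POST https://api.example.com/v1/calls \
#        -H "Authorization: Bearer <KEY>" -d '{"agent_id": "agent_42"}'
req = urllib.request.Request(
    url="https://api.example.com/v1/calls",
    data=json.dumps({"agent_id": "agent_42"}).encode(),
    headers={"Authorization": "Bearer <KEY>",
             "Content-Type": "application/json"},
    method="POST",
)
print(req.get_method(), req.full_url)  # POST https://api.example.com/v1/calls
```

With the parameters, types, and auth requirements rendered on the same page, translating the curl into your own client is a straight transcription rather than a tab-hopping exercise.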

Why it matters

The time from “I want to call this endpoint” to “it’s working in my code” was mostly navigation, not typing. With the whole endpoint on one page, you can read the signature, copy the curl, run it, and keep moving — no bouncing between tabs to stitch the call together in your head.

Annotation Queue — Faster Review, Stronger Agreement

The annotation queue (where humans review and label agent outputs) gets a significant upgrade:

  • Multi-assignment. Route the same item to multiple reviewers for consensus-based review — agreement between reviewers produces the final label.
  • Prefetching. The next several items load in the background while a reviewer works on the current one — no more loading delay breaking concentration between items.
  • Manual assignment. Send a specific item to a specific annotator — useful for specialist review, escalations, or when one reviewer has the right context to sign off faster.
  • Optional reviewer approval. Turn it on for high-stakes workflows; leave it off when speed is the priority and annotators can self-certify.
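The consensus rule behind multi-assignment can be sketched as a majority vote — a minimal illustration, assuming a simple agreement threshold rather than the platform's actual aggregation logic:

```python
from collections import Counter

def consensus_label(reviews: dict[str, str], min_agreement: int = 2):
    """Return the final label once enough reviewers agree, else None.

    A None result would be the signal to escalate — for example via
    manual assignment to a specialist reviewer.
    """
    label, votes = Counter(reviews.values()).most_common(1)[0]
    return label if votes >= min_agreement else None

print(consensus_label({"alice": "pass", "bob": "pass", "cara": "fail"}))  # pass
print(consensus_label({"alice": "pass", "bob": "fail"}))                  # None
```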

Net effect: faster reviews when workflows can self-certify, stronger agreement when they can’t, and no more manager-as-annotator bottleneck.

Read the docs ->

Voice Metrics in Call Lists

Voice-specific metrics — latency, interruptions, and turn duration — are now columns in the call list view for both live and simulation calls. Scan the whole list and spot outliers without opening individual calls.
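Spotting outliers from the list view is the kind of scan you might otherwise script by hand — a rough sketch with invented call data and a naive threshold (twice the mean latency), not a prescription for how the column highlighting works:

```python
# Flag calls whose latency stands out from the rest of the list.
calls = [
    {"id": "c1", "latency_ms": 310, "interruptions": 0},
    {"id": "c2", "latency_ms": 290, "interruptions": 1},
    {"id": "c3", "latency_ms": 1450, "interruptions": 4},
]
mean_latency = sum(c["latency_ms"] for c in calls) / len(calls)
outliers = [c["id"] for c in calls if c["latency_ms"] > 2 * mean_latency]
print(outliers)  # ['c3']
```

Having latency, interruptions, and turn duration as sortable columns makes this a glance at the list instead of a script — and it works the same for live and simulation calls.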