Test with real calls,
not just transcripts

Run hundreds of simulated phone calls and chat conversations against your voice or chat agent. Test accents, interruptions, SOPs, and edge cases - with per-call audio recordings, transcripts, and evaluation scores.

Test Run: Insurance Claim Flow · Running 38 / 50 calls
Pass Rate 87% · Avg CSAT 4.1 · Avg Latency 1.2s · Avg Duration 2:18 · Interrupts 3.2/call

| # | Persona | Traits | Scenario | Status | CSAT | Latency | Duration | Compliance |
|---|---------|--------|----------|--------|------|---------|----------|------------|
| 1 | Elderly Caller | Slow · Soft-spoken | File new claim | Pass | 4.5 | 0.9s | 3:12 | 100% |
| 2 | Angry Customer | Fast · Heavy accent · Interrupts | Dispute denial | Fail | 2.1 | 2.8s | 1:47 | 60% |
| 3 | Confused First-timer | Slow · Asks many questions | Check status | Pass | 4.2 | 1.1s | 2:58 | 100% |
| 4 | Price-sensitive | Normal · Negotiates | Premium quote | Pass | 3.8 | 1.0s | 2:05 | 95% |
| 5 | Background Noise | Noisy · Normal speed | File new claim | Fail | 2.8 | 3.1s | 1:22 | 40% |
| 6 | Tech-savvy | Fast · Precise · Direct | Update policy | Pass | 4.7 | 0.8s | 1:45 | 100% |
| 7 | Emotional Caller | Slow · Upset · Pauses | Dispute denial | Pass | 3.9 | 1.4s | 3:41 | 90% |
| 8 | Non-native Speaker | Heavy accent · Slow | Check status | Pass | 4.0 | 1.3s | 2:52 | 100% |

Agent: Vapi · Insurance Claims v2 · Personas: 6 active · Scenarios: 4 types · Elapsed: 6m 14s · Audio: 38 recordings
Core Features

Real calls, real audio,
real evaluation

Voice & Chat Simulation
[Demo: a live voice call against a Vapi agent with a scrolling transcript, per-call metrics (CSAT 4.2 · Latency 1.1s · 142 WPM), compliance checks (greeted, verified, disclaimer), and a passing status, alongside an SDK chat simulation with per-turn scores (T1 9.2 · T2 8.8 · T3 6.1 · T4 8.5, avg 8.15, task resolved). Supported: Vapi · Retell · LiveKit · ElevenLabs · SDK]
Scenario Builder
[Demo: four ways to create scenarios: a visual workflow builder with template variables ({{persona}}, {{situation}}), call-script upload (TXT · DOCX · PDF, auto-parsed into scenarios), dataset import (each row becomes a scenario with an expected outcome), and SOP paste with an AI-generated graph]
Persona Library
[Demo: built-in persona cards (Angry Customer, Elderly Caller, Confused, Tech-savvy, Non-native, Emotional, Background Noise, plus 11 more) with speed, accent, and noise traits, and a custom persona builder with name, style, voice speed, accent, background noise, and emoji/typo settings. 18 built-in · unlimited custom · voice-native traits]
Call Analytics
[Demo: aggregate metrics for a 50-call run (Pass Rate 87% · Avg CSAT 4.1 · Avg Latency 1.2s · 3.2 interrupts/call), a CSAT distribution, pass rate by persona (Elderly 95%, Tech-savvy 90%, Confused 80%, Angry 60%, Noisy 50%), and evaluator results (Compliance 84%, Sales Effectiveness 76%, Product Knowledge 92%), with full audio and transcripts]

Test voice agents over actual phone calls (Vapi, Retell, LiveKit, ElevenLabs) and chat agents via SDK - in the same platform. Voice simulations record per-participant audio, produce transcripts, and evaluate every turn. Not transcript-only testing - real calls with real audio.

Set up your first simulation

Create test scenarios with a visual workflow builder, upload call scripts (TXT/DOCX/PDF), import from datasets, or paste your SOP and let AI generate the graph. Each scenario supports template variables - {{persona}}, {{situation}}, {{outcome}} - for dynamic, data-driven test cases.
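The template-variable idea can be sketched as plain string substitution: every combination of persona and situation expands into a concrete test case. This is an illustrative sketch, not the platform's actual API; the template text and variable names mirror the {{persona}} and {{situation}} placeholders above.

```python
from itertools import product

# Hypothetical sketch: expand a scenario template into concrete test cases.
# {persona} and {situation} stand in for the {{persona}} / {{situation}} variables.
template = "Persona: {persona} | Goal: {situation}"

personas = ["Elderly Caller", "Angry Customer"]
situations = ["file a new claim", "dispute a denial"]

# Cartesian product: 2 personas x 2 situations = 4 scenarios.
scenarios = [
    template.format(persona=p, situation=s)
    for p, s in product(personas, situations)
]

for s in scenarios:
    print(s)
```

The same expansion applies to dataset imports: each row supplies the variable values for one scenario.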

Create a scenario

Use pre-built personas or create custom ones with personality, communication style, accent, conversation speed, background noise, interrupt sensitivity, and emoji/typo patterns. Voice personas simulate real caller behaviors - fast talkers, elderly callers, heavy accents, emotional conversations.
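A custom persona is essentially a bundle of traits like those listed above. The field names and `Persona` type below are illustrative assumptions, not the real SDK schema:

```python
from dataclasses import dataclass

# Hypothetical sketch of a custom persona definition; the fields mirror the
# traits described above (style, accent, speed, noise, interrupts, typos).
@dataclass
class Persona:
    name: str
    style: str
    accent: str
    speed: float           # 1.0 = normal speaking rate
    background_noise: str
    interrupts: bool
    typo_rate: str         # chat-only trait

impatient_exec = Persona(
    name="Impatient Exec",
    style="Curt, demands manager",
    accent="US East Coast",
    speed=1.5,
    background_noise="Office background",
    interrupts=True,
    typo_rate="low",
)

print(impatient_exec.name, impatient_exec.speed)
```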

Explore personas

Every test run produces call logs with full transcripts, audio playback, CSAT scores, agent latency, words-per-minute, talk ratio, and interrupt counts. Attach evaluation metrics - compliance, sales effectiveness, product knowledge - and get pass/fail scores automatically.
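Run-level numbers like pass rate and average CSAT are simple aggregations over per-call results. A minimal sketch, with illustrative result dicts standing in for real call logs:

```python
# Hypothetical sketch: aggregate per-call results into run-level metrics.
# Real call logs would also carry audio, transcripts, WPM, talk ratio, etc.
calls = [
    {"csat": 4.5, "latency_s": 0.9, "passed": True},
    {"csat": 2.1, "latency_s": 2.8, "passed": False},
    {"csat": 4.2, "latency_s": 1.1, "passed": True},
]

pass_rate = sum(c["passed"] for c in calls) / len(calls)
avg_csat = sum(c["csat"] for c in calls) / len(calls)
avg_latency = sum(c["latency_s"] for c in calls) / len(calls)

print(f"pass rate {pass_rate:.0%}, avg CSAT {avg_csat:.1f}, avg latency {avg_latency:.1f}s")
```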

See analytics dashboard
Use Cases

Every failure found
before your users call

[Demo: a test run in progress with per-call pass/fail status: 50 calls · 6 personas · 4 scenarios, covering accents, interruptions, and background noise]

QA voice agents before launch

Run hundreds of simulated phone calls against your Vapi, Retell, or LiveKit agent. Test accents, interruptions, background noise, and edge cases - with per-call audio recordings and transcripts.

Vapi Retell LiveKit
[Demo: a multi-turn chat with per-turn scores (9.1 · 8.7 · 7.3 · 9.0). Persona-driven · Multi-turn · Per-turn scoring]

Test chatbots with realistic conversations

Define persona-driven scenarios and run multi-turn chat simulations via SDK. Evaluate every turn for tone, accuracy, and task completion - not just the final response.
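Per-turn evaluation means scoring every assistant turn, not only the last one. A minimal sketch, where the stub heuristic stands in for a real evaluator (an LLM judge or rule-based checks):

```python
# Hypothetical sketch of per-turn evaluation. score_turn is a stub; a real
# evaluator would judge tone, accuracy, and task progress for each turn.
def score_turn(turn: str) -> float:
    # Stub heuristic for illustration only: more concrete answers score higher.
    return min(10.0, 5.0 + 0.5 * len(turn.split()))

assistant_turns = [
    "Go to Settings, then Security, then Reset Password.",
    "Are you on the mobile app? The steps differ slightly there.",
]

scores = [score_turn(t) for t in assistant_turns]
avg = sum(scores) / len(scores)
print(scores, round(avg, 2))
```

Scoring per turn surfaces a single weak answer that an end-of-conversation score would average away.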

SDK LiteLLM Custom
[Demo: a compliance checklist: greeting script followed (pass), identity verification (pass), disclaimer read (fail), resolution offered (pass), closing confirmation (pass). SOP → auto-test → exportable evidence]

Validate compliance and SOPs

Paste your Standard Operating Procedure and let AI generate the test graph and scenario rows. Run simulations to prove your agent follows every step - with exportable evidence for auditors.
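Conceptually, an SOP becomes a checklist of steps that each call is checked against. A minimal sketch, where keyword matching on an invented transcript stands in for the platform's evaluation (the step names, phrases, and "Acme Insurance" are all illustrative):

```python
# Hypothetical sketch: turn SOP steps into a compliance checklist and check
# each step against a transcript. Keyword matching is a stand-in for
# LLM-based evaluation; phrases and transcript are made up for illustration.
sop_steps = {
    "greeting": "thank you for calling",
    "identity": "date of birth",
    "disclaimer": "calls may be recorded",
    "resolution": "i can offer",
}

transcript = (
    "Agent: Thank you for calling Acme Insurance. "
    "Agent: Can I confirm your date of birth? "
    "Agent: I can offer a replacement policy today."
)

# One boolean per SOP step; a missed step is a compliance failure.
results = {step: phrase in transcript.lower() for step, phrase in sop_steps.items()}
print(results)
```

Here the disclaimer step never appears in the transcript, so the call fails compliance even though the task was resolved.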

SOC 2 HIPAA Export
[Demo: Agent v1.2 (CSAT 3.8 · Compliance 82% · Latency 1.4s · Pass Rate 78%) vs Agent v2.0 (CSAT 4.3 · Compliance 94% · Latency 0.9s · Pass Rate 91%). Same scenarios · versioned definitions · deploy winner]

A/B test agent versions

Run the same scenarios against two agent versions with versioned definitions. Compare CSAT, conversion rate, latency, and compliance scores side-by-side - deploy the version that wins.
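Because both versions run the identical scenario suite, comparison reduces to comparing metric summaries. A minimal sketch with an assumed composite score (the metric names and weighting are illustrative, not how the platform ranks versions):

```python
# Hypothetical sketch: compare two agent versions on the same scenario suite.
# Metric values are illustrative; the composite score is an assumption.
v1 = {"csat": 3.8, "compliance": 0.82, "latency_s": 1.4, "pass_rate": 0.78}
v2 = {"csat": 4.3, "compliance": 0.94, "latency_s": 0.9, "pass_rate": 0.91}

def winner(a: dict, b: dict) -> str:
    # Higher is better for every metric except latency, which is subtracted.
    def score(m: dict) -> float:
        return m["csat"] + m["compliance"] + m["pass_rate"] - m["latency_s"]
    return "v2" if score(b) > score(a) else "v1"

print(winner(v1, v2))
```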

Versioning Compare
[Demo: persona chips: Fast Talker, Elderly, Heavy Accent, Angry, Price-sensitive, Confused, +12 more, plus custom. 18+ built-in + unlimited custom personas]

Cover every persona type

Test with fast talkers, elderly callers, heavy accents, angry customers, price-sensitive shoppers, and confused first-timers. 18 pre-built personas plus unlimited custom ones with voice-native traits.

18+ Personas Custom
[Demo: tool call evaluation: lookup_order(id="ORD-847") correct · get_policy(type="refund") correct · create_ticket(priority="low") wrong (expected priority="high"). Mock responses active: testing agent logic, not APIs. Right tool · Right params · Mock external APIs]

Evaluate tool calling in agents

Test whether your agent calls the right tools with the right parameters. Mock tool responses in simulations so you test agent logic - not external APIs.
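The mocking idea can be sketched with Python's standard `unittest.mock`: replace the tool with a mock, let the agent run, then assert on which tool was called and with what parameters. The `create_ticket` tool and the toy agent below are illustrative, not a real API:

```python
from unittest.mock import MagicMock

# Hypothetical sketch: mock a tool, run the agent, assert on the call.
# create_ticket and its priority parameter are made up for illustration.
create_ticket = MagicMock(return_value={"ticket_id": "T-1"})

def agent_handle_complaint(severity: str) -> dict:
    # A real agent would choose the tool and params from the conversation;
    # this stub just maps severity to a priority.
    priority = "high" if severity == "urgent" else "low"
    return create_ticket(priority=priority)

agent_handle_complaint("urgent")

# Evaluation: right tool, right params - without hitting a real ticketing API.
create_ticket.assert_called_once_with(priority="high")
print("tool call OK:", create_ticket.call_args.kwargs)
```

Because the mock returns a canned response, the test isolates the agent's decision logic from external-service behavior.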

Tool Calls Mocking
How It Works

From agent definition
to call-level confidence

01

Define agent, scenarios, and personas

Connect your voice or chat agent (Vapi, Retell, LiveKit, ElevenLabs, or SDK). Build scenarios with the visual workflow builder, upload call scripts, or paste your SOP and let AI generate the test graph. Pick personas with accents, speed, and interruption patterns.

Define Agent, Scenarios & Personas
[Demo: agent connection (Platform: Vapi · Agent ID: ins-claim-v2 · Phone: +1 555-0142 · Type: Voice/Chat), evaluators (Compliance, CSAT, Latency, Product, WPM), a scenario flow (Start → Greet caller → Verify ID → Handle claim → Close, with a Skip branch) using {{persona}}, {{situation}}, and {{outcome}} variables, and six selected personas with speed, accent, and noise traits]
02

Run calls and chats at scale

Execute hundreds of real phone calls and chat conversations in parallel. Every simulation records per-participant audio, produces full transcripts, and scores each turn against your evaluation metrics - compliance, CSAT, product knowledge.
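The fan-out can be sketched with a standard thread pool: launch many simulations concurrently and collect their results. `simulate_call` is a stub standing in for a real phone call that records audio and a transcript:

```python
from concurrent.futures import ThreadPoolExecutor
import time

# Hypothetical sketch of running simulations in parallel. simulate_call is a
# stub; a real run would place a phone call and stream audio + transcript.
def simulate_call(scenario_id: int) -> dict:
    time.sleep(0.01)  # placeholder for call duration
    # Arbitrary rule for illustration: every 5th scenario fails.
    return {"scenario": scenario_id, "passed": scenario_id % 5 != 0}

with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(simulate_call, range(50)))

passed = sum(r["passed"] for r in results)
print(f"{passed}/{len(results)} passed")
```

Threads suit this sketch because simulated calls are I/O-bound; results come back in scenario order, so per-call metrics line up with their scenarios.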

Running 50 Calls in Parallel
[Demo: 38/50 calls completed, each marked pass or fail, with a live view of Call #17 (Angry Customer persona) and real-time metrics: Latency 2.8s · 4 interrupts · 62:38 talk ratio · 168 WPM · estimated CSAT 2.1]
03

Drill into results and fix

Review per-call analytics: audio playback, transcripts, CSAT, latency, words-per-minute, talk ratio, and interrupt counts. Spot failure patterns, fix your agent, and re-run the same scenarios to verify.

Results & Fix
[Demo: failure patterns (Angry + Interrupts: 6/8 calls failed · Background Noise: 5/10 failed · disclaimer step missed in 8/50 calls), a drill-in to Call #17 (failure at turn 3: missed disclaimer), a prompt fix ("Always read disclaimer before proceeding to resolution step."), and a re-run with the same scenarios, personas, and evaluators: 95% pass, +8% from the last run]

Powering teams from
prototype to production

From ambitious startups to global enterprises, teams trust Future AGI to ship AI agents confidently.