Future AGI vs Cekura: 2026 Voice Testing and Evaluation Comparison
Future AGI vs Cekura scored on voice simulation, native observability, evaluation breadth, inline guardrails, optimization, deployment, and compliance. The honest engineering read, May 2026 pricing, where each one falls short, and how the loop changes the math.
If you have to pick today: Pick Future AGI if you want one platform that covers voice simulation, native observability, evaluation, inline guardrails, and prompt optimization, with Apache 2.0 building blocks (traceAI, ai-evaluation, agent-opt) plus the Agent Command Center as the hosted control plane. Pick Cekura if a hosted voice and chat QA runner with a published persona library and a Cisco partnership for enterprise telephony is the procurement driver, and you already have observability, eval cataloging, inline guardrails, and prompt optimization wired elsewhere.
Future AGI ranks first when the workload is a continuous voice or chat agent and the team needs every layer in one project. Cekura is a credible focused QA platform when the procurement driver is automated test case generation against an agent definition inside a hosted runner with Cisco-aligned telephony.
Seven axes, honest engineering scoring, May 2026 pricing on both sides, where each one falls short, and how the closed loop changes the math.
TL;DR: capability snapshot
| Capability | Future AGI | Cekura |
|---|---|---|
| Core identity | Full-stack platform: simulate + observe + evaluate + protect + optimize | Hosted voice and chat QA with automated test framework |
| License | traceAI, ai-evaluation, agent-opt Apache 2.0; Agent Command Center closed | Closed-source SaaS |
| Voice simulation | 18 pre-built personas plus unlimited custom-authored; Workflow Builder graph editor with auto-generated branching scenarios (20/50/100 rows) | Persona library plus automated test case generation from agent definition |
| Native voice observability | Vapi, Retell, LiveKit via dashboard credentials, zero SDK; Enable Others mode covers any other provider | Hosted dashboard with provider-based call ingestion |
| SDK instrumentation | traceAI 30+ documented integrations across Python and TypeScript, OpenInference-compatible spans | Closed SaaS; no published OpenInference span model |
| Evaluation | 70+ built-in templates in Apache 2.0 ai-evaluation; in-product agent authors custom evaluators | Hosted rubrics tied to the test framework |
| Inline guardrails | Future AGI Protect on Gemma 3n with LoRA adapters per arXiv 2510.13351; 4 documented dimensions; ProtectFlash for sub-100ms binary classification | Test-time risk surfacing; no published inline runtime enforcement model |
| Prompt optimization | agent-opt with 6 optimizers: Bayesian Search, Meta-Prompt, ProTeGi, GEPA, Random Search, PromptWizard; UI inside Dataset plus Python SDK | Out of scope |
| Telephony | Native Vapi, Retell, LiveKit; Enable Others mobile-number simulation; Indian phone-number simulation native | Cisco partnership for enterprise telephony |
| Pricing entry | Free to start with the full platform; pay-as-you-go scales with usage; compliance and enterprise add-ons (SOC 2 Type II, HIPAA BAA, SAML + SCIM, dedicated CSM) layer on per tier | Credit-consumption model; quote-driven |
| Deployment | Managed cloud, BYOC, OSS self-host on Apache 2.0 triad | Managed cloud |
| Compliance | SOC 2 Type II, HIPAA, GDPR, CCPA, ISO 27001 certified; ISO 42001 in progress | SOC 2 reporting and HIPAA / BAA documented |
| Rank in 2026 | #1 for full-stack voice and chat agent platforms | #2 or #3 for focused voice and chat QA platforms with Cisco-aligned procurement |
One-line verdict: Future AGI is the broader platform with native voice observability across Vapi, Retell, and LiveKit, an Apache 2.0 eval template catalog, inline multi-modal guardrails, and the agent-opt closed loop. Cekura ships a focused voice and chat QA runner with a published persona library, multi-turn scenario authoring, and Cisco-aligned telephony procurement. Future AGI covers every layer of the lifecycle; Cekura ships the QA layer with its own enterprise procurement story.
Two positioning facts to start with
Future AGI is the only Apache 2.0 OSS layer in the voice eval, observability, and simulation market in 2026. Cekura, Coval, Hamming, and Bluejay are closed-source SaaS. Future AGI publishes traceAI (instrumentation), ai-evaluation (70+ rubrics), and agent-opt (six optimizers) under Apache 2.0. The hosted Agent Command Center sits on top of that OSS trio. Run the stack inside your own VPC, fork the eval rubrics, audit the trace pipeline; no vendor lock-in.
Each competitor in this category partially solves the problem. Cekura ships a focused voice and chat QA runner with a published persona library and multi-turn scenario authoring, but doesn’t ship a 70+ rubric Apache 2.0 catalog, an inline guardrail model, or a six-optimizer prompt-tuning library. Coval covers simulation with the Three-Layer brand. Hamming polishes post-call analytics and SIP/DTMF. Bluejay covers monitoring and A/B. Future AGI is the only product that closes the full loop (trace, eval, simulate, cluster, guard, optimize) in one project, with the source available.
What each product actually is
Future AGI is a full-stack platform for voice and text agents. The hosted Agent Command Center is the control plane. The building blocks are three Apache 2.0 libraries:
- traceAI is OpenTelemetry-native and OpenInference-compatible, with first-party SDKs in Python and TypeScript. 30+ documented integrations cover the major LLM SDKs (anthropic, openai, mistralai, vertexai, bedrock, groq, google-adk, google_genai), agent frameworks (crewai, autogen, langgraph, langchain, llama_index, smolagents, openai-agents, dspy, mcp), and dedicated voice packages traceAI-pipecat and traceai-livekit.
- ai-evaluation ships 70+ built-in eval templates called by slug. Voice and conversation slugs include audio_transcription, audio_quality, conversation_coherence, conversation_resolution, task_completion, evaluate_function_calling, is_polite, is_helpful, is_concise, translation_accuracy, and cultural_sensitivity. Retrieval templates include groundedness, context_relevance, chunk_attribution, and chunk_utilization. Safety templates include pii, data_privacy_compliance, and prompt_injection. Custom evaluators are authored by an in-product agent that reads your traces and proposes templates end-to-end.
- agent-opt is the optimizer. Six algorithms (Bayesian Search, Meta-Prompt per arXiv 2505.09666, ProTeGi, GEPA Genetic-Pareto per arXiv 2507.19457, Random Search per arXiv 2311.09569, PromptWizard) consume a labelled dataset from ai-evaluation and propose the next prompt version. UI inside Dataset and Python SDK both ship.
Add native voice observability for Vapi, Retell, and LiveKit. Provider API key plus Assistant ID into an Agent Definition starts call capture within minutes; every captured call gets recording download, auto transcript, and the configured evaluators applied against the trace. Enable Others mode handles any other voice provider via mobile-number simulation. Indian phone-number simulation is wired into Run Prompt and Experiments. Custom voices from ElevenLabs and Cartesia plug into the same dashboard path.
Add simulation. Workflow Builder is the visual graph editor with three node types (Conversation, End Call, Transfer Call); auto-generated branching scenarios at 20, 50, or 100 rows with branch visibility; Dataset scenarios with CSV / JSON / Excel upload plus synthetic generation; Upload Script and Call/Chat SOP modes alongside. The 4-step Run Tests wizard handles config, scenario selection, eval configuration, and review-plus-execute. Error Localization pinpoints the exact failing turn in a multi-turn conversation. The Show Reasoning column surfaces the eval rationale for debug.
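To make the branching idea concrete, here is a minimal sketch of a scenario graph built from the three node types, with every path to a terminal node enumerated. The `Node` class, the `enumerate_branches` helper, and the node names are hypothetical illustrations, not the Workflow Builder data model.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeType(Enum):
    CONVERSATION = "conversation"
    END_CALL = "end_call"
    TRANSFER_CALL = "transfer_call"


@dataclass
class Node:
    name: str
    type: NodeType
    next: list[str] = field(default_factory=list)  # names of successor nodes


def enumerate_branches(nodes: dict[str, Node], start: str) -> list[list[str]]:
    """Return every path from `start` to an End Call or Transfer Call node."""
    paths: list[list[str]] = []

    def walk(name: str, path: list[str]) -> None:
        node = nodes[name]
        path = path + [name]
        if node.type is not NodeType.CONVERSATION:
            paths.append(path)  # terminal node closes the branch
            return
        for successor in node.next:
            if successor not in path:  # skip cycles
                walk(successor, path)

    walk(start, [])
    return paths


# Hypothetical support-call graph.
graph = {
    "greet": Node("greet", NodeType.CONVERSATION, ["verify", "escalate"]),
    "verify": Node("verify", NodeType.CONVERSATION, ["resolve", "escalate"]),
    "resolve": Node("resolve", NodeType.END_CALL),
    "escalate": Node("escalate", NodeType.TRANSFER_CALL),
}

branches = enumerate_branches(graph, "greet")
for branch in branches:
    print(" -> ".join(branch))
```

Each enumerated path is one candidate scenario row, which is roughly what branch visibility means for a test suite: you can see which routes through the graph a run actually covers.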
Add the Future AGI Protect model family for inline guardrails. Built on Google’s Gemma 3n with LoRA-trained category adapters per arXiv 2510.13351, Protect is natively multi-modal across text, image, and audio. Two surfaces ship: rule-based Protect across four documented dimensions (content_moderation, bias_detection, security, data_privacy_compliance) and ProtectFlash as the ultra-fast binary classifier for sub-100ms inline budgets. Agent Command Center adds gateway routing: the same control plane that captures traces picks the cheaper model for easy turns, falls back on rate limits, and splits traffic by metadata.
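The three routing behaviours named above (cheaper model for easy turns, fallback on rate limits, traffic split by metadata) can be sketched as a small policy function. Everything here is an assumption for illustration: the model names, the `tier` metadata key, and the word-count heuristic for an easy turn are hypothetical and do not describe the Agent Command Center configuration.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    text: str
    metadata: dict


# Hypothetical model names, for illustration only.
CHEAP_MODEL = "small-model"
STRONG_MODEL = "large-model"
FALLBACK_MODEL = "fallback-model"


def pick_model(turn: Turn, rate_limited: set[str]) -> str:
    """Choose a model for one turn using three simple rules."""
    # Rule 1: split traffic by metadata (here, a hypothetical "tier" key).
    if turn.metadata.get("tier") == "premium":
        choice = STRONG_MODEL
    # Rule 2: treat short turns as easy and send them to the cheaper model.
    elif len(turn.text.split()) <= 12:
        choice = CHEAP_MODEL
    else:
        choice = STRONG_MODEL
    # Rule 3: fall back when the chosen model is currently rate limited.
    if choice in rate_limited:
        return FALLBACK_MODEL
    return choice


print(pick_model(Turn("What are your opening hours?", {}), rate_limited=set()))
```

A real gateway would likely score difficulty with something better than word count; the point of the sketch is only that the three rules compose in a fixed order.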
Cekura is a voice and chat QA platform. The product surface is a hosted dashboard combining a persona library, multi-turn scenario authoring, an automated test framework, and provider-based call ingestion. The marketed focus is automated test case generation from an agent definition. Cekura’s Cisco partnership targets Cisco Webex and Cisco Contact Center procurement, and the platform ships cross-industry customer references in CX, healthcare, and fintech. Cekura is closed-source SaaS.
The two products aren’t on the same axis. Cekura is a focused QA runner with enterprise telephony procurement. Future AGI is the platform that ships the QA workflow plus observability, eval cataloging, inline guardrails, and prompt optimization in one project.
Head-to-head on the seven axes
1. Voice simulation surface
Cekura ships a real persona library and an automated test framework that generates regression cases from an agent definition. Multi-turn scenario authoring is a documented strength, and the platform ships cross-industry references in CX, healthcare, and fintech.
Future AGI ships 18 pre-built personas plus unlimited custom-authored personas. Each custom persona configures:
- Identity: name, description, gender (male / female / both)
- Age range: six explicit buckets (18-25, 25-32, 32-40, 40-50, 50-60, 60+) with multi-select
- Location: United States, Canada, United Kingdom, Australia, India with multi-select
- Behavioural: personality traits, communication style, accent
- Conversation: conversation speed, response style, background noise
- Language: multilingual toggle across many popular languages
- Custom properties: arbitrary key-value attributes the team adds for vertical-specific testing
- Additional instructions: free-form behavioural prompt that flows into the simulator agent at run time
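As a rough picture of what one custom persona carries, the fields above can be modelled as a plain dataclass with a small validator for the enumerated values. The `Persona` class and its field names are hypothetical; they mirror the list above, not the product's actual schema.

```python
from dataclasses import dataclass, field

# Enumerated values taken from the list above.
AGE_BUCKETS = {"18-25", "25-32", "32-40", "40-50", "50-60", "60+"}
LOCATIONS = {"United States", "Canada", "United Kingdom", "Australia", "India"}
GENDERS = {"male", "female", "both"}


@dataclass
class Persona:
    name: str
    description: str
    gender: str = "both"
    age_ranges: list[str] = field(default_factory=list)  # multi-select
    locations: list[str] = field(default_factory=list)   # multi-select
    personality_traits: list[str] = field(default_factory=list)
    communication_style: str = ""
    accent: str = ""
    conversation_speed: str = ""
    response_style: str = ""
    background_noise: str = ""
    multilingual: bool = False
    custom_properties: dict[str, str] = field(default_factory=dict)
    additional_instructions: str = ""

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the persona is valid."""
        problems = []
        if self.gender not in GENDERS:
            problems.append(f"unknown gender: {self.gender}")
        problems += [f"unknown age range: {a}" for a in self.age_ranges if a not in AGE_BUCKETS]
        problems += [f"unknown location: {loc}" for loc in self.locations if loc not in LOCATIONS]
        return problems


persona = Persona(
    name="Impatient caller",
    description="Calls about a billing error and interrupts often",
    age_ranges=["32-40", "40-50"],
    locations=["India"],
    personality_traits=["impatient"],
    background_noise="street",
    custom_properties={"plan": "prepaid"},
)
print(persona.validate())  # prints [] because every enumerated value is known
```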
Workflow Builder is the visual graph editor. Three node types ship (Conversation in purple, End Call in red, Transfer Call in orange). Auto-generate scenarios at 20, 50, or 100 rows with full branch visibility, per-row personas, situations, and outcomes. Three other scenario modes ride alongside: Dataset (CSV / JSON / Excel upload plus synthetic generation), Upload Script, and Call/Chat SOP. The 4-step Run Tests wizard handles configuration, selection, evaluation, and review-plus-execute. Error Localization pinpoints the failing turn. The evaluate_function_calling template scores tool-calling correctness against expected calls.
Verdict. Both products ship deep simulation. Future AGI’s library is parameterized across nine authoring axes and grows from production traces via the same project, with Workflow Builder layered on top. Future AGI has the broader authoring surface; the two products are close on persona depth alone.
2. Native voice observability
Future AGI’s voice observability is dashboard-driven and zero-SDK for the three modern voice runtimes that cover the majority of greenfield 2026 deployments. Add provider-specific credentials to an Agent Definition (Vapi or Retell API key plus assistant ID; LiveKit URL plus API key, secret, and agent name) and auto call log capture starts within minutes. Every captured call gets recording download, an auto-generated transcript, and the configured evaluators applied against the trace. Enable Others mode covers any other voice provider via mobile-number simulation.
The captured spans are OpenInference-compatible. Voice attributes land on the gen_ai.voice.* namespace: gen_ai.voice.stt.provider, gen_ai.voice.tts.voice_id, gen_ai.voice.latency.transcriber_avg_ms, gen_ai.voice.latency.turn_avg_ms, gen_ai.voice.interruptions.user_count, gen_ai.voice.recording.assistant_url, and gen_ai.voice.recording.customer_url. Evaluation results attach under gen_ai.evaluation.*: gen_ai.evaluation.name, gen_ai.evaluation.score.value, gen_ai.evaluation.score.label, and gen_ai.evaluation.target_span_id. Joining the two namespaces in the Observe filter view gives per-eval call clustering with no schema work.
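The join between the two namespaces can be pictured with plain dictionaries. This is a sketch under stated assumptions: the flattened span shape, the span IDs, and the Passed / Failed label values are hypothetical, and only the attribute keys come from the text above.

```python
from collections import defaultdict

# Hypothetical call spans keyed by span ID, carrying gen_ai.voice.* attributes.
call_spans = {
    "call-1": {"gen_ai.voice.latency.turn_avg_ms": 640, "gen_ai.voice.interruptions.user_count": 0},
    "call-2": {"gen_ai.voice.latency.turn_avg_ms": 1820, "gen_ai.voice.interruptions.user_count": 4},
    "call-3": {"gen_ai.voice.latency.turn_avg_ms": 1710, "gen_ai.voice.interruptions.user_count": 3},
}

# Hypothetical evaluation results pointing back at call spans.
eval_spans = [
    {"gen_ai.evaluation.name": "conversation_resolution", "gen_ai.evaluation.score.label": "Passed", "gen_ai.evaluation.target_span_id": "call-1"},
    {"gen_ai.evaluation.name": "conversation_resolution", "gen_ai.evaluation.score.label": "Failed", "gen_ai.evaluation.target_span_id": "call-2"},
    {"gen_ai.evaluation.name": "conversation_resolution", "gen_ai.evaluation.score.label": "Failed", "gen_ai.evaluation.target_span_id": "call-3"},
    {"gen_ai.evaluation.name": "is_polite", "gen_ai.evaluation.score.label": "Passed", "gen_ai.evaluation.target_span_id": "call-2"},
]


def cluster_calls(calls: dict, evals: list) -> dict:
    """Group call span IDs by (evaluator name, score label)."""
    clusters = defaultdict(list)
    for ev in evals:
        target = ev["gen_ai.evaluation.target_span_id"]
        if target in calls:  # ignore results whose call span is missing
            key = (ev["gen_ai.evaluation.name"], ev["gen_ai.evaluation.score.label"])
            clusters[key].append(target)
    return dict(clusters)


def mean_turn_latency(calls: dict, span_ids: list) -> float:
    """Average gen_ai.voice.latency.turn_avg_ms over a cluster of calls."""
    values = [calls[s]["gen_ai.voice.latency.turn_avg_ms"] for s in span_ids]
    return sum(values) / len(values)


clusters = cluster_calls(call_spans, eval_spans)
failed = clusters[("conversation_resolution", "Failed")]
print(failed, mean_turn_latency(call_spans, failed))
```

In this toy data the unresolved calls are also the slow, heavily interrupted ones, which is the kind of correlation the joined view is meant to surface.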
Cekura ships call ingestion through its hosted dashboard with provider integrations. The platform shape stays QA-runner-first; native production observability across the Vapi, Retell, and LiveKit stack is not the marketed focus.
Verdict. Future AGI is stronger on the zero-SDK dashboard path from a Vapi or Retell agent to scored, recorded, transcribed, and clusterable calls because the provider integrations are native to the Agent Definition surface, not a separate ingestion step.
3. SDK instrumentation
Future AGI’s traceAI ships 30+ documented integrations across Python and TypeScript. The dedicated voice packages are traceAI-pipecat and traceai-livekit. Spans are OpenInference-compatible, the SDK is Apache 2.0, and the instrumentation library is readable on GitHub before adoption.
LiveKit registration uses the in-process pattern to avoid worker pickling issues:
```python
from fi_instrumentation import register
from fi_instrumentation.fi_types import ProjectType
from traceai_livekit import enable_http_attribute_mapping

# Register the Observe project inside the worker process.
register(
    project_name="livekit-voice-agent",
    project_type=ProjectType.OBSERVE,
    set_global_tracer_provider=True,
)

# Turn on the package's HTTP attribute mapping.
enable_http_attribute_mapping()
```
Pipecat is the same shape and does not require a tracing extra:
```python
from fi_instrumentation import register
from fi_instrumentation.fi_types import ProjectType
from traceai_pipecat import enable_http_attribute_mapping

# Same registration shape as the LiveKit example.
register(
    project_type=ProjectType.OBSERVE,
    project_name="pipecat-voice-app",
    set_global_tracer_provider=True,
)

# Turn on the package's HTTP attribute mapping.
enable_http_attribute_mapping()
```
Cekura is closed-source SaaS. There is no published OpenInference span model or open-source instrumentation library that you can audit before procurement.
Verdict. Future AGI is the only product in the comparison with an Apache 2.0 OpenInference library covering both Python and TypeScript across 30+ documented integrations including dedicated voice packages.
4. Evaluation engine
Future AGI’s ai-evaluation ships 70+ built-in templates in an Apache 2.0 library. Voice and conversation slugs include audio_transcription (ASR / STT scoring), audio_quality (TTS output quality), conversation_coherence, conversation_resolution, task_completion, evaluate_function_calling (tool-calling correctness), is_polite, is_helpful, is_concise, translation_accuracy, and cultural_sensitivity. Retrieval templates include groundedness, context_relevance, chunk_attribution, and chunk_utilization. Safety templates include pii, data_privacy_compliance, and prompt_injection. Multi-turn scoring runs through the conversation_coherence and conversation_resolution templates:
```python
from fi.evals import Evaluator

evaluator = Evaluator()

# Score one multi-turn transcript with the conversation_coherence template.
result = evaluator.evaluate(
    eval_templates="conversation_coherence",
    inputs={
        "conversation": (
            "User: My Wi-Fi keeps disconnecting every few minutes.\n"
            "Assistant: You can try restarting your router and updating your network drivers.\n"
            "User: I restarted the router and it's stable now. Thanks!\n"
            "Assistant: Glad to hear that! Let me know if you need anything else."
        )
    },
    model_name="turing_flash",
)

# Print the evaluator output and its rationale.
print(result.eval_results[0].output)
print(result.eval_results[0].reason)
```
Custom evaluators are authored by an in-product agent that reads your traces and proposes templates. The same library powers offline batch scoring, CI gates, prompt-linked promotion checks, and live continuous evaluation. The audio modality supports multiple formats (.mp3, .wav, .ogg, .m4a, .aac, .flac, .wma) through MLLMAudio(url="path/to/audio.wav", local=True) for local files or MLLMAudio(url="https://...") for remote.
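A CI gate on top of eval scores can be as small as a threshold check over per-template means. The sketch below assumes scores normalised to the 0 to 1 range and collected into a dictionary beforehand; the `ci_gate` helper, the thresholds, and the sample scores are hypothetical and not part of ai-evaluation.

```python
def ci_gate(scores: dict[str, list[float]], thresholds: dict[str, float]) -> list[str]:
    """Return failure messages; an empty list means the gate passes."""
    failures = []
    for template, minimum in thresholds.items():
        values = scores.get(template, [])
        if not values:
            failures.append(f"{template}: no scores recorded")
            continue
        mean = sum(values) / len(values)
        if mean < minimum:
            failures.append(f"{template}: mean {mean:.2f} below {minimum:.2f}")
    return failures


# Hypothetical scores from one batch run, keyed by template slug.
scores = {
    "conversation_coherence": [0.9, 0.8, 1.0],
    "task_completion": [0.5, 0.7],
}
thresholds = {"conversation_coherence": 0.8, "task_completion": 0.75}

failures = ci_gate(scores, thresholds)
for message in failures:
    print(message)
```

In a pipeline, a non-empty `failures` list would typically map to a non-zero exit code so the build stops before a regressed prompt ships.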
Cekura’s eval surface is tied to its hosted QA runner. Rubrics ship inside the product, and the platform is known for solid multi-turn QA scoring. The catalog is not published as an open-source library you can read or call by slug against arbitrary traces.
Verdict. Future AGI ships the broader evaluation surface. It is the only product in the comparison with an Apache 2.0 eval template catalog of 70+ slugs, an in-product agent that authors custom evaluators from your traces, and multi-format audio modality support.
5. Inline guardrails
The Future AGI Protect model family covers the runtime enforcement layer. Built on Google’s Gemma 3n with LoRA-trained category adapters per arXiv 2510.13351, Protect is natively multi-modal across text, image, and audio. Two surfaces ship:
- Rule-based Protect: scan across four documented dimensions (content_moderation, bias_detection, security, data_privacy_compliance) with per-metric rule configuration. Content moderation flags toxicity and harmful language; bias detection flags sexism and discrimination; security flags prompt-injection and adversarial manipulation; data privacy compliance flags PII and regulatory exposure.
- ProtectFlash: ultra-fast binary guardrail for sub-100ms inline budgets when per-metric granularity is not required. The arXiv paper details the binary classifier architecture.
The same four dimensions double as offline eval templates, so runtime policy and eval rubric stay in sync. Real Python signature:
```python
from fi.evals import Protect

protector = Protect()

# Scan one message against all four documented dimensions.
response = protector.protect(
    inputs="AI Generated Message",
    protect_rules=[
        {"metric": "content_moderation"},
        {"metric": "bias_detection"},
        {"metric": "security"},
        {"metric": "data_privacy_compliance"},
    ],
    action="I'm sorry, I can't help you with that.",
    reason=True,
    timeout=25000,
)
print(response)
```
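One way to think about the split between the two surfaces is as a choice driven by the latency budget. The helper below is a hypothetical sketch, not part of the Protect SDK: `plan_guardrail`, the `path` values, and the hard 100 ms cutoff are assumptions, and only the four dimension names and the `{"metric": ...}` rule shape come from the example above.

```python
# The four documented dimensions.
DIMENSIONS = (
    "content_moderation",
    "bias_detection",
    "security",
    "data_privacy_compliance",
)


def plan_guardrail(latency_budget_ms: int, dimensions=DIMENSIONS) -> dict:
    """Pick a guardrail path from the latency budget (cutoff is an assumption)."""
    unknown = [d for d in dimensions if d not in DIMENSIONS]
    if unknown:
        raise ValueError(f"unknown dimensions: {unknown}")
    if latency_budget_ms < 100:
        # Tight budget: one binary verdict, no per-metric breakdown.
        return {"path": "flash", "protect_rules": []}
    # Looser budget: per-metric rules in the shape the rule-based call expects.
    return {"path": "rules", "protect_rules": [{"metric": d} for d in dimensions]}


print(plan_guardrail(50))
print(plan_guardrail(400, dimensions=("security", "data_privacy_compliance")))
```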
Cekura focuses on test-time risk surfacing through its QA framework rather than inline runtime enforcement. The product does not publish a runtime guardrail model family or a sub-100ms inline binary path.
Verdict. Future AGI is the only product in the comparison with a published multi-modal inline guardrail family plus a sub-100ms binary classifier on the same model lineage.
6. Prompt optimization
agent-opt ships six prompt optimizers, each with its own loop:
- Bayesian Search: smart few-shot optimization that uses Bayesian methods to select and format example sets.
- Meta-Prompt: deep reasoning refinement using bilevel optimization that rewrites the entire prompt against failed examples (arXiv 2505.09666).
- ProTeGi: Prompt Optimization with Textual Gradients; beam search plus targeted critique on failures.
- GEPA: Genetic-Pareto reflective prompt evolution with evolutionary search and reflection mutation (arXiv 2507.19457).
- Random Search: baseline that generates random variations with a teacher model (arXiv 2311.09569).
- PromptWizard: production-grade prompt optimization combining mutation across thinking styles with critique and refinement of top performers.
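To show the shape of the simplest loop in that list, here is a toy random-search sketch: propose a variation, score it, keep the best. It is not the agent-opt implementation. The `random_search` function, the suffix-based `toy_mutate` (standing in for a teacher model), and `toy_score` (standing in for an eval run) are all hypothetical.

```python
import random


def random_search(seed_prompt, mutate, score, rounds=20, rng=None):
    """Keep the best-scoring prompt among `rounds` random variations of the seed."""
    rng = rng or random.Random(0)
    best_prompt, best_score = seed_prompt, score(seed_prompt)
    for _ in range(rounds):
        candidate = mutate(seed_prompt, rng)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best_prompt, best_score = candidate, candidate_score
    return best_prompt, best_score


# Toy stand-in for a teacher model: append one random instruction.
SUFFIXES = [
    " Be concise.",
    " Confirm the caller's identity first.",
    " Always offer a callback.",
]


def toy_mutate(prompt, rng):
    return prompt + rng.choice(SUFFIXES)


def toy_score(prompt):
    # Toy stand-in for an eval run: rewards one specific instruction.
    return 1.0 if "identity" in prompt else 0.0


seed = "You are a support agent."
best, best_score = random_search(seed, toy_mutate, toy_score)
print(best_score, best)
```

The other optimizers replace the blind `mutate` step with something smarter, such as critique on failures or evolutionary selection, but the propose, score, keep-the-best skeleton stays recognisable.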
Two surfaces ship:
- UI inside Dataset. Point an optimization run at a dataset, select an evaluator, pick one of the six optimizers, and run. The dashboard surfaces optimizer iterations, candidate prompts, and final scores. Iterate, gate the winner with a separate eval run, then ship.
- SDK via Python. agent-opt exposes the same six optimizers programmatically for teams that want to wire optimization into a CI or research workflow.
Cekura is a focused QA platform. Prompt optimization is out of scope.
Verdict. Future AGI is the only product in the comparison with a closed-loop prompt optimization surface, and it ships six algorithms in both UI and SDK form.
7. Pricing and deployment
Future AGI publishes a usage-based pricing page at futureagi.com/pricing with five tiers: Free, Pay-as-you-go, Boost, Scale, and Enterprise. Self-host runs anywhere Python or TypeScript runs through the Apache 2.0 triad (traceAI, ai-evaluation, agent-opt) with the hosted Agent Command Center available SaaS or BYOC. AWS Marketplace is live.
Cekura ships a credit-consumption model with quote-driven enterprise tiers; the published surface is the hosted runner. Self-host is not the marketed shape.
Verdict. Future AGI is the only product in the comparison with managed cloud, BYOC, and Apache 2.0 OSS self-host paths all available.
Pricing snapshot: May 2026
Future AGI starts free with the full platform and scales on usage; compliance and enterprise add-ons layer on as the team needs them. Cekura prices on a credit-consumption model with enterprise pricing quote-driven. Pulled from each vendor’s published pricing page on 2026-05-19. The line items below name the Future AGI tier structure side-by-side with what Cekura ships publicly.
| Tier | Future AGI | Cekura |
|---|---|---|
| Free | $0; 50 GB, 100K gateway requests, 60 min voice sim, 30-day retention | Trial available; verify with vendor |
| Entry | Pay-as-you-go ($0 plus usage) | Credit-consumption (quote-driven) |
| Mid | Boost $250/mo; SOC 2 Type II, OAuth SSO, 90-day retention, 99.5% SLA | Verify with vendor |
| Production | Scale $750/mo; HIPAA BAA, SAML SSO plus SCIM, 1-year retention, 99.9% SLA | Verify with vendor |
| Enterprise | $2,000/mo; custom retention, ABAC, dedicated CSM | Quote-driven; Cisco-aligned procurement available |
| Self-host | OSS Apache 2.0 (traceAI + ai-evaluation + agent-opt); BYOC; AWS Marketplace | Managed cloud |
The shapes don’t line up cleanly. Cekura prices the hosted QA runner as a focused workflow product. Future AGI prices the whole platform across observe, eval, simulate, protect, and optimize in one bill. The Apache 2.0 triad means teams can run trace, eval, and optimizer libraries without any contract, and the hosted Agent Command Center adds gateway routing and the Protect inline layer on top.
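For a quick read of the Future AGI column, the sketch below picks the lowest-priced tier that unlocks a required set of compliance features. It assumes each tier includes everything in the tiers below it, which the table does not state explicitly, and it ignores usage charges. The feature keys and the `cheapest_tier` helper are hypothetical labels for the table entries.

```python
# (tier name, monthly USD, features first listed at that tier), from the table above.
TIERS = [
    ("Free", 0, set()),
    ("Pay-as-you-go", 0, set()),
    ("Boost", 250, {"soc2_type2", "oauth_sso"}),
    ("Scale", 750, {"hipaa_baa", "saml_sso", "scim"}),
    ("Enterprise", 2000, {"abac", "dedicated_csm"}),
]


def cheapest_tier(required: set[str]) -> tuple[str, int]:
    """Return the first tier whose cumulative features cover `required`."""
    unlocked: set[str] = set()
    for name, monthly_usd, features in TIERS:
        unlocked |= features  # assumption: higher tiers inherit lower-tier features
        if required <= unlocked:
            return name, monthly_usd
    raise ValueError(f"no tier covers: {sorted(required - unlocked)}")


print(cheapest_tier({"soc2_type2"}))
print(cheapest_tier({"hipaa_baa", "oauth_sso"}))
```

Usage charges sit on top of the base price on every paid tier, so this gives a floor, not a quote.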
Where each one falls short
Future AGI: three deliberate tradeoffs
- Federal procurement runs via BYOC self-host, not FedRAMP. FedRAMP authorization is not on the published cert list yet. Teams with federal procurement requirements run Future AGI on the BYOC path in their own VPC (or fully air-gapped) and combine that with the certified five-cert set (SOC 2 Type II, HIPAA, GDPR, CCPA, ISO 27001).
- Async eval gating is explicit by design. The agent-opt loop never auto-rewrites prompts in production without an explicit run plus a human approval gate. The optimizer proposes candidates from your dataset; the team gates the winners with an evaluation run before they ship. Silent in-production self-rewriting introduces the kind of drift that an eval-gated platform exists to prevent.
- Native voice observability ships for Vapi, Retell, and LiveKit out of the box. Any other voice provider runs through one of two paths: the Enable Others mode (mobile-number simulation) or traceAI SDK instrumentation (Apache 2.0, 30+ documented integrations including traceAI-pipecat and traceai-livekit). The dashboard path stays zero-SDK for the three majority runtimes; the SDK path covers the long tail.
Three deliberate tradeoffs in pursuit of the closed loop. Each one has a clear path or workaround for buyers who need it today.
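The explicit gate described in the second tradeoff reduces to two conditions: the candidate beats the current prompt on a separate eval run, and a human has signed off. A minimal sketch, assuming scores on a 0 to 1 scale; the `Candidate` class, the `can_promote` helper, and the 0.02 minimum gain are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    prompt: str
    eval_score: float                  # score from a separate gating eval run
    approved_by: Optional[str] = None  # human reviewer, if any


def can_promote(candidate: Candidate, baseline_score: float, min_gain: float = 0.02) -> bool:
    """Promote only when the candidate beats the baseline and a human approved it."""
    beats_baseline = candidate.eval_score >= baseline_score + min_gain
    return beats_baseline and candidate.approved_by is not None


unreviewed = Candidate("Confirm identity before discussing billing.", eval_score=0.85)
reviewed = Candidate("Confirm identity before discussing billing.", eval_score=0.85, approved_by="qa-lead")

print(can_promote(unreviewed, baseline_score=0.80))  # prints False: no human approval
print(can_promote(reviewed, baseline_score=0.80))    # prints True: better score and approved
```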
Cekura: four honest limitations
- No native voice observability across Vapi, Retell, and LiveKit with zero SDK. Cekura’s product shape stays QA-runner-first. Production calls from Vapi or Retell agents do not flow into a Cekura dashboard the way they do into a Future AGI Agent Definition through provider API key plus assistant ID. Production observability typically lives in a separate tool.
- No Apache 2.0 eval template catalog. Cekura’s rubrics ship inside the hosted runner. There is no public Apache 2.0 catalog you can read, fork, or call by slug against arbitrary traces. Future AGI’s ai-evaluation ships 70+ slugs you can pip install today.
- No inline guardrail layer. Cekura does not ship a sub-100ms PII redactor or prompt-injection filter. Teams that need to enforce policy at the request boundary in production wire that to a separate vendor.
- No prompt optimization surface. Cekura is a focused QA platform. Prompt optimization, routing policy revisions, and closed-loop self-improvement are out of scope. agent-opt is the layer Cekura leaves open.
Choose Future AGI if
- Your voice or chat workload needs every layer in one project: simulate, observe, evaluate, protect, optimize.
- Native voice observability across Vapi, Retell, and LiveKit with zero SDK is the dashboard path you want from day one.
- You want 70+ built-in eval templates in an Apache 2.0 library, called by slug, with voice-specific names like audio_transcription, audio_quality, conversation_coherence, conversation_resolution, evaluate_function_calling, translation_accuracy, and cultural_sensitivity.
- Inline AI guardrails at sub-100ms latency at the request boundary are a requirement, not a wish.
- Six prompt optimizers (Bayesian Search, Meta-Prompt per arXiv 2505.09666, ProTeGi, GEPA Genetic-Pareto per arXiv 2507.19457, Random Search per arXiv 2311.09569, PromptWizard) inside UI and SDK matter for your eval-gated iteration loop.
- The five-cert compliance set (SOC 2 Type II + HIPAA + GDPR + CCPA + ISO 27001) plus BYOC for regulated or federal-adjacent deployments is on the procurement checklist.
Choose Cekura if
- A hosted voice and chat QA runner with automated test case generation from an agent definition is the highest-value workflow on the procurement checklist.
- A published persona library plus multi-turn scenario authoring inside a closed hosted product is the procurement priority.
- Cisco-aligned enterprise telephony procurement through Cekura’s Cisco partnership lines up with your Cisco Webex or Cisco Contact Center stack.
- You already have native production voice observability, an Apache 2.0 eval template catalog, an inline guardrail layer, and a prompt optimization loop wired into other tools, and you want a focused QA runner that does not duplicate any of those.
Verdict matrix: when to pick which
| Situation | Best pick | Why |
|---|---|---|
| Full-stack voice or chat agent platform in one project | Future AGI | Native observability + 70+ built-in eval templates + simulation + Protect + agent-opt all ship in one product |
| Native voice observability across Vapi, Retell, LiveKit with zero SDK | Future AGI | Provider API key plus assistant ID into an Agent Definition starts call capture in minutes; Cekura’s shape is QA-runner-first |
| Inline AI guardrails at sub-100ms (prompt injection, PII) | Future AGI | Future AGI Protect on Gemma 3n with LoRA adapters per arXiv 2510.13351 across four dimensions plus ProtectFlash for sub-100ms binary; Cekura does not ship a runtime enforcement model |
| Apache 2.0 eval template catalog called by slug | Future AGI | ai-evaluation ships 70+ templates including audio_transcription, audio_quality, conversation_coherence, conversation_resolution, evaluate_function_calling, translation_accuracy, cultural_sensitivity; pip install today |
| Closed-loop prompt optimization with six algorithms in UI and SDK | Future AGI | agent-opt ships Bayesian Search, Meta-Prompt, ProTeGi, GEPA, Random Search, PromptWizard with eval-gated promotion; out of scope on Cekura |
| Deep persona simulation with parameterized authoring | Future AGI | 18 pre-built plus unlimited custom across nine authoring axes (identity, age range, location, behavioural, conversation, language, custom properties, free-form, accent / noise) inside Workflow Builder |
| Workflow Builder auto-generated branching scenarios with branch visibility | Future AGI | 20 / 50 / 100-row auto-suites with three node types and branch visibility; four scenario modes ship inside one project |
| OpenInference span model for ASR / LLM / TTS instrumentation | Future AGI | traceAI Apache 2.0, 30+ documented integrations, dedicated traceAI-pipecat and traceai-livekit packages; Cekura does not publish OpenInference spans |
| Five-cert compliance set for regulated buyers | Future AGI | SOC 2 Type II, HIPAA, GDPR, CCPA, ISO 27001 certified today; ISO 42001 in progress; BYOC and AWS Marketplace available |
| Hosted voice and chat QA runner with Cisco-aligned procurement | Shortlist Cekura | Cekura’s focus area: automated test framework, persona library, Cisco partnership for Webex and Contact Center stacks |
| Procurement driver requires Cisco Webex or Cisco Contact Center alignment | Shortlist Cekura | Cekura’s Cisco partnership is the lever buyers reach for in those stacks |
How the closed loop changes the math
The closed loop in practice. Production calls flow into a Future AGI Agent Definition through provider credentials (Vapi or Retell API key plus assistant ID; LiveKit URL plus API key, secret, and agent name). traceAI emits OpenInference span trees that carry gen_ai.voice.* attributes for ASR, LLM, TTS, and tool spans, plus gen_ai.evaluation.* attributes for scored evaluators. ai-evaluation scores each captured call against rubrics drawn from the 70+ built-in catalog or any custom evaluator authored by the in-product agent. Low-scoring sessions cluster by failure mode in Error Feed without configuration; clusters become signal for the team to carry back into Workflow Builder for regression coverage or to promote into the dataset that feeds agent-opt. agent-opt proposes prompt candidates against the eval-scored dataset across six optimizer algorithms; the team gates the winners with a separate eval run before shipping. The Future AGI Protect family enforces inline at the request boundary across four dimensions per arXiv 2510.13351, with ProtectFlash on the sub-100ms binary path when budget matters more than per-dimension granularity.
Net effect for continuous voice and chat workloads: failure modes get named, scenarios get authored from real production traces, prompt candidates get scored before they ship, and inline policy enforcement stays aligned with offline eval rubrics. The loop closes inside one project, on one bill, with the Apache 2.0 triad readable on GitHub before procurement.
For teams already running Cekura, the composition pattern is clean: keep Cekura’s hosted QA runner as the pre-launch test workflow, and add Future AGI for native voice observability, the Apache 2.0 eval catalog, Error Feed clustering, inline Protect guardrails, and the agent-opt loop. The OpenInference contract and shared provider integrations let the two stacks compose without duplicating instrumentation. For greenfield teams, picking Future AGI standalone gives you the whole lifecycle in one product.
For the wider landscape, the "Best Cekura AI alternatives in 2026" listicle covers the cohort.
Related reading
- Best Cekura AI alternatives in 2026
- Best voice AI in May 2026
- How to monitor AI voice agents in production in 2026
- Three-layer voice testing in 2026
- Agent passes evals but fails in production: a 2026 diagnosis
Sources
- Future AGI ai-evaluation Apache 2.0 catalog, github.com/future-agi/ai-evaluation
- Future AGI traceAI Apache 2.0 integrations, github.com/future-agi/traceAI
- Future AGI agent-opt Apache 2.0 optimizers, github.com/future-agi/agent-opt
- Future AGI Protect paper, arxiv.org/abs/2510.13351
- agent-opt GEPA optimizer, arxiv.org/abs/2507.19457
- agent-opt Meta-Prompt optimizer, arxiv.org/abs/2505.09666
- agent-opt Random Search baseline, arxiv.org/abs/2311.09569
- Future AGI Simulation personas documentation, docs.futureagi.com/product/simulation/personas
- Future AGI Workflow Builder scenarios documentation, docs.futureagi.com/product/simulation/scenarios
- Future AGI Run Tests wizard documentation, docs.futureagi.com/product/simulation/run-tests
- Future AGI optimization optimizers overview, docs.futureagi.com/future-agi/get-started/optimization/optimizers/overview
- Future AGI trust portal, futureagi.com/trust
- Future AGI pricing, futureagi.com/pricing
- Cekura product surface and persona library, cekura.ai (snapshot 2026-05-19)
Frequently asked questions
What is the main difference between Future AGI and Cekura for voice agent testing?
Is Future AGI open-source? Is Cekura open-source?
Which platform has stronger persona simulation?
Can I monitor a Vapi or Retell agent with Future AGI without instrumenting code?
How do guardrails compare?
What compliance certifications do both vendors hold?
Can I use Future AGI alongside Cekura instead of replacing it?
How does pricing compare?