
Oct 27 – Oct 31, 2025 (2025 W44)

Credit Usage Revamp and Multi-Language Agents

Redesigned credit tracking, a guided agent builder, and multi-language simulation support ship in one massive release.

Platform Agents Simulate SDK

4 New TTS providers

15+ Languages supported

What's in this digest

Platform: Credit usage summary redesign (New)

Agents: New agent definition UX (New)

Platform: Prompt Workbench revamp (New)

Agents: Multi-language support in agent definition (New)

Simulate: Add columns to scenarios via AI and manual inputs (Improved)

Simulate: Enhanced language and accent support in simulation (Improved)

Simulate: Simulate metrics revamp (Improved)

SDK: ai-evaluation v0.2.2 (New)

Platform: Call analytics integration (Improved)

Simulate: Detailed voice provider logs (Improved)

Simulate: New TTS model integrations (New)

SDK: traceAI LiveKit SDK (New)

Credit Usage Summary Redesign

Every team eventually asks the same question: where is our compute going? The previous credit dashboard gave you a number. The new one gives you a story.

The redesigned credit usage summary introduces workspace-level attribution. Every credit consumed is tagged to a specific feature — whether it was an evaluation run, a simulation batch, or an agent test. Drill into any time period, filter by team member or project, and see exactly what drove usage spikes. Finance teams finally get the granularity they need to forecast AI spend, and engineering teams get the visibility they need to optimize their workflows.

Historical trend lines show usage patterns over time, making it straightforward to catch anomalies before they become budget problems.
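To make the attribution idea concrete, here is a minimal sketch of how tagged usage records could be grouped and filtered. The `CreditEvent` record, its field names, and the `usage_by` helper are assumptions made for illustration; they are not the platform's data model or API.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CreditEvent:
    """One hypothetical usage record; field names are illustrative only."""
    day: date
    feature: str   # e.g. "evaluation", "simulation", "agent_test"
    member: str
    project: str
    credits: float


def usage_by(events, key, start=None, end=None, **filters):
    """Sum credits grouped by `key`, limited to a date range and exact-match filters."""
    totals = defaultdict(float)
    for e in events:
        if start and e.day < start:
            continue
        if end and e.day > end:
            continue
        if any(getattr(e, name) != value for name, value in filters.items()):
            continue
        totals[getattr(e, key)] += e.credits
    return dict(totals)


# Made-up sample data, not real usage figures.
events = [
    CreditEvent(date(2025, 10, 27), "evaluation", "ana", "support-bot", 12.0),
    CreditEvent(date(2025, 10, 28), "simulation", "ben", "support-bot", 40.0),
    CreditEvent(date(2025, 10, 28), "simulation", "ana", "sales-bot", 25.0),
    CreditEvent(date(2025, 10, 30), "agent_test", "ben", "support-bot", 3.0),
]

by_feature = usage_by(events, "feature")
ben_support = usage_by(events, "feature", member="ben", project="support-bot")
```

Grouping on a different key (`"member"` or `"project"`) answers the "who or what drove the spike" question with the same helper.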

New Agent Definition UX

Building an agent on Future AGI used to require bouncing between multiple configuration screens. The new 3-step guided flow consolidates everything into a single, linear experience.

Step one: define the agent’s identity, language, and behavioral constraints. Step two: configure tools, knowledge bases, and provider integrations. Step three: preview the agent in a sandbox environment before deploying. Each step includes inline validation, so misconfigurations surface immediately rather than at runtime.

This is paired with multi-language support in agent definitions. Agents can now operate natively in more than 15 languages, with locale-aware behavior that goes beyond simple translation: each agent follows the cultural norms, date formats, and conversational patterns of its language.
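Inline validation of the kind described for the guided flow can be pictured as one check per step, run in order, stopping at the first step that reports a problem. The sketch below is hypothetical throughout: `AgentDraft`, the validator functions, and the short `SUPPORTED_LANGUAGES` set are stand-ins, not the product's schema or its actual language list.

```python
from dataclasses import dataclass, field

# Illustrative subset only; the real list of supported languages is not given here.
SUPPORTED_LANGUAGES = {"en", "es", "fr", "de", "hi", "ja"}


@dataclass
class AgentDraft:
    """Hypothetical draft of an agent definition."""
    name: str = ""
    language: str = ""
    constraints: list = field(default_factory=list)
    tools: list = field(default_factory=list)
    provider: str = ""


def validate_identity(draft):
    """Step one: identity and language."""
    errors = []
    if not draft.name.strip():
        errors.append("name is required")
    if draft.language not in SUPPORTED_LANGUAGES:
        errors.append(f"unsupported language: {draft.language!r}")
    return errors


def validate_integrations(draft):
    """Step two: tools and provider integrations."""
    errors = []
    if not draft.provider:
        errors.append("provider is required")
    if len(set(draft.tools)) != len(draft.tools):
        errors.append("duplicate tool names")
    return errors


STEPS = [("identity", validate_identity), ("integrations", validate_integrations)]


def first_failing_step(draft):
    """Return (step_name, errors) for the first step that fails, or None if all pass."""
    for name, check in STEPS:
        errors = check(draft)
        if errors:
            return name, errors
    return None


bad = first_failing_step(AgentDraft(name="Helper", language="xx"))
good = first_failing_step(AgentDraft(name="Helper", language="es", provider="demo"))
```

Checking step by step means a user sees the earliest problem first, which is the point of surfacing misconfigurations before the sandbox preview rather than at runtime.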

Prompt Workbench Revamp

Prompt engineering is iterative by nature, and iteration without version control is chaos. The revamped Prompt Workbench introduces commit-based version history: think Git, but for prompts.

Every change to a prompt is captured as a discrete commit. You can diff any two versions, roll back to a known-good state, and branch prompts for A/B testing. Teams working on the same agent can now collaborate on prompt development without overwriting each other’s work.
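For readers who want a mental model of commit, diff, rollback, and branch on prompts, here is a small in-memory sketch using only the Python standard library. The `PromptHistory` class and its method names are invented for this example and say nothing about how the Workbench stores versions internally.

```python
import difflib
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    id: str
    parent: Optional[str]
    message: str
    text: str


class PromptHistory:
    """Toy commit store: each branch name points at its latest commit id."""

    def __init__(self):
        self.commits = {}
        self.branches = {"main": None}

    def commit(self, text, message, branch="main"):
        parent = self.branches[branch]
        # Short content-derived id, in the spirit of a Git hash.
        cid = hashlib.sha1(f"{parent}:{text}".encode()).hexdigest()[:8]
        self.commits[cid] = Commit(cid, parent, message, text)
        self.branches[branch] = cid
        return cid

    def branch(self, name, from_branch="main"):
        """Start a new branch at the current head of `from_branch`."""
        self.branches[name] = self.branches[from_branch]

    def rollback(self, commit_id, branch="main"):
        """Point the branch back at an earlier commit; nothing is deleted."""
        if commit_id not in self.commits:
            raise KeyError(commit_id)
        self.branches[branch] = commit_id

    def head_text(self, branch="main"):
        return self.commits[self.branches[branch]].text

    def diff(self, a, b):
        """Unified diff between the prompt text of two commits."""
        return list(difflib.unified_diff(
            self.commits[a].text.splitlines(),
            self.commits[b].text.splitlines(),
            fromfile=a, tofile=b, lineterm="",
        ))


history = PromptHistory()
v1 = history.commit("You are a support agent.\nBe concise.", "initial prompt")
v2 = history.commit("You are a support agent.\nBe concise and polite.", "soften tone")
history.branch("variant-b")
v3 = history.commit("You are a support agent.\nAnswer in bullet points.",
                    "try bullets", branch="variant-b")
changes = history.diff(v1, v2)
history.rollback(v1)  # main returns to the known-good version; variant-b is untouched
```

Because a rollback only moves a branch pointer, the later commits remain available for diffing or for an A/B comparison against the branch.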

ai-evaluation v0.2.2

The SDK gets a significant upgrade. LLM-as-a-Judge is now a first-class evaluation method, letting you use a language model to score outputs against custom rubrics. On the heuristic side, new metrics cover JSON schema validation, string similarity scoring, exact match checking, and aggregation functions for batch evaluations.

These metrics are composable. Chain them together to build evaluation pipelines that match your specific quality bar.
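As an illustration of what composable heuristic metrics can look like, the sketch below builds exact match, string similarity, a simplified JSON structure check, and batch aggregation from the Python standard library. None of these function names come from ai-evaluation; they are assumptions for the example, and the key-and-type check is a much simpler stand-in for real JSON schema validation.

```python
import json
from difflib import SequenceMatcher
from statistics import mean


def exact_match(output, expected):
    """1.0 when the strings match after trimming whitespace, else 0.0."""
    return 1.0 if output.strip() == expected.strip() else 0.0


def string_similarity(output, expected):
    """Similarity ratio in [0, 1] from difflib's SequenceMatcher."""
    return SequenceMatcher(None, output, expected).ratio()


def has_required_keys(required):
    """Build a metric: output must parse as a JSON object with the given key types."""
    def metric(output, expected):
        try:
            data = json.loads(output)
        except json.JSONDecodeError:
            return 0.0
        if not isinstance(data, dict):
            return 0.0
        ok = all(isinstance(data.get(key), kind) for key, kind in required.items())
        return 1.0 if ok else 0.0
    return metric


def pipeline(metrics):
    """Compose named metrics into one callable that returns a score per metric."""
    def run(output, expected):
        return {name: fn(output, expected) for name, fn in metrics.items()}
    return run


def aggregate(rows):
    """Average each metric across a batch of per-example score dicts."""
    return {name: mean(row[name] for row in rows) for name in rows[0]}


evaluate = pipeline({
    "exact": exact_match,
    "similarity": string_similarity,
    "schema": has_required_keys({"intent": str, "confidence": float}),
})

# Made-up (output, expected) pairs for demonstration.
batch = [
    ('{"intent": "refund", "confidence": 0.9}', '{"intent": "refund", "confidence": 0.9}'),
    ('{"intent": "refund"}', '{"intent": "refund", "confidence": 0.9}'),
]
summary = aggregate([evaluate(out, exp) for out, exp in batch])
```

Every metric shares the same `(output, expected) -> float` shape, which is what lets them be chained or swapped without changing the pipeline.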

Voice Simulation Expansion

Four new TTS providers (Cartesia, Hume, Neuphonic, and LMNT) join the simulation engine. Each provider brings distinct voice characteristics, from ultra-low-latency synthesis to emotionally expressive speech. Combined with enhanced language and accent support, simulations now cover a far broader range of real-world conversational scenarios.

Detailed voice provider logs capture every request and response, giving teams full observability into how voice synthesis behaves under different conditions. The new traceAI LiveKit SDK extends this observability to real-time voice and video agents built on LiveKit infrastructure.
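One way to picture request-and-response logging for a voice provider is a thin wrapper around the synthesis call that records sizes, latency, and any error. This is a generic sketch: `VoiceLogEntry`, `logged`, and the fake provider are hypothetical, and it does not use the traceAI or LiveKit APIs.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoiceLogEntry:
    """Hypothetical log record for one synthesis request."""
    provider: str
    text_chars: int
    audio_bytes: int
    latency_ms: float
    error: Optional[str]


def logged(provider, synthesize, log):
    """Wrap a synthesize(text) -> bytes callable so every call appends a log entry."""
    def wrapper(text):
        start = time.perf_counter()
        audio, error = b"", None
        try:
            audio = synthesize(text)
            return audio
        except Exception as exc:
            error = repr(exc)
            raise
        finally:
            # Runs on success and on failure, so failed requests are logged too.
            elapsed_ms = (time.perf_counter() - start) * 1000
            log.append(VoiceLogEntry(provider, len(text), len(audio), elapsed_ms, error))
    return wrapper


# A stand-in "provider" that returns 10 bytes of silence per input character.
log = []
fake_tts = logged("fake-provider", lambda text: b"\x00" * (len(text) * 10), log)
audio = fake_tts("Hello")
```

Recording the entry in a `finally` block keeps failed requests visible, which is usually where provider behavior is most worth inspecting.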

Simulate Metrics Revamp

The simulation metrics dashboard has been rebuilt from the ground up. Real-time pass/fail rates update as simulations run, with drill-down capabilities that let you inspect individual test cases directly from the metrics view. Custom columns can now be added to scenarios via AI-powered generation or manual input, making it possible to enrich test data without leaving the platform.
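A pass rate that updates as results arrive, with a list of failing cases to drill into, can be modeled with a small running tally. The `RunningMetrics` class and the case ids below are illustrative assumptions, not the dashboard's implementation.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class RunningMetrics:
    """Tally pass/fail outcomes as they stream in and remember which cases failed."""
    counts: Counter = field(default_factory=Counter)
    failures: list = field(default_factory=list)

    def record(self, case_id, passed):
        self.counts["pass" if passed else "fail"] += 1
        if not passed:
            self.failures.append(case_id)

    @property
    def pass_rate(self):
        total = self.counts["pass"] + self.counts["fail"]
        return self.counts["pass"] / total if total else 0.0


metrics = RunningMetrics()
snapshots = []
# Made-up results arriving one at a time, as they would during a simulation run.
for case_id, passed in [("case-1", True), ("case-2", False), ("case-3", True), ("case-4", True)]:
    metrics.record(case_id, passed)
    snapshots.append(metrics.pass_rate)
```

Keeping the failing case ids next to the counts is what makes a drill-down from the aggregate number to an individual test case cheap.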
