
Changelog

Weekly digests of everything we ship. New features, improvements, and fixes to the Future AGI platform.

W14 Mar 30 – Apr 3, 2026

Agent Playground Node-to-Node and Speed Improvements

Chain prompts visually with input/output port mapping between nodes, plus a 4x faster frontend across the platform.

Agents New

Agent Playground Node-to-Node Connections

Chain prompts visually with input/output port mapping between nodes, enabling complex multi-step agent workflows without writing orchestration code.

4x faster page loads
0 code agent building
100% auto-save coverage
Agents

Agent Playground node-to-node connections

New

Chain prompts visually with input/output port mapping between nodes for complex multi-step agent workflows.

Evaluate

Annotation queue UX overhaul

Improved

Multi-assignment and prefetching for annotation queues, reducing reviewer wait times and enabling parallel review workflows.

Platform

Frontend speed improvements

New

Four separate optimization PRs delivering 4x faster page loads across the platform.

Simulate

Voice observe-to-simulation bridge

Improved

Test voice agents using real call data captured through Observe, turning production conversations into simulation scenarios.

Evaluate

Task description field for optimizers

Improved

Add task descriptions to all optimizer types for better context and documentation of optimization goals.

Agents

Agent playground save and auto-save

Improved

Automatic saving with conflict resolution and manual save checkpoints in the Agent Playground.

Agents

Agent playground version management

Improved

Create, browse, compare, and restore named versions of agent playground graphs.

Agents

Enhanced error handling in Agent Playground

Fixed

Improved error messages, cache invalidation, and recovery flows across all Agent Playground operations.

Monitor

ClickHouse Replicated MergeTree migration

Improved

Trace storage upgraded to Replicated MergeTree for high availability and automatic failover.

API

Granular CRUD APIs for Agent Playground

Improved

Fine-grained APIs for node connections, ports, and edge mappings enabling programmatic agent graph management.

Platform

Prompt generation and improvement

Improved

AI-assisted prompt generation from task descriptions and one-click prompt improvement suggestions.

Read full digest
W12 Mar 23 – Mar 27, 2026

Dashboards, Falcon AI, and the MCP Server

Custom dashboards for agent performance, an AI assistant embedded in the platform, and an MCP server that brings Future AGI into your IDE.

Platform New

Custom Dashboards

Track agent performance across evaluation scores, system metrics, cost, and experiment progress with a drag-and-drop dashboard builder.

5 IDE integrations
8 external platform integrations
4 language SDKs
Platform

Falcon AI

New

Context-aware AI assistant embedded in the platform for trace debugging, simulation creation, evaluation building, and dataset construction.

SDK

MCP Server

New

Connect Future AGI to Cursor, Claude Code, VS Code, Claude Desktop, and Windsurf for access to evaluations, datasets, experiments, traces, and prompts.

Platform

Role-based access control

New

Four roles at organization and workspace levels with two-factor authentication, passkeys, and recovery codes.

Platform

Integrations hub

New

Connect Langfuse, Datadog, PostHog, PagerDuty, Mixpanel, S3, Azure Blob, and GCS from a single configuration page.

SDK

traceAI Java and C# support

Improved

traceAI instrumentation now supports Java and C# in addition to Python and TypeScript.

SDK

futureagi-mcp-server tools

Improved

MCP server ships with evaluate, protect, datasets, and synthetic data generation tool endpoints.

Read full digest
W10 Mar 9 – Mar 13, 2026

Command Center Gateway and ClickHouse Migration

A new LLM gateway with multi-provider routing, guardrails, and cost controls, backed by a ClickHouse migration that transforms trace query performance.

Platform New

Prism Gateway (Command Center)

A new LLM gateway with multi-provider support, API key management, guardrails, fallbacks, cost tracking, budgets, and analytics through an OpenAI-compatible endpoint.

15+ LLM providers
9 load balancing strategies
2 data regions
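
Because the gateway exposes an OpenAI-compatible endpoint, existing clients typically only need to point at a new base URL. A minimal sketch using the standard openai Python client; the base URL, API key, and model name are placeholders, not actual Future AGI values.

```python
# Minimal sketch: any OpenAI-compatible client can talk to the gateway.
# The base_url, api_key, and model below are placeholders, not real values.
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.example.com/v1",  # hypothetical gateway endpoint
    api_key="YOUR_GATEWAY_KEY",                 # key issued and managed by the gateway
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # the gateway routes this to the configured provider
    messages=[{"role": "user", "content": "Hello through the gateway"}],
)
print(response.choices[0].message.content)
```
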
Monitor

ClickHouse migration for trace storage

New

Trace storage moved to ClickHouse for massive query performance improvements on high-volume workloads.

Evaluate

Annotation queue

New

Create annotation queues for traces, sessions, datasets, and simulation outputs to organize human review workflows.

Platform

Multi-region support

New

Deploy and store data in US or EU regions for compliance with data residency requirements.

Agents

Agent playground granular CRUD

Improved

Fine-grained create, read, update, and delete operations for nodes, ports, and connections in the Agent Playground.

Evaluate

Dataset and simulation analytics

Improved

Unified analytics dashboard API covering dataset quality metrics and simulation result trends.

Guard

Redis pub/sub for key revocation

Improved

Instant API key revocation across all replicas via Redis pub/sub, closing the window between revocation and enforcement.
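
The mechanism behind instant revocation is a broadcast channel that every replica subscribes to. A rough sketch of that pattern with redis-py; the channel name and in-memory key cache are illustrative, not the platform's actual implementation.

```python
# Illustrative sketch of pub/sub key revocation, not the platform's code.
# Assumes a reachable Redis instance and a per-replica in-memory key cache.
import redis

r = redis.Redis(host="localhost", port=6379)
valid_keys = {"key-123", "key-456"}  # hypothetical cache held by each replica

def revoke(api_key: str) -> None:
    """Publisher side: broadcast the revoked key to every replica."""
    r.publish("api-key-revocations", api_key)

def listen_for_revocations() -> None:
    """Subscriber side: drop revoked keys from the local cache as soon as they arrive."""
    pubsub = r.pubsub()
    pubsub.subscribe("api-key-revocations")
    for message in pubsub.listen():
        if message["type"] == "message":
            valid_keys.discard(message["data"].decode())
```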

API

Auth middleware for gateway

Improved

API key validation middleware for the Prism Gateway with rate limiting and usage tracking per key.

Read full digest
W8 Feb 23 – Feb 27, 2026

Deep Space Theme and ai-evaluation 1.0

A comprehensive dark mode redesign across the entire platform and the 1.0 release of the ai-evaluation SDK with 72+ metrics and multimodal judging.

Evaluate New

ai-evaluation v1.0.0

The unified evaluate() API with multimodal LLM judging, auto-generated grading criteria, feedback loops, 72+ local metrics, and OpenTelemetry integration.

72+ evaluation metrics
4 programming languages
31 new TS packages
Platform

Deep Space dark mode migration

New

Comprehensive monochrome theme applied across the entire platform for reduced eye strain and visual consistency.

SDK

traceAI C# SDK

New

Full traceAI instrumentation support for C# applications and .NET environments.

SDK

traceAI Java SDK

New

Comprehensive Java SDK with 25 instrumentation modules covering major frameworks and libraries.

SDK

31 new TypeScript instrumentor packages

New

Massive expansion of TypeScript instrumentation covering frameworks, databases, HTTP clients, and more.

SDK

traceAI Python SDK update

Improved

New framework support and end-to-end tests for the Python instrumentation SDK.

Platform

Per-tab workspace context

Improved

Each browser tab maintains independent workspace context via sessionStorage, preventing cross-tab interference.

Agents

Agent playground version activation

Improved

Activate and manage specific versions of agent playground graphs with one-click version switching.

Platform

RBAC workspace settings with usage summary

Improved

Role-based access control settings page with workspace-level usage metrics and member management.

Evaluate

Function parameters in evaluations

Improved

Pass function parameters directly to evaluation metrics for dynamic scoring configurations.

Evaluate

Reasoning parameters support in prompts

Improved

Configure reasoning-specific parameters when working with chain-of-thought models in the Prompt Workbench.

Read full digest
W6 Feb 9 – Feb 13, 2026

Simulate from Prompt Workbench

Launch simulations directly from the Prompt Workbench and annotate voice calls with structured human feedback.

Simulate New

Simulate from Prompt Workbench

Add and configure simulations directly through the Prompt Workbench, eliminating context switching between prompt engineering and testing.

5 label types for voice annotations
2x faster file processing
Simulate

Simulate using Prompt Workbench

New

Add and configure simulations directly through the Prompt Workbench without switching contexts.

Evaluate

Human annotations for voice calls

New

Structured feedback with five label types and support for multiple reviewers on voice agent transcripts.

Monitor

Agent health monitoring for voice agents

New

Agent Compass now supports voice agents, providing real-time health metrics for call-based AI systems.

Evaluate

Multi-image support in evaluations

Improved

Evaluations now accept and score multi-image inputs for comprehensive multimodal assessment.

Platform

Reasoning model support

Improved

First-class support for reasoning models with chain-of-thought visibility in traces and evaluations.

Simulate

WebSocket simulation grid updates

Improved

Simulation results stream to the grid in real time via WebSocket, eliminating manual refreshes.

Simulate

Image and audio output support in workbench

Improved

The Prompt Workbench now renders image and audio outputs inline for multimodal prompt iteration.

Platform

Azure endpoint type selector

Improved

Select Azure-specific endpoint types when configuring custom models, with proper API format handling.

Agents

Workflow execution management in Agent Playground

Improved

Monitor, pause, resume, and cancel workflow executions directly from the Agent Playground interface.

Platform

Read-write database split

Improved

Separate read and write database connections for improved query performance under load.

Platform

Polars-based file processing

Improved

CSV, Excel, and JSON file processing rebuilt on Polars for dramatically faster dataset imports.
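
For context, Polars parses files with a multi-threaded engine, which is where most of the speed-up over row-at-a-time parsing comes from. A tiny illustration with a placeholder file and column name; this is not the platform's ingestion code.

```python
# Illustrative only: the file name and column name are placeholders.
import polars as pl

df = pl.read_csv("dataset.csv")                           # multi-threaded CSV parsing
df = df.filter(pl.col("expected_output").is_not_null())   # drop incomplete rows
print(df.shape, df.head())
```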

Simulate

Faster simulation results and evaluations dashboard

Improved

Optimized queries and pagination for simulation results and the evaluations dashboard.

Read full digest
W4 Jan 26 – Jan 30, 2026

Agent Playground - Build Multi-Step Agents Visually

Design, test, and deploy multi-step AI agents with a visual graph builder that requires zero code.

Agents New

Agent Playground

A visual graph builder for multi-step AI agents with a node-based workflow editor, global variables, template management, and a draft/publish workflow.

3 node types
0 code required
4 agent framework wrappers
Evaluate

Dataset optimization with direct evaluation

Improved

Run evaluations directly from datasets and manage trial items without leaving the dataset view.

Evaluate

Image output support in datasets and Prompt Workbench

Improved

Datasets and the Prompt Workbench now render image outputs natively, supporting multimodal evaluation workflows.

Evaluate

Multiple image upload in datasets

Improved

Upload multiple images at once when building datasets, eliminating one-by-one file selection.

Simulate

Baseline chat comparison from Observe to Simulation

Improved

Compare real production conversations against simulated outputs to identify drift and regressions.

Simulate

Bulk delete and bulk rerun test executions

Improved

Select multiple test executions and delete or rerun them in a single action.

Agents

Agent builder framework

New

Core Graph, Node, and Execution models that power the Agent Playground, with cycle detection and validation for graph connections.
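
Cycle detection is the part of connection validation that keeps a graph executable: a node must never feed back into its own upstream path. A self-contained sketch of that check using depth-first search; it illustrates the idea only and is not the platform's Graph model.

```python
# Illustrative cycle check over (source, target) node connections.
from collections import defaultdict

def has_cycle(edges: list[tuple[str, str]]) -> bool:
    """True if the directed graph given by (source, target) edges contains a cycle."""
    graph: dict[str, list[str]] = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)

    visiting, done = set(), set()

    def visit(node: str) -> bool:
        if node in visiting:   # back edge: node is already on the current path
            return True
        if node in done:
            return False
        visiting.add(node)
        found = any(visit(child) for child in graph[node])
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(graph))

# Connecting a node back to one of its upstream nodes would be rejected:
assert has_cycle([("prompt", "tool"), ("tool", "prompt")]) is True
assert has_cycle([("prompt", "tool"), ("tool", "output")]) is False
```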

SDK

simulate-sdk v0.1.2

Improved

Cloud mode, agent wrappers for OpenAI, LangChain, Gemini, and Anthropic, plus tool call support.

Read full digest
W2 Jan 12 – Jan 16, 2026

Chat Sim via Observe and Pre-Built Eval Groups

Simulate directly from real customer interactions and evaluate with 10 ready-to-use evaluation groups -- no configuration required.

Simulate New

Chat Simulation via Observe

Launch chat simulations directly from real customer interactions in Observe, with auto-generated transcripts and scenarios that mirror production conversations.

10 Pre-built eval groups
1-click Simulate from Observe
Simulate

Chat simulation via Observe

New

Simulate directly from real customer interactions with auto-generated transcripts and scenarios.

Evaluate

Pre-built evaluation groups for simulations

New

10 ready-to-use evaluation groups covering common quality dimensions for agent testing.

Agents

Fix My Agent support for chat agents

New

AI-powered debugging now works for chat-based agents with text-specific diagnostic capabilities.

Agents

Agent prompt optimization on the platform

Improved

Run prompt optimization strategies directly from the platform UI without API calls.

Platform

Dynamic model params update based on API

Improved

Model parameters automatically refresh based on provider API capabilities and availability.

Platform

Audio content validation for audio models

Improved

Automatic validation of audio content format and quality before submission to audio models.

API

Replay sessions CRUD APIs

Improved

Full create, read, update, and delete API endpoints for managing replay sessions programmatically.

Agents

Enhanced optimization workflow

Improved

Streamlined optimization pipeline with progress tracking and intermediate result previews.

Simulate

Streamlined persona management in scenarios

Improved

Assign and swap personas within scenarios using a simplified inline interface.

Simulate

Complete simulation status visibility

Improved

Real-time status tracking for all simulation runs with stage-level progress indicators.

Platform

API key management

Improved

Delete API keys from the dashboard to revoke access and maintain security hygiene.

Read full digest
W52 Dec 22 – Dec 26, 2025

Chat Simulation V1 and Replay Sessions

Simulate chat-based agents with realistic conversations and replay real production sessions to reproduce and fix issues.

Simulate New

Chat Simulation V1

A complete simulation engine for chat-based agents -- generate realistic text conversations with personas and scenarios, then evaluate quality at scale.

4 Optimization strategies
200+ Conversation turns per simulation
Simulate

Replay sessions from real traces

New

Re-run historical production conversations through your current agent to test regressions and improvements.

Agents

Agent prompt optimizer

New

Automated prompt optimization with GEPA, MetaPrompt, ProTeGI, and Bayesian strategies.

Agents

Optimize My Agent V3

New

Select specific calls for targeted optimization rather than optimizing across the full dataset.

Platform

PDF preview across platform

Improved

Inline PDF rendering in traces, sessions, and evaluation results.

Platform

Dot notation JSON support

Improved

Reference nested JSON fields with dot notation in prompt templates and configurations.
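
As a concrete picture of what dot notation resolves to, here is a minimal helper that walks nested dictionaries; the platform's template engine is not shown and may behave differently, for example around lists or missing keys.

```python
# Illustrative only: a dotted path is split and walked key by key.
from typing import Any

def resolve(path: str, data: dict[str, Any]) -> Any:
    """Resolve a path like "user.address.city" against nested dictionaries."""
    value: Any = data
    for key in path.split("."):
        value = value[key]
    return value

payload = {"user": {"address": {"city": "Berlin"}}}
assert resolve("user.address.city", payload) == "Berlin"
```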

Simulate

Chat simulation persona section

Improved

Dedicated persona configuration panel for chat simulation with role, personality, and behavior settings.

Simulate

Call analytics drawer for chat sim

Improved

Inline analytics drawer showing cost, latency, and quality metrics for each chat simulation run.

Simulate

Instruction input in scenario creation

Improved

Provide natural language instructions to guide AI-powered scenario generation.

Simulate

Create scenario from Observe

Improved

Convert real production sessions into reusable simulation scenarios with one click.

Platform

Temporal migration for async workloads

Improved

Core simulation and optimization workloads migrated to Temporal for reliability and observability.

Read full digest
W50 Dec 8 – Dec 12, 2025

Fix My Agent -- AI-Powered Debugging

Stop guessing why your agent failed. Fix My Agent analyzes simulation results and tells you exactly what went wrong and how to fix it.

Agents New

Fix My Agent

AI-powered debugging that analyzes simulation failures, identifies root causes at both the agent and infrastructure level, and generates actionable fix suggestions.

0 Manual debugging needed
3x Faster issue resolution
Simulate

Persona management suite

New

Full lifecycle management for simulation personas: view, duplicate, edit, and delete.

Evaluate

Edit experiment configuration after starting

Improved

Modify experiment parameters mid-run without restarting from scratch.

Platform

JSON dot notation in Run Prompts and Experiments

Improved

Reference nested JSON fields using dot notation in prompt templates and experiment configs.

Platform

Enhanced table rendering in traces

Improved

Structured data in traces now renders as formatted tables with sorting and filtering.

Platform

PDF and document preview

Improved

Preview PDFs and documents inline across the platform without downloading.

Platform

Enhanced audio player with lazy loading

Improved

Redesigned audio player with lazy loading for faster page loads on session-heavy views.

Simulate

Real-time loading states for calls

Improved

Live progress indicators for ongoing calls with estimated time remaining.

Agents

Fetch agent definition from providers

Improved

Import agent configurations directly from Vapi and Retell with one click.

Agents

Agent prompt optimizer backend

Improved

Backend infrastructure for the prompt optimizer, including models, views, serializers, and a Temporal migration.

Read full digest
W48 Nov 24 – Nov 28, 2025

Scenario Branches and Custom Background Noises

Branching scenarios reveal how agents handle divergent conversations, and custom background noises push simulations closer to production reality.

Simulate New

Multi-Branch Scenario Generation

Generate test scenarios that branch into multiple conversation paths, revealing how your agent handles forks in user intent, topic switches, and unexpected detours.

10+ Background noise profiles
3 Scenario branch types
Simulate

Scenario generation with branch visibility

New

Visualize and navigate branching conversation paths within generated test scenarios.

Agents

Enable Others option for agent definition

New

Define agents using custom providers beyond the standard Vapi and Retell integrations.

Simulate

Custom background noises for simulation

New

Add realistic ambient noise profiles to voice simulations for production-accurate testing.

Simulate

Multi-branch scenario generation

New

AI-powered generation of branching conversation trees covering multiple user intent paths.

Platform

JSON input/output in session view

Improved

Backend migration from string to JSON fields enables structured data display in session views.

Evaluate

Eval explanation summary for simulations

Improved

Human-readable summaries explaining why each evaluation scored the way it did.

Platform

Prompt WebSocket streaming

Improved

Real-time prompt execution with streaming responses via WebSocket connections.

Evaluate

Edit evaluation variable remapping

Improved

Remap evaluation variables after creation without rebuilding the entire evaluation configuration.

Monitor

Observe enhancements

Improved

Sticky filters, pagination improvements, metadata display, and updated pricing logic in Observe.

Simulate

Simulated assistant call ending fixes

Fixed

Fixed edge cases where simulated assistant calls would not terminate cleanly.

Read full digest
W46 Nov 10 – Nov 14, 2025

Logs, Latency, and the Simulate Revamp

Full simulation observability with cost breakdowns, sub-100ms latency tracking, and a revamped experiment workflow.

Simulate New

Simulation Call Observability

Every simulation call now surfaces logs, latency metrics, and cost breakdowns -- giving teams complete visibility into what happened, how fast it happened, and what it cost.

3 New voice providers
sub-100ms Latency tracking
Simulate

Logs, latency metrics, and cost breakdown in simulation calls

New

Full observability for every simulation call with detailed logs, per-call latency, and granular cost attribution.

Simulate

Run Prompt and Experiment revamp

New

Contextual provider selection and streamlined configuration for prompt runs and experiments.

Monitor

Expanded evaluation attributes in voice observability

Improved

New evaluation dimensions for voice quality, latency, and naturalness in voice agent monitoring.

Simulate

Reasoning column in Simulate

Improved

View the reasoning trace behind each simulation decision directly in the results table.

Simulate

Custom voices in Run Prompt and Experiments

Improved

Use custom voices from Eleven Labs and Cartesia in prompt runs and experiment workflows.

Evaluate

Updated performance metrics in Run Test

Improved

Revised metrics suite in Run Test with more actionable performance indicators.

Evaluate

Edit evaluations within experiment page

Improved

Modify evaluation configurations inline without leaving the experiment view.

API

Configure and re-run evaluations via API

Improved

Programmatically configure evaluation parameters and trigger re-runs through the API.

Simulate

Error localization in Simulate

Improved

Errors in simulation runs are now pinpointed to the exact step and provider that caused them.

Platform

Session history enhancements

Improved

Improved session history with Indian language support and full transcript rendering.

Monitor

Observe homepage revamp

Improved

Redesigned Observe landing page with faster load times and better navigation.

Read full digest
W44 Oct 27 – Oct 31, 2025

Credit Usage Revamp and Multi-Language Agents

Redesigned credit tracking, a guided agent builder, and multi-language simulation support ship in one massive release.

Platform New

Credit Usage Summary Redesign

Workspace-level credit attribution gives teams full visibility into exactly where compute is spent across agents, simulations, and evaluations.

4 New TTS providers
15+ Languages supported
Platform

Credit usage summary redesign

New

Workspace-level credit attribution with per-feature breakdowns and historical trends.

Agents

New agent definition UX

New

A 3-step guided flow for building agents from scratch with inline validation and previews.

Platform

Prompt Workbench revamp

New

Commit-based version history brings git-style prompt management to the Workbench.

Agents

Multi-language support in agent definition

New

Define agents that operate natively in 15+ languages with locale-aware behavior.

Simulate

Add columns to scenarios via AI and manual inputs

Improved

Enrich simulation scenarios with custom data columns using AI generation or manual entry.

Simulate

Enhanced language and accent support in simulation

Improved

Broader dialect and accent coverage for realistic multi-language voice simulations.

Simulate

Simulate metrics revamp

Improved

Redesigned metrics dashboard with real-time pass/fail rates and drill-down capabilities.

SDK

ai-evaluation v0.2.2

New

LLM-as-a-Judge, heuristic metrics for JSON, similarity, string matching, and aggregation.

Platform

Call analytics integration

Improved

Unified analytics for voice agent calls with cost, duration, and quality breakdowns.

Simulate

Detailed voice provider logs

Improved

Full request and response logs for every voice provider interaction during simulation.

Simulate

New TTS model integrations

New

Added Cartesia, Hume, Neuphonics, and LMNT as text-to-speech providers.

SDK

traceAI LiveKit SDK

New

Native tracing support for LiveKit-powered real-time voice and video agents.

Read full digest
W42 Oct 13 – Oct 17, 2025

Outbound Calls, Retell, and Tool Evaluation

Test outbound voice flows, simulate with Retell agents, verify tool calls in simulation, and ship with 50+ evaluation templates in the new ai-evaluation SDK.

Simulate New

Outbound Calling in Simulation

Test outbound voice agent flows where your agent initiates calls to customers, prospects, or patients.

2 Voice providers
50+ Eval templates
3 Persona sources
Simulate

Outbound calling support in simulation

New

Simulate outbound voice flows where your agent places the call, testing appointment reminders, sales outreach, and proactive support scenarios.

Simulate

Retell integration for agent simulation

New

Native Retell support for voice agent simulation, joining Vapi as the second supported voice provider.

Evaluate

Tool evaluation in Simulate

New

Verify that voice agents call the correct tools and functions during simulation, catching integration errors before production.

Evaluate

Provider transcript as evaluation attribute

Improved

Use the voice provider's native transcript as an evaluation input for comparing ASR accuracy and response quality.

Simulate

Pre-built and custom persona feature

Improved

Choose from pre-built caller personas or create custom ones with specific demographics, behaviors, and communication styles.

Platform

Enhanced user onboarding flow

Improved

Streamlined onboarding with role-specific paths, interactive tutorials, and faster time-to-first-evaluation.

Platform

Updated pricing calculation in Observe

Fixed

Cost tracking calculated at ingestion time for accurate, real-time usage reporting without post-processing delays.

Simulate

Voice output in Run Prompt and Run Experiment

Improved

Generate and evaluate voice outputs directly from the prompt playground and experiment workflows.

Simulate

Add rows in simulate scenarios

Improved

Add scenario rows manually, generate them with AI, or import from existing datasets for flexible test case management.

Evaluate

Run evaluations for completed test runs

Improved

Apply new evaluation criteria to previously completed simulation test runs without re-executing the calls.

Agents

Agent definition version selection

Improved

Select and test against specific agent definition versions for precise regression testing and A/B comparisons.

SDK

ai-evaluation v0.1.5

New

Initial SDK release with 50+ evaluation templates covering faithfulness, relevance, safety, and domain-specific quality metrics.

SDK

ai-evaluation v0.2.1

Improved

Batch evaluation support and bias detection capabilities added to the ai-evaluation SDK.

SDK

traceAI OpenAI Agents support

Improved

Native instrumentation for OpenAI's Agents SDK, capturing tool calls, handoffs, and multi-agent orchestration traces.

Read full digest
W40 Sep 29 – Oct 3, 2025

Voice Observability via Vapi

Full visibility into voice agent performance through native Vapi integration, plus SDK-powered simulation and eval group optimization.

Monitor New

Voice Observability through Vapi

Native Vapi integration delivers complete visibility into voice agent performance, from call-level metrics to utterance-level analysis.

100% Voice agent visibility
60-70% Cost reduction via SDK simulation
Monitor

Voice observability through Vapi integration

New

Full observability for Vapi-powered voice agents with call metrics, transcript analysis, and utterance-level performance tracking.

Evaluate

Eval groups in experiments and optimization

New

Run evaluation groups within experiments and use group-level scores for automated prompt and agent optimization.

SDK

Simulate via SDK

New

Trigger ultra-low-latency customer call simulations programmatically through the SDK against LiveKit agents.

Simulate

Selective test rerun in Simulate

Improved

Rerun specific failed or flagged test cases without re-executing the entire simulation suite.

Evaluate

Default eval groups

Improved

Pre-built evaluation groups for common use cases including RAG, Computer Vision, conversational AI, and more.

Simulate

Advanced simulation management

Improved

Auto-refresh dashboards, stop running simulations, and visual workflow tracing for simulation orchestration.

Platform

Workbench revamp

Improved

Redesigned Workbench with integrated code drawer and refreshed header for a streamlined development experience.

Platform

Agent definition design changes

Fixed

Updated agent definition interface with clearer layout and improved navigation for complex agent configurations.

SDK

traceAI session support

Improved

Native session.id attribute support in traceAI for automatic session grouping across all instrumented frameworks.

Read full digest
W38 Sep 15 – Sep 19, 2025

Scenario Builder and Session Tracking

Upload SOPs and transcripts to auto-generate test scenarios with edge cases, plus simplified session-level observability.

Simulate New

Automated Scenario and Workflow Builder

Upload SOPs or call transcripts and automatically generate comprehensive test scenarios with edge cases and failure modes.

10x Faster scenario generation
3 Audio download formats
Simulate

Automated scenario and workflow builder

New

Upload SOPs, transcripts, or documentation to auto-generate simulation scenarios with edge cases and branching conversation flows.

Agents

Agent definition versioning

New

Version control for agent definitions with commit messages and consolidated test reports across versions.

Monitor

Simplified session tracking

Improved

Track user sessions with a single session.id attribute on spans, enabling session-level observability without complex setup.
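
Since the attribute lives on ordinary OpenTelemetry spans, adding it takes one call. A minimal sketch with the OpenTelemetry Python API; tracer and exporter setup are assumed to be configured elsewhere, and the instrumentation name and session ID are placeholders.

```python
# Minimal sketch: attach session.id to a span so spans can be grouped by session.
# Exporter and endpoint configuration are assumed to exist elsewhere.
from opentelemetry import trace

tracer = trace.get_tracer("my-agent")  # placeholder instrumentation name

with tracer.start_as_current_span("handle_user_message") as span:
    span.set_attribute("session.id", "session-42")  # placeholder session identifier
    # ... agent logic runs inside this span ...
```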

Evaluate

Advanced evaluation group management

Improved

Full CRUD operations for evaluation groups, enabling teams to organize and manage evaluation suites programmatically.

Simulate

Enhanced call management with multi-channel audio player

Improved

Listen to simulation calls with a multi-channel audio player that separates agent and caller audio streams.

Simulate

Flexible call recording downloads

Improved

Download call recordings in multiple audio formats for offline analysis, compliance archiving, and sharing.

Platform

Prompt collaboration features

Improved

Collaborative editing and commenting on prompts, enabling team-based prompt development workflows.

Platform

Annotation and prompt import fixes in datasets

Fixed

Resolved issues with importing annotations and prompt data into datasets for smoother data pipeline operations.

Platform

Dedicated Celery worker pool for trace ingestion

Improved

Dedicated background worker pool for trace ingestion, improving throughput and reducing processing latency.

Platform

Optimized trace ingestion pipeline

Improved

End-to-end optimization of the trace ingestion pipeline for faster data availability and reduced resource consumption.

Read full digest
W36 Sep 1 – Sep 5, 2025

Agent Compass and Enterprise Security

Zero-config, trace-level performance insights for AI agents, plus enterprise-grade security with comprehensive RBAC.

Agents New

Agent Compass

Zero-config, trace-level performance insights that automatically detect issues in your AI agents without any evaluation setup.

0 Config required for Agent Compass
5 Eval group templates
Evaluate

Annotation quality dashboard

New

Comprehensive dashboard for annotator agreement metrics including Cohen's kappa scores and inter-rater reliability analysis.
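
To make the agreement metric concrete, Cohen's kappa compares the observed agreement between two annotators with the agreement expected by chance. A small self-contained calculation; it is illustrative only and not taken from the dashboard's implementation.

```python
# Illustrative Cohen's kappa for two annotators labelling the same items.
from collections import Counter

def cohens_kappa(a: list[str], b: list[str]) -> float:
    """(observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - chance) / (1 - chance)

# Two reviewers agree on 3 of 4 labels; kappa corrects for chance agreement.
print(cohens_kappa(["pass", "pass", "fail", "pass"],
                   ["pass", "fail", "fail", "pass"]))  # 0.5
```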

Platform

Enterprise multi-workspace security

New

Enterprise-grade security framework with multi-workspace isolation, RBAC policies, and comprehensive audit logging.

Monitor

Feed insights with error clusters

Improved

Advanced observability feed with automatic error clustering and trend analysis across your agent's execution history.

Platform

Intelligent onboarding navigation

Improved

Guided onboarding flow that adapts to your role and use case, getting teams productive faster.

Simulate

Enhanced voice agent testing and analytics

Improved

Dashboard metrics and scenario columns for voice agent simulations, providing deeper visibility into call performance.

Platform

Intelligent prompt organization

Improved

Folder-based prompt architecture with templates for organizing large prompt libraries across teams and projects.

Platform

Enhanced plans and pricing experience

Fixed

Redesigned pricing page with clearer plan comparisons and streamlined upgrade flows.

Evaluate

Eval grouping API integration

Improved

API support for evaluation grouping, enabling programmatic organization of related evaluations into logical sets.

Read full digest
W34 Aug 18 – Aug 22, 2025

Summary Dashboards, Alerts Revamp, and Prompt SDK

Rebuilt summary dashboards with rich visualizations, a completely revamped alerts system, and powerful new Prompt SDK capabilities.

Monitor New

Summary Screen Revamp

Redesigned summary dashboards with spider charts, bar charts, pie charts, and side-by-side comparison views for instant quality insights.

4 Role levels
3 Chart types
Monitor

Summary screen revamp

New

Completely rebuilt summary dashboards with spider, bar, and pie chart visualizations plus side-by-side comparison views.

Monitor

Alerts revamp with Slack and email

New

Rebuilt alerting system with Slack and email notification channels, customizable thresholds, and intelligent alert grouping.

SDK

Prompt SDK upgrades

New

Caching, A/B testing, and multi-environment deployment support in the Prompt SDK for production-grade prompt management.

Platform

Workspaces RBAC

New

Role-based access control with Owner, Admin, Member, and Viewer roles for fine-grained workspace permissions.

Platform

AWS Marketplace integration

Improved

Purchase and manage your Future AGI subscription directly through AWS Marketplace with consolidated billing.

SDK

Error localizer via SDK

Improved

Synchronous and asynchronous standalone error localization through the SDK to pinpoint failures in agent execution chains.

Evaluate

Critical issue detection on datasets

Improved

Automatic detection of critical issues in datasets with actionable mitigation advice for data quality problems.

Monitor

Prompt metrics in Observe

Improved

Track trace performance per prompt version in Observe to measure the real-world impact of prompt changes.

SDK

traceAI optional dependencies cleanup

Fixed

Reduced install bloat by making framework-specific dependencies optional, cutting package size significantly.

Read full digest
W32 Aug 4 – Aug 8, 2025

Document Intelligence and Async Evaluations

Process documents natively in your datasets, run evaluations asynchronously via SDK, and compare prompt performance across experiments.

Platform New

Document Column Support

Upload and process TXT, DOC, DOCX, PDF, and scanned documents directly in your datasets with built-in OCR.

5 Document types supported
50+ Eval templates
Evaluate

Function evaluations

New

Define custom function-type evaluations that execute arbitrary logic against agent outputs for precise, deterministic quality checks.
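
A function evaluation is simply deterministic code applied to an output. As an illustration of the kind of check such an evaluation might run, here is a standalone validity test; how the function is registered on the platform is not shown.

```python
# Illustrative deterministic check of the sort a function evaluation could run.
import json

def output_is_valid_json(output: str) -> bool:
    """Pass when the agent's output parses as JSON, fail otherwise."""
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False

assert output_is_valid_json('{"status": "ok"}') is True
assert output_is_valid_json("not json") is False
```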

Platform

Edit synthetic data after generation

Improved

Modify AI-generated synthetic data before committing it to your datasets, giving you full control over data quality.

Platform

Document column support in datasets

New

Native support for TXT, DOC, DOCX, and PDF files as dataset columns, enabling document-centric evaluation workflows.

Monitor

User tab in Dashboard and Observe

Improved

New user-level views in Dashboard and Observe surface aggregate metrics per end-user across sessions and traces.

Monitor

Timestamp column in trace/spans

Fixed

Precise timestamp columns in trace and span views for accurate timing analysis and debugging.

Platform

Configure labels per prompt version

Improved

Tag each prompt version with custom labels to track experiments, A/B tests, and rollout stages.

SDK

Async evals via SDK

New

Submit evaluations asynchronously through the SDK, enabling non-blocking evaluation pipelines in production systems.

Monitor

Video support in Observe

Improved

Capture and replay video outputs from multimodal agents directly within the Observe interface.

Platform

OCR support for document processing

Improved

Optical character recognition for scanned documents and images, extracting text for evaluation and analysis.

Evaluate

Comparison summary

Improved

Compare evaluation results and prompt summaries across two datasets side-by-side to measure improvement.

SDK

Bulk annotation and user feedback via API/SDK

Improved

Submit annotations and feedback in bulk through the API and SDK for high-throughput labeling workflows.

Evaluate

JSON view for evals log

Fixed

Inspect raw evaluation log data in structured JSON format for debugging and integration purposes.

SDK

traceAI v0.1.10 with LLM prompt template labels

Improved

New traceAI release adds automatic prompt template labels to LLM spans for better trace organization.

SDK

traceAI Pipecat integration

Improved

Native Pipecat instrumentation for tracing voice and multimodal AI pipelines built with the Pipecat framework.

SDK

traceAI LlamaIndex TypeScript

Improved

TypeScript instrumentation for LlamaIndex, bringing observability to the popular RAG framework in Node.js environments.

Read full digest
W30 Jul 21 – Jul 25, 2025

Voice Simulation is Here

Test your voice agents with real AI-conducted phone calls powered by ultra-low-latency LiveKit infrastructure.

Simulate New

Call Simulation

AI agents conduct real phone calls to test your voice agents end-to-end, powered by LiveKit for sub-second latency.

60-70% Cost reduction vs manual QA
sub-second Voice latency
Simulate

LiveKit-based ultra-low-latency voice testing

New

Sub-second voice latency powered by LiveKit infrastructure ensures simulation calls feel indistinguishable from real customer interactions.

Simulate

Simulator agent form and agent definition dropdowns

Improved

Configure simulation agents through an intuitive form interface with dropdown-based agent definition selection.

Simulate

Add scenarios from datasets

Improved

Import test scenarios directly from your existing datasets to run voice simulations against real-world conversation patterns.

Platform

Refresh token cycle for session management

Fixed

Automatic token refresh cycle ensures uninterrupted simulation sessions without manual re-authentication.

Platform

Mixpanel analytics integration

Improved

Full Mixpanel analytics integration across the platform for tracking usage patterns and feature adoption.

SDK

traceAI TypeScript Vercel instrumentor

Improved

First-class Vercel instrumentation for traceAI in TypeScript, enabling seamless observability for serverless AI deployments.

Evaluate

CRUD on custom evaluations

Improved

Full create, read, update, and delete operations for custom evaluations, giving teams complete control over their evaluation workflows.

Platform

Span name display in traces

Fixed

Span names now appear directly in the trace view, making it faster to navigate complex agent execution trees.

Evaluate

Add feedback to evals

Improved

Attach human feedback directly to evaluation results to build labeled datasets and improve evaluation accuracy over time.

Read full digest
W28 Jul 7 – Jul 11, 2025

System Metrics, Multimodal Tracing, and Eval Playground

Production-grade system metrics dashboards in Observe, multimodal tracing for AWS Bedrock, and a refined eval playground with standalone evaluation and feedback loops.

Monitor New

System Metrics in Observe

Monitor CPU, memory, latency, and throughput alongside your agent traces with built-in dashboards that give you the full production picture.

4 chart types
25+ instrumentors
Monitor

System metrics in Observe

New

Built-in dashboards for CPU, memory, latency, and throughput metrics alongside agent traces.

Monitor

Multimodal support for AWS Bedrock

New

Image tracing for AWS Bedrock models with full input/output capture of multimodal interactions.

Evaluate

Eval playground improvements

Improved

Standalone evaluation mode, feedback collection, and improved scoring visualization in the playground.

Evaluate

Multi-line graphs in evaluations

Improved

Plot multiple evaluation metrics on a single chart to correlate trends and spot regressions across dimensions.

Platform

API key management revamp

Improved

Redesigned API key management with improved UI, bulk operations, and clearer permission displays.

SDK

Google GenAI instrumentor

Improved

Automatic tracing for Google Generative AI SDK with support for Gemini models and function calling.

Evaluate

Langfuse evals integration

Improved

Backend integration enabling Langfuse evaluation data to flow into Future AGI for unified analysis.

Monitor

Annotation notes

Fixed

Add free-form text notes to annotations for richer context on human feedback.

Platform

Draft prompts

Fixed

Save work-in-progress prompts as drafts without publishing them to the shared prompt library.

SDK

traceAI v0.1.11

Improved

Google GenAI instrumentor and multimodal AWS Bedrock support in the core SDK.

Read full digest
W26 Jun 23 – Jun 27, 2025

Alerts, gRPC, and the Observe Graph

Real-time alerting with Slack and email notifications, gRPC trace ingestion for 60% less latency, and a visual graph view of agent execution in Observe.

Monitor New

Alerts and Monitors

Set metric thresholds on your agent performance and get notified instantly via Slack or email when something goes wrong in production.

60% less latency with gRPC
2 notification channels
Monitor

Alerts and monitors

New

Configure metric-based alerts with Slack and email notifications for proactive production monitoring.

Platform

gRPC support for trace ingestion

New

High-performance gRPC transport for trace data with 60% lower latency and reduced bandwidth overhead.

Monitor

Observe graph visualization

New

Interactive directed graph view of agent execution showing the flow of calls, branches, and dependencies.

Platform

Developer keys

Improved

Dedicated API key management with scoped permissions, rotation, and usage tracking.

Platform

Model serving infrastructure

Improved

Internal infrastructure for serving evaluation and guardrail models with autoscaling and low-latency inference.

SDK

traceAI gRPC transport support

Improved

Python and TypeScript SDKs now support gRPC as the trace transport protocol alongside HTTP.
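
Switching an OpenTelemetry pipeline to gRPC is typically a one-line exporter change. A minimal sketch with the standard OTLP gRPC exporter; the endpoint is a placeholder rather than a Future AGI collector address, and authentication headers are omitted.

```python
# Illustrative gRPC trace export setup; endpoint is a placeholder,
# authentication headers are omitted.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="collector.example.com:4317"))
)
trace.set_tracer_provider(provider)
```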

Evaluate

Eval tab revamp

Improved

Redesigned evaluation results tab with better data visualization, filtering, and export options.

Evaluate

Prompt eval updates

Fixed

Improved prompt evaluation workflow with batch execution and result comparison across prompt versions.

Read full digest
W24 Jun 9 – Jun 13, 2025

Eval Playground and Inline Evaluations

An interactive sandbox for testing evaluations in real time, inline eval scoring on traces, and broad provider support with Google ADK and custom model endpoints.

Evaluate New

Evals Playground

An interactive environment to test, iterate, and validate evaluations before deploying them to production -- no dataset required.

5 new eval types
3x faster dataset loading
Evaluate

Inline evaluations for tracing

New

Evaluate individual spans directly in the trace view with on-demand scoring and result display.

Evaluate

Experiment page redesign

Improved

Rebuilt experiment view with better comparison layouts, sortable metrics, and run history navigation.

Monitor

Sentry error monitoring integration

Improved

Connect Sentry to surface application errors alongside agent traces for full-stack debugging.
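
On the application side, wiring up Sentry is the standard SDK initialization; the DSN below is a placeholder, and how errors are correlated with agent traces on the platform side is not shown here.

```python
# Standard Sentry SDK initialization; the DSN is a placeholder value.
import sentry_sdk

sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",  # placeholder DSN
    traces_sample_rate=1.0,  # capture performance traces as well as errors
)
```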

Evaluate

Custom model dropdown with Azure and custom endpoints

Improved

Use Azure OpenAI, custom API endpoints, and self-hosted models as evaluation judges.

Evaluate

Image and audio support in eval log table

Improved

View image and audio outputs directly in evaluation log tables with inline rendering.

Platform

Faster dataset loading

Improved

3x performance improvement for dataset loading through pagination optimization and lazy rendering.

Evaluate

Eval template validation

Fixed

Automatic validation of evaluation templates before execution to catch configuration errors early.

Monitor

Attribute filters in Observe

Improved

Filter traces by custom attributes, metadata, and span properties in the Observe view.

Monitor

Provider logos for tracing

Fixed

Visual provider identification in trace views with logos for OpenAI, Anthropic, Google, and other LLM providers.

SDK

traceAI Google ADK support

Improved

Automatic instrumentation for Google Agent Development Kit with full span capture and tool call tracing.

SDK

traceAI new eval support in TypeScript

Improved

Expanded evaluation capabilities in the TypeScript SDK with new metric types and batch submission.

Read full digest
W22 May 26 – May 30, 2025

Breaking Bad -- A Complete UI Overhaul

A comprehensive redesign of the entire platform UI, the first TypeScript SDK, and flash-speed guardrails with Protect Flash.

Platform New

Breaking Bad UI Redesign

A comprehensive visual and interaction overhaul across every surface of the platform, delivering a faster, cleaner, and more consistent experience.

1st TypeScript SDK
100% UI redesigned
Platform

Breaking Bad UI redesign

New

Comprehensive redesign of every platform surface with new navigation, component library, and interaction patterns.

Evaluate

Custom evals in Observe

New

Run configurable custom evaluations directly on production traces from the Observe view.

SDK

TypeScript @traceai/fi-core v0.1.0

New

First TypeScript SDK release enabling trace instrumentation and evaluations for Node.js and Deno applications.

Guard

Protect Flash implementation

New

Ultra-fast guardrails engine that screens LLM outputs in under 50ms for real-time protection.

Platform

API-based pricing for evals and error localizer

Improved

Pay-per-use pricing for evaluation runs and error localization, replacing fixed tier limits.

Platform

Stop streaming for long-running prompts

Improved

Cancel in-progress LLM generations mid-stream to save time and tokens on runaway outputs.

Evaluate

Evaluations in prompt workbench

Improved

Run evaluations directly within the workbench to score prompt outputs without leaving the editor.

Evaluate

Feedback enhancement system

Fixed

Structured feedback collection on evaluation results to continuously improve eval accuracy.

Read full digest
W20 May 12 – May 16, 2025

Workbench V2 -- The Prompt Engineering Revolution

A ground-up rebuild of the prompt workbench with a new editor, playground layout, and prompt cards, plus major SDK releases and annotation improvements.

Platform New

Workbench V2

A completely rebuilt prompt engineering environment with a new editor, playground layout, prompt cards, and inline cell editing for faster iteration cycles.

3 new SDK versions
12 prompt templates
Platform

Workbench V2 complete rebuild

New

New prompt editor, playground layout, prompt cards, and cell editing for a fundamentally better prompt engineering workflow.

Evaluate

Custom eval revamp with model dropdown

New

Redesigned custom evaluation builder with model selection dropdown for choosing which LLM judges your outputs.

Monitor

Annotations revamp with add/compare flow

Improved

Overhauled annotation interface with streamlined add and side-by-side comparison workflows.

Evaluate

Sheet UI revamp for datasets

Improved

Refined spreadsheet interface with improved cell navigation, keyboard shortcuts, and performance for large datasets.

Evaluate

Import saved prompts into datasets

Improved

Pull prompts from your saved prompt library directly into dataset rows for evaluation.

Evaluate

Column configure dropdown in compare view

Fixed

Choose which columns to display in comparison views to focus on the metrics that matter.

Evaluate

Delete dataset functionality

Fixed

Clean up old or unused datasets with a delete option and confirmation safeguard.

SDK

traceAI core v0.1.4

Improved

Audio evaluations support and prototype eval validation in the core SDK.

SDK

traceAI OpenAI v0.1.3

Improved

Audio and image generation model support for the OpenAI instrumentor.

SDK

traceAI LangChain v0.1.4

Improved

Image extraction and OpenAI CUA (Computer Use Agent) support for the LangChain instrumentor.

Read full digest
W18 Apr 28 – May 2, 2025

Annotations Flow and Error Localization

Find the root cause of agent failures instantly with error localization in Observe, plus a complete annotations flow for human feedback on traces.

Monitor New

Error Localization in Observe

Automatically pinpoint the exact step in a trace where your agent went wrong, cutting debugging time from hours to seconds.

70+ annotation filters
5x rate limit increase
Monitor

Error localization in Observe

New

Automatically identifies the root cause of failures in agent traces, highlighting the exact span where things went wrong.

Monitor

Annotations flow for trace view

New

Complete human feedback workflow for traces with multi-label annotations, reviewer assignment, and approval states.

Evaluate

Updated dataset layout with sheet view

Improved

Spreadsheet-style dataset editor with resizable columns, frozen headers, and bulk editing capabilities.

Evaluate

Diff view in experiments

Improved

Compare experiment runs side by side with highlighted differences in outputs, scores, and configurations.

Platform

Audio support across Observe and datasets

Improved

Native audio rendering in trace views and dataset tables for complete voice agent observability.

Platform

Higher rate limits

Improved

5x increase in API rate limits across all endpoints to support high-throughput production workloads.

Evaluate

Synthetic data UI improvements

Fixed

Cleaner interface for synthetic data generation with better progress indicators and batch controls.

Read full digest
W16 Apr 14 – Apr 18, 2025

Prototype V2 and Audio Evaluations

A rebuilt Prototype experience with knowledge base UI, plus first-class audio evaluations and a smoother onboarding flow for new users.

Platform New

Prototype V2

A completely rebuilt prototyping experience with an integrated knowledge base UI, tutorial videos, and the ability to push results directly to datasets.

6 new eval templates
2x faster dataset loading
Platform

Prototype V2 with knowledge base UI

New

Rebuilt prototyping experience with integrated knowledge base, tutorial video, and streamlined prompt iteration workflow.

Evaluate

Audio evaluations

New

Evaluate audio agent conversations with conversational completeness metrics that measure how well agents handle spoken interactions.

Evaluate

Compare datasets with diff view

Improved

Side-by-side diff comparison between dataset versions to track how your evaluation data evolves over time.

Evaluate

Search in datasets

Improved

Full-text search across dataset rows to quickly find specific test cases, prompts, or expected outputs.

Platform

Gmail signup option

Improved

One-click signup with Gmail for faster onboarding, no password required.

Platform

First-time user walkthrough onboarding

Improved

Guided walkthrough for new users covering key platform features, eval setup, and trace inspection.

Monitor

Quick filters for annotations

Improved

Filter annotations by label, reviewer, and status directly from the annotation panel.

Evaluate

Run insight views for evals

Improved

Visual summaries of evaluation runs showing pass/fail distributions, score trends, and outlier detection.

Platform

Add to dataset from Prototype

Improved

Push prompt-response pairs directly from Prototype into your evaluation datasets with one click.

Evaluate

Audio cell renderer in datasets

Fixed

Inline audio playback in dataset tables so you can listen to audio samples without leaving the dataset view.

Read full digest